Details
Posted: 26-Sep-24
Location: United States
Type: Full-time
Salary: Open
Special Instructions to Applicants: All interested applicants should attach a cover letter and a resume in the Supporting Documents section of the application. We suggest the documents be in a PDF format to avoid formatting issues.
Rice University is growing! Under President DesRoches, our research footprint is expanding, and we are hiring 200 new faculty. The Office of Research has launched several new research institutes for materials science, biology, sustainability, digital health, bioengineering, and more. Our research computing & data services are growing rapidly to support the needs of our research community, and to provide leading technologies to accelerate and advance the world-class research and scholarship that is underway across the University.
The Center for Research Computing (CRC), within the Office of IT, enables faculty and researchers to effectively use on- and off-premises resources and services, including (1) shared high-performance computing systems, (2) VM and cloud computing, (3) data storage infrastructure, (4) a wide range of scientific instruments, and (5) broader cyberinfrastructure and services. The CRC currently manages three HPC systems and three research data storage services, including a general-purpose HPC / HTC cluster and two small GPU clusters.
Position Summary
We are seeking an experienced HPC Systems Administrator to join our team. Reporting to the Director of the Center for Research Computing, the Advanced Research Computing Solutions Engineer works with the HPC team to perform specialized functions for systems installation, management, problem-solving, and solution design, and serves as primary back-up for the lead HPC systems engineer. Additional technical functions include the implementation and support of HPC research environments, including databases, containers, HPC and hybrid/cloud compute and storage services, and security and access controls. The incumbent will participate on the HPC Systems & User-facing team to proactively and reactively identify and solve operational and software problems on our HPC systems, and will collaborate with Rice Information Security to properly secure the environment and any related information services, whether cloud-based or on-premises.
Research using commercial and federally funded cloud resources is increasingly important, and responsibilities for this role will include working with CRC teams and faculty to facilitate best practices for cloud-based research computing. Additionally, while this is primarily a systems-facing role, the incumbent will participate in training scholars and students on campus in the use of the HPC and research computing facilities to support research, education, and outreach to industrial and governmental partners.
The ideal candidate has broad experience with managing HPC systems in research environments and the ability to work with a wide range of scholars to support the selection and use of cost-effective environments in which to carry out their research. This includes supporting research in environments including, but not limited to, cloud computing, regional or national data repositories, supercomputers, and other federal and institutional research computing resources.
Workplace Requirements
Working onsite is required for this job. After the 6-month probationary period, the incumbent may be allowed to work up to 2 days remotely, with supervisor approval, provided they remain in the local area. Per Rice policy 440, work arrangements may be subject to change.
This is a full-time, benefits-eligible position, and the proposed salary range is $108,000 to $118,000 annually, depending on qualifications and experience. Exempt (salaried) positions under FLSA are not eligible for overtime.
Minimum Requirements:
- Bachelor's Degree
- In lieu of the education requirement, additional related experience, above and beyond what is required, on an equivalent year-for-year basis may be substituted.
- 3+ years of experience in HPC systems integration and management and supporting researchers with HPC and/or cloud computing solutions.
- In lieu of the experience requirement, additional related education, above and beyond what is required, on an equivalent year-for-year basis may be substituted.
- Skills:
- Proven ability to develop appropriate plans to meet computing needs
- Proven ability to work on large/complex system deployment projects in a team environment
- Proficient level of understanding in the architecture, design, and development of High Performance Computing solutions
- Advanced knowledge of security trends and best practices
- Familiarity with generally accepted principles, patterns, and practices of domain-driven design, test-driven design, and continuous integration
- Able to use critical thinking to provide support and troubleshoot systems
- Possess attention to detail, organizational skills, and excellent time management skills
- Strong communication skills, both written and oral
Preferences
- Master's and/or Ph.D. in computer science or STEM discipline.
- 5+ years of experience developing, installing, managing, and provisioning large-scale High Performance and High Throughput Computing environments.
- 2+ years of experience developing cloud-based solutions for research projects and managing the migration of projects from local HPC environments to commercial or academic cloud platforms
- Minimum of two years' experience in Linux systems administration.
- Experience with GPUs and GPU-based clusters
- Ability to optimize workflows and job scripts for optimal use of HPC systems.
- Experience in a university or similar research-oriented environment.
- Familiarity with schedulers such as SLURM (Simple Linux Utility for Resource Management).
- Familiarity with the design of HPC systems.
- Experience with implementing and maintaining system security strategies, policies, and procedures
- Advanced knowledge of parallel programming with OpenMP, MPI, and CUDA.
- Familiarity with virtualization environments for running background research applications.
- Proven experience working with Big Data applications
- Experience providing user support and training for High-Performance Computing (HPC) environments
Essential Functions
- Administers and programs high performance and research computing environments that may include cloud-based systems, as well as local physical and virtual systems.
- Provides system maintenance and troubleshooting, primarily for Linux operating systems, leveraging industry standards and best practices
- Utilizes monitoring and reporting tools on system health and status to inform CRC services
- Installs and maintains operating systems, utilities, and applications software on computing systems
- Works collaboratively to resolve complex system issues that impact the integrity of user data and systems
- Engages in long-term planning about systems development and integration
- Performs capacity planning for system configuration, software services, network services, load distribution, and service interrelationships among computer systems
- Acts as a technical expert or lead for local computer system administration
- Manages vendor relationships and cost-effective hardware and software maintenance agreements with vendors
- Actively fosters a collaborative work environment, promoting teamwork and open communication across departments.
- Performs all other duties as assigned
Additional Functions
- May be required to work extended hours (evenings and weekends) in emergency situations or to restore systems.
Rice University HR | Benefits: https://knowledgecafe.rice.edu/benefits
Rice Mission and Values: Mission and Values, Rice University
Rice University is an Equal Opportunity Employer committed to diversity at all levels and considers for employment qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national or ethnic origin, genetic information, disability, or protected veteran status.