Manager, Cloud & Research Computing Platforms

University of Chicago

Chicago, IL, USA
USD 120k-135k / year
Posted on Dec 12, 2025

Department

PSD Gardner Group


About the Department

The MANIAC Lab, within the Enrico Fermi Institute of the Physical Sciences Division at the University of Chicago, designs, deploys, and operates advanced cyberinfrastructure in support of forefront particle physics research. The Lab operates one of the three sites of the Midwest Tier-2 (MWT2) federation, a data-intensive, high-throughput computing center that appears to ATLAS as a unified logical facility through harmonized services, federated operations, and shared configuration management.

A shared Tier-3 Analysis Facility complements these resources through a cloud-native Kubernetes environment integrating large-scale CPU and GPU resources, Ceph object storage, BinderHub, Coffea-Casa, Dask, and ServiceX. This platform supports more than 500 ATLAS physicists and serves as a national testbed for high-bandwidth analysis workflows and emerging AI-augmented research methodologies.

The Lab integrates advanced data-delivery systems with modern parallel scheduling frameworks and operates the Scalable Systems Laboratory, a cloud-native software testing platform for the NSF Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP). IRIS-HEP leads the development of next-generation software and computing technologies for the HL-LHC era, and the Lab contributes to these efforts through research on federated analysis environments, token-based data-access infrastructures, and next-generation HTTP/S caching technologies.

The Lab also maintains the ATLAS distributed analytics and AI-assisted observability and operations platform, a large-scale Elasticsearch-based system indexing more than eight years of workflow, data-transfer, and network-telemetry metadata. This infrastructure underpins AI-driven anomaly detection, operational intelligence, and natural-language interfaces that support distributed facility operations and improve reliability across U.S. ATLAS sites.

In addition, the Lab provides comprehensive computation and data-management support for HEP and astrophysics experiments within the Enrico Fermi Institute. It played a key role in building the online computing infrastructure for the South Pole Telescope (SPT-3G) and maintains its associated analysis systems. The Lab operates distributed data-management services for the XENON dark-matter experiment at Gran Sasso, supports simulation and analysis activities for the KOTO experiment at J-PARC—advancing frontier CP-violation studies via ultra-rare kaon decays—and contributes to simulation R&D for future collider initiatives.


Job Summary

As Manager, Cloud & Research Computing Platforms, you will report directly to the Principal Investigator of the MANIAC Lab and lead a technical team of systems administrators and research software engineers. In this role, you will develop high-level programmatic plans across multiple workstreams and translate them into detailed technical roadmaps for the Lab’s systems engineering efforts. You will collaborate extensively with the U.S. ATLAS Computing operations program, the international ATLAS software and computing community, IRIS-HEP partners, and IT teams within the Physical Sciences Division.

Success in this position requires advanced technical depth, strong communication skills, and disciplined organizational capabilities to address complex cyberinfrastructure challenges and ensure reliable operations. You will guide the Lab’s research and development agenda for computing facilities, advancing the transition from traditional HTC architectures to modern cloud-native systems, federated operational models, and AI-assisted monitoring, diagnostics, and facility operations. Your leadership will be instrumental in shaping a forward-looking R&D program designed to meet the evolving demands of the HL-LHC.

Responsibilities

  • Leads the MANIAC Lab’s distributed computing and IT systems team, which is composed of systems administrators and software engineers, overseeing Linux systems, cloud-native services, storage, networking, and cybersecurity.
  • Supports team development through training, mentorship, and continuous learning opportunities.
  • Develops clear technical plans, team goals, and operational milestones across all Lab-supported computing platforms.
  • Partners with the Principal Investigator to implement strategic upgrades and ensure reliable, efficient operation of the Lab’s cyberinfrastructure.
  • Guides modernization efforts, including automation, cloud-native adoption, and improved data-delivery workflows.
  • Collaborates with U.S. ATLAS, IRIS-HEP, and University partners to support shared operations and expand research capabilities.
  • Monitors system performance and applies proactive measures to improve reliability and scalability.
  • Engages with researchers to understand computing needs and deliver solutions that support data-intensive science.
  • Ensures adherence to best practices for network operations and cybersecurity.
  • Manages a single team's progress by maintaining accurate and up-to-date logs, and ensures that all projects have the necessary management oversight and approvals for successful completion.
  • Ensures the implementation of approved best practices and information technology policies that result in the highest quality systems administration.
  • Manages the creation of standards and procedures to maintain production servers that run the operating system. Manages the installation, configuration, and maintenance of operating systems and utility software.
  • Performs other related work as needed.


Minimum Qualifications

Education:

Minimum requirements include a college or university degree in a related field.


Work Experience:

Minimum requirements include knowledge and skills developed through 7+ years of work experience in a related job discipline.


Certifications:

---

Preferred Qualifications

Education:

  • Bachelor’s degree in computer science or a related field in the physical sciences.

Experience:

  • Experience managing large-scale computing systems in academic, research, or enterprise environments.
  • Demonstrated leadership of technical staff and successful delivery of complex cyberinfrastructure projects.
  • Strong background in scientific or high-performance computing, distributed systems, and emerging cloud-native technologies.
  • Experience implementing modern operational practices such as container orchestration, automation, and advanced data-delivery services.
  • Familiarity with secure, policy-compliant operations, including network security and identity management.
  • Experience supporting large CPU/GPU clusters, multi-petabyte storage systems, and data-intensive workflows.
  • Proven ability to evaluate and integrate new technologies to enhance performance and efficiency.
  • Record of effective collaboration with external partners and participation in professional technical communities.

Preferred Competencies

  • Strong leadership, communication, and collaboration skills, with the ability to work effectively with researchers, technical staff, and institutional partners.
  • Ability to operate in a dynamic research environment and stay current with advances in scientific and cloud-native computing.
  • Proficiency in managing Unix/Linux systems, distributed storage platforms (e.g., Ceph), and high-performance networking.
  • Familiarity with container orchestration and cloud-native technologies, including Kubernetes, CI/CD pipelines, and GitOps methodologies.
  • Strong analytical and problem-solving abilities, with experience diagnosing and resolving complex infrastructure challenges.
  • Experience applying automation, monitoring, and modern operational practices to improve system reliability and efficiency.
  • Demonstrated ability to guide teams, build consensus, and drive process innovation in multi-stakeholder technical environments.

Working Conditions

  • Full-time, on-site presence at the Hyde Park campus of the University of Chicago is required.
  • Additionally, you should be capable of physically setting up server and networking equipment within professional data center environments.

Application Documents

  • Resume (required)
  • Cover Letter (required)
  • References (preferred)


When applying, the document(s) MUST be uploaded via the My Experience page, in the section titled Application Documents of the application.


Job Family

Information Technology


Role Impact

People Manager


Scheduled Weekly Hours

37.5


Drug Test Required

No


Health Screen Required

No


Motor Vehicle Record Inquiry Required

No


Pay Rate Type

Salary


FLSA Status

Exempt


Pay Range

$120,000.00 - $135,000.00

The included pay rate or range represents the University’s good faith estimate of the possible compensation offer for this role at the time of posting.


Benefits Eligible

Yes

The University of Chicago offers a wide range of benefits programs and resources for eligible employees, including health, retirement, and paid time off. Information about the benefit offerings can be found in the Benefits Guidebook.


Posting Statement

The University of Chicago is an equal opportunity employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender, gender identity or expression, national or ethnic origin, shared ancestry, age, status as an individual with a disability, military or veteran status, genetic information, or other protected classes under the law. For additional information please see the University's Notice of Nondiscrimination.

Job seekers in need of a reasonable accommodation to complete the application process should call 773-702-5800 or submit a request via the Applicant Inquiry Form.

All offers of employment are contingent upon a background check that includes a review of conviction history. A conviction does not automatically preclude University employment. Rather, the University considers conviction information on a case-by-case basis and assesses the nature of the offense, the circumstances surrounding it, the proximity in time of the conviction, and its relevance to the position.

The University of Chicago's Annual Security & Fire Safety Report (Report) provides information about University offices and programs that provide safety support, crime and fire statistics, emergency response and communications plans, and other policies and information. The Report can be accessed online at: http://securityreport.uchicago.edu. Paper copies of the Report are available, upon request, from the University of Chicago Police Department, 850 E. 61st Street, Chicago, IL 60637.