HPC Sr. Scientific Software Engineer (IT@JH Research Computing)

Johns Hopkins University

Software Engineering, IT
Baltimore, MD, USA
USD 99,800-175,000 / year
Posted on Nov 21, 2025

IT@JH Research Computing is seeking an HPC Sr. Scientific Software Engineer who will design, build, and support Johns Hopkins University’s high-performance computing and AI research infrastructure. This role integrates elements of both systems and software engineering, ensuring scalable, secure, and reproducible environments for scientific and data-intensive research. The Engineer develops and automates system and application workflows across CPU/GPU clusters, parallel storage, and hybrid cloud platforms. Responsibilities include configuring and optimizing large-scale Linux environments, implementing job scheduling and orchestration frameworks, containerizing applications, and supporting researchers in optimizing performance and reproducibility. Work combines project-based engineering with operational support, requiring both independent problem-solving and close collaboration with the Research Computing team and faculty stakeholders.

Specific Duties & Responsibilities

Software Deployment and Design

  • Develop and refine deployment strategies for scientific software on HPC and AI systems.
  • Design computational workflows, select optimal software configurations, and use tools like Ansible for automation.
  • Assist teams in implementing, tuning, and optimizing AI models and gateway applications (e.g., XDMoD, ColdFront, Open OnDemand, CryoSPARC Live, SBGrid, AI Agents).

Performance Optimization

  • Analyze and optimize the performance of AI models and HPC applications, focusing on GPU-enabled computing.
  • Implement parallel processing, distributed computing, and resource management techniques for efficient job execution.

Integration and Optimization

  • Develop, debug, and maintain software tools, libraries, and frameworks supporting HPC and AI workloads.
  • Collaborate with the system team and software vendors (e.g., NVIDIA, Intel, MATLAB) to optimize systems for maximum performance.
  • Utilize CUDA, DNN, TensorRT, and Intel Compilers to enhance system performance.

HPC Scientific Software Support

  • Manage and support scientific software deployment across HPC, cloud-based, and colocation facilities.
  • Oversee installation, configuration, and maintenance of HPC packages with tools like CMake, Make, EasyBuild, Spack, and Lua module files.

Collaboration and Mentorship

  • Work closely with cross-functional teams, including researchers, data scientists, and software developers, to address complex HPC/AI challenges.
  • Mentor junior engineers and foster a culture of continuous learning.

Technical Support, Troubleshooting, and Training Workshops

  • Resolve complex technical issues and perform root cause analysis for HPC/AI software challenges.
  • Implement effective solutions to prevent recurrence and improve system reliability.
  • Provide training workshops for researchers and students, focusing on troubleshooting, optimizing workflows, and effectively using HPC systems.

Learning and Development

  • Stay current with advances in HPC and AI technologies and methodologies.
  • Incorporate new research findings into existing systems to improve performance and capabilities.

Container Orchestration

  • Develop and manage container orchestration strategies to ensure scalability, reliability, and security of applications.
  • Oversee the container lifecycle from creation and deployment to scaling and removal.

Documentation and Compliance

  • Create comprehensive documentation for system designs, performance metrics, and project status.
  • Ensure compliance with security and regulatory standards for all HPC and AI systems.

In Addition to the Duties Described Above

  • Design, deploy, and maintain large-scale Linux HPC clusters with CPU/GPU resources, high-speed networks, and distributed storage.
  • Develop and maintain automation frameworks for provisioning, monitoring, and software lifecycle management.
  • Implement and optimize job scheduling, container orchestration, and workflow automation tools to support diverse research workloads.
  • Collaborate with faculty and research teams to parallelize, containerize, and scale computational workflows for multi-GPU and distributed environments.
  • Benchmark and tune application performance across architectures, documenting findings and sharing best practices.
  • Integrate and support AI/ML frameworks, scientific libraries, and workflow engines (Snakemake, Nextflow, Dask, Ray).
  • Ensure system and application reliability through proactive monitoring (Prometheus, Grafana, ELK) and incident response participation.
  • Support reproducibility and FAIR data principles through version-controlled, containerized environments.
  • Contribute to documentation, training materials, and technical guidance to enhance user experience and self-service capabilities.
  • Participate in evaluation and adoption of new technologies to advance performance, efficiency, and sustainability in research computing.


Minimum Qualifications
  • PhD in a quantitative discipline.
  • Five years of experience in HPC user support, software deployment, and performance optimization within an academic or research environment.
  • Additional education may substitute for required experience, and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.


Preferred Qualifications
  • Eight or more years of professional experience in high-performance computing, large-scale systems, or research software engineering.
  • Deep proficiency in Linux systems administration, performance tuning, and automation tools (Ansible, Terraform, Jenkins, or similar).
  • Experience with cluster management, workload schedulers (e.g., Slurm), and distributed or parallel file systems (e.g., GPFS, Lustre, WekaFS, Ceph).
  • Strong background in programming or scripting (Python, Bash, C/C++, Go, or Rust).
  • Familiarity with containerization and orchestration technologies used in HPC (Singularity, Apptainer, Docker, Kubernetes).
  • Understanding of high-speed interconnects (InfiniBand, 100/400 Gb Ethernet) and storage/data access patterns for AI and analytics.
  • Experience developing or maintaining CI/CD pipelines and module environments (Lmod/Spack) for research software.
  • Knowledge of GPU computing (CUDA, ROCm), MPI/OpenMP, and AI/ML frameworks.
  • Demonstrated ability to collaborate with researchers on performance optimization, workflow design, and reproducible computing.


Classified Title: HPC Sr. Scientific Software Engineer
Job Posting Title (Working Title): HPC Sr. Scientific Software Engineer (IT@JH Research Computing)
Role/Level/Range: ATP/04/PG
Starting Salary Range: $99,800 - $175,000 Annually (Commensurate w/exp.)
Employee group: Full Time
Schedule: Mon-Fri, 8:30am-5pm
FLSA Status: Exempt
Location: Johns Hopkins Bayview
Department name: IT@JH Research Computing
Personnel area: University Administration