Lead Site Reliability Engineer

The Walt Disney Company

Software Engineering
Orlando, FL, USA
Posted on Aug 22, 2025

Job Posting Title:

Lead Site Reliability Engineer

Req ID:

10129163

Job Description:

“We Power the Magic!” That’s our motto at Disney Experiences (DX). Our team creates world-class immersive digital experiences for the Company’s premier vacation brands including Disney’s Parks & Resorts worldwide, Disney Cruise Line, Aulani, a Disney Resort & Spa, and Disney Vacation Club.

We are responsible for the end-to-end digital and physical Guest experience for all technology & digital-led initiatives across the Attractions & Entertainment, Food & Beverage, Resorts & Transportation and Merchandise lines of business as well as other initiatives including MyDisneyExperience and Hey, Disney!

This role sits in the Commerce Shared Services organization within Technology & Digital for Disney Experiences. It works closely with Technical Operations and Product Delivery from across the company.

The Lead Site Reliability Engineer will report to the Manager-Site Reliability Engineer.

About The Role & Team:

This is a team lead role focused on engineering and reliability, working with a team of site reliability engineers. You will be responsible for coordinating the team's efforts across the portfolio of applications it supports. This team needs a strong mentor who can help develop and execute specific reliability plans in line with the business strategy of DX Tech and Digital.

What You'll Do:

  • Lead the evolution of DevOps practices within the broader team framework, guiding others in leveraging this culture to enhance observability practices.

  • Consult, design, build, and support development pipelines, automate infrastructure and operations, create telemetry for monitoring, engineer high reliability, and reinforce best practices to secure company data.

  • Apply expert systems administration skills on AWS Cloud, Docker, and Kubernetes, along with extensive experience in web technologies and source control management, using Nimbus, ECS, Tomcat, Harness, GitHub, and GitLab.

  • Develop and advocate strategic directions for reliability, observability, and recovery, and bring practical knowledge of systems, networking, operational excellence, and application stability, security, performance, and capacity management.

  • Plan and coordinate larger efforts for the team of site reliability engineers.

  • Stay up to date with emerging technologies so you can make informed recommendations.

  • Drive teams to consult, design, build, and support development pipelines, automate infrastructure and operations, build telemetry for monitoring, engineer high reliability, and reinforce best practices to secure company data.

Required Qualifications:

  • Minimum 7 years of related work experience

  • Demonstrated leadership in implementing observability principles across complex systems and environments, fostering a culture of reliability and resilience

  • Extensive experience with modern software delivery tools, including GitHub, GitLab, Harness.io, LaunchDarkly, Nimbus, and Kubernetes, and with optimizing workflows and ensuring seamless deployment processes

  • Outstanding communication and leadership abilities to ensure effective growth and development of the team

  • A visionary who motivates teams to excel and fosters creativity, consistently driving excellence in all endeavors

  • An advocate for a diverse and inclusive culture that encourages innovation and ensures every team member feels a sense of belonging

  • Proficient in implementing observability principles and advanced tools for system enhancement, applying expertise in major APM tools

  • Fluent in core scripting languages with advanced programming skills (Python, Node.js, Golang), and experienced with Linux, CLIs, and code editors like VS Code

  • Skilled in source control management systems like GitHub and GitLab, including managing users and repos, and proficient in networking protocols, distributed systems, and container platforms (e.g., Docker, ECS)

  • Experience in cloud hosting services (AWS, Google Cloud, Azure), databases, tools, and security, with experience in CI pipelines, build tools like Jenkins, RESTful web service calls, and JSON

  • Outstanding troubleshooting methodology, including teaching new methodologies to the team and evaluating new systems and infrastructure solutions for technical feasibility against standards

Preferred Qualifications:

  • Leveraging AI for predictive insights, driving measurable continuous improvement in system reliability

Required Education:

  • Bachelor’s degree in Computer Science, Information Systems, Software, Electrical or Electronics Engineering, or comparable field of study, and/or equivalent work experience

#DISNEYTECH

Job Posting Segment:

Technology & Digital

Job Posting Primary Business:

Commerce

Primary Job Posting Category:

Site/System Reliability Engineer

Employment Type:

Full time

Primary City, State, Region, Postal Code:

Orlando, FL, USA

Date Posted:

2025-08-21