Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This role requires broad expertise in Linux administration, automation, cloud computing, networking, cloud security, performance analysis, and monitoring to ensure the stability, security, performance, and reliability of the infrastructure.

The Site Reliability Engineer will collaborate with multiple service and product teams to identify and resolve cross-team operational risks by applying strong engineering and troubleshooting skills and providing operational guidance. The role also demands excellent communication and organizational skills, along with close partnership with service owners, engineers, and developers, to deliver a superior support experience for the development community.

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all, and we support our people with competitive benefits, including flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling 1-888-404-2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Career Level - IC4

Responsibilities

Drive incident response, root cause analysis (RCA), and remediation efforts; reduce repeat incidents through systemic fixes.
Own and improve service reliability, availability, performance, and operational readiness across critical systems.
Troubleshoot and resolve complex issues across Linux infrastructure and Oracle Cloud Infrastructure (OCI).
Serve as the escalation point for critical issues that lack documented procedures, and deliver RCAs for them.
Develop a comprehensive understanding of end-to-end configurations, technical dependencies, and characteristics of production infrastructure and services.
Quickly adapt to new, fast-changing technologies and incorporate them into automation and operational support.
Design and deliver mission-critical automation with strong focus on security, resiliency, scalability, and performance.
Create and maintain functional, technical, and SOP documentation.
Partner with development teams to define and implement improvements in service architecture.
Clearly communicate technical characteristics of services and technologies, guiding cross-functional teams to build and enhance internal tools.

Required Skills

6–12 years of experience in Linux system administration, kernel-level debugging, and performance tuning.
Strong expertise in automation, scripting, and development using Python and Terraform.
Proven experience supporting fault-tolerant, highly available, scalable distributed systems and production applications.
Skilled in troubleshooting across application, compute, storage, and database layers to improve reliability and availability.
Hands-on experience with cloud infrastructure, cloud security, compliance, patching, and operations/problem management.
Experience collaborating with global teams and working within Agile environments using tools like Jira.
Strong logical thinking, a continuous learning mindset, teamwork, and excellent communication skills.
