AnitaB.org Talent Network

Connecting women in tech with the best professional opportunities!

Principal Site Reliability Engineer

Oracle

Administration, Software Engineering
Bengaluru, Karnataka, India · Hyderabad, Telangana, India · Thiruvananthapuram, Kerala, India
Posted on Feb 17, 2026

Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This role requires broad expertise in Linux administration, automation, cloud computing, networking, cloud security, performance analysis, and monitoring to ensure the stability, security, performance, and reliability of infrastructure.

The Site Reliability Engineer will collaborate with multiple service and product teams to identify and resolve cross-team operational risks using strong engineering, troubleshooting, and operational guidance. The role also demands excellent communication and organizational skills, along with close partnership with service owners, engineers, and developers to deliver a superior support experience for the development community.


Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling 1-888-404-2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.


Career Level - IC4


Responsibilities

  • Drive incident response, root cause analysis (RCA), and remediation efforts; reduce repeat incidents through systemic fixes.
  • Own and improve service reliability, availability, performance, and operational readiness across critical systems.
  • Troubleshoot and resolve complex issues across Linux infrastructure and Oracle Cloud Infrastructure (OCI).
  • Serve as the escalation point for critical issues lacking documented procedures and deliver Root Cause Analysis (RCA).
  • Develop a comprehensive understanding of end-to-end configurations, technical dependencies, and characteristics of production infrastructure and services.
  • Quickly adapt to new, fast-changing technologies and incorporate them into automation and operational support.
  • Design and deliver mission-critical automation with a strong focus on security, resiliency, scalability, and performance.
  • Create and maintain functional, technical, and SOP documentation.
  • Partner with development teams to define and implement improvements in service architecture.
  • Clearly communicate technical characteristics of services and technologies, guiding cross-functional teams to build and enhance internal tools.

Required Skills

  • 6–12 years of experience in Linux system administration, kernel-level debugging, and performance tuning.
  • Strong expertise in automation, scripting, and development using Python and Terraform.
  • Proven experience supporting fault-tolerant, highly available, scalable distributed systems and production applications.
  • Skilled in troubleshooting across application, compute, storage, and database layers to improve reliability and availability.
  • Hands-on experience with cloud infrastructure, cloud security, compliance, patching, and operations/problem management.
  • Experience collaborating with global teams and working within Agile environments using tools like Jira.
  • Strong logical thinking, continuous learning mindset, teamwork, and excellent communication skills.