Principal Software Developer
Oracle
About the Organization
Join Oracle Cloud Infrastructure’s Observability organization, a core OCI pillar enabling reliability, visibility, and operational excellence across all OCI services. The Telemetry Alarming team owns the monitoring and alerting layer that transforms raw telemetry into actionable insights for both OCI customers and internal service teams.
Our Mission
OCI Observability is building a world-class Integrated Observability and Management Platform that delivers seamless visibility across OCI, other clouds, and on-premises environments. The platform unifies Logging, Monitoring, Auditing, SIEM, Events, and Inventory into a cohesive experience that provides actionable insights into the health, performance, and security of distributed systems.
What You’ll Do
Our systems evaluate millions of metrics per second across thousands of tenants, ensuring timely detection of anomalies, outages, and performance regressions at cloud scale. As a Principal Engineer, you will lead the architecture, design, and technical direction of the next-generation Alarming platform, driving high availability, low-latency signal evaluation, intelligent suppression, and seamless integration with OCI’s unified Observability suite. This role provides deep ownership, visibility across the Observability stack, and the opportunity to shape OCI’s technical direction in monitoring and telemetry infrastructure.
- Define architecture, design, and technical direction for large-scale telemetry services.
- Lead design reviews, guide implementation quality, and ensure long-term maintainability.
- Collaborate with partner teams across OCI Observability, Control Plane, and Developer Platform.
- Mentor engineers, raise technical standards, and foster a culture of excellence and ownership.
- Anticipate and mitigate systemic risks, ensuring reliability and resilience at global scale.
What You’ll Get
- A supportive, engineering-driven culture that values innovation and technical rigor.
- Exposure to massive-scale distributed systems and deep infrastructure challenges.
- The agility of a focused team combined with the reach and stability of Oracle.
- Opportunities to expand skills across OCI’s broad cloud ecosystem.
- Continuous technical development and leadership growth.
- Comprehensive benefits and a collaborative, high-caliber engineering community.
As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling +1 888 404 2494 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
Principal Engineer leading architecture and development of OCI’s large-scale Monitoring and Alarming systems within the Telemetry organization.
Career Level - IC4
Job Responsibilities
- Define architecture and lead development of large-scale Monitoring and Alarming services handling multi-region, multi-tenant workloads.
- Design high-throughput evaluation pipelines for time-series data, optimized for low latency and fault tolerance.
- Drive the evolution of core capabilities such as alarm suppression, composite conditions, and intelligent correlation.
- Collaborate with peer teams in Telemetry to deliver integrated Observability experiences.
- Establish performance, reliability, and efficiency benchmarks for the Alarming platform.
- Mentor engineers, perform design reviews, and set technical standards across the organization.
- Lead incident analysis, root-cause investigations, and architectural remediation of complex production issues.
- Contribute to OCI-wide initiatives improving telemetry ingestion, query efficiency, and alerting reliability.
Required Qualifications
- BS/MS in Computer Science or related field, or equivalent practical experience.
- 6+ years of hands-on engineering experience, including 2+ years designing and leading cloud-scale systems.
- Expertise in distributed systems, microservices, and cloud-native architecture.
- Deep proficiency in at least one major programming language (Java, Go, or C#).
- Experience with one or more public clouds (OCI, AWS, Azure, GCP).
- Strong analytical, design, and debugging skills.
- Ability to communicate complex ideas clearly and lead technical discussions across teams.
- Passion for observability, telemetry, and building systems that operate at global scale.