Senior Software Engineer - Real-Time Ingestion

Yahoo

Software Engineering
United States
USD 128,250-266,875 / year
Posted on Mar 27, 2026
Yahoo serves as a trusted guide for hundreds of millions of people globally, helping them achieve their goals online through our portfolio of iconic products. For advertisers, Yahoo Advertising offers omnichannel solutions and powerful data to engage with our brands and deliver results.

About the Team

Our platform is the foundational identity and data layer for 900M+ monthly active users, serving 2.5B+ profiles at massive scale. We are building a predictive, identity-centric insights engine—ensuring our audience is understood with precision to deliver hyper-personalized experiences and advertising solutions across all our digital properties.

Our mission centers on first-party data strategy: capturing, enriching, and activating audience signals to build a 360-degree view of every user. We operate under a Privacy-by-Design philosophy, adhering to global regulations (GDPR, CCPA) and industry security standards, while leveraging a cloud-native stack across GCP (BigQuery, Spanner, Dataflow, Composer, GKE) and AWS, with modern MLOps practices to deliver measurable business impact.

About the Role

As a Senior Data Engineer on the Consumer Data Organization (CDO), you will design and implement streaming data pipelines that process billions of user signals daily, maintaining a real-time view of 2.5B+ profiles. Your pipelines handle critical third-party ID mutations, behavioral signals, and identity updates with sub-second latency, ensuring data freshness for downstream activation and monetization use cases worth hundreds of millions in annual revenue.

You will build scalable Kafka-based streaming infrastructure processing millions of events per second, implementing Apache Beam/Dataflow jobs for stream processing, enrichment, and validation. Your work requires balancing extreme throughput requirements, data quality guarantees, and operational reliability while ensuring privacy-compliant handling of sensitive user data.

This role demands expertise in real-time streaming architectures, distributed messaging systems (Kafka, Pub/Sub), and production data engineering at massive scale. You will collaborate closely with Storage, Privacy, and Platform teams to ensure efficient data flow from ingestion to activation.

Key Responsibilities

  • Develop and optimize real-time streaming pipelines for third-party ID mutations, behavioral signals, and user event ingestion
  • Build scalable Kafka-based data pipelines handling millions of events per second with exactly-once processing semantics
  • Implement Apache Beam/Dataflow jobs for stream processing, enrichment, validation, and transformation of user signals
  • Design comprehensive monitoring and data quality checks ensuring pipeline reliability, data freshness, and SLA compliance
  • Collaborate with Storage team on efficient Cloud Spanner write patterns, schema design, and high-throughput mutation strategies
  • Optimize pipeline performance to reduce lag, improve throughput, and minimize processing costs in GCP infrastructure
  • Implement dead letter queues, retry logic, and error handling strategies to prevent data loss
  • Troubleshoot production data issues including pipeline failures, data quality problems, and performance degradation
  • Work with Privacy team to ensure compliant data handling, PII protection, and sensitive data detection in real-time streams
  • Create comprehensive documentation for pipeline architecture, operational runbooks, and on-call procedures
  • Participate in on-call rotation supporting production streaming pipelines with 99.9% uptime SLA
  • Partner with upstream data producers to ensure consistent event schemas and data quality

Required Qualifications

Education

  • Bachelor's degree in Computer Science, Data Engineering, Software Engineering, or related technical field

Experience

  • 5+ years of data engineering experience building production data systems
  • 3+ years hands-on experience with real-time/streaming data processing systems at scale
  • 2+ years with GCP (Dataflow, Pub/Sub, BigQuery, Spanner, GCS) or AWS equivalents (Kinesis, EMR, DynamoDB)

Technical Skills

  • Strong proficiency in Python, Java, or Scala for data pipeline development
  • Hands-on experience with Apache Kafka, Google Cloud Pub/Sub, or other distributed messaging platforms
  • Experience with Apache Beam, Google Cloud Dataflow, or Apache Spark Streaming for stream processing
  • Understanding of stream processing patterns: windowing, watermarks, exactly-once semantics, state management
  • SQL proficiency and experience with distributed databases (Spanner, Cassandra, DynamoDB)
  • Familiarity with data serialization formats: Avro, Protobuf, JSON, Parquet

Competencies

  • Strong problem-solving skills and operational excellence mindset in production environments
  • Demonstrated ability to deliver reliable data pipelines on schedule with minimal guidance
  • Excellent collaboration across engineering, product, and infrastructure teams
  • Team-level impact with ability to influence technical decisions within immediate team
  • Understanding of data governance and privacy compliance (GDPR, CCPA) in data pipelines

Preferred Qualifications

  • Experience with Cloud Spanner writes at high throughput (millions of writes per second)
  • Knowledge of data governance frameworks, privacy compliance, and PII handling best practices
  • Prior experience in adtech, identity platforms, or consumer data systems processing user behavioral data
  • Familiarity with data quality frameworks: Great Expectations, Deequ, or custom validation systems
  • Understanding of event-driven architectures, change data capture (CDC), and event sourcing patterns
  • Experience with schema evolution, schema registries (Confluent Schema Registry, Apicurio)
  • Contributions to open-source streaming projects (Kafka, Beam, Flink) or data engineering communities
  • Self-driven, detail-oriented, excellent multitasking abilities in fast-paced environments

The material job duties and responsibilities of this role include those listed above as well as adhering to Yahoo policies; exercising sound judgment; working effectively, safely and inclusively with others; exhibiting trustworthiness and meeting expectations; and safeguarding business operations and brand integrity.

At Yahoo, we offer flexible hybrid work options that our employees love! While most roles don’t require regular office attendance, you may occasionally be asked to attend in-person events or team sessions. You’ll always get notice to make arrangements. Your recruiter will let you know if a specific job requires regular attendance at a Yahoo office or facility. If you have any questions about how this applies to the role, just ask the recruiter!

Yahoo is proud to be an equal opportunity workplace. All qualified applicants will receive consideration for employment without regard to, and will not be discriminated against based on age, race, gender, color, religion, national origin, sexual orientation, gender identity, veteran status, disability or any other protected category. Yahoo will consider for employment qualified applicants with criminal histories in a manner consistent with applicable law. Yahoo is dedicated to providing an accessible environment for all candidates during the application process and for employees during their employment. If you need accessibility assistance and/or a reasonable accommodation due to a disability, please submit a request via the Accommodation Request Form (www.yahooinc.com/careers/contact-us.html) or call +1.866.772.3182. Requests and calls received for non-disability related issues, such as following up on an application, will not receive a response.

We believe that a diverse and inclusive workplace strengthens Yahoo and deepens our relationships. When you support everyone to be their best selves, they spark discovery, innovation and creativity. Among other efforts, our 11 employee resource groups (ERGs) enhance a culture of belonging with programs, events and fellowship that help educate, support and create a workplace where all feel welcome.

The compensation for this position ranges from $128,250.00 - $266,875.00/yr and will vary depending on factors such as your location, skills and experience. The compensation package may also include incentive compensation opportunities in the form of discretionary annual bonus or commissions. Our comprehensive benefits include healthcare, a great 401k, backup childcare, education stipends and much (much) more.

Currently work for Yahoo? Please apply on our internal career site.