Senior Data Engineer
Siemens
Job Description
Hello Visionary!
We empower our people to stay resilient and relevant in a constantly changing world. We’re looking for people who are always searching for creative ways to grow and learn. People who want to make a real impact, now and in the future.
We are looking for a highly skilled and experienced Senior Data Engineer to join our dynamic data engineering team.
The ideal candidate will be responsible for building and maintaining scalable, high-performance data pipelines and cloud infrastructure, with a focus on managing vast amounts of data efficiently in real-time and batch processing environments. The role requires expertise in advanced ETL processes, AWS services such as Glue, Lambda, S3, Redshift, and EMR, and hands-on experience with big data technologies like Apache Spark, Kafka, Kinesis, and Apache Airflow.
You will work closely with data scientists, software engineers, and analysts to ensure that data is accessible, clean, and reliable for business-critical operations and advanced analytics.
Key Responsibilities:
Design & Architect Scalable Data Pipelines: Architect, build, and optimize high-throughput ETL pipelines using AWS Glue, Lambda, and EMR to handle large datasets and complex data workflows. Ensure the pipeline scales efficiently and handles real-time and batch processing.
Cloud Data Infrastructure Management: Implement, monitor, and maintain a cloud-native data infrastructure using AWS services like S3 for data storage, Redshift for data warehousing, and EMR for big data processing. Build robust, cost-effective solutions for storing, processing, and querying large datasets efficiently.
Data Transformation & Processing: Develop highly performant data transformation processes using Apache Spark on EMR for distributed data processing and parallel computation. Write optimized Spark jobs in Python (PySpark) for efficient data transformation.
Real-time Data Streaming Solutions: Design and implement real-time data ingestion and streaming systems using AWS Kinesis or Apache Kafka to handle event-driven architectures, process continuous data streams, and support real-time analytics.
Orchestration & Automation: Use Apache Airflow to schedule and orchestrate complex ETL workflows. Automate data pipeline processes, ensuring reliability, data integrity, and ease of monitoring. Implement self-healing workflows to recover from failures automatically.
Data Warehouse Optimization & Management: Develop and optimize data models, schemas, and queries in Amazon Redshift to ensure low-latency querying and scalable analytics. Apply best practices for data partitioning, indexing, and query optimization to increase performance and minimize costs.
Containerization & Orchestration: Leverage Docker to containerize data engineering applications for better portability and consistent runtime environments. Use AWS Fargate to run containerized applications in a serverless environment, ensuring easy scaling and reduced operational overhead.
Monitoring & Debugging: Build automated monitoring and alerting systems to proactively detect and troubleshoot pipeline issues, ensuring data quality and operational efficiency. Use tools like CloudWatch, Prometheus, or other logging frameworks to ensure end-to-end visibility of data pipelines.
Collaboration with Cross-functional Teams: Work closely with data scientists, analysts, and application developers to design data models and ensure proper data availability. Collaborate in the development of solutions that meet the business’s data needs, from experimentation to production.
Security & Compliance: Implement data governance policies, security protocols, and compliance measures for handling sensitive data, including encryption, auditing, and IAM role-based access control in AWS.
Required Qualifications:
Experience: 5+ years of hands-on experience in building, maintaining, and optimizing data pipelines, ideally in a cloud-native environment.
ETL Expertise: Solid understanding of ETL/ELT processes and experience with tools like AWS Glue for building serverless ETL pipelines. Expertise in designing data transformation logic to move and process data efficiently across systems.
AWS Services: Deep experience working with AWS cloud services:
S3: Designing data lakes, ensuring scalability and performance.
AWS Glue: Writing custom jobs for transforming data.
Lambda: Writing event-driven functions to process and transform data on-demand.
Redshift: Optimizing data warehousing operations for efficient query performance.
EMR (Elastic MapReduce): Running distributed processing frameworks like Apache Spark or Hadoop to process large datasets.
Big Data Technologies: Expertise in using Apache Spark for distributed data processing at scale. Experience with real-time data processing using Apache Kafka and AWS Kinesis for building streaming data pipelines.
Data Orchestration: Strong experience with Apache Airflow or similar workflow orchestration tools for scheduling, monitoring, and managing ETL jobs and data workflows.
Programming & Scripting: Proficiency in Python for building custom data pipelines and Spark jobs. Knowledge of coding best practices for high performance, maintainability, and reliability.
SQL & Query Optimization: Advanced knowledge of SQL and experience in query optimization, partitioning, and indexing for working with large datasets in Redshift and other data platforms.
CI/CD & DevOps Tools: Experience with version control systems like Git and with CI/CD pipelines, using infrastructure-as-code tools like Terraform or AWS CloudFormation to automate deployment and infrastructure management.
Preferred Qualifications:
Data Streaming:
Experience in designing and building real-time data streaming solutions using Kafka or Kinesis for real-time analytics and event processing.
Data Governance & Security:
Familiarity with data governance practices, data cataloging, and data lineage tools to ensure the quality and security of data.
Advanced Data Analytics Support:
Knowledge of supporting machine learning pipelines and building data systems that can scale to meet the requirements of AI/ML workloads.
Certifications:
AWS certifications such as AWS Certified Big Data – Specialty or AWS Certified Solutions Architect are highly desirable.
Make your mark in our exciting world at Siemens.
This role, based in Bangalore, is an individual contributor position. You may be required to visit other locations within India and internationally. In return, you'll have the opportunity to work with teams shaping the future.
At Siemens, we are a collection of over 312,000 minds building the future, one day at a time, worldwide. We are dedicated to equality and welcome applications that reflect the diversity of the communities we serve. All employment decisions at Siemens are based on qualifications, merit, and business need.
Bring your curiosity and imagination, and help us shape tomorrow.
We’ll support you with:
Hybrid working opportunities.
Diverse and inclusive culture.
Variety of learning & development opportunities.
Attractive compensation package.
Find out more about Siemens careers at: www.siemens.com/careers