Data Engineer
Siemens
Job Description
Role Overview
We are seeking a Data Engineer with 7–10 years of experience to design, develop, and optimize data pipelines and to integrate machine learning (ML) capabilities into production workflows. The ideal candidate has a strong background in data engineering, big data technologies, cloud platforms, and ML model deployment. The role requires expertise in building scalable data architectures, processing large datasets, and supporting machine learning operations (MLOps) to enable data-driven decision-making.
Key Responsibilities
Data Engineering & Pipeline Development
- Design, develop, and maintain scalable, robust, and efficient data pipelines for batch and real-time data processing.
- Build and optimize ETL/ELT workflows to extract, transform, and load structured and unstructured data from multiple sources (see the batch ETL sketch after this list).
- Work with distributed data processing frameworks like Apache Spark, Hadoop, or Dask for large-scale data processing.
- Ensure data integrity, quality, and security across the data pipelines.
- Implement data governance, cataloging, and lineage tracking using appropriate tools.
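For illustration, a minimal PySpark batch ETL job of the kind this role owns might look like the sketch below. It is a sketch under assumptions, not a prescribed implementation: the bucket paths, column names, and quality rules are all hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_batch_etl").getOrCreate()

# Extract: raw JSON events from a (hypothetical) landing zone.
raw = spark.read.json("s3://example-lake/landing/orders/")

# Transform: enforce basic quality rules and normalize types.
clean = (
    raw.filter(F.col("order_id").isNotNull())
       .dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: partitioned columnar output for downstream analytics.
clean.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-lake/curated/orders/"
)
```

In practice a job like this would be scheduled and monitored by an orchestrator such as Airflow rather than run ad hoc.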
Machine Learning Integration
- Collaborate with data scientists to deploy, monitor, and optimize ML models in production.
- Design and implement feature engineering pipelines to improve model performance.
- Build and maintain MLOps workflows, including model versioning, retraining, and performance tracking (see the MLflow sketch after this list).
- Optimize ML model inference for low-latency and high-throughput applications.
- Work with ML frameworks such as TensorFlow, PyTorch, and scikit-learn, and with deployment tools such as Kubeflow, MLflow, or SageMaker.
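As a minimal sketch of the versioning-and-tracking part of that workflow, the snippet below trains a scikit-learn model, logs its parameters and metrics to MLflow, and registers it so each retraining run receives a new version. The model, metric, and registry name (`churn_classifier`) are hypothetical, and a configured MLflow tracking server is assumed.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real feature table.
X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Track parameters and metrics so retraining runs are comparable.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # Register the model; the registry assigns an incrementing version.
    mlflow.sklearn.log_model(
        model, artifact_path="model", registered_model_name="churn_classifier"
    )
```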
Cloud & Big Data Technologies
- Architect and manage cloud-based data solutions using AWS, Azure, or GCP.
- Utilize serverless computing (AWS Lambda, Azure Functions) and containerization (Docker, Kubernetes) for scalable deployment.
- Work with lakehouse table formats (Delta Lake, Apache Iceberg, Apache Hudi) for efficient storage and retrieval; a minimal Delta Lake sketch follows this list.
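As one concrete example of lakehouse storage, the sketch below appends to a Delta Lake table and reads an earlier version back via time travel. It assumes the delta-spark package is on the Spark classpath; the table path and data are placeholders.

```python
from pyspark.sql import SparkSession

# Spark session configured for Delta Lake (per the Delta Lake quickstart).
spark = (
    SparkSession.builder.appName("lakehouse_demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

events = spark.range(100).withColumnRenamed("id", "event_id")

# ACID append to a Delta table at a (hypothetical) path.
events.write.format("delta").mode("append").save("/tmp/lake/events")

# Time travel: read the table as of its first committed version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/lake/events")
```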
Database & Storage Management
- Design and optimize relational (PostgreSQL, MySQL, SQL Server) and NoSQL (MongoDB, Cassandra, DynamoDB) databases.
- Manage and optimize data warehouses (Snowflake, BigQuery, Redshift, Databricks) for analytical workloads.
- Implement data partitioning, indexing, and query optimizations to improve performance (see the PostgreSQL sketch after this list).
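For instance, partitioning and indexing in PostgreSQL might look like the sketch below, driven from Python with psycopg2. The connection string, table, and column names are illustrative only.

```python
import psycopg2

# Connection parameters are placeholders for a real DSN.
conn = psycopg2.connect("dbname=analytics user=etl")
with conn, conn.cursor() as cur:
    # Range-partition a large fact table by month
    # (PostgreSQL declarative partitioning).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sales (
            sale_id  bigint,
            sold_at  timestamptz NOT NULL,
            amount   numeric
        ) PARTITION BY RANGE (sold_at);
    """)
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sales_2024_01
        PARTITION OF sales
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
    """)
    # Index the common filter column so range scans stay cheap.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS idx_sales_2024_01_sold_at
        ON sales_2024_01 (sold_at);
    """)
conn.close()
```

Partition pruning then lets the planner skip irrelevant months entirely when queries filter on `sold_at`.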
Collaboration & Best Practices
- Work closely with data scientists, software engineers, and DevOps teams to develop scalable and reusable data solutions.
- Implement CI/CD pipelines for automated testing, deployment, and monitoring of data workflows (see the test sketch after this list).
- Follow best practices in software engineering, data modeling, and documentation.
- Continuously improve the data infrastructure by researching and adopting new technologies.
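As a small example of the automated-testing half of that pipeline, the pytest case below checks a (hypothetical) deduplication transform; in CI it would run on every commit before deployment.

```python
# test_transforms.py -- a data-quality unit test suitable for a CI pipeline.
import pandas as pd


def deduplicate_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate orders, keeping the most recent record per order_id."""
    return (
        df.sort_values("updated_at")
          .drop_duplicates("order_id", keep="last")
          .reset_index(drop=True)
    )


def test_deduplicate_orders_keeps_latest():
    df = pd.DataFrame(
        {
            "order_id": [1, 1, 2],
            "updated_at": pd.to_datetime(
                ["2024-01-01", "2024-01-02", "2024-01-01"]
            ),
            "status": ["created", "shipped", "created"],
        }
    )
    out = deduplicate_orders(df)
    assert len(out) == 2
    # The later record for order 1 must win.
    assert out.loc[out["order_id"] == 1, "status"].item() == "shipped"
```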
Required Skills & Qualifications
Technical Skills:
- Programming Languages: Python, SQL, Scala, Java
- Big Data Technologies: Apache Spark, Hadoop, Dask, Kafka
- Cloud Platforms: AWS (Glue, S3, EMR, Lambda), Azure (Data Factory, Synapse), GCP (BigQuery, Dataflow)
- Data Warehousing: Snowflake, Redshift, BigQuery, Databricks
- Databases: PostgreSQL, MySQL, MongoDB, Cassandra
- ETL/ELT Tools: Airflow, dbt, Talend, Informatica
- Machine Learning Tools: MLflow, Kubeflow, TensorFlow, PyTorch, scikit-learn
- MLOps & Model Deployment: Docker, Kubernetes, SageMaker, Vertex AI
- DevOps & CI/CD: Git, Jenkins, Terraform, CloudFormation
Soft Skills:
- Strong analytical and problem-solving abilities.
- Excellent collaboration and communication skills.
- Ability to work in an agile and cross-functional team environment.
- Strong documentation and technical writing skills.
Preferred Qualifications
- Experience with real-time streaming solutions such as Apache Flink or Spark Structured Streaming (see the streaming sketch at the end of this posting).
- Hands-on experience with vector databases and embeddings for ML-powered applications.
- Knowledge of data security, privacy, and compliance frameworks (GDPR, HIPAA).
- Experience with GraphQL and REST API development for data services.
- Understanding of LLMs and AI-driven data analytics.
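To make the streaming qualification concrete, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic. It assumes the spark-sql-kafka package is on the classpath; the broker address, topic, and checkpoint path are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream").getOrCreate()

# Subscribe to a (hypothetical) Kafka topic of click events.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clicks")
    .load()
)

# Kafka values arrive as bytes; decode before parsing downstream.
clicks = stream.selectExpr("CAST(value AS STRING) AS payload")

# Console sink for demonstration; production jobs would write to a
# lakehouse table with the same checkpointing for exactly-once recovery.
query = (
    clicks.writeStream.format("console")
    .option("checkpointLocation", "/tmp/checkpoints/clicks")
    .start()
)
query.awaitTermination()
```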