Data Engineer
Siemens
Job Description
Role Overview
We are seeking a Data Engineer with 7–10 years of experience to design, develop, and optimize data pipelines and to integrate machine learning (ML) capabilities into production workflows. The ideal candidate has a strong background in data engineering, big data technologies, cloud platforms, and ML model deployment. The role requires expertise in building scalable data architectures, processing large datasets, and supporting machine learning operations (MLOps) to enable data-driven decision-making.
Key Responsibilities
Data Engineering & Pipeline Development
- Design, develop, and maintain scalable, robust, and efficient data pipelines for batch and real-time data processing.
- Build and optimize ETL/ELT workflows to extract, transform, and load structured and unstructured data from multiple sources (see the batch ETL sketch after this list).
- Work with distributed data processing frameworks like Apache Spark, Hadoop, or Dask for large-scale data processing.
- Ensure data integrity, quality, and security across the data pipelines.
- Implement data governance, cataloging, and lineage tracking using appropriate tools.
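For illustration, a minimal PySpark batch ETL job of the kind this role owns might look like the sketch below. It is a sketch under assumptions, not a prescribed implementation: the bucket paths, column names, and quality rules are all hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_batch_etl").getOrCreate()

# Extract: raw JSON events from a (hypothetical) landing zone.
raw = spark.read.json("s3://example-lake/landing/orders/")

# Transform: enforce basic quality rules and normalize types.
clean = (
    raw.filter(F.col("order_id").isNotNull())
       .dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: partitioned columnar output for downstream analytics.
clean.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-lake/curated/orders/"
)
```

In practice a job like this would be scheduled and monitored by an orchestrator such as Airflow rather than run ad hoc.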
Machine Learning Integration
- Collaborate with data scientists to deploy, monitor, and optimize ML models in production.
- Design and implement feature engineering pipelines to improve model performance.
- Build and maintain MLOps workflows, including model versioning, retraining, and performance tracking (see the MLflow sketch after this list).
- Optimize ML model inference for low-latency and high-throughput applications.
- Work with ML frameworks such as TensorFlow, PyTorch, and scikit-learn, and with deployment tools such as Kubeflow, MLflow, or SageMaker.
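As a minimal sketch of the versioning-and-tracking part of that workflow, the snippet below trains a scikit-learn model, logs its parameters and metrics to MLflow, and registers it so each retraining run receives a new version. The model, metric, and registry name (`churn_classifier`) are hypothetical, and a configured MLflow tracking server is assumed.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real feature table.
X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Track parameters and metrics so retraining runs are comparable.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # Register the model; the registry assigns an incrementing version.
    mlflow.sklearn.log_model(
        model, artifact_path="model", registered_model_name="churn_classifier"
    )
```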
Cloud & Big Data Technologies
- Architect and manage cloud-based data solutions using AWS, Azure, or GCP.
- Utilize serverless computing (AWS Lambda, Azure Functions) and containerization (Docker, Kubernetes) for scalable deployment.
- Work with lakehouse table formats (Delta Lake, Apache Iceberg, Apache Hudi) for efficient storage and retrieval; a minimal Delta Lake sketch follows this list.
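As one concrete example of lakehouse storage, the sketch below appends to a Delta Lake table and reads an earlier version back via time travel. It assumes the delta-spark package is on the Spark classpath; the table path and data are placeholders.

```python
from pyspark.sql import SparkSession

# Spark session configured for Delta Lake (per the Delta Lake quickstart).
spark = (
    SparkSession.builder.appName("lakehouse_demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

events = spark.range(100).withColumnRenamed("id", "event_id")

# ACID append to a Delta table at a (hypothetical) path.
events.write.format("delta").mode("append").save("/tmp/lake/events")

# Time travel: read the table as of its first committed version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/lake/events")
```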
Database & Storage Management
- Design and optimize relational (PostgreSQL, MySQL, SQL Server) and NoSQL (MongoDB, Cassandra, DynamoDB) databases.
- Manage and optimize data warehouses (Snowflake, BigQuery, Redshift, Databricks) for analytical workloads.
- Implement data partitioning, indexing, and query optimizations to improve performance (see the PostgreSQL sketch after this list).
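For instance, partitioning and indexing in PostgreSQL might look like the sketch below, driven from Python with psycopg2. The connection string, table, and column names are illustrative only.

```python
import psycopg2

# Connection parameters are placeholders for a real DSN.
conn = psycopg2.connect("dbname=analytics user=etl")
with conn, conn.cursor() as cur:
    # Range-partition a large fact table by month
    # (PostgreSQL declarative partitioning).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sales (
            sale_id  bigint,
            sold_at  timestamptz NOT NULL,
            amount   numeric
        ) PARTITION BY RANGE (sold_at);
    """)
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sales_2024_01
        PARTITION OF sales
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
    """)
    # Index the common filter column so range scans stay cheap.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS idx_sales_2024_01_sold_at
        ON sales_2024_01 (sold_at);
    """)
conn.close()
```

Partition pruning then lets the planner skip irrelevant months entirely when queries filter on `sold_at`.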
Collaboration & Best Practices
- Work closely with data scientists, software engineers, and DevOps teams to develop scalable and reusable data solutions.
- Implement CI/CD pipelines for automated testing, deployment, and monitoring of data workflows (see the test sketch after this list).
- Follow best practices in software engineering, data modeling, and documentation.
- Continuously improve the data infrastructure by researching and adopting new technologies.
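As a small example of the automated-testing half of that pipeline, the pytest case below checks a (hypothetical) deduplication transform; in CI it would run on every commit before deployment.

```python
# test_transforms.py -- a data-quality unit test suitable for a CI pipeline.
import pandas as pd


def deduplicate_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate orders, keeping the most recent record per order_id."""
    return (
        df.sort_values("updated_at")
          .drop_duplicates("order_id", keep="last")
          .reset_index(drop=True)
    )


def test_deduplicate_orders_keeps_latest():
    df = pd.DataFrame(
        {
            "order_id": [1, 1, 2],
            "updated_at": pd.to_datetime(
                ["2024-01-01", "2024-01-02", "2024-01-01"]
            ),
            "status": ["created", "shipped", "created"],
        }
    )
    out = deduplicate_orders(df)
    assert len(out) == 2
    # The later record for order 1 must win.
    assert out.loc[out["order_id"] == 1, "status"].item() == "shipped"
```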
Required Skills & Qualifications
Technical Skills:
- Programming Languages: Python, SQL, Scala, Java
- Big Data Technologies: Apache Spark, Hadoop, Dask, Kafka
- Cloud Platforms: AWS (Glue, S3, EMR, Lambda), Azure (Data Factory, Synapse), GCP (BigQuery, Dataflow)
- Data Warehousing: Snowflake, Redshift, BigQuery, Databricks
- Databases: PostgreSQL, MySQL, MongoDB, Cassandra
- ETL/ELT Tools: Airflow, dbt, Talend, Informatica
- Machine Learning Tools: MLflow, Kubeflow, TensorFlow, PyTorch, scikit-learn
- MLOps & Model Deployment: Docker, Kubernetes, SageMaker, Vertex AI
- DevOps & CI/CD: Git, Jenkins, Terraform, CloudFormation
Soft Skills:
- Strong analytical and problem-solving abilities.
- Excellent collaboration and communication skills.
- Ability to work in an agile and cross-functional team environment.
- Strong documentation and technical writing skills.
Preferred Qualifications
- Experience with real-time streaming solutions such as Apache Flink or Spark Structured Streaming (see the streaming sketch at the end of this posting).
- Hands-on experience with vector databases and embeddings for ML-powered applications.
- Knowledge of data security, privacy, and compliance frameworks (GDPR, HIPAA).
- Experience with GraphQL and REST API development for data services.
- Understanding of LLMs and AI-driven data analytics.
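To make the streaming qualification concrete, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic. It assumes the spark-sql-kafka package is on the classpath; the broker address, topic, and checkpoint path are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream").getOrCreate()

# Subscribe to a (hypothetical) Kafka topic of click events.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clicks")
    .load()
)

# Kafka values arrive as bytes; decode before parsing downstream.
clicks = stream.selectExpr("CAST(value AS STRING) AS payload")

# Console sink for demonstration; production jobs would write to a
# lakehouse table with the same checkpointing for exactly-once recovery.
query = (
    clicks.writeStream.format("console")
    .option("checkpointLocation", "/tmp/checkpoints/clicks")
    .start()
)
query.awaitTermination()
```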