Akkodis is seeking a Data Engineer 3 for a contract position. The role involves building and scaling data pipelines for machine-generated data, with a focus on model training and production systems.
Responsibilities:
- Build and scale distributed data pipelines for large-scale time series and log data
- Design reliable, high-performance Spark/Python workflows for model training datasets
- Analyze and resolve performance bottlenecks (latency, memory, skew, throughput)
- Improve data quality, validation, and reproducibility for ML workloads
- Partner with ML engineers and researchers to accelerate foundation model development
- Measure and optimize application and transaction performance in production systems
Requirements:
- 5+ years of software engineering experience
- Strong proficiency in Python
- Hands-on experience with Apache Spark (PySpark or Scala)
- Experience building large-scale data pipelines in distributed environments
- Experience working with time series, logs, or high-volume event data
- Strong debugging and performance optimization skills
- Experience supporting ML or large model training workflows
- Familiarity with sequence modeling or time series data systems
- Experience with streaming systems (Kafka, Spark Streaming)
- Experience with cloud-native or Kubernetes-based platforms