itD is a consulting and software development company seeking a Software Engineer to design and scale data pipelines for machine-generated data. The role involves building distributed data pipelines, optimizing performance, and collaborating with machine learning engineers to support model training workflows.
Responsibilities:
- Build and scale distributed data pipelines for large-scale time series, log data, and high-volume event streams
- Design and maintain reliable, high-performance Spark and Python workflows to support model training datasets
- Analyze and resolve performance bottlenecks related to latency, memory utilization, data skew, and throughput
- Improve data quality, validation processes, and reproducibility for machine learning workloads
- Partner with machine learning engineers and researchers to accelerate foundation model development
- Measure and optimize application and transaction performance in production data systems
- Collaborate cross-functionally to ensure data infrastructure aligns with evolving research and product needs
- Attend regular internal practice community meetings
- Collaborate with your itD practice team on industry thought leadership
- Produce client case studies and learning materials (blogs, media content)
- Develop material that contributes to the Digital Transformation practice
- Attend internal itD networking events (in person and virtual)
- Work with leadership on career fast-track opportunities
Requirements:
- 5+ years of software engineering experience
- Strong proficiency in Python
- Hands-on experience with Apache Spark (PySpark or Scala)
- Experience building large-scale data pipelines in distributed environments
- Experience working with time series data, logs, or high-volume event streams
- Strong debugging skills and experience with performance optimization in distributed systems
- Bachelor's degree in a relevant field or equivalent work experience required
- Experience supporting machine learning or large model training workflows
- Familiarity with sequence modeling or time series data systems
- Experience with streaming systems such as Kafka or Spark Streaming
- Experience with cloud-native or Kubernetes-based platforms
- Cisco experience is a plus