Torc Robotics is a leader in autonomous driving technology, focused on developing software for automated trucks. The Senior Software Engineer - Data Pipeline will design and develop high-performance data converters and large-scale ingestion pipelines to support the company's autonomous driving stack, ensuring the reliability and quality of production datasets.
Responsibilities:
- Design and develop high‑performance data converters for multi‑sensor autonomous‑driving data (camera, lidar, radar), ensuring accurate time alignment and robust handling of raw sensor logs (a brief illustrative sketch follows this list)
- Design, build, and optimize large‑scale ingestion and transformation pipelines (ETL/ELT) capable of processing petabyte‑scale autonomous‑driving sensor data, and automate them for reliable, production‑grade deployment
- Work with data formats such as ROS bags, MCAP, and custom binary encodings; establish standards for schema evolution and metadata integrity
- Implement automated data validation, quality checks, and lineage tracking to ensure reliability of production datasets
- Collaborate closely with ML, annotation, simulation, and perception teams to ensure cross‑team ownership of data products and deliver datasets that are consistent, semantically correct, and ready for downstream consumption
- Proactively assess current capabilities to identify areas for improvement, proposing solutions that align with core strategy and operations
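To give a concrete flavor of the converter and time-alignment work above, here is a minimal, illustrative sketch of vectorized timestamp matching in NumPy. It is not Torc code; the function name, the 50 ms tolerance, and the sample values are all assumptions made for the example:

```python
import numpy as np

def align_nearest(cam_ts_ns: np.ndarray, lidar_ts_ns: np.ndarray,
                  tol_ns: int = 50_000_000) -> np.ndarray:
    """For each camera timestamp, return the index of the nearest lidar
    timestamp, or -1 if none falls within tol_ns. Both inputs are sorted
    int64 nanosecond arrays; lidar_ts_ns needs at least two entries."""
    # Insertion points; the nearest neighbor is that index or the one before it.
    idx = np.clip(np.searchsorted(lidar_ts_ns, cam_ts_ns), 1, len(lidar_ts_ns) - 1)
    left, right = lidar_ts_ns[idx - 1], lidar_ts_ns[idx]
    nearest = np.where(cam_ts_ns - left < right - cam_ts_ns, idx - 1, idx)
    # Reject pairs outside the tolerance window (e.g., dropped frames).
    return np.where(np.abs(lidar_ts_ns[nearest] - cam_ts_ns) <= tol_ns, nearest, -1)

# Hypothetical timestamps for a three-frame drive-log snippet.
cam = np.array([0, 100_000_000, 205_000_000], dtype=np.int64)
lidar = np.array([2_000_000, 99_000_000, 201_000_000], dtype=np.int64)
print(align_nearest(cam, lidar))  # [0 1 2]
```

A converter in this vein would run per drive log to pair each camera frame with the nearest lidar sweep, dropping frames with no match inside the tolerance window.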
Requirements:
- Bachelor's or Master's degree in a STEM‑related field with 5+ years of working experience in cloud technologies and data operations
- Experience building or maintaining converters, decoders, or transformation pipelines for sensor‑rich data (e.g., lidar point clouds, camera streams, radar detections)
- Understanding of multimodal data synchronization, timestamp alignment, and multi‑sensor calibration workflows
- Experience with distributed compute frameworks (Ray, Spark, Beam) and cloud‑based platforms like Anyscale and Databricks for large‑scale data‑pipeline execution
- Experience with high‑performance computing techniques, including vectorized data processing (NumPy), multithreaded or parallel execution, and GPU‑accelerated compute for optimizing large‑scale sensor‑data workloads
- Proficiency in Python, SQL, and shell scripting
- Experience with major cloud providers such as AWS, Google Cloud Platform (GCP), or Azure
- Ability to operate with broad autonomy, leading complex technical work and driving alignment across team boundaries
- Ownership of key data‑pipeline and converter solutions end to end, setting direction and building consensus
- Project leadership experience, including mentoring less‑experienced engineers to ensure high‑quality execution
- Working experience with design patterns and framework development for ML and operational data pipelines in the cloud
- Familiarity with 3D labeling and computer‑vision (CV) annotation workflows
- Experience optimizing I/O‑heavy workloads, including columnar formats (Parquet, Arrow); a brief example follows this list
- Knowledge of orchestration tools (Airflow, Argo, Prefect)
- Hands‑on experience designing CI/CD automation for data services, including GitHub Actions, Databricks pipelines, and cloud‑native deployment workflows
- Background in Agile engineering practices
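As an illustration of the columnar-format optimization mentioned above, the following sketch writes and selectively reads a small detections table with PyArrow. The schema, values, and file name are hypothetical, chosen only to show the technique:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical radar-detections table; a columnar layout keeps scans of
# a single field (e.g., timestamps) cheap compared with row-oriented logs.
table = pa.table({
    "stamp_ns": pa.array([1_700_000_000_000_000_000,
                          1_700_000_000_100_000_000], type=pa.int64()),
    "sensor": ["radar_front", "radar_front"],
    "range_m": pa.array([42.7, 43.1], type=pa.float32()),
})

# Dictionary-encode low-cardinality strings and compress with Zstandard.
pq.write_table(table, "detections.parquet",
               compression="zstd", use_dictionary=True)

# Column projection on read: load only the field needed downstream.
stamps = pq.read_table("detections.parquet", columns=["stamp_ns"])
print(stamps.num_rows, stamps.column_names)  # 2 ['stamp_ns']
```

Dictionary encoding, compression, and column projection on read are typical levers for the I/O‑bound pipeline stages this role covers.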