Design, develop, and maintain scalable, reliable, and efficient ETL/ELT pipelines for batch and real-time data processing (e.g., MQTT/Kafka data ingestion).
Manage and optimize our data warehousing solutions, primarily Google BigQuery, ensuring efficient, cost-effective data storage and querying.
Implement and maintain data quality assertions across all data pipelines to ensure data integrity from source to consumption.
Develop and integrate new data sources into our existing data ecosystem.
Troubleshoot and resolve data pipeline issues, ensuring minimal disruption to data availability.
Collaborate closely with teams across the company to understand their data needs and develop tailored data solutions.
Design and implement data pipelines to support machine learning workflows.
Contribute to the development of data-driven insights that improve robot autonomy and performance.
Contribute to the design and evolution of our overall data architecture, ensuring scalability, performance, and maintainability.
Implement and adhere to best practices for data modeling, schema design, and data governance.
Work with cloud infrastructure (GCP preferred) to deploy and manage data services.
Develop and maintain monitoring solutions for data pipeline health and performance.
Ensure data consistency and accuracy in reporting tools.
Mentor junior team members and contribute to a culture of continuous learning and knowledge sharing within the data team.
Requirements
Strong proficiency in Python for data engineering and scripting.
Extensive experience with SQL and relational databases (PostgreSQL preferred).
Proven expertise with Google Cloud Platform (GCP) services, especially BigQuery, Cloud Storage, and Cloud Functions.
Experience designing, building, and maintaining robust ETL/ELT data pipelines.
Familiarity with data orchestration tools (e.g., Apache Airflow).
Experience with real-time data processing technologies (e.g., Kafka, MQTT).
Understanding of data modeling techniques (e.g., Kimball-style dimensional modeling).
Familiarity with version control systems (Git/GitHub).