Innodata Inc. is a global data engineering company focused on advancing artificial intelligence through reliable data solutions. The Data Engineer will design and build enterprise data warehouses and pipelines, enabling data-driven decision-making and supporting advanced AI/ML use cases.
Responsibilities:
- Design and implement data-driven solutions on GCP, including BigQuery, Cloud Storage, Dataflow, Pub/Sub, and Looker/BI
- Build ETL scripts using SQL and Python to extract, clean, and transform structured and unstructured data from ERP, procurement, logistics, and facility management systems
- Develop and optimize data pipelines for ingestion, transformation, and loading into enterprise data lakes and warehouses
- Build and extend end-to-end data and BI solutions, spanning extraction, storage, transformation, and visualization layers
- Partner with supply chain, real estate, and AI/ML teams to deliver data pipelines for AI solutions (e.g., RAG ingestion, Copilot integration, multi-agent workflows)
- Ensure data governance, lineage, and compliance across supply chain datasets
- Continuously optimize query performance, ETL processes, and pipeline reliability
Requirements:
- Advanced proficiency in SQL (complex queries, optimization) and Python (data engineering, scripting, APIs)
- Experience building ETL/ELT pipelines over structured and unstructured data sources
- Knowledge of enterprise data warehouse and data lake architectures
- Exposure to data pipelines for AI/ML (vector DB ingestion, embeddings, RAG pipelines, copilots, agents)
- Strong hands-on expertise with GCP services such as BigQuery, Dataflow, Pub/Sub, Cloud Storage, and Looker/BI (GCP preferred; equivalent services on other platforms considered)
- Familiarity with supply chain or data center operations data is a strong plus
- Bonus: experience with ML engineering, data visualization tools (Looker, Tableau, Power BI), and MLOps practices