Innodata Inc. is a leading data engineering company specializing in AI technology solutions for major global clients. The Senior Data Engineer will design and optimize scalable data pipelines, develop event-driven architectures, and ensure data reliability and performance across distributed environments.
Responsibilities:
- Design, build, and optimize scalable data pipelines for batch and real-time processing
- Develop and maintain event-driven architectures for high-throughput systems
- Ensure data reliability, performance, and low-latency processing across distributed environments
- Collaborate with data scientists and application teams to enable analytics and AI use cases
- Implement best practices in performance tuning, monitoring, and cost optimization
Requirements:
- Advanced proficiency in Python for backend and large-scale data processing
- Strong experience building and managing big data pipelines in production environments
- Hands-on expertise with workflow orchestration tools such as Airflow or Google Cloud Composer
- Proven experience in batch and streaming data processing using Apache Spark and Apache Beam (Dataflow)
- Experience designing and operating event-driven systems using Pub/Sub
- Strong understanding of distributed systems architecture and scalability patterns
- Experience managing globally distributed, low-latency datasets
- Hands-on experience with NoSQL databases and/or Google Cloud Spanner
- Strong knowledge of system reliability, fault tolerance, and performance optimization
- Proficiency in Go, Java, or Scala
- Experience with Kafka or Flume for streaming ingestion
- Deep familiarity with the Google Cloud Platform ecosystem
- Experience with production monitoring, logging, and observability frameworks
- Exposure to high-availability, multi-region deployments