Responsibilities
Architect scalable systems for ingesting, processing, and querying high-volume datasets
Lead the development of core data infrastructure components, including scalable data pipelines
Design, implement, and maintain robust orchestration frameworks (e.g., Apache Airflow, Dagster, Prefect) to schedule, monitor, and manage large-scale data workflows
Develop and maintain self-service platform SDKs, libraries, and tooling to empower other engineering and analytics teams to build, deploy, and monitor pipelines independently
Optimize the performance, cost-efficiency, and reliability of platform services
Ensure high availability, observability, and compliance for platform components
Implement secure data handling practices and fault-tolerant systems
Mentor junior engineers and provide code reviews, guidance, and architectural leadership
Collaborate with analytics and customer-facing teams to align infrastructure with use-case requirements
Requirements
5+ years of experience building and maintaining production-grade data systems
Deep expertise in distributed processing and streaming frameworks (e.g., Spark, Flink, Kafka) and cloud architecture
Advanced proficiency in Python and SQL (especially Spark SQL)
Experience developing internal platform tooling or SDKs that enable self-service data and workflow management
Experience with geospatial data processing, formats, and services (e.g., Apache Sedona)
Strong understanding of CI/CD workflows, observability tools, and platform reliability engineering
Tech Stack
Airflow
Cloud
Kafka
Python
Spark
SQL
Benefits
Competitive salary
Comprehensive health, dental, and vision insurance plans
Flexible hybrid work environment
Additional benefits, including flexible hours, work travel opportunities, competitive vacation time, and parental leave