Responsibilities
Collaborative Architecture: Partner with the team to provide architectural suggestions and formal proposals for our core data systems. You will help ensure a seamless flow between microservices, our data lake, and downstream analytics.
Modern Pipeline Development: Build and ship production-level data pipelines using PySpark and SQL. You will collaborate on establishing standards for idempotency, monitoring, and performance tuning.
Lakehouse Data Modeling: Implement robust data modeling patterns (Medallion architecture: Bronze/Silver/Gold). You will ensure our Lakehouse is not just a data dump, but a high-performance source of truth for both BI and ML.
Databricks & Lakehouse Evolution: Contribute to the continuous improvement of our Databricks environment, with a focus on Delta Lake optimization and robust governance via Unity Catalog.
Systems Evolution & Stability: Act as the steward of our production environment. You will lead the refactoring of legacy pipelines to improve observability, reduce technical debt, and ensure seamless data flow between our microservices and the Databricks Lakehouse.
Technical Influence: Work closely with the Director of Data to refine our technical roadmap. You will lead by example through deep-dive code reviews and by maintaining a high bar for technical documentation.
Reliability & Governance: Conduct comprehensive audits to identify system inefficiencies. You will share ownership of data quality, security, and privacy across the entire lifecycle of our datasets.
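To make the responsibilities above concrete, here is a minimal sketch of the kind of Medallion-style, idempotent pipeline step this role involves, written in Databricks SQL. All catalog, schema, and table names (sales.bronze.orders_raw, sales.silver.orders, sales.gold.daily_revenue) are hypothetical placeholders, not references to our actual systems:

```sql
-- Silver: typed, deduplicated layer. The MERGE keyed on order_id makes the
-- load idempotent -- re-running the same batch updates rather than duplicates.
MERGE INTO sales.silver.orders AS t
USING (
  SELECT order_id,
         CAST(amount AS DECIMAL(10, 2)) AS amount,
         order_ts
  FROM sales.bronze.orders_raw
  WHERE ingest_date = current_date()
) AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Gold: aggregated, BI-ready view over the Silver table.
CREATE OR REPLACE VIEW sales.gold.daily_revenue AS
SELECT DATE(order_ts) AS order_date,
       SUM(amount)    AS revenue
FROM sales.silver.orders
GROUP BY DATE(order_ts);

-- Delta Lake maintenance: compact files and co-locate data on the merge key.
OPTIMIZE sales.silver.orders ZORDER BY (order_id);
```

This is illustrative of the Bronze/Silver/Gold pattern and Delta Lake tuning mentioned above, not a prescription of our actual schemas or standards.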
Requirements
Experience: 2-4+ years of hands-on data engineering experience, ideally within a high-growth SaaS or Fintech environment.
Tooling Expertise: Strong working knowledge of the Databricks ecosystem (Spark, Delta Lake, Workflows) and AWS cloud infrastructure. Solid experience with Apache Airflow for workflow orchestration.
Technical Mastery: High proficiency in Python/PySpark and SQL. You should have a clear philosophy on what makes code maintainable and scalable.
AI/ML Literacy: Practical experience or a deep interest in the data requirements for GenAI, including handling data for LLMs and vector databases.
Platform Mindset: Experience with Infrastructure-as-Code (Terraform/CDK) and a strong commitment to CI/CD and automated testing.
Communication: "Strong opinions, loosely held." You can navigate complex technical trade-offs and communicate architectural proposals clearly to stakeholders at all levels.
Tech Stack
Apache Airflow
AWS
Microservices
PySpark
Python
Spark
SQL
Terraform
Unity Catalog
Benefits
Opportunity to leave your mark on a growing startup
An incredibly diverse team of brilliant minds from all over the world
Competitive compensation
Family-friendly policies
Work from home
Birthday treats and a lunch of your choice every week (one of our values is Fun & Food!)