Parafin is a company dedicated to empowering small businesses with essential financial tools. They are seeking a seasoned software engineer to lead the development of their next-generation Data Platform, ensuring the reliability and scalability of their data infrastructure while collaborating with teams across the company to support data-driven initiatives.
Responsibilities:
- Design and build robust, highly scalable data pipelines and lakehouse infrastructure with PySpark, Databricks, and Airflow on AWS
- Improve the data platform development experience for Engineering, Data Science, and Product by creating intuitive abstractions, self‑service tooling, and clear documentation
- Own and maintain core data pipelines and models that power internal dashboards, ML models, and customer-facing products
- Own the Data & ML platform infrastructure using Terraform, including end‑to‑end administration of Databricks workspaces: manage user access, monitor performance, optimize configurations (e.g., clusters, lakehouse settings), and ensure high availability of data pipelines
- Lead projects to improve data quality, testing, observability, and cost efficiency across existing pipelines and backend systems (e.g., migrating Databricks SQL pipelines to dbt, scaling data ingestion, improving data-lineage tracking, and enhancing monitoring)
- Act as the primary engineering partner for the Data Science team, working closely embedded with them to gather requirements, design scalable solutions, and provide end-to-end support on all engineering aspects of their work
- Work closely with backend engineers and data scientists to design performant data models and support new product development initiatives
- Share best practices and mentor other engineers working on data-centric systems
Requirements:
- 4+ years of experience in software engineering with a strong background in data infrastructure, pipelines, and distributed systems
- Advanced proficiency in Python and SQL
- Hands-on Spark development experience
- Expertise with modern cloud data stacks (AWS services such as S3 and RDS, Databricks, and Airflow) and lakehouse architectures
- Hands-on experience with foundational data-infrastructure technologies such as Hadoop, Hive, Kafka (or similar streaming platforms), Delta Lake/Iceberg, and distributed query engines like Trino/Presto
- Familiarity with ingestion frameworks, developer-experience tooling, and best practices for data versioning, lineage, partitioning, and clustering
- Strong problem-solving skills and a proactive attitude toward ownership and platform health
- Excellent communication and collaboration skills, especially in cross-functional settings
- Experience with AWS infrastructure using Terraform
- Familiarity with observability tools (e.g., Datadog) and cost tracking in cloud environments
- Experience with financial systems or building platforms in a fintech setting
- Prior work on ML infrastructure: feature stores (e.g., Tecton), the ML model lifecycle (training, deployment, monitoring, retraining), and real-time inference
- Contributions to internal tooling or open-source projects in the data ecosystem