Design, build, and maintain scalable data pipelines that ingest, normalize, and transform financial and clinical trial data from multiple internal and external sources, with a focus on making data suitable for analytics, reporting, and LLM-based AI agents.
Develop backend services and data access layers that expose high-quality financial data to internal systems and customer-facing features, ensuring data is structured for direct consumption by LLMs and automated workflows.
Implement and operate embedding pipelines and vectorized representations of structured and semi-structured data to support semantic search, RAG, and agentic workflows (a brief sketch of this kind of pipeline follows this list).
Optimize database performance, query execution, and batch processing jobs to support large-scale financial datasets and AI-driven access patterns.
Participate in architectural decisions around data infrastructure, storage strategies, and integration with LLMs, embeddings, and vector databases.
Work closely with product managers and analysts to translate business requirements into durable datasets, metrics, and APIs.
Contribute to frontend implementations to ensure data-heavy and AI-enabled features are presented clearly and correctly.
Mentor other engineers on data modeling best practices, SQL performance optimization, ETL design patterns, and building reliable, observable data systems.
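As a rough illustration of the embedding responsibility above, here is a minimal sketch, assuming PostgreSQL with the pgvector extension and the OpenAI Python SDK; the table, column, and document names are hypothetical rather than a description of our stack.

    # Minimal sketch of an embedding pipeline (assumes PostgreSQL with the
    # pgvector extension and the OpenAI Python SDK >= 1.0; table and column
    # names are hypothetical).
    import os

    import psycopg2
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def embed(texts):
        # One embedding vector per input text.
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return [item.embedding for item in resp.data]

    def load_documents(conn, docs):
        # Upsert (id, body, embedding) rows into a pgvector-backed table so the
        # data can serve semantic search and RAG queries.
        vectors = embed([d["text"] for d in docs])
        with conn.cursor() as cur:
            for doc, vec in zip(docs, vectors):
                cur.execute(
                    "INSERT INTO trial_documents (id, body, embedding) "
                    "VALUES (%s, %s, %s) "
                    "ON CONFLICT (id) DO UPDATE "
                    "SET body = EXCLUDED.body, embedding = EXCLUDED.embedding",
                    (doc["id"], doc["text"], "[" + ",".join(map(str, vec)) + "]"),
                )
        conn.commit()

    if __name__ == "__main__":
        conn = psycopg2.connect(os.environ["DATABASE_URL"])
        load_documents(conn, [{"id": 1, "text": "Phase 2 oncology trial; enrollment paused."}])

In practice the upserts would be batched and the embedding calls retried, but the shape of the work is the same: normalize a record, embed it, and store the vector next to the source row so retrieval can join back to the original data.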
Requirements
8+ years of professional software engineering experience, with a strong focus on backend development and data engineering.
Deep expertise in SQL and relational database design (PostgreSQL, MySQL, or similar), including complex analytical queries and performance tuning.
Experience building and operating ETL or ELT pipelines using orchestration frameworks (Airflow, Dagster, Prefect, or AWS Step Functions); a brief orchestration sketch follows this list.
Production experience building AI-powered data systems, including vector databases (pgvector, Pinecone), embedding pipelines, and data access patterns designed for RAG and agentic workflows.
A pragmatic approach to the full stack, including a willingness to contribute to React codebases to ensure data-heavy features are delivered end-to-end.
Familiarity with data quality validation, testing, and monitoring practices in production systems.
Bachelor’s degree in Computer Science or Computer Engineering, or equivalent practical experience.
Strong proficiency in Python, with experience using web frameworks (Django, DRF).
Experience working with modern data warehouse platforms (Snowflake, BigQuery, Redshift).
Experience integrating backend systems or data pipelines with LLM APIs for enrichment, summarization, or analysis.
Experience with data transformation and validation tools (dbt, Great Expectations).
Experience building production AI features end-to-end, including LLM integration, prompt engineering, agentic workflows, and observability/evaluation frameworks.
Experience working at a Series A-C startup through a period of rapid scaling.
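For orientation, the kind of orchestrated ELT job referenced in the requirements above might look like the following minimal sketch, assuming Airflow 2.4+ and a dbt project available on the worker; the DAG id, task ids, dbt selector, and load helper are hypothetical.

    # Minimal sketch of a daily ELT DAG (assumes Airflow 2.4+ and a dbt project
    # on the worker; DAG id, task ids, selector, and load helper are hypothetical).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def load_staging(**context):
        # Placeholder extract-and-load step: in practice this would copy the
        # day's source extract into a warehouse staging schema.
        ...

    with DAG(
        dag_id="trial_financials_daily",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        load = PythonOperator(task_id="load_staging", python_callable=load_staging)
        transform = BashOperator(
            task_id="run_dbt_models",
            bash_command="dbt run --select staging+",
        )
        load >> transform  # transform only after the staging load succeeds

A production DAG would add retries, alerting, and data-quality checks (for example, dbt tests or Great Expectations) between the load and transform steps.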
Tech Stack
Airflow
Amazon Redshift
AWS
BigQuery
Django
ETL
MySQL
PostgreSQL
Python
React
SQL
Benefits
Competitive compensation and meaningful equity participation
Comprehensive employee benefits, including 100% company-paid health, dental, vision, and life insurance
401(k) plan with a 3% company match that vests immediately