BlastPoint is a B2B data analytics startup located in the East Liberty neighborhood of Pittsburgh. They are seeking a talented Senior Data Engineer to own and evolve their data processing pipeline, working across a production-scale medallion architecture to serve clients in the utility and financial services industries.
Responsibilities:
- Design, develop, and maintain our core Python ETL framework by writing reusable, well-tested modules that power data transformations across client pipelines
- Develop and optimize our automated refresh pipeline orchestrated through AWS Batch, Lambda, Step Functions, and EventBridge
- Build Python integrations with external systems (SFTP, third-party APIs, client platforms) that are robust, testable, and reusable
- Identify and eliminate manual bottlenecks in data onboarding and analysis through well-designed automation
- Build and extend internal web applications (FastAPI, SQLAlchemy, PostgreSQL) that support pipeline orchestration, client configuration, and data platform operations
- Ensure data integrity and security throughout project life cycles
- Write efficient server-side Python code, leveraging the Pandas and PySpark DataFrame APIs for scalable data transformations and aggregations (see the PySpark sketch after this list)
- Optimize Spark jobs for cost and performance at scale
- Debug complex data quality issues across client pipelines
- Mentor junior engineers on data transformation patterns, aggregation frameworks, and best practices
- Contribute to our internal metadata management application (FastAPI backend, React/TypeScript frontend): build API endpoints, write database migrations, and occasionally develop frontend features (see the FastAPI sketch after this list)
- Maintain the metadata layer that drives pipeline configuration and data governance
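
To give candidates a concrete flavor of the transformation work above, here is a minimal PySpark sketch. The S3 paths, column names, and aggregation logic are hypothetical illustrations, not BlastPoint's actual pipeline code.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("example-aggregation").getOrCreate()

    # Read a hypothetical Parquet dataset of customer transactions from S3.
    txns = spark.read.parquet("s3://example-bucket/transactions/")

    # Roll up spend per customer per month using the DataFrame API.
    monthly = (
        txns
        .withColumn("month", F.date_format(F.col("txn_date"), "yyyy-MM"))
        .groupBy("customer_id", "month")
        .agg(
            F.sum("amount").alias("total_spend"),
            F.count("*").alias("txn_count"),
        )
    )

    # Repartitioning on the join key before writing keeps downstream
    # distributed joins from shuffling more than necessary.
    (
        monthly.repartition("customer_id")
        .write.mode("overwrite")
        .partitionBy("month")
        .parquet("s3://example-bucket/monthly_spend/")
    )

Partitioning the output by month is one common layout choice here; it lets incremental refreshes overwrite a single month rather than the whole dataset.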
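The internal web application work is closer to the following sketch: a FastAPI endpoint backed by a SQLAlchemy 2.0-style model. The table, route, and connection string are invented for illustration only.

    from fastapi import Depends, FastAPI, HTTPException
    from sqlalchemy import String, create_engine, select
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

    engine = create_engine("postgresql+psycopg2://user:pass@localhost/metadata")  # placeholder DSN

    class Base(DeclarativeBase):
        pass

    # Hypothetical table holding per-client pipeline configuration.
    class PipelineConfig(Base):
        __tablename__ = "pipeline_config"
        id: Mapped[int] = mapped_column(primary_key=True)
        client: Mapped[str] = mapped_column(String(64), unique=True)
        refresh_cron: Mapped[str] = mapped_column(String(32))

    app = FastAPI()

    def get_session():
        # Yield a session per request so FastAPI closes it afterward.
        with Session(engine) as session:
            yield session

    @app.get("/configs/{client}")
    def read_config(client: str, session: Session = Depends(get_session)):
        config = session.scalar(select(PipelineConfig).where(PipelineConfig.client == client))
        if config is None:
            raise HTTPException(status_code=404, detail="unknown client")
        return {"client": config.client, "refresh_cron": config.refresh_cron}
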
Requirements:
- Bachelor's degree in a related field (e.g., Data Engineering, Computer Science, Data Science, Math, or Statistics) plus 3+ years of experience, or 5+ years of relevant experience
- Experience designing and maintaining production ETL/ELT pipelines with proper error handling, idempotency, and monitoring
- Advanced proficiency in Python, with deep experience in Pandas and PySpark (DataFrame API, SQL, performance tuning, distributed joins)
- Strong SQL skills with PostgreSQL, including query optimization, indexing strategies, and schema design
- Hands-on experience with AWS services including but not limited to: S3, Lambda, Batch, SageMaker, and Step Functions
- Experience with PyArrow, columnar data formats (Parquet), and data lake patterns (see the PyArrow sketch at the end of this posting)
- Strong problem-solving skills with the ability to work autonomously, make architectural decisions, and manage multiple concurrent projects
- Excellent communication skills with the ability to drive cross-functional collaboration, proactively engaging stakeholders to align on requirements and solutions
- Experience using Git for version control and repository management
- Authorized to work in the United States
- Experience with Infrastructure as Code (Terraform)
- Experience implementing observability solutions (monitoring, logging, alerting) for production data pipelines
- Experience developing REST APIs with FastAPI, SQLAlchemy, and Alembic (or equivalent web frameworks and ORMs)
- Understanding of MLOps and experience building and deploying LLM-powered agents
- Experience with Apache Iceberg or similar data lakehouse technologies
- Experience with geospatial data processing (geocoding, spatial joins)
- Familiarity with React/TypeScript for contributing to internal tooling
- Understanding of CI/CD (GitHub Actions)
- Experience mentoring junior engineers
- A willingness to travel domestically for company events (roughly 2-4 times per year)
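
For candidates unfamiliar with the columnar tooling mentioned above, the PyArrow/Parquet requirement covers work like this minimal sketch; the columns and lake layout are illustrative assumptions, not our schema.

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Build a small in-memory table (columns are illustrative).
    table = pa.table({
        "customer_id": pa.array([1, 2, 3], type=pa.int64()),
        "state": pa.array(["PA", "OH", "PA"]),
        "balance": pa.array([120.50, 98.00, 42.25]),
    })

    # Write it as Parquet partitioned by state, a common data lake layout.
    pq.write_to_dataset(table, root_path="lake/customers", partition_cols=["state"])

    # Read back only the needed columns, pushing a partition filter down
    # so non-matching files are skipped entirely.
    subset = pq.read_table(
        "lake/customers",
        columns=["customer_id", "balance"],
        filters=[("state", "=", "PA")],
    )
    print(subset.to_pandas())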