BlastPoint is a B2B data analytics startup located in the East Liberty neighborhood of Pittsburgh. They are seeking a talented Senior Data Engineer to own and evolve their data processing pipeline, working across a production-scale medallion architecture to serve clients in the utility and financial services industries.
Responsibilities:
- Design, develop, and maintain our core Python ETL framework by writing reusable, well-tested modules that power data transformations across client pipelines
- Develop and optimize our automated refresh pipeline orchestrated through AWS Batch, Lambda, Step Functions, and EventBridge
- Build Python integrations with external systems (SFTP, third-party APIs, client platforms) that are robust, testable, and reusable
- Identify and eliminate manual bottlenecks in data onboarding and analysis through well-designed automation
- Build and extend internal web applications (FastAPI, SQLAlchemy, PostgreSQL) that support pipeline orchestration, client configuration, and data platform operations
- Ensure data integrity and security throughout project life cycles
- Write efficient server-side Python code, leveraging the Pandas and PySpark DataFrame APIs for scalable data transformations and aggregations (see the PySpark sketch after this list)
- Optimize Spark jobs for cost and performance at scale
- Debug complex data quality issues across client pipelines
- Mentor junior engineers on data transformation patterns, aggregation frameworks, and best practices
- Contribute to our internal metadata management application (FastAPI backend, React/TypeScript frontend): build API endpoints, write database migrations, and occasionally develop frontend features (see the FastAPI sketch after this list)
- Maintain the metadata layer that drives pipeline configuration and data governance
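
To give candidates a concrete flavor of the transformation work above, here is a minimal PySpark sketch. The S3 paths, column names, and aggregation logic are hypothetical illustrations, not BlastPoint's actual pipeline code.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("example-aggregation").getOrCreate()

    # Read a hypothetical Parquet dataset of customer transactions from S3.
    txns = spark.read.parquet("s3://example-bucket/transactions/")

    # Roll up spend per customer per month using the DataFrame API.
    monthly = (
        txns
        .withColumn("month", F.date_format(F.col("txn_date"), "yyyy-MM"))
        .groupBy("customer_id", "month")
        .agg(
            F.sum("amount").alias("total_spend"),
            F.count("*").alias("txn_count"),
        )
    )

    # Repartitioning on the join key before writing keeps downstream
    # distributed joins from shuffling more than necessary.
    (
        monthly.repartition("customer_id")
        .write.mode("overwrite")
        .partitionBy("month")
        .parquet("s3://example-bucket/monthly_spend/")
    )

Partitioning the output by month is one common layout choice here; it lets incremental refreshes overwrite a single month rather than the whole dataset.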
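The internal web application work is closer to the following sketch: a FastAPI endpoint backed by a SQLAlchemy 2.0-style model. The table, route, and connection string are invented for illustration only.

    from fastapi import Depends, FastAPI, HTTPException
    from sqlalchemy import String, create_engine, select
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

    engine = create_engine("postgresql+psycopg2://user:pass@localhost/metadata")  # placeholder DSN

    class Base(DeclarativeBase):
        pass

    # Hypothetical table holding per-client pipeline configuration.
    class PipelineConfig(Base):
        __tablename__ = "pipeline_config"
        id: Mapped[int] = mapped_column(primary_key=True)
        client: Mapped[str] = mapped_column(String(64), unique=True)
        refresh_cron: Mapped[str] = mapped_column(String(32))

    app = FastAPI()

    def get_session():
        # Yield a session per request so FastAPI closes it afterward.
        with Session(engine) as session:
            yield session

    @app.get("/configs/{client}")
    def read_config(client: str, session: Session = Depends(get_session)):
        config = session.scalar(select(PipelineConfig).where(PipelineConfig.client == client))
        if config is None:
            raise HTTPException(status_code=404, detail="unknown client")
        return {"client": config.client, "refresh_cron": config.refresh_cron}
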
Requirements:
- Bachelor's degree in a related field (e.g., Data Engineering, Computer Science, Data Science, Math, or Statistics) plus 3+ years of experience, or 5+ years of relevant experience
- Experience designing and maintaining production ETL/ELT pipelines with proper error handling, idempotency, and monitoring
- Advanced proficiency in Python, with deep experience in Pandas and PySpark (DataFrame API, SQL, performance tuning, distributed joins)
- Strong SQL skills with PostgreSQL, including query optimization, indexing strategies, and schema design
- Hands-on experience with AWS services including but not limited to: S3, Lambda, Batch, SageMaker, and Step Functions
- Experience with PyArrow, columnar data formats (Parquet), and data lake patterns (see the PyArrow sketch at the end of this posting)
- Strong problem-solving skills with the ability to work autonomously, make architectural decisions, and manage multiple concurrent projects
- Excellent communication skills with the ability to drive cross-functional collaboration, proactively engaging stakeholders to align on requirements and solutions
- Experience using Git for version control and repository management
- Authorized to work in the United States
- Experience with Infrastructure as Code (Terraform)
- Experience implementing observability solutions (monitoring, logging, alerting) for production data pipelines
- Experience developing REST APIs with FastAPI, SQLAlchemy, and Alembic (or equivalent web frameworks and ORMs)
- Understanding of MLOps and experience building and deploying LLM-powered agents
- Experience with Apache Iceberg or similar data lakehouse technologies
- Experience with geospatial data processing (geocoding, spatial joins)
- Familiarity with React/TypeScript for contributing to internal tooling
- Understanding of CI/CD (GitHub Actions)
- Experience mentoring junior engineers
- A willingness to travel domestically for company events (roughly 2-4 times per year)
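
For candidates unfamiliar with the columnar tooling mentioned above, the PyArrow/Parquet requirement covers work like this minimal sketch; the columns and lake layout are illustrative assumptions, not our schema.

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Build a small in-memory table (columns are illustrative).
    table = pa.table({
        "customer_id": pa.array([1, 2, 3], type=pa.int64()),
        "state": pa.array(["PA", "OH", "PA"]),
        "balance": pa.array([120.50, 98.00, 42.25]),
    })

    # Write it as Parquet partitioned by state, a common data lake layout.
    pq.write_to_dataset(table, root_path="lake/customers", partition_cols=["state"])

    # Read back only the needed columns, pushing a partition filter down
    # so non-matching files are skipped entirely.
    subset = pq.read_table(
        "lake/customers",
        columns=["customer_id", "balance"],
        filters=[("state", "=", "PA")],
    )
    print(subset.to_pandas())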