Convey Health Solutions delivers technology and analytics services to health plans. The company is seeking a Senior Data Engineer to drive the design and implementation of enterprise-grade data pipelines, keep the data infrastructure secure and scalable, and collaborate with cross-functional teams.
Responsibilities:
- Build scalable, low-latency, fault-tolerant enterprise data platforms that deliver meaningful, timely insights
- Collaborate with a team of engineers to design and build data pipelines using big data technologies (Spark, Snowflake, AWS Big Data Services, Iceberg, Airflow) for medium to large-scale datasets
- Influence best practices for data pipeline design, data architecture, and processing of structured and unstructured data
- Automate and manage Git workflows, including CI/CD pipelines and Git hooks for data pipeline validation
- Design and deploy event-driven serverless functions using AWS Lambda, Step Functions, or API Gateway
- Implement and manage feature flag frameworks to support safe data product rollouts
- Identify and resolve data quality issues, including inaccuracies and incomplete information
- Strengthen data quality by introducing improved validation procedures and processes
- Author detailed technical specifications and design documents for internal data products
- Work in a creative and collaborative environment driven by agile methodologies
- Continuously improve our products, systems, code, and team processes
- Implement best practices and raise the bar by introducing new ones
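One responsibility above is managing feature flag frameworks for safe data product rollouts. A minimal sketch of what a percentage-based rollout can look like, using only the standard library; the class and method names here are illustrative assumptions, not an actual Convey framework:

```python
import hashlib

class FeatureFlags:
    """Minimal in-memory feature flag store with percentage rollouts.
    (Hypothetical sketch; a production framework would persist flags
    and support audiences, not just percentages.)"""

    def __init__(self):
        self._flags = {}  # flag name -> rollout percentage (0-100)

    def set_rollout(self, name, percent):
        self._flags[name] = percent

    def is_enabled(self, name, subject_id):
        # Hash flag name + subject id so each subject lands in a stable
        # bucket in [0, 100); the flag is on for buckets below the rollout.
        percent = self._flags.get(name, 0)
        digest = hashlib.sha256(f"{name}:{subject_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent
```

Because the bucket is derived from a hash rather than a random draw, a given subject keeps the same on/off decision across runs, which is what makes gradual rollouts (and rollbacks) safe.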
Requirements:
- Bachelor's degree in Data Engineering or Computer Science
- Python & PySpark (6–8 yrs) – Designed reusable data pipeline frameworks and transformations for healthcare data
- Airflow (5–6 yrs) – Authored complex DAGs to orchestrate multi-stage ETL workflows across systems
- AWS Glue & Lambda (4–6 yrs) – Led implementation of serverless ingestion pipelines integrated with S3
- Iceberg (2+ yrs) – Modeled transactional lakehouse tables with schema evolution and time-travel support
- EMR Studio (2–3 yrs) – Used for pipeline prototyping, debugging, and notebook-based collaboration
- Docker/Kubernetes (3–5 yrs) – Built and deployed containerized ETL pipelines using Docker; leveraged Kubernetes to orchestrate scalable data workflows in production environments
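The Python/PySpark requirement above asks for experience designing reusable data pipeline frameworks and transformations. A minimal sketch of the composable-transformation pattern that phrase usually implies, shown here with plain lists of dicts standing in for DataFrames so it stays self-contained; the step names are hypothetical examples, not taken from the posting:

```python
from functools import reduce

def pipeline(*steps):
    """Compose transformation steps into one callable. Each step takes
    and returns a dataset (here, a list of dicts for illustration; in
    PySpark the same pattern works with DataFrame -> DataFrame steps)."""
    def run(records):
        return reduce(lambda acc, step: step(acc), steps, records)
    return run

# Illustrative steps (hypothetical names):
def drop_incomplete(records):
    # Remove rows with any missing value.
    return [r for r in records if all(v is not None for v in r.values())]

def normalize_ids(records):
    # Standardize member identifiers for downstream joins.
    return [{**r, "member_id": str(r["member_id"]).strip().upper()}
            for r in records]

etl = pipeline(drop_incomplete, normalize_ids)
```

Keeping each transformation a pure function of its input is what makes the framework reusable: steps can be unit-tested in isolation and recombined across pipelines.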