Verato is a high-growth healthcare technology company focused on providing a single source of truth for identity in healthcare. The company is seeking a Data Engineer II to join its dynamic development team, responsible for designing data pipelines and enhancing its healthcare-specific Master Data Management platform.
Responsibilities:
- Design data pipelines for API, streaming, and batch processing to facilitate data loads into the Snowflake data warehouse
- Collaborate with other engineering and DevOps team members to implement, test, deploy, and operate data pipelines and ETL solutions
- Develop scripts to extract, load, and transform data, as well as other utility functions
- Optimize data pipelines, ETL processes, and data integrations for large-scale data analytics use cases
- Build necessary components to ensure data quality, monitoring, alerting, integrity, and governance standards are maintained in data processing workflows
- Navigate ambiguity and thrive in a fast-paced environment; take initiative and consistently deliver results with minimal supervision
- Perform data profiling and analysis to support development work and to troubleshoot and help resolve data issues
Requirements:
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field
- 3+ years of experience in building and maintaining data pipelines and ETL/ELT processes in data-centric organizations
- Hands-on experience building streaming and batch data pipelines
- 1+ years of experience working with the Snowflake cloud data warehouse, including Snowflake data shares, Snowpipe, SnowSQL, Tasks, etc.
- Strong Python coding skills; familiarity with Python libraries for data engineering and cloud services, such as pandas and boto3
- Hands-on experience with cloud platforms such as AWS and Google Cloud
- Experience working with agile development methodology
- Experience with CI/CD and release processes, and proficiency with Git or other source control management systems, to streamline development and deployment workflows
- Minimum of 2 years designing and implementing operational, production-grade, large-scale data pipelines, ETL/ELT processes, and data integration solutions
- Exposure to multi-tenant/multi-customer environments is a big plus
- Hands-on experience with productionized data ingestion and processing pipelines
- Strong understanding of Snowflake internals and of integrating Snowflake with other data processing and reporting technologies
- Proficiency with Kafka, AWS S3, SQS, Lambda, Pub/Sub, AWS DMS, Glue, AWS Batch, or similar services from other cloud providers
- Experience working with structured, semi-structured, and unstructured data
- Familiarity with MongoDB or similar NoSQL database systems
- Background in healthcare data, especially patient-centric clinical data and provider data, is a plus
- Familiarity with API security frameworks, token management, and user access control, including OAuth, JWT, etc.