Vytwo Technologies Inc is seeking a Senior Data Engineer to design, develop, and maintain scalable data pipelines. The role involves building data processing frameworks on cloud platforms, implementing data governance standards, and collaborating with cross-functional teams to deliver effective data solutions.
Responsibilities:
- Design, develop, and maintain scalable data pipelines using Python, PySpark, and other modern programming languages to support both batch and streaming workloads
- Build and optimize data processing frameworks on cloud platforms such as Databricks or Snowflake, ensuring performance, reliability, and cost efficiency
- Design and implement robust data models, including transactional (OLTP) and dimensional (OLAP) schemas, to support analytics, reporting, and application integration
- Develop high-quality SQL code, including complex queries, stored procedures, and views, with a focus on performance tuning and efficient data access patterns
- Create and manage workflow orchestration using Apache Airflow or similar tools, ensuring reliable scheduling, dependency management, and monitoring
- Implement and enforce data governance and metadata standards through tools such as Microsoft Purview, including data lineage, classification, cataloging, and security policies
- Build automated data quality and validation frameworks to ensure accuracy, completeness, and reliability of production datasets
- Collaborate with cross-functional teams, including data architects, analysts, scientists, and business stakeholders, to understand requirements and deliver scalable, well-designed data solutions
- Lead technical design sessions and code reviews, promoting engineering best practices, reusability, and maintainability
- Support cloud infrastructure and DevOps practices, including CI/CD pipelines, version control, test automation, and environment management
- Monitor and troubleshoot production data pipelines, proactively addressing issues, performance bottlenecks, and system failures
- Contribute to the evolution of the enterprise data platform, recommending tools, frameworks, and architectures to improve scalability and efficiency
Requirements:
- 5+ years of experience in data engineering, software engineering, or similar disciplines
- Hands-on experience with Databricks or Snowflake
- Experience with orchestration tools such as Apache Airflow
- Experience working with cloud ecosystems (Azure preferred; AWS/GCP acceptable)
- Advanced SQL skills and experience with OLTP and OLAP data modeling
- Solid understanding of modern data warehousing, data lake, and ELT/ETL design patterns
- Familiarity with data governance tools, especially Microsoft Purview
- Solid programming expertise in Python, PySpark, or similar languages
- Healthcare industry experience, including claims, clinical, FHIR, HL7, or provider data
- Experience with containerization (Docker, Kubernetes) for data workloads
- Experience supporting machine learning workflows or analytical data science pipelines
- Knowledge of distributed computing concepts and performance tuning