Avahi is a premier cloud-first consulting company recognized for its people, culture, and innovative solutions. The company is seeking a Data Engineer to design, build, and maintain scalable AWS data platforms that ensure high data quality and reliable analytics.
Responsibilities:
- Design, build, and maintain scalable AWS data platforms supporting batch and streaming pipelines, analytics, and AI/ML workloads, aligned with AWS Well-Architected best practices
- Build and operate data ingestion, transformation, and enrichment pipelines from internal systems and external APIs, handling structured, semi-structured, unstructured, and graph data
- Implement data normalization workflows to ensure consistent schemas and high data quality, supporting reliable analytics, BI, and ML use cases
- Design and enforce data governance including cataloging, lineage, access control, and auditability
- Build and maintain knowledge graphs to model relationships across core business entities, enabling advanced analytics and inference
- Identify data gaps, inconsistencies, and missing relationships using strong analytical and inference skills
- Integrate data from enterprise platforms such as CRM and ERP systems (Salesforce, HubSpot, SAP, NetSuite, Dynamics 365, Workday)
- Design secure data access layers for analytics, BI, ML, and downstream applications
- Implement monitoring, observability, and data quality checks covering freshness, completeness, and pipeline health (see the sketch after this list)
- Optimize data architectures for performance and cost efficiency using partitioning, indexing, compression, and storage tiering
- Build internal tooling, dashboards, and standardized scaffolding to improve visibility, maintainability, and onboarding
- Collaborate with cross-functional teams to deliver high-impact data solutions and share best practices, documentation, and technical guidance
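As a rough illustration of the data quality and monitoring work described above, the sketch below checks the freshness of an S3-hosted dataset and publishes the result as a custom CloudWatch metric. The bucket, prefix, SLA threshold, and metric namespace are hypothetical placeholders, not details taken from the role description.

```python
# Minimal sketch of a dataset freshness check for an S3-based pipeline.
# Bucket, prefix, SLA, and namespace below are illustrative assumptions.
from datetime import datetime, timezone

import boto3

BUCKET = "example-data-lake"      # hypothetical bucket
PREFIX = "curated/orders/"        # hypothetical dataset prefix
FRESHNESS_SLA_HOURS = 6           # assumed SLA for demonstration


def hours_since_last_object(s3_client, bucket: str, prefix: str) -> float:
    """Return hours elapsed since the most recently written object under a prefix."""
    newest = None
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if newest is None or obj["LastModified"] > newest:
                newest = obj["LastModified"]
    if newest is None:
        return float("inf")  # no data at all counts as maximally stale
    return (datetime.now(timezone.utc) - newest).total_seconds() / 3600


def publish_freshness_metric() -> None:
    """Check dataset freshness and publish it as a custom CloudWatch metric."""
    s3 = boto3.client("s3")
    cloudwatch = boto3.client("cloudwatch")
    age_hours = hours_since_last_object(s3, BUCKET, PREFIX)
    cloudwatch.put_metric_data(
        Namespace="DataPlatform/Quality",  # hypothetical namespace
        MetricData=[{
            "MetricName": "DatasetAgeHours",
            "Dimensions": [{"Name": "Dataset", "Value": PREFIX}],
            "Value": age_hours,
            "Unit": "None",
        }],
    )
    if age_hours > FRESHNESS_SLA_HOURS:
        raise RuntimeError(f"{PREFIX} is stale: {age_hours:.1f}h old")


if __name__ == "__main__":
    publish_freshness_metric()
```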
Requirements:
- Strong experience designing and operating AWS data platforms, including S3, Glue, Lake Formation, Athena, Redshift, EMR, Kinesis/MSK, DynamoDB, OpenSearch, and Neptune
- Strong Python skills for data engineering, focused on modular, testable, and maintainable code
- Solid understanding of distributed data systems, including batch and streaming pipelines, fault tolerance, idempotency, and event-driven architectures
- Experience with data warehouse and lakehouse architectures, ETL/ELT pipelines, and analytical query engines
- Hands-on experience with Spark, Hadoop, Hive, or Flink
- Strong data modeling skills, including normalized, denormalized, and graph-based models, with safe schema evolution
- Advanced SQL skills for analytics and data engineering, including window functions, CTEs, and query optimization (see the example at the end of this list)
- Experience integrating external APIs and enterprise systems, especially CRM and ERP platforms
- Knowledge of data governance, security, and compliance, including encryption, access control, and audit logging
- Experience implementing monitoring, observability, and data quality checks using CloudWatch and CloudTrail
- Comfort with Infrastructure as Code using CloudFormation or Terraform
- Strong end-to-end ownership mindset, with a focus on scalability, reliability, and long-term maintainability
- Professional-level English communication skills, able to explain data architectures and trade-offs to technical and non-technical stakeholders
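As a loose illustration of the SQL and Python expectations listed above, the sketch below runs a query combining a CTE and window functions through Athena from Python. The database, table, and S3 output location are hypothetical placeholders, and the polling loop is deliberately simplified.

```python
# Minimal sketch of running an analytical query (CTE + window functions) on Athena
# from Python. Database, table, and output location are hypothetical placeholders.
import time

import boto3

QUERY = """
WITH daily_revenue AS (                      -- CTE: aggregate raw orders per day
    SELECT order_date, SUM(amount) AS revenue
    FROM sales.orders                        -- hypothetical table
    GROUP BY order_date
)
SELECT
    order_date,
    revenue,
    SUM(revenue) OVER (ORDER BY order_date) AS running_total,      -- window function
    AVG(revenue) OVER (ORDER BY order_date
                       ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg
FROM daily_revenue
ORDER BY order_date
"""


def run_athena_query() -> list:
    """Submit the query, wait for completion, and return the result rows."""
    athena = boto3.client("athena")
    execution = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "sales"},                        # hypothetical
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query finishes (simplified; production code would add timeouts).
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


if __name__ == "__main__":
    for row in run_athena_query():
        print([col.get("VarCharValue") for col in row["Data"]])
```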