Apetan Consulting LLC is seeking a Lead Data Engineer to design, build, and scale enterprise-grade data platforms and pipelines. The role uses Python, AWS, SQL, and Generative AI technologies to enable advanced analytics and data integration, and includes mentoring teams and driving innovation in data engineering practices.
Responsibilities:
- Design and develop scalable data pipelines and ETL/ELT processes
- Build and maintain data lakes, data warehouses, and real-time data systems
- Lead architecture and implementation of data solutions on AWS
- Integrate Generative AI capabilities into data platforms and workflows
- Collaborate with data scientists, analysts, and business stakeholders
- Ensure data quality, governance, security, and compliance
- Optimize data processing performance and cost efficiency
- Define best practices, coding standards, and data engineering frameworks
- Mentor junior engineers and lead technical reviews
- Evaluate and adopt new tools, technologies, and methodologies
Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 8–12+ years of experience in data engineering
- Strong programming skills in Python
- Advanced SQL skills and experience with large-scale data processing
- Hands-on experience with AWS services (e.g., S3, Redshift, Glue, Lambda, EMR)
- Experience building data pipelines with orchestration tools such as Airflow
- Solid understanding of data modeling and data warehousing concepts
- Experience with big data technologies (Spark, Hadoop)
- Exposure to Generative AI/LLMs and their integration into data workflows
- Experience with streaming technologies (Kafka, Kinesis)
- Knowledge of containerization and orchestration (Docker, Kubernetes)
- Familiarity with ML pipelines and MLOps
- Experience with data governance and catalog tools
- AWS certifications (e.g., AWS Certified Data Analytics - Specialty or AWS Certified Solutions Architect)