GoGuardian builds technology to improve learning environments for K-12 education. The company is seeking a Data Engineer II to design, build, and enhance its analytics and AI/ML ecosystem, collaborating with teams across the organization to deliver data-driven products and capabilities.
Responsibilities:
- Design, build, and optimize ETL pipelines that power analytics, data science, and ML workflows using tools such as Databricks, PySpark, and Airflow
- Develop and maintain labeling and retraining pipelines for machine learning models, ensuring quality, reproducibility, and observability
- Implement and support MLOps practices, including model versioning, CI/CD for ML, and model monitoring in production environments
- Collaborate with data scientists to productionize and scale model training, inference, and evaluation pipelines
- Contribute to the design and evolution of the data lakehouse, including schema design, partitioning strategies, and performance optimization
- Document and communicate data architecture, lineage, and dependencies to ensure transparency and maintainability across teams
- Champion data quality and governance, ensuring that datasets are accurate, well-structured, and compliant with organizational standards
- Leverage infrastructure-as-code and containerization to build reproducible, maintainable environments
- Participate in code reviews and continuous improvement of engineering best practices within the team
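To give a flavor of the ETL work described above, here is a minimal, stdlib-only sketch of an extract-transform-load step with an idempotent load. All names (the raw event data, the `daily_logins` table) are hypothetical illustrations, not GoGuardian's actual schema; production pipelines would use tools like PySpark and Airflow as noted in the responsibilities.

```python
# Hypothetical minimal ETL sketch; table and column names are illustrative.
import csv
import io
import sqlite3

RAW = """user_id,event,ts
u1,login,2024-01-02
u2,login,2024-01-02
u1,logout,2024-01-02
"""

def extract(raw: str) -> list[dict]:
    """Parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Keep only login events and shape rows for loading."""
    return [(r["user_id"], r["ts"]) for r in rows if r["event"] == "login"]

def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    """Idempotently load rows into the target table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_logins "
        "(user_id TEXT, ts TEXT, PRIMARY KEY (user_id, ts))"
    )
    # INSERT OR IGNORE makes re-runs safe: duplicate keys are skipped.
    conn.executemany("INSERT OR IGNORE INTO daily_logins VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM daily_logins").fetchone()[0]

conn = sqlite3.connect(":memory:")
count = load(transform(extract(RAW)), conn)
```

The composite primary key plus `INSERT OR IGNORE` keeps the load idempotent, so a retried pipeline run does not duplicate rows.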
Requirements:
- Bachelor's degree in Computer Science, Engineering, or related field
- 2–4 years of experience building and operating large-scale data systems, ideally supporting analytics and ML workloads
- Proficiency in Python and SQL, with experience in PySpark, pandas, or similar data processing frameworks
- Experience with dbt
- Experience with modern data warehousing and lakehouse platforms, preferably Databricks
- Hands-on experience with workflow orchestration tools such as Airflow, Dagster, or Prefect
- Strong understanding of data modeling, ETL design, and distributed data systems
- Experience with AWS data and compute services (S3, Lambda, ECS, CloudWatch, etc.) or equivalent cloud platforms
- Familiarity with MLOps concepts (e.g., feature stores, model registries, CI/CD for ML)
- Experience with infrastructure-as-code tools, preferably Terraform
- Excellent problem-solving, collaboration, and communication skills; comfortable working in a dynamic, fast-paced environment