GoGuardian builds technology to improve learning environments for K-12 education. The company is seeking a Data Engineer II to design, build, and enhance its analytics and AI/ML ecosystem, collaborating with teams across the organization to deliver data-driven products and capabilities.
Responsibilities:
- Design, build, and optimize ETL pipelines that power analytics, data science, and ML workflows using tools such as Databricks, PySpark, and Airflow
- Develop and maintain labeling and retraining pipelines for machine learning models, ensuring quality, reproducibility, and observability
- Implement and support MLOps practices, including model versioning, CI/CD for ML, and model monitoring in production environments
- Collaborate with data scientists to productionize and scale model training, inference, and evaluation pipelines
- Contribute to the design and evolution of the data lakehouse, including schema design, partitioning strategies, and performance optimization
- Document and communicate data architecture, lineage, and dependencies to ensure transparency and maintainability across teams
- Champion data quality and governance, ensuring that datasets are accurate, well-structured, and compliant with organizational standards
- Leverage infrastructure-as-code and containerization to build reproducible, maintainable environments
- Participate in code reviews and continuous improvement of engineering best practices within the team
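To give a flavor of the ETL work described above, here is a minimal, stdlib-only sketch of an extract-transform-load step with an idempotent load. All names (the raw event data, the `daily_logins` table) are hypothetical illustrations, not GoGuardian's actual schema; production pipelines would use tools like PySpark and Airflow as noted in the responsibilities.

```python
# Hypothetical minimal ETL sketch; table and column names are illustrative.
import csv
import io
import sqlite3

RAW = """user_id,event,ts
u1,login,2024-01-02
u2,login,2024-01-02
u1,logout,2024-01-02
"""

def extract(raw: str) -> list[dict]:
    """Parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Keep only login events and shape rows for loading."""
    return [(r["user_id"], r["ts"]) for r in rows if r["event"] == "login"]

def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    """Idempotently load rows into the target table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_logins "
        "(user_id TEXT, ts TEXT, PRIMARY KEY (user_id, ts))"
    )
    # INSERT OR IGNORE makes re-runs safe: duplicate keys are skipped.
    conn.executemany("INSERT OR IGNORE INTO daily_logins VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM daily_logins").fetchone()[0]

conn = sqlite3.connect(":memory:")
count = load(transform(extract(RAW)), conn)
```

The composite primary key plus `INSERT OR IGNORE` keeps the load idempotent, so a retried pipeline run does not duplicate rows.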
Requirements:
- Bachelor's degree in Computer Science, Engineering, or related field
- 2–4 years of experience building and operating large-scale data systems, ideally supporting analytics and ML workloads
- Proficiency in Python and SQL, with experience in PySpark, pandas, or similar data processing frameworks
- Experience with dbt
- Experience with modern data warehousing and lakehouse platforms, preferably Databricks
- Hands-on experience with workflow orchestration tools such as Airflow, Dagster, or Prefect
- Strong understanding of data modeling, ETL design, and distributed data systems
- Experience with AWS data and compute services (S3, Lambda, ECS, CloudWatch, etc.) or equivalent cloud platforms
- Familiarity with MLOps concepts (e.g., feature stores, model registries, CI/CD for ML)
- Experience with infrastructure-as-code tools, preferably Terraform
- Excellent problem-solving, collaboration, and communication skills; comfortable working in a dynamic, fast-paced environment