Own the end-to-end data processing infrastructure that powers Yuzee's intelligent course and job matching platform
Design and maintain scalable, event-driven pipelines that process tens of thousands of records per day, generate semantic embeddings, and feed a growing knowledge graph
Generate and manage semantic embeddings via Amazon Bedrock (Titan v2), index them in MongoDB Atlas Vector Search, and calibrate similarity thresholds
Build and maintain a knowledge graph linking jobs, courses, skills, and industries using FP-Growth association rules and archetype-to-SOC code mapping
Build and improve a two-stage discovery and matching API on AWS Lambda
Maintain and improve daily job scrapers across multiple sources and build institution data scrapers with robust HTML cleaning pipelines
Requirements
1+ years of backend engineering experience focused on data pipelines, ML infrastructure, or search systems
Hands-on experience with AWS serverless and container services — Lambda, ECS Fargate, EventBridge, and Step Functions
Strong Python skills — Pandas, async processing, bulk database operations, and text cleaning
Familiarity with vector databases and semantic similarity search; MongoDB Atlas Vector Search experience is a strong plus
Cost-conscious infrastructure mindset — you think in per-record compute costs, free tiers, Spot resilience, and right-sizing
Ability to document and communicate complex architecture clearly to both technical and non-technical stakeholders
Nice to have: Experience with knowledge graphs or association rule mining (FP-Growth, Apriori)
Nice to have: Experience using LLMs for re-ranking or eligibility assessment on top of vector retrieval results
Nice to have: Background in edtech, jobtech, or recommendation/matching systems
A relevant degree or equivalent proven experience
Tech Stack
AWS
MongoDB
Pandas
Python
Benefits
Fully remote / work-from-home role
Flexible working hours, subject to the team’s expected schedule, deliverables, and business needs
Hands-on work on real backend, data pipeline, and AI infrastructure projects
Exposure to practical engineering challenges in scraping, pipelines, retrieval, and cloud systems
Opportunity to contribute to a growing platform with real product and engineering challenges
Ongoing professional growth in a practical, fast-moving technology environment
Long-term progression based on performance, regardless of location, including increased responsibility over time