IR Labs is the innovation lab inside Integrated Research, focused on turning cutting-edge AI research into impactful products. The Machine Learning Engineer will own the graph-ML roadmap, design and train modern GNNs, and build high-performance training pipelines while mentoring the team.
Responsibilities:
- Own the graph-ML roadmap end-to-end: turn research into production, balance SOTA with real-world constraints, and champion graph learning across teams
- Design and train modern GNNs/graph transformers; explore self-supervision, sparsity, and pretraining to lift retrieval, grounding, and reasoning
- Build high-performance training/inference pipelines on distributed GPUs with efficient sampling, mixed precision, and custom optimization where needed
- Fuse graphs with language systems to power retrieval and reasoning primitives across the product
- Model complex technical artifacts as graphs (e.g., code/IR or telemetry) and learn over them for analysis and optimization signals
- Ship low-latency, scalable graph services and APIs with streaming updates and robust SLAs
- Benchmark and harden sparse+dense kernels; instrument for performance, correctness, and reliability
- Establish ML/DataOps for large graphs (versioning, lineage, CI/CD) and embed security, privacy, and compliance by design
- Mentor and uplevel the team
Requirements:
- 8+ years delivering production ML; 5+ years leading large-scale graph learning in production (100M to 1B+ edges)
- Deep mastery of GNNs/geometric DL, graph theory, and practical graph querying
- Proven impact combining graphs with LLM/NLP (e.g., KG-augmented retrieval, entity linking, grounding)
- Experience deriving graphs from complex sources (such as code/IR) for analysis, optimization, or security use cases
- Strong systems chops: C++/CUDA or equivalent; fluency in GPU/distributed training and performance tuning
- Track record building reliable pipelines and operating large data/feature stores for graph workloads
- Operational excellence in orchestration, containerization, observability, drift detection, and automated retraining
- Clear, persuasive technical leadership and mentoring with both technical and non-technical stakeholders