Circle is building the world’s leading all-in-one platform for online communities, and they are seeking an AI Platform Engineer to help establish the foundation for their AI systems. This role focuses on building infrastructure to measure and improve AI system performance, with significant ownership in shaping the approach to production AI systems.
Responsibilities:
- Build evaluation infrastructure to measure AI system speed and accuracy — both offline (during development) and online (in production)
- Create observability tooling and dashboards that surface quality metrics week-over-week
- Diagnose quality gaps: when accuracy drops, trace whether the cause is retrieval, ranking, prompting, or something else
- Experiment with different models and agent configurations, using data to guide decisions
- Prototype and validate improvements to our RAG pipeline — chunking strategies, retrieval methods, re-ranking approaches
- Analyze how our customers are using our AI features to help us identify improvements or new areas for development
- Work closely with other engineers to give them confidence that their changes improve quality
- Help us stay up to date with cutting-edge AI research, techniques, and tools
Requirements:
- 6+ years of experience in software engineering, data science, or ML — with a foundation in areas like search/retrieval, information extraction, NLP, or recommendation systems before transitioning to LLM-powered applications
- Experience building and evaluating AI systems in production, including RAG pipelines, search/retrieval systems, LLM-powered applications, and both offline and LLM-based evaluation frameworks
- Strong proficiency in Python for prototyping and experimentation
- Openness to learning Ruby on Rails — our production system is built in Rails, and you'll need to integrate with and instrument the existing codebase
- Comfortable building infrastructure and tooling (eval pipelines, dashboards, data processing)
- Deep understanding of RAG architecture: chunking strategies, embeddings, retrieval, re-ranking, context management
- Strong experimentation mindset — you're comfortable designing and running A/B tests, measuring results, and iterating quickly to discover what works and what doesn't
- Strong data analysis skills — you can interpret results, identify patterns, and communicate findings clearly
- A desire to work in an environment that values speed of iteration and individual autonomy, while also embracing personal accountability and effective collaboration as part of a dynamic team
- Comfortable in a fast-paced environment with some ambiguity, especially when a project requires picking up new technologies
- Strong alignment with our values (you can find them on our career page if you haven't read them yet)
- Proficiency in English (speaking, writing, and reading) at CEFR Level C2 / ILR Level 5
- Experience with evaluation frameworks (Braintrust, LangSmith, or similar)