Design agent systems from first principles. Decide the loop, the tools, the context strategy, the evaluation harness. Choose between single-agent and multi-agent topologies, between LLM reasoning and deterministic post-passes, between retrieval and direct context loading — and defend the choice with data.
Engineer the context. The hardest part of building a good agent is deciding what goes into the prompt and shaping what comes out. You'll obsess over context windows, tool surfaces, structured outputs, citation grounding, and the prompt itself.
Drive evaluation rigor. Build evals before you build the agent. Diagnose where it fails, fix the root cause, and prove the fix moved the metric.
Use AI tooling like a power user. A meaningful fraction of your day will be spent driving Claude Code, Codex, and similar tools to plan, scaffold, refactor, and debug your own work. We expect you to be faster with these tools than most engineers are without them.
Become a domain expert. Healthcare claims, coding guidelines, and the medical record itself are unavoidable parts of the job. Strong engineers who lean into the domain become outsized contributors here.
Requirements
2–4 years of applied ML / AI engineering experience with a Bachelor's in CS, Math, Engineering, or equivalent; or a Master's in a similar program, in which case no prior industry experience is required. Either way, you have owned at least one production-quality system end-to-end (industry, research, or substantial open-source).
Strong Python engineering. Clean abstractions, type discipline, async, tested code.
Deep, hands-on understanding of agent loops — how a model decides to call a tool, how a tool result re-enters context, how loops terminate, where they fail.
Hands-on experience with at least one major agent SDK — OpenAI Agents SDK, Anthropic SDK / claude-agent-sdk, LangGraph, or equivalent — and an opinion on the tradeoffs.
Working knowledge of how modern coding agents are built and how they engineer context — what goes in the system prompt, how files are read and edited, how long-running tasks are planned and tracked, where they break.
Fluency with Claude Code / Codex as a power user. You should be able to brainstorm, plan, and execute non-trivial engineering tasks with these tools — including reading their source when needed to understand or extend behavior.
Solid command of VS Code and git — branches, rebases, worktrees, conflict resolution, PR workflows. Not optional.
A bias toward measurement: you don't ship without an eval, and you don't believe a number you can't reproduce.
Tech Stack
Python
Benefits
Work from anywhere in the US! Machinify is digital-first.
Top Medical/Dental/Vision offerings
FSA/HSA
Tuition reimbursement
Competitive salary, 401(k) with company match
Unlimited PTO
Additional health and wellness benefits and perks
Flexible and trusting environment where you’ll feel empowered to do your best work