Multi-step AI agents that handle complex, real-world workflows end to end
RAG pipelines that actually perform — not just LangChain defaults
Durable background workflows for processing, enrichment, and orchestration
Internal dashboards, admin tools, and integration surfaces (Slack, Discord, Intercom)
Eval harnesses to measure and improve model performance over time
The glue layer between LLMs, APIs, databases, and product features
Requirements
Python 3.12 as your primary language: you live in the AI/ML ecosystem and build AI/agent service APIs with FastAPI
TypeScript (strict) in a monorepo on Bun + Turborepo — Next.js (App Router), React, Tailwind, shadcn/ui for UI; Hono + tRPC for the API layer; Better Auth for authentication
Comfortable with SQL and Drizzle ORM on PostgreSQL, including querying and working with a relational data model
Hands-on with Anthropic, OpenAI, and Google Vertex AI (Gemini) APIs — function/tool calling, structured outputs, prompt caching; multi-provider failover via OpenRouter and Cloudflare AI Gateway
Production experience (not just a weekend project) with an agent framework; we use LangGraph for agent/AI services
Familiar with Model Context Protocol (MCP): building and consuming MCP servers and clients to extend agent capabilities
You know the boring-but-critical stuff: streaming, retries, token cost management, rate limiting
Experience with vector DBs: pgvector (preferred), Pinecone, Weaviate, or Qdrant
Intentional about embedding models and chunking strategy — you've reasoned about trade-offs, not just accepted defaults
You've implemented hybrid search (BM25 + vector) where precision matters
Code-first workflow experience: BullMQ + Redis for queues, scheduled jobs, and durable background orchestration
Aware of low-code tools (n8n, Zapier, Make) and able to use them when appropriate
Background jobs, queues, and scheduled tasks are second nature
Active user of at least one eval/observability platform; ours is self-hosted Langfuse for tracing and evals, and familiarity with LangSmith, Braintrust, or Helicone is a plus
You've actually run an eval set — you have opinions on what makes a good one
Claude Code is your daily driver, and you're comfortable operating within a homegrown agentic-dev layer (hooks, skills, subagents)
You ship internal tools in days, not sprints
You have a 'vibe coding' instinct — but you don't let it erode eng discipline
Comfortable deploying on AWS (EC2, S3, SES, ECR, Parameter Store) in us-east-2, with Caddy for TLS and reverse proxy and GitHub Actions + OIDC for CI/CD
Docker / Docker Compose (for both dev and prod), env management, and secrets handling (AWS Parameter Store) are table stakes
Familiarity with Biome (linting), Vitest (unit testing), and Playwright (end-to-end testing) is a plus
Can spin up Next.js (App Router) / React dashboards and admin interfaces when needed
Familiar with Discord.js, Slack API, and Intercom API; comfortable with Asana, Fellow, and Google Workspace (Drive/Calendar/Gmail) integrations
Understands product instrumentation with Mixpanel or GA4
Tech Stack
AWS
Docker
EC2
JavaScript
Next.js
Postgres
Python
React
Redis
SQL
TypeScript
Benefits
You'll work on a real, fast-moving product.
AI is a first-class priority, not an afterthought.