Senior AI Engineer – Inference, Agent Systems at Arcana Analytics | JobVerse
Senior AI Engineer – Inference, Agent Systems
Arcana Analytics
India
Full Time
5 hours ago
Apply Now
Key skills
Docker
Kafka
Postgres
Python
Go
ML
LLM
OpenAI
Anthropic
Gemini
About this role
Role Overview
Drive TTFT below 400ms for multi-step agent pipelines
Streaming optimization: first token to user while sub-agents are still running
KV cache strategy, prompt compression, dynamic context window management
Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models
Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel DAGs, not sequential chains
Build reliable orchestration on top of Temporal: retries, timeouts, partial failure recovery, idempotency
Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation
Tool call design: schema design that LLMs actually follow reliably across providers
Own the eval framework end to end: ground truth datasets, automated scoring pipelines, regression detection on every PR
LLM-as-judge pipelines for qualitative output assessment
Latency regression testing: p50/p95/p99 tracked across every deployment
Adversarial test case design: ambiguous queries, missing data, conflicting sources, malformed tool responses
Model serving and cold start optimization
Async worker architecture for parallel sub-agent execution
Observability: trace every token, every tool call, every synthesis step
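The Plan-Execute-Synthesize pattern above can be sketched as a parallel fan-out with asyncio; the sub-agent names and canned responses here are illustrative stand-ins, not the actual pipeline.

```python
import asyncio

# Hypothetical sub-agents: each is an async task that can run concurrently.
# In the real pipeline these would call LLM providers; here they return
# canned strings so the sketch is self-contained.
async def search_agent(query: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for provider latency
    return f"search results for {query!r}"

async def code_agent(query: str) -> str:
    await asyncio.sleep(0.01)
    return f"code analysis for {query!r}"

async def plan_execute_synthesize(query: str) -> str:
    # Execute independent sub-agents as one parallel DAG layer, not a
    # sequential chain: total latency is ~max(agent latencies), not the sum.
    results = await asyncio.gather(search_agent(query), code_agent(query))
    # Synthesize: in production this would be another LLM call.
    return " | ".join(results)

answer = asyncio.run(plan_execute_synthesize("q1"))
```

In production each `gather` layer would also carry per-agent timeouts and partial-failure handling (e.g. `return_exceptions=True`) so one slow or failed sub-agent doesn't stall the whole DAG.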
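A minimal sketch of the structured-output enforcement loop described above: parse, validate required keys, retry on malformed output, and degrade gracefully when retries are exhausted. The `call_llm` stub is hypothetical and deliberately returns malformed JSON on the first attempt to exercise the retry path.

```python
import json

def call_llm(prompt: str, attempt: int) -> str:
    # Hypothetical stub: malformed output on the first attempt, valid JSON
    # on the retry, so the loop below is exercised end to end.
    if attempt == 0:
        return 'Sure! Here is the JSON: {"answer": 42'
    return '{"answer": 42}'

def structured_call(prompt: str, required_keys=("answer",), max_retries=2):
    """Parse-and-validate loop: retry on malformed LLM output."""
    for attempt in range(max_retries + 1):
        raw = call_llm(prompt, attempt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: request a fresh completion
        if all(k in obj for k in required_keys):
            return obj  # validated structured output
    return {"answer": None, "degraded": True}  # graceful degradation

result = structured_call("extract the answer as JSON")
```

A real implementation would validate against a full JSON schema (not just key presence) and feed the parse error back into the retry prompt.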
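The p50/p95/p99 tracking mentioned above reduces to percentile computation over recorded request latencies; here is a minimal nearest-rank sketch with made-up sample values.

```python
def percentile(samples, p):
    # Nearest-rank percentile over recorded request latencies (ms).
    ranked = sorted(samples)
    idx = min(len(ranked) - 1, max(0, round(p / 100 * len(ranked)) - 1))
    return ranked[idx]

# Illustrative latency samples: mostly fast, with a long tail.
latencies_ms = [120, 180, 200, 210, 250, 300, 320, 350, 900, 1400]
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

The tail percentiles diverge sharply from the median here, which is exactly why deployments are gated on p95/p99 rather than p50 alone.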
Requirements
You've built something that runs in production at a meaningful scale and you understand why it's fast (or why it isn't).
You've worked on inference pipelines where TTFT was the primary metric and you moved it meaningfully
You've built multi-step agent systems and you know where they break, not from reading papers but from watching them fail in production
You've written eval harnesses from scratch and you have opinions about what makes a ground truth dataset actually useful
You've debugged LLM non-determinism in production and built systems resilient to it
You've worked with streaming LLM responses and built infrastructure around partial output handling
Not a fit: a strong ML research background without hands-on systems exposure
Stack familiarity: Go, Python, Temporal, Kafka, PostgreSQL, Docker
Tech Stack
Docker
Kafka
Postgres
Python
Go
Benefits
Health insurance
Flexible work arrangements
Professional development