Dynamo AI is building the future of trustworthy AI for the enterprise, focusing on secure deployments across regulated sectors. The role involves defining and neutralizing security risks in agentic systems through research and engineering, including designing experiments and building prototypes.
Responsibilities:
- Define and validate threat models for agentic systems: identify which tool characteristics must coexist to enable data exfiltration and malicious state changes, and how to break those combinations
- Design and run experiments: build synthetic environments (e.g., file systems and tools), construct task distributions that contain attack paths, and apply different attack strategies
- Break agentic systems, both manually and with optimization algorithms such as RL, in …
- Design and improve static and dynamic analysis methods that automatically map tool capabilities to risk across diverse tool ecosystems, and make those methods scale
- Turn research insights into product-facing capabilities: risk classification, automated guardrail generation, and quantitative threat measurement
- Build measurement tools: eval harnesses, monitoring, dashboards, and feedback loops that quantify security outcomes
- Build capability and regression evals
- Optimize systems for real-world constraints (latency, cost, reliability) without losing scientific rigor
Requirements:
- You have an MS or PhD in CS/ML (or equivalent research experience) and enjoy working under uncertainty
- You've fine-tuned and evaluated models in practice and can reason about data quality, overfitting, evals, and deployment constraints
- You write strong production code and are comfortable owning the infrastructure that makes agentic evals run end to end. You care about reproducibility and instrumentation. No AI slop.
- You're motivated by security problems and enjoy thinking like both a builder and an attacker
- You reason about how capabilities combine into risk: not just individual vulnerabilities, but system-level attack surfaces across tool ecosystems
- You communicate clearly, iterate fast, and can hold a technical narrative from 'hypothesis' to 'shipped'