TechClub Inc is seeking a Principal AgentOps Engineer to take technical ownership of the AgentOps and AXIS platform components. The role involves driving system design decisions, implementing agentic workflows, and ensuring high availability and operational excellence on AWS.
Responsibilities:
- Act as technical owner and architect for AgentOps and AXIS platform components
- Drive system design decisions across scalability, resilience, performance, and security
- Evaluate tradeoffs and make high-impact architectural calls independently
- Ensure platform designs align with long-term enterprise and AI agent strategy
- Design and implement agentic workflows, orchestration, and lifecycle management
- Extend and optimize agentic frameworks (planning, memory, tool usage, coordination)
- Build guardrails for reliability, observability, and safe execution of autonomous agents
- Improve agent runtime performance and operational stability at scale
- Design and implement solutions using graph databases (e.g., Neo4j, Neptune, TigerGraph, JanusGraph)
- Integrate graph-based knowledge with agentic systems and platform services
- Write and review production-grade code across critical services
- Build and enhance platform capabilities, APIs, and internal tooling
- Improve CI/CD pipelines, automation, and developer productivity tooling
- Lead debugging and resolution of complex production issues
- Architect and implement cloud-native solutions on AWS
- Ensure high availability, fault tolerance, and cost-efficient designs
- Embed observability (logging, metrics, tracing) and operational best practices
- Partner closely with Engineering Managers, Product, and Platform stakeholders
- Mentor senior engineers and unblock teams as needed
- Produce clear architecture docs, runbooks, and design artifacts
- Ensure strong knowledge transfer before contract completion
Requirements:
- Deep hands-on experience with AWS services (e.g., EKS, ECS, Lambda, EC2, S3, DynamoDB, RDS, OpenSearch, IAM)