Blu Omega is a Woman-Owned Small Business delivering technology and cybersecurity solutions to federal agencies. They are seeking a Backend Infrastructure & Agentic AI Platforms Engineer to support a federal program focused on next-generation autonomous AI capabilities, responsible for building and maintaining backend infrastructure and agentic AI systems.
Responsibilities:
- Own the end-to-end backend infrastructure for GRACE on Microsoft Azure, including Azure Functions, API Management, Container Apps, and Azure OpenAI Service
- Manage data storage, retrieval pipelines, vector databases, and document indexing for internal knowledge search
- Implement and maintain infrastructure as code for all environments
- Develop and manage CI/CD pipelines, deployment automation, and release processes
- Ensure monitoring, alerting, logging, distributed tracing, and incident response are in place for all systems
- Manage secrets, API keys, and credential rotation
- Track and optimize cost and token economics across LLM providers
- Develop and maintain backend implementation of MCP, including server hosting, tool registration, and versioning
- Design and evolve communication patterns for agent interoperability and external integrations
- Build and operate RAG pipelines for document ingestion, embedding, and semantic search
- Implement fallback, retry, and degradation patterns for AI service dependencies
- Manage tool-calling infrastructure, including registration, execution, and observability
- Build and maintain observability for agent workflows, including latency, throughput, and error rates
- Implement evaluation pipelines for safety, regression, and grounding assessment
- Define and enforce system-level SLOs and alerting procedures
- Establish and improve coding standards, design reviews, and testing practices
- Mentor team members and communicate technical decisions clearly
- Ensure privacy, security, and compliance in all systems and data handling