Ad Hoc LLC is a technology company that empowers organizations to deliver scalable, impactful digital services. The Engineer Lead will oversee and mentor teams responsible for building and operating a next-generation digital platform focused on public health communication, AI capabilities, and advanced data integration.
Responsibilities:
- Oversee and mentor teams responsible for cloud-native infrastructure design and automation; Kubernetes-based compute platforms; enterprise CI/CD modernization; observability, reliability, and automated recovery frameworks; and secure environments optimized for AI, high-velocity data, and modern application architectures
- Define and enforce standards for Infrastructure as Code, enabling the organization to operate with full automation, consistency, and auditability across environments
- Ensure infrastructure is versioned, tested, policy-validated, and easily scalable
- Rapidly integrate new data sources and AI capabilities
- Scale to support national-level communication and analytical workloads
- Operate with resilience, security, and reliability
- Innovate quickly while maintaining trust and performance
- Design and review complex automation services, internal APIs, and tooling that streamline deployments, platform operations, and AI-driven workflows
- Guide teams in standardized patterns for automation, data integration, and platform observability
- Architect IaC patterns for large-scale cloud environments
- Set organization-wide standards for provisioning, configuration, compliance enforcement, and environment lifecycle management
- Lead the shift toward fully automated, reproducible, policy-driven infrastructure
- Design and govern real-time event pipelines supporting AI/ML triggers, high-frequency social/media data ingestion, real-time system monitoring and health automation, and platform-wide event-driven workflows
- Mentor teams on building durable, scalable streaming systems
- Run production workloads in Kubernetes (EKS/AKS/GKE)
- Design for cloud networking, multi-region high availability, service mesh, autoscaling, and secure distributed architecture
- Architect and standardize enterprise CI/CD frameworks that enable rapid, safe delivery
- Apply automated testing, security gates, policy-as-code, and zero-downtime deployment strategies
- Define organizational SLIs/SLOs and implement resilient monitoring, alerting, and automated remediation
- Lead incident response and ensure long-term reliability improvements
- Manage DevOps, SRE, or Platform Engineering teams, including hiring, mentoring, and performance management
- Communicate with stakeholders and drive cross-team technical alignment
- Establish a strategic platform roadmap that supports emerging AI and data capabilities
Requirements:
- Bachelor's degree and 9+ years of experience
- Relevant years of experience may be substituted for education
- Ability to design and review complex automation services, internal APIs, and tooling that streamline deployments, platform operations, and AI-driven workflows
- Guides teams in standardized patterns for automation, data integration, and platform observability
- Infrastructure as Code (Terraform, Pulumi, or CloudFormation)
  - Deep experience architecting IaC patterns for large-scale cloud environments
  - Ability to set organization-wide standards for provisioning, configuration, compliance enforcement, and environment lifecycle management
  - Leads the shift toward fully automated, reproducible, policy-driven infrastructure
- Event-Driven Architecture Expertise (Kafka, AWS EventBridge, Azure Event Hubs)
  - Designs and governs real-time event pipelines supporting AI/ML triggers, high-frequency social/media data ingestion, real-time system monitoring and health automation, and platform-wide event-driven workflows
  - Mentors teams on building durable, scalable streaming systems
- Cloud & Container Platform Mastery (AWS, Azure, or GCP + Docker + Kubernetes)
  - Expert-level experience running production workloads in Kubernetes (EKS/AKS/GKE)
  - Strong foundation in cloud networking, multi-region high availability, service mesh, autoscaling, and secure distributed architecture
- CI/CD Architecture Leadership (GitHub Actions, GitLab CI, Jenkins, or similar)
  - Ability to architect and standardize enterprise CI/CD frameworks enabling rapid, safe delivery
  - Skilled in automated testing, security gates, policy-as-code, and zero-downtime deployment strategies
- Observability & Reliability Engineering (Prometheus, Grafana, Datadog, CloudWatch, ELK)
  - Defines organizational SLIs/SLOs and implements resilient monitoring, alerting, and automated remediation
  - Leads incident response and ensures long-term reliability improvements
- Experience managing DevOps, SRE, or Platform Engineering teams, including hiring, mentoring, and performance management
- Strength in stakeholder communication and cross-team technical alignment
- Ability to establish a strategic platform roadmap that supports emerging AI and data capabilities
- Security & Compliance Automation (FedRAMP, NIST, Zero Trust) - Experience embedding automated controls, scanning, and compliance policies into CI/CD and runtime environments
- Workflow Orchestration (Airflow, Prefect) - Useful for managing complex operational tasks, data processing pipelines, and platform automation efforts
- Healthcare Interoperability Experience (FHIR, HL7) - Helpful for data exchange and system-integration needs across HHS
- Program-Level Platform Strategy - Experience defining cross-cutting platform modernization plans spanning infrastructure, automation, data, and AI readiness