AlphaX is building distributed observability and multi-cloud intelligence software for modern AI and data-intensive systems. The company is seeking a Senior DevOps / Infrastructure Engineer to design, scale, and operate the systems that power its observability platform, with a focus on multi-region cloud deployments and system reliability.

Responsibilities:

Architect and operate multi-region deployments across AWS, GCP, or Azure
Build and maintain high-throughput telemetry ingestion pipelines
Design autoscaling and failover strategies for mission-critical services
Own observability systems including Prometheus, Grafana, and distributed tracing
Reduce MTTR and improve operational readiness processes
Manage CI/CD pipelines, GitOps workflows, and automated deployments
Collaborate with backend teams on API performance and infrastructure reliability
Harden infrastructure for security, compliance, and tenant isolation
Drive the long-term infrastructure roadmap and architectural direction

Requirements:

Deep experience with Kubernetes, Docker, and container orchestration
Strong background in distributed systems and multi-region architectures
Experience with high-ingest, streaming, or event-driven systems
Hands-on experience with Prometheus, Grafana, and tracing/alerting frameworks
Proficiency with Terraform or similar infrastructure-as-code tools
Experience building and maintaining CI/CD pipelines
Strong understanding of AWS, GCP, or Azure
Python or Go scripting for automation and tooling
Experience operating high-availability, production-critical systems
Experience with Cloudflare (DNS, CDN, WAF, SSL)
Experience with Helm, Kustomize, or similar Kubernetes tooling
Experience with time-series databases, vector databases, or high-throughput storage systems
Background in SRE, platform engineering, or observability tooling
Experience supporting AI/ML workloads or GPU-based systems
Familiarity with OpenTelemetry, Jaeger, or similar distributed tracing frameworks
