New Health Partners is seeking a Mid/Senior DevOps & Backend Engineer to bridge the gap between platform infrastructure and application development. In this role, you will design and operate cloud-native infrastructure for their InsurTech product suite while also contributing to backend services using TypeScript and Nest.js.

Responsibilities:

Design, build, and maintain production-grade CI/CD pipelines (GitHub Actions, GitLab CI) with automated testing, security scanning, and progressive deployment strategies (blue-green, canary, feature flags)
Manage and optimize AWS infrastructure including EKS, EC2, RDS, ECR, S3, Lambda, CloudFront, Route 53, and IAM—with a focus on cost optimization, high availability, and disaster recovery
Build and maintain Kubernetes clusters (EKS) with Helm charts, custom operators, autoscaling policies, and multi-environment management (dev, staging, production)
Automate infrastructure provisioning and configuration using Terraform (primary), Ansible, and CloudFormation with GitOps workflows and drift detection
Implement comprehensive observability using Prometheus, Grafana, Datadog, ELK/OpenSearch, and distributed tracing (Jaeger/OpenTelemetry) for full-stack visibility
Design and maintain networking architecture including VPCs, security groups, load balancers, service meshes (Istio/Linkerd), and DNS management
Provision and manage GPU-accelerated compute environments (AWS P4/P5 instances, Inferentia, SageMaker) for LLM training, fine-tuning, and inference workloads
Build containerized model-serving infrastructure supporting vLLM, TGI (Text Generation Inference), NVIDIA Triton, and custom inference endpoints with autoscaling based on request load and latency targets
Design and operate data pipelines and storage architectures (S3, EFS, FSx for Lustre) optimized for large-scale model training datasets and artifact management
Implement CI/CD automation specifically for ML/AI workflows—model versioning, automated evaluation gates, staged rollouts of model updates, and A/B inference routing
Collaborate with the AI team to optimize GPU utilization, manage spot instance strategies, and implement cost-aware scheduling for training jobs
Set up monitoring dashboards for model inference latency, throughput, token usage, GPU utilization, and cost tracking
Contribute to and extend backend services built with Nest.js and TypeScript, focusing on scalability, reliability, and clean architecture
Develop and maintain an internal TypeScript framework
Build and maintain scalable microservices and RESTful/GraphQL APIs that integrate with AI inference endpoints and the LLM Composer platform
Design event-driven architectures using Kafka, SQS/SNS, and WebSockets for real-time data processing and AI-powered features
Ensure all deployments are production-ready, horizontally scalable, and follow 12-factor app principles with proper health checks, graceful shutdowns, and circuit breakers
Collaborate with backend and AI teams on system architecture, API contracts, database schema design, and reliability improvements
Implement database management best practices including migration strategies, read replicas, connection pooling, and query optimization for PostgreSQL and Redis

Requirements:

4–7+ years of professional experience in DevOps, Cloud Engineering, or Platform Engineering, with meaningful backend development experience
Hands-on Kubernetes experience (EKS strongly preferred), including cluster administration, Helm chart development, autoscaling, and troubleshooting
Strong proficiency with TypeScript and Nest.js (or comparable Node.js backend frameworks such as Express or Fastify)
Deep AWS expertise across compute, storage, networking, IAM, and managed services—with experience optimizing for cost and performance
Strong Infrastructure-as-Code skills with Terraform; experience with modular, reusable configurations and state management
Solid understanding of microservices architecture, distributed systems patterns, and container orchestration
Experience with Docker, container registries, and container security best practices
Proficiency with CI/CD pipeline design including automated testing, security scanning, and deployment strategies
Familiarity with GitOps workflows and version-controlled infrastructure management
Strong Linux systems administration and shell scripting skills
Experience provisioning and managing GPU workloads for ML/AI model training and inference in cloud environments
Familiarity with ML model serving frameworks (vLLM, TGI, Triton, BentoML, SageMaker Endpoints)
Experience with Kafka, event-driven architectures, and real-time streaming systems
Familiarity with service mesh technologies (Istio, Linkerd) and API gateway management
Experience with HIPAA, SOC 2, or other healthcare/financial compliance frameworks in cloud environments
Knowledge of database technologies beyond PostgreSQL, such as vector databases (Pinecone, pgvector), graph databases, or time-series databases
Experience with chaos engineering, load testing, and reliability engineering practices (SRE)
AWS certifications (Solutions Architect, DevOps Engineer, or equivalent)
