eHealth, Inc. is dedicated to guiding consumers through their health insurance options with transparency and trust. They are seeking a highly experienced Sr. Staff Platform Engineer to architect and scale cloud-native platforms that empower engineering teams, focusing on AWS Cloud and Internal Developer Platforms with AI and ML capabilities.
Responsibilities:
- Lead the design, development, and continuous improvement of an Internal Developer Platform that integrates with AI/ML workflows, enabling engineers to autonomously manage and deploy cloud-based services, tools, and applications
- Ability to create ChatOps interfaces and maintain self-service tools and automation frameworks, enabling developers to provision infrastructure, manage services, and deploy code with minimal friction
- Experience building and maintaining a set of standardized services, templates, and APIs to streamline cloud-based application development, focusing on both platform stability and developer experience
- Experience using AI to automate log analysis and anomaly detection and to identify root causes
- Experience leading the design and optimization of AWS cloud infrastructure using key AWS services like EC2, EKS, RDS, Lambda, S3, CloudFormation, CloudWatch, VPC, and IAM
- Experience translating natural language requests directly into IaC or environment configurations
- Experience optimizing infrastructure for AI model deployment
- Experience with the design and implementation of LLMs, vector databases, and GPU infrastructure to boost developer productivity and enable AI-powered observability
- Ensure AWS environments are scalable, cost-efficient, and resilient, with high availability and disaster recovery capabilities
- Establish best practices for cost management, performance optimization, and resource utilization across the organization’s AWS accounts
- Design and implement CI/CD pipelines using tools like Jenkins, GitLab CI, and ArgoCD, enabling seamless and automated code deployment across environments
- Automate CI/CD pipelines with generative AI, and ensure that AI applications are secure and scalable
- Leverage Infrastructure as Code (IaC) principles and tools like Terraform, CloudFormation, and CDK to automate infrastructure provisioning, configuration, and management
- Create automation for platform lifecycle management, including environment provisioning, monitoring setup, and scaling strategies
- Ensure security best practices are followed across the platform, with strong controls for IAM, network security, data encryption, and audit logging
- Implement and enforce security policies and governance controls to meet organizational and regulatory compliance standards (e.g., SOC 2, GDPR, HIPAA)
- Work with security teams to automate security checks within the CI/CD pipeline and infrastructure provisioning processes
- Implement robust monitoring and observability solutions using AWS-native services (e.g., CloudWatch, X-Ray, AWS CloudTrail) and third-party tools like Prometheus and Grafana to ensure platform health and service performance
- Build dashboards and alerting systems that give development teams real-time visibility into application and infrastructure health
- Lead incident management processes, including root-cause analysis, postmortems, and performance tuning
- Provide mentorship and guidance to engineering teams on platform engineering best practices, cloud architecture, and efficient service deployment techniques
- Foster a culture of innovation, agility, and continuous improvement across engineering teams, encouraging the adoption of best practices for cloud-native development
- Drive technical decision-making and provide leadership on complex technical challenges related to platform architecture and infrastructure scalability
- Partner with cross-functional teams (DevOps, Security, Product, Engineering) to define and execute cloud infrastructure and platform initiatives that align with business objectives
- Communicate technical platform strategies and roadmaps to senior leadership and non-technical stakeholders, ensuring alignment with business goals
- Act as a key technical advisor, ensuring platform solutions are optimized for both short-term needs and long-term scalability
Requirements:
- Bachelor's or Master's Degree in Computer Science, Engineering or related fields
- 12+ years of experience in platform engineering, cloud architecture, or related fields, with at least 8 years focused on AWS Cloud
- Extensive experience in building and managing Internal Developer Platforms (IDPs), with a focus on automation, self-service workflows, and CI/CD
- Proven track record of successfully designing, implementing, and optimizing large-scale cloud platforms using AWS services
- Hands-on experience with AI/ML infrastructure, tools, and frameworks, and with evaluating their adoption
- Extensive experience in designing and implementing complex AWS network architectures and security solutions
- AWS Certified Solutions Architect – Professional (or equivalent certifications)
- Certified Kubernetes Administrator (CKA) or similar certifications
- Familiarity with service mesh technologies (e.g., Istio, Linkerd) or API gateways (e.g., Kong, AWS API Gateway)
- Experience with serverless architectures, particularly AWS Lambda, API Gateway, and Step Functions
- Experience managing multi-region AWS environments or multi-cloud environments at scale
- Expertise in cloud migrations and legacy system modernization to cloud-native architectures