INSPYR Solutions is a national expert in delivering flexible technology and talent solutions. They are seeking a Sr. Cloud Engineer to develop infrastructure and platform pipelines for agile delivery, while also leading design and deployment of AWS GenAI solutions.
Responsibilities:
- 3 years AWS experience
- 3-5 years' experience developing infrastructure and platform pipelines for agile delivery to application teams
- AWS Certifications highly desired
- AI-first mentality toward improving processes
- Experience leading the design and production deployment of AWS GenAI solutions using Amazon Bedrock, enabling scalable, low-latency AI inference without managing model infrastructure or GPUs
- Experience establishing enterprise AI platform patterns that balance performance, cost, and safety through managed inference, prompt governance, and observability
- Experience integrating Amazon Q/Kiro into cloud operations to accelerate troubleshooting, architecture analysis, and root-cause investigation using AWS context-aware AI
- Experience partnering with security and application teams to operationalize responsible AI usage aligned with enterprise standards
- Experience with AWS, VPC, AZs, Subnets, Route 53, ALB/NLB, WAF, StackSets, Security Groups, NACLs, EC2, Systems Manager, Azure DevOps, Ansible, Jenkins
- Strong knowledge of Windows and Linux operating systems: RHEL 8.x-9.x, SUSE, and Windows Server 2019 and later
- Fundamental networking/distributed computing environment concepts (Routing protocols, DHCP, DNS, TCP/IP)
- Excellent understanding of Containers, including AWS technologies ECR, EMR, EKS
- Understanding of Monitoring tools like Cribl, CloudWatch and Zabbix
- Troubleshooting software and hardware issues including root cause analysis
- Strong developer mindset with experience developing in CloudFormation, Terraform, Python, YAML, JSON
- Strong platform engineering background automating provisioning of resources in AWS
- Experience building infrastructure-as-code CI/CD pipelines using CloudFormation, gold images, and configuration management automation
- Engineering automated, repeatable platform solutions on AWS that provide easy-to-use self-service capabilities for application teams
- Strong understanding of security constructs in AWS, including load balancers, WAF, SGs, NACLs, patching, CloudTrail, GuardDuty, etc.
- Strong background building automated self-service capabilities in AWS
- Strong communication skills
- Can-do attitude a must
- Demonstrate an understanding of Team Goals, Strategies and Priorities
- Demonstrate ability and willingness to share ideas with Associates, peers and management
- Embrace a coaching culture, provide feedback to Associates, peers and management
- Serve as a mentor to provide coaching and technical guidance to Associates
- Demonstrate a positive attitude and set an example for colleagues
- Attend regular manager and team lead status meetings and be engaged in meeting discussions and strategic planning
- Demonstrate critical thinking skills to help lead efforts to diagnose and troubleshoot issues
- Coordinate with the support service teams to identify common issues and develop appropriate documentation, training, and automation
- Establish working partnerships with IT teams and external partners to coordinate problem resolution for operational issues and analyze root cause issues to address underlying infrastructure problems
- Lead problem-solving exercises by understanding the current state, conducting root cause analysis, identifying solutions, and then planning implementation and delivery activities
- Develop 'as is' and 'to be' diagrams to visually represent challenges, risks, and opportunities for improvements
- Create tickets for 100% of support requests
- Acknowledge assigned tickets/issues in alignment with established SLAs, and no later than 1 business day after assignment, to let users know the ticket has been assigned and to identify the steps and timeline for pursuing resolution
- Resolve most assigned tickets/issues within 3 business days
- In the event a ticket/issue cannot be resolved within 3 business days, provide customer updates at least weekly to manage customer expectations and communicate the steps and timeline that will be taken to pursue resolution
- Serve as an escalation point for Associates
- Use sound judgment to determine when issues/tickets need to be escalated to management
Requirements:
- Bachelor's degree in computer science, or combination of education and experience
- Excellent verbal and written communication skills
- Excellent critical thinking and problem-solving skills
- Positive attitude and solutions-oriented thinking
- Ability to communicate technical concepts to both technical and non-technical audiences
- Ability to work in a fast-paced environment and adapt to change
- AWS certification highly preferred