Outlook Amusements operates a leading B2C marketplace through its flagship brand, California Psychics. The company is seeking a Senior DevOps Engineer responsible for the design, implementation, administration, and continuous improvement of secure, scalable, and highly available AWS-based infrastructure and supporting operational services.
Responsibilities:
- Design, implement, and continuously improve secure, scalable, and highly available AWS-based cloud infrastructure and supporting services
- Build, manage, and optimize Infrastructure as Code (IaC) solutions for AWS environments, including networking, compute, storage, and related services
- Develop, maintain, and enhance Azure DevOps pipelines to improve deployment speed, consistency, reliability, and quality across development, test, and production environments
- Automate infrastructure provisioning, configuration management, system maintenance, and operational workflows to reduce manual effort and improve operational efficiency
- Monitor infrastructure, application health, logging, alerting, and service availability using Datadog; proactively identify trends, risks, and issues to minimize outages and improve service reliability
- Partner with engineering, security, and operations teams to support application deployments, infrastructure changes, production readiness, and incident response
- Implement and maintain disaster recovery, backup, resiliency, and business continuity capabilities to support operational readiness and recovery objectives
- Strengthen cloud security by supporting identity and access management, patching, vulnerability remediation, system hardening, and adherence to established policies and standards
- Evaluate existing infrastructure and operational practices and recommend improvements in scalability, performance, reliability, security, and cost optimization
- Monitor and optimize AWS resource utilization and cloud spend while maintaining performance, availability, and scalability requirements
- Perform capacity planning and forecast infrastructure resource requirements to support current and future business needs
- Provide technical leadership, mentorship, and guidance through collaboration, documentation, knowledge sharing, and operational best practices
- Participate in troubleshooting, root cause analysis, and post-incident remediation efforts to improve overall system stability and resilience
- Work with third-party vendors and service providers as needed to support infrastructure, tooling, and service delivery objectives
- Contribute to shared team and organizational objectives through strong cross-functional partnership, communication, and execution
Requirements:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent combination of education and practical experience
- 7+ years of experience designing, implementing, and supporting production infrastructure in AWS cloud environments
- Hands-on experience with core AWS services such as EC2, VPC, S3, CloudFront, Route 53, IAM, and Lambda
- Strong experience building, maintaining, and improving deployment automation and CI/CD pipelines, preferably using Azure DevOps
- Strong hands-on experience with Datadog for infrastructure and application monitoring, alerting, dashboards, logging, and proactive operational visibility
- Proven ability to use monitoring and observability data to identify trends, troubleshoot issues, improve service reliability, and reduce incident response time
- Experience supporting high-availability, high-traffic, and customer-facing web or ecommerce platforms
- Strong experience administering Windows Server and Amazon Linux environments in production, along with experience in Microsoft Active Directory and enterprise identity and access management
- Experience supporting database infrastructure such as Microsoft SQL Server and MySQL, including availability, performance, and operational considerations
- Strong understanding of core infrastructure disciplines, including networking, compute, storage, systems administration, and cloud architecture
- Experience supporting web and application hosting platforms, including IIS and Apache
- Strong understanding of security best practices, including identity and access management, patching, system hardening, backup, disaster recovery, and business continuity
- Experience troubleshooting complex production issues and participating in incident response, root cause analysis, and service restoration
- Ability to evaluate and optimize infrastructure for scalability, resiliency, operational efficiency, and cost management
- Experience working cross-functionally with engineering, security, and operations teams in a fast-paced production environment
- Experience with additional deployment or source control platforms such as Bitbucket or AWS CodeDeploy
- Experience with scripting and automation using PowerShell, Python, or Bash
- Experience with cloud cost optimization, usage analysis, and FinOps-related practices
- Experience supporting regulated, business-critical, or revenue-generating production environments
- Familiarity with modern release management, change control, and operational best practices