ComplyAuto is a growing RegTech SaaS company seeking a DevOps Engineer to drive its DevOps and Site Reliability Engineering initiatives. The role involves designing and improving automated deployment processes, maintaining infrastructure as code, and ensuring the reliability and performance of systems.
Responsibilities:
- Design, implement, and maintain automated deployment and configuration management systems
- Develop infrastructure as code (IaC) scripts for provisioning and managing infrastructure resources
- Continuously improve and optimize deployment processes for efficiency and reliability
- Implement and maintain monitoring, alerting, and logging systems to ensure the reliability and availability of services
- Collaborate with development teams to establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Conduct blameless post-incident reviews and contribute to the creation of error budgets
- Work closely with software development teams to influence system design for scalability, reliability, and performance
- Provide expertise in designing and implementing high-availability systems
- Implement and manage CI/CD pipelines to automate the software delivery process
- Ensure seamless integration of automated testing into the deployment pipeline
- Collaborate with security teams to implement and monitor security best practices
- Ensure compliance with industry standards and regulations related to infrastructure and deployment
- Conduct capacity planning to ensure systems can handle current and future loads
- Identify and address performance bottlenecks in collaboration with development teams
- Create and maintain comprehensive documentation for infrastructure and deployment processes
- Contribute to the development of runbooks and knowledge base articles
- Stay current with emerging technologies and industry trends to drive innovation and continuous improvement
Requirements:
- Bachelor's degree in Computer Science, Software Engineering, or a related field
- 8+ years of experience in DevOps with a focus on Site Reliability Engineering, or any combination of education, experience, and training that provides the following knowledge, skills, and abilities
- Strong understanding of how to design, implement, and maintain highly available, fault-tolerant, and scalable infrastructure using Infrastructure-as-Code (IaC) tools
- Strong understanding of how to architect, maintain, and enhance CI/CD pipelines to automate testing, integration, and deployment processes, ensuring efficient and reliable software releases
- Strong understanding of how to manage cloud and on-premises infrastructure, optimizing performance, scalability, and cost while ensuring compliance with security standards
- Ability to create and maintain clear documentation of infrastructure architecture, processes, and procedures
- Ability to provide regular reports on system health, performance, and cost-efficiency metrics
- Excellent communication skills, with the ability to effectively communicate complex technical concepts to both technical and non-technical stakeholders
- Strong problem-solving and analytical skills
- Ability to meet regular attendance expectations and tight deadlines for deliverables
- Proven ability to consistently perform duties with integrity, effectiveness, efficiency, and at the highest level of professionalism
- Strong interpersonal skills, with the ability to establish and maintain effective working relationships and interact successfully with people at all management and support levels, inside and outside the organization
- Applicants must be authorized to work in the United States and provide proof of work authorization within three days of hire
- Background check required
- A passion for learning new technologies and staying up to date with industry trends