BayOne Solutions is seeking a DevOps Infrastructure/Cloud Engineer to manage and improve cloud infrastructure across AWS and GCP. The role involves converting provisioning scripts into Terraform, streamlining deployment workflows, and ensuring infrastructure consistency across environments.
Responsibilities:
- Manage and improve cloud infrastructure across AWS and GCP
- Convert existing manual provisioning scripts into Terraform
- Migrate AWS Lambda/serverless infrastructure from Serverless Framework to Terraform
- Build reusable Terraform modules for tenant provisioning, networking, IAM, deployment resources, and environment setup
- Ensure infrastructure is repeatable, version-controlled, documented, and auditable
- Improve infrastructure consistency across development, staging, and production environments
- Streamline deployment workflows across services and environments
- Move AWS Amplify deployments from Amplify's direct deployment model to GitHub-driven workflows
- Build and maintain GitHub Actions pipelines for application deployment, infrastructure deployment, and tenant provisioning
- Improve deployment speed, reliability, rollback safety, and visibility
- Optimize deployment workflows for multi-tenant environments
- Improve long-running Vertex AI Pipeline deployments across multiple tenants
- Establish clear promotion workflows between environments
- Review existing tenant provisioning scripts and workflows
- Convert tenant provisioning into Terraform-backed infrastructure workflows
- Automate tenant provisioning using GitHub Actions
- Improve repeatability, traceability, and rollback capability for tenant setup
- Reduce manual operational work and deployment risk
- Review, clean up, and improve existing VPC structures
- Define clear networking patterns across AWS and GCP
- Improve segmentation between environments and tenants where appropriate
- Review DNS, routing, security groups, firewall rules, and cloud networking configuration
- Document cloud network architecture and operational runbooks
- Implement least-privilege access across AWS, GCP, GitHub, and deployment systems
- Automate permission management for engineering and production environments
- Restrict production access based on role and operational need
- Implement or improve just-in-time access for on-call engineers
- Improve auditability of privileged access and production changes
- Review secrets management and recommend improvements where needed
- Implement GitHub repository rules and engineering workflow standards, including:
  - Branch naming conventions
  - Pull request requirements
  - Required Jira ticket references
  - Protected branches
  - Required reviews
  - Required CI checks
  - Environment-based approvals
- Improve consistency of engineering workflows across repositories
- Ensure GitHub workflows support both developer velocity and compliance needs
- Review and improve monitoring, logging, metrics, and alerting
- Help identify deployment bottlenecks, infrastructure risks, and recurring operational issues
- Improve incident response readiness through runbooks and documentation
- Support production incident troubleshooting when needed
- Recommend improvements to reduce operational toil and improve system reliability
- Help maintain and improve technical controls required for SOC 2
- Support controls related to:
  - Access management
  - Change management
  - Deployment approvals
  - Infrastructure security
  - Production access
  - Audit logging
  - Evidence collection
- Ensure infrastructure and deployment processes are auditable and documented
- Help create or improve runbooks, diagrams, and process documentation needed for compliance
Requirements:
- 7+ years of experience in DevOps, cloud infrastructure, platform engineering, or systems engineering
- Strong hands-on experience with Terraform in production environments
- Experience managing infrastructure across AWS and GCP
- Strong experience building and maintaining GitHub Actions workflows
- Experience with AWS services such as Lambda, IAM, VPC, Amplify, CloudWatch, and Secrets Manager or Parameter Store
- Experience with GCP services such as Vertex AI, IAM, VPC networking, and Cloud Logging / Monitoring
- Experience migrating manual or framework-based infrastructure to Infrastructure as Code
- Strong understanding of Linux, networking, DNS, IAM, and cloud security fundamentals
- Experience implementing least-privilege access and production access controls
- Experience with monitoring, logging, and observability tools
- Strong scripting ability in Bash, Python, Go, or similar languages
- Ability to work independently, clarify ambiguity, and drive implementation without heavy handholding
- Strong documentation and communication skills
- Experience with multi-tenant SaaS infrastructure
- Experience with Vertex AI Pipelines or ML/AI deployment workflows
- Experience optimizing long-running cloud deployment pipelines
- Experience with GitOps or declarative infrastructure patterns
- Experience with SOC 2, ISO 27001, or similar compliance frameworks
- Experience with just-in-time access tooling such as Okta, Teleport, AWS IAM Identity Center, Google IAM Conditions, or similar
- Experience with policy-as-code tools such as OPA, Checkov, Conftest, Sentinel, or Terraform Cloud policies
- Experience with Kubernetes, Docker, or containerized workloads
- Experience with cloud cost optimization and resource management
- Experience creating disaster recovery, backup, and business continuity processes