Role Overview

Lead the design and implementation of production-grade ML and Generative AI solutions on AWS (with awareness of multi-cloud environments).
Act as a hands-on expert and trusted advisor for customers running AI/ML workloads at scale, from initial discovery through deployment and optimization.
Translate complex business problems into cloud architectures that are secure, reliable, cost-efficient, and observable.
Help evolve how DoiT uses AI/ML internally and with customers by turning one-off solutions into reusable patterns and “gravel roads” that influence the product roadmap.
For Field Engineering, focus more on pre-sales, POVs, CloudBuild engagements, and partner-led growth motions.
For Delivery, focus more on install base health, product adoption, proactive engagements, and account-team work.
Own the technical success of engagements and ensure designs are production-ready (security, reliability, performance, cost).
Partner with Account Executives, Solution Engineers, and Growth FDEs to shape and win opportunities across all four GTM pillars in-region (product adoption, new logo acquisition, install base expansion, partner-led growth).
Serve as technical lead for extended POVs and CloudBuild engagements focused on AI/ML and GenAI, demonstrating clear value and de-risking customer adoption.
Document and measure the business and technical impact of your work, tying AI/ML initiatives to clear customer outcomes (cost, performance, reliability, productivity).

Requirements

4+ years of experience architecting, deploying, and managing cloud-based AI/ML solutions, including production workloads.
Proven track record designing and operating large, distributed systems on AWS, selecting appropriate services and patterns to meet business and technical goals.
Advanced proficiency with AWS services relevant to AI/ML and GenAI.
Hands-on experience with Amazon Bedrock for deploying and scaling foundation models and Generative AI workloads.
Experience fine-tuning and deploying Large Language Models (LLMs) and multimodal AI using Amazon SageMaker (including JumpStart).
Strong prompt engineering skills and familiarity with rigorous model evaluation (quality, safety, performance).
Understanding of agentic capabilities and patterns for AI agents that autonomously perform tasks and integrate with existing systems.
Experience with Amazon Q Business and Amazon Q Developer (or similar tools) to accelerate insight generation and development workflows.
In-depth knowledge of Amazon SageMaker components such as Pipelines, Model Monitor, Data Wrangler, and Clarify (for bias detection and interpretability).
Proficiency integrating TensorFlow, PyTorch, and other ML frameworks with SageMaker for model development, fine-tuning, and deployment.
Experience with distributed training (multi-GPU or multi-node) and performance optimization for inference.
Strong data-engineering skills on AWS: Amazon S3, AWS Glue, Lake Formation, Redshift for AI/ML data pipelines.
Experience building end-to-end AI/ML workflows using services like AWS Lambda, Step Functions, API Gateway, and containerized deployments on Amazon EKS / AWS Fargate.
Hands-on experience with CI/CD for AI/ML using AWS CodePipeline, CodeBuild, SageMaker Pipelines, or similar.
Proficiency in monitoring and operating AI systems using Amazon CloudWatch and SageMaker Model Monitor.
Strong understanding of AI governance, security, and compliance on AWS, including IAM, KMS, and data privacy patterns.
Familiarity with AI ethics and bias detection/mitigation (e.g., using SageMaker Clarify or similar tools).
Working knowledge of Google Cloud AI tools (e.g., Vertex AI, Cloud AutoML, BigQuery ML) sufficient to reason about multi-cloud architectures and integration points.
Proven ability to mentor peers, run enablement sessions, and collaborate across Sales, CS, and Product.
Excellent communication skills across technical and business audiences; able to simplify complex ideas and influence decisions.
Natural ownership mentality: you escalate early, resolve fast, and own the outcome.
Demonstrated ability to work effectively in a remote-first, global environment.

Tech Stack

Amazon Redshift
AWS
BigQuery
Cloud
Distributed Systems
Node.js
PyTorch
TensorFlow

Benefits

Unlimited Vacation
Flexible Working Options
Health Insurance
Parental Leave
Employee Stock Option Plan
Home Office Allowance
Professional Development Stipend
Peer Recognition Program

Senior Cloud Architect, ML/AI
