Role Overview

Lead the design and deployment of advanced AI-driven systems and models, including GPT-based solutions (GPT-4.x, OpenAI APIs), reinforcement learning frameworks, and autonomous agentic workflows
Develop intelligent agents capable of handling complex tasks, decision-making, and automating workflows within F5 products and platforms
Enable generative AI capabilities, such as personalization, contextual understanding, natural language processing (NLP), and decision systems
Research and integrate state-of-the-art AI techniques, including transformer models, GANs (Generative Adversarial Networks), and hybrid AI architectures
Develop and optimize large-scale distributed AI infrastructure, ensuring fault tolerance, resilience, scalability, and performance for global workloads
Implement advanced observability systems for AI applications, leveraging telemetry pipelines (e.g., OpenTelemetry, Prometheus) and ensuring data quality validation
Create frameworks for real-time anomaly detection, predictive analytics, and failure recovery simulation in AI systems
Build frameworks for automated data ingestion, transformation, and validation at scale across distributed systems (e.g., Kafka, Flink, Spark)
Ensure robust CI/CD pipelines tailored for AI workflows, including validation, automation, and monitoring for model updates and deployments
Design synthetic data generation tools for model benchmarking, stress testing, and performance analysis of high-volume data sets and queries
Implement chaos engineering and resilience testing for AI-driven cloud environments (Kubernetes, Docker, Helm)
Create best-in-class AI architecture roadmaps, ensuring alignment with organizational goals and the latest advancements in AI technology
Partner with Product, Engineering, SRE, and DevOps teams to embed AI capabilities into the SDLC, promoting quality and efficiency at every stage
Mentor engineering teams on AI development, distributed system reliability, and automation strategies, fostering innovation and collaboration across teams
Investigate production issues, contributing to root cause analysis, remediation, and future-proofing of AI systems

Requirements

10+ years of hands-on experience in AI research, development, and deployment
Proficiency in natural language processing (NLP), computer vision, predictive analytics, decision systems, and multi-agent frameworks
Familiarity with cutting-edge AI techniques such as GANs (Generative Adversarial Networks), hybrid transformers, sequence modeling (RNNs, LSTMs), and unsupervised learning approaches
Expertise in building multi-modal AI systems capable of handling text, images, audio, and structured data
Proven experience with advanced AI tooling and platforms (e.g., Hugging Face Transformers, LangChain, DeepMind frameworks, AI interpretability tools, Rasa conversational systems)
Deep working knowledge of distributed systems architecture, including large-scale data pipelines (e.g., Kafka, Flink, Spark), data lakes and analytical stores (ClickHouse, Iceberg, S3), and storage optimization techniques
Experience in MLOps practices, including model lifecycle management, scaling, retraining, and deployment in production environments
Proven expertise in Kubernetes, OpenShift, Terraform, and Helm for automating AI system deployment and scaling across multi-cloud infrastructures
Strong knowledge of fault tolerance, latency optimization, and large-scale AI infrastructure monitoring using Prometheus, Grafana, and Datadog
Advanced experience with anomaly detection, predictive modeling, and telemetry validation applied to distributed AI systems
Proficiency in Python, Go, JavaScript, or similar programming languages for creating robust AI workflows and solutions
Experience with benchmarking tools (Locust, Gatling, JMeter) and frameworks for AI performance testing at scale
Ability to design customized AI APIs and algorithms to optimize automation workflows
Proven ability to mentor and lead teams of engineers, QA specialists, and developers in adopting advanced AI practices
Strategic mindset to align technical solutions with business goals, incorporating cutting-edge AI advancements to solve complex challenges
Excellent communication skills for engaging stakeholders, presenting AI strategies, and fostering cross-functional collaboration

Tech Stack

Cloud
Distributed Systems
Docker
Go
Grafana
JavaScript
JMeter
Kafka
Kubernetes
OpenShift
Prometheus
Python
SDLC
Spark
Terraform

Benefits

Flexible work arrangements
Professional development opportunities

Principal Engineer – AI Specialist
