Cognizant is seeking a highly skilled and experienced Principal Software Engineer focused on Agentic AI and DevOps. The ideal candidate will architect and deliver agentic microservices and platform capabilities, lead cloud-native DevOps at scale, and partner with organizational leaders to communicate strategy, status, and results.
Responsibilities:
- Architect, build, and operate agentic AI services and microservices leveraging LangChain, LangSmith, OpenAI/Azure OpenAI, and LiteLLM; implement tool-use orchestration, evaluation, and guardrails
- Design, build, and maintain CI/CD pipelines using Azure DevOps (ADO) YAML and GitHub Actions; enforce trunk-based workflows, quality gates, progressive delivery, and automated rollbacks
- Stand up and manage Azure infrastructure (AKS, Service Bus, Event Hubs, Storage Accounts, Key Vault, Bastion); codify environments with Terraform; implement secure networking, secrets, and RBAC
- Containerize and ship services with Docker/Buildah; operate Kubernetes with CNI networking and Linkerd service mesh; implement canary/blue-green strategies and autoscaling
- Create and operate Apache NiFi dataflows; deploy and manage NiFi clusters on AKS with VM Scale Sets, enabling resilient, scalable ingestion and orchestration
- Implement enterprise-grade observability and logging: ELK/EFK (Elasticsearch, Fluentd/Fluent Bit, Kibana), Prometheus metrics, Azure Dashboards, and KQL-based alerting
- Engineer data and analytics integrations: Azure Databricks, PostgreSQL, Snowflake; operationalize Power BI, SharePoint, and Jupyter-based workflows
- Build robust platform and app integrations: ServiceNow APIs, REST APIs, SMTP/IMAP/POP email automations; configure and manage NGINX/HAProxy load balancers
- Lead incident response, root-cause analysis, and postmortems; continuously improve reliability, performance, security, and cost
- Mentor teams, drive architectural runway, and communicate plans, trade-offs, and outcomes to stakeholders and leadership
Requirements:
- Expert-level hands-on DevOps across Azure and Kubernetes: CI/CD, Git workflows, infrastructure as code, automated testing, monitoring, and secure deployment
- Proficiency with Azure DevOps (ADO) YAML pipelines and GitHub Actions; experience optimizing pipelines for cloud-native systems
- Strong Kubernetes operations experience, including CNI networking and service mesh (Linkerd); container build and supply chain tooling (Docker, Buildah)
- Observability at scale using ELK/EFK, Prometheus, Fluentd/Fluent Bit, and Azure Monitor dashboards and alerting (KQL)
- Deep automation with PowerShell, Bash, and Python to eliminate toil across build, release, environment, and operational workflows
- Infrastructure as Code expertise with Terraform (Azure resources: AKS, Service Bus, Event Hubs, Storage, Key Vault, Bastion)
- Proven track record of reducing manual intervention, increasing repeatability, and improving MTTR through automation
- Practical, production experience delivering agentic AI solutions (task orchestration, tool-use, planning, retrieval, and evaluation)
- Hands-on experience with LangChain, LangSmith (tracing/eval), OpenAI/Azure OpenAI, and LiteLLM integration; familiarity with prompt engineering, safety/guardrails, and LLM observability (e.g., Arize)
- Experience operationalizing AI services within DevOps pipelines and platform governance
- Apache NiFi expertise: authoring and governing dataflows; deploying and scaling NiFi clusters on AKS with VM Scale Sets
- Azure services: AKS, Service Bus, Event Hubs (setup and integration), Storage Accounts (setup and integration), Key Vault, Bastion, Azure Dashboards, and Kusto Query Language (KQL)
- Data/analytics: Azure Databricks, PostgreSQL, Snowflake; Power BI and SharePoint integrations; Jupyter Notebook workflows
- Networking fundamentals: DHCP/DNS; load balancer configuration and operations (NGINX, HAProxy); Kubernetes ingress best practices
- Messaging and email protocols: SMTP, IMAP/POP
- Microservices and app frameworks: Python and Node.js microservices (REST APIs), Electron build and packaging
- Windows PowerShell; Linux/Unix administration; Bash and Python
- Azure Cloud (architecture, security, cost, RBAC); Azure DevOps (ADO) with YAML; GitHub Actions
- Docker and Buildah; Kubernetes (CNI), Linkerd; ELK/EFK, Prometheus, Fluentd/Fluent Bit
- Apache NiFi flow development and clustered operations on Kubernetes with scale sets
- Azure Databricks; PostgreSQL; Snowflake; REST APIs; ServiceNow APIs; Power BI; SharePoint
- Azure Service Bus, Azure Event Hubs, Storage Accounts, Key Vault, Bastion
- Jira; Jupyter Notebook; Azure Dashboards and KQL; SMTP/IMAP/POP
- Python and Node.js microservice architecture; Electron build
- Plan, schedule, and coordinate multi-team deliveries and releases; manage dependencies, risks, and change
- Drive execution across platform, app, data, and AI workstreams with clear milestones and success criteria
- Establish SLOs/SLAs and error budgets; align roadmaps to business priorities
- Communicate architectural decisions, roadmaps, and trade-offs to technical and executive audiences
- Lead cross-functional ceremonies; produce clear runbooks, architecture docs, and dashboards
- Foster collaboration across engineering, product, security, and operations
- Rapid diagnosis and resolution of complex production issues; strong RCA and remediation planning
- Attention to detail in reliability, security, performance, and cost optimization
- Track and adopt evolving best practices in cloud, containers, DevOps, and agentic AI
- Champion continuous improvement in engineering excellence and platform governance
- Typically requires 10–15+ years in software engineering, DevOps/SRE, or platform engineering with principal-level impact
- Bachelor's degree in Computer Science, Information Technology, or related field preferred (or equivalent experience)