Avaya is an enterprise software leader that helps organizations and government agencies connect effectively. The Senior Data Engineer will design and operate low-latency streaming pipelines and robust batch ETL/ELT processes, ensuring data quality and compliance while collaborating with application and analytics teams to deliver data products.
Responsibilities:
- Design, build, and operate low-latency streaming pipelines (Kafka, Spark Structured Streaming) and robust batch ETL/ELT on Databricks Lakehouse
- Establish reliable orchestration and dependency management (Airflow), with strong SLAs and on-call readiness for business-critical data flows
- Model, optimize, and document curated datasets and interfaces that serve analytics, product features, and AI workloads
- Implement data quality checks, observability, and backfills; drive root-cause analysis and incident prevention
- Partner with application teams (Go/Java), analytics, and ML/AI to ship data products into production
- Build and maintain datasets and services that power RAG pipelines and agentic AI workflows (tool-use/function calling)
- When Spark/Databricks isn’t optimal, design and operate custom processors/services in Go to meet strict latency or specialized transformation requirements
- Instrument prompt/response and token usage telemetry to support LLMOps evaluation and cost optimization; provide datasets for labeling and golden sets
- Improve performance and cost (storage/compute), review code, and raise engineering standards
- Design data solutions aligned to enterprise security, privacy, and compliance requirements (e.g., SOC 2, ISO 27001, GDPR/CCPA as applicable), partnering with Security/Legal
- Implement RBAC/ABAC and least-privilege access; manage service principals, secrets, and key rotation; enforce encryption in transit and at rest
- Govern sensitive data: classification, PII handling, masking/tokenization, retention/archival, lineage, and audit logging across pipelines and storage
- Build observability for data security and quality; support incident response, access reviews, and audit readiness
- Embed controls in CI/CD (policy checks, dependency vulnerability scanning) and ensure infra-as-code adheres to guardrails
- Partner with security engineering on penetration tests, threat modeling, and red-team exercises; remediate findings and document controls
- Contribute to compliance audits (e.g., SOC 2/ISO 27001) with evidence collection and continuous control monitoring; support DPIAs/PIAs where required
Requirements:
- 6+ years building production-grade data pipelines at scale (streaming and batch)
- Deep proficiency in Python and SQL; strong Spark experience on Databricks (or similar)
- Advanced SQL: window functions, CTEs, partitioning/z-ordering, query planning and tuning in lakehouse environments
- Hands-on with Kafka (or equivalent) and an orchestrator (Airflow preferred)
- Strong data modeling skills and performance tuning for low latency and high throughput
- Production mindset: SLAs, monitoring, alerting, CI/CD, and on-call participation
- Proficient with AI coding assistants (e.g., Cursor, Claude Code) as part of daily development
- Proficiency building data services/processors in Go (or willingness to ramp quickly); familiarity with alternative frameworks (e.g., Flink/Beam) is a plus
- Experience with multi-cloud environments or cloud migrations (Azure plus either GCP or AWS)
- Exposure to building data for AI/RAG, LLM-powered features, and agentic AI patterns (tool-use/function calling, planning/execution, memory)
- Familiarity with LLMOps telemetry (prompt/response logs, token budgets) and agent evaluation pipelines
- Background in high-scale product engineering (vs. internal IT-only projects)
- Familiarity with contact center or CRM data (nice to have, not required)
- Bachelor's or Master's degree in CS/EE/Math or a similar field; strong academic background and/or experience at top-tier companies