Abnormal AI is focused on building and supporting world-class data pipelines for its AI-native security platform. The Senior Software Engineer - Data Engineering will establish the technical foundation for data excellence, ensure the reliability of critical data pipelines, and collaborate with teams across the company to keep data infrastructure aligned with business needs.
Responsibilities:
- Own mission-critical pipeline reliability: Take end-to-end ownership of our production data pipelines, which process billions of messages weekly, ensuring 99.9% uptime for revenue-critical pipelines that directly power sales and customer-facing AI products
- Build self-healing pipelines: Design and implement automated monitoring, testing, and recovery systems for data pipelines that eliminate manual intervention and reduce mean time to recovery (MTTR) from hours to minutes
- Accelerate development velocity: Deploy CI/CD pipelines and self-service platforms that reduce deployment time from 3-5 days to under 2 hours, enabling Data Scientists to safely deploy models without engineering bottlenecks
- Architect for scale: Optimize data pipelines that handle exponential annual data growth, implementing cost-effective solutions that support regional expansion and compliance requirements (GDPR, FedRAMP, SOC 2)
- Bridge technical and business domains: Partner with Sales, Finance, and Product teams to ensure data infrastructure aligns with business needs, making critical trade-off decisions when pipelines impact revenue
- Establish data engineering excellence: Define best practices for dbt, Airflow, Spark usage, PII anonymization, and cross-divisional data sharing while mentoring embedded Data Guild team members on these practices
- Enable AI and accessible data consumption: Design and maintain an accessible semantic layer that provides consistent, trustworthy definitions and abstractions, making it easy for stakeholders to consume data and incorporate AI-driven insights into their workflows
Requirements:
- 6+ years of software engineering experience in backend, distributed systems, or data-focused roles
- Proven experience designing and running large-scale, production-grade data pipelines
- Proficiency in our stack: Python, Spark/PySpark, Airflow, SQL, dbt, Databricks, Snowflake, AWS
- Proven track record of driving pipeline reliability to 99%+ uptime, including SLAs, observability tooling, and automated recovery patterns
- Strong systems-thinking skills with the ability to debug complex distributed systems, optimize for performance and cost, and make architectural decisions balancing short-term needs with long-term scalability
- Demonstrated ownership mindset and ability to drive projects from conception to production independently, including on-call responsibilities for critical systems
- Experience collaborating with Data Science, Analytics, Product, Finance, Marketing, and Sales teams, along with the ability to communicate technical decisions clearly to non-technical stakeholders and executives
- Bachelor's degree in Computer Science, Applied Sciences, Information Systems, or another related quantitative field
Nice to have:
- Experience building or operating AI/ML data pipelines, including data readiness for training and evaluation
- Background in high-growth environments where data volume doubles annually, requiring frequent re-architecture and optimization
- Experience with compliance frameworks such as GDPR, SOC 2, and FedRAMP, plus familiarity with PII handling and anonymization
- Knowledge of multi-region data architectures, cellular/multi-tenant systems, or related large-scale distributed design patterns
- Background in cybersecurity, threat detection, or email security
- Experience building internal developer tools for data scientists and analysts
- Track record of mentorship, technical leadership, and driving cross-functional initiatives
- Advanced degree in Computer Science or a related field