Abnormal Security is focused on building a robust data platform that supports analytics, automation, and AI use cases. The Data Platform Engineer will develop and maintain data ingestion and ETL frameworks, own data modeling, and ensure the reliability and performance of the cloud data warehouse.
Responsibilities:
- Build reusable ingestion and ETL frameworks (Python and Spark) for APIs, databases, and semi- and unstructured sources; handle JSON/Parquet and evolving schemas (see the ingestion sketch after this list)
- Own and evolve Medallion layers (bronze/silver/gold) for key domains with clear lineage, metadata, and ownership
- Design dimensional models and gold marts for core business metrics; ensure consistent grain and definitions
- Maintain semantic layers and partner on BI dashboards (Sigma or similar) so metrics are certified and self-serve
- Implement tests, freshness/volume monitoring, alerting, and runbooks; perform incident response and root-cause analysis (RCA) for data issues (a monitoring sketch follows this list)
- Administer and tune the cloud data warehouse (Snowflake or similar): compute sizing, permissions, query performance, and cost controls
- Build paved-road patterns (templates, operators, CI checks) and automate repetitive tasks to boost developer productivity
- Prepare curated datasets for AI/ML/LLM use cases (feature sets, embeddings prep) with appropriate governance (see the redaction/chunking sketch below)
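The sketches below are illustrative only, not Abnormal's actual frameworks. First, a minimal PySpark pattern for the JSON-to-bronze ingestion case; the paths, column names, and settings are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical reusable loader: JSON from an API landing zone appended to a
# bronze Parquet table. Paths and names are illustrative.
def ingest_json_to_bronze(source_path: str, bronze_path: str) -> None:
    spark = SparkSession.builder.appName("bronze-ingest").getOrCreate()

    df = spark.read.json(source_path)  # Spark infers the (possibly evolved) schema

    # Stamp lineage metadata so downstream layers can trace each record.
    df = (
        df.withColumn("_ingested_at", F.current_timestamp())
          .withColumn("_ingest_date", F.current_date())
          .withColumn("_source", F.lit(source_path))
    )

    # Append-only bronze: raw history is preserved, partitioned for pruning.
    df.write.mode("append").partitionBy("_ingest_date").parquet(bronze_path)

    # Downstream readers reconcile evolving schemas at read time, e.g.:
    # spark.read.option("mergeSchema", "true").parquet(bronze_path)

if __name__ == "__main__":
    ingest_json_to_bronze("s3://landing/api/events/", "s3://lake/bronze/events/")
```

The design choice here is that bronze stays append-only and schema drift is reconciled at read time rather than rejected at write time.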
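Next, a sketch of the freshness/volume monitoring idea, assuming a warehouse reachable through an injected `run_query` callable (any DB-API-style connector) and Snowflake-style SQL; the column names, SLA, and threshold are made up.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical health check run on a schedule. Assumes the warehouse returns
# timezone-aware timestamps; thresholds and column names are illustrative.
FRESHNESS_SLA = timedelta(hours=2)
MIN_HOURLY_ROWS = 1_000

def check_table_health(run_query, table: str) -> list[str]:
    alerts = []

    # Freshness: the newest record must fall inside the SLA window.
    (max_loaded_at,) = run_query(f"SELECT MAX(loaded_at) FROM {table}")[0]
    if datetime.now(timezone.utc) - max_loaded_at > FRESHNESS_SLA:
        alerts.append(f"{table}: stale, last load at {max_loaded_at}")

    # Volume: the last hour's row count must clear a floor.
    (recent_rows,) = run_query(
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE loaded_at > DATEADD('hour', -1, CURRENT_TIMESTAMP())"
    )[0]
    if recent_rows < MIN_HOURLY_ROWS:
        alerts.append(f"{table}: only {recent_rows} rows in the last hour")

    return alerts  # the caller routes non-empty results to paging/Slack
```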
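Finally for this list, a sketch of governed dataset preparation for embedding/RAG use; the redaction rule and chunk size are illustrative, and a real pipeline would carry far more policy.

```python
import re

# Hypothetical prep step: redact obvious PII, then chunk documents into
# embedding-sized pieces while keeping lineage back to the source rows.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text: str) -> str:
    return EMAIL.sub("[EMAIL]", text)  # nothing raw reaches the embedding model

def chunk(text: str, max_words: int = 200) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def prepare_records(docs: list[dict]) -> list[dict]:
    records = []
    for doc in docs:
        for i, piece in enumerate(chunk(redact(doc["body"]))):
            records.append({
                "doc_id": doc["id"],  # lineage to the curated silver source
                "chunk_id": i,
                "text": piece,
            })
    return records
```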
Requirements:
- 3–5+ years hands-on data engineering experience; strong SQL and Python; experience building data pipelines end-to-end in production
- Strong cloud fundamentals (AWS preferred; other major clouds acceptable): object storage, IAM concepts, logging/monitoring, and managed compute
- Experience building and operating production ETL pipelines with reliability basics: retries, backfills, idempotency, incremental processing patterns (e.g., slowly changing dimensions (SCDs), late-arriving data), and clear operational ownership (docs/runbooks); see the incremental-load sketch at the end of this posting
- Solid understanding of Medallion / layered architecture concepts (bronze/silver/gold or equivalent) and experience working within each layer
- Strong data modeling fundamentals (dimensional modeling/star schema): can define grain, build facts/dimensions, and support consistent metrics
- Working experience in a modern cloud data warehouse (Snowflake or similar): can write performant SQL and understand core warehouse concepts
- Hands-on dbt experience: building and maintaining models, writing core tests (freshness, uniqueness, referential integrity), and contributing to documentation; ability to work in an established dbt project
- Experience with analytics/BI tooling (Sigma, Looker, Tableau, etc.) and semantic layer concepts; ability to support stakeholders and troubleshoot issues end-to-end
Nice to have:
- Snowflake administration depth: warehouse sizing and cost management, advanced performance tuning, clustering strategies, and designing RBAC models
- Advanced governance & security patterns: masking policies, row-level security, and least-privilege frameworks as a primary implementer/owner
- Strong Spark/PySpark proficiency: deep tuning/optimization and large-scale transformations
- dbt “platform-level” ownership: CI/CD-based deployments, environment/promotion workflows, advanced macros/packages, and leading large refactors or establishing standards from scratch
- Orchestration: Airflow/MWAA DAG design patterns, backfill strategies at scale, dependency management, and operational hardening (see the DAG sketch at the end of this posting)
- Sigma-specific depth: semantic layer/metrics layer architecture in Sigma, advanced dashboard standards, and organization-wide “certified metrics” rollout
- Automation / iPaaS experience: Workato (or similar) for business integrations and operational workflows
- Infrastructure-as-code: Terraform (or similar) for data/cloud infrastructure provisioning, environment management, and safe change rollout
- Data observability & lineage tooling: OpenLineage/Monte Carlo-style patterns, automated lineage hooks, anomaly detection systems
- Lakehouse / unstructured patterns: Parquet/Iceberg, event/data contracts, and advanced handling of semi/unstructured sources
- AI/ML/LLM data workflows: feature stores, embeddings/RAG prep, and privacy-aware governance
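Two illustrative sketches for the reliability and orchestration bullets above; every name, path, and schedule is an assumption, not a prescribed stack. First, a watermark-driven incremental load that stays idempotent under retries:

```python
import json
from pathlib import Path

# Sketch: watermark-driven incremental extract. Re-runnable because each run
# re-reads its watermark and overwrites the same output slice. `fetch_rows`,
# `write_partition`, and the state file are illustrative stand-ins.
STATE_FILE = Path("state/orders.watermark.json")

def load_watermark() -> str:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["updated_after"]
    return "1970-01-01T00:00:00Z"  # first run doubles as a full backfill

def save_watermark(value: str) -> None:
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps({"updated_after": value}))

def run_increment(fetch_rows, write_partition) -> None:
    watermark = load_watermark()
    rows = fetch_rows(updated_after=watermark)  # source filters on updated_at
    if not rows:
        return
    # Keyed by watermark, so a retried run replaces data instead of duplicating it.
    write_partition(partition_key=watermark, rows=rows)
    # Advance the watermark only after the write succeeds.
    save_watermark(max(r["updated_at"] for r in rows))
```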
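And a minimal DAG showing the backfill-friendly pattern (recent Airflow assumed): tasks consume the scheduled interval rather than wall-clock time, so reruns and backfills are deterministic.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Illustrative DAG: ids, schedule, and task bodies are placeholders.
def extract(data_interval_start=None, data_interval_end=None, **_):
    # Pull only the scheduled interval -- never "now" -- so reruns are idempotent.
    print(f"extracting {data_interval_start} .. {data_interval_end}")

def transform(**_):
    print("transforming the extracted slice")

with DAG(
    dag_id="orders_hourly",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,        # opt in to historical backfills explicitly
    max_active_runs=1,    # avoid overlapping writes to the same tables
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_extract >> t_transform
```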