Actively AI is building superintelligence for go-to-market (GTM) teams, focused on increasing productivity through advanced data systems. As a Senior/Staff Data Platform Engineer, you will design and scale the data ecosystem that supports decision-making across the company.
Responsibilities:
- Own the ingestion and transformation layer: design and scale pipelines that pull structured and unstructured data from CRM systems, call transcripts, and external signals, normalizing and enriching it into representations agents can reason over in real time
- Build for operational use, not just analytics
- Keep data current as the world changes: architect real-time and mini-batch workflows using technologies like Pub/Sub, Kafka, or modern ETL tools so data stays synchronized as customer activity happens
- Solve for customer-specific variation at scale
- Own reliability end to end: observability, lineage, schema management, alerting; you define what "trust in the data" means and make sure it holds across thousands of accounts, so agents and other teams can confidently build on top of it
- Work across the full stack
Requirements:
- 5+ years designing and operating core data infrastructure, from ingestion and transformation to serving and observability, in high-growth environments where the data needed to be right, fresh, and fast
- You've worked on data systems that power ML models, intelligent workflows, or real-time decisioning
- Proficient in Python, SQL, and dbt, with hands-on experience in BigQuery or Snowflake
- Familiar with orchestration and data-movement tools like Airflow, Fivetran, or Polytomic
- You've built streaming and mini-batch pipelines using Pub/Sub, Kafka, Dataflow, or similar technologies
- You've either built a data platform from scratch at an early-stage company or worked at a data-focused product company (e.g. Segment, dbt Labs) scaling systems across many customers
- You take work from design to production without being managed through it, and you hold yourself responsible for whether the data your systems produce is actually trustworthy
Nice to have:
- Prior experience at a data infrastructure or platform company (e.g. Segment, Databricks, Confluent, Fivetran), or meaningful contributions to open-source data tooling
- Familiarity with embedding and vector pipelines, including chunking strategies, index management, and keeping representations in sync with fast-changing source data
- Experience building data pipelines where correctness was a hard requirement, such as financial data, compliance systems, or other domains where bad data has real downstream consequences