Samsara is a pioneer of the Connected Operations™ Cloud, focusing on improving the safety, efficiency, and sustainability of physical operations. As a Data Engineer II, you will own the data platforms that power Samsara’s GTM AI engine, building and optimizing data pipelines while partnering with data scientists and AI engineers to deliver innovative solutions.
Responsibilities:
- Build and maintain ETL/ELT data pipelines in Databricks and Spark, ensuring data is ingested, transformed, and delivered reliably for analytics and AI use cases
- Develop and evolve logical and physical data models to support reporting, experimentation, and advanced workflows (e.g., scoring models, signal generation)
- Implement monitoring, alerts, and testing for data quality, timeliness, and lineage to ensure trustworthy data delivery
- Orchestrate data workflows at scale using Databricks Jobs, dbt, or equivalent scheduling tools
- Contribute to data pipelines and tooling that support retrieval-augmented generation (RAG), vector integrations, or embedding workflows
- Design and optimize bulk GenAI data pipelines in Databricks to support generative AI applications at scale
- Partner with AI engineers and data scientists to enable experimentation, model training, and production-grade deployments
- Develop frameworks for data ingestion, transformation, governance, and monitoring across CRM, sales, and revenue systems
- Work with RevOps, sales, and customer success stakeholders to translate business needs into data requirements and stable technical implementations
Requirements:
- 2-3 years of industry experience in data engineering, including hands-on experience building large-scale data platforms
- Hands-on experience with modern data stack technologies, such as Databricks, dbt, Redshift, RDS, Snowflake, or similar solutions
- Proficiency in Python and SQL, with experience in designing robust ETL/ELT pipelines
- Experience orchestrating data workflows at scale and enabling machine learning or AI use cases
- Strong understanding of data modeling, performance optimization, and cost-efficient infrastructure design
- Located in and authorized to work in the United States (this is a fully remote role)
- Experience enabling generative AI workflows in Databricks or similar platforms
- Familiarity with vector databases, embeddings, and retrieval systems
- Experience with Salesforce, Gainsight, Gong, Outreach, or other CRM/enablement tools as data sources
- Proven ability to automate repetitive tasks, improve data hygiene, and enable experimentation across GTM data use cases, in line with the emerging discipline of GTM engineering, in which clean, reliable GTM data foundations enable high-leverage automation and insight generation
- Exposure to observability, monitoring, and governance best practices for data and AI systems
- Ability to collaborate closely with AI/ML teams while driving technical excellence in data engineering