Dr. Berg Nutritionals is one of the largest health education and supplement companies in the world, built around Dr. Eric Berg's YouTube channel. The Senior Data Engineer will be responsible for building and maintaining ingestion pipelines to ensure reliable and accurate data flow into the company's data warehouse.
Responsibilities:
- Partner with the Head of Data (or the CIO directly) to complete a technical audit of every existing data source — what's flowing, what's broken, what's missing
- Replace our current manual CSV-based Klaviyo ingestion with a direct API pipeline
- Stand up the first production pipelines for Amazon SP-API, Shopify, and NetSuite, with proper monitoring and alerting
- Establish our infrastructure-as-code practice (Bicep or Terraform) and CI/CD pipeline for data engineering changes
- Document everything — pipeline architecture, runbooks, on-call procedures
- Build and maintain ingestion pipelines. You will own the end-to-end pipelines from source systems into our warehouse. This includes Amazon Selling Partner API, Shopify Admin API, NetSuite (SuiteAnalytics Connect), Klaviyo, Recharge, YouTube Data and Analytics APIs, GA4 (via BigQuery export), Google Ads, Meta Ads, Triple Whale, and approximately 15 additional sources across our Layer 1–5 data model. For each pipeline you will design the ingestion approach, build it with proper error handling and idempotency, establish incremental-load patterns where appropriate, and monitor it in production.
- Own orchestration and scheduling. You decide what runs when, in what order, and with what dependencies. Financial data needs to be fresh before finance's morning reconciliation. YouTube analytics need to respect daily API quotas across 7,000+ videos. Klaviyo events need to stream continuously. This is your call to make — and your responsibility to get right.
- Monitoring, alerting, and on-call. Every pipeline you build needs health checks: row counts within expected ranges, schema validation, freshness SLAs, and data quality gates. You will configure Azure Monitor alerts, decide what pages someone overnight versus what can wait, and lead post-incident reviews. You will take part in a one-in-four weekly on-call rotation once the team is fully staffed.
- Performance and cost optimization. Our data volumes are substantial — YouTube analytics alone is 7,000+ videos × daily metrics × multiple channels. You will own partitioning strategy, query tuning, incremental processing patterns, and monthly cost reviews. At our scale, this work directly saves tens of thousands of dollars per year in warehouse compute.
- Source system and vendor API management. When Shopify deprecates an endpoint, when Amazon changes reporting structure, when NetSuite releases a new ODBC driver — you're the person who reads the release notes, tests the change, and adapts the pipelines. You will own API keys, service accounts, rate-limit tracking, and vendor support escalations for data-source APIs.
- Enforce data contracts. You define and enforce the contracts between source systems and downstream consumers — what fields exist, what's never null, what ranges are valid. When a source system violates its contract, your pipelines stop and alert rather than passing bad data downstream to our AI analyzer. This is what structurally prevents hallucinations.
- Infrastructure-as-code and CI/CD. Pipelines are defined as code (Bicep, Terraform, or ARM templates) and deployed through peer review. You will own this practice, along with the dev/staging/production environment separation that lets us move fast without breaking the weekly brief.
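To give a flavor of the incremental-load and idempotency patterns these responsibilities call for, here is a minimal sketch: a watermark-based load where re-running the same batch cannot duplicate rows. The table names, the `orders`/`watermarks` schema, and the timestamps are illustrative assumptions, not a reference to our actual stack; SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical sketch: an idempotent, watermark-based incremental load.
# Re-running a batch must not duplicate rows, so we upsert on the source's
# primary key and only advance the watermark after the load commits.

def ensure_schema(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE IF NOT EXISTS orders (
        order_id TEXT PRIMARY KEY,
        updated_at TEXT NOT NULL,
        total_cents INTEGER NOT NULL)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS watermarks (
        source TEXT PRIMARY KEY,
        high_water TEXT NOT NULL)""")

def load_batch(conn: sqlite3.Connection, source: str, rows: list[dict]) -> int:
    """Upsert rows newer than the stored watermark, then advance it."""
    cur = conn.execute(
        "SELECT high_water FROM watermarks WHERE source = ?", (source,))
    found = cur.fetchone()
    high_water = found[0] if found else ""
    # Strictly-newer filter: same-timestamp re-runs are no-ops by design.
    fresh = [r for r in rows if r["updated_at"] > high_water]
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :updated_at, :total_cents) "
        "ON CONFLICT(order_id) DO UPDATE SET "
        "updated_at = excluded.updated_at, total_cents = excluded.total_cents",
        fresh)
    if fresh:
        new_mark = max(r["updated_at"] for r in fresh)
        conn.execute(
            "INSERT INTO watermarks VALUES (?, ?) "
            "ON CONFLICT(source) DO UPDATE SET high_water = excluded.high_water",
            (source, new_mark))
    conn.commit()
    return len(fresh)
```

The idempotency property is the point: loading the same batch twice inserts once and then loads zero new rows, so a retried or replayed run is safe.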
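The data-contract bullet above (stop and alert rather than pass bad data downstream) can be sketched roughly like this; the field names, rule shapes, and `ContractViolation` type are hypothetical illustrations of the idea, not our actual contract framework.

```python
from dataclasses import dataclass

# Hypothetical sketch of a data contract: declared fields, null rules,
# and valid ranges. A violating batch raises instead of flowing downstream.

class ContractViolation(Exception):
    pass

@dataclass(frozen=True)
class FieldRule:
    name: str
    nullable: bool = False
    min_value: float | None = None
    max_value: float | None = None

def enforce_contract(rows: list[dict], rules: list[FieldRule]) -> list[dict]:
    for i, row in enumerate(rows):
        for rule in rules:
            if rule.name not in row:
                raise ContractViolation(f"row {i}: missing field {rule.name!r}")
            value = row[rule.name]
            if value is None:
                if not rule.nullable:
                    raise ContractViolation(f"row {i}: {rule.name!r} must not be null")
                continue
            if rule.min_value is not None and value < rule.min_value:
                raise ContractViolation(
                    f"row {i}: {rule.name!r}={value} below {rule.min_value}")
            if rule.max_value is not None and value > rule.max_value:
                raise ContractViolation(
                    f"row {i}: {rule.name!r}={value} above {rule.max_value}")
    return rows  # only a fully clean batch reaches downstream consumers
```

Failing loudly at the ingestion boundary is the design choice: a halted pipeline plus an alert is recoverable, while silently loaded bad rows are not.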
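Likewise, the health checks described in the monitoring bullet (row counts within expected ranges, freshness SLAs) reduce to a simple shape; this is a sketch under assumed names and thresholds, with alert routing (e.g. to Azure Monitor) left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch of a post-run health check: a pipeline run is healthy
# only if its row count sits inside an expected band and its newest record
# is within the freshness SLA. Thresholds here are illustrative.

def check_health(row_count: int,
                 latest_record: datetime,
                 expected_range: tuple[int, int],
                 freshness_sla: timedelta,
                 now: datetime | None = None) -> list[str]:
    """Return failure messages; an empty list means the run is healthy."""
    now = now or datetime.now(timezone.utc)
    failures: list[str] = []
    low, high = expected_range
    if not (low <= row_count <= high):
        failures.append(f"row count {row_count} outside expected [{low}, {high}]")
    if now - latest_record > freshness_sla:
        failures.append(f"stale data: newest record at {latest_record.isoformat()}")
    return failures
```

In practice each failure message would feed an alert rule, which is where the "what pages someone overnight versus what can wait" decision gets encoded.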
Requirements:
- 5–8+ years of professional data engineering experience, with at least 2–3 years working primarily in Azure (Data Factory, Synapse, Fabric, or comparable)
- Strong SQL — not just query-writing, but query tuning, execution plan analysis, and indexing strategy
- Production-level proficiency in C# and/or Python for custom connector work
- Demonstrated experience building pipelines against messy real-world APIs — ideally Amazon SP-API, NetSuite, or similarly difficult commerce/ERP sources. This is non-negotiable. Experience with only 'clean' SaaS APIs like Stripe or Salesforce is not a substitute.
- Infrastructure-as-code experience using Bicep, Terraform, or ARM templates
- Real on-call experience — you know what good runbooks and alerting look like because you've been paged at 2am and you know what made the difference between a five-minute fix and a five-hour fire
- Strong written communication — because much of your work is documenting decisions and runbooks that others will rely on
- Direct experience with dbt or a comparable transformation framework
- Experience with Microsoft Fabric specifically, or a strong point of view on Fabric vs. Synapse vs. Snowflake
- Familiarity with Microsoft Agent Framework (Semantic Kernel, AutoGen) or comparable agent orchestration systems
- E-commerce or direct-to-consumer industry experience, particularly at multi-channel scale
- Experience with vector databases (Azure AI Search, pgvector, Pinecone) for AI-retrieval use cases
- Prior experience as the first or founding data engineer at a growing company