Define Technical Vision: Define the long-term data engineering strategy guided by company-wide priorities and engineering best practices.
Design Ecosystems: Create coherent designs across multiple pipelines and API boundaries. Reduce complex concepts to foundational components and simplify infrastructure to lower maintenance costs.
Drive Engineering Excellence: Make high-impact technical choices, including "build vs. buy" and framework selections, based on sound reasoning. Review designs to preemptively identify and resolve technical risks.
Improve Developer Efficiency: Implement solutions that measurably improve developer efficiency, and establish engineering-wide quality standards and best practices.
Deliver at Scale: Roll out major features and systems reliably, including appropriate monitoring, failure domain characterization, and success metric definitions.
Align with Business Goals: Leverage a deep understanding of SmithRx’s business strategy to identify group-wide opportunities. Proactively refocus team efforts when projects are off-course or not moving the needle for the business.
Manage Data Quality & Governance: Enforce data governance policies (PII/PHI protection, security, compliance) and implement data quality principles to raise the bar for the reliability of data shared internally and externally.
Influence Cross-Functionally: Influence the roadmaps of other SmithRx teams. Act thoughtfully and decisively in critical situations, seeking diverse perspectives but ultimately leading decision-making to move priorities forward.
Mentor and Develop: Serve as a role model and coach for other engineers, taking into account their unique skills and providing constructive feedback to maximize their impact.
Executive Communication: Develop focused messaging and effectively present technical strategies and business cases at the executive level.
Champion Organizational Change: Break down silos, build deep cross-functional relationships, and create excitement to drive the adoption of new technologies or processes across the organization.
Requirements
8+ years of industry experience in data engineering with an advanced degree, or 12+ years with an undergraduate degree, in Computer Science, Information Technology, or a related field.
Demonstrated mastery of data modeling concepts, database design principles, and data warehouse technologies (e.g., Snowflake) through production-grade implementations.
Strong skills in PySpark, SQL, and Python are required.
Experience in modern object-oriented or compiled languages such as C#/C++, Go, Java, or Scala is a plus.
Hands-on experience with leading data tools and frameworks across ETL, orchestration, and BI (e.g., Apache Spark, Apache Airflow, dbt, Looker, Superset).
In-depth experience managing the entire data lifecycle, with direct responsibility for the development, implementation, and production release of complex data processing solutions utilizing distributed systems.
A proven track record of making decisions that are optimized for the wider engineering organization rather than for local outcomes, especially in environments with significant ambiguity.
Tech Stack
Airflow
Apache
Distributed Systems
ETL
Java
PySpark
Python
Scala
Spark
SQL
Go
Benefits
Highly competitive wellness benefits including Medical, Pharmacy, Dental, Vision, Life, and AD&D Insurance
Flexible Spending Benefits
401(k) Retirement Savings Program
Short-term and long-term disability
Discretionary Paid Time Off
12 Paid Company Holidays
Wellness Benefits
Commuter Benefits
Paid Parental Leave benefits
Employee Assistance Program (EAP)
Well-stocked kitchen in office locations
Professional development and training opportunities