Connecticut Innovations is Connecticut's strategic venture capital arm, passionate about serving its portfolio of companies. On behalf of portfolio company Arccos, it is seeking a Staff Data Engineer to design and implement a next-generation data architecture, enabling data science, analytics, and product teams to operate more efficiently.
Responsibilities:
- Redesign Arccos's data architecture from the ground up with a focus on scalability, clarity, and long-term maintainability
- Create canonical models, schemas, and semantic layers that eliminate duplicated logic and reduce time-to-insight across teams
- Unify fragmented tables and data sources into coherent, well-documented, lineage-aware structures
- Identify where to shift business logic upstream and create reliable, auditable data transformations
- Architect data structures, metadata layers, and documentation standards that enable AI tooling, including Snowflake's Conversational Analytics and emerging agentic analysis workflows
- Build and maintain a comprehensive, AI-ready data dictionary across MySQL and Snowflake — ensuring every table and field is clearly described, contextualized, and optimized for LLM-based context retrieval
- Ensure schemas, lineage, semantic layers, and business logic are organized so AI systems can reliably understand context, meaning, and relationships across datasets
- Provide data architecture guidance and logging infrastructure to support Arccos's LLM/AI strategy
- Build with the assumption that a growing percentage of internal analytics, querying, and business insights will be generated or augmented by AI agents
- Own ingestion and transformation pipelines spanning: app telemetry, shot and sensor data, subscriptions and commerce, internal business systems, and product analytics
- Architect highly optimized Snowflake models and performance-tuned warehouse patterns
- Improve organization and access patterns across large volumes of raw and enriched S3 data, including flattening and restructuring JSON dumps and semi-structured data
- Use (or refine) existing orchestration tooling like Airflow; recommend improvements where appropriate
- Define naming conventions, folder structures, transformation standards, and SQL style guidelines
- Build and maintain comprehensive data documentation, lineage mapping, and data dictionaries
- Introduce robust data quality checks, automated validation, and monitoring
- Work closely with data science, engineering, product, and growth teams to create data structures that support analytics, ML, and product development
- Contribute to active feature development in parallel with onboarding — ensuring new data structures are designed correctly from the start rather than retrofitted later
- Dramatically simplify the downstream query experience so teams can be productive without heroic effort
- Be the company-wide steward of best practices for ingesting, modeling, and deploying data
- Develop a thorough understanding of Arccos's current data landscape — MySQL schemas, Snowflake warehouse, S3 data, pipelines, and key pain points
- Produce a documented assessment of the ecosystem: gaps, risks, duplication, and technical debt
- Begin contributing to active feature development and new initiatives (e.g., LLM/AI logging and data architecture needs) in parallel with onboarding
- Deliver a proposed Target State Architecture with clear principles and a phased implementation plan
- Deliver an AI-ready data dictionary covering key MySQL and Snowflake tables — every table and field documented with clear descriptions optimized for LLM consumption
- Stand up initial canonical models for key domains (e.g., users, sessions, commerce, shot-level data)
- Establish naming conventions, transformation standards, and folder structures
- Deliver initial documentation and lineage mapping for top critical pipelines
- Reduce query complexity for high-impact stakeholders (DS, product, exec analytics)
- Implement major portions of the redesigned ingest + transformation pipeline
- Migrate priority data sources to new canonical models
- Introduce automated data quality checks and monitoring across core domains
- Measurably reduce analysis time and eliminate major areas of schema confusion
- Deliver a documented, scalable architecture that supports ML, analytics, and product needs
- Be recognized internally as the owner and subject matter expert for Arccos's data platform
Requirements:
- 6–12+ years in data engineering or data architecture roles
- Demonstrated experience architecting both transactional database systems and data warehouses — understanding how data flows from OLTP sources into analytical environments and designing both sides well
- Deep experience with:
  - AWS (S3, Lambda, IAM, Airflow)
  - Snowflake (performance tuning, modeling, optimization)
  - MySQL (schema design, query optimization, managing production OLTP data as a source for analytics)
  - Python for ETL/ELT development
  - SQL at a high craft level
- Proven success architecting complex, multi-source data ecosystems
- Experience working with large semi-structured datasets (JSON, Parquet, logs)
- Demonstrated ability to bring order to fragmented, legacy, or fast-evolving environments
- Understanding of how metadata, documentation, and schema design influence LLM performance and context retrieval
- Strong communication and documentation skills
- Thrives as a high-ownership, self-directed individual contributor who can assess what needs to be done and drive it forward autonomously
Nice to Have:
- Experience with AI-driven analytics and agentic tools (e.g., Snowflake's Conversational Analytics)
- Ability to architect data systems that support natural-language interfaces and automated insight generation
- Experience with analytical modeling layer tools
- Experience supporting ML training pipelines (S3 → high-memory/GPU compute)
- Database administration familiarity (replication, uptime management, performance troubleshooting); this is not a DBA role, but comfort interfacing with outsourced DBA resources is a plus
- Familiarity with MySQL schema migration strategies
- Interest in golf or sports data (not required)