Clear Fracture is building AI-driven data integration systems that enable organizations to connect, transform, and reason over complex data using agentic workflows. The Data Engineer will focus on data modeling and system design, building core data infrastructure and developing real-world use cases on the platform that improve how users interact with data.
Responsibilities:
- Design and implement logical and physical data models for complex, evolving datasets
- Define schemas and access patterns that support multi-tenant usage and application-level workflows
- Balance normalization, performance, and flexibility across different storage systems
- Partner with product and engineering teams to translate requirements into scalable data designs
- Develop real-world data use cases on top of the platform to validate and extend its capabilities
- Design and build data interfaces and abstractions that help users understand and work with data
- Contribute to systems such as data glossaries, semantic layers, and metadata and schema discovery tools
- Help define how users explore, model, and interact with data within the platform
- Translate complex data structures into intuitive, usable representations
- Build backend services and APIs that expose and operate on data models
- Implement data access layers that are reliable, maintainable, and performant
- Contribute to core application architecture where data and services intersect
- Write clean, testable, production-grade code
- Design and implement pipelines for ingesting, transforming, and validating data
- Support both batch and near-real-time processing workflows
- Build systems that handle structured, semi-structured, and unstructured data
- Enable data flows that support AI-driven and agent-based workflows
- Work with embeddings, context retrieval, and data representations used in modern AI systems
- Help design systems that make data accessible and useful for autonomous agents
- Implement validation, monitoring, and testing for data systems
- Ensure correctness, consistency, and observability of data pipelines and services
- Diagnose and resolve data-related issues in production environments
Requirements:
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
- 6+ years of professional experience in software engineering and/or data engineering roles
- Due to the nature of the work, U.S. citizenship and the ability to obtain a Secret clearance are required
- Strong programming skills in Python (or similar backend language)
- Experience designing and implementing data models for production systems, with advanced knowledge of dimensional modeling (e.g., slowly changing dimensions) and entity-relationship diagrams
- Proficiency in SQL and experience with relational databases (e.g., PostgreSQL)
- Experience building backend services or APIs that interact with data systems
- Experience designing and operating data pipelines (ETL/ELT)
- Familiarity with NoSQL databases and different data storage paradigms
- Experience working with large datasets and performance optimization
- Experience with Docker and containerized development workflows
- Familiarity with Kubernetes-based environments
- Strong understanding of software engineering fundamentals (testing, version control, system design)
- Experience building multi-tenant data systems
- Familiarity with semantic layers, data catalogs, or data discovery systems
- Experience designing data-facing user interfaces or developer tooling
- Experience with streaming systems (e.g., Kafka)
- Experience with orchestration tools (e.g., Airflow, Dagster, Prefect)
- Experience working with AI/ML data pipelines or agent-based systems
- Experience supporting on-prem or hybrid deployments
- Exposure to data governance, access control, and metadata systems
- Experience with cloud platforms (AWS, Azure, GCP)
- Familiarity with vector databases (e.g., Pinecone, ChromaDB) and embedding-based retrieval