Archetype AI is developing an AI platform that transforms real-world sensor data into actionable insights. The Staff Software Engineer will own data processing and analysis across edge devices, building high-performance data pipelines and ensuring reliable software operation in constrained environments.
Responsibilities:
- Analyze raw data using Python for statistical analysis, visualization, and exploratory techniques to understand quality, patterns, and anomalies
- Prepare datasets for AI workflows: cleaning, normalization, imputation, filtering, resampling, and validation
- Execute iterative preprocessing cycles: refine transformations, evaluate results, compare against baselines, retain improvements
- Build tooling for data validation, quality monitoring, and automated preprocessing
- Generate clear reports and visualizations that communicate findings to technical and non-technical stakeholders
- Build and optimize data processing software in C++ that runs on small, resource-constrained Linux devices
- Ensure pipelines meet real-time performance requirements: low latency, bounded memory, reliable throughput
- Integrate sensor inputs and manage data flow on-device: ingestion, buffering, local processing, and transmission
- Work within device constraints: limited CPU, memory, storage, and intermittent connectivity
- Contribute to device deployment, configuration, and operational tooling
- Partner with Solutions Engineers to assess customer data assets and deployment requirements
- Translate customer data challenges into reusable pipeline components and analysis workflows
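To make the dataset-preparation responsibilities above concrete, here is a minimal sketch of one preprocessing pass over a sensor time series — cleaning, resampling, imputation, filtering, normalization, and validation — using Pandas. All names, rates, and thresholds are illustrative assumptions, not Archetype AI's actual pipeline:

```python
import numpy as np
import pandas as pd

def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean, resample, impute, and normalize a sensor time series.

    Expects a DataFrame indexed by timestamp with a numeric 'value'
    column; column name and 10 Hz target rate are illustrative.
    """
    df = raw.sort_index()
    df = df[~df.index.duplicated(keep="first")]      # cleaning: drop duplicate samples
    df = df.resample("100ms").mean()                 # resample to a fixed 10 Hz grid
    df["value"] = df["value"].interpolate(limit=5)   # imputation: fill short gaps only
    df = df.dropna()                                 # filtering: discard unfillable gaps
    mu, sigma = df["value"].mean(), df["value"].std()
    df["value_norm"] = (df["value"] - mu) / sigma    # normalization: z-score
    # validation: output must be monotonic in time with no missing values
    assert df.index.is_monotonic_increasing and not df["value_norm"].isna().any()
    return df

# usage: synthetic sensor readings at an irregular ~27 Hz rate
idx = pd.date_range("2024-01-01", periods=200, freq="37ms")
raw = pd.DataFrame({"value": np.sin(np.linspace(0, 6, 200))}, index=idx)
clean = preprocess(raw)
```

In an iterative preprocessing cycle, a function like this is the unit being refined: each candidate transformation is evaluated against a baseline and retained only if it improves downstream results.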
Requirements:
- 7+ years in data engineering, data analysis, or related technical roles with hands-on data processing focus
- Deep experience with time-series data (video a plus): ingestion, preprocessing, feature extraction, quality assessment
- Proven ability to apply diverse analytical techniques: statistical analysis, signal processing, visualization, anomaly detection
- Experience with iterative data workflows: hypothesis, transformation, evaluation, refinement
- Comfortable building and running software on Linux devices, with familiarity in system-level concerns (resource usage, process management, I/O)
- Experience with real-time or streaming data processing under latency and throughput constraints
- Familiarity with data preparation for ML: dataset formatting, labeling workflows, train/eval splits, data validation
- C++ (production development): Strong proficiency building data pipelines and device software for production. Experience with modern C++, memory management, multithreading, and performance optimization
- Python (analysis & prototyping): Strong proficiency for data exploration, statistical analysis, visualization, and rapid prototyping. Experience with NumPy, Pandas, Matplotlib, and Jupyter notebooks
- Proven expertise in Linux system architecture and performance, including process design, I/O strategies, and diagnosing complex production issues
- Debugging & profiling: Strong skills diagnosing performance issues, memory problems, and data pipeline failures in both C++ and Python
- Clear, structured written communication, including customer-facing documentation of findings, processes, and technical decisions
- Proven ability to present complex analytical and technical results directly to customers, translating them into concrete, actionable insights for technical teams and business stakeholders
Nice to Have:
- Background in signal processing, control systems, or physics-based data analysis
- Experience with embedding-space analysis or other AI/ML diagnostic techniques
- Prior work optimizing data pipelines for resource-constrained environments
- Background in solutions engineering or customer-facing technical work
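As a concrete illustration of the statistical anomaly-detection skills listed above (an example technique only, not the company's actual tooling), a trailing rolling z-score check can be sketched in a few lines of Pandas:

```python
import numpy as np
import pandas as pd

def rolling_zscore_anomalies(series: pd.Series,
                             window: int = 50,
                             threshold: float = 4.0) -> pd.Series:
    """Flag samples deviating from the trailing rolling mean by more
    than `threshold` rolling standard deviations.

    The window is shifted by one so the current sample does not
    contaminate its own baseline. Window size and threshold are
    illustrative defaults.
    """
    baseline = series.shift(1)
    mean = baseline.rolling(window).mean()
    std = baseline.rolling(window).std()
    z = (series - mean) / std
    return z.abs() > threshold   # boolean mask; NaN warm-up rows compare False

# usage: seeded noise with one injected outlier at index 400
rng = np.random.default_rng(0)
signal = pd.Series(rng.normal(0.0, 0.1, 500))
signal.iloc[400] += 2.0          # inject an obvious spike
mask = rolling_zscore_anomalies(signal)
```

Excluding the current sample from its own baseline is a deliberate choice: otherwise a large spike inflates the rolling standard deviation it is measured against and can mask itself.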