Spear AI is a growing defense contracting company dedicated to delivering cutting-edge solutions that support our nation’s security. We’re seeking a skilled Data Engineer to build the next-generation data management and artificial intelligence platform for maritime domain awareness. The role involves implementing real-time and offline data pipelines, building data warehouses, and ensuring data quality.
Responsibilities:
- Implement real-time data pipelines with MQTT and Redpanda for stream processing
- Implement offline data pipelines using Dagster for batch processing
- Parse and process binary message formats from various data sources
- Build data warehouses using Postgres, Apache Iceberg, Parquet, and S3
- Design data models that allow for high-performance queries
- Validate and normalize data sources
- Improve local development and CI/CD using modern tooling and GitHub Actions
Requirements:
- Expertise in time-series data processing and analysis (windowing, resampling, interpolation, etc.)
- Proficiency in Python and Rust for data engineering workflows
- Experience with binary message parsing
- Experience with row-based and columnar data formats
- Experience with OLTP and OLAP databases
- Knowledge of distributed systems, streaming architectures, and batch processing patterns
- Hands-on experience with a batch orchestrator such as Dagster/Airflow
- Hands-on experience with a streaming platform such as Redpanda/Kafka
- Hands-on experience with binary message formats such as Protobuf
- Experience with IoT devices and sensors
- Digital signal processing experience
- Geospatial analysis and GIS experience
- Familiarity with working in monorepos