AbsenceSoft is transforming the employee experience with technology designed for HR professionals. They are seeking an AI Data Engineer to design and manage the data pipelines and infrastructure that support intelligent AI applications, collaborating closely with data scientists and product teams to ensure a high-performance data architecture.
Responsibilities:
- Design, build, and maintain data pipelines for structured, unstructured, and semi-structured data sources
- Develop and optimize data models, ETL processes, and batch/streaming data infrastructure
- Partner with data scientists to support training, evaluation, and deployment of ML and LLM models
- Implement scalable architectures for embeddings, vector databases, and retrieval pipelines
- Enable real-time and offline analytics workflows using best-in-class data engineering practices
- Ensure data quality, lineage, observability, and governance across all data products
- Deploy secure, cloud-native data infrastructure (AWS, Azure, GCP) for high-volume AI workloads
- Contribute to the design of feature stores and MLOps platforms for continuous learning and model updates
- Collaborate on Responsible AI workflows to ensure compliant data usage and access controls
- Continuously evaluate new tools and technologies for improving performance, reliability, and agility
Requirements:
- 5+ years of experience as a Data Engineer building large-scale, production-grade data pipelines
- Strong command of SQL, Python, and distributed data processing frameworks (Spark, Flink, Beam)
- Hands-on experience with ETL/ELT tools and orchestration systems (Airflow, dbt, Prefect, Dagster)
- Familiarity with cloud-native data platforms (Snowflake, BigQuery, Redshift, Databricks)
- Experience supporting ML/AI workloads and collaborating with model development teams
- Knowledge of vector databases (FAISS, Pinecone, Weaviate) and embeddings management
- Understanding of data privacy, access control, and compliance in regulated environments
- Proficiency in modern DevOps tooling for data infrastructure (Docker, Terraform, CI/CD)
- Ability to work autonomously and thrive in a fast-paced, collaborative environment
Tech Stack:
- Cloud: AWS (Redshift, S3, Lambda), Azure (Data Lake, Synapse), GCP (BigQuery, Cloud Functions)
- Streaming: Kafka, Kinesis, Pub/Sub, Spark Streaming, Apache Flink
- Workflow Tools: dbt, Airflow, Dagster, Prefect
- Storage & Processing: Snowflake, Databricks, Parquet, Delta Lake
- Vector Search: FAISS, Pinecone, Weaviate, txtai