CloudIngest is seeking a Data Scientist with expertise in AI and machine learning. The role involves building end-to-end machine learning models, fine-tuning large language models, and developing scalable data pipelines using the tools and frameworks listed below.
Requirements:
- Strong experience with Python and ML frameworks (TensorFlow, PyTorch, scikit-learn)
- End-to-end model building: data prep, training, evaluation, deployment
- Experience with NLP, embeddings, transformer architectures, and LLM fine-tuning
- Fine-tuning or prompting GPT-based LLMs
- Experience building Retrieval-Augmented Generation (RAG) systems (a minimal retrieval sketch follows this list)
- Knowledge of vector databases
- Understanding of agentic frameworks (CrewAI, LangChain agents, AutoGen, etc.)
- Developing multi-agent systems with tools, memory, and planning loops (see the agent-loop sketch after this list)
- Airflow for workflow orchestration (a minimal DAG sketch follows this list)
- Docker for containerization
- Kubernetes for scalable deployment
- CI/CD for model deployment
- Model monitoring and drift detection (a drift-check sketch follows this list)
- API development with FastAPI or Flask (a minimal serving-endpoint sketch follows this list)
- Experience with SQL and NoSQL data stores
- Must have experience in:
  - Palantir Foundry (data ingestion, transformations, ontology modeling, analytics workflows)
  - Snowflake (advanced SQL, performance tuning, analytical schema design)
  - Building scalable ETL/ELT data pipelines (batch and near-real-time)
  - Layered data modeling for analytics and reporting (e.g., staging, intermediate, and mart layers)
  - Working with relational and NoSQL databases
  - Python for data processing and automation
  - Data quality, governance, and lineage practices
  - Unsupervised clustering and classification of both structured data (in a DBMS or files) and unstructured data (a clustering sketch follows this list)
  - Creating multi-stage analysis/transformation pipelines, such as Contour workflows in Palantir Foundry
  - Distributed data processing frameworks such as Spark (a PySpark sketch follows this list)
  - Querying with GraphQL, Scalding, etc.
  - Creating apps, reports, and dashboards similar to those built in Palantir Foundry
- Good to have:
  - dbt, Airflow, Spark, or similar orchestration/processing frameworks
  - BI & visualization tools (Tableau, Power BI, Looker, etc.)
  - Streaming data platforms (Kafka/Kinesis)
  - Cloud platforms (AWS/Azure/GCP)
  - ML feature engineering or analytics support
  - Agentic AI skills
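
The sketches below are illustrative only, built on stand-in names and synthetic data; none of them represents CloudIngest's actual stack. First, the retrieval step of a RAG system: the query and each document are embedded into vectors, and the most similar documents are returned to be prepended to the LLM prompt. The embed() function here is a hypothetical stand-in for a real embedding model, and the in-memory array stands in for a vector database.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in: a pseudo-embedding keyed on the text hash.
    # A real system would call an embedding model instead.
    return np.random.default_rng(abs(hash(text)) % (2**32)).standard_normal(8)

documents = [
    "Snowflake stores data in micro-partitions.",
    "Airflow schedules DAGs of tasks.",
    "RAG retrieves context before generation.",
]
doc_vectors = np.stack([embed(d) for d in documents])  # stand-in for a vector DB

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    # Cosine similarity between the query vector and every document vector.
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

# The retrieved passages would be prepended to the LLM prompt.
print(retrieve("How does retrieval-augmented generation work?"))
```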
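A toy agent loop in the spirit of CrewAI, LangChain agents, or AutoGen: a planner picks a tool, the tool's output is appended to memory, and planning repeats until the planner decides the goal is met. The stub_planner below is a hypothetical stand-in for an LLM-driven planner so the control flow runs as-is.

```python
def search_tool(query: str) -> str:
    return f"results for {query!r}"  # stub search

def calc_tool(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))  # toy only; never eval untrusted input

TOOLS = {"search": search_tool, "calc": calc_tool}

def stub_planner(goal: str, memory: list[str]):
    # Hypothetical plan: search once, compute once, then declare the goal met.
    if not memory:
        return ("search", goal)
    if len(memory) == 1:
        return ("calc", "6 * 7")
    return None

def run_agent(goal: str) -> list[str]:
    memory: list[str] = []  # observations fed back into the next planning step
    while (step := stub_planner(goal, memory)) is not None:
        tool, arg = step
        memory.append(f"{tool}({arg}) -> {TOOLS[tool](arg)}")
    return memory

print(run_agent("average order value"))
```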
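A minimal Airflow DAG sketch, assuming Airflow 2.4+ is installed; the DAG id, task names, and the extract/transform split are illustrative.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw data")

def transform():
    print("clean and aggregate")

with DAG(
    dag_id="example_etl",            # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # the 'schedule' argument needs Airflow 2.4+
    catchup=False,
):
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t1 >> t2                         # run transform after extract
```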
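One common drift check: a two-sample Kolmogorov-Smirnov test comparing a feature's training distribution against recent production values. The alerting threshold below is illustrative and would be tuned per feature.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training = rng.normal(0.0, 1.0, size=5_000)    # reference window
production = rng.normal(0.3, 1.0, size=1_000)  # shifted live data

stat, p_value = ks_2samp(training, production)
if p_value < 0.01:  # hypothetical alerting threshold
    print(f"drift suspected: KS={stat:.3f}, p={p_value:.4f}")
```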
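A minimal model-serving endpoint with FastAPI; the scoring rule is a stub standing in for a loaded model artifact, and the field names are hypothetical.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    amount: float       # hypothetical input features
    tenure_months: int

@app.post("/predict")
def predict(features: Features) -> dict:
    # Stub scoring rule standing in for model.predict(...).
    score = 0.8 if features.amount > 100 else 0.2
    return {"score": score}

# Serve with: uvicorn app:app --reload  (assuming this file is app.py)
```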
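A small unsupervised-clustering sketch with scikit-learn; the synthetic matrix stands in for structured rows pulled from a DBMS or flat files.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Three synthetic blobs standing in for structured rows from a database.
X = np.vstack([rng.normal(m, 0.5, size=(50, 2)) for m in (0, 3, 6)])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)  # scale features before clustering
)
print(np.bincount(labels))  # row counts per cluster
```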
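A minimal PySpark aggregation, assuming pyspark is installed and a local session suffices; the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

orders = spark.createDataFrame(
    [("books", 12.0), ("books", 8.5), ("toys", 30.0)],
    ["category", "amount"],  # hypothetical columns
)
revenue = orders.groupBy("category").agg(F.sum("amount").alias("revenue"))
revenue.show()
spark.stop()
```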