Architect Foundry Pipelines: Design, develop, and maintain scalable data pipelines within the Palantir Foundry ecosystem (Data Connection, Code Repositories, Pipeline Builder).
Implement ETL processes: Build batch ETL and real-time data pipelines for efficient data handling and integration.
Optimize PySpark: Write and tune high-performance Python/Spark code to transform massive insurance datasets into actionable insights (an illustrative sketch follows this list).
Build the Ontology: Manage and scale the Foundry Ontology, ensuring data models accurately represent complex real-world insurance entities.
Implement AI/GenAI: Integrate GenAI capabilities (using Palantir AIP or external AI APIs) to automate workflows and enhance data enrichment.
Implement CI/CD: Establish CI/CD practices using AWS CodePipeline and AWS CodeBuild to streamline development and deployment.
Collaboration & Agile Development: Gather requirements, set targets, define interface specifications, and conduct design sessions. Work closely with data consumers to ensure proper integration. Adapt and learn in a fast-paced project environment.
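To give a flavor of the PySpark work mentioned above, here is a minimal sketch of a typical claims-enrichment transform. It is illustrative only: the dataset paths and column names (policy_id, paid_amount, status) are hypothetical placeholders, not our actual schema, and in Foundry this logic would normally live inside a managed transform rather than a standalone script.

```python
# Illustrative sketch only: hypothetical paths, columns, and schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_enrichment").getOrCreate()

claims = spark.read.parquet("/data/claims")        # hypothetical input
policies = spark.read.parquet("/data/policies")    # hypothetical input

# Aggregate closed claims per policy, then join back to policy attributes.
claims_by_policy = (
    claims
    .filter(F.col("status") == "CLOSED")
    .groupBy("policy_id")
    .agg(
        F.sum("paid_amount").alias("total_paid"),
        F.count("*").alias("claim_count"),
    )
)

enriched = (
    policies
    .join(claims_by_policy, on="policy_id", how="left")
    .fillna({"total_paid": 0.0, "claim_count": 0})
)

# Repartition on the join key before writing so downstream reads stay efficient.
enriched.repartition("policy_id").write.mode("overwrite").parquet(
    "/data/policy_claims_summary"
)
```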
Requirements
Deep hands-on experience with the Palantir Foundry stack.
Strong SQL skills for ETL, data modeling, and performance tuning.
Hands-on experience building and evaluating ML models.
Proficiency in Python, especially for handling and flattening complex JSON structures (see the illustrative sketch after this list).
Production experience deploying ML models, along with expertise in ML frameworks and tooling.
Extensive experience with AWS cloud services.
Understanding of software engineering and testing practices within an Agile environment.
Experience with Data-as-Code practices: version control, small and regular commits, unit tests, CI/CD, and packaging; familiarity with containerization tools such as Docker (must have) and Kubernetes (a plus).
Excellent teamwork and communication skills.
Proficiency in English, with strong written and verbal communication skills.
Proven ability to deliver efficient, high-performance data pipelines for both real-time and batch processing.
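As a small illustration of the JSON-handling skill referenced above, the sketch below flattens a nested JSON record into dot-separated keys. The sample payload and field names are hypothetical, chosen only to show the idea.

```python
# Illustrative sketch only: sample record and key names are hypothetical.
import json

def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts and lists into a single-level dict."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else key
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        for idx, value in enumerate(obj):
            items.update(flatten(value, f"{parent_key}{sep}{idx}", sep))
    else:
        items[parent_key] = obj
    return items

record = json.loads(
    '{"claim": {"id": "C-1", "lines": [{"amount": 120.5}, {"amount": 80.0}]}}'
)
print(flatten(record))
# {'claim.id': 'C-1', 'claim.lines.0.amount': 120.5, 'claim.lines.1.amount': 80.0}
```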