Bayside Solutions is seeking an AI Infrastructure/Platform & Data Engineer for a remote role. The engineer will develop and maintain backend features, manage Kubernetes deployments, and ensure security and compliance across the AI infrastructure.
Responsibilities:
- Python backend - Comfortable reading and contributing to an async Python/FastAPI codebase; able to build or extend backend features as needed
- Kubernetes + Helm - EKS deployments, Helm chart authoring and maintenance, HPA, network policies, ingress config
- Experience with RAG (Retrieval Augmented Generation)
- CI/CD pipelines - End-to-end build → test → deploy pipelines; familiarity with Vessel (or similar internal image build tools)
- Infrastructure as Code - Helm, Terraform, or equivalent
- Observability - Metrics (Prometheus/Grafana or equivalents), log aggregation, pipeline failure alerting
- Evaluation & testing infrastructure - Automated RAG evaluation: retrieval quality metrics (MRR, NDCG, recall@k), answer quality benchmarks, regression suites
- Security & compliance - Secret management, mTLS cert rotation, access control audits
Requirements:
- Experience with embedding/vector search platforms at scale
- Familiarity with workflow orchestration tools such as Airflow or Temporal
- Load testing and capacity planning for high-volume embedding ingestion
- Some experience with AI agent workflows and an understanding of how they integrate with platforms
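
For candidates unfamiliar with the retrieval quality metrics named in the evaluation item (MRR, NDCG, recall@k), the following is a minimal sketch of how they are commonly computed with binary relevance labels. The function names, document IDs, and data layout are illustrative assumptions, not part of any Bayside Solutions codebase.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@k with binary gains: DCG of the ranking over DCG of an ideal one."""
    relevant = set(relevant_ids)
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: four retrieved documents, two of which are relevant.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranked, relevant, k=3))
print(reciprocal_rank(ranked, relevant))
print(ndcg_at_k(ranked, relevant, k=4))
```

A regression suite would typically run functions like these over a fixed set of labeled queries and fail the pipeline when a metric drops below an agreed threshold.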