Apetan Consulting LLC is seeking a DevOps Engineer with strong experience in AI/ML infrastructure and a solid understanding of Slackware-based Linux systems. The role focuses on building, deploying, and maintaining scalable, reliable environments for machine learning workflows and data pipelines.
Responsibilities:
- Design, implement, and maintain CI/CD pipelines for AI/ML applications
- Manage and optimize infrastructure for model training, testing, and deployment
- Work with data scientists and ML engineers to streamline the model lifecycle (training, deployment, and monitoring)
- Maintain and customize Slackware-based systems for performance and stability
- Automate infrastructure provisioning using Infrastructure as Code (IaC) tools
- Monitor system performance, reliability, and security across environments
- Implement containerization and orchestration solutions (Docker, Kubernetes)
- Ensure reproducibility and scalability of ML experiments
- Troubleshoot system, networking, and deployment issues
Requirements:
- Strong experience with Linux systems (Slackware preferred, or a similar distribution)
- Hands-on experience with DevOps tools (Jenkins, GitLab CI/CD, etc.)
- Proficiency in scripting (Bash, Python)
- Experience with containerization (Docker) and orchestration (Kubernetes)
- Familiarity with cloud platforms (AWS, GCP, or Azure)
- Understanding of ML workflows and tools (TensorFlow, PyTorch, MLflow, etc.)
- Knowledge of version control systems (Git)
- Experience with monitoring/logging tools (Prometheus, Grafana, ELK stack)
- Experience working in AI/ML production environments
- Knowledge of GPU-based workloads and optimization
- Familiarity with data pipelines (Airflow, Kafka, etc.)
- Experience managing customized Linux distributions such as Slackware
- Understanding of security best practices in DevOps