Lockheed Martin is a leader in the aerospace and defense industry, committed to solving the world's most complex challenges through innovation. They are seeking a highly motivated AI Infrastructure & Platform Ops Engineer to contribute to the development and maintenance of their AI Factory toolset, collaborating with cross-functional teams to enhance AI systems and deploy them in disconnected environments.
Responsibilities:
- Develop and maintain the existing AI Factory toolset, including debugging, testing, and optimizing AI systems
- Collaborate with AI Factory applications teams to support new feature development, providing technical expertise and guidance on AI system integration
- Work with stakeholders to identify and prioritize requirements for AI system improvements and new feature development
- Support the deployment of AI Factory in disconnected environments, ensuring seamless integration and functionality
- Participate in design reviews, code reviews, and testing to ensure high-quality AI systems and toolsets
- Stay up-to-date with emerging AI technologies and trends, applying this knowledge to improve AI Factory toolsets and systems
- Collaborate with internal teams to ensure AI systems meet Lockheed Martin Proprietary Information (LMPI) sensitivity requirements
Requirements:
- Strong programming skills in languages such as Python, Golang, C++, or Java
- Experience working with Kubernetes systems
- Experience with pipeline automation tools, such as ArgoCD, Tekton, GitLab CI/CD, or Jenkins
- Familiarity with public cloud computing services, such as AWS, GCP, or Azure
- Experience with containers, including Open Container Initiative (OCI) and Docker
- Experience using Kubernetes deployment mechanisms such as Helm and Kustomize
- Knowledge of cybersecurity principles and practices
- Knowledge of machine learning architectures, including GPU computing
- Familiarity with monitoring and performance tools, such as Prometheus, Grafana, Dynatrace, or Sysdig
- Strong oral and written communication skills, and the ability to collaborate with cross-functional partners
- Creative and resourceful approach to problem-solving
- Ability to work with internal stakeholders to collect feedback, prioritize tasks, and manage the engineering backlog
- Familiarity with AI and machine learning concepts, including deep learning frameworks and tools
- Strong communication and collaboration skills, with the ability to work effectively in a team environment
- Self-motivated and self-directed, with the ability to thrive in a fast-paced, constantly changing industry