Senior MLOps Engineer

  • AI/ML

  • Remote

  • Contract

About the job:
Title: Senior MLOps Engineer
Start date: Immediate
Position Type: Contract/FTE
Location: Remote across Canada/USA
  
  
About the Role: 
We are looking for a highly skilled Senior MLOps Engineer to lead the development, deployment, and operationalization of an in-house, on-premise code generation system. This role requires hands-on experience with GitLab CI/CD and a strong foundation in LLMOps to manage the lifecycle of large language models (LLMs) effectively.
As a critical member of the team, you will design, implement, and optimize robust MLOps pipelines, ensuring the seamless operation of our code generator. Your contributions will be instrumental in integrating cutting-edge AI into our workflows while maintaining the highest standards of performance, security, and scalability. 
  
Key Responsibilities: 
GitLab CI/CD and Automation

  • Design and implement CI/CD pipelines using GitLab CI/CD for model and code deployment in an on-prem environment.
  • Automate testing, deployment, and rollback processes for machine learning workflows. 
  
LLMOps Expertise
  • Develop pipelines specifically tailored for managing large language models, including fine-tuning, version control, and automated deployments. 
  • Implement monitoring systems to track model performance, latency, drift, and data quality. 
  
MLOps Pipeline Development
  • Build scalable pipelines for model training, evaluation, and deployment, leveraging tools such as MLflow, Kubeflow, or Airflow. 
  • Ensure reproducibility and traceability of experiments and models. 
  
Infrastructure and Security
  • Architect and manage a secure, on-premise infrastructure optimized for high-performance compute environments (e.g., NVIDIA GPUs, TPUs). 
  • Implement robust security practices for handling sensitive data and ensure compliance with industry standards. 
  
Collaboration and Integration
  • Work closely with AI researchers, software engineers, and DevOps teams to integrate LLM-based code generation tools into existing systems. 
  • Provide guidance on best practices for LLMOps and MLOps adoption across the organization. 
  
Optimization and Scalability
  • Optimize LLM inference and training workflows for cost, speed, and efficiency. 
  • Scale the system to support multiple concurrent users and a high volume of daily API calls within an on-premise setup.
  
Documentation and Training
  • Maintain detailed documentation for pipelines, infrastructure, and workflows. 
  • Train team members and stakeholders on tools and practices for effective LLM and MLOps workflows. 
  
Key Requirements: 
  
Technical Skills: 
  
GitLab CI/CD
  • Hands-on experience with GitLab CI/CD pipelines in machine learning projects.
  • Expertise in automating complex workflows with GitLab. 
  
LLMOps
  • Proven experience managing large language models (LLMs) in production. 
  • Familiarity with fine-tuning and deploying LLMs using tools like Hugging Face Transformers or OpenAI APIs. 
  • Experience with continuous learning pipelines for LLMs. 
  
MLOps Fundamentals
  • Strong skills in MLOps tools such as MLflow, DVC, or Kubeflow. 
  • Proficiency in Python and machine learning libraries such as PyTorch or TensorFlow. 
  
Infrastructure
  • Experience with containerization and orchestration (Docker, Kubernetes). 
  • Knowledge of GPU-optimized workflows and distributed systems. 
  
Soft Skills: 
  • Exceptional problem-solving and troubleshooting abilities. 
  • Strong communication and collaboration skills to work with cross-functional teams. 
  • Leadership qualities to mentor junior engineers and lead complex projects. 
  
Preferred Qualifications: 
  • Experience with hybrid MLOps and DevOps workflows. 
  • Understanding of secure on-prem deployments for AI/ML systems. 
  • Knowledge of prompt engineering and evaluation techniques for LLMs. 
