Senior LLM Engineer (Fine-Tuning & LLMOps)
AI/ML
Remote
Contract
Start date: Immediate
Position Type: Contract
Location: Remote across Canada/USA
About the Role:
We are looking for a highly skilled and experienced Senior LLM Engineer to lead the fine-tuning and deployment of large language models (LLMs) in local environments. This role involves working on end-to-end LLM workflows, including model customization, deployment, and lifecycle management using modern LLMOps practices.
As a core contributor, you will develop robust CI/CD/CL (Continuous Integration/Continuous Deployment/Continuous Learning) pipelines, ensuring the efficient and scalable use of LLMs for various applications.
Key Responsibilities:
LLM Fine-Tuning:
- Fine-tune pre-trained large language models (e.g., GPT, LLaMA, Falcon) for specific use cases and domain-specific tasks.
- Design and implement custom data preprocessing and augmentation pipelines to improve model performance.
Local Deployment & Optimization:
- Deploy and optimize LLMs in on-premise environments, ensuring resource efficiency.
- Configure and manage GPU clusters for high-performance local inference and training.
LLMOps & Automation:
- Develop robust CI/CD pipelines to automate model versioning, testing, and deployment.
- Implement Continuous Learning (CL) pipelines to retrain models with fresh data and monitor performance.
- Establish comprehensive monitoring systems for tracking model performance, drift, and latency.
- Optimize inference workflows for low-latency and high-throughput performance.
Collaboration & Documentation:
- Collaborate with cross-functional teams, including data scientists, DevOps engineers, and software developers, to integrate LLMs into production systems.
- Maintain detailed documentation of workflows, pipelines, and model changes.
Research & Innovation:
- Stay updated with the latest advancements in LLMs, fine-tuning techniques, and LLMOps tools.
- Experiment with new architectures and methodologies to improve efficiency and scalability.
Experience:
- 7+ years of programming experience in machine learning or AI-related roles.
- At least 3 years of hands-on experience fine-tuning and deploying large language models.
Technical Skills:
LLM Expertise:
- Proficiency in fine-tuning frameworks and techniques such as Hugging Face Transformers, LoRA (Low-Rank Adaptation), and PEFT (Parameter-Efficient Fine-Tuning).
- Experience with pre-trained models such as GPT, BERT, T5, LLaMA, or Falcon.
- Knowledge of prompt engineering and evaluation techniques.
Deployment & Optimization:
- Strong experience in deploying LLMs locally on GPUs and optimizing for performance.
- Familiarity with tools like NVIDIA Triton Inference Server, TensorRT, and DeepSpeed.
LLMOps & MLOps Tooling:
- Proficiency in tools like MLflow, DVC, or Kubeflow for model lifecycle management.
- Expertise in CI/CD tools (e.g., GitLab CI/CD, Jenkins) and integrating them with machine learning pipelines.
- Experience implementing Continuous Learning pipelines for model retraining.
Programming & Infrastructure:
- Strong Python programming skills with experience in libraries such as PyTorch and TensorFlow.
- Proficiency in scripting and infrastructure automation (e.g., Bash, Terraform).
- Experience with Kubernetes, Docker, and managing GPU clusters.
- Familiarity with hybrid cloud and on-premise deployments.
- Familiarity with distributed training frameworks (e.g., Horovod, PyTorch DDP).
Additional Experience:
- Experience working with knowledge bases, RAG (Retrieval-Augmented Generation), or hybrid AI systems.
- Strong understanding of secure deployment practices for sensitive applications.
Soft Skills:
- Excellent problem-solving skills and attention to detail.
- Strong collaboration and communication abilities.
- Self-motivated with a proactive approach to learning and experimentation.