We are seeking a highly skilled Machine Learning Engineer with deep expertise in building large language models (LLMs) from scratch. The ideal candidate will have extensive experience in training, fine-tuning, and deploying LLMs at scale, alongside the ability to optimise computational kernels for high GPU performance. Additionally, the role requires designing scalable model-serving architectures to ensure high-performance inference.
This role is a 12-month hybrid contract, combining remote work with onsite days in London. The rate is £800 per day.
Our client is a cutting-edge AI company specialising in the development and deployment of advanced large language models. Their mission is to push the boundaries of AI innovation, delivering transformative solutions that empower businesses and individuals alike.
Key responsibilities
- Design, develop, and train large-scale language models from the ground up, leveraging advanced techniques in NLP and deep learning.
- Fine-tune pre-trained models to meet specific application requirements, ensuring optimal performance.
- Optimise computational kernels using frameworks such as CUDA and Triton to maximise GPU efficiency.
- Architect scalable and robust model-serving solutions for high-throughput and low-latency inference.
- Collaborate on the design and implementation of distributed systems to support large-scale training and deployment pipelines.
- Utilise frameworks such as PyTorch, TensorFlow, or JAX to implement and test model architectures and training workflows.
- Work with Kubernetes, Slurm, or similar cluster-orchestration and scheduling systems to manage resources and ensure seamless operation of large-scale infrastructure.
- Stay up to date with the latest advancements in LLMs, distributed computing, and GPU optimisation.