BlueStone

Machine Learning Systems Engineer

Location: Remote (San Francisco Bay Area / North or South America)
Experience Level: 3+ years in ML engineering or research

About the role:
The Machine Learning Systems Engineer contributes to ML infrastructure and the open-source ScalarLM codebase. The role sits at the intersection of high-performance computing, distributed systems, and advanced machine learning research.

Key responsibilities:
• Develop and optimise distributed training algorithms for large language models (LLMs).
• Implement high-performance inference engines and optimisation techniques.
• Work on integrations between the vLLM, Megatron-LM and HuggingFace ecosystems.
• Build tools for seamless model training, fine-tuning and deployment.
• Optimise performance across advanced GPU architectures.
• Research and implement new techniques for self-improving AI agents.

Required technical expertise:
• Proficiency in both C/C++ and Python.
• Deep understanding of HPC concepts, including MPI programming and distributed computing across multiple GPUs/nodes.
• Experience with transformer architectures and distributed training techniques (data parallelism, model parallelism).
• Experience with large-scale distributed training frameworks (Megatron-LM, DeepSpeed) and inference-optimisation frameworks (vLLM, TensorRT).

BlueStone Solutions B.V. is certified in accordance with NEN 4400-1 and is a recognised sponsor with the IND.
