SGlang

SGLang is a high-performance serving framework for large language models and multimodal models. It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters.

It is recommended by the engineering team of Apertus, and more information will be added here later. For now, please visit the official website and documentation for deployment instructions.