Overview
Apertus is an open-source Large Language Model (LLM) developed in Switzerland. This documentation shows you how to get started with the LLM, whether as a user, researcher, or advanced contributor. We maintain this knowledge base for you and ✉️ welcome your feedback.
About the project
The model development team is part of the Swiss AI Initiative, launched in late 2023: a platform for over 80 data science projects, including the LLM development. Key highlights of the LLM project, as announced in July, include:
- Multilingualism: Trained on more than 15 trillion tokens across 1,500+ languages, 40% of them non-English, with equal usage cost across languages - see @epfml
- Performance: Available in two sizes (8 billion and 70 billion parameters), trained on roughly 15 trillion tokens, and actively optimized on an ongoing basis
- Open & Transparent: Published under Apache-2.0 license - including source code, weights, and open training data.
- Data Privacy: Compliant with GDPR, the EU AI Act, and Swiss data protection laws - see Fan et al. 2025
- Infrastructure: Developed on the new Alps supercomputer at CSCS with over 10,000 NVIDIA GH200 Grace-Hopper chips
- Global Reach: Research and borderless applications in mind, for sovereign and international public-interest AI.
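The training-mix figures above can be turned into a quick back-of-the-envelope breakdown. A minimal sketch, using only the publicly announced numbers (~15 trillion tokens, 40% non-English):

```python
# Rough breakdown of the Apertus training mix, based on the
# announced figures: ~15 trillion tokens, 40% non-English.
TOTAL_TOKENS = 15e12      # ~15 trillion training tokens
NON_ENGLISH_SHARE = 0.40  # announced share of non-English data

non_english_tokens = TOTAL_TOKENS * NON_ENGLISH_SHARE
english_tokens = TOTAL_TOKENS - non_english_tokens

print(f"non-English tokens: ~{non_english_tokens / 1e12:.0f} trillion")  # ~6 trillion
print(f"English tokens:     ~{english_tokens / 1e12:.0f} trillion")      # ~9 trillion
```

In other words, roughly 6 trillion of the training tokens are non-English, an unusually high share compared with most large models.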
Tech specs
The Swiss LLM is trained on the Alps supercomputer, operational at CSCS since September 2024:
- 10,752 NVIDIA GH200 Grace-Hopper chips
- Computing power: 270-435 PFLOPS
- Ranked 6th on the TOP500 list (June 2024)
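Dividing the aggregate computing-power range by the chip count gives a rough per-chip figure. A back-of-the-envelope sketch from the numbers above (this ignores precision, workload, and interconnect details, so it is only an order-of-magnitude estimate):

```python
# Rough per-chip throughput on Alps, from the figures listed above.
NUM_CHIPS = 10752           # NVIDIA GH200 Grace-Hopper chips
TOTAL_PFLOPS = (270, 435)   # reported aggregate range, in PFLOPS

# Convert each end of the range: PFLOPS -> FLOPS -> TFLOPS per chip.
per_chip_tflops = tuple(p * 1e15 / NUM_CHIPS / 1e12 for p in TOTAL_PFLOPS)

print(f"per-chip: ~{per_chip_tflops[0]:.0f}-{per_chip_tflops[1]:.0f} TFLOPS")  # ~25-40 TFLOPS
```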
The Swiss LLM was trained on approximately 15 trillion tokens. Particularly noteworthy are the high proportion of non-English data (40%) and the coverage of over 1,500 languages, including low-resource ones like Romansh or Zulu. The data was ethically sourced: no illegal scraping, and robots.txt and copyright requirements were respected. While this limits access to certain specialized information, CSCS emphasizes: «For general tasks, this doesn't lead to measurable performance losses.»
For more technical references, see the Sources further below.
Initial benchmarks
See the Evaluation section of the Apertus Model Card, and Section 5 of the Tech Report for more data. This is an initial independent evaluation, and we expect more to come soon:
| Model | MMLU (Knowledge) | Global-MMLU (Multilingual) | GSM8K (Math) | HumanEval (Code) | RULER @32k (Long Context) |
|---|---|---|---|---|---|
| Claude 3.5 Sonnet | 88.7% | — | 96.4% | 92.0% | — |
| Llama 3.1 70B | 83.6% | — | 95.1% | 80.5% | — |
| Apertus-70B | 69.6% | 62.7% | 77.6% | 73.0% | 80.6% |
| Apertus-8B | 60.9% | 55.7% | 62.9% | 67.0% | 69.5% |
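The benchmark rows above are easier to compare programmatically. A minimal sketch using only the table's own numbers (scores in percent; the «—» entries are simply omitted):

```python
# Benchmark scores copied from the table above (percent; missing entries omitted).
scores = {
    "Claude 3.5 Sonnet": {"MMLU": 88.7, "GSM8K": 96.4, "HumanEval": 92.0},
    "Llama 3.1 70B":     {"MMLU": 83.6, "GSM8K": 95.1, "HumanEval": 80.5},
    "Apertus-70B":       {"MMLU": 69.6, "Global-MMLU": 62.7, "GSM8K": 77.6,
                          "HumanEval": 73.0, "RULER@32k": 80.6},
    "Apertus-8B":        {"MMLU": 60.9, "Global-MMLU": 55.7, "GSM8K": 62.9,
                          "HumanEval": 67.0, "RULER@32k": 69.5},
}

def gap(model_a: str, model_b: str, metric: str) -> float:
    """Percentage-point difference between two models on one benchmark."""
    return round(scores[model_a][metric] - scores[model_b][metric], 1)

print(gap("Llama 3.1 70B", "Apertus-70B", "MMLU"))    # 14.0 points
print(gap("Apertus-70B", "Apertus-8B", "HumanEval"))  # 6.0 points
```

For instance, Apertus-70B trails Llama 3.1 70B by 14 percentage points on MMLU, while the 70B model leads its 8B sibling by 6 points on HumanEval.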
Performance comparison
| Model | Parameters | Openness | Language Coverage | Training Hardware | Strengths |
|---|---|---|---|---|---|
| Swiss LLM | 8B / 70B | Open Source, Weights, Data | >1,500 | Alps: 10,752 GH200 GPUs | Linguistic diversity, data privacy, transparency |
| GPT-4.5 | ~2T (estimated) | Proprietary | ~80-120 | Azure: ~25,000 A100 GPUs | Creativity, natural conversation, agentic planning |
| Claude 4 | Not published | Proprietary | ? | Anthropic: ? | Adaptive reasoning, coding |
| Llama 4 | 109B / 400B | Open Weight | 12, with 200+ in training | Meta: ~20,000 H100 GPUs | Multimodality, large community, agentic tasks |
| Grok 4 | ~1.8T MoE | Proprietary | ? | Colossus: 200,000 H100 GPUs | Reasoning, real-time data, humor… |
Source: effektiv.ch
Sources
For further information:
- Swiss AI Initiative (swiss-ai.org)
- July Announcement (ethz.ch)
- ETH Zurich AI Center (ai.ethz.ch)
- EPFL Machine Learning Lab (epfl.ch)
- Apertus Tech Session (swiss-ai-weeks.ch)
- Can the Swiss LLM Compete? (effektiv.ch)
- AlgorithmWatch statement & position paper
- Alps Supercomputer ranking (TOP500.org)