Apertus is an open-source Large Language Model (LLM) developed in Switzerland. This documentation shows you how to get started with the LLM, whether as a user, researcher, or advanced contributor: we maintain this knowledge base for you, and would ✉️ welcome your feedback.
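
As a first taste of "getting started", here is a minimal sketch of querying Apertus with the Hugging Face transformers library. The model identifier below is an assumption based on the Swiss AI organization name; check the official Apertus Model Card for the exact ID and hardware requirements.

```python
# Minimal sketch of text generation with Apertus via Hugging Face transformers.
# MODEL_ID is an assumption; verify it against the official model card.
MODEL_ID = "swiss-ai/Apertus-8B-2509"  # assumed Hugging Face model ID

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Complete `prompt` with Apertus (downloads the weights on first use)."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example call (commented out: needs the weights and substantial GPU memory):
# print(generate("Which languages are spoken in Switzerland?"))
```

The 8B model is the practical starting point on a single GPU; the 70B model needs a multi-GPU setup.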

About the project

The model development team is part of the Swiss AI Initiative, which started in late 2023 and serves as a platform for over 80 data science projects, including the LLM development. Key highlights of the LLM project, as announced in July, include:

  • Multilingualism: Trained on more than 15 trillion tokens across 1,500+ languages, 40% non-English - equal usage cost across languages - see @epfml
  • Performance: A large model in two sizes (8 billion and 70 billion parameters), trained on roughly 15 trillion tokens, and actively optimized on an ongoing basis.
  • Open & Transparent: Published under Apache-2.0 license - including source code, weights, and open training data.
  • Data Privacy: Compliant with GDPR, the EU AI Act, and Swiss data protection laws - see Fan et al. 2025
  • Infrastructure: Developed on the new Alps supercomputer at CSCS with over 10,000 NVIDIA GH200 Grace-Hopper chips
  • Global Reach: Designed with research and borderless applications in mind, for sovereign and international public-interest AI.

Tech specs

The Swiss LLM was trained on the Alps supercomputer, operational at CSCS since September 2024.

The Swiss LLM was trained on approximately 15 trillion tokens. Particularly noteworthy is the high proportion of non-English data (40%) and coverage of over 1,500 languages, including rare ones like Romansh or Zulu. The data was ethically sourced - without illegal scraping, respecting robots.txt and copyright requirements. While this limits access to certain specialized information, CSCS emphasizes: «For general tasks, this doesn’t lead to measurable performance losses.»
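The "respecting robots.txt" point refers to the standard crawler-exclusion protocol: before fetching a page, an ethical crawler checks the site's robots.txt rules. The sketch below illustrates that check with Python's standard library only; the rules and URLs are made up for the example and are not the project's actual crawling code.

```python
# Illustrative robots.txt check using only the Python standard library.
# The rules below are invented for the example; a real crawler fetches each
# site's /robots.txt and honors it the same way.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.org/private/page"))  # False: disallowed
print(rp.can_fetch("*", "https://example.org/public/page"))   # True: allowed
```

A disallowed path is simply never downloaded, which is why some specialized content is absent from the training data.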

For more technical references, see the Sources further below.

Initial benchmarks

See the Evaluation section of the Apertus Model Card, and Section 5 of the Tech Report for more data. This is an initial independent evaluation, and we expect more to come soon:

| Model | MMLU (Knowledge) | Global-MMLU (Multilingual) | GSM8K (Math) | HumanEval (Code) | RULER @32k (Long Context) |
|---|---|---|---|---|---|
| Claude 3.5 Sonnet | 88.7% | - | 96.4% | 92.0% | - |
| Llama 3.1 70B | 83.6% | - | 95.1% | 80.5% | - |
| Apertus-70B | 69.6% | 62.7% | 77.6% | 73.0% | 80.6% |
| Apertus-8B | 60.9% | 55.7% | 62.9% | 67.0% | 69.5% |

Performance comparison

| Model | Parameters | Openness | Language Coverage | Training Hardware | Strengths |
|---|---|---|---|---|---|
| Swiss LLM | 8B / 70B | Open source, weights, data | >1,500 | Alps: 10,752 GH200 GPUs | Linguistic diversity, data privacy, transparency |
| GPT-4.5 | ~2T (estimated) | Proprietary | ~80 - 120 | Azure: ~25,000 A100 GPUs | Creativity, natural conversation, agentic planning |
| Claude 4 | Not published | Proprietary | ? | Anthropic: ? | Adaptive reasoning, coding |
| Llama 4 | 109B / 400B | Open weight | 12, with 200+ in training | Meta: ~20,000 H100 GPUs | Multimodality, large community, agentic tasks |
| Grok 4 | ~1.8T MoE | Proprietary | ? | Colossus: 200,000 H100 GPUs | Reasoning, real-time data, humor… |

Source: effektiv.ch

Sources

For further information: