Main LLMs and resources we've contributed to in 2025
LLMs
- 🔥 Lucie 7b: main contributor is Linagora. It's an open-source, open-weights and open-data LLM pretrained from scratch on more than 3T tokens, focused on French and other languages.
- 🔥 Mille Pensées: main contributor is Gabriel Lauzzana. It's a post-trained 7b LLM that reasons in French and obtains good results on maths.
- 🔥 SpeechLLM for Wolof: main contributor is Yaya Sy. It's an open Wolof Speech-LLM whose HuBERT encoder was obtained by continued pretraining.
- LlamaER-8b: main contributor is Imed Ghebriout. It's a Llama-3.1-8b model finetuned on SimSamu (medical emergency calls).
- BaldWhisper: main contributor is Yaya Sy. It's an open Whisper model for Bambara, compressed with low-rank factorization and distilled.
- BaldGemma3: main contributor is Yaya Sy. It's a google/gemma-3-270m-it LLM compressed down to 140m parameters.
- Qwen3-8b-FRnews: it's an LLM finetuned on French news from 2024.
Other LLM-related resources
- Lillama: main contributor is Yaya Sy. It's a low-cost compression pipeline that we have applied to Mixtral-8x7b, Mistral-7b, Phi-3 14b, Phi-2 3b and Mamba-3b.
- Libriquote: main contributor is Gaspard Michel. It's a 12,700-hour expressive speech dataset for TTS.