
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech (2309.05472v2)

Published 11 Sep 2023 in cs.CL, cs.AI, cs.SD, and eess.AS

Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains, including computer vision and natural language processing. Speech processing has benefited drastically from SSL, as most current tasks in the domain are now approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of speech; ten pre-trained SSL wav2vec 2.0 models, containing from 26 million to one billion learnable parameters, shared with the community; and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech, with the investigation of frozen versus fine-tuned downstream models and of task-agnostic versus task-specific pre-trained models, as well as a discussion of the carbon footprint of large-scale model training. Overall, the newly introduced models trained on 14,000 hours of French speech outperform multilingual and previous LeBenchmark SSL models across the benchmark, but they also required up to four times more energy for pre-training.

Authors (22)
  1. Titouan Parcollet (49 papers)
  2. Ha Nguyen (14 papers)
  3. Solene Evain (2 papers)
  4. Marcely Zanon Boito (18 papers)
  5. Adrien Pupier (3 papers)
  6. Salima Mdhaffar (11 papers)
  7. Hang Le (9 papers)
  8. Sina Alisamir (6 papers)
  9. Natalia Tomashenko (32 papers)
  10. Marco Dinarelli (20 papers)
  11. Shucong Zhang (16 papers)
  12. Alexandre Allauzen (26 papers)
  13. Maximin Coavoux (15 papers)
  14. Jerome Goulian (2 papers)
  15. Benjamin Lecouteux (14 papers)
  16. Solange Rossato (5 papers)
  17. Fabien Ringeval (14 papers)
  18. Didier Schwab (23 papers)
  19. Laurent Besacier (76 papers)
  20. Mickael Rouvier (25 papers)
Citations (14)