
Kanana: Compute-efficient Bilingual Language Models (2502.18934v3)

Published 26 Feb 2025 in cs.CL and cs.LG

Abstract: We introduce Kanana, a series of bilingual LLMs that demonstrate exceptional performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high-quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. Furthermore, the report outlines the methodologies utilized during the post-training of the Kanana models, encompassing supervised fine-tuning and preference optimization, aimed at enhancing their capability for seamless interaction with users. Lastly, the report elaborates on plausible approaches used for LLM adaptation to specific scenarios, such as embedding, retrieval augmented generation, and function calling. The Kanana model series spans from 2.1B to 32.5B parameters, with the 2.1B models (base, instruct, embedding) publicly released to promote research on Korean LLMs.
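The abstract names depth up-scaling among the compute-saving pre-training techniques. The general idea, as popularized by SOLAR-style up-scaling, is to deepen an already-trained decoder by duplicating a contiguous block of its layers and then continuing pre-training, rather than training the larger model from scratch. The sketch below is a minimal illustration of that layer-duplication step for a LLaMA-style Hugging Face model; the `overlap` split and the attribute paths are assumptions for illustration, not the paper's exact recipe.

```python
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

def depth_upscale(model, overlap: int):
    """Deepen a LLaMA-style decoder by concatenating two overlapping copies
    of its layer stack: layers[:-overlap] followed by layers[overlap:].
    For a 32-layer model with overlap=8 this yields 48 layers; the up-scaled
    model is then meant to be further pre-trained."""
    layers = model.model.layers  # nn.ModuleList of decoder blocks (LLaMA-style)
    n = len(layers)
    assert 0 < overlap < n
    first = [copy.deepcopy(layer) for layer in layers[: n - overlap]]
    second = [copy.deepcopy(layer) for layer in layers[overlap:]]
    model.model.layers = nn.ModuleList(first + second)
    model.config.num_hidden_layers = len(model.model.layers)
    return model

# Illustrative usage (the checkpoint name is a placeholder, not from the paper):
# base = AutoModelForCausalLM.from_pretrained("some-org/some-8b-base")
# bigger = depth_upscale(base, overlap=8)
```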

Summary

  • The paper introduces Kanana, bilingual models achieving over 11% compute savings in pre-training through staged training, up-scaling, and iterative pruning techniques.
  • Kanana uses rigorously filtered bilingual datasets to achieve competitive performance on benchmarks like MMLU, KMMLU, and HAE-RAE, demonstrating particular strength in Korean-specific tasks.
  • Kanana utilizes a comprehensive post-training pipeline, including SFT and preference optimization, for robust performance across diverse NLP applications and domain-specific tasks.
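The last bullet above mentions preference optimization as part of post-training. The report describes offline and online preference optimization without the exact objective being reproduced on this page, so the following is a minimal sketch of a standard DPO-style offline preference loss, assuming sequence-level log-probabilities have already been computed for the chosen and rejected responses under both the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss over per-sequence log-probabilities (one value per pair).

    Encourages the policy to raise the likelihood of the chosen response
    relative to the rejected one, measured against a frozen reference model.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == softplus(-x); softplus is the numerically stable form.
    return F.softplus(-margin).mean()

# Illustrative usage with random numbers standing in for real log-probs:
# loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
```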

This paper presents Kanana, a bilingual LLM family that emphasizes compute efficiency through innovative training and adaptation techniques.

  • It reduces pre-training compute by employing staged pre-training, depth up-scaling, and iterative pruning/distillation, saving over 11% of resources compared to training from scratch.
  • It leverages rigorously filtered bilingual datasets to achieve competitive performance on benchmarks such as MMLU, KMMLU, and HAE-RAE while excelling in Korean-specific tasks.
  • Its comprehensive post-training pipeline—including supervised fine-tuning, offline and online preference optimization, and domain-specific adaptations for embedding, retrieval, and function calling—ensures robust performance across diverse NLP applications.
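The last bullet mentions adaptations for embedding, retrieval-augmented generation, and function calling. As a generic illustration of the retrieval step only (not the paper's specific pipeline), the sketch below ranks candidate passages by cosine similarity between embeddings and prepends the top hits to the prompt; the encoder model id, prompt wording, and top-k choice are placeholders, not details taken from the paper.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder encoder id -- swap in whichever embedding model you actually use.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def retrieve(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages most similar to the query under cosine similarity."""
    q = encoder.encode([query], normalize_embeddings=True)   # shape (1, d)
    p = encoder.encode(passages, normalize_embeddings=True)  # shape (n, d)
    scores = (p @ q.T).ravel()  # cosine similarity, since embeddings are normalized
    top = np.argsort(-scores)[:k]
    return [passages[i] for i in top]

def build_rag_prompt(query: str, passages: list[str], k: int = 3) -> str:
    """Prepend the retrieved passages to the user query as plain-text context."""
    context = "\n\n".join(retrieve(query, passages, k))
    return f"Use the following context to answer.\n\n{context}\n\nQuestion: {query}"
```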