Taiwan LLM 13B v2.0 Chat Model

Updated 8 January 2026
  • Taiwan-LLM-13B-v2.0-Chat is a culturally aligned large language model that leverages continued pretraining and instruction fine-tuning on Taiwan-specific Traditional Chinese data to address linguistic underrepresentation.
  • The model adapts the Llama2-13B backbone through rigorous data curation and locally relevant feedback, resulting in enhanced performance on TC-Eval benchmarks.
  • Its comprehensive evaluation and open-source distribution set a precedent for culturally and linguistically customized AI, fostering improved authenticity in local applications.

Taiwan-LLM-13B-v2.0-Chat is a culturally aligned LLM designed specifically for Traditional Chinese as used in Taiwan. Developed through targeted continue-pretraining and instruction fine-tuning on culturally curated data, it leverages the Llama 2-13B architecture to address the historical underrepresentation of Taiwanese linguistic and cultural traits in existing LLMs. Benchmarked by the TC-Eval suite, this model demonstrates superior performance over LLMs predominantly trained on Simplified Chinese or English and sets new baselines for culturally resonant AI communication in the Taiwanese context (Lin et al., 2023).

1. Model Architecture and Configuration

Taiwan-LLM-13B-v2.0-Chat is based on continue-pretraining (cPT) and supervised fine-tuning of the Llama 2-13B backbone. No architectural modifications were introduced beyond the base model. The core transformer hyperparameters are summarized below:

| Configuration | Value | Notes |
|---|---|---|
| Number of layers | 40 | Standard |
| Hidden dimension | 5,120 | |
| Attention heads | 40 | |
| Context window length | 4,096 tokens | |
| Total parameters | 13×10⁹ | |

The model includes no novel attention mechanisms (e.g., grouped-query attention) or additional layers beyond the base Llama 2-13B. The focus is on cultural and linguistic adaptation at the data level, not the architectural level (Lin et al., 2023).
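
For reference, these hyperparameters match a stock Llama 2-13B configuration. A minimal sketch with Hugging Face transformers is shown below; the vocab_size and intermediate_size values are the usual Llama 2-13B defaults rather than figures from the table, and the sketch is illustrative, not the authors' code.

```python
from transformers import LlamaConfig

# Stock Llama 2-13B hyperparameters as summarized in the table above.
config = LlamaConfig(
    num_hidden_layers=40,           # number of transformer layers
    hidden_size=5120,               # hidden dimension
    num_attention_heads=40,         # standard multi-head attention (no GQA)
    max_position_embeddings=4096,   # context window length
    intermediate_size=13824,        # FFN width; Llama 2-13B default (assumed)
    vocab_size=32000,               # unchanged Llama 2 byte-level BPE vocabulary
)
# Instantiating LlamaForCausalLM(config) yields roughly 13×10⁹ parameters.
```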

2. Pretraining Corpus and Data Curation

Pretraining data was exclusively curated from Taiwan-relevant sources:

| Source | # Documents | # Tokens (B) | % of Total Tokens |
|---|---|---|---|
| Social media | 8.24 M | 16.6 | 47.3% |
| News | 8.60 M | 10.4 | 29.6% |
| Knowledge base | 3.19 M | 5.7 | 16.3% |
| Books | 4 K | 2.4 | 6.8% |
| Total | 20.0 M | 35.1 | 100% |

Key strategies included strict site-level filtering (to exclude spam and misinformation) and a strong bias toward authentic Traditional Chinese text from Taiwan-centric sources. Tokenization employed the existing Llama 2 byte-level BPE vocabulary; no additional Chinese-specific tokens were introduced, but the data composition was heavily skewed toward zh-TW. Preprocessing was minimal, consisting of HTML cleaning, Unicode normalization, and spam removal (Lin et al., 2023).
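
A minimal sketch of the preprocessing steps described above (HTML cleaning, Unicode normalization, and site-level spam filtering); the blocklist, field names, and helper functions are illustrative assumptions, not the authors' released pipeline.

```python
import re
import unicodedata

# Illustrative site-level blocklist; the actual curated list is not specified here.
BLOCKED_SITES = {"spam-example.tw", "content-farm-example.com"}

TAG_RE = re.compile(r"<[^>]+>")

def clean_document(text: str) -> str:
    """Strip HTML tags, apply Unicode normalization, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)                 # HTML cleaning
    text = unicodedata.normalize("NFKC", text)   # Unicode normalization
    return re.sub(r"\s+", " ", text).strip()

def keep_document(doc: dict) -> bool:
    """Site-level filter: drop documents from blocked (spam/misinformation) domains."""
    return doc.get("domain") not in BLOCKED_SITES

corpus = [
    {"domain": "news-example.tw", "text": "<p>範例新聞內容</p>"},  # sample zh-TW news text
    {"domain": "spam-example.tw", "text": "<p>點我中獎</p>"},      # spam, filtered out
]
cleaned = [clean_document(d["text"]) for d in corpus if keep_document(d)]
```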

3. Instruction Fine-Tuning and User-Feedback Alignment

Instruction alignment proceeded in two stages: supervised fine-tuning (SFT) on instruction data, followed by a feedback-based SFT stage. Both stages maximize the expected log-likelihood of target responses:

a) Supervised Fine-Tuning:

\pi_\mathrm{SFT} = \arg\max_\pi\, \mathbb{E}_{(x,y)\sim D_\mathrm{dialogue}}\big[\log \pi(y \mid x)\big]
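
In practice this objective is standard token-level cross-entropy with the prompt tokens masked out, so the loss is taken only over the target response. A minimal PyTorch sketch is given below; it is a generic formulation, not the authors' training code.

```python
import torch
import torch.nn.functional as F

def sft_loss(model, prompt_ids: torch.Tensor, response_ids: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the response given the prompt (single example)."""
    input_ids = torch.cat([prompt_ids, response_ids], dim=-1).unsqueeze(0)
    labels = input_ids.clone()
    labels[:, : prompt_ids.size(-1)] = -100       # ignore prompt tokens in the loss
    logits = model(input_ids).logits              # shape: (1, seq_len, vocab_size)
    # Shift so that position t predicts token t+1, as in causal LM training.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```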

Fine-tuning data comprised multi-turn prompt–response pairs, each translated to Traditional Chinese and paired with responses by gpt-3.5-turbo, as well as author-written dialogues. Representative datasets include SuperNI, CoT, Flan V2, Dolly, Open Assistant 1, among others, totaling hundreds of thousands of turns (see table below for indicative sizes):

| Dataset | Instances | Notes |
|---|---|---|
| SuperNI | 18,547 | Human-written instructions |
| CoT | 35,990 | Chain-of-thought annotations |
| Alpaca | 41,133 | Generated by GPT-3 (davinci-003) |
| Taiwan Instruction | 947 | Author-written, Taiwan-only |
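
As a rough illustration of how such a mixture is assembled for SFT, the sketch below flattens several instruction datasets into (prompt, response) pairs; the dataset keys and record fields are illustrative assumptions, and the contents are placeholders.

```python
# Illustrative instruction mixture; sizes follow the table above, contents are placeholders.
mixture = {
    "super_ni": [{"prompt": "...", "response": "..."}],            # 18,547 instances
    "cot": [{"prompt": "...", "response": "..."}],                 # 35,990 instances
    "alpaca": [{"prompt": "...", "response": "..."}],              # 41,133 instances
    "taiwan_instruction": [{"prompt": "...", "response": "..."}],  #    947 instances
}

# Flatten into a single list of (prompt, response) pairs for supervised fine-tuning.
sft_pairs = [
    (example["prompt"], example["response"])
    for examples in mixture.values()
    for example in examples
]
```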

b) Feedback Supervised Fine-Tuning:

A further 20,000 prompt–response pairs, positively rated by Taiwanese users (binary feedback), were used:

\pi_\mathrm{Feed\,SFT} = \arg\max_\pi\, \mathbb{E}_{(x,y)\sim F_\mathrm{pos}}\big[\log \pi(y \mid x)\big]

This process primarily adjusted tone, politeness, and the embedding of local cultural references. No reinforcement learning (RLHF, PPO) was conducted in version 2.0, but the paper suggests these may be topics for future study (Lin et al., 2023).
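
Feedback SFT therefore reuses the same likelihood objective, restricted to the subset F_pos of logged interactions that received positive ratings. A minimal sketch of that filtering step follows; the record fields are assumed, not taken from the paper.

```python
# Hypothetical logged interactions from the deployed chat service, with binary feedback.
logged = [
    {"prompt": "...", "response": "...", "thumbs_up": True},
    {"prompt": "...", "response": "...", "thumbs_up": False},
]

# F_pos: keep only positively rated pairs, then continue SFT on them
# with the same log-likelihood objective as above.
f_pos = [(r["prompt"], r["response"]) for r in logged if r["thumbs_up"]]
```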

4. Evaluation Methodology, Results, and Ablation

Model performance was assessed using the TC-Eval suite, with metrics including exact match (EM), ROUGE-2 (R2), and accuracy (Acc) across key tasks:

| Model | DRCD (EM) | FGC (EM) | TTQA (Acc) | TMMLU (Acc) | XSum-TC (R2) | IMDB-TC (Acc) | Table (Acc) | Avg (%) |
|---|---|---|---|---|---|---|---|---|
| Taiwan-LLM 13B | 87.57% | 50.00% | 70.87% | 39.04% | 5.23% | 92.36% | 32.89% | 53.99% |
| – no cPT | 75.81% | 38.00% | 56.31% | 36.28% | 0.06% | 93.94% | 26.84% | 46.75% |
| + CommonCrawl | 70.08% | 34.00% | 77.67% | 31.53% | 3.92% | 79.36% | 26.17% | 46.11% |
| GPT-4 | 96.68% | 42.00% | 53.40% | 60.48% | 4.30% | 86.90% | 62.42% | 58.03% |
| Llama2-13B-chat | 45.95% | 18.00% | 59.22% | 35.36% | 0.00% | 52.40% | 27.52% | 34.06% |

Ablation confirms the essential role of cPT (roughly a 7 percentage point gain in average score) and indicates that adding uncurated web data can degrade overall performance. Feedback SFT yields marginal improvements (≈0.2 percentage points on the 7B model).

Evaluation also included human-scored cultural authenticity via closed sets of Taiwan-specific questions (e.g., “22K” economic term), qualitative ratings of idiomaticity and local terminology, and binary user feedback (Lin et al., 2023).
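
As a concrete illustration of how the per-task scores roll up into the averages above, the sketch below computes a simple exact-match metric and the unweighted macro-average; the normalization is a simplification of TC-Eval's actual scoring.

```python
def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized prediction equals the normalized reference, else 0.0."""
    norm = lambda s: " ".join(s.strip().lower().split())
    return float(norm(prediction) == norm(reference))

# Per-task scores for Taiwan-LLM 13B from the table above (in %).
task_scores = {
    "DRCD": 87.57, "FGC": 50.00, "TTQA": 70.87, "TMMLU": 39.04,
    "XSum-TC": 5.23, "IMDB-TC": 92.36, "Table": 32.89,
}
macro_avg = sum(task_scores.values()) / len(task_scores)
print(f"{macro_avg:.2f}")  # 53.99, matching the reported average
```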

5. Cultural and Linguistic Alignment Strategies

Cultural adaptation relied on several synergistic mechanisms:

  • Corpus selection: All pretraining sources were Taiwan-specific. Spam/misinformation was filtered at the site level.
  • Custom instructional data: The “Taiwan Instruction” dataset (947 dialogue examples) was constructed to encode colloquial Taiwanese Mandarin, local festivals, place names (e.g., NTU’s location as “臺北市”), and local colloquialisms (e.g., “22K” as shorthand for graduate starting-salary expectations).
  • Translation and adaptation: gpt-3.5-turbo translated global instruction data into Traditional Chinese while explicitly retaining Taiwanese cultural nuance.
  • Feedback SFT: User interactions on twllm.com were rated by native speakers, ensuring local appropriateness in register, politeness, and reference.
  • Qualitative evaluation: Probes confirmed fidelity in handling Taiwan-specific questions and expressions.

A plausible implication is that this methodology establishes a template for culturally adaptive LLM development in other linguistic settings (Lin et al., 2023).

6. Open Source Distribution and Prospects

All model weights, datasets, and resources were publicly released to facilitate further research and downstream applications. The open-source approach is intended to foster collaboration and continued innovation in the domain of culturally and linguistically customized LLMs. The explicit invitation for community engagement is positioned as a means to improve coverage of underrepresented language varieties and sociolinguistic phenomena. Future improvements may explore integration of RLHF or DPO for finer-grained preference learning (Lin et al., 2023).
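
A minimal usage sketch with Hugging Face transformers is shown below. The repository id, chat-template usage, and generation settings are assumptions for illustration; consult the official release for the exact model card and prompt format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yentinglin/Taiwan-LLM-13B-v2.0-chat"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# "Which is the highest mountain in Taiwan?" in Traditional Chinese.
messages = [{"role": "user", "content": "台灣最高的山是哪一座？"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```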

References

Lin, Y.-T., & Chen, Y.-N. (2023). Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model. arXiv preprint arXiv:2311.17487.