Tuning Language Models by Mixture-of-Depths Ensemble (2410.13077v1)

Published 16 Oct 2024 in cs.CL and cs.AI

Abstract: Transformer-based LLMs traditionally rely on final-layer loss for training and final-layer representations for predictions, potentially overlooking the predictive power embedded in intermediate layers. Surprisingly, we find that focusing training efforts on these intermediate layers can yield training losses comparable to those of final layers, with complementary test-time performance. We introduce a novel tuning framework, Mixture-of-Depths (MoD), which trains late layers as ensembles contributing to the final logits through learned routing weights. With the auxiliary distillation loss and additional normalization modules, we ensure that the outputs of the late layers adapt to language modeling. Our MoD framework, which can be integrated with any existing tuning method, shows consistent improvement on various language modeling tasks. Furthermore, by replacing traditional trainable modules with MoD, our approach achieves similar performance with significantly fewer trainable parameters, demonstrating the potential of leveraging predictive power from intermediate representations during training.
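To make the ensemble idea in the abstract concrete, here is a minimal sketch of combining logits from several late layers with routing weights. It is an illustration under stated assumptions, not the paper's implementation: the function names `softmax` and `mix_layer_logits` are hypothetical, the routing scores are assumed to be normalized with a softmax (the abstract only says the weights are learned), and the auxiliary distillation loss and normalization modules are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_layer_logits(layer_logits, routing_scores):
    """Blend per-layer logits into one logit vector.

    layer_logits:   list of K logit vectors, one per late layer,
                    each of length V (vocabulary size).
    routing_scores: list of K floats. In training these would be
                    learned parameters; here they are plain inputs.

    Assumption: weights are obtained by a softmax over the scores,
    so they are positive and sum to 1.
    """
    if len(layer_logits) != len(routing_scores):
        raise ValueError("need exactly one routing score per layer")
    weights = softmax(routing_scores)
    vocab_size = len(layer_logits[0])
    # Weighted sum across layers, computed independently per vocabulary entry.
    return [
        sum(w * logits[v] for w, logits in zip(weights, layer_logits))
        for v in range(vocab_size)
    ]


# Two hypothetical late layers over a two-token vocabulary,
# with equal routing scores (so each layer gets weight 0.5).
mixed = mix_layer_logits([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal routing scores the result is a plain average of the layers' logits; raising one score shifts the mixture toward that layer's prediction.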
