SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (2312.15166v3)

Published 23 Dec 2023 in cs.CL, cs.AI, and cs.LG

Abstract: We introduce SOLAR 10.7B, an LLM with 10.7 billion parameters, demonstrating superior performance in various NLP tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and inference efficiently. We show experimentally that DUS is simple yet effective in scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.

Summary

  • The paper presents a novel depth up-scaling method that structurally expands transformer architectures and recovers performance with continued pretraining.
  • It reports strong benchmark results, outperforming models such as Llama 2 and Mistral 7B on benchmarks including ARC, HellaSwag, and GSM8K.
  • The model is released under the Apache 2.0 license, encouraging broader NLP research and application by keeping scaling simple and computationally inexpensive.

Overview of SOLAR 10.7B: Scaling LLMs

The paper presents SOLAR 10.7B, an LLM built with a new methodology termed depth up-scaling (DUS). DUS scales a pretrained model to a larger size without the training- and inference-time complexity of earlier approaches such as the mixture-of-experts (MoE) framework.

Depth Up-Scaling Methodology

Depth up-scaling increases a model's depth rather than its width. The method starts from a pretrained transformer base model (in the paper, a 32-layer Llama 2 architecture initialized with Mistral 7B weights) and proceeds in two core steps:

  1. Depthwise Scaling: The base model with n layers is duplicated. The final m layers are removed from the original copy and the first m layers from the duplicate, and the two truncated stacks are concatenated, yielding a scaled model with s = 2(n − m) layers. SOLAR uses n = 32 and m = 8, giving s = 48; that is, the 32-layer base grows into a 48-layer model (a minimal sketch of this layer-slicing step follows this list).
  2. Continued Pretraining: Depthwise scaling alone degrades performance relative to the base model, so the scaled model undergoes further pretraining. This step recovers, and then surpasses, the base model's performance by exploiting the added capacity.
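
The layer arithmetic above can be illustrated with a short sketch. The following Python snippet is not the authors' code; it only shows the slicing-and-concatenation step on a generic ordered list of transformer blocks, with n, m, and the 48-layer result matching the paper's configuration.

```python
import copy

def depth_up_scale(layers, m=8):
    """Depthwise scaling sketch: duplicate the base model's layer stack,
    drop the final m layers from the original copy and the first m layers
    from the duplicate, then concatenate the two truncated stacks."""
    n = len(layers)
    duplicate = copy.deepcopy(layers)
    scaled = layers[: n - m] + duplicate[m:]  # s = 2 * (n - m) layers
    assert len(scaled) == 2 * (n - m)
    return scaled

# With n = 32 and m = 8 this yields the 48-layer stack used by SOLAR 10.7B.
toy_layers = [f"block_{i}" for i in range(32)]  # stand-ins for real transformer blocks
print(len(depth_up_scale(toy_layers)))  # 48
```

In a real implementation the elements of `layers` would be transformer decoder blocks (e.g., torch.nn.Module instances), and the concatenated stack would then go through the continued pretraining described in step 2.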

Results and Performance

SOLAR 10.7B and its fine-tuned variant SOLAR 10.7B-Instruct outperform competitive LLMs such as Llama 2 and Mistral 7B across multiple natural language processing benchmarks. Notably, SOLAR 10.7B-Instruct surpasses Mixtral-8x7B-Instruct on instruction-following evaluations while remaining a standard dense transformer, underscoring the efficiency gains that DUS provides.

Evaluation and Implications

The model's efficacy is substantiated through evaluations on ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K, the benchmarks used by the Open LLM Leaderboard. The results indicate that DUS reuses pretrained weights effectively and scales the architecture without additional gating modules, custom kernels, or other framework changes (a hedged evaluation sketch follows).
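
As an illustration of how such an evaluation could be reproduced, the sketch below uses EleutherAI's lm-evaluation-harness (v0.4-style Python API) on the ARC-Challenge task with the 25-shot setting used by the Open LLM Leaderboard. The harness, the task name, and the Hugging Face model ID upstage/SOLAR-10.7B-v1.0 are assumptions of this sketch rather than details from the paper, and task names or arguments may differ across harness versions.

```python
# pip install lm-eval  (EleutherAI lm-evaluation-harness, v0.4.x assumed)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                                        # Hugging Face backend
    model_args="pretrained=upstage/SOLAR-10.7B-v1.0,dtype=bfloat16",   # assumed Hub ID
    tasks=["arc_challenge"],                                           # one of the six benchmarks above
    num_fewshot=25,                                                    # Open LLM Leaderboard ARC setting
    batch_size=4,
)
print(results["results"]["arc_challenge"])                             # accuracy metrics for the task
```

The remaining benchmarks (HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K) use different few-shot counts on the leaderboard, so they would be run as separate calls with the appropriate settings.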

SOLAR 10.7B is released under the Apache 2.0 license, which makes it directly usable in NLP research and applications. Because DUS yields a standard dense transformer, the model runs on existing training and inference stacks and accommodates diverse computational resources without special hardware or software support (a brief usage sketch follows).
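
Since the weights are public, the model can be loaded with standard Hugging Face tooling. The snippet below is a minimal usage sketch, assuming the instruct variant is published on the Hub as upstage/SOLAR-10.7B-Instruct-v1.0 (an assumption of this sketch); it requires transformers, accelerate, and roughly 22 GB of GPU memory for 16-bit inference.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "upstage/SOLAR-10.7B-Instruct-v1.0"  # assumed Hugging Face Hub ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 10.7B parameters -> roughly 21-22 GB in bf16
    device_map="auto",           # requires the `accelerate` package
)

prompt = "Explain depth up-scaling in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

No custom modeling code should be needed: because DUS produces a plain dense decoder, the checkpoint is expected to load through the stock Llama-style model classes in transformers.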

Future Directions

The work opens avenues for further research into depth-scaling methods. Future work may refine how layers are selected and initialized during scaling, or combine depth up-scaling with complementary techniques such as MoE as hardware and tooling evolve.

Conclusion

SOLAR 10.7B demonstrates that depth up-scaling is a simple and effective way to scale LLMs: it grows a strong small model into a larger one while preserving a standard dense architecture, and continued pretraining lets the scaled model surpass its base. The results make a case for computational efficiency and architectural simplicity as practical paths to scaling LLMs, and they invite further work on optimizing LLM architectures.
