
Differential impact of appending EOS tokens in encoder-only versus decoder-only LLMs

Determine how appending an End-of-Sequence (EOS) token to the input sequence affects encoder-only models such as DeBERTa compared with decoder-only models such as Mistral and Llama3 when these models are fine-tuned to predict forward stock returns from financial newsflow, and clarify why the impact differs across model families.


Background

To obtain sequence-level representations from token-level embeddings, the paper compares a bottleneck approach, which appends an EOS token and uses its embedding as the sequence representation, with an alternative approach that aggregates the token-level embeddings. The authors observe that appending EOS appears more helpful for encoder-only models.
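For concreteness, the sketch below contrasts the two strategies on an encoder-only model. It is a minimal illustration, not the paper's implementation: the checkpoint name, the use of mean pooling for aggregation, and the fallback to the tokenizer's [SEP] end marker are all assumptions made for the example.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; swap in a decoder-only model
# (e.g. a Mistral or Llama3 checkpoint) to compare architectures.
model_name = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "Company X beat quarterly earnings expectations on strong demand."
enc = tokenizer(text, return_tensors="pt")

# Bottleneck variant: append an EOS token and read the hidden state at its
# position. DeBERTa's tokenizer may define no eos_token, so fall back to
# its [SEP] end-of-sequence marker (an assumption for this sketch).
eos_id = tokenizer.eos_token_id if tokenizer.eos_token_id is not None else tokenizer.sep_token_id
input_ids = torch.cat([enc["input_ids"], torch.tensor([[eos_id]])], dim=1)
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    hidden = model(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state

bottleneck_repr = hidden[:, -1, :]    # sequence embedding = appended EOS token's state
aggregated_repr = hidden.mean(dim=1)  # alternative: mean-pool all token states

print(bottleneck_repr.shape, aggregated_repr.shape)  # both: (1, hidden_size)
```

Either vector can then feed the regression head used for return prediction; the open question is why the bottleneck (EOS) route seems to benefit encoder-only models more than decoder-only ones.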

For comparability, EOS is appended for both encoder-only and decoder-only LLMs, but the authors leave a systematic study of the differing impacts across architectures to future work.

References

In experiments, we observed that appending the EOS token is more helpful for encoder-only LLMs. For a comparison on the same ground, we append EOS tokens for both encoder-only and decoder-only LLMs and leave the study on the different impacts of appending tokens to future work.

Guo et al., "Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow" (arXiv:2407.18103, 25 Jul 2024), Section 3.2 (Methodology), Bottleneck Representations vs. Aggregated Representations.