Attention Scheme Inspired Softmax Regression (2304.10411v2)

Published 20 Apr 2023 in cs.LG

Abstract: LLMs have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible next words or phrases, given a sequence of input words. This distribution is then used to select the most likely next word or phrase, based on the probabilities assigned by the model. The softmax unit plays a crucial role in training LLMs, as it allows the model to learn from the data by adjusting the weights and biases of the neural network. In the area of convex optimization, such as when using the central path method to solve linear programming, the softmax function has been used as a crucial tool for controlling the progress and stability of the potential function [Cohen, Lee and Song STOC 2019; Brand SODA 2020]. In this work, inspired by the softmax unit, we define a softmax regression problem. Formally speaking, given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, the goal is to use a greedy-type algorithm to solve \begin{align*} \min_{x} \| \langle \exp(Ax), \mathbf{1}_n \rangle^{-1} \exp(Ax) - b \|_2^2. \end{align*} In a certain sense, our provable convergence result provides theoretical support for why greedy algorithms can be used to train the softmax function in practice.
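To make the objective concrete, here is a minimal numpy sketch of the softmax regression loss together with plain gradient descent as the optimizer. The gradient-descent update is an illustrative stand-in, not the paper's analyzed greedy method, and the step size and iteration count are arbitrary assumptions.

```python
import numpy as np

def softmax_regression_loss(x, A, b):
    """Loss from the paper: || <exp(Ax), 1_n>^{-1} exp(Ax) - b ||_2^2."""
    z = A @ x
    u = np.exp(z - z.max())   # shift for numerical stability; softmax is unchanged
    f = u / u.sum()           # normalized softmax vector f(x)
    r = f - b                 # residual against the target b
    return r @ r              # squared l2 norm

def gradient_descent(A, b, steps=500, lr=0.1):
    """Plain gradient descent on the softmax regression objective.

    Illustrative only: the paper studies a greedy-type algorithm with a
    provable convergence guarantee, which is not reproduced here.
    """
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(steps):
        z = A @ x
        u = np.exp(z - z.max())
        f = u / u.sum()
        r = f - b
        # Jacobian of f is (diag(f) - f f^T) A, so
        # grad ||f - b||^2 = 2 A^T (f * r - f (f . r)).
        g = 2.0 * A.T @ (f * r - f * (f @ r))
        x -= lr * g
    return x

# Toy usage: b must lie on the probability simplex for the residual to vanish.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = np.full(20, 1.0 / 20)
x = gradient_descent(A, b)
print(softmax_regression_loss(x, A, b))
```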

Authors (3)
  1. Yichuan Deng (21 papers)
  2. Zhihang Li (17 papers)
  3. Zhao Song (253 papers)
Citations (39)