Applicability of ConsMax Without Fine-Tuning
Determine whether the ConsMax softmax method proposed by Liu et al. (2024), which uses INT8 inputs/outputs with internal FP16 computations, can be applied to pretrained Transformer models without any fine-tuning of the model parameters.
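To make the question concrete, below is a minimal sketch of the dataflow described above: INT8 attention scores dequantized to FP16, a ConsMax-style normalization computed in FP16, and the result requantized to INT8. The specific formula used here, replacing the per-row max with a learnable offset beta and the per-row denominator sum with a learnable scale gamma, is our assumed reading of Liu et al. (2024) and is not given in this excerpt; the function name, scales, and the "oracle" calibration of beta and gamma are hypothetical illustrations, since without fine-tuning these constants would have to be calibrated rather than learned.

```python
import numpy as np

def consmax_int8(x_q, scale_in, beta, gamma, scale_out):
    """ConsMax-style softmax replacement with INT8 I/O and FP16 internals.

    Assumption: beta and gamma are constants standing in for the per-row max
    and the per-row denominator of an exact softmax. In a pretrained model
    used without fine-tuning, they would need to be calibrated from data.
    """
    x = x_q.astype(np.float16) * np.float16(scale_in)        # dequantize INT8 -> FP16
    y = np.exp(x - np.float16(beta)) / np.float16(gamma)     # FP16 internal computation
    y_q = np.clip(np.rint(y / scale_out), -128, 127)         # requantize FP16 -> INT8
    return y_q.astype(np.int8)

# Toy row of attention scores, quantized to INT8 with a per-tensor scale.
rng = np.random.default_rng(0)
scores = rng.normal(size=8).astype(np.float32)
scale_in = np.abs(scores).max() / 127.0
x_q = np.clip(np.rint(scores / scale_in), -128, 127).astype(np.int8)

# Hypothetical "oracle" calibration: plug in the true max and denominator.
beta = scores.max()
gamma = np.exp(scores - beta).sum()
scale_out = 1.0 / 127.0

y_q = consmax_int8(x_q, scale_in, beta, gamma, scale_out)
softmax_ref = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()
print("ConsMax (dequantized):", y_q.astype(np.float32) * scale_out)
print("Exact softmax:        ", softmax_ref)
```

The sketch highlights the crux of the question: when beta and gamma are learned jointly with the model (as in training from scratch), the rest of the network can adapt to outputs that no longer sum exactly to 1; with a frozen pretrained model, any mismatch between the calibrated constants and the true softmax statistics propagates directly into the attention weights.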
References
Although Liu et al. achieve convergence to the same perplexity as the original GPT-2 during training, it remains unclear whether this approach can be applied without fine-tuning.
— VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers
(Wang et al., 15 Apr 2025, arXiv:2504.11227), Section 7: Comparison with the State-of-the-Art