YOCO++: Enhancing YOCO with KV Residual Connections for Efficient LLM Inference

Published 15 Apr 2026 in cs.CL | (2604.13556v1)

Abstract: Cross-layer key-value (KV) compression has proven effective for efficient LLM inference. Although such methods reduce the memory consumption of the KV cache, they usually introduce non-negligible performance degradation. In this work, we aim to improve the performance of YOCO, a cross-layer KV compression method that shares the KVs of the middle layer with the top-half layers. We propose YOCO++, an enhanced YOCO that incorporates a weighted residual connection between the KVs of each bottom-half layer and the bottom layer. Compared to YOCO, YOCO++ increases model capacity while maintaining the same training and inference efficiency. Our experiments show that YOCO++ achieves state-of-the-art performance among cross-layer KV compression methods at a 50% KV cache compression rate and outperforms the standard Transformer.
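To make the cache layout concrete, here is a toy NumPy sketch of one possible reading of the abstract. It is not the authors' implementation. Several details are assumptions: the residual form K_l + alpha_l * K_0 (and likewise for V); the hypothetical names `w_k`, `w_v`, `alpha`, and `build_kv_cache`; treating the last bottom-half layer as the "middle layer"; and feeding the same input to every layer's projection for brevity. The sketch shows why a 50% compression rate follows when only the bottom half caches KVs. It also shows why the residual need not enlarge the cache under this reading: the bottom layer's KVs are already stored.

```python
import numpy as np

rng = np.random.default_rng(0)

num_layers = 8          # hypothetical depth; the top half reuses the middle layer's KVs
seq_len, d_head = 16, 32

# Hypothetical per-layer K/V projection weights (stand-ins for real attention layers).
w_k = rng.standard_normal((num_layers, d_head, d_head)) / np.sqrt(d_head)
w_v = rng.standard_normal((num_layers, d_head, d_head)) / np.sqrt(d_head)
# Hypothetical learnable residual weights, one scalar per layer (assumed form).
alpha = np.full(num_layers, 0.5)

def build_kv_cache(hidden):
    """Return the (K, V) pair each layer attends over and the number of cached pairs.

    Simplification: every layer projects the same `hidden` input.
    """
    half = num_layers // 2
    cached = []                      # KV pairs that actually occupy cache memory
    k0 = v0 = None
    for layer in range(half):
        k = hidden @ w_k[layer]
        v = hidden @ w_v[layer]
        if layer == 0:
            k0, v0 = k, v            # bottom layer's KVs, cached anyway
        else:
            # Assumed weighted residual from the bottom layer's KVs:
            # costs no extra cache because k0 and v0 are already stored.
            k = k + alpha[layer] * k0
            v = v + alpha[layer] * v0
        cached.append((k, v))
    # YOCO-style sharing: every top-half layer reuses the middle layer's KVs.
    shared = cached[-1]
    per_layer = cached + [shared] * (num_layers - half)
    return per_layer, len(cached)

hidden = rng.standard_normal((seq_len, d_head))
per_layer, n_cached = build_kv_cache(hidden)
print(len(per_layer), n_cached, n_cached / num_layers)  # 8 4 0.5
```

Only half of the layers store KVs here, so the cache fraction is 0.5, matching the 50% compression rate stated in the abstract. The scalar weights in `alpha` are the only added parameters in this toy version. That is one way extra capacity could be introduced without changing cache size, though the paper's exact parameterization may differ.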