Lower bound on CAT compressor depth
Determine the minimum transformer depth at which the CAT compressor still produces chunk representations that preserve downstream performance, and in particular whether a single-layer compressor is enough to compress chunks of tokens accurately.
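One way to make "preserve downstream performance" operational is a depth sweep with a tolerance: train a compressor at each candidate depth, then pick the shallowest one whose evaluation loss stays within a chosen margin of a reference configuration. The sketch below illustrates only that selection step. The function name, the lower-is-better metric, and the tolerance criterion are assumptions made for illustration and are not taken from the cited paper. The numbers in the usage example are synthetic placeholders, not reported results.

```python
from typing import Mapping, Optional


def minimal_sufficient_depth(
    loss_by_depth: Mapping[int, float],
    reference_loss: float,
    tolerance: float,
) -> Optional[int]:
    """Return the shallowest evaluated depth within `tolerance` of the reference.

    Hypothetical helper: assumes a lower-is-better metric (e.g. evaluation
    loss) measured once per compressor depth. Returns None when no evaluated
    depth meets the criterion.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    # Scan depths from shallowest to deepest and stop at the first that passes.
    for depth in sorted(loss_by_depth):
        if loss_by_depth[depth] - reference_loss <= tolerance:
            return depth
    return None


# Synthetic placeholder values, for illustration only (not from the paper).
example = {1: 3.10, 2: 2.95, 4: 2.91}
print(minimal_sufficient_depth(example, reference_loss=2.90, tolerance=0.06))  # 2
```

A sweep like this yields only an empirical estimate over the depths actually trained. It does not establish a theoretical lower bound, and the answer depends on the chosen tolerance, chunk size, and downstream metric.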
References
However, what is the limit, and can one go to even a 1 layer of compressor is an interesting question to ask. There might be some lower bound on the compressor depth to start compressing chunks of tokens, but we leave this to future work.
— Attention and Compression is all you need for Controllably Efficient Language Models
(2511.05313 - Prakash et al., 7 Nov 2025) in Appendix, Section "Ablation on depth of the compressor"