Vygotsky Distance: Measure for Benchmark Task Similarity (2402.14890v2)
Abstract: Evaluation plays a significant role in modern natural language processing. Most modern NLP benchmarks consist of arbitrary sets of tasks that neither guarantee any generalization potential for a model applied outside the test set nor try to minimize the resources consumed by evaluation. This paper presents a theoretical instrument and a practical algorithm for calculating the similarity between benchmark tasks; we call this similarity measure the "Vygotsky distance". Its core idea is that similarity is based on the relative performance of the "students" on a given task rather than on the properties of the task itself. If two tasks are close to each other in terms of Vygotsky distance, models tend to have similar relative performance on them. Thus, knowing the Vygotsky distance between tasks, one can significantly reduce the number of evaluation tasks while maintaining high validation quality. Experiments on various benchmarks, including GLUE, SuperGLUE, CLUE, and RussianSuperGLUE, demonstrate that the vast majority of NLP benchmarks could be at least 40% smaller in terms of the tasks they include. Most importantly, the Vygotsky distance can also be used to validate new tasks, thus increasing the generalization potential of future NLP models.
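The abstract's core idea, that two tasks are similar when models (the "students") rank similarly on them, can be illustrated with a rank-correlation distance over a leaderboard. This is a minimal sketch assuming a normalized Kendall tau distance between model orderings; the paper's exact definition of Vygotsky distance may differ, and the toy leaderboard values are invented for illustration.

```python
from itertools import combinations


def rank_distance(scores_a, scores_b):
    """Normalized Kendall tau distance between model rankings on two tasks.

    scores_a[i] and scores_b[i] are the scores of model i on tasks A and B.
    Returns 0.0 when the two tasks order the models identically and 1.0
    when the orderings are fully reversed.
    """
    n = len(scores_a)
    assert n == len(scores_b) and n >= 2
    discordant, pairs = 0, 0
    for i, j in combinations(range(n), 2):
        pairs += 1
        # A pair of models is discordant if the two tasks order it differently.
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0:
            discordant += 1
    return discordant / pairs


# Toy leaderboard: each column is one task, each position is one model.
leaderboard = {
    "task1": [0.91, 0.85, 0.80, 0.60],
    "task2": [0.88, 0.83, 0.79, 0.55],  # same model ordering as task1
    "task3": [0.50, 0.70, 0.90, 0.95],  # reversed model ordering
}

d12 = rank_distance(leaderboard["task1"], leaderboard["task2"])  # 0.0
d13 = rank_distance(leaderboard["task1"], leaderboard["task3"])  # 1.0
```

Under this reading, `task2` is redundant given `task1` (identical model ranking), which is the mechanism by which a benchmark could shed tasks while preserving its ability to rank models.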
- Akiko Aizawa. 2003. An information-theoretic perspective of tf–idf measures. Information Processing & Management, 39(1):45–65.
- Rohan Anil et al. 2023. PaLM 2 technical report. arXiv preprint arXiv:2305.10403.
- Teaching by examples: Implications for the process of category acquisition. The Quarterly Journal of Experimental Psychology Section A, 50(3):586–606.
- A unifying framework for complexity measures of finite systems. In Proceedings of ECCS, volume 6. Citeseer.
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015.
- METRO: Efficient denoising pretraining of large-scale autoencoding language models with model-generated signals. arXiv preprint arXiv:2204.06644.
- Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48.
- Christopher M. Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press.
- Tom Brown et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.
- Wei-Lin Chiang et al. 2023. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality.
- Aakanksha Chowdhery et al. 2022. PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311.
- Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning, 20(3):273–297.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Jeffrey L Elman. 1993. Learning and development in neural networks: The importance of starting small. Cognition, 48(1):71–99.
- Tom Kocmi and Ondřej Bojar. 2017. Curriculum learning and minibatch bucketing in neural machine translation. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 379–386, Varna, Bulgaria. INCOMA Ltd.
- HPC resources of the Higher School of Economics. In Journal of Physics: Conference Series, volume 1740, page 012050. IOP Publishing.
- M. Zakaria Kurdi. 2020. Text complexity classification based on linguistic information: Application to intelligent tutoring of ESL. Journal of Data Mining & Digital Humanities, 2020.
- Henry W Lin and Max Tegmark. 2017. Critical behavior in physics and probabilistic formal languages. Entropy, 19(7):299.
- Sanmit Narvekar. 2017. Curriculum learning in reinforcement learning. In IJCAI, pages 5195–5196.
- Sanmit Narvekar et al. 2020. Curriculum learning for reinforcement learning domains: A framework and survey. Journal of Machine Learning Research, 21(181):1–50.
- Emmanouil Antonios Platanios et al. 2019. Competence-based curriculum learning for neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1162–1172, Minneapolis, Minnesota. Association for Computational Linguistics.
- Colin Raffel et al. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21:1–67.
- Lev Semyonovich Vygotsky. 1978. Mind in Society: The Development of Higher Psychological Processes. Harvard University Press.
- Tatiana Shavrina et al. 2020. RussianSuperGLUE: A Russian language understanding evaluation benchmark. arXiv preprint arXiv:2010.15925.
- Petru Soviany. 2020. Curriculum learning with diversity for supervised computer vision tasks. In MRC@ECAI.
- Noisy text data: Achilles' heel of BERT. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 16–21.
- Rohan Taori et al. 2023. Alpaca: A strong, replicable instruction-following model. Stanford Center for Research on Foundation Models. https://crfm.stanford.edu/2023/03/13/alpaca.html.
- Giulio Tononi, Olaf Sporns, and Gerald M. Edelman. 1994. A measure for brain complexity: relating functional segregation and integration in the nervous system. Proceedings of the National Academy of Sciences, 91(11):5033–5037.
- Frans van der Sluis and Egon L van den Broek. 2010. Using complexity measures in information retrieval. In Proceedings of the third symposium on information interaction in context, pages 383–388.
- Ashish Vaswani et al. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
- Alex Wang et al. 2019. SuperGLUE: A stickier benchmark for general-purpose language understanding systems. Advances in Neural Information Processing Systems, 32.
- Alex Wang et al. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.
- A fully progressive approach to single-image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 864–873.
- NEZHA: Neural contextualized representation for Chinese language understanding. arXiv preprint arXiv:1909.00204.
- Christopher K. Williams and Carl Edward Rasmussen. 2006. Gaussian Processes for Machine Learning, volume 2. MIT Press, Cambridge, MA.
- Thomas Wolf et al. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.
- When do curricula work? In International Conference on Learning Representations.
- Curriculum learning for natural language understanding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6095–6104.
- Can Xu et al. 2023. WizardLM: Empowering large language models to follow complex instructions. arXiv preprint arXiv:2304.12244.
- Liang Xu et al. 2020. CLUE: A Chinese language understanding evaluation benchmark. arXiv preprint arXiv:2004.05986.
- Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, pages 649–657.
- An empirical exploration of curriculum learning for neural machine translation. arXiv preprint arXiv:1811.00739.
- Barret Zoph et al. 2022. Designing effective sparse expert models. arXiv preprint arXiv:2202.08906.