A Mathematical Theory for Learning Semantic Languages by Abstract Learners (2404.07009v3)

Published 10 Apr 2024 in cs.CL, cs.IT, cs.LG, and math.IT

Abstract: Recent advances in LLMs have demonstrated the emergence of capabilities (learned skills) when the number of system parameters and the size of training data surpass certain thresholds. The exact mechanisms behind such phenomena are not fully understood and remain a topic of active research. Inspired by the skill-text bipartite graph model proposed by Arora and Goyal for modeling semantic languages, we develop a mathematical theory to explain the emergence of learned skills, taking the learning (or training) process into account. Our approach models the learning process for skills in the skill-text bipartite graph as an iterative decoding process in Low-Density Parity Check (LDPC) codes and Irregular Repetition Slotted ALOHA (IRSA). Using density evolution analysis, we demonstrate the emergence of learned skills when the ratio of the number of training texts to the number of skills exceeds a certain threshold. Our analysis also yields a scaling law for testing errors relative to this ratio. Upon completion of the training, the association of learned skills can also be acquired to form a skill association graph. We use site percolation analysis to derive the conditions for the existence of a giant component in the skill association graph. Our analysis can also be extended to the setting with a hierarchy of skills, where a fine-tuned model is built upon a foundation model. It is also applicable to the setting with multiple classes of skills and texts. As an important application, we propose a method for semantic compression and discuss its connections to semantic communication.
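
The density-evolution argument in the abstract can be made concrete with a toy computation. The Python sketch below is our own illustration under assumed Poisson degree distributions, not the paper's exact recursion: the parameters r (the text-to-skill ratio) and c (the mean number of skills per text) are hypothetical, and the site-percolation check at the end uses the standard random-graph condition rather than the paper's derivation.

from math import exp

# Minimal sketch (our assumptions, not the paper's exact model): density
# evolution for iterative "peeling" decoding on a random skill-text
# bipartite graph with Poisson degrees, in the spirit of the LDPC/IRSA
# analyses cited below (Gallager [14], Liva [17], Luby et al. [23]).
#   r -- ratio of training texts to skills (the control parameter)
#   c -- mean number of skills attached to each text
# A text "teaches" a skill once all of its other skills are learned,
# mirroring one erasure-peeling step of an LDPC/IRSA decoder.

def learned_fraction(r: float, c: float, iters: int = 500) -> float:
    """Iterate the probability q that a skill remains unlearned."""
    q = 1.0  # every skill starts out unlearned
    for _ in range(iters):
        # A text helps iff its other Poisson(c) skills are all learned.
        p_teach = exp(-c * q)
        # A skill stays unlearned iff none of its Poisson(r*c) texts helps.
        q = exp(-r * c * p_teach)
    return 1.0 - q

if __name__ == "__main__":
    d_assoc = 2.5  # assumed mean degree of the skill association graph
    for r in (0.25, 0.5, 1.0, 2.0, 4.0):
        f = learned_fraction(r, c=3.0)
        # Site percolation on a random graph: a giant component of learned
        # skills exists roughly when f * d_assoc > 1.
        giant = "yes" if f * d_assoc > 1.0 else "no"
        print(f"r = {r:4.2f}: learned fraction ~ {f:.4f}, giant component: {giant}")

With these toy parameters the learned fraction jumps from a few percent at r = 0.25 to nearly 1 by r = 2, the kind of threshold behavior the paper derives via density evolution; the residual 1 - f plays the role of the testing error in its scaling law.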

References (33)
  1. S. Arora and A. Goyal, “A theory for emergence of complex skills in language models,” arXiv preprint arXiv:2307.15936, 2023.
  2. OpenAI, “GPT-4 technical report,” https://cdn.openai.com/papers/gpt-4.pdf, 2023.
  3. S. Pichai and D. Hassabis, “Introducing Gemini: our largest and most capable AI model,” Google, retrieved Dec. 2023.
  4. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  5. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
  6. J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” arXiv preprint arXiv:2001.08361, 2020.
  7. J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. d. L. Casas, L. A. Hendricks, J. Welbl, A. Clark et al., “Training compute-optimal large language models,” arXiv preprint arXiv:2203.15556, 2022.
  8. D. Ganguli, D. Hernandez, L. Lovitt, A. Askell, Y. Bai, A. Chen, T. Conerly, N. Dassarma, D. Drain, N. Elhage et al., “Predictability and surprise in large generative models,” in 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 1747–1764.
  9. J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler et al., “Emergent abilities of large language models,” arXiv preprint arXiv:2206.07682, 2022.
  10. J. Wei, Y. Tay, and Q. V. Le, “Inverse scaling can become u-shaped,” arXiv preprint arXiv:2211.02011, 2022.
  11. C.-S. Chang, “A simple explanation for the phase transition in large language models with list decoding,” arXiv preprint arXiv:2303.13112, 2023.
  12. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  13. H. Ramsauer, B. Schäfl, J. Lehner, P. Seidl, M. Widrich, T. Adler, L. Gruber, M. Holzleitner, M. Pavlović, G. K. Sandve et al., “Hopfield networks is all you need,” arXiv preprint arXiv:2008.02217, 2020.
  14. R. Gallager, “Low-density parity-check codes,” IRE Transactions on Information Theory, vol. 8, no. 1, pp. 21–28, 1962.
  15. M. A. Shokrollahi, “New sequences of linear time erasure codes approaching the channel capacity,” in International Symposium on Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes. Springer, 1999, pp. 65–76.
  16. T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619–637, 2001.
  17. G. Liva, “Graph-based analysis and optimization of contention resolution diversity slotted ALOHA,” IEEE Transactions on Communications, vol. 59, no. 2, pp. 477–487, 2011.
  18. K. R. Narayanan and H. D. Pfister, “Iterative collision resolution for slotted ALOHA: An optimal uncoordinated transmission policy,” in 2012 7th International Symposium on Turbo Codes and Iterative Information Processing (ISTC). IEEE, 2012, pp. 136–139.
  19. E. Paolini, G. Liva, and M. Chiani, “Random access on graphs: A survey and new results,” in 2012 Conference Record of the Forty-Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR). IEEE, 2012, pp. 1743–1747.
  20. D. Jakovetić, D. Bajović, D. Vukobratović, and V. Crnojević, “Cooperative slotted ALOHA for multi-base station systems,” IEEE Transactions on Communications, vol. 63, no. 4, pp. 1443–1456, 2015.
  21. Č. Stefanović and D. Vukobratović, “Coded random access,” in Network Coding and Subspace Designs. Springer, 2018, pp. 339–359.
  22. Y.-H. Chiang, Y.-J. Lin, C.-S. Chang, and Y.-W. P. Hong, “Parallel decoding of IRSA with noise,” in 2022 IEEE 33rd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). IEEE, 2022, pp. 320–326.
  23. M. Luby, M. Mitzenmacher, and M. A. Shokrollahi, “Analysis of random processes via and-or tree evaluation,” in SODA, vol. 98, 1998, pp. 364–373.
  24. M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman, “Analysis of low density codes and improved designs using irregular graphs,” in Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, 1998, pp. 249–258.
  25. T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 599–618, 2001.
  26. C.-H. Yu, L. Huang, C.-S. Chang, and D.-S. Lee, “Poisson receivers: a probabilistic framework for analyzing coded random access,” IEEE/ACM Transactions on Networking, vol. 29, no. 2, pp. 862–875, 2021.
  27. T.-H. Liu, C.-H. Yu, Y.-J. Lin, C.-M. Chang, C.-S. Chang, and D.-S. Lee, “ALOHA receivers: a network calculus approach for analyzing coded multiple access with SIC,” IEEE/ACM Transactions on Networking, vol. 29, no. 2, pp. 862–875, 2021.
  28. C.-M. Chang, Y.-J. Lin, C.-S. Chang, and D.-S. Lee, “On the stability regions of coded Poisson receivers with multiple classes of users and receivers,” IEEE/ACM Transactions on Networking, vol. 31, no. 1, pp. 234–247, 2022.
  29. W. Weaver, “Recent contributions to the mathematical theory of communication,” ETC: A Review of General Semantics, pp. 261–281, 1953.
  30. H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Transactions on Signal Processing, vol. 69, pp. 2663–2675, 2021.
  31. Q. Zhou, R. Li, Z. Zhao, C. Peng, and H. Zhang, “Semantic communication with adaptive universal transformer,” IEEE Wireless Communications Letters, vol. 11, no. 3, pp. 453–457, 2022.
  32. Q. Hu, G. Zhang, Z. Qin, Y. Cai, G. Yu, and G. Y. Li, “Robust semantic communications with masked VQ-VAE enabled codebook,” IEEE Transactions on Wireless Communications, early access, 2023.
  33. I. Sutskever, “A theory of unsupervised learning,” YouTube. [Online]. Available: https://www.youtube.com/watch?v=AKMuA_TVz3A