Recurrent Alignment with Hard Attention for Hierarchical Text Rating (2402.08874v2)

Published 14 Feb 2024 in cs.CL

Abstract: While LLMs excel at understanding and generating plain text, they are not tailored to handle hierarchical text structures or directly predict task-specific properties such as text rating. In fact, selectively and repeatedly grasping the hierarchical structure of large-scale text is pivotal for deciphering its essence. To this end, we propose a novel framework for hierarchical text rating utilizing LLMs, which incorporates Recurrent Alignment with Hard Attention (RAHA). In particular, the hard attention mechanism prompts a frozen LLM to selectively focus on pertinent leaf texts associated with the root text and generate symbolic representations of their relationships. Inspired by the gradual stabilization of a Markov chain, the recurrent alignment strategy feeds predicted ratings iteratively back into the prompts of another trainable LLM, aligning it to progressively approximate the desired target. Experimental results demonstrate that RAHA outperforms existing state-of-the-art methods on three hierarchical text rating datasets. Theoretical and empirical analysis confirms RAHA's ability to gradually converge towards the underlying target through multiple inferences. Additional experiments on plain text rating datasets verify the effectiveness of this Markov-like alignment. Our data and code are available at https://github.com/ECNU-Text-Computing/Markov-LLM.
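
The abstract describes two moving parts: a hard-attention step in which a frozen LLM selects the leaf texts most relevant to the root text, and a recurrent-alignment step in which the predicted rating is fed back into the prompt of a trainable LLM until successive predictions stabilize. The sketch below only illustrates that control flow under assumed interfaces; the function names, prompt format, and toy relevance/rating stand-ins are hypothetical and are not taken from the authors' repository (https://github.com/ECNU-Text-Computing/Markov-LLM).

```python
# Hedged sketch of the RAHA loop described in the abstract. The LLM calls are
# replaced by simple stand-ins so the control flow runs end to end; the real
# models and prompts live in the authors' repository.
from typing import Callable, List


def hard_attention_select(
    root_text: str,
    leaf_texts: List[str],
    relevance_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[str]:
    """Hard attention step: keep only the leaf texts judged most relevant to the root text."""
    ranked = sorted(leaf_texts, key=lambda leaf: relevance_fn(root_text, leaf), reverse=True)
    return ranked[:top_k]


def recurrent_alignment(
    root_text: str,
    selected_leaves: List[str],
    rate_fn: Callable[[str], float],
    steps: int = 4,
) -> float:
    """Recurrent alignment step: feed the previous rating back into the prompt so
    successive predictions stabilize, analogous to a Markov chain settling down."""
    rating = 0.0  # neutral starting estimate
    for _ in range(steps):
        prompt = (
            f"Root: {root_text}\n"
            f"Evidence: {' | '.join(selected_leaves)}\n"
            f"Previous rating: {rating:.2f}\n"
            "Predict the rating:"
        )
        rating = rate_fn(prompt)  # scalar prediction from the trainable LLM
    return rating


if __name__ == "__main__":
    # Toy stand-ins: relevance = shared-word count, rater nudges the rating toward 4.0.
    def toy_relevance(root: str, leaf: str) -> float:
        return float(len(set(root.split()) & set(leaf.split())))

    def toy_rater(prompt: str) -> float:
        previous = float(prompt.split("Previous rating: ")[1].split("\n")[0])
        return 0.5 * previous + 0.5 * 4.0  # fixed point at 4.0

    leaves = hard_attention_select(
        "citation impact of the root paper",
        ["leaf paper on citation impact", "unrelated leaf note"],
        toy_relevance,
    )
    print(recurrent_alignment("citation impact of the root paper", leaves, toy_rater))
```

With the toy rater above, each iteration halves the distance to a fixed rating, which mirrors the abstract's claim that repeated inferences converge toward the underlying target.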

Authors (6)
  1. Chenxi Lin (8 papers)
  2. Jiayu Ren (1 paper)
  3. Guoxiu He (15 papers)
  4. Zhuoren Jiang (24 papers)
  5. Haiyan Yu (4 papers)
  6. Xiaomin Zhu (13 papers)

