
Rule by Rule: Learning with Confidence through Vocabulary Expansion (2411.00049v1)

Published 30 Oct 2024 in cs.CL, cs.AI, and cs.LG

Abstract: In this paper, we present an innovative iterative approach to rule learning designed for (but not limited to) text-based data. Our method progressively expands the vocabulary used in each iteration, significantly reducing memory consumption. Moreover, we introduce a Value of Confidence as an indicator of the reliability of the generated rules; by leveraging it, our approach retains only the most robust and trustworthy rules, improving the overall quality of the rule learning process. We demonstrate the effectiveness of our method through extensive experiments on textual and non-textual datasets, including a use case of significant interest to the insurance industry, showcasing its potential for real-world applications.
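The abstract outlines two ideas: growing the working vocabulary iteration by iteration, and filtering candidate rules by a confidence score. The paper's exact algorithm is not reproduced here, so the following is only a toy sketch of that general shape: hypothetical word-presence rules ("word occurs → class"), a vocabulary revealed in frequency-ranked batches, and training-set precision as a crude stand-in for the paper's Value of Confidence.

```python
from collections import Counter

def confidence(rule_word, label, docs):
    """Fraction of documents containing rule_word that carry `label`.
    A simple stand-in for the paper's Value of Confidence."""
    hits = [lab for text, lab in docs if rule_word in text.split()]
    return hits.count(label) / len(hits) if hits else 0.0

def learn_rules(docs, vocab_batch=2, n_iters=3, min_conf=0.8):
    """Iteratively grow the working vocabulary in frequency-ranked
    batches and keep only rules whose confidence meets `min_conf`."""
    # Rank candidate words by document frequency; reveal them batch by batch,
    # so early iterations work with a small vocabulary (and small memory).
    freq = Counter(w for text, _ in docs for w in set(text.split()))
    ranked = [w for w, _ in freq.most_common()]
    labels = {lab for _, lab in docs}
    vocab, rules = [], {}
    for it in range(n_iters):
        vocab.extend(ranked[it * vocab_batch:(it + 1) * vocab_batch])
        for word in vocab:
            for label in labels:
                c = confidence(word, label, docs)
                if c >= min_conf:
                    rules[word] = (label, c)  # rule: word present -> label
    return rules

# Hypothetical insurance-flavored toy data, echoing the paper's use case.
docs = [
    ("claim approved payment", "pos"),
    ("claim denied fraud", "neg"),
    ("payment approved", "pos"),
    ("fraud detected claim", "neg"),
]
rules = learn_rules(docs)
# "claim" occurs in both classes, so its confidence stays below the
# threshold and no rule is kept for it; unambiguous words survive.
```

In this sketch the ambiguous word "claim" is pruned by the confidence filter while class-pure words like "fraud" or "approved" yield retained rules, illustrating how a Value of Confidence keeps only trustworthy rules.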

