
BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering (2404.03528v3)

Published 4 Apr 2024 in cs.CL, cs.IR, cs.LG, cs.NE, and cs.SI

Abstract: Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications because they link related entities and provide context-rich information, supporting efficient information retrieval and knowledge discovery. Despite being widely spoken, Bangla is relatively underrepresented in KGs due to a lack of comprehensive datasets, encoders, named entity recognition (NER) models, part-of-speech (POS) taggers, and lemmatizers, hindering efficient information processing and reasoning in the language. Addressing this scarcity, we propose BanglaAutoKG, a pioneering framework that automatically constructs Bangla KGs from any Bangla text. We utilize multilingual LLMs to understand various languages and correlate entities and relations universally. By employing a translation dictionary to identify English equivalents and extracting word features from pre-trained BERT models, we construct the foundational KG. To reduce noise and align word embeddings with our goal, we employ graph-based polynomial filters. Finally, we implement a GNN-based semantic filter that elevates contextual understanding and trims unnecessary edges, yielding the definitive KG. Empirical findings and case studies demonstrate the effectiveness of our model, which autonomously constructs semantically enriched KGs from any text.
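The two filtering stages the abstract describes can be sketched in code. The following is a minimal illustration, not the paper's implementation: the polynomial coefficients, the similarity threshold, and the use of a plain cosine-similarity cutoff in place of the trained GNN semantic filter are all assumptions made for demonstration.

```python
import numpy as np

def polynomial_graph_filter(A, X, coeffs=(1.0, -0.5, 0.1)):
    """Apply a polynomial filter h(L) = sum_k c_k * L^k of the symmetric
    normalized Laplacian to node features X, smoothing embeddings over the
    graph. The coefficients here are illustrative, not the paper's."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.clip(deg, 1e-12, None))
    # L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    out = np.zeros_like(X)
    Lk = np.eye(A.shape[0])  # L^0
    for c in coeffs:
        out += c * (Lk @ X)
        Lk = Lk @ L
    return out

def prune_edges(A, X, threshold=0.5):
    """Drop edges whose endpoint embeddings have cosine similarity below
    `threshold` -- a simple stand-in for the paper's learned GNN filter."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    sim = Xn @ Xn.T
    return np.where((A > 0) & (sim >= threshold), A, 0.0)

# Toy KG: 4 entities; edge (0, 3) links two semantically distant nodes.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.8, 0.2],
              [-0.1, 1.0]])  # node 3's embedding points elsewhere

Xf = polynomial_graph_filter(A, X)     # denoise/smooth embeddings
A_clean = prune_edges(A, Xf, 0.5)      # trim low-similarity edges
# The spurious edge (0, 3) is removed; the coherent cluster survives.
```

In the paper the edge-trimming step is a trained GNN that scores edges from context, rather than a fixed cosine cutoff; the sketch only shows where each stage sits in the pipeline (filter embeddings first, then prune the graph against them).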


