
Large-scale investigation of weakly-supervised deep learning for the fine-grained semantic indexing of biomedical literature (2301.09350v2)

Published 23 Jan 2023 in cs.CL, cs.DL, and cs.LG

Abstract: Objective: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors, with several related but distinct biomedical concepts often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. Methods: Lacking labelled data, we rely on weak supervision based on concept occurrence in the abstract of an article, which is also enhanced by dictionary-based heuristics. In addition, we investigate deep learning approaches, making design choices to tackle the particular challenges of this task. The new method is evaluated on a large-scale retrospective scenario, based on concepts that have been promoted to descriptors. Results: In our experiments, concept occurrence was the strongest heuristic, achieving a macro-F1 score of about 0.63 across several labels. The proposed method improved it further by more than 4 percentage points. Conclusion: The results suggest that concept occurrence is a strong heuristic for refining the coarse-grained labels at the level of MeSH concepts and that the proposed method improves on it further.
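
To make the two technical ideas in the abstract concrete, the following is a minimal sketch of a concept-occurrence heuristic and of macro-averaged F1. It is not the authors' implementation: the function names, the whole-word matching rule, and the example terms are assumptions made purely for illustration, and the paper's dictionary-based heuristics and deep learning models are not represented.

```python
import re


def occurrence_label(abstract, concept_terms):
    """Weak label: True if any term of the concept occurs in the abstract.

    Assumption: case-insensitive whole-word matching. The matching rule
    actually used in the paper is not specified in the abstract.
    """
    text = abstract.lower()
    for term in concept_terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, text):
            return True
    return False


def f1(gold, pred):
    """F1 for one label, given parallel lists of booleans."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        # No true positives: F1 is 0 (also avoids division by zero).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold_by_label, pred_by_label):
    """Unweighted mean of per-label F1 scores."""
    scores = [f1(gold_by_label[label], pred_by_label[label])
              for label in gold_by_label]
    return sum(scores) / len(scores)
```

Because macro-F1 averages per-label scores without weighting by label frequency, a rare concept counts as much as a frequent one, which is why it is a natural summary when scores are reported "across several labels".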

Authors (5)
  1. Anastasios Nentidis (12 papers)
  2. Thomas Chatzopoulos (2 papers)
  3. Anastasia Krithara (13 papers)
  4. Grigorios Tsoumakas (50 papers)
  5. Georgios Paliouras (43 papers)
Citations (3)
