Domain-independent Extraction of Scientific Concepts from Research Articles (2001.03067v1)

Published 9 Jan 2020 in cs.IR and cs.DL

Abstract: We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with different domains in our task. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.

Authors (5)
  1. Arthur Brack (7 papers)
  2. Jennifer D'Souza (49 papers)
  3. Anett Hoppe (21 papers)
  4. Sören Auer (107 papers)
  5. Ralph Ewerth (61 papers)
Citations (43)
