PET: An Annotated Dataset for Process Extraction from Natural Language Text (2203.04860v2)

Published 9 Mar 2022 in cs.CL

Abstract: Process extraction from text is an important task in process discovery, and various approaches to it have been developed in recent years. However, in contrast to other information extraction tasks, there is a lack of gold-standard corpora of business process descriptions that are carefully annotated with all the entities and relationships of interest. As a result, it is currently hard to compare the results of extraction approaches objectively, and the lack of annotated texts also prevents the application of the data-driven information extraction methods typical of the natural language processing field. To bridge this gap, we present the PET dataset, a first corpus of business process descriptions annotated with activities, gateways, actors, and flow information. We describe this new resource and provide a variety of baselines to benchmark the difficulty and challenges of business process extraction from text. PET can be accessed via huggingface.co/datasets/patriziobellan/PET

Authors (5)
  1. Patrizio Bellan
  2. Han van der Aa
  3. Mauro Dragoni
  4. Chiara Ghidini
  5. Simone Paolo Ponzetto