
Combining Deep Learning and Reasoning for Address Detection in Unstructured Text Documents (2202.03103v1)

Published 7 Feb 2022 in cs.AI, cs.IR, and cs.LG

Abstract: Extracting information from unstructured text documents is a demanding task, since these documents can have a broad variety of layouts and a non-trivial reading order, as is the case for multi-column documents or nested tables. Additionally, many business documents are received in paper form, meaning that the textual contents need to be digitized before further analysis. Nonetheless, automatic detection and capture of crucial document information such as the sender address would boost many companies' processing efficiency. In this work we propose a hybrid approach that combines deep learning with reasoning for finding and extracting addresses from unstructured text documents. We use a visual deep learning model to detect the boundaries of possible address regions on the scanned document images and validate these results by analyzing the contained text using domain knowledge represented as a rule-based system.
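
The two-stage idea in the abstract (a visual detector proposes address regions, then rules check the text inside them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Region` type, the two regular expressions, the German-style address format, and the thresholds are all hypothetical, and the detector and OCR outputs are assumed to be given as input.

```python
import re
from dataclasses import dataclass


@dataclass
class Region:
    """A candidate address region (hypothetical container type)."""
    box: tuple    # (x0, y0, x1, y1) in pixels, as a detector might report it
    score: float  # detector confidence in [0, 1]
    text: str     # OCR text found inside the box


# Hypothetical rules, assuming German-style addresses for illustration only.
POSTAL_CODE = re.compile(r"\b\d{5}\b")
STREET_LINE = re.compile(r"(straße|strasse|str\.|weg|platz|allee)\s*\d+",
                         re.IGNORECASE)


def rule_hits(text: str) -> int:
    """Count how many of the illustrative rules match the region text."""
    return sum(bool(p.search(text)) for p in (POSTAL_CODE, STREET_LINE))


def validate(regions, min_score=0.5, min_hits=2):
    """Keep regions that the detector trusts AND that satisfy the text rules."""
    return [r for r in regions
            if r.score >= min_score and rule_hits(r.text) >= min_hits]


if __name__ == "__main__":
    candidates = [
        Region((40, 60, 300, 140), 0.9,
               "Muster GmbH\nHauptstraße 12\n70569 Stuttgart"),
        Region((40, 400, 300, 430), 0.8, "Invoice no. 12345 dated 2022"),
    ]
    for region in validate(candidates):
        print(region.box, region.text.splitlines()[-1])
```

In this sketch the second candidate is rejected even though the detector is confident about it, because a five-digit number alone satisfies only one of the two rules. That is the role the abstract assigns to the rule-based validation step.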

Authors (4)
  1. Matthias Engelbach (3 papers)
  2. Dennis Klau (7 papers)
  3. Jens Drawehn (4 papers)
  4. Maximilien Kintz (7 papers)
Citations (1)
