Improving Address Matching using Siamese Transformer Networks
Abstract: Address matching is a critical task for companies and post offices that process and deliver packages. Delivering a package to the wrong recipient has numerous consequences, ranging from harm to the company's reputation to economic and environmental costs. This research introduces a deep learning-based model designed to increase the efficiency of address matching for Portuguese addresses. The model comprises two parts: (i) a bi-encoder, fine-tuned to create meaningful embeddings of Portuguese postal addresses and used to retrieve the 10 most likely matches for an unnormalized target address from a normalized database, and (ii) a cross-encoder, fine-tuned to accurately rerank the 10 addresses retrieved by the bi-encoder. The model has been tested on a real-case scenario of Portuguese addresses and achieves an accuracy exceeding 95% at the door level. With GPU computation, inference is about 4.5 times faster than traditional approaches such as BM25. Deploying this system in a real-world scenario would substantially increase the effectiveness of the distribution process; such a deployment is currently under investigation.
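The two-stage design described in the abstract (retrieve the top 10 candidates with a bi-encoder, then rerank them with a cross-encoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the fine-tuned transformer models are replaced by toy stand-ins so the example runs with the Python standard library alone. The names `embed`, `pair_score`, and `match_address`, the character n-gram embedding, the `difflib` pair scorer, and the sample addresses are all assumptions made for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from typing import List


def embed(address: str, n: int = 3) -> Counter:
    """Toy stand-in for the bi-encoder (assumption): a bag of character n-grams.

    Like a bi-encoder, it encodes each address independently, so database
    vectors can be computed once and reused for every query.
    """
    text = f" {address.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_score(query: str, candidate: str) -> float:
    """Toy stand-in for the cross-encoder (assumption): scores the pair jointly.

    It needs both strings at once, so it is applied only to the short list
    of retrieved candidates rather than to the whole database.
    """
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def match_address(query: str, database: List[str],
                  db_vectors: List[Counter], top_k: int = 10) -> List[str]:
    """Retrieve top_k candidates by embedding similarity, then rerank them."""
    query_vector = embed(query)
    # Stage 1: rank every database entry by similarity to the query embedding.
    order = sorted(range(len(database)),
                   key=lambda i: cosine(query_vector, db_vectors[i]),
                   reverse=True)
    candidates = [database[i] for i in order[:top_k]]
    # Stage 2: rerank only the retrieved candidates with the pairwise scorer.
    return sorted(candidates, key=lambda c: pair_score(query, c), reverse=True)


# Hypothetical normalized database and unnormalized query (illustrative data).
database = [
    "Rua Augusta 100, 1100-053 Lisboa",
    "Rua Augusta 10, 1100-053 Lisboa",
    "Avenida da Liberdade 200, 1250-147 Lisboa",
    "Rua de Santa Catarina 50, 4000-442 Porto",
]
db_vectors = [embed(address) for address in database]  # precomputed once

query = "r. augusta n 100 lisboa"
results = match_address(query, database, db_vectors, top_k=3)
for candidate in results:
    print(f"{pair_score(query, candidate):.3f}  {candidate}")
```

The split mirrors the trade-off the abstract relies on: the independent encoder is cheap enough to compare a query against the whole database, while the more expensive pairwise scorer only sees the handful of candidates that survive retrieval.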