Improving Address Matching using Siamese Transformer Networks

Published 5 Jul 2023 in cs.LG and cs.IR | (2307.02300v1)

Abstract: Matching addresses is a critical task for companies and post offices involved in the processing and delivery of packages. Delivering a package to the wrong recipient has numerous consequences, ranging from harm to the company's reputation to economic and environmental costs. This research introduces a deep learning-based model designed to increase the efficiency of address matching for Portuguese addresses. The model comprises two parts: (i) a bi-encoder, fine-tuned to create meaningful embeddings of Portuguese postal addresses, which is used to retrieve the 10 most likely matches for an unnormalized target address from a normalized database, and (ii) a cross-encoder, fine-tuned to accurately rerank the 10 addresses retrieved by the bi-encoder. The model has been tested on a real-case scenario of Portuguese addresses and achieves an accuracy exceeding 95% at the door level. With GPU computation, inference is about 4.5 times faster than traditional approaches such as BM25. Implementing this system in a real-world scenario would substantially increase the effectiveness of the distribution process. Such an implementation is currently under investigation.
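
The two-stage design described in the abstract (retrieve a shortlist with a bi-encoder, then rerank it with a cross-encoder) can be sketched as follows. This is a minimal illustration of the pipeline shape only, not the paper's implementation: the fine-tuned transformer models are replaced by simple stand-ins (character-trigram vectors for the bi-encoder, `difflib.SequenceMatcher` for the cross-encoder), and the function names and sample addresses are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(text, n=3):
    """Stand-in for the bi-encoder: encode ONE address as a trigram count vector."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_score(query, candidate):
    """Stand-in for the cross-encoder: score the (query, candidate) PAIR jointly."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def match_address(query, database, db_vectors, top_k=10):
    """Retrieve top_k candidates by embedding similarity, then rerank them pairwise."""
    q_vec = embed(query)
    # Stage 1: cheap retrieval against precomputed database embeddings.
    ranked = sorted(range(len(database)), key=lambda i: cosine(q_vec, db_vectors[i]), reverse=True)
    shortlist = [database[i] for i in ranked[:top_k]]
    # Stage 2: more expensive pairwise scoring, applied only to the shortlist.
    return sorted(shortlist, key=lambda c: pair_score(query, c), reverse=True)


# Hypothetical normalized database; embeddings are computed once, up front.
DATABASE = [
    "Rua Augusta 100, 1100-053 Lisboa",
    "Rua Augusta 250, 1100-053 Lisboa",
    "Avenida da Liberdade 100, 1250-096 Lisboa",
    "Rua de Santa Catarina 100, 4000-442 Porto",
]
DB_VECTORS = [embed(address) for address in DATABASE]

QUERY = "r. augusta 100 lisboa"  # unnormalized target address
results = match_address(QUERY, DATABASE, DB_VECTORS, top_k=3)
print(results)
```

The split matters for cost: database embeddings are computed once and compared cheaply, while the pairwise scorer runs only on the `top_k` shortlist (10 in the paper) rather than on the whole database.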
