
Can LLM Substitute Human Labeling? A Case Study of Fine-grained Chinese Address Entity Recognition Dataset for UAV Delivery (2403.06097v2)

Published 10 Mar 2024 in cs.CL, cs.AI, and cs.IR

Abstract: We present CNER-UAV, a fine-grained Chinese Named Entity Recognition (NER) dataset designed for the task of address resolution in Unmanned Aerial Vehicle (UAV) delivery systems. The dataset spans five entity categories, enabling comprehensive training and evaluation of NER models. To construct it, we sourced data from a real-world UAV delivery system and applied a rigorous cleaning and desensitization process to protect privacy and preserve data integrity. The resulting dataset of around 12,000 annotated samples was labeled by both human experts and a Large Language Model (LLM). We evaluate classical NER models on the dataset and provide an in-depth analysis. The dataset and models are publicly available at https://github.com/zhhvvv/CNER-UAV.
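
The abstract frames the task as fine-grained, character-level address NER over five entity categories. As a concrete illustration, the Python sketch below shows BIO encoding for such a task. The category names (Building, Unit, Floor, Room, Other) and the example spans are assumptions for illustration only, not the dataset's actual label scheme; consult the CNER-UAV repository for that.

    from typing import List, Tuple

    # Hypothetical fine-grained address categories; the real five-category
    # scheme is defined in the CNER-UAV repository.
    CATEGORIES = ["Building", "Unit", "Floor", "Room", "Other"]

    def bio_encode(chars: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
        """Convert (start, end, category) spans into per-character BIO tags."""
        tags = ["O"] * len(chars)
        for start, end, category in spans:
            tags[start] = "B-" + category          # first character of the entity
            for i in range(start + 1, end):
                tags[i] = "I-" + category          # remaining characters
        return tags

    # Example: "3栋2单元101" -> building "3栋", unit "2单元", room "101".
    address = list("3栋2单元101")
    spans = [(0, 2, "Building"), (2, 5, "Unit"), (5, 8, "Room")]
    for char, tag in zip(address, bio_encode(address, spans)):
        print(char, tag)

The paper's central question is whether an LLM can substitute human labeling. A hedged sketch of what prompt-based annotation could look like follows; the prompt wording, the JSON output format, and the call_llm helper are hypothetical stand-ins, not the paper's actual annotation protocol.

    import json

    # Hypothetical prompt; the paper's real prompt and label set may differ.
    PROMPT_TEMPLATE = (
        "Label each entity in the following Chinese delivery address. "
        "Return a JSON list of objects with keys 'text', 'start', 'end', and "
        "'category' (one of Building, Unit, Floor, Room, Other).\n"
        "Address: {address}"
    )

    def call_llm(prompt: str) -> str:
        """Hypothetical placeholder; wire this to any chat-completion client."""
        raise NotImplementedError

    def annotate_address(address: str) -> list:
        """Ask the LLM for entity spans and parse its JSON reply."""
        reply = call_llm(PROMPT_TEMPLATE.format(address=address))
        return json.loads(reply)

Annotations produced this way can then be compared span-by-span against expert labels, which is the kind of human-versus-LLM comparison the abstract describes.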
