
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications (2211.04054v2)

Published 8 Nov 2022 in cs.CL, cs.AI, cs.SD, and eess.AS

Abstract: Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots via very-high-frequency radio channels. To incorporate these novel technologies into ATC, a low-resource domain, large-scale annotated datasets are required to develop data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to a lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, a signal-to-noise ratio estimate and an English language detection score per sample. Both are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus, which we are offering for free at https://www.atco2.org/data. We expect the ATCO2 corpus will foster research on robust ASR and NLU not only in the field of ATC communications but also in the general research community.
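The abstract notes that each sample in the ATCO2-PL-set carries a signal-to-noise ratio estimate and an English language detection score. As a minimal sketch of how such per-sample metadata might be used to select cleaner pseudo-labeled data, the Python below filters records by two thresholds. The `Sample` record, its field names, the score range and the threshold values are all assumptions made for illustration; they are not the corpus's actual file format or the authors' selection procedure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Sample:
    """Hypothetical record for one pseudo-labeled utterance (field names are assumed)."""
    utterance_id: str
    transcript: str        # automatic transcript from the in-domain recognizer
    snr_db: float          # signal-to-noise ratio estimate, assumed to be in dB
    english_score: float   # English language detection score, assumed in [0, 1]


def select_samples(samples: Iterable[Sample],
                   min_snr_db: float = 10.0,
                   min_english_score: float = 0.5) -> List[Sample]:
    """Keep samples whose SNR and English score both meet the (illustrative) thresholds."""
    return [
        s for s in samples
        if s.snr_db >= min_snr_db and s.english_score >= min_english_score
    ]


if __name__ == "__main__":
    # Made-up records, used only to exercise the filter.
    demo = [
        Sample("utt-1", "lufthansa four five six descend", 15.0, 0.9),
        Sample("utt-2", "noisy segment", 3.0, 0.9),
        Sample("utt-3", "non english segment", 20.0, 0.2),
    ]
    for sample in select_samples(demo):
        print(sample.utterance_id, sample.transcript)
```

Suitable thresholds would depend on the downstream task: stricter values trade training data volume for transcript quality.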
