ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications (2211.04054v2)
Abstract: Personal assistants, automatic speech recognizers, and dialogue understanding systems are becoming increasingly important in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims to guide aircraft and control the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots over very-high-frequency (VHF) radio channels. Incorporating these technologies into ATC, a low-resource domain, requires large-scale annotated datasets for developing data-driven AI systems such as automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims to foster research in the challenging ATC field, which has lagged behind due to a lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotation of speech data, and 3) extraction of ATC-related named entities. The corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts, and a subset of it has gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, a signal-to-noise ratio estimate, and an English language detection score per sample. Both are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus, which we offer for free at https://www.atco2.org/data. We expect the ATCO2 corpus to foster research on robust ASR and NLU not only in the field of ATC communications but also in the wider research community.
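The per-sample metadata and the named-entity labels described above lend themselves to two common preprocessing steps, sketched below in Python. Treat this as an illustration under stated assumptions, not the corpus tooling. The record type `PLSample`, its field names, and the helpers `select_samples` and `spans_to_bio` are hypothetical. The SNR and English-score thresholds are arbitrary placeholders, and the score scale is assumed. The BIO tagging scheme is one standard way to frame NER as token classification, not necessarily the annotation format shipped with the corpus. The example utterance and its span boundaries are invented and may not follow the ATCO2 annotation guidelines.

```python
from dataclasses import dataclass
from typing import List, Tuple

# --- ATCO2-PL-set: metadata-based selection (hypothetical schema) ---

@dataclass
class PLSample:
    """One pseudo-labeled segment; field names are assumptions, not the corpus schema."""
    utt_id: str
    transcript: str        # automatic transcript from the in-domain ASR
    snr_db: float          # estimated signal-to-noise ratio
    english_score: float   # English language detection score (scale assumed)


def select_samples(samples: List[PLSample],
                   min_snr_db: float = 5.0,
                   min_english_score: float = 0.5) -> List[PLSample]:
    """Keep segments that are clean enough and likely to be English.

    Both thresholds are arbitrary placeholders, not values from the paper.
    """
    return [s for s in samples
            if s.snr_db >= min_snr_db and s.english_score >= min_english_score]


# --- ATCO2-test-set: entity spans to BIO tags ---

Span = Tuple[int, int, str]  # (first word, one past last word, label)
LABELS = {"callsign", "command", "value"}


def spans_to_bio(words: List[str], spans: List[Span]) -> List[str]:
    """Turn word-level entity spans into one BIO tag per word."""
    tags = ["O"] * len(words)
    for start, end, label in spans:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if not 0 <= start < end <= len(words):
            raise ValueError(f"invalid span: {(start, end)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError("overlapping spans are not supported")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


if __name__ == "__main__":
    pool = [
        PLSample("utt1", "lufthansa three two one descend flight level one two zero", 12.0, 0.9),
        PLSample("utt2", "bonjour", 15.0, 0.1),  # fails the English-score check
        PLSample("utt3", "", 1.5, 0.8),          # fails the SNR check
    ]
    print([s.utt_id for s in select_samples(pool)])  # ['utt1']

    # Invented span boundaries, for illustration only.
    words = pool[0].transcript.split()
    spans = [(0, 4, "callsign"), (4, 5, "command"), (5, 10, "value")]
    for word, tag in zip(words, spans_to_bio(words, spans)):
        print(f"{word}\t{tag}")
```

Filtering on metadata before using pseudo-labels keeps the selection step independent of the model being trained. The BIO tags can then feed a standard token-classification model.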