Machine Learning for Urban Air Quality Analytics: A Survey
Abstract: Increasing air pollution is an urgent global concern with far-reaching consequences, such as premature mortality and reduced crop yields, that affect many aspects of daily life. Accurate and timely analysis of air pollution is crucial for understanding its underlying mechanisms and for taking the precautions needed to mitigate socio-economic losses. Traditional analytical methods, such as atmospheric modeling, rely heavily on domain expertise and often make simplifying assumptions that may not hold for complex air pollution problems. In contrast, Machine Learning (ML) models can capture the underlying physical and chemical patterns by learning automatically from large amounts of historical observational data, and they have shown great promise in a variety of air quality analytical tasks. In this article, we present a comprehensive survey of ML-based air quality analytics that follows a roadmap from data acquisition to pre-processing and covers analytical tasks such as pollution pattern mining, air quality inference, and forecasting. We also offer a systematic categorization and summary of existing methodologies and applications, together with a list of publicly available air quality datasets to facilitate research in this direction. Finally, we identify several promising directions for future research. This survey can serve as a valuable resource for researchers and practitioners seeking suitable solutions to their specific challenges and aiming to advance research at the cutting edge.