Data Imputation with Iterative Graph Reconstruction (2212.02810v2)

Published 6 Dec 2022 in cs.LG

Abstract: Effective data imputation demands rich "latent structure" discovery capabilities from "plain" tabular data. Recent advances in graph neural network-based data imputation show strong structure-learning potential by directly translating tabular data into bipartite graphs. However, because they lack relations between samples, those solutions treat all samples equally, which contradicts an important observation: "similar samples should give more information about missing values." This paper presents a novel Iterative graph Generation and Reconstruction framework for Missing data imputation (IGRM). Instead of treating all samples equally, we introduce the concept of "friend networks" to represent different relations among samples. To generate an accurate friend network despite missing data, we design an end-to-end friend network reconstruction solution that allows continuous friend network optimization during imputation learning. The representation of the optimized friend network, in turn, is used to further optimize the data imputation process through differentiated message passing. Experimental results on eight benchmark datasets show that IGRM yields 39.13% lower mean absolute error compared with nine baselines and 9.04% lower than the second-best. Our code is available at https://github.com/G-AILab/IGRM.
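The abstract relies on two structures: the sample/feature bipartite graph that GNN-based imputers build from a table, and a sample-to-sample "friend network". The plain-Python sketch below shows both on a toy table, with None marking missing cells. It is not IGRM. Here the friend network is a fixed k-nearest-neighbour graph over co-observed features, and the fill step is a simple average over friends. IGRM instead learns the friend network end to end and uses it for differentiated message passing inside a graph neural network. The function names, the RMS distance, and the column-mean fallback are all assumptions made for illustration.

```python
import math
from typing import Optional

# Toy representation (assumption for this sketch): rows are samples,
# columns are features, and None marks a missing value.
Table = list[list[Optional[float]]]


def to_bipartite(table: Table) -> list[tuple[int, int, float]]:
    """Turn each observed cell (i, j) into an edge between sample node i and
    feature node j, carrying the cell value as the edge attribute. Missing
    cells produce no edge; imputation means predicting those edges' values."""
    return [
        (i, j, v)
        for i, row in enumerate(table)
        for j, v in enumerate(row)
        if v is not None
    ]


def observed_distance(a: list[Optional[float]], b: list[Optional[float]]) -> float:
    """RMS difference over features observed in both samples (inf if none overlap)."""
    diffs = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf


def knn_friend_network(table: Table, k: int) -> dict[int, list[int]]:
    """Hypothetical stand-in for a friend network: link each sample to its k
    nearest other samples under observed_distance (IGRM learns this graph)."""
    friends = {}
    for i, a in enumerate(table):
        ranked = sorted(
            (observed_distance(a, b), j) for j, b in enumerate(table) if j != i
        )
        friends[i] = [j for dist, j in ranked[:k] if dist != math.inf]
    return friends


def friend_impute(table: Table, friends: dict[int, list[int]]) -> list[list[float]]:
    """Fill each missing cell with the mean of that feature over the sample's
    friends that observe it, falling back to the observed column mean."""
    n_cols = len(table[0])
    col_means = []
    for j in range(n_cols):
        seen = [row[j] for row in table if row[j] is not None]
        col_means.append(sum(seen) / len(seen) if seen else 0.0)

    filled = []
    for i, row in enumerate(table):
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)
                continue
            votes = [table[f][j] for f in friends[i] if table[f][j] is not None]
            new_row.append(sum(votes) / len(votes) if votes else col_means[j])
        filled.append(new_row)
    return filled


# Usage on a hypothetical 4 x 3 table with two missing cells.
table = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [9.0, 8.0, 7.0],
    [9.2, None, 7.1],
]
edges = to_bipartite(table)  # 12 cells minus 2 missing -> 10 edges
friends = knn_friend_network(table, k=1)
print(friend_impute(table, friends))
```

Measuring distance only over co-observed features avoids treating a missing cell as a value, but samples that share few observed features get unreliable distances. This echoes the abstract's point that building an accurate friend network from incomplete data is itself a problem worth learning.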
