Data Readiness for AI: A 360-Degree Survey (2404.05779v2)
Abstract: AI applications critically depend on data. Poor-quality data produces inaccurate and ineffective AI models, which may in turn lead to incorrect or unsafe use. Evaluating data readiness is therefore a crucial step in improving both the quality of data and the appropriateness of its use in AI. Considerable R&D effort has gone into improving data quality; however, standardized metrics for evaluating whether data is ready for AI training are still evolving. In this study, we perform a comprehensive survey of the metrics used to verify data readiness for AI training. The survey examines more than 140 papers, drawn from the ACM Digital Library and IEEE Xplore, from journals such as Nature and those published by Springer and ScienceDirect, and from online articles by prominent AI experts. Based on this review, we propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this taxonomy will lead to new standards for DRAI metrics that can be used to enhance the quality, accuracy, and fairness of AI training and inference.
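To make the idea of a DRAI metric concrete, the following is a minimal sketch of three common readiness checks for a structured (tabular) dataset: completeness (share of non-missing cells), exact-duplicate ratio, and class-imbalance ratio (majority-class count divided by minority-class count). The choice of these three metrics, the function names, and the representation of rows as Python dictionaries are illustrative assumptions; this is not the taxonomy proposed in the survey.

```python
from collections import Counter
from typing import Any, Dict, List

# Assumption: each row is a dict mapping column name -> value; None marks a missing cell.
Row = Dict[str, Any]


def completeness(rows: List[Row], columns: List[str]) -> float:
    """Fraction of (row, column) cells that are present and not None."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(1 for r in rows for c in columns if r.get(c) is not None)
    return filled / total


def duplicate_ratio(rows: List[Row], columns: List[str]) -> float:
    """Fraction of rows that exactly repeat an earlier row on the given columns.

    Assumes cell values are hashable scalars (numbers, strings, None).
    """
    if not rows:
        return 0.0
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(r.get(c) for c in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows)


def imbalance_ratio(rows: List[Row], label: str) -> float:
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(r[label] for r in rows if r.get(label) is not None)
    if not counts:
        raise ValueError(f"no non-missing values for label column {label!r}")
    return max(counts.values()) / min(counts.values())


# Hypothetical toy dataset: two missing cells, one exact duplicate, 2 "yes" vs 3 "no".
rows = [
    {"age": 34, "income": 52000, "label": "yes"},
    {"age": None, "income": 48000, "label": "no"},
    {"age": 34, "income": 52000, "label": "yes"},
    {"age": 51, "income": None, "label": "no"},
    {"age": 29, "income": 61000, "label": "no"},
]
cols = ["age", "income", "label"]

if __name__ == "__main__":
    print("completeness:", round(completeness(rows, cols), 3))
    print("duplicate ratio:", duplicate_ratio(rows, cols))
    print("imbalance ratio:", imbalance_ratio(rows, "label"))
```

Each function returns a single scalar so that the scores can be compared against a readiness threshold chosen by the practitioner; such thresholds are application-specific and are not implied here.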