A Survey of Dataset Refinement for Problems in Computer Vision Datasets (arXiv:2210.11717v2)
Abstract: Large-scale datasets have played a crucial role in the advancement of computer vision. However, they often suffer from problems such as class imbalance, noisy labels, dataset bias, or high resource costs, which can inhibit model performance and reduce trustworthiness. With the growing advocacy of data-centric research, a variety of data-centric solutions have been proposed to address these dataset problems. They improve the quality of datasets by re-organizing them, a process we call dataset refinement. In this survey, we provide a comprehensive and structured overview of recent advances in dataset refinement for problematic computer vision datasets. First, we summarize and analyze the various problems encountered in large-scale computer vision datasets. We then classify dataset refinement algorithms into three categories based on the refinement process: data sampling, data subset selection, and active learning. In addition, we organize these dataset refinement methods according to the data problems they address and provide a systematic comparative description. We point out that the three types of dataset refinement have distinct advantages and disadvantages for different dataset problems, which informs the choice of a data-centric method appropriate to a particular research objective. Finally, we summarize the current literature and propose potential future research topics.
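To make two of the three refinement categories concrete, here is a minimal, hypothetical sketch (not from the paper itself): data subset selection via the widely used small-loss criterion for noisy labels, and an entropy-based acquisition function of the kind common in active learning. Function names, thresholds, and toy data are illustrative assumptions.

```python
import math

def small_loss_subset(losses, keep_ratio=0.7):
    """Data subset selection via the small-loss criterion: keep the
    samples with the smallest training loss, on the common assumption
    that noisily labeled samples tend to incur high loss."""
    n_keep = max(1, int(len(losses) * keep_ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:n_keep])  # indices of the retained subset

def entropy_acquisition(probs, budget=2):
    """Active learning: request labels for the unlabeled samples whose
    predicted class distribution has the highest entropy (most uncertain)."""
    def entropy(p):
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    ranked = sorted(range(len(probs)), key=lambda i: entropy(probs[i]),
                    reverse=True)
    return sorted(ranked[:budget])  # indices to send to the annotator

# Toy per-sample losses: high-loss samples 1 and 3 are dropped.
print(small_loss_subset([0.1, 2.3, 0.05, 1.8, 0.2], keep_ratio=0.6))  # [0, 2, 4]

# Toy softmax outputs: the uniform prediction is the most uncertain.
print(entropy_acquisition([[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]], budget=1))  # [1]
```

Data sampling, the third category, would instead reweight or resample examples during training (e.g. class-balanced or importance sampling) rather than choosing a fixed subset.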
Authors: Zhijing Wan, Zhixiang Wang, CheukTing Chung, Zheng Wang