A Comprehensive Survey of Visual Analytics Techniques for Machine Learning
This paper presents a survey of visual analytics techniques tailored to support the different stages of machine learning. After systematically reviewing 259 papers, the authors build a taxonomy with three first-level categories: techniques before model building, techniques during model building, and techniques after model building. Each category is further broken down into representative analysis tasks, such as improving data quality, understanding models, diagnosing models, and interpreting analysis results. This categorization gives a structured overview of the field, summarizes current methods, and identifies open research challenges and opportunities.
Techniques Before Model Building
The focus before model building is on data preparation and feature engineering, critical stages that significantly impact model performance.
Improving Data Quality: The survey covers interactive visual analytics methods for handling missing values, label noise, and outliers. For instance, LabelInspect and DataDebugger let users verify labels and correct mislabeled samples, improving data quality iteratively through human-in-the-loop workflows.
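One common heuristic for deciding which labels a human should check first is neighborhood disagreement: a sample whose nearest neighbors mostly carry a different label is a likely case of label noise. The sketch below ranks samples that way. It is an illustrative assumption, not the algorithm used by LabelInspect or DataDebugger; the function names, the k-nearest-neighbor disagreement score, and the toy data are all hypothetical.

```python
import math


def knn_label_disagreement(points, labels, k=3):
    """Score each sample by the fraction of its k nearest neighbors whose
    label differs from its own; higher scores suggest possible label noise.
    (Hypothetical heuristic for illustration.)"""
    scores = []
    for i, p in enumerate(points):
        # Euclidean distance from sample i to every other sample, nearest first
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        disagree = sum(1 for j in neighbors if labels[j] != labels[i])
        scores.append(disagree / k)
    return scores


def verification_queue(points, labels, k=3):
    """Order sample indices from most to least suspicious, i.e. the order in
    which a human-in-the-loop tool might ask a user to verify labels."""
    scores = knn_label_disagreement(points, labels, k)
    return sorted(range(len(points)), key=lambda i: (-scores[i], i))


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]  # (1, 1) looks mislabeled
print(verification_queue(points, labels))  # → [3, 0, 1, 2, 4, 5, 6, 7]
```

Sample 3 sits inside the "a" cluster but is labeled "b", so every one of its neighbors disagrees and it is surfaced first for human review.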
Feature Quality Improvement: Work in this area aims to select or construct features that improve model performance. Visualizing feature relevance and redundancy helps guide feature selection, as in systems like DimStiller. The use of deep learning models for feature extraction also points to a broader trend toward automated yet interpretable feature engineering.
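Relevance and redundancy can be quantified in many ways; a simple stand-in is Pearson correlation, using correlation with the target as a relevance score and pairwise correlation between features to flag redundancy. The sketch below computes both quantities, the kind of numbers a feature-selection view might display. It is an assumption for illustration, not DimStiller's pipeline; the 0.9 threshold, the function names, and the toy data are hypothetical.

```python
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def rank_by_relevance(features, target):
    """Feature names sorted by |correlation with the target|, most relevant first."""
    return sorted(features, key=lambda name: -abs(pearson(features[name], target)))


def redundant_pairs(features, threshold=0.9):
    """Feature pairs whose |correlation| meets the threshold (one may be dropped)."""
    return [
        (a, b)
        for a, b in combinations(sorted(features), 2)
        if abs(pearson(features[a], features[b])) >= threshold
    ]


features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same quantity, other unit
    "shoe_id": [3, 1, 4, 1, 5],                    # unrelated identifier
}
target = [50, 58, 66, 75, 84]
print(redundant_pairs(features))                # → [('height_cm', 'height_in')]
print(rank_by_relevance(features, target)[-1])  # → shoe_id
```

Linear correlation only captures linear dependence; it is chosen here because it is easy to compute and to show in a matrix view, not because it is the measure any particular tool uses.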
Techniques During Model Building
The model building phase encompasses understanding, diagnosing, and refining the models.
Model Understanding: Network-centric and instance-centric visualizations help users understand how model parameters and architectures shape model behavior. CNNVis, for example, visualizes clusters of neurons and the connections between layers of a convolutional network, while ActiVis shows how individual instances and instance subsets activate neurons. Such views help demystify complex neural networks, which is crucial for debugging and optimizing them.
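Instance-centric views start from the activations that a single input produces at each layer. The pure-Python sketch below records those activations for a tiny fully connected ReLU network and picks the most strongly activated neuron per layer. The network, its weights, and the function names are hypothetical; real tools extract activations from trained models in a deep learning framework.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(weights, bias, x):
    """Compute W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def trace_activations(layers, x):
    """Run input x through (weights, bias) layers with ReLU, recording each layer's output."""
    trace = []
    for weights, bias in layers:
        x = relu(dense(weights, bias, x))
        trace.append(x)
    return trace


def top_neurons(trace, k=1):
    """Indices of the k most strongly activated neurons in each layer."""
    return [sorted(range(len(a)), key=lambda i: -a[i])[:k] for a in trace]


layers = [
    ([[1, 0], [0, 1], [-1, -1]], [0, 0, 0]),  # hidden layer: 2 inputs -> 3 neurons
    ([[1, 1, 0], [0, 0, 1]], [0, 0]),         # output layer: 3 -> 2 neurons
]
trace = trace_activations(layers, [2.0, 0.5])
print(trace)               # → [[2.0, 0.5, 0.0], [2.5, 0.0]]
print(top_neurons(trace))  # → [[0], [0]]
```

Aggregating such traces over a subset of instances (for example, all samples of one class) gives the subset-level activation summaries that instance-centric views typically display.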
Model Diagnosis and Steering: These techniques help identify and resolve problems in the training process. AEVis, for example, helps analysts examine how adversarial examples mislead a model, while DQNViz exposes the training dynamics of deep Q-networks in reinforcement learning. Both enable developers to make informed adjustments that improve model robustness.
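To make the notion of an adversarial example concrete, the sketch below applies the fast gradient sign method (FGSM), a widely used attack swapped in here for illustration and not part of AEVis, to a hypothetical logistic-regression model. A small perturbation aligned with the sign of the loss gradient flips a correct prediction. The weights, input, and step size `eps` are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a binary logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Fast gradient sign method for logistic regression.

    For cross-entropy loss the gradient w.r.t. the input is (p - y) * w,
    so each feature moves by eps in the direction that increases the loss.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


w, b = [2.0, -1.0], 0.0  # hypothetical trained weights
x, y = [0.5, 0.2], 1     # correctly classified: p ≈ 0.69
x_adv = fgsm(w, b, x, y, eps=0.4)
print(predict(w, b, x) > 0.5, predict(w, b, x_adv) > 0.5)  # → True False
```

Diagnosis tools are concerned with what happens after this step: which internal features or neurons the perturbed input activates differently from the clean one.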
Techniques After Model Building
Once models are built, understanding their outputs is essential for deriving actionable insights.
Understanding Different Data Analysis Results: Visual analytics methods tailored to the data type help users comprehend both static and dynamic analysis results. For static data, systems such as TopicPanorama extract topics from multiple sources and compare them. For dynamic data, tools such as TextFlow use temporal visualizations to track how topics evolve, including when they split or merge, giving deeper insight into trends and changes over time.
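A much simpler stand-in than TextFlow's own model conveys the core idea of tracking topic evolution: represent each topic as a word-weight distribution, link topics in consecutive time windows whose cosine similarity exceeds a threshold, and read splits and merges off the links. This is a sketch under stated assumptions, not TextFlow's method; the threshold, function names, and topics are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} topic vectors."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_topics(prev, curr, threshold=0.5):
    """Link topic i in window t to topic j in window t+1 when similarity >= threshold."""
    return [
        (i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
        if cosine(p, c) >= threshold
    ]


def splits_and_merges(links):
    """Topics with several successors split; topics with several predecessors are merges."""
    out_deg = Counter(i for i, _ in links)
    in_deg = Counter(j for _, j in links)
    splits = sorted(i for i, d in out_deg.items() if d > 1)
    merges = sorted(j for j, d in in_deg.items() if d > 1)
    return splits, merges


prev = [{"election": 0.6, "vote": 0.4}, {"storm": 0.7, "flood": 0.3}]
curr = [
    {"election": 0.5, "debate": 0.5},
    {"vote": 0.6, "ballot": 0.4},
    {"storm": 0.6, "flood": 0.4},
]
links = link_topics(prev, curr, threshold=0.4)
print(links)                     # → [(0, 0), (0, 1), (1, 2)]
print(splits_and_merges(links))  # → ([0], [])
```

The "election" topic splits into a debate-oriented and a ballot-oriented topic, while the "storm" topic continues unchanged; a flow visualization would draw these links as branching and continuing streams.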
Discussion and Future Directions
The paper identifies several long-term challenges and future research directions, including improving data quality for weakly supervised learning, developing explainable feature engineering methods, and handling multi-modal data and concept drift. In particular, the survey highlights the need for online diagnosis tools that monitor model training in real time, and the potential of integrating uncertainty measures to make interactive model refinement more effective.
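Concept drift means the data distribution a model sees changes over time, so a model that was accurate when trained gradually degrades. One of the simplest online monitors, sketched below as an assumption rather than a method from the survey, compares accuracy over a sliding window of recent labeled predictions against a baseline; a visual analytics tool could plot this signal over time. The class name, window size, and tolerance are hypothetical.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Flag possible concept drift when recent accuracy drops well below a baseline.

    baseline:  accuracy measured on held-out data before deployment (assumed known)
    window:    number of most recent labeled predictions to track
    tolerance: allowed drop before drift is flagged (hypothetical setting)
    """

    def __init__(self, baseline, window=50, tolerance=0.1):
        self.baseline = baseline
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)

    def update(self, prediction, label):
        """Record one labeled prediction; return True if drift is suspected."""
        self.recent.append(prediction == label)
        if len(self.recent) < self.recent.maxlen:
            return False  # too little evidence so far
        accuracy = sum(self.recent) / len(self.recent)
        return accuracy < self.baseline - self.tolerance


monitor = AccuracyDriftMonitor(baseline=0.9, window=10, tolerance=0.1)
flags = [monitor.update(1, 1) for _ in range(10)]  # stable period: all correct
flags += [monitor.update(0, 1) for _ in range(3)]  # distribution shifts: errors appear
print(flags[-1])  # → True (recent window accuracy is 0.7, below 0.9 - 0.1)
```

A fixed window and threshold keep the sketch simple; they trade detection speed against false alarms, which is exactly the kind of setting an interactive tool could let users tune.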
Conclusion
This survey is a valuable resource for researchers in visual analytics and machine learning: it offers a cohesive view of the state of the art and points toward future work on explainable, interactive, and effective machine learning. By examining each stage of machine learning together with the visual analytics tools that support it, the paper not only catalogs existing efforts but also draws attention to underexplored areas.