Model and Optimize Data Quality via Edge Deletions
Incorporate training data quality into the concept–text bipartite framework by modeling text-to-concept edge deletions during sequential concept learning, and develop optimization strategies to improve learning performance within this extended model.
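As a minimal sketch of what "edge deletions" could mean in a concept-text bipartite graph, the Python below builds a random graph, drops each text-to-concept edge independently, and counts the concepts still linked to at least one text. The function names, the independent deletion probability `delete_prob`, and the use of coverage as a stand-in for learnability are all illustrative assumptions; none of them is taken from the paper, and the paper's sequential learning rule is not reproduced here.

```python
import random


def build_bipartite(num_texts, num_concepts, concepts_per_text, rng):
    """Map each text index to the set of concept indices it is linked to.

    Hypothetical generator: each text links to a uniformly random subset
    of concepts of fixed size.
    """
    return {
        t: set(rng.sample(range(num_concepts), concepts_per_text))
        for t in range(num_texts)
    }


def delete_edges(edges, delete_prob, rng):
    """Drop each text-to-concept edge independently with probability delete_prob.

    Assumption: deletions are i.i.d.; lower data quality means higher delete_prob.
    """
    return {
        t: {c for c in concepts if rng.random() >= delete_prob}
        for t, concepts in edges.items()
    }


def covered_concepts(edges):
    """Return the concepts that remain connected to at least one text."""
    covered = set()
    for concepts in edges.values():
        covered |= concepts
    return covered


rng = random.Random(0)
clean = build_bipartite(num_texts=200, num_concepts=50, concepts_per_text=3, rng=rng)
for p in (0.0, 0.5, 0.9):
    noisy = delete_edges(clean, p, rng)
    print(f"delete_prob={p}: {len(covered_concepts(noisy))} concepts still covered")
```

Under these assumptions, an optimization strategy would be anything that changes which edges survive or how many texts are sampled so that more concepts stay reachable for a fixed data budget.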
References
There are some open questions and considerations worth exploring. Further, the quality of the training data is related to text-to-concept edge deletions in sequential concept learning, which can be incorporated into our framework. Such optimization is a line of future work that has natural analogues in optimization of communication systems and fault-tolerant computation.
— An Information Theory of Compute-Optimal Size Scaling, Emergence, and Plateaus in Language Models
(2410.01243 - Nayak et al., 2 Oct 2024) in Conclusion, final paragraph