
Bias and Error Mitigation in Software-Generated Data: An Advanced Search and Optimization Framework Leveraging Generative Code Models (2310.11546v1)

Published 17 Oct 2023 in cs.SE, cs.IT, cs.LG, math.IT, and math.OC

Abstract: Data generation and analysis are fundamental to many industries and disciplines, from strategic decision making in business to research in the physical and social sciences. However, data generated using software and algorithms can be subject to biases and errors, which may stem from flaws in the original software, from default settings that do not align with the needs of the specific situation, or from deeper problems with the underlying theories and models. This paper proposes an advanced search and optimization framework for generating and selecting optimal source code that corrects the errors and biases of previous versions, addressing typical problems in software systems specializing in data analysis and generation, especially in the corporate and data science worlds. Applied repeatedly to the same software system, the framework incrementally improves the quality of the output results. It uses Solomonoff Induction as a sound theoretical basis, extending it with Kolmogorov Conditional Complexity, a novel adaptation, to evaluate a set of candidate programs. We propose the use of generative models to create this set of programs, with special emphasis on the capability of LLMs to generate high-quality code.
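The abstract's selection step can be sketched informally: score each LLM-generated candidate program by a Solomonoff-style prior that penalizes description length (a practical proxy for Kolmogorov complexity, which is incomputable) multiplied by how well the candidate's output matches validation data. The sketch below is an illustrative assumption, not the paper's actual algorithm; all function names and the compression-based complexity proxy are hypothetical choices.

```python
import zlib

def description_length(source: str) -> int:
    # Proxy for Kolmogorov complexity: the compressed size of the source.
    # True Kolmogorov complexity is incomputable, so a general-purpose
    # compressor is a common practical stand-in.
    return len(zlib.compress(source.encode("utf-8")))

def score(candidate_source: str, run_candidate, validation_cases) -> float:
    # Solomonoff-style score: a 2^(-K) complexity prior (scaled to stay in
    # float range) times the fraction of validation cases the candidate
    # reproduces correctly.
    prior = 2.0 ** (-description_length(candidate_source) / 100.0)
    correct = sum(1 for inp, expected in validation_cases
                  if run_candidate(candidate_source, inp) == expected)
    likelihood = correct / len(validation_cases)
    return prior * likelihood

def select_best(candidates, run_candidate, validation_cases):
    # Rank the set of generated candidate programs and keep the best one;
    # iterating this selection on successive versions mirrors the paper's
    # incremental-improvement loop.
    return max(candidates, key=lambda src: score(src, run_candidate, validation_cases))
```

For example, with candidates `["x * 2", "x + x", "x ** 2"]` evaluated against cases `[(1, 2), (3, 6)]`, the squaring candidate fails every case and is rejected, while the two correct candidates are ranked by their complexity prior.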

