A simplicity bubble problem and zemblanity in digitally intermediated societies (2304.10681v3)
Abstract: In this article, we discuss the ubiquity of Big Data and machine learning in society and propose that it evinces the need for further investigation of their fundamental limitations. We extend the "too much information tends to behave like very little information" phenomenon to formal knowledge about lawlike universes and arbitrary collections of computably generated datasets. This gives rise to the simplicity bubble problem, in which a learning algorithm equipped with a formal theory can be deceived by a dataset into finding a locally optimal model that it deems to be the global one. In the context of lawlike (computable) universes and formal learning systems, we show that there is a ceiling above which formal knowledge cannot further decrease the probability of zemblanitous findings, should the randomly generated data made available to the formal learning system be sufficiently large in comparison to their joint complexity. Zemblanity, the opposite of serendipity, is defined as an undesirable but expected finding that reveals an underlying problem or negative consequence in a given model or theory, and which is in principle predictable provided the formal theory contains sufficient information. We also argue that this is an epistemological limitation that may generate unpredictable problems in digitally intermediated societies.