- The paper shows that AI effectively solves well-defined optimization tasks across domains like protein folding and nuclear reactor design.
- It distinguishes between tasks with clear inputs/outputs and complex challenges that require continuous conceptual innovation.
- The study underscores the need for AI systems to integrate cognitive insights to better emulate human-driven scientific discovery.
Overview of "Artificial Intelligence for Science: The Easy and Hard Problems"
The paper by Ruairidh M. Battleday and Samuel J. Gershman, titled "Artificial Intelligence for Science: The Easy and Hard Problems," delineates the current capabilities and limitations of AI in the domain of scientific discovery. The authors differentiate between what they term the “easy problem” and the “hard problem” of scientific research. Their investigation is supported by illustrative case studies that include significant historical scientific discoveries.
The Easy Problem
The easy problem of scientific research, according to the authors, involves the application of AI to predefined optimization tasks. These tasks are typically characterized by:
- Clear Inputs and Outputs: Scientists specify functions that need optimization, detailing input (e.g., amino acid sequence), output (e.g., 3D structure of a protein), and a comparison metric against the ground truth.
- Large Data Availability: Access to a vast dataset representative of the ground truth, or some alternative means of evaluating the model's output.
- Success in Various Fields: Significant AI-driven advancements have occurred in areas such as protein folding, antibiotic discovery, and nuclear fusion reactor design.
This process benefits from clear constraints and objectives from the outset, making the problem more manageable despite its complexity. The paper reviews existing literature and practical achievements to highlight how this problem has been effectively addressed.
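The structure of an easy problem can be made concrete with a small sketch. The code below is illustrative, not drawn from the paper: it assumes a hypothetical `model` mapping an input (e.g., an amino acid sequence) to predicted 3D coordinates, and scores it against ground-truth structures with a toy RMSD metric standing in for the comparison metric the authors describe.

```python
import math

def rmsd(predicted, actual):
    """Root-mean-square deviation between two equal-length lists of 3D points."""
    assert len(predicted) == len(actual)
    sq = sum((p - a) ** 2
             for pt, at in zip(predicted, actual)
             for p, a in zip(pt, at))
    return math.sqrt(sq / len(predicted))

def evaluate(model, dataset):
    """Mean error of a candidate model over (input, ground_truth) pairs.

    This is the whole 'easy problem' in miniature: once inputs, outputs,
    and the metric are fixed, science reduces to minimizing this number.
    """
    return sum(rmsd(model(x), truth) for x, truth in dataset) / len(dataset)
```

The point of the sketch is that every component is specified in advance by human scientists; the AI's job is only to drive `evaluate` down.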
The Hard Problem
In contrast, the hard problem involves the formulation of scientific problems themselves—an inherently more complex task. This requires:
- Continual Conceptual Revision: Unlike the easy problem, the hard problem demands a continual, iterative process of problem formulation and conceptual breakthrough.
- Understanding and Emulating Human Cognition: Insights from cognitive science about how scientists think and create theories are crucial.
- Representation Learning: AI must learn representations rich enough to support identifying and revising scientific paradigms, not merely fitting models within a fixed one.
The authors contend that most AI systems to date have not effectively tackled this aspect due to the intrinsic nature of problem creation and conceptual innovation.
Illustrative Case Studies
To ground their arguments, the paper explores three case studies:
- The Discovery of Oxygen:
- Case Analysis: The STAHLp model revisits how Joseph Priestley and Antoine Lavoisier conceptualized oxygen during the Chemical Revolution. STAHLp's inability to capture the conceptual innovations involved illustrates the difficulty of automating such groundbreaking discoveries.
- Historical Perspective: The analysis suggests that Lavoisier's iterative refinement, based on re-specifying the domain and revising conceptual frameworks, was pivotal.
- The Electromagnetic Field:
- Case Analysis: AI Feynman, an AI model for symbolic regression, shows how predefined symbolic manipulation and mathematical formalisms can recover known laws, yet lack the flexibility of the innovative process observed in Maxwell's development of field theory.
- Historical Perspective: Maxwell's development required constructing and iteratively refining complex intermediate models, an aspect modern AI often fails to emulate.
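The limitation described above can be illustrated with a toy sketch of symbolic regression. This is not AI Feynman itself (which combines neural networks, dimensional analysis, and other heuristics); it is a deliberately minimal stand-in showing the core idea: search a fixed vocabulary of candidate expressions for the one that best fits the data. The fixed vocabulary is exactly the constraint at issue: the method can only rediscover laws expressible in terms it was handed in advance.

```python
import math

# Candidate functional forms, each with one free coefficient c.
# The search space is fixed before any data is seen.
CANDIDATES = {
    "c*x":      lambda x, c: c * x,
    "c*x^2":    lambda x, c: c * x ** 2,
    "c/x^2":    lambda x, c: c / x ** 2,
    "c*sin(x)": lambda x, c: c * math.sin(x),
}

def fit(data):
    """Return (form, coefficient) minimizing squared error on (x, y) pairs."""
    best = None
    for name, f in CANDIDATES.items():
        for c in (i / 10 for i in range(1, 101)):  # coarse grid over c
            err = sum((f(x, c) - y) ** 2 for x, y in data)
            if best is None or err < best[0]:
                best = (err, name, c)
    return best[1], best[2]
```

Fed data generated from an inverse-square law, `fit` recovers the right form; fed phenomena outside its vocabulary, it can only return the least-bad wrong answer. Extending the vocabulary itself — as Maxwell did when constructing new intermediate models — is the step this kind of system does not take.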
- Protein Folding:
- Case Analysis: AlphaFold2 is a prime example of successful application in biological domains, achieving unprecedented accuracy in predicting protein structures.
- Historical Perspective: Despite its engineering triumph, AlphaFold2 narrowed the scope of the original protein folding problem, predicting a single static structure rather than modeling the dynamic folding process.
Implications and Future Directions
The findings underscore the need to draw on cognitive science to develop AI systems capable of solving the hard problem. Key implications and future directions include:
- Scalable AI Scientists: Future AI systems might not be standalone but could function as intelligent research assistants, capable of interacting effectively with human scientists during problem formulation.
- Cognitive and Social Aspects: Emulating the sociological and aesthetic judgments made by human scientists is crucial. Training AI systems to understand the broader cultural and communal aspects of scientific practice remains a nascent but necessary step.
- Reflective Systems: AI needs to incorporate continual learning and adaptation capabilities, reflecting on model constraints and adjacent domains to better address poorly defined problems.
In conclusion, while significant strides have been made in utilizing AI for the easy problem of science, addressing the hard problem requires a confluence of insights from cognitive science, sociology of science, and advanced computational models. As this interdisciplinary research evolves, it promises to push the boundaries of AI’s capabilities in scientific discovery.