How Far Are We From AGI: Are LLMs All We Need? (2405.10313v2)

Published 16 May 2024 in cs.AI, cs.CL, cs.CY, and cs.LG

Abstract: The evolution of AI has profoundly impacted human society, driving significant advancements in multiple sectors. AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiveness comparable to human intelligence, reflects a paramount milestone in AI evolution. While existing studies have reviewed specific advancements in AI and proposed potential paths to AGI, such as LLMs, they fall short of providing a thorough exploration of AGI's definitions, objectives, and developmental trajectories. Unlike previous survey papers, this work goes beyond summarizing LLMs by addressing key questions about our progress toward AGI and outlining the strategies essential for its realization through comprehensive analysis, in-depth discussions, and novel insights. We start by articulating the requisite capability frameworks for AGI, integrating the internal, interface, and system dimensions. As the realization of AGI requires more advanced capabilities and adherence to stringent constraints, we further discuss necessary AGI alignment technologies to harmonize these factors. Notably, we emphasize the importance of approaching AGI responsibly by first defining the key levels of AGI progression, followed by the evaluation framework that situates the status quo, and finally giving our roadmap of how to reach the pinnacle of AGI. Moreover, to give tangible insights into the ubiquitous impact of the integration of AI, we outline existing challenges and potential pathways toward AGI in multiple domains. In sum, serving as a pioneering exploration into the current state and future trajectory of AGI, this paper aims to foster a collective comprehension and catalyze broader public discussions among researchers and practitioners on AGI.

Demystifying the Path to AGI: Insights and Implications

Introduction

The progression of AI into what many call AGI marks an ambitious milestone in AI's evolution. Unlike traditional AI, AGI aims to perform a wide variety of tasks with human-like efficiency and effectiveness. In a comprehensive discussion, the paper dives into key aspects such as definitions, capability frameworks, and strategic roadmaps, laying out where we stand and what lies on the path to AGI.

AGI’s Core Capabilities and Frameworks

Perception

AGI perception involves interpreting sensory information from various inputs—text, images, videos, and more. Current models like GPT-4 and Gemini excel in NLP and vision, thanks to advancements in multi-modal intelligence. Although many models can interact across different data types, challenges such as robustness and handling adversarial examples persist. Promising future directions include deeper integration of diverse data types and improved robustness to adversarial inputs, while prioritizing transparency and explainability.

Reasoning

AI reasoning encompasses logical and systematic thinking to draw conclusions or make decisions. Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) have significantly improved reasoning outcomes by enabling complex problem decomposition. Moving forward, areas like self-consistency in reasoning and dynamic reasoning and planning offer promising avenues for more reliable AI systems. However, the journey demands models capable of understanding and applying causation, handling long contexts, minimizing hallucinations, and enhancing social reasoning.
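
To make the self-consistency idea concrete, here is a minimal Python sketch: it assumes a hypothetical generate(prompt) callable that returns a model's reasoning trace, samples several chain-of-thought traces, and majority-votes on the extracted final answers. This illustrates the general technique rather than the paper's implementation.

```python
import re
from collections import Counter

def chain_of_thought_prompt(question: str) -> str:
    # Ask the model to reason step by step and mark its final answer.
    return (
        f"Question: {question}\n"
        "Let's think step by step, then give the final answer "
        "on a line starting with 'Answer:'."
    )

def self_consistent_answer(question: str, generate, n_samples: int = 5) -> str:
    """Sample several reasoning traces and majority-vote on the final answer."""
    answers = []
    for _ in range(n_samples):
        trace = generate(chain_of_thought_prompt(question))  # one CoT sample (assumed callable)
        match = re.search(r"Answer:\s*(.+)", trace)
        if match:
            answers.append(match.group(1).strip())
    # Consensus answer: the most frequent final answer across sampled traces.
    return Counter(answers).most_common(1)[0][0] if answers else ""
```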

Memory

AI memory, divided into short-term and long-term categories, is crucial for maintaining information needed for decision-making. Efficient memory retrieval and the integration of retrieval procedures with reasoned actions embody future goals for AGI memory systems. Decentralized memory and autonomous self-updating knowledge banks will bolster AI’s capacity for continuous learning and adaptation.
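
As a rough illustration of retrieval-based long-term memory, the sketch below stores text entries with embeddings from an assumed embed(text) function and retrieves the most similar entries by cosine similarity; real systems would add persistence, forgetting, and integration with the reasoning loop.

```python
import numpy as np

class LongTermMemory:
    """Toy long-term memory: store text with embeddings, retrieve by cosine similarity."""

    def __init__(self, embed):
        self.embed = embed          # assumed callable: str -> 1-D numpy array
        self.entries = []           # list of (text, unit-norm embedding) pairs

    def write(self, text: str) -> None:
        vector = np.asarray(self.embed(text), dtype=float)
        self.entries.append((text, vector / (np.linalg.norm(vector) + 1e-9)))

    def read(self, query: str, k: int = 3) -> list[str]:
        if not self.entries:
            return []
        q = np.asarray(self.embed(query), dtype=float)
        q = q / (np.linalg.norm(q) + 1e-9)
        # Rank stored entries by cosine similarity to the query.
        scored = sorted(self.entries, key=lambda entry: -float(entry[1] @ q))
        return [text for text, _ in scored[:k]]
```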

Metacognition

Metacognition, or the ability to understand and regulate one's own cognitive processes, will be key in future AGI systems. Developing self-awareness, managing complex tasks, and evolving independently without human intervention highlight the long-term objectives for AGI metacognition. Despite promising developments, challenges like handling ambiguity, ensuring ethical design, and fostering advanced social reasoning remain.

AGI Interfaces: Bridging the Digital and Physical Worlds

AI Interfaces to the Digital World

Modern AI agents can navigate and utilize a variety of digital tools efficiently. Implementations like Toolformer and frameworks for managing digital environments such as Mind2Web reflect substantial progress. To break new ground, these capabilities must be extended to a wider range of digital interaction scenarios while the ethical use of digital tools is managed responsibly.
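
A minimal sketch of the tool-use pattern behind systems like Toolformer follows; the tool names and JSON call format here are illustrative assumptions, not any specific system's API. The model emits a structured tool call, the agent executes the named tool, and the observation is fed back into the next prompt.

```python
import json

# Hypothetical tool registry: each tool is a plain Python callable.
TOOLS = {
    "calculator": lambda expression: str(eval(expression, {"__builtins__": {}})),
    "search": lambda query: f"(stub) top result for: {query}",
}

def run_tool_step(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and execute the named tool.

    Expected (assumed) format: {"tool": "calculator", "input": "2 + 2"}.
    """
    try:
        call = json.loads(model_output)
        tool = TOOLS[call["tool"]]
        return tool(call["input"])
    except (json.JSONDecodeError, KeyError) as err:
        return f"tool error: {err}"

# The returned observation would be appended to the prompt for the next model turn.
print(run_tool_step('{"tool": "calculator", "input": "17 * 3"}'))  # -> 51
```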

AI Interfaces to the Physical World

Robotic control, navigation, and interaction capabilities have made significant strides. Systems that combine GPT-4-class models with visual perception and manipulation abilities indicate progress towards integrating AI into real-world environments. Deploying efficient algorithms in resource-constrained settings and leveraging edge computing will be pivotal for broader adoption.

AI Interfaces to Intelligence

Because AGI will operate alongside other AI systems and human agents, it must excel in collaborative environments. Techniques such as prompt engineering and robust interaction protocols help ensure that AGI systems operate harmoniously with other intelligent systems. Advanced communication frameworks, secure collaboration, and standardized interaction protocols will be central to development.
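
As a hedged illustration of what a structured agent-to-agent interaction protocol might look like, the sketch below defines a minimal message format and router; the field names and intents are hypothetical, and real frameworks would add schemas, authentication, and error handling.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMessage:
    """Minimal structured message exchanged between cooperating agents (illustrative)."""
    sender: str
    recipient: str
    intent: str                 # e.g. "request", "inform", "critique"
    content: str
    metadata: dict = field(default_factory=dict)

def route(message: AgentMessage, inboxes: dict[str, list[AgentMessage]]) -> None:
    # Deliver the message to the recipient's inbox; unknown recipients get a new queue.
    inboxes.setdefault(message.recipient, []).append(message)

inboxes: dict[str, list[AgentMessage]] = {}
route(AgentMessage("planner", "coder", "request", "Draft a function that sorts a list."), inboxes)
print(len(inboxes["coder"]))  # -> 1
```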

AGI Systems: Engineering Scalability and Efficiency

Scalable model architectures, large-scale training frameworks, and optimized inference techniques form the backbone of AGI systems. Diverse approaches include:

  • Model compression: Techniques like knowledge distillation and parameter-efficient fine-tuning methods (e.g., LoRA); a low-rank adapter sketch follows this list.
  • Memory management: Efficiently managing resources with methods like adaptive KV cache compression.
  • Parallel computing: Automatic parallelism scheduling and combining tensor parallel, model parallel, and pipeline parallel strategies.
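
The low-rank adaptation (LoRA) idea referenced above can be sketched in a few lines: the frozen weight W is augmented with a trainable low-rank update B·A, so only r·(d_in + d_out) parameters are trained. This NumPy sketch covers the forward pass only; the shapes and scaling follow the common convention but are simplified assumptions, not the paper's implementation.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass with a frozen weight W plus a low-rank LoRA update.

    W: (d_out, d_in) frozen pretrained weight.
    A: (r, d_in) and B: (d_out, r) are the trainable low-rank factors, r << min(d_out, d_in),
    so the trainable parameter count drops from d_out*d_in to r*(d_out + d_in).
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

d_in, d_out, r = 1024, 1024, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable low-rank factor
B = np.zeros((d_out, r))                  # zero init, so the adapter starts as a no-op
y = lora_forward(rng.normal(size=(2, d_in)), W, A, B)
print(y.shape)  # (2, 1024)
```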

AGI Alignment: Ensuring Ethical and Practical Utility

Aligning AGI to human values involves careful consideration of ethical implications, data bias, transparency, and robust safety mechanisms. Current methods like RLHF (Reinforcement Learning from Human Feedback) and scalable alignment approaches offer sound frameworks. However, advanced AGI systems will need alignment that integrates more deeply with human-centric goals and ethical standards.
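
To ground the RLHF discussion, here is a minimal sketch of the pairwise preference loss typically used to train a reward model: the reward assigned to the human-preferred response is pushed above the reward for the rejected one via a Bradley-Terry style logistic loss. This is a generic illustration, not the paper's formulation.

```python
import numpy as np

def preference_loss(reward_chosen: np.ndarray, reward_rejected: np.ndarray) -> float:
    """Pairwise preference loss for reward-model training.

    For each pair, the loss is -log(sigmoid(r_chosen - r_rejected)),
    which pushes the reward model to score the preferred response higher.
    """
    margin = reward_chosen - reward_rejected
    return float(np.mean(np.log1p(np.exp(-margin))))  # stable form of -log(sigmoid(margin))

# Example: the reward model already prefers the chosen responses, so the loss is small.
print(preference_loss(np.array([2.0, 1.5]), np.array([0.5, 1.0])))
```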

AGI Roadmap: From Definition to Realization

Comprehensive stratification into Embryonic AGI, Superhuman AGI, and Ultimate AGI helps chart the progression trajectory. Moving towards AGI necessitates improved model evaluation frameworks, an efficient data economy, and interdisciplinary cooperation. Understanding AGI's practical implications across various domains, from scientific discovery to real-world robotics, highlights the potential and pitfalls alike.

Conclusion

The path to AGI, while challenging, is lined with promising advancements and emerging frameworks. Key focuses remain on enhancing core capabilities, developing sophisticated interfaces, ensuring ethical alignment, and fostering a responsible approach. AGI's potential to revolutionize everything from everyday tasks to scientific breakthroughs hinges on collective collaboration, innovative research, and stringent safety standards. The journey towards fully realized AGI remains in its early stages, but with continued dedication, we can envision AGI as a transformative force for greater societal benefit.

  390. Eureka: Human-Level Reward Design via Coding Large Language Models. arXiv:2310.12931 [cs.RO]
  391. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems 36 (2024).
  392. PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning. arXiv:2306.12370 [cs.LG]
  393. GPT-Driver: Learning to Drive with GPT. arXiv:2310.01415 [cs.CV]
  394. A Language Agent for Autonomous Driving. arXiv:2311.10813 [cs.CV]
  395. Andres Marzal and Enrique Vidal. 1993. Computation of normalized edit distance and applications. IEEE transactions on pattern analysis and machine intelligence 15, 9 (1993), 926–932.
  396. Molecular Optimization using Language Models. arXiv preprint arXiv:2210.00299 (2022).
  397. Kris McGuffie and Alex Newhouse. 2020. The Radicalization Risks of GPT-3 and Advanced Neural Language Models. https://doi.org/10.48550/arXiv.2009.06807 arXiv:2009.06807 [cs].
  398. The inadequacy of reinforcement learning from human feedback-radicalizing large language models via semantic vulnerabilities. IEEE Transactions on Cognitive and Developmental Systems (2024).
  399. David A Medler. 1998. A brief history of connectionism. Neural computing surveys 1 (1998), 18–72.
  400. AIOS: LLM Agent Operating System. arXiv:2403.16971 [cs.OS]
  401. Bahar Memarian and Tenzin Doleck. 2023. Fairness, Accountability, Transparency, and Ethics (FATE) in Artificial Intelligence (AI), and higher education: A systematic review. Computers and Education: Artificial Intelligence (2023), 100152.
  402. Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073 (2021).
  403. PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models. arXiv preprint arXiv:2404.02948 (2024).
  404. GAIA: a benchmark for General AI Assistants. arXiv:2311.12983 [cs.CL]
  405. Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems. arXiv:2312.15234 [cs.LG]
  406. SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification. arXiv:2305.09781 [cs.CL]
  407. Dan Milmo. 2021. Amazon asks Ring owners to respect privacy after court rules usage broke law. The Guardian (Oct. 2021). https://www.theguardian.com/uk-news/2021/oct/14/amazon-asks-ring-owners-to-respect-privacy-after-court-rules-usage-broke-law
  408. Smartphone-Based Conversational Agents and Responses to Questions About Mental Health, Interpersonal Violence, and Physical Health. JAMA Internal Medicine 176, 5 (May 2016), 619. https://doi.org/10.1001/jamainternmed.2016.0400
  409. Fast model editing at scale. arXiv preprint arXiv:2110.11309 (2021).
  410. Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* ’19). Association for Computing Machinery, New York, NY, USA, 220–229. https://doi.org/10.1145/3287560.3287596
  411. MLC team. 2023. MLC-LLM. https://github.com/mlc-ai/mlc-llm
  412. Amirkeivan Mohtashami and Martin Jaggi. 2023. Landmark Attention: Random-Access Infinite Context Length for Transformers. arXiv:2305.16300 [cs.CL]
  413. Levels of AGI: Operationalizing Progress on the Path to AGI. arXiv:2311.02462 [cs.AI]
  414. AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation. arXiv:2305.12050 [cs.SE]
  415. Past, present, and future of user interface software tools. ACM Transactions on Computer-Human Interaction 7, 1 (2000), 3–28. https://doi.org/10.1145/344949.344959
  416. Meenakshi Nadimpalli. 2017. Artificial intelligence risks and benefits. International Journal of Innovative Research in Science, Engineering and Technology 6, 6 (2017).
  417. LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization. arXiv:2306.01102 [cs.NE]
  418. LLMs for Science: Usage for Code Generation and Data Analysis. arXiv preprint arXiv:2311.16733 (2023).
  419. Evaluating the Robustness to Instructions of Large Language Models. arXiv preprint arXiv:2308.14306 (2023).
  420. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021).
  421. Sergey I Nikolenko. 2021. Synthetic data for deep learning. Vol. 174. Springer.
  422. ScreenAgent: A Vision Language Model-driven Computer Control Agent. arXiv preprint arXiv:2402.07945 (2024).
  423. NVIDIA. 2023a. FasterTransformer. https://github.com/NVIDIA/FasterTransformer.
  424. NVIDIA. 2023b. TensorRT-LLM: A TensorRT Toolbox for Optimized Large Language Model Inference. https://github.com/NVIDIA/TensorRT-LLM.
  425. Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114 (2021).
  426. I Lead, You Help but Only with Enough Details: Understanding User Experience of Co-Creation with Artificial Intelligence. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3173574.3174223
  427. OpenAI. 2018. Charter. https://www.openai.com/charter/.
  428. OpenAI. 2023a. GPT-4 Technical Report. Technical Report. OpenAI.
  429. OpenAI. 2023b. GPT-4v(ision) Technical Work and Authors. Technical Report. OpenAI. https://cdn.openai.com/contributions/gpt-4v.pdf
  430. OpenAI. 2024. Hello GPT-4o. https://openai.com/index/hello-gpt-4o/
  431. Training language models to follow instructions with human feedback. Advances in neural information processing systems 35 (2022), 27730–27744.
  432. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442 (2023).
  433. Kaushikkumar Patel. 2024. Ethical reflections on data-centric AI: balancing benefits and risks. International Journal of Artificial Intelligence Research and Development 2, 1 (2024), 1–17.
  434. Gorilla: Large language model connected with massive apis. arXiv preprint arXiv:2305.15334 (2023).
  435. William Peebles and Saining Xie. 2023. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4195–4205.
  436. Rwkv: Reinventing rnns for the transformer era. arXiv preprint arXiv:2305.13048 (2023).
  437. Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence. arXiv:2404.05892 [cs.CL]
  438. Instruction Tuning with GPT-4. arXiv:2304.03277 [cs.CL]
  439. Hyena Hierarchy: Towards Larger Convolutional Language Models. arXiv:2302.10866 [cs.LG]
  440. Mechanistic Design and Scaling of Hybrid Architectures. arXiv:2403.17844 [cs.LG]
  441. Stanislas Polu and Ilya Sutskever. 2022. Formal mathematics statement curriculum learning. arXiv preprint arXiv:2202.01344 (2022).
  442. Efficiently Scaling Transformer Inference. arXiv:2211.05102 [cs.LG]
  443. The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society.
  444. Predibase. 2023. Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs. https://github.com/predibase/lorax.
  445. A review of nuclear batteries. Progress in Nuclear Energy 75 (2014), 117–148. https://doi.org/10.1016/j.pnucene.2014.04.007
  446. David Premack and Guy Woodruff. 1978. Does the chimpanzee have a theory of mind? Behavioral and brain sciences 1, 4 (1978), 515–526.
  447. Watch-and-help: A challenge for social perception and human-ai collaboration. arXiv preprint arXiv:2010.09890 (2020).
  448. Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension. arXiv preprint arXiv:2404.08885 (2024).
  449. Visual adversarial examples jailbreak aligned large language models. In The Second Workshop on New Frontiers in Adversarial Machine Learning, Vol. 1.
  450. Communicative agents for software development. arXiv preprint arXiv:2307.07924 (2023).
  451. CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 6922–6939. https://doi.org/10.18653/v1/2023.findings-emnlp.462
  452. Creator: Disentangling abstract and concrete reasonings of large language models through tool creation. arXiv preprint arXiv:2305.14318 (2023).
  453. Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution. arXiv:2401.13996 [cs.CL]
  454. Guanghui Qin and Jason Eisner. 2021. Learning how to ask: Querying LMs with mixtures of soft prompts. arXiv preprint arXiv:2104.06599 (2021).
  455. Tool learning with foundation models. arXiv preprint arXiv:2304.08354 (2023).
  456. Toolllm: Facilitating large language models to master 16000+ real-world apis. arXiv preprint arXiv:2307.16789 (2023).
  457. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748–8763.
  458. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
  459. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36 (2024).
  460. Trivellore E Raghunathan. 2021. Synthetic data. Annual review of statistics and its application 8 (2021), 129–140.
  461. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv:1606.05250 [cs.CL]
  462. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 1, 2 (2022), 3.
  463. Tailoring self-rationalizers with multi-reward distillation. arXiv preprint arXiv:2311.02805 (2023).
  464. Waseem Rawat and Zenghui Wang. 2017. Deep convolutional neural networks for image classification: A comprehensive review. Neural computation 29, 9 (2017), 2352–2449.
  465. Android in the wild: A large-scale dataset for android device control. arXiv preprint arXiv:2307.10088 (2023).
  466. Shahana Rayhan. 2023. Ethical Implications of Creating AGI: Impact on Human Society, Privacy, and Power Dynamics. Artificial Intelligence Review (2023).
  467. Coqa: A conversational question answering challenge. Transactions of the Association for Computational Linguistics 7 (2019), 249–266.
  468. Acquisition of Multimodal Models via Retrieval. arXiv preprint arXiv:2302.02916 (2023).
  469. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530 (2024).
  470. Identifying quantum phase transitions with adversarial neural networks. Nature Physics 15, 9 (2019), 917–920.
  471. Samreen Rizvi. 2023. Blockchain-Based LLMs: A Game Changer for Data Privacy Protection. https://www.dataversity.net/blockchain-based-llms-a-game-changer-for-data-privacy-protection
  472. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 10684–10695.
  473. Efficient Content-Based Sparse Attention with Routing Transformers. arXiv:2003.05997 [cs.LG]
  474. Code Llama: Open Foundation Models for Code. arXiv:2308.12950 [cs.CL]
  475. Learning to retrieve prompts for in-context learning. arXiv preprint arXiv:2112.08633 (2021).
  476. Sebastian Ruder. 2020. Why You Should Do NLP Beyond English. http://ruder.io/nlp-beyond-english.
  477. Stuart Russell. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Viking.
  478. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient. arXiv:2301.11913 [cs.DC]
  479. Tandem Transformers for Inference Efficient LLMs. arXiv:2402.08644 [cs.AI]
  480. Palette: Image-to-image diffusion models. In ACM SIGGRAPH 2022 conference proceedings. 1–10.
  481. Whose opinions do language models reflect?. In International Conference on Machine Learning. PMLR, 29971–30004.
  482. Neural theory-of-mind? on the limits of social intelligence in large lms. arXiv preprint arXiv:2210.13312 (2022).
  483. Testing the general deductive reasoning capacity of large language models using ood examples. Advances in Neural Information Processing Systems 36 (2024).
  484. Apoorv Saxena. 2023. Prompt Lookup Decoding. https://github.com/apoorvumang/prompt-lookup-decoding/
  485. Training language models with language feedback at scale. arXiv preprint arXiv:2303.16755 (2023).
  486. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761 (2023).
  487. Timo Schick and Hinrich Schütze. 2020. Exploiting cloze questions for few shot text classification and natural language inference. arXiv preprint arXiv:2001.07676 (2020).
  488. Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in nlp. Transactions of the Association for Computational Linguistics 9 (2021), 1408–1424.
  489. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 7839 (Dec. 2020), 604–609. https://doi.org/10.1038/s41586-020-03051-4
  490. Green ai. Commun. ACM 63, 12 (2020), 54–63.
  491. Charbel-Raphaël Segerie. 2023. Task decomposition for scalable oversight (AGISF Distillation). https://www.lesswrong.com/posts/FFz6H35Gy6BArHxkc/task-decomposition-for-scalable-oversight-agisf-distillation.
  492. AI-Augmented Brainwriting: Investigating the use of LLMs in group ideation. https://doi.org/10.48550/arXiv.2402.14978 arXiv:2402.14978 [cs].
  493. Blockchain for Deep Learning: Review and Open Challenges. (Oct. 2021). https://doi.org/10.36227/techrxiv.16823140.v1
  494. Lm-nav: Robotic navigation with large pre-trained models of language, vision, and action. In Conference on Robot Learning. PMLR, 492–504.
  495. Murray Shanahan and Catherine Clarke. 2023. Evaluating Large Language Model Creativity from a Literary Perspective. arXiv preprint arXiv:2312.03746 (2023).
  496. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538 (2017).
  497. Mathbert: A pre-trained language model for general nlp tasks in mathematics education. arXiv preprint arXiv:2106.07340 (2021).
  498. Large language model alignment: A survey. arXiv preprint arXiv:2309.15025 (2023).
  499. Hugginggpt: Solving ai tasks with chatgpt and its friends in huggingface. arXiv preprint arXiv:2303.17580 (2023).
  500. S-LoRA: Serving Thousands of Concurrent LoRA Adapters. arXiv:2311.03285 [cs.LG]
  501. FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU. arXiv:2303.06865 [cs.LG]
  502. Large language models can be easily distracted by irrelevant context. In International Conference on Machine Learning. PMLR, 31210–31227.
  503. Yell At Your Robot: Improving On-the-Fly from Language Corrections. arXiv:2403.12910 [cs.RO]
  504. Haotian Liu Hao Zhang Feng Li Tianhe Ren Xueyan Zou Jianwei Yang Hang Su Jun Zhu Lei Zhang Jianfeng Gao Chunyuan Li Shilong Liu, Hao Cheng. 2023. LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents. arXiv preprint arXiv:2311.05437 (2023).
  505. BioMegatron: Larger Biomedical Domain Language Model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 4700–4706. https://doi.org/10.18653/v1/2020.emnlp-main.379
  506. Reflexion: Language agents with verbal reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems.
  507. Ben Shneiderman and Pattie Maes. 1997. Direct manipulation vs. interface agents. Interactions 4, 6 (1997), 42–61. https://doi.org/10.1145/267505.267514
  508. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv:1909.08053 [cs.CL]
  509. Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation. arXiv:2209.05451 [cs.RO]
  510. Audio-Visual LLM for Video Understanding. arXiv preprint arXiv:2312.06720 (2023).
  511. Retrieval augmentation reduces hallucination in conversation. arXiv preprint arXiv:2104.07567 (2021).
  512. Mastering the game of Go with deep neural networks and tree search. nature 529, 7587 (2016), 484–489.
  513. Mastering the game of go without human knowledge. nature 550, 7676 (2017), 354–359.
  514. Progprompt: Generating situated robot task plans using large language models. In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 11523–11530.
  515. Simplified State Space Layers for Sequence Modeling. arXiv:2208.04933 [cs.LG]
  516. Nate Soares. 2016. The value learning problem. Machine Intelligence Research Institute Technical Report 4 (2016).
  517. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning. PMLR, 2256–2265.
  518. Llm-planner: Few-shot grounded planning for embodied agents with large language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2998–3009.
  519. Preference ranking optimization for human alignment. arXiv preprint arXiv:2306.17492 (2023).
  520. Yang Song and Stefano Ermon. 2019. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems 32 (2019).
  521. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456 (2020).
  522. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615 (2022).
  523. Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation. arXiv:2310.02368 [cs.SE]
  524. Learning to summarize with human feedback. Advances in Neural Information Processing Systems 33 (2020), 3008–3021.
  525. Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243 (2019).
  526. Sheldon Stryker. 1959. Symbolic interaction as an approach to family research. Marriage and Family Living 21, 2 (1959), 111–119.
  527. Pandagpt: One model to instruction-follow them all. arXiv preprint arXiv:2305.16355 (2023).
  528. Evaluating model robustness and stability to dataset shift. In International conference on artificial intelligence and statistics. PMLR, 2611–2619.
  529. Luminate: Structured Generation and Exploration of Design Space with Large Language Models for Human-AI Co-Creation. https://doi.org/10.1145/3613904.3642400 arXiv:2310.12953 [cs].
  530. Sensecape: Enabling Multilevel Exploration and Sensemaking with Large Language Models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23). Association for Computing Machinery, New York, NY, USA, 1–18. https://doi.org/10.1145/3586183.3606756
  531. Learning multiagent communication with backpropagation. Advances in neural information processing systems 29 (2016).
  532. Cognitive architectures for language agents. arXiv preprint arXiv:2309.02427 (2023).
  533. 3D-GPT: Procedural 3D Modeling with Large Language Models. arXiv preprint arXiv:2310.12945 (2023).
  534. Generative pretraining in multimodality. arXiv preprint arXiv:2307.05222 (2023).
  535. Retentive Network: A Successor to Transformer for Large Language Models. ArXiv abs/2307.08621 (2023). https://api.semanticscholar.org/CorpusID:259937453
  536. Principle-driven self-alignment of language models from scratch with minimal human supervision. arXiv preprint arXiv:2305.03047 (2023).
  537. Principle-driven self-alignment of language models from scratch with minimal human supervision. Advances in Neural Information Processing Systems 36 (2024).
  538. Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision. arXiv preprint arXiv:2403.09472 (2024).
  539. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2017).
  540. An Empirical Study of Multimodal Model Merging. arXiv:2304.14933 [cs.CV]
  541. Beyond Expertise and Roles: A Framework to Characterize the Stakeholders of Interpretable Machine Learning and their Needs. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing Machinery, New York, NY, USA, 1–16. https://doi.org/10.1145/3411764.3445088
  542. Merging by Matching Models in Task Subspaces. arXiv:2312.04339 [cs.LG]
  543. Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models. http://arxiv.org/abs/2102.02503 arXiv:2102.02503 [cs].
  544. Towards general computer control: A multimodal agent for red dead redemption ii as a case study. arXiv preprint arXiv:2403.03186 (2024).
  545. MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning. arXiv:2311.10537 [cs.CL]
  546. FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs. arXiv:2309.01172 [cs.DC]
  547. Stanford Alpaca: An Instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca.
  548. Efficient transformers: A survey. Comput. Surveys 55, 6 (2022), 1–28.
  549. Galactica: A large language model for science. arXiv preprint arXiv:2211.09085 (2022).
  550. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023).
  551. SuperBench Team. 2023. SuperBench is Measuring LLMs in The Open: A Critical Analysis.
  552. Max Tegmark. 2017. Life 3.0: Being Human in the Age of Artificial Intelligence. Knopf.
  553. The Economic Times. 2024. China develops groundbreaking nuclear battery that can last 50 years without charging. https://economictimes.indiatimes.com/news/international/business/china-introduces-revolutionary-nuclear-battery-that-lasts-50-years-without-charging/articleshow/106880627.cms?from=mdr
  554. Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction. https://api.semanticscholar.org/CorpusID:268876071
  555. DebugBench: Evaluating Debugging Capability of Large Language Models. arXiv:2401.04621 [cs.SE]
  556. Triton: an intermediate language and compiler for tiled neural network computations. In Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (Phoenix, AZ, USA) (MAPL 2019). Association for Computing Machinery, New York, NY, USA, 10–19. https://doi.org/10.1145/3315508.3329973
  557. Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models. arXiv:2205.10770 [cs.CL]
  558. Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs. ArXiv abs/2401.06209 (2024). https://api.semanticscholar.org/CorpusID:266976992
  559. AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks. arXiv:2306.08107 [cs.LG]
  560. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).
  561. How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs. arXiv preprint arXiv:2311.16101 (2023).
  562. ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 7720–7735.
  563. Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics. In NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following.
  564. A. M. Turing. 1950. Computing Machinery and Intelligence. Mind 59, 236 (1950), 433–460. http://www.jstor.org/stable/2251299
  565. Victor Turner. 1975. Symbolic studies. Annual review of anthropology 4, 1 (1975), 145–161.
  566. RetroTRAE: retrosynthetic translation of atomic environments with Transformer. (2022).
  567. Can Large Language Models Identify And Reason About Security Vulnerabilities? Not Yet. arXiv:2312.12575 [cs.CR]
  568. Tomer Ullman. 2023. Large language models fail on trivial alterations to theory-of-mind tasks. arXiv preprint arXiv:2302.08399 (2023).
  569. “The less I type, the better”: How AI Language Models can Enhance or Impede Communication for AAC Users. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Association for Computing Machinery, New York, NY, USA, 1–14. https://doi.org/10.1145/3544548.3581560
  570. Attention is all you need. Advances in neural information processing systems 30 (2017).
  571. SimNet: Learning Simulation-Based World Models for Physical Reasoning. In International Conference on Learning Representations.
  572. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575 (2019), 350 – 354. https://api.semanticscholar.org/CorpusID:204972004
  573. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782 (2019), 350–354.
  574. Peter Voss and Mladjan Jovanovic. 2023. Concepts is All You Need: A More Direct Path to AGI. arXiv preprint arXiv:2309.01622 (2023).
  575. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology 31 (2017), 841.
  576. GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration. arXiv:2311.12015 [cs.RO]
  577. Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference. arXiv:2303.04673 [cs.CL]
  578. Designing Theory-Driven User-Centric Explainable AI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–15. https://doi.org/10.1145/3290605.3300831
  579. What Makes for Good Visual Tokenizers for Large Language Models? arXiv preprint arXiv:2305.12223 (2023).
  580. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291 (2023).
  581. The Impact of Deep Learning on Organizational Agility.. In ICIS.
  582. CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks. In Proceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.). PMLR, 36058–36076. https://proceedings.mlr.press/v202/wang23t.html
  583. Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees. arXiv:2206.01299 [cs.LG]
  584. Pei Wang and Patrick Hammer. 2018. Perception from an AGI perspective. In Artificial General Intelligence: 11th International Conference, AGI 2018, Prague, Czech Republic, August 22-25, 2018, Proceedings 11. Springer, 259–269.
  585. Conceptions of artificial intelligence and singularity. Information 9, 4 (2018), 79.
  586. Visionllm: Large language model is also an open-ended decoder for vision-centric tasks. arXiv preprint arXiv:2305.11175 (2023).
  587. LightSeq: A High Performance Inference Library for Transformers. arXiv:2010.13887 [cs.MS]
  588. Seggpt: Segmenting everything in context. arXiv preprint arXiv:2304.03284 (2023).
  589. Mementos: A comprehensive benchmark for multimodal large language model reasoning over image sequences. arXiv preprint arXiv:2401.10529 (2024).
  590. Self-instruct: Aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560 (2022).
  591. Self-Instruct: Aligning Language Models with Self-Generated Instructions. arXiv:2212.10560 [cs.CL]
  592. Super-naturalinstructions: Generalization via declarative instructions on 1600+ nlp tasks. arXiv preprint arXiv:2204.07705 (2022).
  593. Generalizing from a few examples: A survey on few-shot learning. ACM computing surveys (csur) 53, 3 (2020), 1–34.
  594. Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. arXiv preprint arXiv:2311.05997 (2023).
  595. Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560 (2023).
  596. De-Diffusion Makes Text a Strong Cross-Modal Interface. arXiv preprint arXiv:2311.00618 (2023).
  597. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652 (2021).
  598. Emergent Abilities of Large Language Models. arXiv:2206.07682 [cs.CL]
  599. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
  600. Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359 (2021).
  601. Design Principles for Generative AI Applications. https://doi.org/10.1145/3613904.3642466 arXiv:2401.14484 [cs].
  602. DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=OqTMUPuLuC
  603. Kyle Wiggers. 2021. AI datasets are prone to mismanagement, study finds. https://venturebeat.com/ai/ai-datasets-are-prone-to-mismanagement-study-finds/
  604. Language Models are Few-shot Multilingual Learners. https://doi.org/10.48550/arXiv.2109.07684 arXiv:2109.07684 [cs].
  605. Machine ethics: The design and governance of ethical AI and autonomous systems [scanning the issue]. Proc. IEEE 107, 3 (2019), 509–517.
  606. Refining Decompiled C Code with Large Language Models. arXiv:2310.06530 [cs.SE]
  607. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. arXiv:2203.05482 [cs.LG]
  608. Fast Distributed Inference Serving for Large Language Models. arXiv:2305.05920 [cs.LG]
  609. π𝜋\piitalic_π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation. arXiv:2304.14381 [cs.CV]
  610. Learning to See Physics via Visual De-animation. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/file/4c56ff4ce4aaf9573aa5dff913df997a-Paper.pdf
  611. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA Journal of Automatica Sinica 10, 5 (2023), 1122–1136.
  612. PromptChainer: Chaining Large Language Model Prompts through Visual Programming. https://doi.org/10.48550/arXiv.2203.06566 arXiv:2203.06566 [cs].
  613. AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA, 1–22. https://doi.org/10.1145/3491102.3517582
  614. Do language models plan ahead for future tokens? arXiv preprint arXiv:2404.00859 (2024).
  615. ReFT: Representation Finetuning for Language Models. arXiv preprint arXiv:2404.03592 (2024).
  616. OS-Copilot: Towards Generalist Computer Agents with Self-Improvement. arXiv:2402.07456 [cs.AI]
  617. The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:2309.07864 (2023).
  618. Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity. arXiv:2309.10285 [cs.DC]
  619. Training Trajectories of Language Models Across Scales. arXiv:2212.09803 [cs.CL]
  620. Language Models Meet World Models: Embodied Experiences Enhance Language Models. In Thirty-seventh Conference on Neural Information Processing Systems. https://openreview.net/forum?id=SVBR6xBaMl
  621. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. arXiv:2211.10438 [cs.CL]
  622. Efficient Streaming Language Models with Attention Sinks. arXiv:2309.17453 [cs.CL]
  623. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128 (2023).
  624. Effective Long-Context Scaling of Foundation Models. arXiv:2309.16039 [cs.CL]
  625. Leashing the Inner Demons: Self-Detoxification for Language Models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 11530–11537.
  626. Wizardlm: Empowering large language models to follow complex instructions. arXiv preprint arXiv:2304.12244 (2023).
  627. A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges. arXiv preprint arXiv:2403.10249 (2024).
  628. Roman V Yampolskiy. 2020. Artificial Intelligence Safety and Security. CRC Press.
  629. King-Yin Yan. 2022. AGI via Combining Logic with Deep Learning. In Artificial General Intelligence: 14th International Conference, AGI 2021, Palo Alto, CA, USA, October 15–18, 2021, Proceedings 14. Springer, 327–343.
  630. InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback. arXiv:2306.14898 [cs.CL]
  631. Diffusion models: A comprehensive survey of methods and applications. Comput. Surveys 56, 4 (2023), 1–39.
  632. Harnessing Biomedical Literature to Calibrate Clinicians’ Trust in AI Decision Support Systems. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. ACM, Hamburg Germany, 1–14. https://doi.org/10.1145/3544548.3581393
  633. Re-examining Whether, Why, and How Human-AI Interaction Is Uniquely Difficult to Design. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376301
  634. Gpt4tools: Teaching large language model to use tools via self-instruction. arXiv preprint arXiv:2305.18752 (2023).
  635. Gated linear attention transformers with hardware-efficient training. arXiv preprint arXiv:2312.06635 (2023).
  636. AppAgent: Multimodal Agents as Smartphone Users. arXiv preprint arXiv:2312.13771 (2023).
  637. Keep calm and explore: Language models for action generation in text-based games. arXiv preprint arXiv:2010.02903 (2020).
  638. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601 (2023).
  639. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 (2022).
  640. A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. High-Confidence Computing (2024), 100211.
  641. mplug-owl: Modularization empowers large language models with multimodality. arXiv preprint arXiv:2304.14178 (2023).
  642. Anil Yemme and Shayan Srinivasa Garani. 2023. A Scalable GPT-2 Inference Hardware Architecture on FPGA. In 2023 International Joint Conference on Neural Networks (IJCNN). 1–8. https://doi.org/10.1109/IJCNN54540.2023.10191067
  643. A Survey on Multimodal Large Language Models. arXiv preprint arXiv:2306.13549 (2023).
  644. William York and Jerry Swan. 2012. Taking Turing Seriously.
  645. Orca: A Distributed Serving System for Transformer-Based Generative Models. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). USENIX Association, Carlsbad, CA, 521–538. https://www.usenix.org/conference/osdi22/presentation/yu
  646. Building ethics into artificial intelligence. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 5527–5533.
  647. Language to rewards for robotic skill synthesis. arXiv preprint arXiv:2306.08647 (2023).
  648. Decentralized Training of Foundation Models in Heterogeneous Environments. arXiv:2206.01288 [cs.DC]
  649. Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi. arXiv preprint arXiv:2311.16502 (2023).
  650. Lotfi A Zadeh. 1996. Fuzzy logic, neural networks, and soft computing. In Fuzzy sets, fuzzy logic, and fuzzy systems: selected papers by Lotfi A Zadeh. World Scientific, 775–782.
  651. Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Association for Computing Machinery, New York, NY, USA, 1–21. https://doi.org/10.1145/3544548.3581388
  652. ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs. arXiv:2210.03052 [cs.LG]
  653. UFO: A UI-Focused Agent for Windows OS Interaction. arXiv preprint arXiv:2402.07939 (2024).
  654. A Simple LLM Framework for Long-Range Video Question-Answering. arXiv preprint arXiv:2312.17235 (2023).
  655. Building cooperative embodied agents modularly with large language models. arXiv preprint arXiv:2307.02485 (2023).
  656. Video-llama: An instruction-tuned audio-visual language model for video understanding. arXiv preprint arXiv:2306.02858 (2023).
  657. Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding. arXiv:2309.08168 [cs.CL]
  658. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3836–3847.
  659. What if the tv was off? examining counterfactual reasoning abilities of multi-modal language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4629–4633.
  660. AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning. arXiv:2303.10512 [cs.CL]
  661. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. arXiv:2303.16199 [cs.CV]
  662. Llama-adapter: Efficient fine-tuning of language models with zero-init attention. arXiv preprint arXiv:2303.16199 (2023).
  663. NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities. In 7th Annual Conference on Robot Learning. https://openreview.net/forum?id=eyykI3UIHa
  664. OPT: Open Pre-trained Transformer Language Models. arXiv:2205.01068 [cs.CL]
  665. Rethinking Human-AI Collaboration in Complex Medical Decision Making: A Case Study in Sepsis Diagnosis. https://doi.org/10.1145/3613904.3642343 arXiv:2309.12368 [cs].
  666. From dark matter to galaxies with convolutional networks. Proceedings of the National Academy of Sciences 116, 28 (2019), 13825–13832.
  667. Privacyasst: Safeguarding user privacy in tool-using large language model agents. IEEE Transactions on Dependable and Secure Computing (2024).
  668. MetaSim: Learning to Generate Synthetic Datasets. arXiv preprint arXiv:2302.03213 (2023).
  669. Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* ’20). Association for Computing Machinery, New York, NY, USA, 295–305. https://doi.org/10.1145/3351095.3372852
  670. A Survey on the Memory Mechanism of Large Language Model based Agents. arXiv preprint arXiv:2404.13501 (2024).
  671. H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. arXiv:2306.14048 [cs.LG]
  672. Automatic chain of thought prompting in large language models. arXiv preprint arXiv:2210.03493 (2022).
  673. Expel: Llm agents are experiential learners. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 19632–19642.
  674. Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning. arXiv preprint arXiv:2312.11420 (2023).
  675. GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. arXiv preprint arXiv:2403.03507 (2024).
  676. Slic-hf: Sequence likelihood calibration with human feedback. arXiv preprint arXiv:2305.10425 (2023).
  677. On evaluating adversarial robustness of large vision-language models. arXiv preprint arXiv:2305.16934 (2023).
  678. Assessing and Understanding Creativity in Large Language Models. arXiv preprint arXiv:2401.12491 (2024).
  679. Progressive-hint prompting improves reasoning in large language models. arXiv preprint arXiv:2304.09797 (2023).
  680. Minigpt-5: Interleaved vision-and-language generation via generative vokens. arXiv preprint arXiv:2310.02239 (2023).
  681. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems 36 (2024).
  682. Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. arXiv:2201.12023 [cs.LG]
  683. Can chatgpt understand too? a comparative study on chatgpt and fine-tuned bert. arXiv preprint arXiv:2302.10198 (2023).
  684. Agieval: A human-centric benchmark for evaluating foundation models. arXiv preprint arXiv:2304.06364 (2023).
  685. Lima: Less is more for alignment. Advances in Neural Information Processing Systems 36 (2024).
  686. Least-to-most prompting enables complex reasoning in large language models. arXiv preprint arXiv:2205.10625 (2022).
  687. Jian Zhou and Olga G Troyanskaya. 2015. Predicting effects of noncoding variants with deep learning–based sequence model. Nature methods 12, 10 (2015), 931–934.
  688. Navigating the grey area: Expressions of overconfidence and uncertainty in language models. arXiv preprint arXiv:2302.13439 (2023).
  689. WebArena: A Realistic Web Environment for Building Autonomous Agents. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=oKn9c6ytLx
  690. Agents: An open-source framework for autonomous language agents. arXiv preprint arXiv:2309.07870 (2023).
  691. Mixture-of-Experts with Expert Choice Routing. arXiv:2202.09368 [cs.LG]
  692. Large Language Models Are Human-Level Prompt Engineers. arXiv:2211.01910 [cs.LG]
  693. Principled Reinforcement Learning with Human Feedback from Pairwise or K𝐾Kitalic_K-wise Comparisons. arXiv preprint arXiv:2301.11270 (2023).
  694. LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment. arXiv preprint arXiv:2310.01852 (2023).
  695. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592 (2023).
  696. ToolQA: A Dataset for LLM Question Answering with External Tools. arXiv preprint arXiv:2306.13304 (2023).
  697. Mindstorms in natural language-based societies of mind. arXiv preprint arXiv:2305.17066 (2023).
  698. Maximum entropy inverse reinforcement learning.. In Aaai, Vol. 8. Chicago, IL, USA, 1433–1438.
  699. Auto-pytorch tabular: Multi-fidelity metalearning for efficient and robust autodl. arXiv preprint arXiv:2006.13799 (2020).
Authors (8)
  1. Tao Feng
  2. Chuanyang Jin
  3. Jingyu Liu
  4. Kunlun Zhu
  5. Haoqin Tu
  6. Zirui Cheng
  7. Guanyu Lin
  8. Jiaxuan You