Large (Vision) Language Models are Unsupervised In-Context Learners (2504.02349v1)

Published 3 Apr 2025 in cs.LG

Abstract: Recent advances in large language and vision-language models have enabled zero-shot inference, allowing models to solve new tasks without task-specific training. Various adaptation techniques, such as prompt engineering, In-Context Learning (ICL), and supervised fine-tuning, can further enhance the model's performance on a downstream task, but they require substantial manual effort to construct effective prompts or labeled examples. In this work, we introduce a joint inference framework for fully unsupervised adaptation, eliminating the need for manual prompt engineering and labeled examples. Unlike zero-shot inference, which makes independent predictions, joint inference makes predictions simultaneously for all inputs in a given task. Since direct joint inference involves computationally expensive optimization, we develop efficient approximation techniques, leading to two unsupervised adaptation methods: unsupervised fine-tuning and unsupervised ICL. We demonstrate the effectiveness of our methods across diverse tasks and models, including language-only Llama-3.1 on natural language processing tasks, reasoning-oriented Qwen2.5-Math on grade-school math problems, vision-language OpenFlamingo on vision tasks, and the API-only GPT-4o model on massive multi-discipline tasks. Our experiments demonstrate substantial improvements over the standard zero-shot approach, including a 39% absolute improvement on the challenging GSM8K math reasoning dataset. Remarkably, despite being fully unsupervised, our framework often performs on par with supervised approaches that rely on ground truth labels.

The provided document is a LaTeX template and set of formatting instructions for submitting papers to the International Conference on Learning Representations (ICLR) 2025. It outlines the required style and layout for submissions rather than presenting research findings on a specific topic such as "Large (Vision) Language Models are Unsupervised In-Context Learners" (the paper's content is present but commented out in the LaTeX source).

The core content of the document provides detailed guidance on various formatting aspects to ensure submitted papers meet the conference requirements. Key instructions include:

  • Document Class and Style Files: Authors must use the iclr2025_conference.sty and iclr2025_conference.bst files.
  • Page Dimensions and Margins: Specifies a text area 5.5 inches wide and 9 inches long, with a 1.5-inch left margin.
  • Font and Spacing: Requires 10-point type with 11-point vertical spacing. Paragraphs are separated by 1/2 line space with no indentation.
  • Titles and Authors: Paper title should be 17-point, small caps, left-aligned. Author names are in boldface, placed above their addresses, with the lead author listed first.
  • Abstract: The abstract must be a single paragraph, indented 1/2 inch on both sides, using 10-point type with 11-point spacing. The heading "Abstract" must be centered and in small caps.
  • Headings: Defines formatting for first, second, and third-level headings (small caps, flush left, specific point sizes and spacing).
  • Page Limit: A strict upper limit of 10 pages is set for the main text of the initial submission, with no page limit for references.
  • Citations: Recommends using the natbib package for in-text citations, distinguishing between author-in-text citations (\citet{}) and parenthetical citations (\citep{}). References should be listed alphabetically in the References section.
  • Footnotes: Footnotes are indicated by a number and placed at the bottom of the page, separated by a horizontal rule.
  • Figures and Tables: Provides guidelines for including figures and tables, emphasizing neatness and legibility; captions and numbers are placed below figures and above tables, and figures and captions should not be split across pages. Using \includegraphics with the width specified relative to \linewidth is recommended.
  • Notation: Includes a section suggesting the use of standardized mathematical notation, referencing the notation from the book "Deep Learning". Provides tables for common notation regarding numbers, arrays, sets, graphs, indexing, calculus, probability, information theory, and functions.
  • File Preparation: Instructions are given for generating PostScript or PDF files, recommending pdflatex and specifying the US Letter paper size.
  • Optional Sections: Mentions optional sections for "Author Contributions" and "Acknowledgments" (using unnumbered third-level headings).
  • Bibliography: Includes example bibliography entries in BibTeX format, listing various papers on large language models, vision-language models, prompt tuning, reinforcement learning, and datasets. These entries serve only as examples of reference formatting and are not discussed in the text beyond their listing.
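
Taken together, the instructions above amount to a conventional submission skeleton. The following is a minimal, illustrative sketch assuming the iclr2025_conference.sty and .bst files are on the LaTeX path; the author block, section text, figure file name, and citation keys are placeholders rather than part of the actual template:

```latex
\documentclass{article}
\usepackage{iclr2025_conference,times} % conference style file
\usepackage{graphicx}

\title{Paper Title in Small Caps Style}
\author{First Author \\
Department, Institution \\
\texttt{first.author@example.edu}}

\begin{document}
\maketitle

\begin{abstract}
A single-paragraph abstract, indented 1/2 inch on both sides,
set in 10-point type with 11-point vertical spacing.
\end{abstract}

\section{Introduction}
% Author-in-text citation: \citet{somekey} showed ...
% Parenthetical citation:  ... performs well \citep{somekey}.

\begin{figure}[h]
\begin{center}
\includegraphics[width=0.8\linewidth]{figure.pdf}
\end{center}
\caption{Caption placed below the figure.}
\end{figure}

\bibliography{references}
\bibliographystyle{iclr2025_conference}
\end{document}
```

Whether \usepackage{natbib} must be loaded explicitly or is pulled in by the style file depends on the template distribution, so the preamble above should be checked against the files shipped with the ICLR 2025 kit.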

In summary, the document is a technical guide focused on the logistical aspects of preparing a research paper for submission to ICLR 2025 using LaTeX, ensuring consistency and adherence to conference standards.

Authors (7)
  1. Artyom Gadetsky (7 papers)
  2. Andrei Atanov (12 papers)
  3. Yulun Jiang (3 papers)
  4. Zhitong Gao (9 papers)
  5. Ghazal Hosseini Mighan (2 papers)
  6. Amir Zamir (28 papers)
  7. Maria Brbic (11 papers)