Affordance Diffusion: Synthesizing Hand-Object Interactions (2303.12538v3)

Published 21 Mar 2023 in cs.CV and cs.RO

Abstract: Recent successes in image synthesis are powered by large-scale diffusion models. However, most methods are currently limited to either text- or image-conditioned generation for synthesizing an entire image, texture transfer or inserting objects into a user-specified region. In contrast, in this work we focus on synthesizing complex interactions (i.e., an articulated hand) with a given object. Given an RGB image of an object, we aim to hallucinate plausible images of a human hand interacting with it. We propose a two-step generative approach: a LayoutNet that samples an articulation-agnostic hand-object-interaction layout, and a ContentNet that synthesizes images of a hand grasping the object given the predicted layout. Both are built on top of a large-scale pretrained diffusion model to make use of its latent representation. Compared to baselines, the proposed method is shown to generalize better to novel objects and perform surprisingly well on out-of-distribution in-the-wild scenes of portable-sized objects. The resulting system allows us to predict descriptive affordance information, such as hand articulation and approaching orientation. Project page: https://judyye.github.io/affordiffusion-www

Overview of CVPR Proceedings Author Guidelines

The document titled "LaTeX Author Guidelines for CVPR Proceedings" serves as a comprehensive guide for authors submitting papers to the Computer Vision and Pattern Recognition (CVPR) conference. This guide is structured to ensure uniformity and high standards in the presentation of manuscripts, reflecting the norms and expectations of the conference series.

Manuscript Submission Protocols

The document sets out the steps for manuscript submission and stresses that authors must follow the guidelines to avoid rejection. Older practices, such as attaching figures with sticky tape, have been discontinued, while the essential technical compliance rules remain in force.

Document Format and Structure

The guide stipulates a rigid structure for the manuscript, which includes:

Language: All manuscripts must be in English.
Length: Papers, excluding references, must not exceed eight pages. Overlength papers will not be reviewed.
Ruler: A ruler must be included to aid reviewers in referencing specific lines.
Mathematics: Proper numbering and referencing of equations are mandated.
Blind Review Practices: Authors should anonymize references to their own prior work to maintain a blind peer-review process.

Technical Formatting Details

The guidelines delineate the precise formatting requirements for the manuscript's visual presentation. The text must be in a two-column format with specified column width and inter-column spacing. This includes directives for margins, page numbering, type style, and font usage to maintain readability and consistency with CVPR standards.

Figures, Tables, and References

Authors are advised on the placement and formatting of figures and tables so that each is clear and relevant to the surrounding text. Cross-referencing commands keep references to figures, tables, and equations consistent throughout the document, and the reference list must follow the prescribed format.
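
As an illustration of the numbering and cross-referencing conventions summarized above, the following is a minimal sketch in standard LaTeX. It is not taken from the CVPR template: the document class, the label names (eq:energy, fig:placeholder), and the placeholder box are assumptions chosen only for the example.

```latex
\documentclass{article}
\begin{document}

% A numbered equation with a label so it can be referenced by number.
\begin{equation}
  E = m c^{2}
  \label{eq:energy}
\end{equation}

% A figure with an empty framed box standing in for real artwork.
\begin{figure}[t]
  \centering
  \fbox{\rule{0pt}{2cm}\rule{4cm}{0pt}}
  \caption{Placeholder figure.}
  \label{fig:placeholder}
\end{figure}

% \ref resolves to the current number after a second compile,
% so the text stays correct if equations or figures are reordered.
Equation~(\ref{eq:energy}) is illustrated in Figure~\ref{fig:placeholder}.

\end{document}
```

Referring to items by label rather than by a hard-coded number is what keeps the references consistent when content is added or moved.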

Theoretical and Practical Implications

While the guidelines are primarily technical, they implicitly emphasize the need for precision, clarity, and consistency in scientific communication. Adherence to these standards ensures that submissions meet the high-quality benchmarks expected by the CVPR community, facilitating the dissemination of transparent and reproducible research.

Future Directions in AI and CVPR Submissions

While the document itself does not discuss AI advancements directly, compliance with these guidelines ensures that cutting-edge AI research is presented effectively. As AI models continue to evolve, the interplay between technological advances and academic presentation standards will necessitate periodic updates to these guidelines to accommodate new forms of data representation and scientific inquiry.

This document thus serves as both a roadmap for current authors and a baseline for future enhancements in scientific reporting within the field of computer vision and pattern recognition.

Authors (8)

Yufei Ye (16 papers)
Xueting Li (32 papers)
Abhinav Gupta (178 papers)
Shalini De Mello (45 papers)
Stan Birchfield (64 papers)
Jiaming Song (78 papers)
Shubham Tulsiani (71 papers)
Sifei Liu (64 papers)

Citations (59)
