
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning (2103.01913v2)

Published 2 Mar 2021 in cs.CV, cs.CL, and cs.IR

Abstract: The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal modeling techniques aim to leverage large high-quality visio-linguistic datasets for learning complementary information (across image and text modalities). In this paper, we introduce the Wikipedia-based Image Text (WIT) Dataset (https://github.com/google-research-datasets/wit) to better facilitate multimodal, multilingual learning. WIT is composed of a curated set of 37.6 million entity rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal models, as we show when applied to downstream tasks such as image-text retrieval. WIT has four main and unique advantages. First, WIT is the largest multimodal dataset by the number of image-text examples by 3x (at the time of writing). Second, WIT is massively multilingual (first of its kind) with coverage over 100+ languages (each of which has at least 12K examples) and provides cross-lingual texts for many images. Third, WIT represents a more diverse set of concepts and real world entities relative to what previous datasets cover. Lastly, WIT provides a very challenging real-world test set, as we empirically illustrate using an image-text retrieval task as an example.

Authors (5)
  1. Krishna Srinivasan (14 papers)
  2. Karthik Raman (26 papers)
  3. Jiecao Chen (23 papers)
  4. Michael Bendersky (63 papers)
  5. Marc Najork (27 papers)
Citations (258)

Summary

Overview of "The Name of the Title is Hope"

This paper serves primarily as a detailed guide to the "acmart" document class, utilized for preparing publications in ACM's conferences and journals. It is intended to provide a comprehensive understanding of the variations and formatting elements that authors may employ during the preparation of their scholarly articles.

Key Features and Contributions

The introduction of ACM's consolidated article template in 2017 marked a significant step toward standardizing the \LaTeX\ style across its publications. This paper explains the functionality and versatility of the "acmart" document class, which is adaptable to a wide range of publication types, including conference papers, journal articles, and other ACM formats.

  1. Template Styles: The authors discuss multiple template styles, such as acmsmall, acmlarge, and acmtog, which cater to different journals and conference proceedings. This flexibility is crucial for authors targeting specific ACM publications.
  2. Template Parameters: By delineating parameters such as anonymous, review, and authorversion, the paper provides clear guidance on adjusting the document for the various stages of publication, ensuring proper formatting for double-blind review or for author versions suitable for online posting (see the preamble sketch after this list).
  3. Typefaces and Modifications: Emphasizing consistency, the template mandates the "Libertine" typeface family and prohibits modifications that could alter the document's appearance, reinforcing a unified aesthetic across ACM publications.
  4. Metadata and Accessibility: The template includes accessibility and metadata features, ensuring that the documents are fully compatible with digital library requirements and worldwide accessibility standards.
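
To make the relationship between styles and parameters concrete, the sketch below shows one plausible preamble for a double-blind journal submission. The class options (acmsmall, review, anonymous, authorversion) are standard acmart options; the title, author, and affiliation are placeholder values, not taken from the paper.

  % Double-blind journal submission: 'anonymous' suppresses author identities,
  % 'review' adds line numbers for reviewers to reference.
  \documentclass[acmsmall,review,anonymous]{acmart}

  \title{The Name of the Title is Hope}
  \author{Ben Trovato}
  \affiliation{%
    \institution{Institute for Clarity in Documentation}
    \country{USA}}

  \begin{document}
  \maketitle
  % ... article body ...
  \end{document}

  % The author version for online posting is produced from the same source
  % by changing only the options, e.g. \documentclass[acmsmall,authorversion]{acmart}.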

Practical and Theoretical Implications

Practically, this work contributes to a smoother publishing process for ACM authors. By adopting the standardized template, authors can direct their effort toward content quality rather than formatting challenges.

Theoretically, this template may serve as a model for other organizations and publishers aiming to streamline their publication processes. Its ability to consolidate various templates into a singular, versatile option could inspire similar initiatives in different fields.

Future Developments

Given the perpetual evolution of digital publication standards, future enhancements may focus on further accessibility options, integration with evolving digital identifiers, and adaptive layouts suited for emerging publication media. Ongoing feedback from users might drive refinements to the template's functionality and user-friendliness.

Conclusion

While not positioned as revolutionary, this document efficiently addresses the complexities of academic publishing through a standardized, versatile \LaTeX\ template. By facilitating a consistent approach across ACM publications, it mitigates the formatting challenges researchers often face, allowing them to concentrate on the substantive elements of their scholarly contributions.
