Comparative Investigation of Compositional Syntax and Semantics in DALL·E 2
Introduction
Recent advances in text-to-image models such as DALL·E 2 have drawn significant interest for their ability to generate images from textual descriptions. Although these models show an impressive capacity for realistic image synthesis, the extent to which they understand the linguistic structure of their prompts remains unclear. This paper evaluates DALL·E 2's syntactic and semantic comprehension against that of human children, focusing on compositional syntactic constructions that are central to language understanding.
Methods
DALL·E 2 was presented with sentences testing foundational aspects of grammar. The sentences, adapted from comprehension tests for English-speaking children aged 2–7 years, targeted reversible transitive verbs, negation, prepositions, embedded adjectives, and passive voice constructions. Each sentence prompt was used to generate 20 images, which nine adult judges then rated for semantic accuracy, i.e., whether an image depicted the meaning of the sentence.
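The authors do not publish analysis code; the sketch below only illustrates how such judge ratings could be aggregated into a per-construction semantic-accuracy score, assuming binary match/no-match judgments. The data structures, function name, and toy values are hypothetical.

```python
# Illustrative sketch, not the authors' code: aggregate binary judge ratings
# (1 = image matches the sentence, 0 = it does not) into a mean semantic-accuracy
# score per grammatical construction.
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# ratings[(construction, prompt)] holds one 0/1 judgment per (image, judge) pair;
# with 20 images and 9 judges that would be 180 entries per prompt.
Ratings = Dict[Tuple[str, str], List[int]]

def accuracy_by_construction(ratings: Ratings) -> Dict[str, float]:
    """Mean proportion of images judged semantically accurate, per construction."""
    per_construction: Dict[str, List[float]] = defaultdict(list)
    for (construction, _prompt), judgments in ratings.items():
        per_construction[construction].append(mean(judgments))
    return {c: mean(scores) for c, scores in per_construction.items()}

# Toy example with made-up judgments for two constructions:
toy: Ratings = {
    ("passive", "the cat is chased by the dog"): [1, 0, 0, 0, 1, 0],
    ("negation", "a cup with no water in it"):   [0, 0, 1, 0, 0, 0],
}
print(accuracy_by_construction(toy))  # approx. {'passive': 0.33, 'negation': 0.17}
```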
Results
The results reveal clear deficits in DALL·E 2's processing of compositional syntax and semantics. Across all tested grammatical structures, DALL·E 2 never matched the comprehension level of human children, even children as young as two years. Specifically, the model:
- failed to depict reversible actions and prepositional phrases correctly
- mishandled negation and often attached adjectives to the wrong noun
- ignored implicit agents in passive voice constructions
These results underline a fundamental gap in DALL·E 2's ability to construct linguistically coherent images and suggest that the model lacks a robust mechanism for compositional sentence representation.
Discussion
The findings reinforce prior skepticism about the syntactic understanding of AI models like DALL·E 2. The comparison with children underscores a crucial limitation: human learners rapidly acquire and apply grammatical knowledge to understand and produce language, whereas DALL·E 2 lacks comprehension of the basic grammatical principles needed for accurate interpretation. These deficits highlight the importance of moving models beyond mere keyword recognition towards genuine syntactic and semantic analysis. Building grammatical competence into such models, perhaps through neurosymbolic approaches or syntactic inductive biases, appears to be a promising direction for future research.
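As one hedged illustration of what a syntactic inductive bias could look like in practice (this is not a method from the paper), the sketch below uses an off-the-shelf dependency parser (spaCy with its small English model, assumed to be installed) to make agent and patient roles explicit before a sentence is handed to an image generator; the function name and role labels are illustrative choices.

```python
# Sketch of grammar-aware preprocessing: recover agent/patient roles from simple
# active and passive clauses via a dependency parse, so that downstream prompt
# construction can state who does what to whom explicitly.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

def extract_roles(sentence: str) -> dict:
    """Return agent, patient, and verb for a simple transitive clause."""
    doc = nlp(sentence)
    roles = {"agent": None, "patient": None, "verb": None}
    for tok in doc:
        if tok.dep_ == "nsubj":            # active subject -> agent
            roles["agent"] = tok.text
        elif tok.dep_ == "dobj":           # active direct object -> patient
            roles["patient"] = tok.text
        elif tok.dep_ == "nsubjpass":      # passive subject -> patient
            roles["patient"] = tok.text
        elif tok.dep_ == "agent":          # passive "by"-phrase head
            roles["agent"] = next(
                (child.text for child in tok.children if child.dep_ == "pobj"),
                None,
            )
        if tok.pos_ == "VERB":
            roles["verb"] = tok.lemma_
    return roles

# Expected output with a standard parse of both sentences:
# {'agent': 'dog', 'patient': 'cat', 'verb': 'chase'}
print(extract_roles("The dog chases the cat"))
print(extract_roles("The cat is chased by the dog"))
```

Even this simple role extraction would let a prompt be restated in a form that disambiguates who does what to whom, which is exactly the information the reversible-transitive and passive tests show DALL·E 2 discarding.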
Furthermore, the paper's approach emphasizes the potential of using child language development benchmarks for assessing and guiding the progress of AI models in linguistic tasks. Such comparisons not only provide tangible goals for AI advancements but also offer insights into the complex nature of human language acquisition and processing.
Conclusion
This comparative investigation reveals significant limitations in DALL·E 2's handling of compositional syntax and semantics, highlighting a gap between AI and human language comprehension. The results point to a direction for future research: integrating more sophisticated grammar-aware mechanisms into AI models. Advancing AI's capacity to understand and generate language in a human-like manner will require more than larger datasets and computing power; it will demand a fundamental rethinking of how compositional semantics are represented and processed.