OnomaCompass: Bidirectional Texture Exploration

Updated 15 January 2026
  • OnomaCompass is an interactive web system that maps invented onomatopoeic terms to generated texture images, enabling cross-modal material exploration.
  • It employs dual latent spaces with UMAP-based visualizations to overcome the vocabulary problem and stimulate creative, multisensory design ideation.
  • Empirical evaluations indicate reduced mental demand and increased serendipitous discovery compared to linear prompt-based workflows.

OnomaCompass is a web-based interactive system for material texture exploration that operationalizes the bidirectional mapping between human sound-symbolic onomatopoeia and visual representations of texture. The platform is designed to address the cognitive bottleneck known as the "vocabulary problem" in texture ideation: articulating nuanced, multisensory texture impressions in discrete language often constrains divergent thinking and serendipitous discovery. OnomaCompass introduces dual, coordinated latent spaces—one for invented onomatopoeic terms and one for generated texture images—enabling users to shuttle between linguistic and visual conceptualizations in early-stage material design (Okamura et al., 8 Jan 2026).

1. Motivation and Conceptual Framework

Material textures elicit somatic impressions that are challenging to verbalize, particularly for non-expert designers. The conventional prompt-based generative AI workflow instantiates a one-directional translation (“prompt → image”), introducing a linguistic bottleneck that hinders early ideation and restricts exploration to what can be readily described in natural language. OnomaCompass replaces this with a bidirectional exploration interface, allowing users to traverse and interconnect two continuous latent spaces:

  • Texture latent space: Visual representations of macro-texture
  • Onomatopoeia latent space: Sound-symbolic (mimetic) terms

This coordination is intended to scaffold the externalization of vague sensory expectations, operationalize comparison and combination of examples, and facilitate reconceptualization through linguistic cues. The approach leverages sound symbolism—where phonetic properties evoke sensory associations—as a lightweight, intuitive handle for Kansei-driven (“affective”) design ideation beyond the constraints of standard prompt-centric pipelines (Okamura et al., 8 Jan 2026).

2. Dataset Construction

The core dataset is structured to guarantee deterministic cross-modal correspondence:

  • Invented onomatopoeia: 235 mimetic terms authored to exceed everyday lexical range and provoke imaginative associations.
  • Generation pipeline:
  1. LLM Staging: For each term, OpenAI “o1” transforms onomatopoeia into an implied material, then physical qualities, then an English description, which is adopted as the final Stable Diffusion prompt.
  2. Image Synthesis: Stable Diffusion v1.5 (64×64 px) generates up to three macro-texture images per description.
  3. Outcome: 676 unique images, reflecting cases where certain terms did not yield three images due to generation constraints.

The English description text is used uniformly (a) as the Stable Diffusion prompt, (b) for text embedding to construct the onomatopoeia map, and (c) as the deterministic cross-modal index linking each onomatopoeia to its corresponding generated textures (Okamura et al., 8 Jan 2026).
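The staged pipeline above can be sketched as plain Python, with the model calls stubbed out. The function names `stage_with_llm` and `generate_textures` are hypothetical stand-ins for the o1 and Stable Diffusion steps; the point is the deterministic term → description → images index, not the generation itself:

```python
# Sketch of the staged dataset build; LLM and diffusion calls are stubbed.

def stage_with_llm(onomatopoeia: str) -> str:
    """Stub for o1: term -> implied material -> qualities -> English description."""
    return f"macro texture of a material evoked by '{onomatopoeia}'"

def generate_textures(description: str, k: int = 3) -> list:
    """Stub for Stable Diffusion v1.5: up to k macro-texture images per description."""
    return [f"img({description})#{i}" for i in range(k)]

def build_dataset(terms):
    prompt_text = {}   # term -> English description (reused as the SD prompt)
    relation = []      # deterministic cross-modal index R
    for w in terms:
        d = stage_with_llm(w)
        prompt_text[w] = d
        for x in generate_textures(d):
            relation.append((w, x))
    return prompt_text, relation

prompts, R = build_dataset(["funyara", "gizagiza"])
```

Because the description is stored alongside the term, the same string can later serve as the prompt, the text-embedding input, and the cross-modal key.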

3. Latent-Space Encoding and Cross-Modal Mechanics

Let $W = \{w_1, \ldots, w_n\}$ denote the set of $n = 235$ onomatopoeic terms, and $X = \{x_1, \ldots, x_m\}$ the $m = 676$ texture images.

Embedding Functions:

  • $f_{word} : W \rightarrow \mathbb{R}^{1536}$, implemented as $t_i = f_{word}(w_i) = \text{text\_embed}_{3\text{-}small}(\text{prompt\_text}(w_i)) \in \mathbb{R}^{1536}$
  • $f_{img} : X \rightarrow \mathbb{R}^{512}$, implemented as $v_j = f_{img}(x_j) = \text{CLIP}_{ViT\text{-}B/32}(x_j) \in \mathbb{R}^{512}$

Dimensionality Reduction (for map visualization):

  • UMAP with hyperparameters $(n_{neighbors}=15,\ min\_dist=0.5,\ metric=\text{cosine})$
    • $UMAP_{word} : \mathbb{R}^{1536} \rightarrow \mathbb{R}^2, \quad c_{word,i} = UMAP_{word}(t_i)$
    • $UMAP_{img} : \mathbb{R}^{512} \rightarrow \mathbb{R}^2, \quad c_{img,j} = UMAP_{img}(v_j)$

Similarity Metrics (for navigation/highlighting):

  • $\text{cosine\_similarity}(u, v) = \dfrac{u \cdot v}{\|u\| \|v\|}$
  • $d_{cos}(u, v) = 1 - \text{cosine\_similarity}(u, v)$
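These two metrics are straightforward to implement directly; a minimal pure-Python version, matching the definitions above:

```python
import math

def cosine_similarity(u, v):
    """u . v / (||u|| ||v||), as defined above."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def d_cos(u, v):
    """Cosine distance: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)
```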

Cross-Modal Highlighting is driven by a deterministic authoring-time relation:

$R = \{(w_i, x_j) \mid x_j \text{ was generated from } \text{prompt\_text}(w_i)\}$

Selecting $w_i$ highlights $H_{word}(w_i) = \{x_j \mid (w_i, x_j) \in R\}$; selecting $x_j$ highlights $H_{img}(x_j) = \{w_i \mid (w_i, x_j) \in R\}$ (Okamura et al., 8 Jan 2026).
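Since the relation is fixed at authoring time, highlighting reduces to simple set lookups over the stored pairs. A minimal sketch (the example pairs are illustrative, not from the actual dataset):

```python
# Deterministic cross-modal relation: (term, image) pairs fixed at authoring time.
R = {
    ("funyara", "img1"),
    ("funyara", "img2"),
    ("gizagiza", "img3"),
}

def highlight_from_word(R, w):
    """H_word(w): all images generated from term w."""
    return {x for (wi, x) in R if wi == w}

def highlight_from_image(R, x):
    """H_img(x): all terms whose prompt produced image x."""
    return {w for (w, xj) in R if xj == x}
```

In the interface, everything outside the returned set is dimmed rather than hidden, so the surrounding map context stays visible.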

4. User Interface, Interactions, and Emergent Exploration Loops

The user interface presents dual, coordinated UMAP scatterplots:

  • Left: Texture-image sprites in 2D, visualized in a navigable 3D scene.
  • Right: Scatterplot of onomatopoeic terms, in parallel style.

Navigation utilizes Three.js/OrbitControls (rotation, zoom, pan). Clicking on an element auto-focuses and shuttles to its paired location in the alternate map.

Cross-Modal Highlighting: Selecting a term dims unrelated textures and emphasizes linked images. Conversely, selecting an image emphasizes its source onomatopoeia.

Gallery Curation: Users can star or drag images into a gallery panel for subsequent operations.

Video Interpolation & Re-Embedding (Emergent Loop):

  1. Users select two gallery textures $x_A$, $x_B$.
  2. Luma AI Ray 1.6 synthesizes a transition video $A \rightarrow B$.
  3. An intermediate frame $f_k$ is extracted.
  4. Re-embedding pipeline:
    • Gemini 2.0 VLM proposes a new onomatopoeia $\hat{w}_k$ for $f_k$.
    • OpenAI o1 expands $\hat{w}_k$ into a detailed description $d_k$.
    • $f_{img}(f_k) = v_k \in \mathbb{R}^{512}$; $f_{word}(d_k) = t_k \in \mathbb{R}^{1536}$.
    • $c_{img,k} = UMAP_{img}(v_k)$; $c_{word,k} = UMAP_{word}(t_k)$.
  5. New points are appended (in orange), operationalizing an "exploration loop" that iteratively expands the design space (Okamura et al., 8 Jan 2026).
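The control flow of one loop pass can be sketched with all model calls injected as callables; every callable here (video synthesis, frame extraction, the VLM, the LLM, both embedders, both fitted UMAP projectors) is a hypothetical stand-in for the services named above:

```python
# One pass of the emergent exploration loop, with all models injected as stubs.

def interpolate_and_reembed(x_a, x_b, synth_video, extract_frame,
                            vlm_name, llm_describe,
                            f_img, f_word, umap_img, umap_word):
    video = synth_video(x_a, x_b)          # Luma AI Ray 1.6: transition A -> B
    f_k = extract_frame(video)             # intermediate frame f_k
    w_hat = vlm_name(f_k)                  # Gemini 2.0: new onomatopoeia w_hat_k
    d_k = llm_describe(w_hat)              # o1: detailed description d_k
    v_k = f_img(f_k)                       # CLIP embedding, R^512
    t_k = f_word(d_k)                      # text embedding, R^1536
    # Project both embeddings into the existing 2D maps as new (orange) points.
    return umap_img(v_k), umap_word(t_k)
```

Injecting the models keeps the loop testable and makes clear that each new point enters *both* maps, so every iteration enlarges the navigable design space.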

5. Integration with Generative and Image Editing Models

Textures curated in the gallery can be previewed on product photographs (e.g., vases or headphones). Using Gemini 2.5 Flash Image, a text-guided inpainting/composition pipeline inserts the macro-texture into a target region, preserving appearance and implied surface cues. This enables domain experts and novices to rapidly assess how a candidate texture might manifest on an artifact, supporting early-stage material speculation and ideation (Okamura et al., 8 Jan 2026).
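The actual insertion is performed by Gemini 2.5 Flash Image, a text-guided model; as a crude illustration of the underlying idea only (not the model's method), masked compositing with NumPy copies texture pixels into a target region of the product photo:

```python
import numpy as np

def composite_texture(photo, texture, mask):
    """Copy texture pixels into the masked region of the photo.
    A naive stand-in for text-guided inpainting, for illustration only."""
    out = photo.copy()
    out[mask] = texture[mask]   # boolean mask selects the target region
    return out

# Toy example: a 4x4 RGB "photo" with a 2x2 region replaced by texture.
photo = np.zeros((4, 4, 3), dtype=np.uint8)
texture = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
preview = composite_texture(photo, texture, mask)
```

The generative model goes well beyond this, adapting the texture to the object's lighting and surface geometry; the sketch only shows where region masking fits in.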

6. Empirical Evaluation

A within-subjects study with $N=11$ native Japanese students compared OnomaCompass (OC) with a prompt-driven Gemini 2.5 Flash Image workflow (Nano Banana, NB).

| Metric | OnomaCompass (OC) | Nano Banana (NB) | Statistical Result |
| --- | --- | --- | --- |
| NASA-TLX Overall | 34.46 (SD 24.23) | 45.55 (SD 25.56) | $t(10)=-2.686$, $p=0.023$, $g=-0.74$ |
| Mental Demand (NASA-TLX) | 38.36 | 58.00 | $p=0.015$, $g=-0.80$ |
| Effort (NASA-TLX) | 25.09 | 49.27 | $p=0.047$, $g=-0.63$ |
| Frustration (NASA-TLX) | 23.46 | 56.82 | $p=0.001$, $g=-1.22$ |
| UEQ Hedonic Quality | 2.000 | -0.091 | $t(10)=4.416$, $p=0.001$, $g=1.22$ |
| UEQ Overall Quality | 1.578 | 0.694 | $t(10)=2.847$, $p=0.017$, $g=0.78$ |
| SUS (Usability) | 67.73 | 75.68 | $t(10)=-2.316$, $p=0.043$, $g=-0.64$ |

Selected items from a 12-item creativity/exploration questionnaire also significantly favored OC (e.g., “explore variations” 6.73 vs. 4.00; “creative/fun” 6.27 vs. 4.82).
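The reported effect sizes are consistent with one common convention for paired designs: Cohen's $d = t/\sqrt{n}$ with the Hedges small-sample correction $J = 1 - 3/(4(n-1)-1)$. This formula is an assumption about the authors' exact computation, but it reproduces the table's $g$ values from the given $t$ statistics:

```python
import math

def hedges_g_paired(t_stat, n):
    """Hedges' g for a paired design: d = t / sqrt(n), with small-sample
    correction J = 1 - 3 / (4(n-1) - 1). Assumed convention, not confirmed
    as the authors' exact formula."""
    d = t_stat / math.sqrt(n)
    correction = 1 - 3 / (4 * (n - 1) - 1)
    return d * correction

g_tlx = hedges_g_paired(-2.686, 11)   # NASA-TLX Overall row: approx -0.74
```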

Qualitative Findings:

  • Participants described serendipitous browsing akin to navigating a vast wallpaper catalog, in contrast to the linearity of chat-based prompt workflows.
  • Onomatopoeia functioned as cues for reinterpretation, expanding rather than concretizing linguistic meaning.
  • The emergent loop (video interpolation and re-embedding) was likened to “breeding” new materials, suggestive of generative recombination frameworks.
  • Usability challenges surfaced around 3D navigation unfamiliarity, map semantics, and selection operations (Okamura et al., 8 Jan 2026).

7. Implications and Future Directions

OnomaCompass demonstrates that the linguistic articulation barrier in texture ideation can be reduced by exposing a spatial, cross-modal interface underpinned by sound-symbolic cues. This enables novices to pursue divergent thinking paths and facilitates serendipitous material discovery. Prompt-based workflows remain superior for convergent refinement and precise control.

Sound symbolism is validated as an effective heuristic anchor for Kansei-driven material exploration, especially when opaque or ineffable qualities are desired. The explicit support for "in-between" textures and bidirectional navigation fosters robust mental map construction for subsequent creative processes.

Identified avenues for enhancement include redesigning for 2D/2.5D navigation to lower spatial interaction burdens, improved communication of latent-space structuring, and deeper integration between divergent (map-based) and convergent (prompt-editing) ideation workflows into a unified design environment (Okamura et al., 8 Jan 2026).
