- The paper introduces NFT1000, the first NFT-centric visual-text dataset designed for effective retrieval of high-similarity NFTs.
- It outlines a novel retrieval task and benchmarks various CLIP models to establish baseline performance metrics.
- The study presents the Comprehensive Variance Index, a robust metric that closely correlates with retrieval difficulty in NFT datasets.
Overview of "NFT1000: A VISUAL-TEXT DATASET FOR NON-FUNGIBLE TOKEN RETRIEVAL"
The paper introduces the NFT1000 dataset, a contribution at the intersection of blockchain and computer vision that focuses on the retrieval of Non-Fungible Tokens (NFTs). With the proliferation of NFTs in the context of the "Metaverse" and "Web3.0", the need for efficient retrieval methods has grown. By November 2023, more than 1.4 billion NFTs had been minted, and the high regional and semantic similarity among them makes efficient, precise retrieval a formidable challenge for both academia and industry.
NFT1000 Dataset
The NFT1000 dataset comprises 7.56 million image-text pairs, derived from the top 1000 NFT collections by sales volume on the Ethereum blockchain. Each collection represents an NFT project compliant with the ERC-721 standard, averaging 6600 image-text pairs per collection. The total data volume is 1.75 TB, and the dataset is suitable for downstream tasks such as retrieval, generation, and visual question answering in the NFT domain.
Contributions
The primary contributions of the paper are as follows:
- Construction of the NFT1000 Dataset: This is the first NFT-centric visual-text dataset in the computer vision domain.
- Introduction of a Retrieval Task: The paper proposes a task that focuses on the retrieval of high-similarity image-text pairs, relevant in the context of large-scale NFT datasets.
- Benchmark Testing: The authors evaluate several CLIP (Contrastive Language-Image Pretraining) models to provide baseline performance metrics.
- Comprehensive Variance Index (CVI): The development of CVI offers a robust metric to assess the similarity and retrieval difficulty of visual-text pairs.
Data Characteristics and Processing
The inherent structure of NFTs involves metadata files that describe the attributes of each token. This dataset standardizes these attributes into image-caption pairs, facilitating machine learning applications. In terms of preprocessing, static images are converted to PNG format while dynamic media like GIFs and MP4s are represented by a single frame.
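The standardization step described above can be illustrated with a minimal sketch. It assumes the widely used marketplace metadata layout (a `name` field plus an `attributes` list of `trait_type`/`value` entries); the caption template and the function name `metadata_to_caption` are hypothetical and are not taken from the paper.

```python
import json


def metadata_to_caption(metadata: dict) -> str:
    """Flatten one token's metadata record into a single caption string.

    Assumed layout (not confirmed by the paper): a "name" field and an
    "attributes" list whose entries carry "trait_type" and "value".
    """
    name = str(metadata.get("name", "")).strip()
    traits = []
    for attr in metadata.get("attributes", []):
        trait_type = str(attr.get("trait_type", "")).strip()
        value = str(attr.get("value", "")).strip()
        if trait_type and value:  # skip incomplete attribute entries
            traits.append(f"{trait_type}: {value}")
    parts = [p for p in (name, ", ".join(traits)) if p]
    return ". ".join(parts)


# Made-up metadata record, for illustration only.
example = json.loads("""
{
  "name": "Example Token #1",
  "attributes": [
    {"trait_type": "Background", "value": "Blue"},
    {"trait_type": "Eyes", "value": "Laser"}
  ]
}
""")
caption = metadata_to_caption(example)
print(caption)
```

Because tokens in the same collection often differ by only one or two traits, captions produced this way tend to be near-duplicates of one another, which is the high-similarity setting the retrieval task targets.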
To avoid data leakage during model training and testing, the dataset is divided into training, validation, and test sets based on entire NFT projects rather than random image samples.
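A project-level split of this kind might look like the sketch below. The 80/10/10 ratios, the seed, and the `(project_id, item)` input shape are assumptions made for illustration; the summary does not state the proportions the authors used.

```python
import random
from collections import defaultdict


def split_by_project(pairs, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole projects to train/val/test so no project spans two splits.

    pairs: iterable of (project_id, item) tuples.
    ratios: hypothetical train/val proportions; the remainder goes to test.
    """
    by_project = defaultdict(list)
    for project_id, item in pairs:
        by_project[project_id].append((project_id, item))

    # Shuffle project ids, not individual pairs, with a fixed seed.
    projects = sorted(by_project)
    random.Random(seed).shuffle(projects)

    n_train = int(len(projects) * ratios[0])
    n_val = int(len(projects) * ratios[1])
    assignment = {
        "train": projects[:n_train],
        "val": projects[n_train:n_train + n_val],
        "test": projects[n_train + n_val:],
    }
    return {
        name: [pair for pid in ids for pair in by_project[pid]]
        for name, ids in assignment.items()
    }


# Toy data: 10 projects with 3 tokens each.
toy_pairs = [(f"proj{p}", f"token{t}") for p in range(10) for t in range(3)]
splits = split_by_project(toy_pairs)
print({name: len(items) for name, items in splits.items()})
```

Shuffling project identifiers rather than individual pairs is what prevents near-identical tokens from the same collection from appearing on both sides of the train/test boundary.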
Experimental Evaluation
The NFT1000 dataset's validity is assessed using zero-shot inference and fine-tuning on various CLIP models, including OpenAI's CLIP-ViT variations, Meta's META-CLIP, and BAAI's EVA-CLIP02. The experiments indicate that NFT1000 is distinctive and that its distribution differs from the data on which these models were originally trained.
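To make the evaluation setting concrete, here is a minimal sketch of Recall@K for text-to-image retrieval over precomputed feature vectors. Recall@K is a common retrieval metric, but the summary does not say which metrics the authors report, so treat the metric choice, the function names, and the toy vectors as assumptions. In practice the features would come from a CLIP image encoder and text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(text_feats, image_feats, k):
    """Fraction of text queries whose paired image (same index) is in the top k."""
    hits = 0
    for i, text in enumerate(text_feats):
        scores = [cosine(text, image) for image in image_feats]
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_feats)


# Toy features: images 1 and 2 are near-duplicates, as is typical within
# a single NFT collection, so their captions are easily confused.
image_feats = [[0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
text_feats = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.8]]

r1 = recall_at_k(text_feats, image_feats, k=1)
r2 = recall_at_k(text_feats, image_feats, k=2)
print(r1, r2)
```

In the toy batch only the first query retrieves its own image at rank 1, while the two near-duplicate items swap places, which illustrates why high intra-collection similarity depresses top-1 retrieval scores.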
Comprehensive Variance Index (CVI)
The Comprehensive Variance Index is proposed as a metric for evaluating the similarity within batches of image-text pairs. The CVI is based on the variance of the cosine similarity distributions of feature vectors, considering both intra-modal (image-image, text-text) and inter-modal (image-text) similarities. Empirical results show a high correlation between CVI and retrieval difficulty, validating its effectiveness.
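The ingredients named above can be sketched as follows. This is not the paper's CVI formula: the summary does not specify how the intra-modal and inter-modal variances are combined into one index, so the hypothetical function below only computes the three variance components and leaves the combination open.

```python
import math
from itertools import combinations
from statistics import pvariance


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_variances(image_feats, text_feats):
    """Variance of cosine similarities within and across modalities for one batch.

    Returns the three components only; combining them into a single index
    is left unspecified because the summary does not give that step.
    """
    img_img = [cosine(a, b) for a, b in combinations(image_feats, 2)]
    txt_txt = [cosine(a, b) for a, b in combinations(text_feats, 2)]
    img_txt = [cosine(i, t) for i in image_feats for t in text_feats]
    return {
        "image_image": pvariance(img_img),
        "text_text": pvariance(txt_txt),
        "image_text": pvariance(img_txt),
    }


# A batch of identical items versus a batch of varied items (toy vectors).
uniform = similarity_variances([[1.0, 0.0]] * 3, [[1.0, 0.0]] * 3)
varied = similarity_variances(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
print(uniform)
print(varied)
```

A batch whose items are all alike yields similarity distributions with near-zero variance, while a varied batch yields larger values; this is the kind of signal the summary describes CVI as relating to retrieval difficulty.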
Implications and Future Work
The NFT1000 dataset and the associated tasks set a new benchmark for cross-modal retrieval in the burgeoning field of NFTs. Practically, this dataset can enhance AI-driven search and retrieval systems in blockchain environments. Theoretically, it bridges a gap in the computer vision domain by presenting a unique challenge of high-similarity data.
Future work includes:
- Data Optimization: Removing redundant data to enhance model efficiency and generalization.
- Dataset Expansion: Extending beyond Ethereum to include NFTs from other blockchains like Solana and Polygon, aiming to construct a dataset with hundreds of millions of pairs.
- Generative Models: Exploring the dataset's potential for generative models that create diverse NFT artworks.
Conclusion
The construction of the NFT1000 dataset marks a significant development in NFT retrieval research. By addressing the high-similarity challenge inherent to NFTs, the authors provide a dataset that is poised to advance the capabilities of AI in the blockchain domain. The introduction of the Comprehensive Variance Index further broadens the methodological toolkit available for cross-modal retrieval tasks. Future efforts will focus on expanding and refining this dataset to maintain its relevance and utility in ongoing research and industry applications.