Dimensionality Reduced Clustered Data and Order Partition and Stepwise Dimensionality Increasing Indices (2401.02858v1)
Abstract: One of the goals of a NASA-funded project at the IBM T. J. Watson Research Center was to build an index for similarity search over satellite images, which were characterized by high-dimensional texture feature vectors. We review our work on data clustering, dimensionality reduction via Singular Value Decomposition (SVD), and indexing, aimed at building a smaller index and processing k-Nearest Neighbor (k-NN) queries more efficiently for similarity search. Answering k-NN queries by scanning the feature vectors of all images is too costly for an ever-increasing number of images. The ubiquitous multidimensional R-tree index and its extensions were not an option, because they scale poorly as the number of dimensions grows. The cost of processing k-NN queries was further reduced by building memory-resident Ordered Partition (OP) indices on dimensionality-reduced clusters. Further research in a university setting included the following: (1) clustered SVD was extended to yield exact k-NN results by issuing appropriate, less costly range queries; (2) the Stepwise Dimensionality Increasing (SDI) index was shown to outperform other known indices; (3) the optimal number of dimensions was selected to reduce query processing cost; (4) two methods were developed to make OP-trees persistent and loadable with a single file access.
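As an illustration of how an exact k-NN answer can be obtained from dimensionality-reduced data through a range query, here is a minimal sketch. It is not the paper's algorithm: it assumes a single cluster, a linear scan instead of an OP-tree or SDI index, Euclidean distance, and the standard lower-bounding argument (distances after an orthonormal projection never exceed the original distances). The function names `fit_svd` and `exact_knn` are hypothetical.

```python
import numpy as np

def fit_svd(X, p):
    """Return the mean of X (n x d) and its top-p principal directions (d x p)."""
    mu = X.mean(axis=0)
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:p].T  # columns are orthonormal

def exact_knn(X, Z, mu, V, q, k):
    """Exact k-NN of q in X, using the reduced vectors Z = (X - mu) @ V as a filter."""
    zq = (q - mu) @ V
    d_red = np.linalg.norm(Z - zq, axis=1)  # lower bounds on the true distances

    # Step 1: approximate k-NN in the reduced space.
    cand = np.argsort(d_red)[:k]
    # Step 2: the largest true distance among these candidates is a safe radius,
    # because the true k-th nearest neighbor cannot be farther than that.
    r = np.linalg.norm(X[cand] - q, axis=1).max()
    # Step 3: range query in the reduced space (small tolerance for rounding).
    hits = np.flatnonzero(d_red <= r * (1 + 1e-9))
    # Step 4: refine the hits with true distances in the original space.
    d_true = np.linalg.norm(X[hits] - q, axis=1)
    order = np.argsort(d_true)[:k]
    return hits[order], d_true[order]
```

A typical call would compute `mu, V = fit_svd(X, p)` and `Z = (X - mu) @ V` once, then run `exact_knn(X, Z, mu, V, q, k)` per query. In a clustered setting the same filter-and-refine step would presumably be applied per cluster with its own `mu` and `V`, which this sketch does not cover.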