
Dimensionality Reduced Clustered Data and Order Partition and Stepwise Dimensionality Increasing Indices (2401.02858v1)

Published 5 Jan 2024 in cs.DB, cs.DL, and cs.DS

Abstract: One of the goals of a NASA-funded project at the IBM T. J. Watson Research Center was to build an index for similarity search over satellite images, which were characterized by high-dimensional image texture feature vectors. We review our work on data clustering, dimensionality reduction via Singular Value Decomposition (SVD), and indexing, aimed at building a smaller index and processing k-Nearest Neighbor (k-NN) queries for similarity search more efficiently. Processing k-NN queries by scanning the feature vectors of all images is too costly for an ever-increasing number of images. The ubiquitous multidimensional R-tree index and its extensions were not an option because they scale poorly with the number of dimensions. The cost of processing k-NN queries was further reduced by building memory-resident Ordered Partition (OP) indices on dimensionality-reduced clusters. Further research in a university setting included the following: (1) extending clustered SVD to yield exact k-NN results by issuing appropriate, less costly range queries; (2) the Stepwise Dimensionality Increasing (SDI) index, which outperformed other known indices; (3) selecting the optimal number of dimensions to reduce query processing cost; and (4) two methods to make OP-trees persistent and loadable with a single file access.
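To make item (1) concrete, the following is a minimal sketch of one common filter-and-refine way to obtain exact k-NN results from a reduced-dimensional search followed by a range query. It is not taken from the paper: the function names and the parameter `m` are hypothetical, clustering is omitted, and the vectors are assumed to be already rotated into an orthonormal basis (as an SVD would provide), so that reduction amounts to keeping the first `m` coordinates and reduced distances never exceed full distances.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn_via_range(points, query, k, m):
    """Return indices of the exact k nearest neighbors of `query`.

    Assumption (not from the paper): `points` and `query` are already in an
    orthonormal basis, so truncating to the first m coordinates is a
    projection and can only shrink distances.
    """
    reduced = [p[:m] for p in points]
    q_red = query[:m]

    # Step 1: approximate k-NN using only the reduced vectors.
    approx = heapq.nsmallest(
        k, range(len(points)), key=lambda i: dist(reduced[i], q_red)
    )

    # Step 2: the largest full-space distance among these k points is an
    # upper bound on the true k-th nearest neighbor distance.
    radius = max(dist(points[i], query) for i in approx)

    # Step 3: range query in the reduced space. Because reduced distances
    # lower-bound full distances, no true neighbor is dismissed.
    candidates = [
        i for i in range(len(points)) if dist(reduced[i], q_red) <= radius
    ]

    # Step 4: refine the candidates with full-dimensional distances.
    return heapq.nsmallest(k, candidates, key=lambda i: dist(points[i], query))


def brute_force_knn(points, query, k):
    """Baseline: scan every full-dimensional vector."""
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: dist(points[i], query)
    )
```

In this sketch the linear scans in steps 1 and 3 stand in for an index on the reduced vectors; the point is only that full-dimensional distances are computed for the candidate set rather than for every image.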

