
Speeding-up the Verification Phase of Set Similarity Joins in the GPGPU paradigm (1812.09141v1)

Published 21 Dec 2018 in cs.DB and cs.DC

Abstract: We investigate the problem of exact set similarity joins using a co-process CPU-GPU scheme. The state-of-the-art CPU solutions split the work into two main phases. First, filtering and index building take place to reduce the candidate sets to be compared as much as possible; then the pairs are compared to verify whether they should become part of the result. We investigate in depth solutions for transferring the second, so-called verification phase to the GPU, addressing several challenges regarding data serialization and layout, thread management, and the techniques used to compare sets of tokens. Using real datasets, we provide concrete experimental evidence that our solutions have reached their maximum potential, since they totally overlap verification with CPU tasks, and manage to yield significant speed-ups, up to 2.6X in our cases.
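To make the verification phase concrete, here is a minimal CPU-side sketch of what "comparing a candidate pair" can look like. It is not the paper's GPU implementation. It assumes Jaccard similarity (the abstract does not name a similarity function), sets stored as sorted, duplicate-free lists of integer token ids, and a candidate list already produced by some filtering step. The names `overlap_sorted` and `verify` are hypothetical.

```python
def overlap_sorted(r, s):
    """Count tokens common to two sorted, duplicate-free token lists (merge scan)."""
    i = j = count = 0
    while i < len(r) and j < len(s):
        if r[i] == s[j]:
            count += 1
            i += 1
            j += 1
        elif r[i] < s[j]:
            i += 1
        else:
            j += 1
    return count


def verify(candidates, sets, threshold):
    """Keep the candidate pairs whose Jaccard similarity meets the threshold.

    candidates: iterable of (a, b) index pairs surviving the filtering phase
    sets:       list of sorted token-id lists (assumed layout, not the paper's)
    threshold:  Jaccard threshold in (0, 1] (assumed similarity function)
    """
    result = []
    for a, b in candidates:
        r, s = sets[a], sets[b]
        common = overlap_sorted(r, s)
        union = len(r) + len(s) - common
        if union > 0 and common / union >= threshold:
            result.append((a, b))
    return result


if __name__ == "__main__":
    token_sets = [[1, 2, 3], [2, 3, 4], [5, 6]]
    pairs = [(0, 1), (0, 2)]
    print(verify(pairs, token_sets, 0.5))
```

Each candidate pair is checked independently of the others, which is what makes this phase a natural candidate for offloading to many GPU threads while the CPU continues filtering.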
