Obtain Starmie containment measurements on YADL 50k and Open Data US

Determine the Jaccard Containment results for the Starmie retrieval method on the YADL 50k and Open Data US data lakes. This requires computing the containment between each query column and the candidate columns returned by Starmie, following the same evaluation protocol used elsewhere in the paper (e.g., averaging over the top retrieved candidates across base tables), so that the results can be compared directly with those for Exact Matching, MinHash, and Hybrid MinHash.

Background

The paper analyzes retrieval quality using Jaccard Containment across several methods (Exact Matching, MinHash, Hybrid MinHash, and Starmie) and data lakes. Figure 1 reports average top-200 containment to compare retrieval methods.
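To make the metric concrete, the following is a minimal sketch of how an average top-k containment could be computed for a single query column. It assumes the standard definition of Jaccard Containment, |Q ∩ C| / |Q| over distinct values, and that candidates arrive already ranked by the retrieval method. The function names, the handling of empty columns, and the default k = 200 are illustrative assumptions; the paper's exact protocol (for example, value normalization or how scores are aggregated across base tables) is not specified here.

```python
from typing import Hashable, Iterable, Sequence


def jaccard_containment(query: Iterable[Hashable],
                        candidate: Iterable[Hashable]) -> float:
    """Fraction of distinct query values that also appear in the candidate."""
    q = set(query)
    if not q:
        # Assumption: an empty query column contributes zero containment.
        return 0.0
    return len(q & set(candidate)) / len(q)


def mean_top_k_containment(query: Sequence[Hashable],
                           ranked_candidates: Sequence[Sequence[Hashable]],
                           k: int = 200) -> float:
    """Average containment over the first k ranked candidate columns."""
    top = ranked_candidates[:k]
    if not top:
        # Assumption: no retrieved candidates yields an average of zero.
        return 0.0
    return sum(jaccard_containment(query, c) for c in top) / len(top)


# Toy example with hypothetical columns (not data from the paper).
query = ["a", "b", "c", "d"]
candidates = [
    ["a", "b", "x"],            # containment 0.5
    ["a", "b", "c", "d", "e"],  # containment 1.0
    ["z"],                      # containment 0.0
]
print(mean_top_k_containment(query, candidates, k=2))  # 0.75
```

Containment is asymmetric: it is normalized by the query column only, so a large candidate column is not penalized for containing extra values.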

Containment was computed for Exact Matching, MinHash, and Hybrid MinHash across the evaluated data lakes, but the authors were unable to produce containment measurements for Starmie on YADL 50k and Open Data US. This leaves a gap in the comparative analysis for those two settings.

References

We were unable to obtain the containment results for Starmie on YADL 50k and Open Data US.

Retrieve, Merge, Predict: Augmenting Tables with Data Lakes (2402.06282 - Cappuzzo et al., 9 Feb 2024) in Section 5.2, Containment affects the entire process