Papers
Topics
Authors
Recent
2000 character limit reached

Generic Embedding-Based Lexicons for Transparent and Reproducible Text Scoring (2411.00964v1)

Published 1 Nov 2024 in cs.CL

Abstract: With text analysis tools becoming increasingly sophisticated over the last decade, researchers now face a decision of whether to use state-of-the-art models that provide high performance but that can be highly opaque in their operations and computationally intensive to run. The alternative, frequently, is to rely on older, manually crafted textual scoring tools that are transparently and easily applied, but can suffer from limited performance. I present an alternative that combines the strengths of both: lexicons created with minimal researcher inputs from generic (pretrained) word embeddings. Presenting a number of conceptual lexicons produced from FastText and GloVe (6B) vector representations of words, I argue that embedding-based lexicons respond to a need for transparent yet high-performance text measuring tools.

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Paper to Video (Beta)

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.