Gender Inference using Statistical Name Characteristics in Twitter (1606.05467v2)
Published 17 Jun 2016 in cs.CL and cs.SI
Abstract: Much attention has been given to the task of gender inference of Twitter users. Although names are strong gender indicators, the names of Twitter users are rarely used as a feature, probably because of the high number of ill-formed names, which cannot be found in any name dictionary. Instead of relying solely on a name database, we propose a novel name classifier. Our approach extracts characteristics from the user names and uses them to assign each name to a gender. This enables us to classify international first names as well as ill-formed names.
- Juergen Mueller (8 papers)
- Gerd Stumme (43 papers)
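
The abstract does not say which name characteristics are extracted or which classifier is used. As a rough illustration of the general idea only, the sketch below assumes character bigrams plus the final letter as features and a multinomial naive Bayes model with add-one smoothing. The names `name_features`, `NameGenderNB`, and the toy training list are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def name_features(name, n=2):
    """Assumed features: character n-grams with boundary markers, plus the last letter."""
    # Keep letters only, so decorated names such as "M4ria" still yield features.
    cleaned = "".join(ch for ch in name.lower() if ch.isalpha())
    if not cleaned:
        return []
    padded = "^" + cleaned + "$"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    grams.append("last=" + cleaned[-1])
    return grams


class NameGenderNB:
    """Multinomial naive Bayes over name features (an assumed model, not the paper's)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # additive smoothing constant
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.label_counts = Counter()               # label -> number of names
        self.vocab = set()                          # all features seen in training

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.label_counts[label] += 1
            for feat in name_features(name):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, name):
        feats = name_features(name)
        if not feats:
            return None  # nothing to base a decision on
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(count / total)
            denom = (sum(self.feature_counts[label].values())
                     + self.alpha * len(self.vocab))
            for feat in feats:
                score += math.log(
                    (self.feature_counts[label][feat] + self.alpha) / denom
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-picked training data for illustration only (not from the paper).
NAMES = ["anna", "maria", "julia", "sophia", "laura",
         "peter", "thomas", "markus", "stefan", "frank"]
LABELS = ["f"] * 5 + ["m"] * 5

model = NameGenderNB().fit(NAMES, LABELS)
print(model.predict("M4ria"))   # ill-formed variant of a known name
print(model.predict("Pet3r_"))  # ill-formed variant of a known name
```

Because the features come from the characters of the name rather than from a dictionary lookup, a name that is misspelled or decorated can still be scored, which is the property the abstract highlights. A realistic model would need a large labeled name list and a held-out evaluation.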