Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 90 tok/s
Gemini 2.5 Pro 52 tok/s Pro
GPT-5 Medium 21 tok/s
GPT-5 High 23 tok/s Pro
GPT-4o 79 tok/s
GPT OSS 120B 465 tok/s Pro
Kimi K2 214 tok/s Pro
2000 character limit reached

Tree-structured multi-stage principal component analysis (TMPCA): theory and applications (1807.08228v2)

Published 22 Jul 2018 in cs.CL

Abstract: A PCA based sequence-to-vector (seq2vec) dimension reduction method for the text classification problem, called the tree-structured multi-stage principal component analysis (TMPCA) is presented in this paper. Theoretical analysis and applicability of TMPCA are demonstrated as an extension to our previous work (Su, Huang & Kuo). Unlike conventional word-to-vector embedding methods, the TMPCA method conducts dimension reduction at the sequence level without labeled training data. Furthermore, it can preserve the sequential structure of input sequences. We show that TMPCA is computationally efficient and able to facilitate sequence-based text classification tasks by preserving strong mutual information between its input and output mathematically. It is also demonstrated by experimental results that a dense (fully connected) network trained on the TMPCA preprocessed data achieves better performance than state-of-the-art fastText and other neural-network-based solutions.

Citations (27)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Ai Generate Text Spark Streamline Icon: https://streamlinehq.com

Paper Prompts

Sign up for free to create and run prompts on this paper using GPT-5.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.