Assessing the Impact of Sequence Length Learning on Classification Tasks for Transformer Encoder Models (2212.08399v2)

Published 16 Dec 2022 in cs.LG

Abstract: Classification algorithms using Transformer architectures can be affected by the sequence length learning problem whenever observations from different classes have different length distributions. This problem causes models to use sequence length as a predictive feature instead of relying on important textual information. Although most public datasets are not affected by this problem, privately owned corpora in fields such as medicine and insurance may carry this data bias. The exploitation of this sequence length feature poses challenges throughout the value chain, as these machine learning models can be used in critical applications. In this paper, we empirically expose this problem and present approaches to minimize its impacts.
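
The abstract describes models exploiting sequence length as a shortcut feature. As a minimal diagnostic sketch, not taken from the paper, one way to probe whether a corpus is susceptible is to fit a classifier on token counts alone and compare it against the majority-class baseline; the function name, whitespace tokenization, and scikit-learn setup below are illustrative assumptions.

```python
# Diagnostic sketch (illustrative, not the paper's code): test whether
# sequence length alone predicts the class label. If a length-only
# classifier clearly beats the majority-class baseline, a Transformer
# encoder trained on this corpus could exploit length as a shortcut.
from collections import Counter

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def length_bias_check(texts, labels, seed=0):
    """texts: list[str], labels: list[int] -- hypothetical inputs."""
    # Approximate token count by whitespace splitting; a real check
    # would use the target model's own tokenizer.
    lengths = np.array([[len(t.split())] for t in texts], dtype=float)
    y = np.asarray(labels)

    X_tr, X_te, y_tr, y_te = train_test_split(
        lengths, y, test_size=0.2, random_state=seed, stratify=y
    )

    # Majority-class baseline on the held-out split.
    majority = Counter(y_tr).most_common(1)[0][0]
    base_acc = accuracy_score(y_te, np.full_like(y_te, majority))

    # Length-only classifier.
    clf = LogisticRegression().fit(X_tr, y_tr)
    len_acc = accuracy_score(y_te, clf.predict(X_te))

    print(f"majority baseline: {base_acc:.3f}  length-only: {len_acc:.3f}")
    return len_acc - base_acc  # large gap => length is a usable shortcut
```

A large gap here is a warning sign that the bias the paper studies is present; equalizing length distributions across classes (for example by truncation or resampling, generic remedies rather than the paper's specific approaches) is then worth considering before training.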
