Papers
Topics
Authors
Recent
Search
2000 character limit reached

Feature Encodings for Gradient Boosting with Automunge

Published 25 Sep 2022 in cs.LG | (2209.12309v2)

Abstract: Automunge is a tabular preprocessing library that encodes dataframes for supervised learning. When selecting a default feature encoding strategy for gradient boosted learning, one may consider metrics of training duration and achieved predictive performance associated with the feature representations. Automunge offers a default of binarization for categoric features and z-score normalization for numeric. The presented study sought to validate those defaults by way of benchmarking on a series of diverse data sets by encoding variations with tuned gradient boosted learning. We found that on average our chosen defaults were top performers both from a tuning duration and a model performance standpoint. Another key finding was that one hot encoding did not perform in a manner consistent with suitability to serve as a categoric default in comparison to categoric binarization. We present here these and further benchmarks.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.