
Feature Engineering for Predictive Modeling using Reinforcement Learning (1709.07150v1)

Published 21 Sep 2017 in cs.AI, cs.LG, and stat.ML

Abstract: Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance driven exploration of a transformation graph, which systematically and compactly enumerates the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.

Citations (168)

Summary

Automated Feature Engineering via Reinforcement Learning

This paper presents a method to automate feature engineering (FE) for predictive modeling using reinforcement learning (RL). FE is a critical step in predictive modeling, which involves transforming the feature space to reduce modeling errors. Conventionally, FE relies on domain knowledge and intuition, making it labor-intensive and susceptible to biases. The authors propose a framework to systematically explore and transform the feature space through the use of a transformation graph and RL.

Transformation Graph

The foundation of the proposed method is a transformation graph, which is a directed acyclic graph (DAG) that represents various transformations of the original dataset. Nodes in the graph correspond to datasets created by applying transformation functions to features, while edges denote the application of these transformations. The exploration of this graph is guided by estimating rewards from transformations applied to existing nodes.
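A minimal sketch of such a graph may help fix ideas. The class and function names below (`TransformationGraph`, `log_transform`, `square_transform`) are illustrative assumptions, not the paper's implementation: nodes store derived datasets and edges record which transformation produced each child.

```python
# Illustrative transformation-graph sketch (names are assumptions, not the paper's API).
# Nodes hold feature matrices; edges record (parent, child, transform_name).
import math

def log_transform(features):
    # One of many possible unary transforms: log(1 + |x|) per value.
    return [[math.log1p(abs(v)) for v in row] for row in features]

def square_transform(features):
    return [[v * v for v in row] for row in features]

class TransformationGraph:
    def __init__(self, root_features):
        self.nodes = [root_features]   # node 0 is the original dataset
        self.edges = []                # directed edges, so the graph is a DAG

    def apply(self, node_id, transform, name):
        # Create a child node by applying a transformation to an existing node.
        child = transform(self.nodes[node_id])
        self.nodes.append(child)
        self.edges.append((node_id, len(self.nodes) - 1, name))
        return len(self.nodes) - 1

# Build a tiny graph from a 2x2 dataset: the root plus two derived datasets.
g = TransformationGraph([[1.0, 2.0], [3.0, 4.0]])
a = g.apply(0, log_transform, "log")
b = g.apply(0, square_transform, "square")
print(len(g.nodes), g.edges)
```

In the actual framework each node would also carry the model error obtained on that dataset, which is what the exploration policy uses to decide where to expand next.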

Reinforcement Learning Approach

To automate exploration, the authors model the process as a reinforcement learning problem, where the environment is the transformation graph and the agent aims to maximize prediction accuracy within a budget constraint. The action space involves applying available transformations to nodes in the graph. The Q-learning algorithm is used to learn the optimal policy for exploring this space. The Q-function is approximated using state characteristics such as node accuracy, transformation performance history, and node depth.
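The update rule behind this can be sketched as standard Q-learning with linear function approximation over the state characteristics listed above. The specific feature vector, reward definition, and hyperparameters below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of Q-learning with linear function approximation,
# in the spirit of the paper's exploration policy (details are assumptions).

def state_action_features(node_accuracy, node_depth, transform_avg_gain):
    # Characteristics mentioned in the paper: node accuracy, node depth,
    # and the historical performance of the candidate transformation.
    return [1.0, node_accuracy, node_depth, transform_avg_gain]

def q_value(weights, feats):
    # Linear approximation: Q(s, a) = w . phi(s, a)
    return sum(w * f for w, f in zip(weights, feats))

def q_update(weights, feats, reward, next_max_q, alpha=0.05, gamma=0.99):
    # One gradient-style step: w += alpha * (TD error) * phi(s, a),
    # where TD error = r + gamma * max_a' Q(s', a') - Q(s, a).
    td_error = reward + gamma * next_max_q - q_value(weights, feats)
    return [w + alpha * td_error * f for w, f in zip(weights, feats)]

# Toy usage: one update moves the Q estimate toward a positive observed reward
# (here, an accuracy gain of 0.05 from applying a transformation).
w = [0.0, 0.0, 0.0, 0.0]
f = state_action_features(node_accuracy=0.8, node_depth=1, transform_avg_gain=0.02)
before = q_value(w, f)
w = q_update(w, f, reward=0.05, next_max_q=0.0)
after = q_value(w, f)
print(before, after)
```

In the full framework the reward would reflect the improvement in prediction accuracy at the new node, and the learned weights generalize across datasets because the features describe graph state rather than any particular dataset.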

Experimental Evaluation

The authors conducted experiments on several publicly available datasets, comparing their method against baselines including expansion-reduction approaches and random feature transformations. Results indicate that the RL-guided approach yields consistently higher model accuracy than these baselines, with substantial error reductions across diverse datasets, suggesting the approach generalizes well.

Discussion

The automation of FE via this RL-driven framework addresses key challenges associated with the manual process, such as time consumption and reliance on human intuition. By systematically exploring transformation choices, the framework reduces the feature engineering cycle time. The authors highlight the domain-independent nature of the proposed solution, noting its adaptability to various types of data and learning models.

Future Directions

The paper suggests extending this framework to other aspects of predictive modeling, such as imputation and model selection. The interaction between optimal features and model types presents opportunities for joint optimization strategies, enhancing both FE and learning algorithm efficiency.

Conclusion

This research contributes a structured approach to feature engineering, demonstrating how reinforcement learning can effectively automate complex processes typically dependent on human expertise. The systematic exploration of transformation graphs illustrated in this paper offers a promising avenue for optimizing predictive models across diverse domains. Potential future work includes refining the learning model's complexity and exploring broader applications in predictive analytics.
