FAMOSE: A ReAct Approach to Automated Feature Discovery
This presentation explores FAMOSE, a novel agent-based framework that automates feature engineering for tabular machine learning through iterative reasoning and validation. Unlike traditional one-shot approaches, FAMOSE employs a ReAct-style agent that proposes, evaluates, and refines features in data-driven feedback loops, achieving state-of-the-art performance while discovering diverse, interpretable transformations. We examine its architecture, empirical advantages over classical and LLM-based baselines, and implications for the future of automated machine learning.
Script
What if your machine learning model could discover its own features the way an expert data scientist does, learning from each experiment and getting smarter with every iteration? This is the promise of FAMOSE, a breakthrough that transforms automated feature engineering from blind enumeration into intelligent exploration.
Building on this vision, the challenge becomes clear. Feature engineering today hits three critical walls: manual work demands rare expertise, classical automation drowns in combinatorial complexity, and even recent language model approaches generate features once without learning from their mistakes.
FAMOSE tackles these limitations head-on with an agent-based architecture.
Following this architecture, FAMOSE introduces a fundamentally different approach. The agent iteratively proposes feature transformations, validates them empirically, and remembers what worked, creating a feedback loop that mirrors how data scientists actually work.
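The propose, validate, and remember cycle can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not FAMOSE's actual code: `feature_loop`, `propose`, `score`, and `patience` are hypothetical names, the language model is replaced by a scripted list of candidates, and the validation metric is a simple correlation proxy rather than a trained model's held-out performance.

```python
from math import sqrt
from typing import Callable, Dict, List, Optional, Tuple

Column = List[float]
Table = Dict[str, Column]
Candidate = Tuple[str, Callable[[Table], Column]]
History = List[Tuple[str, float]]


def abs_corr(x: Column, y: Column) -> float:
    """Absolute Pearson correlation; 0.0 when either column is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return abs(sxy) / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def feature_loop(
    table: Table,
    propose: Callable[[History], Optional[Candidate]],
    score: Callable[[Table], float],
    min_gain: float = 1e-9,
    patience: int = 2,
    max_steps: int = 20,
) -> Tuple[Table, History]:
    """Hypothetical propose/validate/remember loop (higher score is better)."""
    current = dict(table)
    best = score(current)
    history: History = []  # memory handed back to the proposer each round
    stale = 0              # consecutive proposals that brought no gain
    for _ in range(max_steps):
        if stale >= patience:
            break
        candidate = propose(history)
        if candidate is None:  # proposer has nothing left to try
            break
        name, build = candidate
        try:
            column = build(current)  # run the proposed transformation
        except Exception:
            # Broken proposals are recorded in memory but never kept.
            history.append((name, float("-inf")))
            stale += 1
            continue
        trial = {**current, name: column}
        new_score = score(trial)
        gain = new_score - best
        history.append((name, gain))
        if gain > min_gain:
            current, best, stale = trial, new_score, 0
        else:
            stale += 1
    return current, history


# Toy data: the target is the product of the two raw columns.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [p * q for p, q in zip(a, b)]


def score(t: Table) -> float:
    # Proxy metric (assumption): best single-column correlation with the target.
    return max(abs_corr(col, y) for col in t.values())


# Scripted stand-in for the language model's proposals.
SCRIPT: List[Candidate] = [
    ("a_plus_b", lambda t: [p + q for p, q in zip(t["a"], t["b"])]),
    ("a_times_b", lambda t: [p * q for p, q in zip(t["a"], t["b"])]),
    ("a_over_b", lambda t: [p / q for p, q in zip(t["a"], t["b"])]),
]


def propose(history: History) -> Optional[Candidate]:
    return SCRIPT[len(history)] if len(history) < len(SCRIPT) else None


final, history = feature_loop({"a": a, "b": b}, propose, score)
print(sorted(final))
```

In this toy run the product feature is kept because it improves the proxy score, while the ratio feature is evaluated, logged in the history, and discarded. A real proposer would read that history to decide what to try next.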
This schematic captures the elegance of the system: the language model proposes candidate features grounded in explicit schema metadata, code execution validates them against real performance metrics, and successful features inform the next round of proposals. The loop continues until no further gains emerge, then mRMR pruning distills the final feature set.
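The final pruning step can be illustrated with a greedy mRMR (minimum redundancy, maximum relevance) sketch in Python. The narration does not say which relevance or redundancy measure FAMOSE uses, and classic mRMR formulations rely on mutual information, so this example substitutes absolute Pearson correlation as a simple stand-in. The function names `abs_corr` and `mrmr_select` and the toy data are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def abs_corr(x: List[float], y: List[float]) -> float:
    """Absolute Pearson correlation; 0.0 when either column is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return abs(sxy) / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def mrmr_select(features: Dict[str, List[float]], target: List[float], k: int) -> List[str]:
    """Greedily pick k features by relevance minus mean redundancy."""
    relevance = {name: abs_corr(col, target) for name, col in features.items()}
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:

        def criterion(name: str) -> float:
            if not selected:
                return relevance[name]
            # Redundancy: average similarity to the features already chosen.
            redundancy = sum(
                abs_corr(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=criterion)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: x1_noisy is a near-duplicate of x1; x2 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1_noisy = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
target = [p + 2.0 * q for p, q in zip(x1, x2)]

features = {"x1": x1, "x1_noisy": x1_noisy, "x2": x2}
print(mrmr_select(features, target, k=2))  # ['x1', 'x2']
```

Although `x1_noisy` is individually more relevant than `x2`, the redundancy penalty from its near-perfect correlation with `x1` pushes it below `x2`, which is the behavior that makes this kind of pruning useful after a loop that may generate many overlapping features.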
The proof, of course, lies in the results.
Turning to the benchmarks, FAMOSE delivers consistent, measurable advantages. On classification tasks with more than 10,000 samples, it achieves a 0.23% gain in ROC-AUC, and on regression tasks it reduces RMSE by 2%. Critically, these features generalize across diverse model architectures.
Beyond raw performance numbers, what makes FAMOSE special is the quality and diversity of what it discovers. Unlike narrower approaches that rely on a handful of transformations, the agent explores a richer mathematical landscape, uncovering composite features that classical enumeration simply never reaches.
Perhaps most importantly for real-world deployment, FAMOSE delivers transparency alongside performance. The reasoning traces are interpretable, validation catches hallucinations before they harm results, and the system proves robust across different language model backends.
No approach is without constraints. FAMOSE's iterative reasoning demands significant compute, domains outside the language model's training distribution may need retrieval augmentation, and extending beyond binary classification requires architectural evolution.
FAMOSE reimagines automated feature engineering as intelligent, iterative exploration, proving that agent-based reasoning can navigate exponentially complex spaces and deliver measurable gains. For more cutting-edge research like this, visit EmergentMind.com.