
Determine Causes of GPT-4 Macro-Level Attribution Performance Discrepancy

Determine why GPT-4 exhibits a performance discrepancy when executing macro-level Brinson–Fachler performance attribution calculations with the specified "macro prompt" (top-down GICS Type level, with multi-step formulas and instructions): GPT-4 required multiple attempts to achieve perfect accuracy for the Portfolio Growth and Portfolio Value datasets, yet succeeded on the first attempt for Portfolio Defensive. In particular, ascertain whether the prompt’s complexity and the absence of numerical examples in the prompt account for the reduced reliability.


Background

The paper evaluates an AI agent powered by GPT-4 to perform performance attribution tasks using the Brinson–Fachler model. Micro-level calculations achieved perfect accuracy consistently, but macro-level (top-down GICS Type) calculations exhibited inconsistent reliability across funds.

For Portfolio Defensive, GPT-4 produced correct macro-level results on the first attempt, whereas for Portfolio Growth and Portfolio Value it required several attempts before reaching perfect accuracy. The authors hypothesize that prompt complexity and the possible need for numerical examples might contribute to this discrepancy but explicitly state uncertainty about the cause.

References

We are not sure of the reasons for this performance discrepancy. It could be related to the fact that the prompt is complicated with several formulas and instructions, and it may need numerical examples in the prompt.

Can a GPT4-Powered AI Agent Be a Good Enough Performance Attribution Analyst? (2403.10482 - Melo et al., 15 Mar 2024) in Section 5 (Results), Method #1: Calculation of a multi-level attribution report; paragraph following Exhibit 30