Determine Causes of GPT-4 Macro-Level Attribution Performance Discrepancy
Determine why GPT-4’s reliability varied across datasets when it performed macro-level Brinson–Fachler performance attribution using the specified "macro prompt" (a top-down prompt at the GICS Type level containing multi-step formulas and instructions). GPT-4 required multiple attempts to achieve perfect accuracy on the Portfolio Growth and Portfolio Value datasets but succeeded on the first attempt for Portfolio Defensive. Specifically, ascertain whether the prompt’s complexity and the absence of numerical examples in the prompt are responsible for the reduced reliability.
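For orientation, the following is a minimal sketch of the kind of multi-step calculation involved, assuming the standard three-effect Brinson–Fachler decomposition (allocation, selection, interaction). The exact formulas in the macro prompt may differ. The segment names, weights, and returns are hypothetical and are not taken from the Portfolio Growth, Portfolio Value, or Portfolio Defensive datasets.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str   # hypothetical segment label (e.g. a GICS Type)
    wp: float   # portfolio weight
    wb: float   # benchmark weight
    rp: float   # portfolio return within the segment
    rb: float   # benchmark return within the segment


def brinson_fachler(segments):
    """Return {name: (allocation, selection, interaction)} per segment."""
    # Total benchmark return, needed for the Brinson-Fachler allocation term.
    total_rb = sum(s.wb * s.rb for s in segments)
    effects = {}
    for s in segments:
        allocation = (s.wp - s.wb) * (s.rb - total_rb)
        selection = s.wb * (s.rp - s.rb)
        interaction = (s.wp - s.wb) * (s.rp - s.rb)
        effects[s.name] = (allocation, selection, interaction)
    return effects


def active_return(segments):
    """Total portfolio return minus total benchmark return."""
    total_rp = sum(s.wp * s.rp for s in segments)
    total_rb = sum(s.wb * s.rb for s in segments)
    return total_rp - total_rb


# Hypothetical inputs for illustration only.
EXAMPLE = [
    Segment("Type A", wp=0.5, wb=0.4, rp=0.06, rb=0.05),
    Segment("Type B", wp=0.3, wb=0.4, rp=0.02, rb=0.03),
    Segment("Type C", wp=0.2, wb=0.2, rp=0.04, rb=0.01),
]

if __name__ == "__main__":
    for name, (alloc, sel, inter) in brinson_fachler(EXAMPLE).items():
        print(f"{name}: allocation={alloc:.4f} selection={sel:.4f} "
              f"interaction={inter:.4f}")
    print(f"active return={active_return(EXAMPLE):.4f}")
```

The three effects summed over all segments should equal the active return. This gives a simple consistency check that could be applied to any model-produced attribution table.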
References
We are not sure of the reasons for this performance discrepancy. It could be related to the fact that the prompt is complicated with several formulas and instructions, and it may need numerical examples in the prompt.