Cause of progressive degradation in quantised Phi-3.5 models during extended processing
Identify the underlying cause(s) of the progressive degradation, ending in incoherent outputs, observed in quantised Phi-3.5 Mini 3.8B models during extended batch processing of paediatric renal biopsy reports on CPU-only hardware with 16 GB RAM. Determine whether the failure mode is driven by memory exhaustion, key-value (KV) cache accumulation across sequential inference calls, quantisation-related instability at long context lengths, or other factors.
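One way to begin separating these hypotheses is to log, for every inference call in a batch, the process memory, the prompt length, and a crude measure of output incoherence, then check which of them trends over the run. The sketch below is a minimal illustration under stated assumptions, not the method used in the original work: `generate` and `memory_probe` are hypothetical callables supplied by the caller (for example, a wrapper around the model's inference call and a function returning `psutil.Process().memory_info().rss`), and the repeated-token ratio is only a rough stand-in for a proper coherence check.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CallRecord:
    index: int            # position of the call within the batch
    prompt_chars: int     # prompt length in characters (proxy for context length)
    memory_bytes: int     # value returned by the memory probe after the call
    repeat_ratio: float   # crude incoherence proxy for the generated text


def repeat_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that repeat an earlier token.

    Empty output is scored 1.0 so that it is treated as maximally degraded.
    """
    tokens = text.split()
    if not tokens:
        return 1.0
    return 1.0 - len(set(tokens)) / len(tokens)


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their call index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def profile_batch(
    prompts: Sequence[str],
    generate: Callable[[str], str],      # hypothetical: runs one inference call
    memory_probe: Callable[[], int],     # hypothetical: returns current memory use
) -> List[CallRecord]:
    """Run each prompt in order and record memory and output quality per call."""
    records: List[CallRecord] = []
    for i, prompt in enumerate(prompts):
        output = generate(prompt)
        records.append(
            CallRecord(i, len(prompt), memory_probe(), repeat_ratio(output))
        )
    return records
```

Under these assumptions, a clearly positive `slope` of `memory_bytes` across calls with similar prompt lengths would point towards state accumulating between calls (memory growth or a cache that is not reset), whereas a rising `repeat_ratio` with flat memory that tracks `prompt_chars` would be more consistent with instability at long context lengths. Neither pattern is conclusive on its own; the log only narrows down which hypothesis to test next.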
References
Whilst the cause of this progressive model degradation during extended processing sessions remains unclear, possible explanations include memory exhaustion on our limited hardware (16 GB RAM), KV cache accumulation across sequential inference calls, or quantisation-related instability during extended processing at longer context lengths.