Cause of Qwen3-30B-A3B verification variability across implementations and hardware
Determine whether the differences in Token-DiFR and Activation-DiFR behavior observed for Qwen3-30B-A3B across inference implementations (vLLM vs. HuggingFace Transformers) and GPU types (H200 vs. A100) are driven primarily by the model's mixture-of-experts architecture or by implementation-specific factors in the inference stacks. Resolving this would clarify the source of variability that affects detector calibration and deployment.
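One mechanism that could make a mixture-of-experts model specifically sensitive is that top-k routing is a discontinuous function of the router logits. Floating-point addition is not associative, so two kernels (or two GPU types) that reduce the same partial products in a different order can produce logits differing in the last bit. In a dense layer that difference stays tiny, but if it flips which expert wins the top-k, the token is processed by different weights from that point on. The sketch below is a toy illustration in pure Python under stated assumptions: the four-expert router, the partial-product values, k=1, and the helper names `reduce_sum`, `top_k`, and `route` are all hypothetical and do not reproduce Qwen3-30B-A3B's router or any vLLM/Transformers kernel.

```python
def reduce_sum(terms, order):
    """Sum terms left to right in the given order (a stand-in for a kernel's reduction order)."""
    total = 0.0
    for i in order:
        total += terms[i]
    return total


def top_k(logits, k):
    """Indices of the k largest logits, ties broken by the lower index."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


# Hypothetical per-chunk partial products making up each expert's router logit.
tiny = 2.0 ** -53  # half an ulp of 1.0, so 1.0 + tiny rounds back to 1.0
partials = [
    [0.25, 0.25, 0.0, 0.0, 0.0],               # expert 0: clearly lower (0.5)
    [1.0, tiny, tiny, tiny, tiny],             # expert 1: result depends on order
    [1.0 + 2.0 ** -52, 0.0, 0.0, 0.0, 0.0],    # expert 2: fixed competitor, one ulp above 1.0
    [0.5, 0.0, 0.0, 0.0, 0.0],                 # expert 3: clearly lower (0.5)
]


def route(order, k=1):
    """Compute every expert's logit with one reduction order, then pick the top-k experts."""
    logits = [reduce_sum(p, order) for p in partials]
    return logits, top_k(logits, k)


n = len(partials[0])
# Forward order: each tiny term is absorbed by rounding, so expert 1's logit stays 1.0.
fwd_logits, fwd_experts = route(list(range(n)))
# Reverse order: the tiny terms accumulate to 2**-51 first, so expert 1's logit is 1 + 2**-51.
rev_logits, rev_experts = route(list(reversed(range(n))))

print(fwd_experts, rev_experts)  # [2] [1]
print(max(abs(a - b) for a, b in zip(fwd_logits, rev_logits)))  # about 4.4e-16
```

A logit difference of a few ulps is enough to send the token to a different expert here. Under this hypothesis, the same inference stack running a dense model of comparable size would be expected to show smaller cross-stack and cross-hardware differences, which is one way the two explanations could be separated.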
References
It is unclear whether these differences are driven primarily by the mixture-of-experts architecture or by implementation details in the current inference stacks.
— DiFR: Inference Verification Despite Nondeterminism
(2511.20621 - Karvonen et al., 25 Nov 2025) in Appendix K, Qwen3-30B-A3B Results