Normalization of reflective-oracle–completed weights for Thompson sampling
Determine whether using a reflective oracle to complete the lower-semicomputable posterior weights w(ρ | h_{<t}) directly yields normalized weights, that is, weights that sum to one. Here each weight generator is treated as an oracle probabilistic Turing machine that outputs 1 with probability w(ρ | h_{<t}) and otherwise fails to halt. A positive answer would allow the stepwise Thompson sampling policy π_T over the reflective-oracle–computable environment class Mrefl to be defined even when the weights are only lower semicomputable and potentially defective.
References
Generalizing to l.s.c. weights, it is natural to try to use the reflective oracle to somehow complete Thompson sampling. We could try to complete \pi_T's environment mixture \xi. Unfortunately, this would not explicitly complete the weights that Thompson sampling needs access to; \pi_T requires not a dominant environment but explicit coefficients. The reflective oracle could be used to directly complete each weight from an oracle pTM generating it (in the sense of outputting 1 with probability w(\rho | h_{<t}) and otherwise failing to halt), but it is unclear whether the individually completed weights would still sum to 1.
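The normalization concern can be made concrete with a small toy model. The sketch below is not the reflective-oracle construction itself. It assumes, purely for illustration, that completing a single weight moves some fraction `alpha` of the generator's non-halting mass 1 - w onto the output 1, and that this fraction is chosen separately for each weight. The names `complete_weight` and `completed_sum`, and the example weights, are hypothetical.

```python
from fractions import Fraction


def complete_weight(w, alpha):
    """Toy completion of one weight (an assumption, not the oracle's actual rule).

    The generator outputs 1 with probability w and otherwise diverges.
    A fraction alpha of the diverging mass (1 - w) is reassigned to output 1.
    """
    if not (0 <= w <= 1 and 0 <= alpha <= 1):
        raise ValueError("w and alpha must lie in [0, 1]")
    return w + alpha * (1 - w)


def completed_sum(weights, alphas):
    """Sum of the individually completed weights."""
    return sum(complete_weight(w, a) for w, a in zip(weights, alphas))


# Hypothetical defective posterior weights: they sum to 3/4, not 1.
weights = [Fraction(1, 2), Fraction(1, 8), Fraction(1, 8)]

# Three arbitrary choices of how the missing mass is redistributed.
low = completed_sum(weights, [Fraction(0)] * 3)
half = completed_sum(weights, [Fraction(1, 2)] * 3)
high = completed_sum(weights, [Fraction(1)] * 3)

print(low, half, high)
```

Under this assumption the completed weights can sum to anything between the original defective total and the number of environments, so nothing in the per-weight completion forces the sum to equal 1. Whether the actual reflective oracle constrains the completions enough to restore normalization is exactly what remains unclear.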