Scalability of saddlepoint Monte Carlo in high-dimensional settings

Investigate the behavior and practical limits of saddlepoint Monte Carlo for computing the marginal likelihood f_AX(y) when the dimension d_X of the latent vector X, or the dimension d_Y of the aggregated vector Y = AX, is very large. Determine whether the method remains effective in such high-dimensional regimes and how its accuracy and computational cost scale.

Background

The paper introduces saddlepoint Monte Carlo, an unbiased importance-sampling-based approach leveraging characteristic functions and exponential tilting, particularly effective for exponential family models such as the multinomial. The authors demonstrate strong performance on large real-world ecological inference tasks, especially when using a Gaussian proposal with tilting, and provide supporting asymptotic variance results.
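To make the ingredients named above concrete, here is a minimal one-dimensional sketch in Python. It is not the authors' implementation. It assumes a toy setting chosen for illustration, Y ~ Binomial(n, p) instead of an aggregated multinomial, and the function name `saddlepoint_mc_binomial` and its parameters are hypothetical. It combines Fourier inversion of the characteristic function on [-pi, pi], exponential tilting at the saddlepoint s solving K'(s) = y, and importance sampling with a Gaussian proposal of variance 1/K''(s).

```python
import math
import numpy as np

def saddlepoint_mc_binomial(y, n, p, num_samples=20_000, seed=0):
    """Toy 1-D estimate of P(Y = y) for Y ~ Binomial(n, p), with 0 < y < n."""
    rng = np.random.default_rng(seed)

    # Cumulant generating function K(s) = n * log(1 - p + p * exp(s)).
    # The saddlepoint s solves K'(s) = y, which has a closed form here.
    s = math.log(y * (1 - p) / ((n - y) * p))
    K_s = n * math.log(1 - p + p * math.exp(s))

    # The tilted law is Binomial(n, y / n), so K''(s) = n * pi_s * (1 - pi_s).
    pi_s = y / n
    var = 1.0 / (n * pi_s * (1 - pi_s))  # Gaussian proposal variance 1 / K''(s)

    # Draw frequencies t from the Gaussian proposal and evaluate its density.
    t = rng.normal(0.0, math.sqrt(var), size=num_samples)
    q = np.exp(-0.5 * t**2 / var) / math.sqrt(2 * math.pi * var)

    # Tilted characteristic function times exp(-i t y). For a lattice
    # variable the inversion integral runs over [-pi, pi] only, so the
    # integrand is set to zero outside that interval.
    integrand = ((1 - pi_s + pi_s * np.exp(1j * t)) ** n
                 * np.exp(-1j * t * y)).real
    integrand = np.where(np.abs(t) <= math.pi, integrand, 0.0)

    # Importance weights, averaged and multiplied by the tilting factor.
    weights = integrand / (2 * math.pi * q)
    return math.exp(K_s - s * y) * weights.mean()

if __name__ == "__main__":
    n, p, y = 50, 0.3, 20
    exact = math.comb(n, y) * p**y * (1 - p) ** (n - y)
    print(saddlepoint_mc_binomial(y, n, p), exact)
```

In this toy case the exact binomial probability is available for comparison. In d dimensions the same construction would need a d-dimensional saddlepoint and a proposal covariance built from the Hessian of K, and how the variance of the weights behaves as d grows is precisely the open question discussed here.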

While the method is shown to work well for datasets with large n and moderate dimensionality, the authors explicitly note that exploring the method’s behavior when either the latent dimension d_X or the observed aggregate dimension d_Y becomes very large has not been undertaken. This scalability question is relevant for applications in privacy-preserving aggregation and ill-posed inverse problems, where high-dimensional structures naturally arise.

References

This also means pushing the method to its limits when d_X or d_Y get very large, a question we have not yet explored.

— Saddlepoint Monte Carlo and its Application to Exact Ecological Inference (2410.18243 - Voldoire et al., 2024) in Section 4, Future work
