Minimizing Negative Transfer of Knowledge in Multivariate Gaussian Processes: A Scalable and Regularized Approach (1901.11512v2)
Abstract: Recently, there has been increasing interest in the multivariate Gaussian process (MGP), which extends the Gaussian process (GP) to handle multiple outputs. One approach to constructing the MGP and accounting for non-trivial commonalities among outputs employs a convolution process (CP). The CP is based on the idea of sharing latent functions across several convolutions. Despite the elegance of the CP construction, it poses new challenges that have yet to be tackled. First, even with a moderate number of outputs, model building becomes prohibitive because of the large increase in computational demands and in the number of parameters to be estimated. Second, negative transfer of knowledge may occur when some outputs do not share commonalities. In this paper, we address these issues. We propose a regularized pairwise modeling approach for the MGP established using the CP. The key feature of our approach is to distribute the estimation of the full multivariate model across a group of bivariate GPs that are built individually. Interestingly, pairwise modeling turns out to possess unique characteristics that allow us to tackle the challenge of negative transfer by penalizing the latent function that facilitates information sharing in each bivariate model. Predictions are then made by combining the predictions from the bivariate models within a Bayesian framework. The proposed method scales well when the number of outputs is large and minimizes the negative transfer of knowledge between uncorrelated outputs. Statistical guarantees for the proposed method are studied, and its advantageous features are demonstrated through numerical studies.
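The pairwise idea can be illustrated with a small sketch. With p outputs there are p(p-1)/2 bivariate models, and a prediction for output i can draw on the p-1 models that involve i. The abstract does not state the exact Bayesian combination rule, so the fusion step below assumes a simple product of Gaussian experts (precision-weighted averaging of each bivariate model's predictive mean and variance). The helper names `bivariate_pairs` and `fuse_gaussian_experts` are hypothetical and not taken from the paper.

```python
from itertools import combinations
from typing import List, Tuple


def bivariate_pairs(num_outputs: int) -> List[Tuple[int, int]]:
    """List the p(p-1)/2 output pairs, each fitted as its own bivariate GP."""
    return list(combinations(range(num_outputs), 2))


def fuse_gaussian_experts(predictions: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse Gaussian predictions (mean, variance) with a product of experts.

    Assumption for illustration only: the fused precision is the sum of the
    experts' precisions, and the fused mean is the precision-weighted mean.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    total_precision = 0.0
    weighted_sum = 0.0
    for mean, var in predictions:
        if var <= 0:
            raise ValueError("variances must be positive")
        precision = 1.0 / var
        total_precision += precision
        weighted_sum += precision * mean
    fused_var = 1.0 / total_precision
    return weighted_sum * fused_var, fused_var


if __name__ == "__main__":
    # With 4 outputs, output 0 appears in the pairs (0, 1), (0, 2), (0, 3).
    pairs_for_output_0 = [pr for pr in bivariate_pairs(4) if 0 in pr]
    # Hypothetical predictive (mean, variance) of output 0 at one test input,
    # one per bivariate model in pairs_for_output_0.
    preds = [(1.0, 0.5), (1.2, 0.25), (0.9, 1.0)]
    mean, var = fuse_gaussian_experts(preds)
    print(pairs_for_output_0, round(mean, 3), round(var, 3))  # mean ≈ 1.1, var ≈ 0.143
```

A naive product of experts treats the bivariate predictions as independent, even though every model involving output i uses that output's own data. The fused variance from this sketch is therefore likely to be overconfident, and the paper's Bayesian combination may weight the models differently.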