Comparative Bi-stochastizations and Associated Clusterings/Regionalizations of the 1995-2000 U.S. Intercounty Migration Network (1208.3428v2)
Abstract: Wang, Li and Konig have recently compared the cluster-theoretic properties of bi-stochastized symmetric data-similarity (e.g., kernel) matrices produced by minimizing two different forms of Bregman divergence. We extend their investigation to non-symmetric matrices, specifically studying the 3,107 x 3,107 matrix of 1995-2000 U.S. intercounty migration. A particular bi-stochastized form of it had been obtained previously (arXiv:1207.0437) using the well-established Sinkhorn-Knopp (SK) (biproportional) algorithm, which minimizes the Kullback-Leibler form of the divergence. That matrix has but a single entry equal to 1, the maximal possible value. In sharp contrast, the bi-stochastic matrix obtained here, by implementing the Wang-Li-Konig algorithm for minimizing the alternative, squared-norm form of the divergence, has 2,707 such unit entries. The corresponding 3,107-vertex, 2,707-link directed graph has 2,352 strong components. These consist of 1,659 single (isolated) counties, 654 doublets (thirty-one of them interstate), 22 triplets (one interstate), 13 quartets (one interstate), three quintets and one septet. Not manifest in these graph-theoretic results, however, are the five-county states of Hawaii and Rhode Island and the eight-county state of Connecticut. These, among other regional configurations, emerged appealingly as well-defined entities in the SK-based strong-component hierarchical clustering.
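
The two computational ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the function names (`sinkhorn_knopp`, `strong_component_sizes`) and the toy inputs are hypothetical, and the sketch assumes a strictly positive input matrix so that the row and column rescalings are always defined. It covers only the SK (Kullback-Leibler) scaling; the Wang-Li-Konig squared-norm procedure is not reproduced here. The last lines check that the component counts quoted in the abstract are mutually consistent.

```python
from collections import Counter


def sinkhorn_knopp(matrix, iterations=500, tol=1e-12):
    """Alternately rescale rows and columns until both sum to 1.

    Assumes every entry is positive, so no row or column sum is zero.
    """
    m = [[float(x) for x in row] for row in matrix]
    n = len(m)
    for _ in range(iterations):
        # Row step: divide each row by its sum.
        m = [[x / sum(row) for x in row] for row in m]
        # Column step: divide each column by its sum.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
        # Columns now sum to 1 exactly; stop once the rows do too.
        if max(abs(sum(row) - 1.0) for row in m) < tol:
            break
    return m


def strong_component_sizes(n, links):
    """Return Counter {component size: number of strong components}.

    Uses an iterative two-pass (Kosaraju-style) search on a directed
    graph with vertices 0..n-1 and directed links (i, j).
    """
    fwd = [[] for _ in range(n)]
    rev = [[] for _ in range(n)]
    for i, j in links:
        fwd[i].append(j)
        rev[j].append(i)

    # Pass 1: record vertices in order of depth-first completion.
    order, seen = [], [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, iter(fwd[start]))]
        while stack:
            v, neighbours = stack[-1]
            for w in neighbours:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(fwd[w])))
                    break
            else:
                order.append(v)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    sizes, assigned = Counter(), [False] * n
    for root in reversed(order):
        if assigned[root]:
            continue
        assigned[root] = True
        stack, size = [root], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in rev[v]:
                if not assigned[w]:
                    assigned[w] = True
                    stack.append(w)
        sizes[size] += 1
    return sizes


# Toy, invented input: a positive 2 x 2 "flow" table.
scaled = sinkhorn_knopp([[1, 2], [3, 4]])

# Toy, invented graph: one reciprocal pair plus a one-way chain.
toy_sizes = strong_component_sizes(7, [(0, 1), (1, 0), (2, 3), (3, 4)])

# Consistency check on the counts quoted in the abstract.
census = {1: 1659, 2: 654, 3: 22, 4: 13, 5: 3, 7: 1}
n_components = sum(census.values())                       # 2,352
n_counties = sum(size * k for size, k in census.items())  # 3,107
```

In the toy graph only the reciprocal pair forms a doublet; the one-way chain contributes three singletons, which is why a graph with many unit-entry links can still have many isolated strong components.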