Professor Guo Xu's Team from the School of Statistics Has Achieved a Series of Significant Results in the Field of High-Dimensional Statistical Inference

Professor Guo Xu's team from the School of Statistics has long been dedicated to cutting-edge research in high-dimensional statistical inference and has achieved a series of significant results. Recently, three academic papers written by the team were published in the top international statistics journal, Journal of the American Statistical Association (JASA), with Guo Xu as the first author or corresponding author in all three.


Abstracts of the papers:


1. Model-Free Statistical Inference on High-Dimensional Data


This article aims to develop an effective model-free inference procedure for high-dimensional data. We first reformulate the hypothesis testing problem via sufficient dimension reduction framework. With the aid of new reformulation, we propose a new test statistic and show that its asymptotic distribution is χ² distribution whose degree of freedom does not depend on the unknown population distribution. We further conduct power analysis under local alternative hypotheses. In addition, we study how to control the false discovery rate of the proposed χ² tests, which are correlated, to identify important predictors under a model-free framework. To this end, we propose a multiple testing procedure and establish its theoretical guarantees. Monte Carlo simulation studies are conducted to assess the performance of the proposed tests and an empirical analysis of a real-world dataset is used to illustrate the proposed methodology. Supplementary materials for this article are available online including a standardized description of the materials available for reproducing the work.


2. Test and Measure for Partial Mean Dependence Based on Machine Learning Methods


It is of importance to investigate the significance of a subset of covariates W for the response Y given covariates Z in regression modeling. To this end, we propose a significance test for the partial mean independence problem based on machine learning methods and data splitting. The test statistic converges to the standard Chi-squared distribution under the null hypothesis while it converges to a normal distribution under the fixed alternative hypothesis. Power enhancement and algorithm stability are also discussed. If the null hypothesis is rejected, we propose a partial Generalized Measure of Correlation (pGMC) to measure the partial mean dependence of Y given W after controlling for the nonlinear effect of Z. We present the appealing theoretical properties of the pGMC and establish the asymptotic normality of its estimator with the optimal root-N convergence rate. Furthermore, the valid confidence interval for the pGMC is also derived. As an important special case when there are no conditional covariates Z, we introduce a new test of overall significance of covariates for the response in a model-free setting. Numerical studies and real data analysis are also conducted to compare with existing approaches and to demonstrate the validity and flexibility of our proposed procedures. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.


3. Statistical Inference for High Dimensional Convoluted Rank Regression


High-dimensional penalized rank regression is a powerful tool for modeling high-dimensional data due to its robustness and estimation efficiency. However, the non-smoothness of the rank loss brings great challenges to the computation. To solve this critical issue, high-dimensional convoluted rank regression has been recently proposed, introducing penalized convoluted rank regression estimators. However, these developed estimators cannot be directly used to make inference. In this paper, we investigate the statistical inference problem of high-dimensional convoluted rank regression. The use of U-statistic in convoluted rank loss function presents challenges for the analysis. We begin by establishing estimation error bounds of the penalized convoluted rank regression estimators under weaker conditions on the predictors. Building on this, we further introduce a debiased estimator and provide its Bahadur representation. Subsequently, a high-dimensional Gaussian approximation for the maximum deviation of the debiased estimator is derived, which allows us to construct simultaneous confidence intervals. For implementation, a novel bootstrap procedure is proposed and its theoretical validity is also established. Finally, simulation and real data analysis are conducted to illustrate the merits of our proposed methods.
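To illustrate why convolution helps with the non-smooth rank loss mentioned in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian smoothing kernel with bandwidth h, for which the smoothed absolute value E|u + hZ|, Z ~ N(0, 1), has a closed form. The function names `smoothed_abs` and `convoluted_rank_loss` are hypothetical, and the penalty term, the debiasing step, and the bootstrap described in the abstract are not shown.

```python
import math
import numpy as np


def smoothed_abs(u, h):
    """Gaussian-kernel smoothing of |u|, i.e. E|u + h*Z| with Z ~ N(0, 1).

    Closed form: u * (2*Phi(u/h) - 1) + 2*h*phi(u/h).
    The Gaussian kernel is an assumption made for this sketch.
    """
    u = np.asarray(u, dtype=float)
    z = u / h
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return u * (2.0 * cdf - 1.0) + 2.0 * h * pdf


def convoluted_rank_loss(beta, X, y, h):
    """Average smoothed loss over all pairs of residual differences (a U-statistic)."""
    r = y - X @ beta                      # residuals e_i
    diffs = r[:, None] - r[None, :]       # e_i - e_j for every pair
    iu = np.triu_indices(len(y), k=1)     # each unordered pair i < j once
    return smoothed_abs(diffs[iu], h).mean()
```

Unlike the plain rank loss, which averages |e_i - e_j| and has kinks wherever two residuals coincide, the smoothed version is differentiable in beta, so gradient-based solvers apply. As h shrinks toward zero, the smoothed loss approaches the plain rank loss from above.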


Sources of the papers:


1. Guo, X., Li, R. Z., Zhang, Z., and Zou, C. L. (2024). Model-free statistical inference on high-dimensional data. Journal of the American Statistical Association, https://doi.org/10.1080/01621459.2024.2310314


2. Cai, L. H., Guo, X., and Zhong, W. (2024). Test and measure for partial mean dependence based on machine learning methods. Journal of the American Statistical Association, https://doi.org/10.1080/01621459.2024.2366030


3. Cai, L. H., Guo, X., Lian, H., and Zhu, L. P. (2025). Statistical inference for high dimensional convoluted rank regression. Journal of the American Statistical Association, https://doi.org/10.1080/01621459.2025.2471054