2025 – 2026 Department of Statistics Colloquium Speakers
When: Thursday, September 4, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Anru Zhang, Department of Biostatistics & Bioinformatics and Department of Computer Science, Duke University
Abstract: The increasing availability of electronic health records (EHRs) and other biomedical data calls for methodologies that can generate high-quality synthetic data while preserving privacy, correcting bias, and addressing complex data structures. In this talk, I will present a series of recent advances in generative modeling for synthetic health data. First, using denoising diffusion probabilistic models, we develop a framework for generating realistic, privacy-preserving EHR time series that achieve superior fidelity and lower privacy risk than existing methods. Second, to address irregularly observed functional data, we introduce Smooth Flow Matching (SFM), a semiparametric copula flow framework capable of generating smooth, infinite-dimensional trajectories under irregular sampling and non-Gaussian structures. Finally, we propose a bias-corrected data synthesis strategy for imbalanced learning, which mitigates distortions introduced by synthetic samples and enhances predictive performance in rare-event classification. Collectively, these methods provide a principled foundation for generative modeling of synthetic health data, enabling privacy-preserving, bias-reduced analysis and broader utilization of sensitive biomedical datasets.
When: Thursday, September 4, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Cong Ma, Department of Statistics, University of Chicago
Abstract: Integrative data analysis often requires separating shared from individual variations across multiple datasets, typically using the Joint and Individual Variation Explained (JIVE) model. Despite its popularity, theoretical insights into JIVE methods remain limited, particularly in the context of multiple matrices and varying degrees of subspace misalignment. In this talk, I will present new theoretical results on the Angle-based JIVE (AJIVE) method—a two-stage spectral algorithm. Specifically, we establish that AJIVE achieves decreasing estimation error with an increasing number of matrices in high signal-to-noise ratio (SNR) regimes. In contrast, AJIVE faces inherent limitations in low-SNR conditions, where estimation error remains persistently high. Complementary minimax lower bounds confirm AJIVE’s optimal performance at high SNR, while analysis of an oracle estimator highlights fundamental limitations of spectral methods at low SNR.
When: Thursday, September 18, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Christopher Wikle, Department of Statistics, University of Missouri
Abstract: The world is full of extreme events. For example, a central question in public health planning might be to assess the likelihood of extreme exposures (meteorological conditions, air pollution, social stress, etc.). Such extreme events typically occur in spatial and/or temporal clusters. Yet the principal methodologies that statisticians use to deal with spatially dependent processes (Gaussian processes and Markov random fields) are not suitable for complex tail dependence structures. This is particularly true of simulation model emulation. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often computationally prohibitive to fit and simulate from in high dimensions. Here I present recent work where we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it in the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used to analyze high-dimensional data or as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. We demonstrate our method on a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea and on a high-resolution simulation model of a turbulent plume, such as one would find in a wildfire. We note, however, that these methods can be applied to any data set or simulation model that exhibits extremes.