Learning-augmented count-min sketches via Bayesian nonparametrics
Stefano Favaro - Università di Torino - Collegio Carlo Alberto
The count-min sketch (CMS) is a time- and memory-efficient randomized data structure that provides estimates of token frequencies in a data stream, i.e. point queries, based on randomly hashed data. CMS augmented by learning aim to improve the CMS by means of learning models that allow better exploitation of data properties. We focus on the learning-augmented CMS by Cai, Mitzenmacher and Adams (NeurIPS, 2018), which is based on the non-parametric Bayesian modeling (BNP) of a data flow via priori of the Dirichlet (DP) process; this is called the CMS-DP. We present a new and rigorous approach of the CMS-DP, and show that it allows to consider more general classes of non-parametric priors than the prior DP. We apply our approach to develop a new CMS with learning under power-law data streams, which is based on the BNP modeling of the flow through prior Pitman-Yor process (PYP); this is referred to as CMS-PYP. Applications to synthetic data and real data show that CMS-PYP outperforms both CMS and CMS-DP in estimating low-frequency tokens; this is known to be a critical feature in natural language processing, where it is indeed common to encounter power-law data.
Stefano Favaro is a professor of statistics at the University of Turin and holder of the Carlo Alberto chair of statistics and machine learning at the Collegio Carlo Alberto. He is also Associate Member at the Statistics Department of the University of Oxford, and Research Fellow at the Institute of Applied Mathematics and Information Technologies "Enrico Magenes" of the CNR. In 2019 Stefano Favaro was awarded by an ERC Consolidator Grant on nonparametric Bayes and empirical Bayes methods for discrete functional estimation. In addition to non-parametric Bayesian methods, his research interests include statistical machine learning, data confidentiality and equity, learning-augmented retrieval algorithms, and deep learning mathematics. Since 2020 Stefano Favaro serves as Associate Editor for Bernoulli and Statistical Science.
March 21, 2022, 3:00 pm
Where is it
Room 706, UniGe DIBRIS, Via Dodecaneso 35, Genoa, Italy.