Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Arijit Ghosh

Testing Credibility of Public and Private Surveys through the Lens of Regression

Oct 07, 2024

Debabrota Basu, Sourav Chakraborty, Debarshi Chanda, Buddha Dev Das, Arijit Ghosh, Arnab Ray

Abstract:Testing whether a sample survey is a credible representation of the population is an important question to ensure the validity of any downstream research. While this problem, in general, does not have an efficient solution, one might take a task-based approach and aim to understand whether a certain data analysis tool, like linear regression, would yield similar answers both on the population and the sample survey. In this paper, we design an algorithm to test the credibility of a sample survey in terms of linear regression. In other words, we design an algorithm that can certify if a sample survey is good enough to guarantee the correctness of data analysis done using linear regression tools. Nowadays, one is naturally concerned about data privacy in surveys. Thus, we further test the credibility of surveys published in a differentially private manner. Specifically, we focus on Local Differential Privacy (LDP), which is a standard technique to ensure privacy in surveys where the survey participants might not trust the aggregator. We extend our algorithm to work even when the data analysis has been done using surveys with LDP. In the process, we also propose an algorithm that learns with high probability the guarantees a linear regression model on a survey published with LDP. Our algorithm also serves as a mechanism to learn linear regression models from data corrupted with noise coming from any subexponential distribution. We prove that it achieves the optimal estimation error bound for $\ell_1$ linear regression, which might be of broader interest. We prove the theoretical correctness of our algorithms while trying to reduce the sample complexity for both public and private surveys. We also numerically demonstrate the performance of our algorithms on real and synthetic datasets.

Via

Access Paper or Ask Questions

EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing

Jun 02, 2024

Hadrien Reynaud, Qingjie Meng, Mischa Dombrowski, Arijit Ghosh, Thomas Day, Alberto Gomez, Paul Leeson, Bernhard Kainz

Abstract:To make medical datasets accessible without sharing sensitive patient information, we introduce a novel end-to-end approach for generative de-identification of dynamic medical imaging data. Until now, generative methods have faced constraints in terms of fidelity, spatio-temporal coherence, and the length of generation, failing to capture the complete details of dataset distributions. We present a model designed to produce high-fidelity, long and complete data samples with near-real-time efficiency and explore our approach on a challenging task: generating echocardiogram videos. We develop our generation method based on diffusion models and introduce a protocol for medical video dataset anonymization. As an exemplar, we present EchoNet-Synthetic, a fully synthetic, privacy-compliant echocardiogram dataset with paired ejection fraction labels. As part of our de-identification protocol, we evaluate the quality of the generated dataset and propose to use clinical downstream tasks as a measurement on top of widely used but potentially biased image quality metrics. Experimental outcomes demonstrate that EchoNet-Synthetic achieves comparable dataset fidelity to the actual dataset, effectively supporting the ejection fraction regression task. Code, weights and dataset are available at https://github.com/HReynaud/EchoNet-Synthetic.

* Accepted at MICCAI 2024

Via

Access Paper or Ask Questions

Uniform Brackets, Containers, and Combinatorial Macbeath Regions

Nov 19, 2021

Kunal Dutta, Arijit Ghosh, Shay Moran

Abstract:We study the connections between three seemingly different combinatorial structures - "uniform" brackets in statistics and probability theory, "containers" in online and distributed learning theory, and "combinatorial Macbeath regions", or Mnets in discrete and computational geometry. We show that these three concepts are manifestations of a single combinatorial property that can be expressed under a unified framework along the lines of Vapnik-Chervonenkis type theory for uniform convergence. These new connections help us to bring tools from discrete and computational geometry to prove improved bounds for these objects. Our improved bounds help to get an optimal algorithm for distributed learning of halfspaces, an improved algorithm for the distributed convex set disjointness problem, and improved regret bounds for online algorithms against a smoothed adversary for a large class of semi-algebraic threshold functions.

* 21 pages. Full version of the ITCS'22 paper with the same title

Via

Access Paper or Ask Questions

Size sensitive packing number for Hamming cube and its consequences

Dec 12, 2014

Kunal Dutta, Arijit Ghosh

Abstract:We prove a size-sensitive version of Haussler's Packing lemma~\cite{Haussler92spherepacking} for set-systems with bounded primal shatter dimension, which have an additional {\em size-sensitive property}. This answers a question asked by Ezra~\cite{Ezra-sizesendisc-soda-14}. We also partially address another point raised by Ezra regarding overcounting of sets in her chaining procedure. As a consequence of these improvements, we get an improvement on the size-sensitive discrepancy bounds for set systems with the above property. Improved bounds on the discrepancy for these special set systems also imply an improvement in the sizes of {\em relative $(\varepsilon, \delta)$-approximations} and $(\nu, \alpha)$-samples.

* At the time of submission, we have become aware of a similar packing result proven simultaneously by Ezra. However, we note that our proof of the main packing lemma is quite different from hers. Also, the focus of our paper is on discrepancy bounds and sampling complexity

Via

Access Paper or Ask Questions