Publication

Locally differentially-private distribution estimation

Michael Christoph Gastpar, Adriano Pastore
2016
Conference Papers

Abstract

We consider a setup in which confidential i.i.d. samples X1, ..., Xn from an unknown discrete distribution PX are passed through a discrete memoryless privatization channel (a.k.a. mechanism) which guarantees an epsilon-level of local differential privacy. For a given epsilon, the channel should be designed such that an estimate of the source distribution based on the channel outputs converges as fast as possible to the exact value PX. For this purpose we consider two metrics of estimation accuracy: the expected mean-square error and the expected Kullback-Leibler divergence. We derive their respective normalized first-order terms (as n tends to infinity), which for a given target privacy epsilon represent the factor by which the sample size must be augmented so as to achieve the same estimation accuracy as that of an identity (non-privatizing) channel. We formulate the privacy-utility tradeoff problem as being that of minimizing said first-order term under a privacy constraint epsilon. A converse bound is stated which bounds the optimal tradeoff away from the origin. Inspired by recent work on the optimality of staircase mechanisms (albeit for objectives different from ours), we derive an achievable tradeoff based on circulant step mechanisms. Within this finite class, we determine the optimal step pattern.
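To make the setup concrete, the following is a minimal sketch of an epsilon-locally-differentially-private channel and a matching distribution estimator. It uses standard k-ary randomized response, chosen here only for illustration; it is not claimed to be the circulant step mechanism or the optimal step pattern derived in the paper. The function names (`krr_probs`, `krr_channel`, `privatize`, `estimate`) and the parameter values in the demo are assumptions made for this example.

```python
import math
import random


def krr_probs(k, eps):
    """Return (keep, flip) for k-ary randomized response.

    keep = P(Y = x | X = x), flip = P(Y = y | X = x) for each y != x.
    The ratio keep / flip equals exp(eps), which is what makes the
    channel eps-locally differentially private.
    """
    denom = math.exp(eps) + k - 1
    return math.exp(eps) / denom, 1.0 / denom


def krr_channel(k, eps):
    """Row-stochastic matrix Q[x][y] = P(Y = y | X = x)."""
    keep, flip = krr_probs(k, eps)
    return [[keep if x == y else flip for y in range(k)] for x in range(k)]


def privatize(samples, k, eps, rng):
    """Pass each confidential sample through the channel independently."""
    keep, _ = krr_probs(k, eps)
    outputs = []
    for x in samples:
        if rng.random() < keep:
            outputs.append(x)
        else:
            # Pick one of the other k - 1 symbols uniformly at random.
            y = rng.randrange(k - 1)
            outputs.append(y if y < x else y + 1)
    return outputs


def estimate(outputs, k, eps):
    """Estimate P_X from channel outputs by inverting the channel.

    For this channel, P_Y(y) = flip + (keep - flip) * P_X(y), so the
    empirical output frequencies can be mapped back linearly. The result
    sums to 1 but individual entries may be slightly negative for small n.
    """
    keep, flip = krr_probs(k, eps)
    n = len(outputs)
    counts = [0] * k
    for y in outputs:
        counts[y] += 1
    return [(c / n - flip) / (keep - flip) for c in counts]


def demo(seed=0, n=200_000, eps=1.0):
    """Hypothetical demo: illustrative distribution and sample size."""
    rng = random.Random(seed)
    p_true = [0.5, 0.25, 0.15, 0.10]
    k = len(p_true)
    samples = rng.choices(range(k), weights=p_true, k=n)
    p_hat = estimate(privatize(samples, k, eps, rng), k, eps)
    mse = sum((a - b) ** 2 for a, b in zip(p_hat, p_true))
    return p_true, p_hat, mse


if __name__ == "__main__":
    p_true, p_hat, mse = demo()
    print("true:", p_true)
    print("estimate:", [round(v, 4) for v in p_hat])
    print("squared error:", mse)
```

Running the demo with a smaller `eps` should show a larger squared error for the same `n`, which is the privacy-utility tradeoff the abstract describes: stronger privacy requires more samples to reach a given accuracy.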

Official source

https://infoscience.epfl.ch/entities/publication/c0333287-5205-4883-9987-e3840f69c6b1

Related concepts (32)

Sample size determination

In statistics, sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. This choice is very important for making inferences about a population. In practice, the sample size used in a study is determined by the cost of data collection and the need for sufficient statistical power.

Sample mean and covariance

The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables. The sample mean is the average value (or mean value) of a sample of numbers taken from a larger population of numbers, where "population" indicates not number of people but the entirety of relevant data, whether collected or not. A sample of 40 companies' sales from the Fortune 500 might be used for convenience instead of looking at the population, all 500 companies' sales.

Stratified sampling

A stratified random sample is taken by first dividing the population into homogeneous groups (strata) that are internally similar and distinct from one another, i.e. group 1 differs from group 2. A separate simple random sample (SRS) is then drawn from each stratum, and these SRSs are combined to form the full sample. Stratified random sampling is used to produce unbiased samples.

Related publications (31)

On the Generalization of Stochastic Gradient Descent with Momentum

Volkan Cevher, Kimon Antonakopoulos

While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding on the generalization error of such methods. In this work, we first show that th ...

Microtome Publishing, 2024

Quantifying the Unknown: Data-Driven Approaches and Applications in Energy Systems

Paul Scharnhorst

In light of the challenges posed by climate change and the goals of the Paris Agreement, electricity generation is shifting to a more renewable and decentralized pattern, while the operation of systems like buildings is increasingly electrified. This calls ...

EPFL, 2024
