ud185-2

# Differential Privacy

• Type of noise (Gaussian / Laplacian)
• Sensitivity of the query
• Desired epsilon (ε)
• Desired delta (δ)

The privacy proof for the Laplace mechanism follows.
Definition 1.1 (ℓ1-sensitivity). The ℓ1-sensitivity of a function $f : \mathbb{N}^{|\mathcal{X}|} \rightarrow \mathbb{R}^{k}$ is:

$\Delta f=\max _{\|x-y\|_{1}=1}\|f(x)-f(y)\|_{1}$
The ℓ1-sensitivity of a function $f$ captures the magnitude by which a single individual’s data can change the function $f$ in the worst case, and therefore, intuitively, the uncertainty in the response that we must introduce in order to hide the participation of a single individual. Indeed, we will formalize this intuition: the sensitivity of a function gives an upper bound on how much we must perturb its output to preserve privacy. One noise distribution naturally lends itself to differential privacy.
Definition 1.2 (The Laplace Distribution). The Laplace distribution (centered at 0) with scale $b$ is the distribution with probability density function:
$\operatorname{Lap}(x | b)=\frac{1}{2b} \exp \left(-\frac{|x|}{b}\right).$
Definition 1.3 (The Laplace Mechanism). Given any function $f : \mathbb{N}^{|\mathcal{X}|} \rightarrow \mathbb{R}^{k}$, the Laplace mechanism is defined as:

$\mathcal{M}_{L}(x, f(\cdot), \varepsilon)=f(x)+\left(Y_{1}, \ldots, Y_{k}\right)$

where the $Y_{i}$ are i.i.d. random variables drawn from $\operatorname{Lap}(\Delta f / \varepsilon)$.
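As a concrete illustration, the mechanism can be sketched in a few lines of NumPy. The function name `laplace_mechanism` and its parameters are illustrative choices, not part of the text:

```python
import numpy as np

def laplace_mechanism(f_x, sensitivity, epsilon, rng=None):
    """Release f(x) perturbed by i.i.d. Laplace noise of scale Δf/ε.

    f_x: the true query answer f(x), a scalar or array in R^k.
    sensitivity: the ℓ1-sensitivity Δf of the query f.
    epsilon: the privacy parameter ε.
    """
    rng = np.random.default_rng() if rng is None else rng
    f_x = np.asarray(f_x, dtype=float)
    scale = sensitivity / epsilon          # b = Δf / ε
    return f_x + rng.laplace(loc=0.0, scale=scale, size=f_x.shape)

# Example: a counting query has sensitivity 1, since adding or
# removing one individual changes the count by at most 1.
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)
```

Note that smaller ε forces a larger noise scale Δf/ε: stronger privacy costs accuracy.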
Theorem 1. The Laplace mechanism preserves (ε, 0)-differential privacy.
*Proof.*
Let $x \in \mathbb{N}^{|\mathcal{X}|}$ and $y \in \mathbb{N}^{|\mathcal{X}|}$ be such that $\|x-y\|_{1} \leq 1$, and let $f(\cdot)$ be some function $f : \mathbb{N}^{|\mathcal{X}|} \rightarrow \mathbb{R}^{k}$. Let $p_x$ denote the probability density function of $\mathcal{M}_{L}(x, f, \varepsilon)$, and let $p_y$ denote the probability density function of $\mathcal{M}_{L}(y, f, \varepsilon)$. We compare the two at some arbitrary point $z \in \mathbb{R}^{k}$:

$\begin{aligned} \frac{p_{x}(z)}{p_{y}(z)} &=\prod_{i=1}^{k}\left(\frac{\exp \left(-\frac{\varepsilon\left|f(x)_{i}-z_{i}\right|}{\Delta f}\right)}{\exp \left(-\frac{\varepsilon\left|f(y)_{i}-z_{i}\right|}{\Delta f}\right)}\right) \\ &=\prod_{i=1}^{k} \exp \left(\frac{\varepsilon\left(\left|f(y)_{i}-z_{i}\right|-\left|f(x)_{i}-z_{i}\right|\right)}{\Delta f}\right) \\ & \leq \prod_{i=1}^{k} \exp \left(\frac{\varepsilon\left|f(x)_{i}-f(y)_{i}\right|}{\Delta f}\right) \\ &=\exp \left(\frac{\varepsilon \cdot\|f(x)-f(y)\|_{1}}{\Delta f}\right) \\ & \leq \exp (\varepsilon) \end{aligned}$
where the first inequality follows from the triangle inequality, and the last follows from the definition of sensitivity and the fact that $\|x-y\|_{1}\leq 1$. That $\frac{p_{x}(z)}{p_{y}(z)} \geq \exp (-\varepsilon)$ follows by symmetry.
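The bound just proved can be checked numerically in the one-dimensional case (k = 1) for a counting query with Δf = 1. All variable names below are illustrative:

```python
import numpy as np

eps = 0.7
delta_f = 1.0
b = delta_f / eps                       # Laplace scale Δf/ε

def lap_pdf(z, mu, b):
    """Laplace density centered at mu with scale b."""
    return np.exp(-np.abs(z - mu) / b) / (2 * b)

# Neighboring databases whose query answers differ by at most Δf.
f_x, f_y = 5.0, 6.0
z = np.linspace(-20.0, 30.0, 10001)
ratio = lap_pdf(z, f_x, b) / lap_pdf(z, f_y, b)

# The density ratio stays within [exp(-ε), exp(ε)] at every point z.
assert np.all(ratio <= np.exp(eps) + 1e-12)
assert np.all(ratio >= np.exp(-eps) - 1e-12)
```

The ratio attains exp(ε) exactly on the side of the axis where the triangle inequality is tight (here, z ≤ f_x), matching the proof.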

# Differential Privacy and Machine Learning

One of the most useful tasks in data analysis is machine learning: the problem of automatically finding a simple rule that accurately predicts some unknown characteristic of never-before-seen data. Many machine learning tasks can be performed under the constraint of differential privacy. In fact, the constraint of privacy is not necessarily at odds with the goals of machine learning: both aim to extract information from the distribution from which the data was drawn, rather than from individual data points. In this section, we survey a few of the most basic results on private machine learning, without attempting to cover this large field completely.
The goals of machine learning are often closely aligned with the goals of private data analysis. A learner typically wishes to learn some simple rule that explains a data set. However, she wishes this rule to "generalize" — that is, the rule she learns should not only correctly describe the data she has on hand, but should also correctly describe new data drawn from the same distribution. Generally, this means she wants a rule that captures distributional information about the data set at hand, in a way that does not depend too heavily on any single data point. This, of course, is exactly the goal of private data analysis: to reveal distributional information about a private data set without revealing too much about any single individual in it. It should come as no surprise, then, that machine learning and private data analysis are closely linked. In fact, as we will see, we are often able to perform private machine learning nearly as accurately, and with nearly the same number of examples, as non-private machine learning.
“if adding, modifying, or removing any of its training samples would not result in a statistically significant difference in the model parameters learned. For this reason, learning with differential privacy is, in practice, a form of regularization.”