Hi :)
If you would be so kind, I have a further question about floating-point protection. Mironov invented the snapping mechanism to protect against an attack of that kind, but is the snapping mechanism something you can include in any mechanism, such as the geometric, the Gaussian, or the Laplace, or is it a stand-alone mechanism in and of itself?
I think it is the former but
Let me elaborate. The DpCount documentation at https://opendifferentialprivacy.github.io/smartnoise-core/doc/smartnoise_validator/proto/struct.DpCount.html says:
"From Privatizing mechanism to use. One of [SimpleGeometric, Laplace, Snapping, Gaussian, AnalyticGaussian]. Only SimpleGeometric is accepted if floating-point protections are enabled."
But if floating-point protection is enabled, that would mean you are using the snapping mechanism, yet the Snapping mechanism is listed as one of the mechanisms you could choose instead of the Geometric one.
Yes, floating-point protection is enabled by default. It might be more correct to call it "floating-point-safe", since it's not a general-purpose protection method; instead, it uses whichever mechanism is floating-point safe. In other words, the floating-point problems are tied to the mechanism being used. The Laplace mechanism (which transforms uniform random noise using floating-point operations to get a Laplace distribution) was the first mechanism where these issues were discovered, but we have subsequently learned there are also issues with Gaussian and Exponential, at least. The snapping mechanism modifies the Laplace mechanism; there are some proposals for how to protect the Gaussian mechanism using different techniques. The proposal linked earlier is an entirely different technique that changes the exponential mechanism to be safe.
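To make the "transforms uniform random noise using floating-point operations" part concrete, here is a minimal sketch of a textbook Laplace sampler. It is an illustration only: the function names are hypothetical, it is not SmartNoise's implementation, and it is exactly the naive style of sampler that is not floating-point safe.

```python
import math
import random


def naive_laplace_noise(scale, rng=random):
    """Textbook Laplace sampler: a uniform draw pushed through a logarithm.

    Illustration only (hypothetical helper, not SmartNoise code). This is the
    naive construction that is NOT floating-point safe.
    """
    u = rng.random()                        # uniform in [0, 1)
    magnitude = -scale * math.log(1.0 - u)  # exponential with mean `scale`
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * magnitude


def naive_laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Release `value` plus Laplace noise with scale sensitivity / epsilon."""
    return value + naive_laplace_noise(sensitivity / epsilon, rng)
```

For example, `naive_laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5)` returns 42 plus a random float. The snapping mechanism is a modification of this kind of Laplace release, which is why it is listed as its own option.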
I should also mention that the known impact of the floating-point issues varies depending on the mechanism. The issue with the Laplace mechanism can lead to catastrophic privacy loss in some cases: not frequently, but not very rarely either. So, depending on your threat model, you probably care about that. It's not obvious (to me at least) how bad the issues in other mechanisms are. In any case, because differential privacy is about providing rigorous assurances, we want to make sure we are creating and deploying defaults that are safe.
The simple geometric mechanism is considered safe, since it doesn't use complex floating-point operations to transform the distribution. You could also use the snapping mechanism for counts, but this would only be necessary if you had a preference for the behavior of the Laplace distribution. Note that you would still need to use the snapping mechanism for Laplace counts, since simply truncating them to the nearest integer boundary does not prevent the attack. In our tests, simple geometric tends to work better for counts, so we default to that for counts (and because it's considered floating-point safe). I think it's a good idea to test based on your scenario, since one or the other may give better results.
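As a rough illustration of why geometric noise suits counts, the sketch below draws integer-valued two-sided geometric noise as the difference of two geometric draws. The names are hypothetical and this is not the library's sampler: the coin flips here still compare floats, so treat it as a picture of the distribution, not a hardened implementation.

```python
import math
import random


def geometric_failures(alpha, rng=random):
    """Count consecutive 'continue' coin flips, each with probability alpha."""
    k = 0
    while rng.random() < alpha:
        k += 1
    return k


def two_sided_geometric_noise(epsilon, sensitivity=1, rng=random):
    """Integer noise with P(Z = k) proportional to alpha ** abs(k)."""
    alpha = math.exp(-epsilon / sensitivity)
    # The difference of two i.i.d. geometric draws is two-sided geometric.
    return geometric_failures(alpha, rng) - geometric_failures(alpha, rng)


def noisy_count(true_count, epsilon, rng=random):
    """Add integer noise to an integer count (counts have sensitivity 1)."""
    return true_count + two_sided_geometric_noise(epsilon, 1, rng)
```

The released value is always an integer, so there is no continuous distribution to be reconstructed through floating-point transformations.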
Hi. I have a question about the L1 bound for the mean function in the ADD/DROP ONE scenario. A tighter bound can be obtained on the local L1 sensitivity. Given a database X with n entries, and lower and upper bounds m and M on all possible entries, the bound max( |mean(X) - M|, |mean(X) - m| ) / n should hold.
Since you're using Laplace noise for the mean function, wouldn't it be better to use this for the scale of the distribution? The noise would be smaller but the guarantees would be the same. Moreover, it shouldn't be computationally expensive, as (I think) the algorithm computes the mean anyway.
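For concreteness, here is a small sketch that just evaluates the expression from the question next to a data-independent bound. The baseline (M - m) / n is an assumption made for comparison, not necessarily what the library uses, and the function names are hypothetical. The sketch only compares magnitudes; it does not address whether calibrating noise to a data-dependent quantity preserves the guarantee.

```python
def proposed_local_bound(data, lower, upper):
    """max(|mean(X) - M|, |mean(X) - m|) / n, as written in the question."""
    n = len(data)
    mean = sum(data) / n
    return max(abs(mean - upper), abs(mean - lower)) / n


def data_independent_bound(n, lower, upper):
    """Assumed comparison baseline: (M - m) / n."""
    return (upper - lower) / n


data = [2, 4, 6, 8]           # toy database, n = 4, mean = 5
local = proposed_local_bound(data, lower=0, upper=10)
baseline = data_independent_bound(len(data), lower=0, upper=10)
print(local, baseline)        # 1.25 2.5
```

In this toy case the proposed bound is half the baseline; the gap shrinks as the mean moves toward either end of [m, M].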
Hi @Shoeboxam,
Progressing with my benchmark, I have also tested the libraries with real datasets, using 73 epsilon values and 500 runs per epsilon for the statistical queries count, sum, mean, and var.
Here is a histogram of the dataset used, and here is the link: https://archive.ics.uci.edu/ml/datasets/Adult
We chose the numerical attributes age and hours-per-week for the queries.
As you can see in the other two pictures below, SmartNoise behaves strangely from epsilon = 10 onwards. The problem appears in sum, mean, and var.
However, we ran another test with this dataset: https://archive.ics.uci.edu/ml/datasets/Student+Performance
We chose absence days and final grade, and the results were as expected, with no weird behavior.
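The benchmark design described above (a grid of epsilons, repeated runs per epsilon, error averaged per epsilon) can be sketched roughly as follows. This is an assumption-laden stand-in: it uses a plain textbook Laplace sampler instead of any SmartNoise call, and the epsilon grid, true value, and sensitivity in the usage lines are made up, since the post does not list them.

```python
import math
import random


def laplace_noise(scale, rng):
    """Plain textbook Laplace sampler, used only as a stand-in mechanism."""
    magnitude = -scale * math.log(1.0 - rng.random())
    return magnitude if rng.random() < 0.5 else -magnitude


def benchmark(true_value, sensitivity, epsilons, runs=500, seed=0):
    """Return {epsilon: mean absolute error over `runs` noisy releases}."""
    rng = random.Random(seed)
    results = {}
    for eps in epsilons:
        scale = sensitivity / eps
        errors = []
        for _ in range(runs):
            noisy = true_value + laplace_noise(scale, rng)
            errors.append(abs(noisy - true_value))
        results[eps] = sum(errors) / runs
    return results


# Hypothetical grid and query values, for illustration only.
for eps, err in benchmark(1000.0, 1.0, [0.1, 1.0, 10.0]).items():
    print(f"epsilon={eps}: mean absolute error={err:.3f}")
```

With a well-behaved mechanism the error should shrink steadily as epsilon grows, which is the baseline against which the behavior at epsilon = 10 and above looks odd.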
smartnoise-samples repo and offered an issue and PR.