    Hi. For the dp_sum function, is mechanism='Snapping' allowed?
    According to https://opendifferentialprivacy.github.io/smartnoise-core-python/opendp.smartnoise.core.components.html?highlight=dp_sum#opendp.smartnoise.core.components.dp_sum, mechanism – Privatizing mechanism to use. One of [Automatic, Laplace, Gaussian, AnalyticGaussian, SimpleGeometric].
    1 reply
    Yiming (Paul) Li
    Hello. Is there a way to extract the sql query rewritten by smartnoise? For example, some function or method that works like the following: rewritten_sql = func("select sum(var1) from table group by var2"). Thanks!
    2 replies
    Hello. I am having a hard time finding documentation on setting the privacy budget. Can someone assist?
    2 replies
    Gonzalo Munilla Garrido
    Hi @joshua-oss :) I am conducting a benchmark on different DP libs. I have two open points that I am unable to unravel. Perhaps you may be so kind to provide your insights.
    1) I have tried the sum and mean query with Smartnoise and another two libs, and despite the fact that they use a different underlying implementation, the results are the same.
    What you see below is the sample std of the scaled error of 500 experiments performed for each epsilon for the Mean query. The dataset on which it was executed has a size of 10000, an std of 250 and a skewness of 5 (Gaussian).
    What is your intuition about these results?
    2) I did the same for the Count query, and no matter the value of the 3 parameters, the libraries (including Smartnoise) drop at a high value of epsilon:
    What might be the reason behind this drop in SmartNoise? I think your outputs are rounded, so at a high value of epsilon it is hard to get a value outside the 0.5 decimal range and thus to output a value further from the true one. What do you think?
    Thank you in advance for your time :)
    27 replies
    Gonzalo Munilla Garrido

    Hi :)
    If you may be so kind, I have a further question about floating-point protection. Mironov invented the Snapping mechanism to protect against an attack of that kind, but is the Snapping mechanism something you can include in any mechanism like the geometric or the gaussian or the laplace, or is it a stand-alone mechanism in and of itself?

    I think it is the former but

    Let me elaborate, on the DPCount found in https://opendifferentialprivacy.github.io/smartnoise-core/doc/smartnoise_validator/proto/struct.DpCount.html:
    "From Privatizing mechanism to use. One of [SimpleGeometric, Laplace, Snapping, Gaussian, AnalyticGaussian]. Only SimpleGeometric is accepted if floating-point protections are enabled."

    But if floating-point protection is enabled, then it means you are using the Snapping mechanism, yet the Snapping mechanism is one of the mechanisms you could choose instead of the Geometric one.

    And it is the one that apparently is chosen as default.
    So reading that from the documentation is conflicting, but perhaps I was mistaken in my prior assessment.
    Gonzalo Munilla Garrido
    And, is floating-point protection enabled by default?
    From a previous conversation Jan 26 22:00: "Note that SmartNoise can use any one of several mechanisms on count: SimpleGeometric, Laplace, Snapping, Gaussian, or AnalyticGaussian. It defaults to Simple Geometric"
    If it defaults to Geometric, then looking at the docs, it means that the floating-point protection does not need to be set, it is by default enabled.
    In our implementation, we have not set it.
    Gonzalo Munilla Garrido
    Also, because counts are integers, then there is no need to use the snapping mech. However, in the docs the protection is specified as if ints were vulnerable

    Yes, floating point protection is enabled by default. It might be more correct to call it "floating-point-safe", since it's not a general-purpose protection method, but instead uses whichever mechanism is floating-point safe. In other words, the floating point problems are related to the mechanism being used. The Laplace mechanism (which transforms uniform random noise using floating point operations to get a Laplace distribution), was the first mechanism where these issues were discovered. But we have subsequently learned there are also issues with Gaussian and Exponential, at least. The snapping mechanism modifies Laplace mechanism; there are some proposals for how to protect Gaussian mechanism, using different techniques. The proposal linked earlier is an entirely different technique that changes the exponential mechanism to be safe.

    I should also mention that the known impact of the floating point issues varies depending on the mechanism. The issue with Laplace mechanism can lead to catastrophic privacy loss in some cases, not frequently, but not very rarely. So, depending on your threat model, you probably care about that. It's not obvious (to me at least) how bad the issues in other mechanisms are. In any case, because differential privacy is about providing rigorous assurances, we want to make sure we are creating and deploying defaults that are safe.

    The simple geometric mechanism is considered safe, since it's not using complex floating point operations to transform the distribution. You could also use the snapping mechanism for counts, but this would only be necessary if you had a preference for the behavior of the Laplace distribution. Note that you would still need to use snapping mechanism for Laplace counts, though, since simply truncating them to the nearest integer boundary does not prevent the attack. In our tests, simple geometric tends to work better for counts, though, so we default to that for counts (and because it's considered floating point safe). I think it's a good idea to test based on your scenario, though, since one or the other may give better results.
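
To make the contrast above concrete, here is a minimal sketch in plain Python. It is not SmartNoise's implementation: the function names, the use of `random.Random`, and the sampling strategy are all assumptions made for illustration. The first function shows the textbook way to get Laplace noise by pushing a uniform draw through a floating-point logarithm, which is the kind of transformation the reply describes as problematic. The second adds two-sided geometric noise to a count, so the noise itself is always an integer.

```python
import math
import random


def naive_laplace_noise(scale: float, rng: random.Random) -> float:
    """Textbook inverse-CDF Laplace sampling (illustrative, NOT floating-point safe)."""
    u = rng.random() - 0.5              # uniform draw in [-0.5, 0.5)
    while abs(u) == 0.5:                # avoid log(0) in the rare edge case
        u = rng.random() - 0.5
    # The uniform value is transformed with a floating-point log.
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def geometric_failures(alpha: float, rng: random.Random) -> int:
    """Number of consecutive 'continue' outcomes, each occurring with probability alpha."""
    count = 0
    while rng.random() < alpha:
        count += 1
    return count


def two_sided_geometric_noise(epsilon: float, rng: random.Random,
                              sensitivity: int = 1) -> int:
    """Integer noise with P(k) proportional to alpha**abs(k), alpha = exp(-epsilon / sensitivity)."""
    alpha = math.exp(-epsilon / sensitivity)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric_failures(alpha, rng) - geometric_failures(alpha, rng)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Hypothetical helper: a count released with two-sided geometric noise."""
    return true_count + two_sided_geometric_noise(epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print("Laplace-noised count:  ", 100 + naive_laplace_noise(1.0, rng))
    print("Geometric-noised count:", noisy_count(100, 1.0, rng))
```

Running both over many trials on your own data is one way to follow the advice above and check which of the two behaves better for your scenario.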

    Hello! I've been testing out smartnoise-sdk lately. What I would like to do is to assess the accuracy of the noisy output before executing the private query (maybe by estimating the standard deviation), assuming epsilon and delta are fixed. Is there a way to do so? Either via the sdk calls themselves or by doing it "by hand"; the problem with the latter is that I don't know which distribution will be used by the library when executing the private query and with what parameters. I hope my question made sense. Cheers!
    2 replies
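
For the "by hand" route, a rough sketch of the noise standard deviation under a few standard mechanisms is shown below. These are the textbook formulas only; which mechanism and which sensitivity the library actually applies to a given query is exactly the open point in the question, so the function names and the `sensitivity` argument here are assumptions, not SDK calls.

```python
import math


def laplace_noise_std(sensitivity: float, epsilon: float) -> float:
    """Std of Laplace noise with scale b = sensitivity / epsilon, i.e. sqrt(2) * b."""
    return math.sqrt(2.0) * sensitivity / epsilon


def geometric_noise_std(sensitivity: float, epsilon: float) -> float:
    """Std of two-sided geometric noise with alpha = exp(-epsilon / sensitivity)."""
    alpha = math.exp(-epsilon / sensitivity)
    return math.sqrt(2.0 * alpha) / (1.0 - alpha)


def classic_gaussian_noise_std(sensitivity: float, epsilon: float, delta: float) -> float:
    """Sigma of the classic Gaussian mechanism (the usual bound, stated for epsilon < 1)."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


if __name__ == "__main__":
    # Hypothetical example: a count query (sensitivity 1) at epsilon = 0.5.
    print("Laplace std:  ", laplace_noise_std(1.0, 0.5))
    print("Geometric std:", geometric_noise_std(1.0, 0.5))
    print("Gaussian std: ", classic_gaussian_noise_std(1.0, 0.5, 1e-5))
```

These numbers describe the noise alone; any clamping, rounding, or post-processing done by the library would change the error of the released value.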

    Hi. I have a question about the L1 bound for the mean function in the ADD/DROP ONE scenario. A tighter bound can be obtained on the local L1 sensitivity. Given a database X with n entries, with lower and upper bounds m and M on all possible entries, the bound = max( |mean(X) - M|, |mean(X) - m| ) / n should hold.

    Since you're using Laplace noise for the mean function, wouldn't using this for the scale of the distribution be better? As the noise would be less but the guarantees would be the same. Moreover, it shouldn't be computationally expensive as (I think) the algorithm computes the mean anyway.

    4 replies
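
The bound proposed in the question can be written out as a short sketch. This only transcribes the formula as stated; the comparison value `(M - m) / n` is an assumed global bound used here for reference, not necessarily the one SmartNoise uses, and the function names are hypothetical.

```python
from typing import Sequence


def assumed_global_mean_bound(lower: float, upper: float, n: int) -> float:
    """Reference bound (an assumption for comparison): (M - m) / n."""
    return (upper - lower) / n


def proposed_local_mean_bound(data: Sequence[float], lower: float, upper: float) -> float:
    """Bound from the question: max(|mean(X) - M|, |mean(X) - m|) / n."""
    n = len(data)
    mean = sum(data) / n
    return max(abs(mean - upper), abs(mean - lower)) / n


if __name__ == "__main__":
    x = [2.0, 4.0, 6.0, 8.0]          # toy data with m = 0, M = 10
    print(proposed_local_mean_bound(x, 0.0, 10.0))   # 1.25
    print(assumed_global_mean_bound(0.0, 10.0, len(x)))  # 2.5
```

Note that the proposed value depends on the data through `mean(X)`, whereas the reference bound depends only on `m`, `M`, and `n`.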
    Gonzalo Munilla Garrido
    Screenshot 2021-02-25 at 18.11.03.png

    Hi @Shoeboxam,
    Progressing with my benchmark, I have also tested the libraries with real datasets. 73 epsilons, 500 runs per epsilon for statistical queries: count, sum, mean, var.

    Above you may find a histogram of the dataset used; here is the link: https://archive.ics.uci.edu/ml/datasets/Adult
    We chose the numerical attributes age and hours-per-week for the queries.

    As you may observe in the other two pictures below, there is a weird behavior with SmartNoise when epsilon equals 10 onwards. This problem is prevalent in sum, mean, and var.

    However, we ran another test with this dataset: https://archive.ics.uci.edu/ml/datasets/Student+Performance
    We chose absence days and final grade, and the results were as expected, no weird behaviour.

    Screenshot 2021-02-25 at 18.12.05.png
    Screenshot 2021-02-25 at 18.10.42.png
    Those are for the mean and variance, the sum behaves in the same manner.
    If there is some sort of safety mechanism for an epsilon higher than 10, then I do not understand how it is possible that it does not show up for the other datasets.
    Like here, no weird behavior:
    Screenshot 2021-02-26 at 16.48.11.png
    Gonzalo Munilla Garrido
    Hi! Unfortunately, the issue I have previously posed has not been answered. Is it something that cannot be checked?
    Michael Shoemate
    @gonzalo-munillag Sorry for the delay here. I downloaded Adult and tried to replicate your results. I'm still missing some details, because my lines remain linear for variance and mean queries on age in Adult, regardless of my choice of mechanism (Snapping, Laplace, or AnalyticGaussian, of which Snapping is the default). If you could provide some more info (or even a script), I'd be happy to take a closer look. I've attached my replication attempt below.
    2 replies
    Michael Shoemate
    Gonzalo Munilla Garrido
    Hello Michael, thank you so much for taking the time, it is an amazingly dedicated response.
    We have executed your code and indeed we get the same results; however, if you calculate the std of the scaled error instead of the std of the error, then you obtain the behavior we show in our plots. We will get back to you shortly with more details to continue the discussion. Cheers
    Hey folks, thanks for all the great work on SmartNoise.
    We're doing a few experiments at work and are really interested in what you've put together.
    I've spotted a few bits of READMEs that have slipped out of date in the smartnoise-samples repo and opened an issue and a PR.
    Let me know if you'd like me to keep going if we spot more stuff as we step through things.