    Yes, floating point protection is enabled by default. It might be more correct to call it "floating-point-safe", since it's not a general-purpose protection method; instead, it uses whichever mechanism is floating-point safe. In other words, the floating point problems are tied to the mechanism being used. The Laplace mechanism (which transforms uniform random noise using floating point operations to get a Laplace distribution) was the first mechanism where these issues were discovered. But we have subsequently learned there are also issues with the Gaussian and Exponential mechanisms, at least. The snapping mechanism modifies the Laplace mechanism; there are some proposals for how to protect the Gaussian mechanism using different techniques. The proposal linked earlier is an entirely different technique that changes the exponential mechanism to be safe.

    I should also mention that the known impact of the floating point issues varies depending on the mechanism. The issue with the Laplace mechanism can lead to catastrophic privacy loss in some cases: not frequently, but not very rarely either. So, depending on your threat model, you probably care about that. It's not obvious (to me at least) how bad the issues in other mechanisms are. In any case, because differential privacy is about providing rigorous assurances, we want to make sure we are creating and deploying defaults that are safe.

    The simple geometric mechanism is considered safe, since it's not using complex floating point operations to transform the distribution. You could also use the snapping mechanism for counts, but this would only be necessary if you had a preference for the behavior of the Laplace distribution. Note that you would still need to use the snapping mechanism for Laplace counts, since simply truncating them to the nearest integer boundary does not prevent the attack. In our tests, simple geometric tends to work better for counts, so we default to that for counts (and because it's considered floating point safe). I think it's a good idea to test based on your scenario, though, since one or the other may give better results.
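
    To make the idea of integer-valued noise for counts concrete, here is a minimal sketch of a two-sided geometric sampler. It is an illustration only, not SmartNoise's implementation: the function names are hypothetical, and the Bernoulli loop still compares floats, so it makes no claim of being floating-point safe.

```python
import math
import random


def two_sided_geometric(epsilon, sensitivity=1, rng=random):
    """Sample integer noise with P(k) proportional to exp(-epsilon * |k| / sensitivity).

    Illustrative sketch only; not hardened against floating point attacks.
    """
    # Success probability of each Bernoulli trial.
    p = 1.0 - math.exp(-epsilon / sensitivity)

    def geometric():
        # Count the failures before the first success.
        failures = 0
        while rng.random() >= p:
            failures += 1
        return failures

    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()


def noisy_count(true_count, epsilon, rng=random):
    """Add two-sided geometric noise to a count (sensitivity 1 assumed)."""
    return true_count + two_sided_geometric(epsilon, sensitivity=1, rng=rng)


# Hypothetical usage: the result is always an integer.
print(noisy_count(42, epsilon=1.0))
```

    Because the noise is an integer, no rounding or truncation of the released count is needed afterwards.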

    Hello! I've been testing out smartnoise-sdk lately. What I would like to do is assess the accuracy of the noisy output before executing the private query (maybe by estimating the standard deviation), assuming epsilon and delta are fixed. Is there a way to do so, either via the SDK calls themselves or by doing it "by hand"? The problem with the latter is that I don't know which distribution the library will use when executing the private query, or with what parameters. I hope my question makes sense. Cheers!
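
    For the "by hand" route, a rough sketch is possible once a mechanism is assumed. The code below assumes the Laplace mechanism with a known sensitivity; which mechanism and parameters the SDK actually picks is exactly the open question above, so the function names and example values here are hypothetical.

```python
import math


def laplace_scale(sensitivity, epsilon):
    """Scale b of Laplace noise calibrated to the given sensitivity and epsilon."""
    return sensitivity / epsilon


def laplace_std(sensitivity, epsilon):
    """Standard deviation of Laplace(b) noise, which is sqrt(2) * b."""
    return math.sqrt(2.0) * laplace_scale(sensitivity, epsilon)


def laplace_accuracy(sensitivity, epsilon, alpha=0.05):
    """Bound t such that |noise| <= t with probability 1 - alpha.

    For Laplace(b), P(|noise| > t) = exp(-t / b), so t = b * ln(1 / alpha).
    """
    return laplace_scale(sensitivity, epsilon) * math.log(1.0 / alpha)


# Hypothetical example: a count query (sensitivity 1) at epsilon = 0.5.
sensitivity = 1.0
epsilon = 0.5
print(laplace_std(sensitivity, epsilon))
print(laplace_accuracy(sensitivity, epsilon, alpha=0.05))
```

    Both quantities depend only on the sensitivity and epsilon, not on the data, so they can be computed before any private query is run.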

    Hi. I have a question about the L1 bound for the mean function in the ADD/DROP ONE scenario. A tighter bound can be obtained on the local L1 sensitivity. Given a database X with n entries, with lower and upper bounds m and M on all possible entries, the bound = max( |mean(X) - M|, |mean(X) - m| ) / n should hold.

    Since you're using Laplace noise for the mean function, wouldn't it be better to use this bound for the scale of the distribution? The noise would be smaller but the guarantees would be the same. Moreover, it shouldn't be computationally expensive, as (I think) the algorithm computes the mean anyway.
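
    As a small illustration, the bound proposed in this message can be evaluated next to the data-independent bound (M - m) / n. This sketch only computes the formula as stated above; it does not show that calibrating noise to a data-dependent bound preserves the privacy guarantee, which is the point under discussion. The function names and the sample data are made up.

```python
def global_mean_bound(n, lower, upper):
    """Data-independent bound (M - m) / n for a mean over n bounded entries."""
    return (upper - lower) / n


def local_mean_bound(values, lower, upper):
    """Bound from the message: max(|mean(X) - M|, |mean(X) - m|) / n.

    Assumes every entry of `values` already lies in [lower, upper].
    """
    n = len(values)
    mean = sum(values) / n
    return max(abs(mean - upper), abs(mean - lower)) / n


# Made-up data with bounds m = 0 and M = 100.
ages = [23, 35, 41, 52, 29]
print(local_mean_bound(ages, 0, 100))
print(global_mean_bound(len(ages), 0, 100))
```

    Since the mean lies between m and M, the data-dependent value never exceeds (M - m) / n.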

    Gonzalo Munilla Garrido
    Screenshot 2021-02-25 at 18.11.03.png

    Hi @Shoeboxam,
    Progressing with my benchmark, I have also tested the libraries with real datasets. 73 epsilons, 500 runs per epsilon for statistical queries: count, sum, mean, var.

    Above you can find a histogram of the dataset used; here is the link to the dataset: https://archive.ics.uci.edu/ml/datasets/Adult
    We chose the numerical attributes age and hours-per-week for the queries.

    As you may observe in the other two pictures below, SmartNoise behaves strangely from epsilon = 10 onwards. This problem is prevalent in sum, mean, and var.

    However, we ran another test with this dataset: https://archive.ics.uci.edu/ml/datasets/Student+Performance
    We chose absence days and final grade, and the results were as expected, no weird behaviour.

    Screenshot 2021-02-25 at 18.12.05.png
    Screenshot 2021-02-25 at 18.10.42.png
    Those are for the mean and variance; the sum behaves in the same manner.
    If there is some sort of safety mechanism for epsilon values higher than 10, then I do not understand why it does not show up for the other datasets.
    Like here, no weird behavior:
    Screenshot 2021-02-26 at 16.48.11.png
    Gonzalo Munilla Garrido
    Hi! Unfortunately, the issue I raised previously has not been answered. Is it something that cannot be checked?
    Michael Shoemate
    @gonzalo-munillag Sorry for the delay here. I downloaded Adult and tried to replicate your results. I'm still missing some details, because my lines remain linear for variance and mean queries on age in Adult, regardless of my choice of mechanism (Snapping, Laplace, or AnalyticGaussian, of which Snapping is the default). If you could provide some more info (or even a script), I'd be happy to take a closer look. I've attached my replication attempt below.
    Michael Shoemate
    Gonzalo Munilla Garrido
    Hello Michael, thank you so much for taking the time; it is an amazingly dedicated response.
    We have executed your code and indeed we get the same results; however, if you calculate the std of the scaled error instead of the std of the error, then you obtain the behavior we show in our plots. We will get back to you shortly with more details to continue the discussion. Cheers
    Hey folks, thanks for all the great work on smart noise.
    We're doing a few experiments at work and are really interested in what you've put together.
    I've spotted a few bits of READMEs that have slipped out of date in the smartnoise-samples repo and opened an issue and PR.
    Let me know if you'd like me to keep going if we spot more stuff as we step through things.
    Amanjeev Sethi
    hello there. is this the right place to ask about contributing to the codebase? i see a few good-first issues and wonder what the process is to work on them while avoiding duplication of work.
    Michael Shoemate

    @amanjeev We would really appreciate your contributions to the OpenDP library! We are working on replacing SmartNoise-Core with the OpenDP library: https://github.com/opendp/opendp. A few things this library does better: there is a more elegant representation of differential privacy via relations, it removes the protobuf layer (a source of significant serialization overhead), and it has generic implementations of components (now called transformations and measurements). There is an early contributor guide here: http://test-docs.opendp.org/contrib/python.html.

    We're open to PRs for anything, but you may find implementing new transformations in Rust the easiest place to start. You probably have a better sense of which specific transformations/measurements you need. I'd be happy to provide mentoring/guidance here.

    Amanjeev Sethi
    @Shoeboxam thank you!
    I feel I will start with a couple of small issues already existing to learn the codebase
    Amanjeev Sethi
    What happened to the docs? http://test-docs.opendp.org/
    I guess moved to https://docs.opendp.org/ ?
    oh no, those are older SmartNoise docs it seems
    ugh I dont know
    Philip Durbin
    @amanjeev hi! Sorry for the confusion. Recently, we dropped "test-" from the URL. There are some docs for SmartNoise linked from the top of the docs site, but you're saying they're old? You are very welcome to open an issue here if you'd like: https://github.com/opendp/opendp-documentation/issues
    Raman Prasad
    Hello All, We are retiring this Gitter room and moving our conversations to GitHub Discussions. Please join us there: https://github.com/opendp/opendp/discussions