    Hello! I've been testing out smartnoise-sdk lately. What I would like to do is assess the accuracy of the noisy output before executing the private query (maybe by estimating the standard deviation), assuming epsilon and delta are fixed. Is there a way to do so? Either via SDK calls or by doing it "by hand"; the problem with the latter is that I don't know which distribution the library will use when executing the private query, or with what parameters. I hope my question makes sense. Cheers!
    2 replies
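A minimal "by hand" sketch of the estimate asked about above, assuming the query is released with a plain Laplace mechanism. That is an assumption: a later message in this room names Snapping as the default mechanism, so the library's actual noise may differ. The function names and the example sensitivity are hypothetical and are not smartnoise-sdk calls. Delta does not appear because the Laplace mechanism is a pure-epsilon mechanism.

```python
import math


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of Laplace noise calibrated to an L1 sensitivity and epsilon."""
    if epsilon <= 0 or sensitivity < 0:
        raise ValueError("epsilon must be positive and sensitivity non-negative")
    return sensitivity / epsilon


def laplace_std(sensitivity: float, epsilon: float) -> float:
    """Standard deviation of Laplace(b) noise, which is sqrt(2) * b."""
    return math.sqrt(2.0) * laplace_scale(sensitivity, epsilon)


def laplace_accuracy(sensitivity: float, epsilon: float, alpha: float = 0.05) -> float:
    """Radius a such that |noise| <= a with probability 1 - alpha.

    For Laplace(b), P(|noise| > a) = exp(-a / b), so a = b * ln(1 / alpha).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return laplace_scale(sensitivity, epsilon) * math.log(1.0 / alpha)


if __name__ == "__main__":
    # Hypothetical example: a sum over values clamped to [0, 100],
    # so adding or dropping one record changes the sum by at most 100.
    sensitivity, epsilon = 100.0, 1.0
    print("std of noise:", laplace_std(sensitivity, epsilon))
    print("95% accuracy radius:", laplace_accuracy(sensitivity, epsilon))
```

Both quantities depend only on the sensitivity and epsilon, not on the data, so they can be computed before any private query is run.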

    Hi. I have a question about the L1 bound for the mean function in the ADD/DROP ONE scenario. A tighter bound can be obtained on the local L1 sensitivity. Given a database X with n entries, with lower and upper bounds m and M on all possible entries, the bound = max( |mean(X) - M|, |mean(X) - m| ) / n should hold.

    Since you're using Laplace noise for the mean function, wouldn't it be better to use this for the scale of the distribution? The noise would be smaller, but the guarantees would be the same. Moreover, it shouldn't be computationally expensive, as (I think) the algorithm computes the mean anyway.

    4 replies
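The quantity proposed above can be compared numerically with the exact change in the mean when one record is added or dropped. This is only a sketch for exploring the question; the function names are hypothetical and nothing here reflects how SmartNoise computes sensitivity. It relies on two identities: adding a record x changes the mean by (x - mean) / (n + 1), and dropping a record x changes it by (mean - x) / (n - 1).

```python
from statistics import fmean


def proposed_bound(data, lower, upper):
    """The quantity from the message: max(|mean - M|, |mean - m|) / n."""
    mu = fmean(data)
    return max(abs(mu - upper), abs(mu - lower)) / len(data)


def worst_add_one(data, lower, upper):
    """Largest change in the mean from adding one record in [lower, upper].

    The change (x - mean) / (n + 1) is extremal at x = lower or x = upper.
    """
    mu = fmean(data)
    return max(abs(upper - mu), abs(lower - mu)) / (len(data) + 1)


def worst_drop_one(data):
    """Largest change in the mean from dropping one existing record."""
    if len(data) < 2:
        raise ValueError("need at least two records to drop one")
    mu = fmean(data)
    return max(abs(mu - x) for x in data) / (len(data) - 1)


if __name__ == "__main__":
    for data in ([20, 30, 40, 50], [0, 100]):
        print(
            data,
            "proposed:", proposed_bound(data, 0, 100),
            "add one:", worst_add_one(data, 0, 100),
            "drop one:", worst_drop_one(data),
        )
```

In this sketch the add-one change stays below the proposed value, while the drop-one change divides by n - 1 and can exceed it when n is small and records sit at the bounds (the second data set above), so it may be worth checking which neighboring relation the bound is meant to cover.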
    Gonzalo Munilla Garrido
    Screenshot 2021-02-25 at 18.11.03.png

    Hi @Shoeboxam,
    Progressing with my benchmark, I have also tested the libraries with real datasets. 73 epsilons, 500 runs per epsilon for statistical queries: count, sum, mean, var.

    You may find a histogram of the dataset used above; here is the link: https://archive.ics.uci.edu/ml/datasets/Adult
    We chose the numerical attributes age and hours-per-week for the queries.

    As you may observe in the other two pictures below, SmartNoise behaves oddly from epsilon = 10 onwards. This problem appears in sum, mean, and var.

    However, we ran another test with this dataset: https://archive.ics.uci.edu/ml/datasets/Student+Performance
    We chose absence days and final grade, and the results were as expected: no weird behaviour.

    Screenshot 2021-02-25 at 18.12.05.png
    Screenshot 2021-02-25 at 18.10.42.png
    Those are for the mean and variance; the sum behaves in the same manner.
    If there is some sort of safety mechanism for epsilon values higher than 10, then I do not understand how it is possible that it does not show up for the other dataset.
    Like here, no weird behavior:
    Screenshot 2021-02-26 at 16.48.11.png
    Gonzalo Munilla Garrido
    Hi! Unfortunately, the issue I have previously posed has not been answered. Is it something that cannot be checked?
    Michael Shoemate
    @gonzalo-munillag Sorry for the delay here. I downloaded Adult and tried to replicate your results. I'm still missing some details, because my lines remain linear for variance and mean queries on age in Adult, regardless of my choice of mechanism (Snapping, Laplace, or AnalyticGaussian, of which Snapping is the default). If you could provide some more info (or even a script), I'd be happy to take a closer look. I've attached my replication attempt below.
    2 replies
    Michael Shoemate
    Gonzalo Munilla Garrido
    Hello Michael, thank you so much for taking the time; it is an amazingly dedicated response.
    We have executed your code and indeed we get the same results; however, if you calculate the std of the scaled error instead of the std of the error, then you obtain the behavior we show in our plots. We will get back to you shortly with more details to continue the discussion. Cheers
    Hey folks, thanks for all the great work on SmartNoise.
    We're doing a few experiments at work and are really interested in what you've put together.
    I've spotted a few bits of READMEs that have slipped out of date in the smartnoise-samples repo and opened an issue and a PR.
    Let me know if you'd like me to keep going if we spot more stuff as we step through things.
    Amanjeev Sethi
    hello there. is this the right place to ask to contribute to the codebase? i see a few good-first issues and wonder whats the process to work on them while avoiding duplication of work.
    Michael Shoemate

    @amanjeev We would really appreciate your contributions to the OpenDP library! We are working on replacing SmartNoise-Core with the OpenDP library: https://github.com/opendp/opendp. A few things this library does better: it has a more elegant representation of differential privacy via relations, it removes the protobuf layer (a source of significant serialization overhead), and it has generic implementations of components (now called transformations and measurements). There is an early contributor guide here: http://test-docs.opendp.org/contrib/python.html.

    We're open to PRs for anything, but you may find implementing new transformations in Rust the easiest place to start. You probably have a better sense of which specific transformations/measurements you need. I'd be happy to provide mentoring/guidance here.

    Amanjeev Sethi
    @Shoeboxam thank you!
    I feel I will start with a couple of small issues already existing to learn the codebase
    Amanjeev Sethi
    What happened to the docs? http://test-docs.opendp.org/
    I guess moved to https://docs.opendp.org/ ?
    oh no, those are older SmartNoise docs it seems
    ugh I dont know
    Philip Durbin
    @amanjeev hi! Sorry for the confusion. Recently, we dropped "test-" from the URL. There are some docs for SmartNoise linked from the top of the docs site, but you're saying they're old? You are very welcome to open an issue here if you'd like: https://github.com/opendp/opendp-documentation/issues
    Raman Prasad
    Hello All, We are retiring this Gitter room and moving our conversations to GitHub Discussions. Please join us there: https://github.com/opendp/opendp/discussions