These are chat archives for libmir/public

28th
Apr 2016
John Colvin
@John-Colvin
Apr 28 2016 11:31
OK, turns out I don’t understand MCMC, but if you replace MCMC with Random Walk then the points still hold I think
John Colvin
@John-Colvin
Apr 28 2016 11:49
hmm, OK, so I think I’m gonna have to backtrack from all that, sorry for the noise
Edwin van Leeuwen
@BlackEdder
Apr 28 2016 11:53
Just wanted to point out that while most talk here seems to be aimed at the random number generator, but most non uniform random number generators actually depend on uniform!(0.0,1.0). So maybe we should focus more on that.
John Colvin
@John-Colvin
Apr 28 2016 11:54
Someone just pointed out to me that one of the safest ways is just to have global singletons!
Sebastian Wilzbach
@wilzbach
Apr 28 2016 11:55
@John-Colvin non-copyable?
John Colvin
@John-Colvin
Apr 28 2016 11:55
only one in existence per thread, so yes, non-copyable
Sebastian Wilzbach
@wilzbach
Apr 28 2016 11:55
(rndGen is already a global singelton)
Hmm while this might be nice for most of the users and we should could/should use the singleton by default, we still should allow a user to pass in his own random device
John Colvin
@John-Colvin
Apr 28 2016 11:57
pass a reference to a global singleton of that device?
I don’t like a global singleton really, because it’s a pain for reseeding (if it’s reseeded for one call in the depths of some library, it’s also reseeded for everyone else, which seems unwanted and potentially bad).
Sebastian Wilzbach
@wilzbach
Apr 28 2016 11:59
yep you just destroyed your own idea ;-)
Ilya Yaroshenko
@9il
Apr 28 2016 12:00
please remember also about multithreading
Sebastian Wilzbach
@wilzbach
Apr 28 2016 12:01
@9il if it's thread-global, this shouldn't be a huge deal?
Ilya Yaroshenko
@9il
Apr 28 2016 12:03
yes, if and only if you pass it by reference to thread-local function
if you pass it by reference and then create threads this will be bad
So, passing RNG should be the last step in data generation
or appr. the last
Ilya Yaroshenko
@9il
Apr 28 2016 12:13
btw, low level API for non-uniform RNGs should not have range API
John Colvin
@John-Colvin
Apr 28 2016 12:13
@9il why not?
Ilya Yaroshenko
@9il
Apr 28 2016 12:14
Because we can build single connector to range API. And create RNGs with minimal state. Range are high level API
this(RNG)(<other params>, RNG rng = rndGen()) and opCall(RNG)(RNG rng = rndGen())
Sebastian Wilzbach
@wilzbach
Apr 28 2016 12:16
why rndGen() and not rndGen ?
John Colvin
@John-Colvin
Apr 28 2016 12:16
In what circumstances would making it a range from the bottom up not be able to work? (assuming we work out the problems and make the range API work at any level)
Ilya Yaroshenko
@9il
Apr 28 2016 12:17

why rndGen() and not rndGen ?

can be just rndGen thought

Ilya Yaroshenko
@9il
Apr 28 2016 12:23

In what circumstances would making it a range from the bottom up not be able to work? (assuming we work out the problems and make the range API work at any level)

This is less flexible.

  1. For example I may have a crazyNURNG and I want to test it with different devices RNGs to choose a better one. So I can call opCall with different RNGs without constructing crazyNURNG each time.
  2. Main. opCall approauch has not refence/pointer/value Range problem!
If range based API cause problems we should have ability to work without that API. We still will be able to use it through a simple connector
Range problem seems to be less important for non-uniform package becuase we can think about that in the end of GSoC
John Colvin
@John-Colvin
Apr 28 2016 12:30
It seems to me like the underlying RNG could form an important part of construction, no? Do we really want to have to define RNGs such that their underlying RNG can be different for every call?
Ilya Yaroshenko
@9il
Apr 28 2016 12:31
My approuch is to have not underlying RNG at all
John Colvin
@John-Colvin
Apr 28 2016 12:32
in this: this(RNG)(<other params>, RNG rng = rndGen()) I mean rng is the underlying RNG
Ilya Yaroshenko
@9il
Apr 28 2016 12:32
This would be used only by few generators that requires random number for construciton
95% don’t need RNG for constractor
John Colvin
@John-Colvin
Apr 28 2016 12:33
ok, same for opCall(RNG)(RNG rng = rndGen())
Ilya Yaroshenko
@9il
Apr 28 2016 12:33
this(RNG)(<other params>, scope ref RNG rng = rndGen())
opCall(RNG)(scope ref RNG rng = rndGen())
i have added scope ref to show that structure never holds the rng
Ilya Yaroshenko
@9il
Apr 28 2016 12:39
Yes, this looks strange. But I have not found better API solution. Whatever we will want to do in the future we will be able to do it with opCall API
John Colvin
@John-Colvin
Apr 28 2016 12:56
Potentially of interest: http://xorshift.di.unimi.it/
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 19:39

Because we can build single connector to range API. And create RNGs with minimal state. Range are high level API

Just to note, there's no substantial conceptual difference between RNG-as-functor, versus RNG as input range

Sebastian Wilzbach
@wilzbach
Apr 28 2016 19:41
@WebDrake the range has to save the pointer/reference - a functor not
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 19:41
@wilzbach I'm not talking about the entities that wrap RNGs
Sebastian Wilzbach
@wilzbach
Apr 28 2016 19:42
Yep otherwise I agree with you. A random generator is an infinite input range
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 19:44

But anyway, I would like to have a minimal example of what @9il is discussing when he says:

passing RNG should be the last step in data generation

I think this is an intriguing idea but I wonder how readily it fits with existing range-chaining designs
I mean, presumably the idea is that you would do something like
(iota(100).randomSample(10).filter!(a => a%2 == 0))(rng).writeln;
... ?
(Parentheses added to clarify)

BTW I should add that this remark:

95% don’t need RNG for constractor

... I think is important: I came to the conclusion a while ago that doing any RNG calls in the constructor is a bad idea, because it completely messes with the idea of lazy evaluation of a random sequence

Sebastian Wilzbach
@wilzbach
Apr 28 2016 19:49
@WebDrake I am not sure whether this works, how would the filter function that we can't modify pass the rndGen to randomSample?
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 19:50
I'm not making this as a proposal, I have the same concerns
I'm asking if that is the concept

Because I don't understand what @9il wants when he says:

passing RNG should be the last step in data generation

Or rather, I don't see how he envisions the API
Sebastian Wilzbach
@wilzbach
Apr 28 2016 19:53
Let's wait for him to clarify. What happened to the unique pointer idea?
Ilya Yaroshenko
@9il
Apr 28 2016 20:07
@WebDrake I just mean that we should not have variables for any random ranges except common RNGs.
So we should define for a user that this kind of code auto sample = iota(100).randomSample(10, rng).filter!(a => a%2 == 0) is bad style
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:09
I think that will break any ability to have random ranges in range chains

Because even if you do the "good" version:

iota(100).randomSample(10, rng).filter!(a => a%2 == 0).writeln;

... the output of randomSample is still being stored as a variable inside the filter range

Ilya Yaroshenko
@9il
Apr 28 2016 20:10
In the same time iota(100).randomSample(10, rng).filter!(a => a%2 == 0).writeln; is OK
Becuase this is simple chain without multithreading
In addition randomSample must be only input range, but not forward range
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:13
But the output of randomSample is stored as a variable inside the filter range
Unavoidably, so far as I can see
No disagreement that all random ranges must be input ranges
Ilya Yaroshenko
@9il
Apr 28 2016 20:13
I don’t understand why this is wrong. Could you please explain? probably i missing something
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:14
Give me a couple of seconds to find the right part of phobos so I can point you to the code
In the case of our range chain above, that means the output of randomSample
Let's look at the internals of filter and how it handles the range it wraps:
https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L1007-L1019
... as you can see it creates an internal variable and assigns to it, and if the range it's given is a value type, it will be copied by value
Now, as it's implemented, being terminated by writeln, this is harmless in practice
Because indeed there's only ever one instance of that RandomSample struct, it's consumed, and out it goes
So I assume that for you, the assignment of a RandomSample to a variable inside filter is an acceptable case of assigning a random range to a variable?
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:19
& the problem you have is rather one of whether the programmer actively assigns to a variable?

I'm asking because it wasn't clear to me how strictly you intended your remark

we should not have variables for any random ranges except common RNGs

Ilya Yaroshenko
@9il
Apr 28 2016 20:22
Not strictly at all. If something can be thread local shell over an input range (input range -> input range) then this is save for randomSample
ofcourse without sending the reference to RNG to any other functions
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:23
OK, then we're safe with range chains
Ilya Yaroshenko
@9il
Apr 28 2016 20:24
Yes, but only if result of this chain is not a lazy range
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:24
Right
Ilya Yaroshenko
@9il
Apr 28 2016 20:24
Becuase if this is lazy, then we store the pointer to a RNG, which is buggy style
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:25
I see the logic of your thoughts here, but I have 2 points of concern
Ilya Yaroshenko
@9il
Apr 28 2016 20:25
what points?
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:25
One, I would really like a solution that doesn't depend on programmer virtue
I accept that may not be possible ;-)
Second, I think it's a big restriction on things like, for example:
auto normal = normalDistribution(gen);
// ... now pass around your normal-variate generator
On the storing-the-pointer side: I'm not sure it's such a problem if you can use return ref (when the RNG is passed in to the random range) to guarantee that the pointer will remain valid for the lifetime of the range (DIP25 feature if I recall right?)
The bigger problem with storing a random range in a variable is the risk of copy-by-value of the random range's own internal state (not the RNG)
Ilya Yaroshenko
@9il
Apr 28 2016 20:29
  1. One sulution (and I will use this in my code) is not use ranges with RNG at all
  2. I don’t think this is restriction:
    auto normal = normalRFunctor(mu, sigma);
    auto val = normal(); //default rng
    auto val = normal(rngGen); //custom rng

The bigger problem with storing a random range in a variable is the risk of copy-by-value of the random range's own internal state (not the RNG)

We must store only a pointer

Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:31
Store only a pointer to what?
Ilya Yaroshenko
@9il
Apr 28 2016 20:31
To thread local RNG
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:32
@9il I feel that I have to keep re-emphasizing that random ranges have their own state in addition to the RNG
Accessing the RNG via a stored pointer is necessary but not sufficient to solve the copy-by-value problem of random ranges
Note how much internal state it has in addition to the RNG that it wraps
Ilya Yaroshenko
@9il
Apr 28 2016 20:34
What problen by value problem? We have already decided that result of random chain must be not lazy
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:35
I accept that forbidding lazy results is a way to address the copy-by-value problem, but I feel it's restrictive
I don't think it's a good idea to decide this early into the discussion that we can't find a solution that will allow us to have lazy random range variables
Ilya Yaroshenko
@9il
Apr 28 2016 20:36
I did not know about RandomSample from std.random. We should define our random sample with pointer symantic: RefRandomSample
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:38
RandomSample should have reference-type semantics, yes, and be an input range
Ilya Yaroshenko
@9il
Apr 28 2016 20:38
My goal is very simple: do not spend time for dances with Range API, because this is not important for non uniform RNG
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:39
Yes, if you want to limit the use-case to random distributions as defined in C++ <random>, then functors are sufficient
However, if we're going to target stuff towards phobos ultimately, it would be nice to have something that fits in with the range API
But
I recognize my concerns here are not the focus of the GSoC project
Ilya Yaroshenko
@9il
Apr 28 2016 20:40
I don’t want to limit something. Does functor limit something?
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:41
I am not sure it's so straightforward to implement RandomSample via functors, without breaking the ability to nicely include it in range chains
Ilya Yaroshenko
@9il
Apr 28 2016 20:42
???
I mean functors that accepts RNG for each call
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:42
The C++ implementations (which were proposed for but not actually included in the standard <random> header) may have an easier time of that because IIRC they store begin/end iterators for the input they are sampling
Yes, I understand what kind of functors you mean
Ilya Yaroshenko
@9il
Apr 28 2016 20:43
So range store functor as internal variable.
And you can do whatever you want
You may have value or ref semantics without problem
All range issues are not about non-uniform RNGs. They are about range API
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:45
I'm not sure that simplifies code in the case of including a random sample in a range chain, although I would be interested to see a prototype
There is a much simpler sampling algorithm (Knuth calls it "Algorithm S") that would be trivial to implement -- if I can give you a range-based version, could you perhaps come up with a corresponding functor design?
I suggest this because it simplifies the problem of focusing on the API design instead of the sampling algorithm
Sebastian Wilzbach
@wilzbach
Apr 28 2016 20:48
bernoulli is probably the simplest algorithm to implement ;-)
uint bernoulli(RGen)(double p = 0.5, ref RGen gen = rndGen)
{
    assert(0 <= p && p <= 1, "Invalid probability");
    auto val = uniform(0.0L, 1.0L, gen);
    return cast(uint) (val <= p);
}
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:49
@wilzbach wrong kind of sampling, I think ;-)
Sebastian Wilzbach
@wilzbach
Apr 28 2016 20:49
if you want to have a state algorithm to test, go for the Box-muller version of normal dist
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:50
@wilzbach see, I want to go for RandomSample because I think a random algorithm makes clearer the problems than a random distribution
Sebastian Wilzbach
@wilzbach
Apr 28 2016 20:50
@WebDrake ok ;-)
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:50
Because a random distribution, it's much less likely that people will care about placing it in a range chain
BTW I want to add one thing
I am absolutely fine with the idea of just getting on with creating a functor-based framework that implements a bunch of random distributions
I don't want to derail the focus of the GSoC project, and that would be a perfectly valid approach to it
But I think it would be a shame to make design decisions on the basis of not understanding the different complications of range vs. functor based approaches
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 20:55
I think I should add, I spent some time about a year ago toying with a functor-based design, because it seemed at first to simplify things a lot when it came to RNGs and random distributions (i.e. non-uniform wrappers of uniform RNGs)
But it stopped being pleasantly simple once I started trying to extend the principle to RandomSample and RandomCover
@9il that's why I suggest trying out your own prototype, because either you will find a nice design that eluded me, or you'll realize what problems I ran into (or maybe both) ;-)
Conversely, if we find a good design solution for RandomSample, it's likely the same principles will work also for uniform RNGs and (non-uniform) random distributions
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 21:00
One last thing
About Unique
It currently breaks down if you try to include it in a range chain -- try compiling:
void main()
{
    import std.random : Random;
    import std.range.primitives : isInputRange;
    import std.range : take;
    import std.stdio : writeln;
    import std.typecons : Unique;

    static assert(isInputRange!Random);

    Random gen;
    gen.seed(12345);

    auto ugen = Unique!Random(&gen);  // can alternatively just pass `new Random` here

    static assert(isInputRange!(typeof(gen)));
    static assert(isInputRange!(typeof(ugen)));

    gen.take(10).writeln;
    gen.take(10).writeln;

    ugen.take(10).writeln;
    ugen.take(10).writeln;
}

You get:

uniquerange.d(22): Error: template std.range.take cannot deduce function from argument types !()(MersenneTwisterEngine!(uint, 32LU, 624LU, 397LU, 31LU, 2567483615u, 11LU, 7LU, 2636928640u, 15LU, 4022730752u, 18LU)*), candidates are:
/usr/include/d/std/range/package.d(1890):        std.range.take(R)(R input, size_t n) if (isInputRange!(Unqual!R) && !isInfinite!(Unqual!R) && hasSlicing!(Unqual!R) && !is(R T == Take!T))
/usr/include/d/std/range/package.d(1913):        std.range.take(R)(R input, size_t n) if (is(R T == Take!T))
/usr/include/d/std/range/package.d(1921):        std.range.take(R)(R input, size_t n) if (isInputRange!(Unqual!R) && (isInfinite!(Unqual!R) || !hasSlicing!(Unqual!R) && !is(R T == Take!T)))

I haven't had time/focus to look into this, but I suspect it's a problem with how Unique forwards the range primitives, and if I had to make a guess, I'd suspect empty is the culprit, because it is an enum rather than a function in MersenneTwister

You'll probably also find it doesn't forward the isUniformRandom enum flag correctly
Joseph Rushton Wakeling
@WebDrake
Apr 28 2016 21:06
But those are implementation problems it ought to be possible to solve