These are chat archives for evhub/coconut

15th
Jul 2016
Constantine Molchanov
@moigagoo
Jul 15 2016 13:20 UTC

Hi! Is there a way to interrupt map, parallel_map, and concurrent_map when a certain condition becomes True? Say I need to calculate some expensive stuff until I find a result that satisfies a certain criterion. Because it's expensive, I'd like to use parallel_map. Unfortunately, the maps, takewhile, and dropwhile consume the whole iterable they receive, so I don't know how to terminate the computation.

Ideally, I need something like this: count() |> parallel_map$(expensive_calculation) |> until(criteria_satisfied) or count() |> parallel_map$(expensive_calculation) |> while(criteria_not_satisfied).

I thought count() |> parallel_map$(expensive_calculation) |> dropwhile(criteria_not_satisfied) would do the trick, but it doesn't—the iteration doesn't stop after the first match.

Constantine Molchanov
@moigagoo
Jul 15 2016 13:31 UTC
In fact, this until function might look as simple as this:
def until(condition, iterable):
    for i in iterable:
        if condition(i):
            return i
It works with map but not with parallel_map, because the process pool needs to be terminated explicitly. Otherwise, only one worker process stops while the others keep working. I don't know the best way to terminate the pool and keep it "coconutish."
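For illustration, here is a minimal plain-Python sketch of one way to stop a pool once a match is found. It is not Coconut's parallel_map and not a proposed fix for it; the name first_satisfying_parallel and the workers and batch_size parameters are hypothetical. It uses a thread pool for simplicity, on the assumption that swapping in concurrent.futures.ProcessPoolExecutor (with picklable, module-level functions) would suit CPU-bound work better.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count, islice


def first_satisfying_parallel(func, condition, iterable, workers=4, batch_size=64):
    """Return the first func(item) that satisfies condition, or None if none does."""
    iterator = iter(iterable)
    # Leaving the with-block shuts the pool down, so no workers linger afterwards.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Pull a bounded batch so an infinite input is never consumed eagerly.
            batch = list(islice(iterator, batch_size))
            if not batch:
                return None
            # pool.map yields results in input order, so the first match is the earliest.
            for result in pool.map(func, batch):
                if condition(result):
                    return result


def square(n):
    return n * n


def over_1000(value):
    return value > 1000


print(first_satisfying_parallel(square, over_1000, count()))  # prints 1024
```

The trade-off in this sketch is that at most one batch of extra work is wasted after the match, in exchange for never having to kill workers by hand.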
Constantine Molchanov
@moigagoo
Jul 15 2016 13:43 UTC

BTW I noticed that adding pattern matching to function definition slows down the program dramatically. Compare:

def get_hash(number):
    return md5(b"%b%d" % (SECRET, number)).hexdigest()

def until(condition, iterable):
    for i in iterable:
        if condition(i):
            return i

def criteria(hash) = hash.startswith("000000")

result = count() |> map$(get_hash) |> until$(criteria)

and

def get_hash(number):
    return (number, md5(b"%b%d" % (SECRET, number)).hexdigest())

def until(condition, iterable):
    for i in iterable:
        if condition(i):
            return i

def criteria((_, hash)) = hash.startswith("000000")

result = count() |> map$(get_hash) |> until$(criteria)

In the first case, get_hash returns just the hash, and criteria checks it. It completes in 7 sec.

In the second case, get_hash returns a tuple, and criteria unpacks it. It completes in 17 sec. Huge performance regression.
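One way to narrow down where the extra time goes would be to time the two predicates in isolation, separately from the hashing. The sketch below is hand-written Python, so it does not reproduce the code Coconut generates for a pattern-matching function; criteria_plain, criteria_tuple, and sample are hypothetical names, and the tuple variant simply unpacks by hand.

```python
import timeit


def criteria_plain(digest):
    return digest.startswith("000000")


def criteria_tuple(pair):
    # Unpack by hand; this stands in for the pattern-matching definition.
    _, digest = pair
    return digest.startswith("000000")


# An arbitrary 32-character hex string, shaped like an MD5 hexdigest.
sample = "0123456789abcdef0123456789abcdef"

plain_seconds = timeit.timeit(lambda: criteria_plain(sample), number=100_000)
tuple_seconds = timeit.timeit(lambda: criteria_tuple((0, sample)), number=100_000)

print(f"plain: {plain_seconds:.4f}s  tuple: {tuple_seconds:.4f}s")
```

If the two timings come out close, the predicate's own cost is probably not the cause; running the same measurement against the compiled Coconut functions would show how much the generated matching code adds per call.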

Evan Hubinger
@evhub
Jul 15 2016 17:44 UTC

@moigagoo For your until function, the best idiom there would be

count() |> parallel_map$(expensive_calculation) |> dropwhile$(criteria_not_satisfied) |> ((it) -> it$[0])

or

def first_satisfying(condition, iterable) = dropwhile((not)..condition, iterable)$[0]
count() |> parallel_map$(expensive_calculation) |> first_satisfying$(criteria_satisfied)

However, I agree that there's a problem with this: parallel_map never completes, and thus never closes all of its processes. That's an oversight on my part, and I'll try to look into ways of fixing it.
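For readers less familiar with Coconut syntax, the first_satisfying idiom above corresponds roughly to the following plain-Python sketch, where next() plays the role of $[0]. It uses the built-in lazy map rather than parallel_map, so it says nothing about the process-cleanup problem.

```python
from itertools import count, dropwhile


def first_satisfying(condition, iterable):
    # dropwhile is lazy: it skips items only until the first one that passes.
    remaining = dropwhile(lambda item: not condition(item), iterable)
    # next() pulls just that first item, like $[0] in the Coconut version.
    return next(remaining)


squares = map(lambda n: n * n, count())
print(first_satisfying(lambda value: value > 50, squares))  # prints 64
```

Note that next() raises StopIteration if a finite iterable contains no match; passing a default, as in next(remaining, None), would avoid that.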

For the pattern-matching function slowdown, I'm really not sure what's happening there. Using a pattern-matching function will add a bit of overhead, but it shouldn't be nearly that much. I have a couple of ideas of what might be causing this and I'm going to try to look into them and see what I can do.

Boscillator
@Boscillator
Jul 15 2016 18:05 UTC
Trying to write unit tests for Coconut. Does anyone have any idea why an import as statement inside an exec fails in a unit test, but works when it's not in a unit test?
Evan Hubinger
@evhub
Jul 15 2016 18:06 UTC
@Boscillator That's weird—I have no idea. Do you have some code I could look at? I might be able to figure out what's going on if I could see the code.
Boscillator
@Boscillator
Jul 15 2016 18:07 UTC
def test_exec(self):
    expr = 'a = 1 + 1'

    code = convenience.parse(expr)
    print(code)
    eval(code)

    print(a)
raises NameError: name '_coconut_sys' is not defined
oops, should be exec
but that still raises the error
Evan Hubinger
@evhub
Jul 15 2016 18:19 UTC

@Boscillator The problem is that the header needs to be executed at the global level. You can import the header at the top with from coconut.__coconut__ import * (or equivalently exec(parse(""))), and if you use "block" mode, it won't include any header, so you can then execute that in the context of a function. I was able to do:

>>> from coconut.__coconut__ import *
>>> from coconut.convenience import parse
>>> def test():
...  exec(parse("a = 1 + 1", "block"))
...  return a
...
>>> test()
2

I'll edit the documentation for coconut.convenience to reflect that you have to do something like this.
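A related point for tests running on Python 3: names assigned by exec inside a function are not reliably visible as local variables there, so a test might read results from an explicit namespace instead. The sketch below uses a plain Python string standing in for the output of parse("a = 1 + 1", "block"); run_snippet is a hypothetical helper. With real Coconut output, the assumption is that the header names would also have to be placed in that namespace first.

```python
def run_snippet(source):
    """Execute source in a fresh namespace and return that namespace."""
    namespace = {}
    # Passing a dict makes exec store assigned names there instead of in
    # function locals, which Python 3 does not update reliably.
    exec(source, namespace)
    return namespace


result = run_snippet("a = 1 + 1")
print(result["a"])  # prints 2
```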

Boscillator
@Boscillator
Jul 15 2016 18:20 UTC
thanks
Evan Hubinger
@evhub
Jul 15 2016 18:21 UTC
@moigagoo See #128, #129.
Boscillator
@Boscillator
Jul 15 2016 18:47 UTC
quick git question, how do I reset a file in my fork to what it is in your repo?
Evan Hubinger
@evhub
Jul 15 2016 18:47 UTC
git checkout <filename> should do it
Boscillator
@Boscillator
Jul 15 2016 18:48 UTC
The problem is, the bad version is already committed to my repo, so I need to reset it to upstream
Evan Hubinger
@evhub
Jul 15 2016 18:50 UTC
oh, I see. can you just git revert the bad commit?
Boscillator
@Boscillator
Jul 15 2016 18:50 UTC
can you reject only one file from a pull request?
Evan Hubinger
@evhub
Jul 15 2016 18:52 UTC
I'd recommend using git revert if possible, otherwise just copy-and-paste.
Boscillator
@Boscillator
Jul 15 2016 18:53 UTC
reverting worked! Thanks so much!
Evan Hubinger
@evhub
Jul 15 2016 18:53 UTC
great! np.