Hi! Is there a way to interrupt map, parallel_map, and concurrent_map when a certain condition becomes True? Say I need to calculate some expensive stuff until I find a result that satisfies a certain criterion. Because it's expensive, I'd like to use parallel_map. Unfortunately, map, takewhile, and dropwhile consume the whole iterable they receive, so I don't know how to terminate the computation.
Ideally, I need something like this:
count() |> parallel_map$(expensive_calculation) |> until$(criteria_satisfied)
or
count() |> parallel_map$(expensive_calculation) |> while$(criteria_not_satisfied)
I thought
count() |> parallel_map$(expensive_calculation) |> dropwhile$(criteria_not_satisfied)
would do the trick, but it doesn't: the iteration doesn't stop after the first match.
The until function might look as simple as this:

def until(condition, iterable):
    for i in iterable:
        if condition(i):
            return i
This works with map, but not with parallel_map, because the process pool needs to be terminated explicitly. Otherwise, only one fork is stopped while the others keep working. I don't know what the best way is to terminate the pool and keep it "coconutish."
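The closest thing I can think of is dropping down to concurrent.futures directly, which isn't very coconutish. A rough sketch (find_first and max_in_flight are just placeholder names, and func/condition would have to be picklable module-level functions):

from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait
from itertools import count, islice

def find_first(func, condition, max_in_flight=64):
    numbers = count()
    with ProcessPoolExecutor() as pool:
        # keep a bounded number of jobs in flight so count() is consumed lazily
        pending = {pool.submit(func, n) for n in islice(numbers, max_in_flight)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                result = future.result()
                if condition(result):
                    # cancel jobs that haven't started yet; leaving the with-block
                    # then shuts the pool down (note: this returns *a* match, not
                    # necessarily the one for the smallest number)
                    for f in pending:
                        f.cancel()
                    return result
            # top the in-flight set back up with fresh numbers
            pending |= {pool.submit(func, n) for n in islice(numbers, max_in_flight - len(pending))}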
BTW I noticed that adding pattern matching to a function definition slows down the program dramatically. Compare:
def get_hash(number):
    return md5(b"%b%d" % (SECRET, number)).hexdigest()

def until(condition, iterable):
    for i in iterable:
        if condition(i):
            return i

def criteria(hash) = hash.startswith("000000")

result = count() |> map$(get_hash) |> until$(criteria)
def get_hash(number):
    return (number, md5(b"%b%d" % (SECRET, number)).hexdigest())

def until(condition, iterable):
    for i in iterable:
        if condition(i):
            return i

def criteria((_, hash)) = hash.startswith("000000")

result = count() |> map$(get_hash) |> until$(criteria)
In the first case, get_hash returns just the hash, and criteria checks it. It completes in 7 seconds. In the second case, get_hash returns a tuple, and criteria unpacks it. It completes in 17 seconds. That's a huge performance regression.
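A rough timing sketch to separate the two effects might look like this (SECRET and N are made up; both runs use the tuple-returning get_hash and only criteria changes, so the cost of building the tuple is held constant and any gap is the pattern-matching overhead):

from hashlib import md5
from time import perf_counter

SECRET = b"secret"  # placeholder secret
N = 500000          # placeholder iteration count

def get_hash(number):
    return (number, md5(b"%b%d" % (SECRET, number)).hexdigest())

def criteria_plain(pair) = pair[1].startswith("000000")

def criteria_match((_, hash)) = hash.startswith("000000")

for label, criteria in [("plain", criteria_plain), ("pattern-match", criteria_match)]:
    start = perf_counter()
    for n in range(N):
        criteria(get_hash(n))
    print(label, perf_counter() - start)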
@moigagoo For your until function, the best idiom there would be
count() |> parallel_map$(expensive_calculation) |> dropwhile$(criteria_not_satisfied) |> ((it) -> it$[0])
or
def first_satisfying(condition, iterable) = dropwhile((not)..condition, iterable)$[0]
count() |> parallel_map$(expensive_calculation) |> first_satisfying$(criteria_satisfied)
However, I agree that there's a problem with this, namely that parallel_map never completes and thus never closes all of its processes. That's an oversight on my part, and I'll try to look into ways of fixing it.
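In the meantime, one possible workaround (just a sketch, assuming that fully consuming parallel_map over a finite input lets each call finish and clean up its workers) is to feed it slices of count() instead of the whole infinite iterator:

from itertools import count, islice

def chunked_search(expensive_calculation, criteria_satisfied, chunk_size=1024):
    numbers = count()
    while True:
        # each finite chunk is consumed completely, so each parallel_map call can
        # run to completion (at the cost of up to chunk_size - 1 extra
        # calculations after the first match in a chunk)
        chunk = list(islice(numbers, chunk_size))
        matches = chunk |> parallel_map$(expensive_calculation) |> filter$(criteria_satisfied) |> list
        if matches:
            return matches[0]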
For the pattern-matching function slowdown, I'm really not sure what's happening there. Using a pattern-matching function will add a bit of overhead, but it shouldn't be nearly that much. I have a couple of ideas of what might be causing this and I'm going to try to look into them and see what I can do.
Why does an import as statement in an exec not work in a unit test, when it works outside of a unit test?
def test_exec(self):
    expr = 'a = 1 + 1'
    code = convenience.parse(expr)
    print(code)
    eval(code)
    print(a)
raises NameError: name '_coconut_sys' is not defined
@Boscillator The problem is that the header needs to be executed at the global level. You can import the header at the top with from coconut.__coconut__ import * (or, equivalently, exec(parse(""))), and if you use "block" mode, it won't include any header, so you can then execute that in the context of a function. I was able to do:
>>> from coconut.__coconut__ import *
>>> from coconut.convenience import parse
>>> def test():
...     exec(parse("a = 1 + 1", "block"))
...     return a
...
>>> test()
2
I'll edit the documentation for coconut.convenience to reflect that you have to do something like this.
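For reference, a version of the failing test adapted along those lines might look like this (just a sketch; it uses an explicit namespace dict instead of return a so the result is easy to read back inside a function):

from coconut.__coconut__ import *   # executes the Coconut header at module level
from coconut.convenience import parse

def test_exec(self):
    ns = {}
    # "block" mode omits the header, so the compiled code can run in any scope
    exec(parse("a = 1 + 1", "block"), globals(), ns)
    assert ns["a"] == 2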
git checkout <filename> should do it
git revert the bad commit?
git revert if possible, otherwise just copy-and-paste.