    Jeremy Coyle
    @jeremyrcoyle
    this applies to things like stack, pipeline, and cv
    Nima Hejazi
    @nhejazi
    yeah, I agree that the crossproduct and magrittr-like operators fall under docs
    hmm, is the separate learners thing brought on by dask?
    or was this an outstanding issue already?
    Jeremy Coyle
    @jeremyrcoyle
    it's related to how to parallelize the computation
    this goes back to the issue with henrik's future package and nested futures
    it's an equivalent issue, delayed objects that generate other delayed objects
    Nima Hejazi
    @nhejazi
    right, right -- I see
    Jeremy Coyle
    @jeremyrcoyle
    you could also solve it by recognizing that you're already in a delayed object, pausing the current computation, and adding all the sub computations to the top of the task stack
    Nima Hejazi
    @nhejazi
    oh, well, that makes sense
    Jeremy Coyle
    @jeremyrcoyle
    that recognizing/pausing thing seems like it's pretty tricky to implement
    Nima Hejazi
    @nhejazi
    in terms of recognition, something (a new slot) could be written into the object (or an existing one used) to figure out whether it contains a type "delayed"
    I've no clue how to invoke pausing
    Jeremy Coyle
    @jeremyrcoyle
    let me clarify pausing a bit
    ideally all the workers are busy computing all the time. If one of the tasks being computed generates subtasks, there are no workers available to resolve them. Somehow we need to be able to identify workers that are technically computing a task but are in reality waiting for their subtasks to be computed, so that they can be used to compute subtasks
    incidentally, I'm starting to understand why henrik just decided not to support this kind of thing
    it's a real pain
    Nima Hejazi
    @nhejazi
    yea, this sounds like it's going to be a real pain in terms of communication
    a few initial thoughts:
    • is it possible to figure out whether a worker is waiting? as in, if a worker generates subtasks and isn't actually doing anything, how can we evaluate its current state
    Jeremy Coyle
    @jeremyrcoyle
    right now i'm relying on future to block me if all its workers are busy
    Nima Hejazi
    @nhejazi
    • if figuring out a resting state is possible then, presumably, a worker that enters a resting state could change some attribute of its own (e.g., flipping a slot from TRUE to FALSE)
    Jeremy Coyle
    @jeremyrcoyle
    if you set future workers to infinite it won't block
    then we could manually track workers that are working
    and they could signal when they make subtasks, if they can recognize that's what they're doing
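The manual-tracking scheme being discussed — a status flag that a worker flips when its task blocks on subtasks — might look roughly like this. The names (`WorkerPool`, `mark_waiting`, etc.) are invented for illustration; the real implementation would sit on top of R's `future` package rather than a hand-rolled pool.

```python
# Hedged sketch: each worker carries a status flag. When a running task
# spawns subtasks and blocks on them, the worker flips itself from
# "computing" to "waiting", so the scheduler can treat its slot as free
# for computing those very subtasks.
class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.status = "idle"  # idle | computing | waiting

class WorkerPool:
    def __init__(self, n):
        self.workers = [Worker(i) for i in range(n)]

    def acquire(self):
        # a "waiting" worker is only nominally busy, so it counts as free
        for w in self.workers:
            if w.status in ("idle", "waiting"):
                w.status = "computing"
                return w
        return None  # genuinely all busy

    def mark_waiting(self, worker):
        # called when the worker's task generates subtasks and blocks on them
        worker.status = "waiting"

pool = WorkerPool(2)
a = pool.acquire()
b = pool.acquire()
assert pool.acquire() is None  # both workers computing
pool.mark_waiting(a)           # a is now blocked on its own subtasks...
c = pool.acquire()             # ...so its slot can be handed to a subtask
```

This only covers the bookkeeping; actually re-entering a "waiting" worker's process to run a subtask there is the hard part that `future`'s blocking behavior sidesteps.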
    Nima Hejazi
    @nhejazi
    ok, cool -- so let's say we stop future from blocking us
    Jeremy Coyle
    @jeremyrcoyle
    I think it's definitely possible
    and it would be really slick if we can make it work
    Nima Hejazi
    @nhejazi
    yea, I like it
    just trying to think about logistics
    is there a maximum number of subtasks that a worker should ever create?
    Jeremy Coyle
    @jeremyrcoyle
    I don't think so
    Nima Hejazi
    @nhejazi
    maybe dependent on the type of learner it is or w/e?
    Jeremy Coyle
    @jeremyrcoyle
    guess it depends if you mean just subtasks or subtasks of subtasks etc
    Nima Hejazi
    @nhejazi
    hmm, what I was thinking was we could just count the number of subtasks spawned and access that from a slot
    Jeremy Coyle
    @jeremyrcoyle
    sure
    certainly I think in all cases it'll be easier to optimize for tasks that don't dynamically generate subtasks at compute time
    so maybe we should make the preferred programming pattern in delayed/sl3 to enumerate your subtasks before compute
    even if we do manage to add support for compute time subtasks (which I think we will still want to do eventually)
    Nima Hejazi
    @nhejazi
    that would make sense
    if we build the full task graph at the very beginning, then it should be easier to optimize resource allocation
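The "enumerate your subtasks before compute" pattern reduces scheduling to a topological sort over a known graph. A minimal Python sketch, with hypothetical task names standing in for how delayed/sl3 might represent a Super Learner fit (base learners first, metalearner last):

```python
# When the full task graph is known up front, there are no compute-time
# surprises: any topological order is a valid schedule, and independent
# tasks (here, the base learners) can be dispatched to workers in parallel.
from graphlib import TopologicalSorter

# task -> set of tasks it depends on (names are illustrative only)
graph = {
    "fit_lrnr_glm":    set(),
    "fit_lrnr_mean":   set(),
    "fit_metalearner": {"fit_lrnr_glm", "fit_lrnr_mean"},
}

order = list(TopologicalSorter(graph).static_order())
# base learners come first (in some order); the metalearner is last
assert order[-1] == "fit_metalearner"
```

This is also what makes the "pretty pictures" easy: the same static graph that drives the scheduler can be handed straight to a graph-drawing tool.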
    Jeremy Coyle
    @jeremyrcoyle
    yeah
    plus then we get to make pretty pretty pictures
    Nima Hejazi
    @nhejazi
    perfect! pictures always impress
    plus, studying the thing will keep the user busy while the job runs
    ...if for some reason they're just sitting there
    Jeremy Coyle
    @jeremyrcoyle
    don't tell me you've never stared at top
    Nima Hejazi
    @nhejazi
    but yea, I do agree that we should support compute-time tasks -- it's just that thinking about that might take a while
    I can't deny that
    Jeremy Coyle
    @jeremyrcoyle
    okay, i'm going to take a crack at the delayed by default approach today