These are chat archives for twitter/summingbird

10th
Feb 2017
Pankaj Gupta
@pankajroark
Feb 10 2017 00:10
@johnynek I haven’t tried it
P. Oscar Boykin
@johnynek
Feb 10 2017 00:12
it does not work @pankajroark
twitter/summingbird#713
I have a PR, but you will probably need to test it at Twitter
Pankaj Gupta
@pankajroark
Feb 10 2017 00:13
sure, we can try it out
P. Oscar Boykin
@johnynek
Feb 10 2017 00:40
@pankajroark twitter/summingbird#714
I found a fairly minimal fix. I can’t see how that can break you, but if you are abusing mutate in some ways, maybe.
that was added for dubious reasons, and I probably didn’t push back enough on it. There is no way to guess what mutate might be doing, and it forces the API to not change. Can you see where you might be using mutate passed to run and see if it is still needed? If now, I would love to remove it, and instead give people a path to just get an Execution and use FlowListeners which are already supported.
we should have only one way, ideally, to run scalding code.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:02
Sure, I will take a look tomorrow
I’m not that familiar with summingbird scalding platform, might take me a bit to check this. Planning to spend some time on it tomorrow.
P. Oscar Boykin
@johnynek
Feb 10 2017 01:06
okay. Please read the buildFlow method in scalding while you review
the problem is we added more and more scalding features into that method
but summingbird never calls it.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:07
I see, will surely go through that
P. Oscar Boykin
@johnynek
Feb 10 2017 01:07
it just builds the flow with it’s own call.
I had another more ambitious PR, but I realized I could do this.
ideally, we should not be using Flow at all, and we should move the code to using Execution.
but that will probably mean mutate won’t work.
which is why I was asking about that.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:08
So iiuc your concern is that users might have used mutate in unexpected ways?
P. Oscar Boykin
@johnynek
Feb 10 2017 01:12
not users, Twitter
Tianshuo added it for some reason.
i don’t quite recall. Maybe something to do with some earlier version of DAL? where it was going to log paths?
Pankaj Gupta
@pankajroark
Feb 10 2017 01:13
on a cursory look it seems that we’re doing something related to drscalding in mutate
P. Oscar Boykin
@johnynek
Feb 10 2017 01:13
in that case, it is not actually mutating, and a flow listener is fine.
are you mutating, or will a flow listener do?
Pankaj Gupta
@pankajroark
Feb 10 2017 01:13
  scaldPlatform
    .run(state, mode, toRun, {
      f: Flow[_] =>
        tryEnableDrScalding(f)
    })
Pankaj Gupta
@pankajroark
Feb 10 2017 01:14
just listening in this case: flow.enableDrScalding
P. Oscar Boykin
@johnynek
Feb 10 2017 01:14
what does tryEnableDrScalding do?
that is not a method on flow.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:14
actually not sure if that method mutates, let me check
P. Oscar Boykin
@johnynek
Feb 10 2017 01:14
enableDrScalding
Pankaj Gupta
@pankajroark
Feb 10 2017 01:15

implicit class DrScaldingFlow(val flow: Flow[_]) extends AnyVal {
/**

 * Enables Dr. Scalding for this flow.
 * Returns true if successfully enabled, false otherwise
 */
def enableDrScalding: Boolean = {
  defaultListenerOrLogFailure match {
    case l: DrScaldingCascadingListener =>
      flow.addListener(l)
      flow.addStepListener(l)
      true
    case _ => false
  }
}

}

adding listener
P. Oscar Boykin
@johnynek
Feb 10 2017 01:15
okay, so, that can be ported. There is a config API to add listeners
so, we could just use that to add the listener and totally remove the mutate method, right?
Pankaj Gupta
@pankajroark
Feb 10 2017 01:16
I just did a basic search, let me search a bit more. I can’t imagine we call run anywhere else but want to be extra sure.
P. Oscar Boykin
@johnynek
Feb 10 2017 01:17
okay.
how do you dr.scalding with Execution? you must already have code for that?
probably where the flow listeners were added to Config in the first place.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:22
afaict that’s the only usage of run
P. Oscar Boykin
@johnynek
Feb 10 2017 01:22
okay. So, moving this code to Execution still seems like a good move to me.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:22
I’m not sure how dr.scalding is done with Execution. Ben might know.
Trying to find his handle
P. Oscar Boykin
@johnynek
Feb 10 2017 01:23
yes, if you can ask Ben, then in a follow up PR, I can send the Execution fixes.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:23
@benpence ^^
yeah, let me check with him
P. Oscar Boykin
@johnynek
Feb 10 2017 01:24
ok. I have to run. Thanks for looking into this. Not having reducer estimation for these jobs is a pain for us.
we sometimes run HUGE batch ranges, but usually 1 batch at a time.
for that, reducer estimation is really nice.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:25
sure, I’m just checking with him, he does know how it works. I’m asking him to provide that info here.
P. Oscar Boykin
@johnynek
Feb 10 2017 01:25
ok
Ben Pence
@benpence
Feb 10 2017 01:26
@johnynek Yes, we serialize flowlisteners and flowsteplisteners on the Config and grab the FlowId, etc that way
P. Oscar Boykin
@johnynek
Feb 10 2017 01:26
okay, so it should be easy to use that same code in summingbird if we only use an execution API at the edge, right?
Ben Pence
@benpence
Feb 10 2017 01:27
Sounds like it
P. Oscar Boykin
@johnynek
Feb 10 2017 01:27
cool. Execution all the things.
since Job and Execution share the same flow building (ExecutionContext.buildFlow) we shouldn’t have this problem in the future if we really work hard to only use Execution as the API to use scalding from other code.
(but Execution came after summingbird, so we didn’t use it there)
Pankaj Gupta
@pankajroark
Feb 10 2017 01:29
tbh I’m still illiterate about what the best way to do mutations to config is via Execution, I’ll bug someone about it at a better time :)
P. Oscar Boykin
@johnynek
Feb 10 2017 01:29
.withConfig
is the only way to change config as you go.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:29
that just changes the config for one execution, how can that mutation be passed to next execution chained via flatMap?
P. Oscar Boykin
@johnynek
Feb 10 2017 01:30
you put the chain inside the withConfig:
Pankaj Gupta
@pankajroark
Feb 10 2017 01:30
so then at what point does the config get modified in the chain?
P. Oscar Boykin
@johnynek
Feb 10 2017 01:30
myExecution.flatMap { v => Execution.withConfig(someBigChain(v)) (_ + (“mykey” -> “value”))
it only applies to the thing passed in.
it is like a scope you control.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:31
so it’s not like state monad then, the mutation is never passed
P. Oscar Boykin
@johnynek
Feb 10 2017 01:31
right. It is not like the state monad.
(I mean, it is sort of like the state monad), but it is more about delimiting where you want config to be in play.
it is not modeling some mutable variable. It is modeling overlapping regions
Pankaj Gupta
@pankajroark
Feb 10 2017 01:33
got it, I guess we don’t have cases where this will be limiting
i.e. a chain of executions where config changes need to be passed along
P. Oscar Boykin
@johnynek
Feb 10 2017 01:33
I don’t think so. And in many ways it is clearer, because it does not mutate what comes after
Pankaj Gupta
@pankajroark
Feb 10 2017 01:33
cool
P. Oscar Boykin
@johnynek
Feb 10 2017 01:34
okay, so after we merge my current PR, I will send the next more aggressive one which removes mutate
I want to take baby steps in case we need to revert
the current one should be safe, but also fix the bug.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:34
makes sense, definitely prefer that
P. Oscar Boykin
@johnynek
Feb 10 2017 01:34
okay
gotta run.
Pankaj Gupta
@pankajroark
Feb 10 2017 01:37
so to understand the impact of the small PR correctly, we’ll need to move over the drscalding code to use Execution specific stuff right?
P. Oscar Boykin
@johnynek
Feb 10 2017 01:37
this current PR should just be purely better
the next PR means we will have to change your code to use the Execution approach for Dr. scalding
Pankaj Gupta
@pankajroark
Feb 10 2017 01:37
so no work on our end for this one?
P. Oscar Boykin
@johnynek
Feb 10 2017 01:37
no, should just work
Pankaj Gupta
@pankajroark
Feb 10 2017 01:37
cool, thanks
I will take a look at the PR a bit too when I get a chance.
P. Oscar Boykin
@johnynek
Feb 10 2017 01:38
thank you for looking right away to get to the bottom of an approach that improves matters here!
Pankaj Gupta
@pankajroark
Feb 10 2017 01:39
np, happy to help