These are chat archives for CommBank/maestro

4th Mar 2015
Vineeth Varghese
@vineethvarghese
Mar 04 2015 02:27
Can I merge CommBank/answer#2?
Stephan Hoermann
@stephanh
Mar 04 2015 02:29
No objections from me.
Stephan Hoermann
@stephanh
Mar 04 2015 03:26
@SamRoberts I don't think there is an empty for the Hive, DB or HDFS monad, which means we can't use MonadPlus.
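For context, the relevant Scalaz 7 hierarchy looks roughly like this (a simplified sketch, not the real definitions): MonadPlus sits on top of PlusEmpty, which is the part that needs an empty, while Plus alone does not.

```scala
// Simplified sketch of the relevant Scalaz 7 typeclasses (not the real definitions).
// MonadPlus extends ApplicativePlus, which extends PlusEmpty -- so it needs `empty`.
trait Plus[F[_]] {
  def plus[A](a: F[A], b: => F[A]): F[A]   // an associative "or" -- all Plus requires
}
trait PlusEmpty[F[_]] extends Plus[F] {
  def empty[A]: F[A]   // the identity element Hive/DB/HDFS have no sensible value for
}
```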
Stephan Hoermann
@stephanh
Mar 04 2015 03:34
I guess I could just dummy up a Result failure and use that.
Sam Roberts
@SamRoberts
Mar 04 2015 03:48
@stephanh yeah, that's why I didn't make a MonadPlus instance back when I was adding those guard, prevent, mandatory, etc. methods. You can still make a Plus instance, though. Also, seeing as we have filter in Execution using a dummy failure anyway, I wonder if we might as well just use a dummy failure in the Result-like monads too.
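A Plus instance along those lines might look like the following; a minimal sketch using a simplified stand-in for omnitool's Result, not the actual code:

```scala
import scalaz.Plus

// Minimal sketch with a simplified stand-in for omnitool's Result (the real type
// carries richer error information). Plus keeps the first success and only falls
// through to the alternative on failure, so no `empty` is needed.
object ResultPlusSketch {
  sealed trait Result[+A]
  final case class Ok[+A](value: A)         extends Result[A]
  final case class Failure[+A](msg: String) extends Result[A]

  implicit val resultPlus: Plus[Result] = new Plus[Result] {
    def plus[A](a: Result[A], b: => Result[A]): Result[A] = a match {
      case Ok(_)      => a
      case Failure(_) => b
    }
  }
}
```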
Stephan Hoermann
@stephanh
Mar 04 2015 03:50
I don't like the dummy failure approach for filter in Execution.
Sam Roberts
@SamRoberts
Mar 04 2015 03:51
you've got to think carefully about whether they will obey the identity laws too
so anyway, given the circumstance, I don't mind if we don't give an Empty instance
Stephan Hoermann
@stephanh
Mar 04 2015 03:53
What would it take for you to satisfy yourself that there isn't another typeclass we should use for Result or ResultMonadOps?
Sam Roberts
@SamRoberts
Mar 04 2015 03:56
Well, my unproductive grumbling is more based around the fact that the original Hdfs monad should have been designed so it could be an instance of these extra typeclasses. But I think you can make it an instance of Plus without any problems.
Luke Williams
@shmookey
Mar 04 2015 04:03
@stephanh guessing we want to avoid generating class files for every mapping in a transform? :P
Stephan Hoermann
@stephanh
Mar 04 2015 04:09
@shmookey that would be ideal.
@SamRoberts added Plus to result CommBank/omnitool#13
Sam Roberts
@SamRoberts
Mar 04 2015 04:11
cool :)
Stephan Hoermann
@stephanh
Mar 04 2015 04:13
Do you know what benefits Plus has apart from the fact that we can use it to say some type must have an instance of Plus?
Sam Roberts
@SamRoberts
Mar 04 2015 04:13
@shmookey what exactly do you mean? do you mean you can avoid generating a class for the custom mappings? or do you mean that the transform is currently generating a class for each field being copied over, and we can stop doing that?
(because I can't see how it is possible to avoid generating a class for custom mappings, which are arbitrary scala functions)
Luke Williams
@shmookey
Mar 04 2015 04:15
must they be? they were arbitrary scala functions for field accessors too, now they're not
i guess it's more likely in a transform that the mapping function will do some work
are many jobs using the transform macro?
anyway, i think i have the luxury of changing the api somewhat (right?) for this one, so there are lots of possibilities
Sam Roberts
@SamRoberts
Mar 04 2015 04:20
@stephanh Plus gives you a new obscure way of writing or :smile:
I guess, from my point of view, it's one step closer to belonging to an ecosystem of functions which operate on a wide variety of data types in standard ways
which improves code readability (in aggregate)
also, the plus laws give us an extra test for free
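Reusing the stand-in Result from the sketch above, the syntax and the "free test" look roughly like this:

```scala
import scalaz.syntax.plus._
import ResultPlusSketch._

object PlusUsageSketch {
  val parsed:   Result[Int] = Failure("bad input")
  val fallback: Result[Int] = Ok(42)

  // <+> is the standard Plus "or": keep the left value, fall back to the right on failure
  val value: Result[Int] = parsed <+> fallback   // Ok(42)

  // the extra test for free is the associativity law:
  // (a <+> b) <+> c must equal a <+> (b <+> c) for all a, b, c
}
```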
@shmookey right, but the field accessors always did the same thing, so you could encapsulate that behavior in the one class that was parametrized over different types and product indices
Stephan Hoermann
@stephanh
Mar 04 2015 04:24
Ok. I agree, just wanted to know if there were any other benefits that I wasn't aware of. <+> is so much better than |||. :smirk:
Sam Roberts
@SamRoberts
Mar 04 2015 04:24
whereas the custom mappings deliberately give the users the ability to run whatever scala code they like
Luke Williams
@shmookey
Mar 04 2015 04:24
yeah, just realised they come from the user :D
Sam Roberts
@SamRoberts
Mar 04 2015 04:24
right, when I see ||| I have no idea what it means, but <+> is so obvious
yeah. Plus by itself doesn't give us much, but it's also easy to add
and stops us from accidentally breaking those semantics and getting further away from the standard behaviour and apis
Luke Williams
@shmookey
Mar 04 2015 04:25
ok, well we can't stop users creating thousands of classes, but it would be nice if we could give them a way to avoid doing so if they've got a lot of mappings
Sam Roberts
@SamRoberts
Mar 04 2015 04:28
@shmookey hrmm ... I guess you could provide a bunch of standard Function instances that people might need (e.g. date time conversions), so that they tend to pass those functions in and re-use the same classes all the time
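As a hedged illustration of that idea (names invented, joda-time assumed on the classpath), a shared library of mapping functions means callers pass the same function values around instead of generating a new anonymous class per inline lambda:

```scala
import org.joda.time.format.DateTimeFormat

// Illustrative sketch only: a handful of shared A => B mappings users would pass
// by reference, so the same function classes get reused across transforms.
object StandardMappings {
  private val dashed  = DateTimeFormat.forPattern("yyyy-MM-dd")
  private val compact = DateTimeFormat.forPattern("yyyyMMdd")

  // e.g. "2015-03-04" -> "20150304"
  val dashedToCompactDate: String => String =
    s => compact.print(dashed.parseDateTime(s))

  val trimmed:    String => String = _.trim
  val upperCased: String => String = _.toUpperCase
}
```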
although I think that's more likely to be something that grows naturally over time
if you have any other ideas I would love to hear them!
I don't think our users will ever be manually doing hundreds of custom mappings, though
(well, not enough to cause problems, anyway)
Luke Williams
@shmookey
Mar 04 2015 04:33
in that case i mightn't bother, but i was thinking that the argument to the mkJoin macro could be a singleton object rather than a list of (key, A => B) transformation functions, and the methods on the object could be the manual transform rules. the keys could be the method names!
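To make that concrete, a purely hypothetical sketch of the end-user code (the struct names, the CustomerRules object, and the commented-out call are all invented for illustration; this is not maestro's current API):

```scala
// Hypothetical sketch only -- not maestro's actual mkTransform/mkJoin API.
// Case classes stand in for thrift structs; the macro would read the method
// name (joinedDate) as the target field name, so no string keys are needed.
case class RawCustomer(id: String, joined: String)
case class Customer(id: String, joinedDate: String)

object CustomerRules {
  // one method per manual rule: name = target field, body = mapping
  def joinedDate(raw: RawCustomer): String = raw.joined.replace("-", "")
}

// hypothetical call site: fields without a rule (id) would be copied automatically
// val transform = mkTransform[RawCustomer, Customer](CustomerRules)
```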
Luke Williams
@shmookey
Mar 04 2015 04:40
hmm, if we go that way we don't have to break the existing API
Sam Roberts
@SamRoberts
Mar 04 2015 04:42
ha, that's a clever idea!
Doesn't let you abstract over the list of Fields, though, which users can technically do at the moment. But still a nice idea ...
Luke Williams
@shmookey
Mar 04 2015 04:49
they can write their own macro if they want to do that :P
i mean, i assume we'd eliminate string representations of field names entirely if it were practical to do so?
Luke Williams
@shmookey
Mar 04 2015 04:55
maybe i'm missing your point :)
what do you mean by abstracting over the list of Fields?
Sam Roberts
@SamRoberts
Mar 04 2015 04:57
hrmm. So, I would do away with the stringly based implementation of Fields, if I could. But I wouldn't do away with the concept that you have a first class thing that represents a field in a datastructure, and you can pass it around, and create lists of fields, and filter them by their various properties, and so forth
because when you are dealing with data structures with 100s of columns in them, you need some way of being able to abstract over those columns
So we have this dual notion of the thrift structures we pass around. On the one hand, we have this notion of them as a class with a specific (statically known) set of fields, and we get a lot of type safety from this.
Luke Williams
@shmookey
Mar 04 2015 05:00
agreed, not suggesting we do away with Fields, just the manual mapping functions, which would become methods on a "manual transformation" object
Sam Roberts
@SamRoberts
Mar 04 2015 05:01
On the other hand we have this notion of them as a bag of fields, which allows us to write programs which abstract over those fields
Right, but by tying the fields to the method names on the object, you make it impossible to write scala code that manipulates the fields without resorting to macros
the set of methods on an object are not a first class thing that you can manipulate
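The "first class" point in concrete terms (a simplified Field type for illustration, not maestro's actual Field): field values can be put in lists, filtered, and mapped over in ordinary Scala, whereas an object's methods cannot.

```scala
// Simplified illustration of fields as first-class values (not maestro's real Field).
object FieldsSketch {
  final case class Field[A, B](name: String, get: A => B)

  final case class Customer(id: String, name: String, balance: Double)

  val id      = Field[Customer, String]("id",      _.id)
  val name    = Field[Customer, String]("name",    _.name)
  val balance = Field[Customer, Double]("balance", _.balance)

  // ordinary list manipulation over fields -- no macros required
  val stringFields: List[Field[Customer, String]] = List(id, name)
  val asRow: Customer => List[String] = c => stringFields.map(_.get(c))
}
```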
Luke Williams
@shmookey
Mar 04 2015 05:11
they are in macros :)
Sam Roberts
@SamRoberts
Mar 04 2015 05:12
right, but we're not going to get our users to do that!
Luke Williams
@shmookey
Mar 04 2015 05:13
we run our own macro over it
Sam Roberts
@SamRoberts
Mar 04 2015 05:18
aren't we then supplying our own macro version of every function we want our users to be able to use on Fields?
Luke Williams
@shmookey
Mar 04 2015 05:19
apologies if i'm not being very clear, my scala lingo is not all there yet :)
Sam Roberts
@SamRoberts
Mar 04 2015 05:21
no worries :)
Luke Williams
@shmookey
Mar 04 2015 05:31
still trying to establish if scala can do this, but i'm imagining we could provide a Transformation class/trait macro (type provider?) which, when derived/implemented by a user class, modifies its AST to be equivalent to the objects already returned by mkJoin/mkTransform
(which will probably mean taking in method names and spitting out strings into the generated code)
kind of like python metaclasses, if you've done any of that nonsense :P
Sam Roberts
@SamRoberts
Mar 04 2015 05:36
what are you trying to accomplish?
Luke Williams
@shmookey
Mar 04 2015 05:37
avoiding a new class file generated per mapping rule
also, preserving the existing API, removing an instance of key names as strings and (maybe) making transformations more intuitive :P
Sam Roberts
@SamRoberts
Mar 04 2015 05:42
perhaps you could mock up an example of what the end-user code would look like and share it with us?
Luke Williams
@shmookey
Mar 04 2015 05:42
will do :)
Sam Roberts
@SamRoberts
Mar 04 2015 05:42
awesome :)
Rowan Davies
@rowandavies
Mar 04 2015 05:47
@SamRoberts I’ve fixed the example in the README PR #282, but Execution.from looked a bit more complicated, so instead I’ve moved the if to the end, and have it compare the actual count in loadInfo with the count returned by viewHive. I’ve also rebased and squashed into a single commit that includes adding CustomerJob.scala (linked to from the README) and CustomerJobSpec.scala. Maybe have a quick look when you get a chance, then I’ll push the commit to master.
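For readers following along, the shape being described is roughly the following (a hedged sketch with hypothetical step names, not the actual CustomerJob code):

```scala
import com.twitter.scalding.Execution

// Hedged sketch of the shape described above, with hypothetical load/view steps:
// run both, then do the check at the end by comparing the count recorded in
// loadInfo with the count returned by viewHive.
object CustomerJobSketch {
  def checkedJob(loadCount: Execution[Long], viewCount: Execution[Long]): Execution[Unit] =
    for {
      loaded <- loadCount   // row count reported by the load step (loadInfo)
      viewed <- viewCount   // row count returned by viewHive
      _      <- Execution.from {
                  if (loaded != viewed)
                    sys.error(s"row count mismatch: loaded $loaded, viewed $viewed")
                }
    } yield ()
}
```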
Sam Roberts
@SamRoberts
Mar 04 2015 05:50
@rowandavies looks good
Luke Williams
@shmookey
Mar 04 2015 05:53
ah, no type macros yet, even in the latest scala/paradise
annotations will be a tolerable substitute
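A sketch of what the macro-annotation route might look like (the annotation name and implementation are hypothetical, and it requires the macro paradise compiler plugin):

```scala
import scala.annotation.StaticAnnotation
import scala.language.experimental.macros
import scala.reflect.macros.whitebox

// Hypothetical sketch of the annotation-based substitute for type macros: the
// macro receives the annotated object's AST and could rewrite it into the shape
// mkTransform/mkJoin currently return. This placeholder just passes the tree through.
class transformation extends StaticAnnotation {
  // macro paradise requires this exact method name and shape
  def macroTransform(annottees: Any*): Any = macro TransformationMacro.impl
}

object TransformationMacro {
  def impl(c: whitebox.Context)(annottees: c.Expr[Any]*): c.Expr[Any] = {
    import c.universe._
    // a real implementation would inspect the object's methods here and generate
    // the corresponding transform; for now, return the annotated tree unchanged
    c.Expr[Any](q"..${annottees.map(_.tree)}")
  }
}
```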
Rowan Davies
@rowandavies
Mar 04 2015 05:59
@SamRoberts Thanks! I just pushed.