These are chat archives for juttle/juttle

29th
Dec 2015
Daria Mehra
@dmehra
Dec 29 2015 18:42
let’s get the conversation going here. i have a question in response to review comment https://github.com/juttle/juttle/pull/15#discussion_r48519584
our future tutorial shows this program:
read file -file '/tmp/github_data_from_es.json'
| reduce total = count(),
  actors = count_unique('actor_login'),
  repos = count_unique('repo_name')
then i make this lame segue into adding put avg_actors = actors / repos:
Daria Mehra
@dmehra
Dec 29 2015 18:48
uhm gitter issues...
Let's say we wanted to know the number of actors participating in an average GitHub repository. The program above gives us the counts of actors and repos, so finding the average is simple math. However, it will not work to add "avg_actors = actors / repos" to the reduce expression, since this computation is not a reducer; and in general, this is not the logic you seek.
that ^ segue. dave said it was opaque and he was right.
my real problem was that a number of ways to add avg_actors = actors / repos don’t work. the one that does work as part of the same reduce is this bit of complexity:
read file -file '/tmp/github_data_from_es.json'
| reduce total = count(),
actors = count_unique('actor_login'),
repos = count_unique('repo_name'),
avg_actors = *'actors' / *'repos'
output
┌──────────┬───────────────────────────┬──────────┬──────────┐
│ actors   │ avg_actors                │ repos    │ total    │
├──────────┼───────────────────────────┼──────────┼──────────┤
│ 753      │ 18.365853658536587        │ 41       │ 1597     │
└──────────┴───────────────────────────┴──────────┴──────────┘
but if i try doing the exact same thing in a separate follow-on reduce, it doesn’t work:
read file -file '/tmp/github_data_from_es.json'
| reduce total = count(),
actors = count_unique('actor_login'),
repos = count_unique('repo_name')
| reduce avg_actors = *'actors' / *'repos'
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 18:51
the math you're doing in the second reduce isn't a "reduction"
Daria Mehra
@dmehra
Dec 29 2015 18:51
[RuntimeError: Error: Invalid operand types for "/": null and null.]
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 18:51
it's simply an annotation, or put in our language
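For reference, a minimal sketch of the put form mentioned above: run the reduce first, then compute the ratio on the single point that reduce emits. It assumes put can reference the fields of its incoming point by bare name; the file path is the one from the tutorial program, and no output is shown because the sketch was not run here.

```
// Assumption: put sees the point emitted by reduce, so actors and repos are in scope.
read file -file '/tmp/github_data_from_es.json'
| reduce total = count(),
  actors = count_unique('actor_login'),
  repos = count_unique('repo_name')
| put avg_actors = actors / repos
```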
Daria Mehra
@dmehra
Dec 29 2015 18:51
i know… that’s why i ended up making that statement in the tutorial about “this computation is not a reducer” and got nicked for opaqueness
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 18:52
a = b / c isn't a reduction, it's a computation, so I'm not sure what @dave (who's not on gitter yet) means by opaqueness
Daria Mehra
@dmehra
Dec 29 2015 18:52
double checked our docs, they are opaque too, saying reduce takes fieldname=expr
An assignment expression, where expr can be a reducer.
and “can be”… something else? because normal assignments that work in put sure don’t work in the reduce above.
i think dave’s right that for a new user, my statement doesn’t elucidate much.
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 18:53
yeah can be seems incorrect since reduce only accepts fieldname=reducer expression
Daria Mehra
@dmehra
Dec 29 2015 18:53
but what about my code above that actually worked? the dereferencing one?
@go-oleg there you are!
we are having an exciting discussion about what expressions are accepted by reduce.
so, if reduce only accepts reducers, and not assignment expressions, then why does this code work correctly?
```
read file -file '/tmp/github_data_from_es.json'
| reduce total = count(),
actors = count_unique('actor_login'),
repos = count_unique('repo_name'),
avg_actors = *'actors' / *'repos'
```
(when i was writing the tutorial, i didn’t know that worked. probably for the best. it’s not something i’d want to show a new user)
Daria Mehra
@dmehra
Dec 29 2015 19:01
oh and it wouldn’t be right to say that reduce only takes reducers, it is legal to do straight up value assignment like this:
juttle> emit -limit 3 | reduce c = count(), d = "duh"
┌──────────┬──────────┐
│ c        │ d        │
├──────────┼──────────┤
│ 3        │ duh      │
└──────────┴──────────┘
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 19:08
I think the fact you can't do avg_actors = actors/repos is a bug because there's really no reason to have to put the *'xxx' in there other than probably some missing grammar change to support the name of the field.
Oleg Seletsky
@go-oleg
Dec 29 2015 19:09
lets see if i can come up with wording that explains it: you cannot dereference point fields in an expression used in reduce
Daria Mehra
@dmehra
Dec 29 2015 19:13
“point fields” being fields in the data point going through the reduce?
Oleg Seletsky
@go-oleg
Dec 29 2015 19:14
yes, the data points going through the reduce (as opposed to the point emitted by the reduce)
Daria Mehra
@dmehra
Dec 29 2015 19:14
ah ok so that’s why my dereferencing only worked in the context of single reduce.
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 19:15
so how come this doesn't work
rlgomes@x230> ./bin/juttle -e 'emit -limit 3 -every :1ms: | reduce value=count(), avg(value), double=*'value'*2'          
<input>:1:72: 
   1:emit -limit 3 -every :1ms: | reduce value=count(), avg(value), double=*value*2
                                                                            ^^^^^
Error: value is not defined (RT-UNDEFINED)
Daria Mehra
@dmehra
Dec 29 2015 19:15
so the fields of the point going through the reduce are not in scope for dereferencing. (why, though?)
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 19:15
oops typo in the quotes
that does work fine
rlgomes@x230> ./bin/juttle -e "emit -limit 3 -every :1ms: | reduce value=count(), avg(value), double=*'value'*2"
┌──────────┬──────────┬──────────┐
│ value    │ avg      │ double   │
├──────────┼──────────┼──────────┤
│ 3        │          │ 6        │
└──────────┴──────────┴──────────┘
Daria Mehra
@dmehra
Dec 29 2015 19:16
@rlgomes i think you got two value there
the original point going through the reduce has value and the point being generated by reduce also has value
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 19:17
there's no value going through the reduce ... emit only creates points with time field
Daria Mehra
@dmehra
Dec 29 2015 19:17
ah
Oleg Seletsky
@go-oleg
Dec 29 2015 19:17
@dmehra because there are multiple points going through and only one is emitted so it would be ambiguous as to what the expression is supposed to be doing. inside the reducer is where point fields are in scope for dereferencing.
Daria Mehra
@dmehra
Dec 29 2015 19:17
in that case avg(*'value')
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 19:18
the initial thing I wrote just had a mess of single quote usage...
Daria Mehra
@dmehra
Dec 29 2015 19:18
right i get it now @go-oleg
maybe even well enough to document :)
and @rlgomes you can’t do avg(value) at all since the incoming points have no value.
that’s why you got a blank for avg. the double=… code works because it’s looking at the value field of the newly generated point.
Rodney Lopes Gomes
@rlgomes
Dec 29 2015 19:20
yup I know I was just trying to see if it would complain the same way things do when you use the doubleValue=value*2 where we complain about value not defined (which I still feel is odd)
Daria Mehra
@dmehra
Dec 29 2015 19:21
yeah UNDEFINED is not a greatly useful error in this case. it is correct… it looks for a const named value, doesn’t find it, complains about undefined.
as you could’ve done this:
juttle> const constvalue = 42; emit -limit 3 -every :1ms: | reduce stuff=constvalue*2
┌──────────┐
│ stuff    │
├──────────┤
│ 84       │
└──────────┘
Daria Mehra
@dmehra
Dec 29 2015 19:27
that isn’t a very useful usage of reduce, of course. it’s only there for “naming”, like this
juttle> const tag = "Data123"; emit -limit 3 -every :1ms: | reduce count(), label = tag
┌──────────┬───────────┐
│ count    │ label     │
├──────────┼───────────┤
│ 3        │ Data123   │
└──────────┴───────────┘
and i do believe i just came up with the gnarliest bit of syntax, rivaling the worst of perl…
juttle> const tag = "MyData"; emit -limit 3 -every :1ms: | reduce c=count(), label = "${tag}${*'c'}"
┌──────────┬───────────┐
│ c        │ label     │
├──────────┼───────────┤
│ 3        │ MyData3   │
└──────────┴───────────┘
don’t do that unless you really need to.
(i’m a bug magnet, too. sitting here watching the above code randomly change in front of my eyes, because i edited it twice, and i think gitter has some caching / refresh delay issue)