These are chat archives for linkedin/pinot

17th
Mar 2016
Jean-François Im
@jfim
Mar 17 2016 19:57
@mardambey You can do daily pushes
Basically what happens is that you generate segments for that day's data
Each segment is a set of rows and associated indices
From what you're asking, then no, there's no global index
We just iterate over all the segments and process them (in parallel)
We can do early exit though, based on the dictionary
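To illustrate the idea above, here is a minimal sketch (not Pinot's actual code) of querying without a global index: every segment is visited in parallel, and a segment is skipped early when its dictionary shows the value cannot be present. The `Segment` class, `count_in_segment`, and `count_matching` are hypothetical names made up for this example, and a Python set stands in for the per-column dictionary.

```python
from concurrent.futures import ThreadPoolExecutor


class Segment:
    """Hypothetical stand-in for a segment: a set of rows plus per-column dictionaries."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        # Per-column dictionary: the distinct values present in this segment.
        self.dictionaries = {}
        for row in rows:
            for column, value in row.items():
                self.dictionaries.setdefault(column, set()).add(value)


def count_in_segment(segment, column, value):
    # Early exit: if the value is not in the dictionary, no row in this segment can match.
    if value not in segment.dictionaries.get(column, set()):
        return 0
    return sum(1 for row in segment.rows if row.get(column) == value)


def count_matching(segments, column, value):
    # No global index: each segment is processed (in parallel) and partial counts are summed.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda s: count_in_segment(s, column, value), segments))


# One segment per daily push (dates and data are invented for the example).
segments = [
    Segment("2016-03-16", [{"country": "US"}, {"country": "CA"}]),
    Segment("2016-03-17", [{"country": "US"}, {"country": "US"}]),
]
print(count_matching(segments, "country", "CA"))  # prints 1
```

In this sketch the second segment never scans its rows for `"CA"`, because its dictionary for `country` only contains `"US"`.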
@daifish You need to get the data into Hadoop through some other system
Jean-François Im
@jfim
Mar 17 2016 20:03
@daifish At LinkedIn, we use Gobblin (https://github.com/linkedin/gobblin) for data ingestion into Hadoop. There's a diagram on slide 8 of the slide deck that explains how it works