These are chat archives for linkedin/pinot

20th
Jan 2016
Jean-François Im
@jfim
Jan 20 2016 00:26
@daifish None of the Pinot devs speak Chinese (although @fx19880617 does), so we wouldn't be very helpful :P
DaiYu
@daifish
Jan 20 2016 00:41
@jfim OK, I see he comes from China, so I wanted to ask him some questions. By the way, could you give me some suggestions on how I can tell whether Pinot has been deployed successfully on three machines? I have been trying for several days, and I don't know whether I deployed it successfully.
Jean-François Im
@jfim
Jan 20 2016 00:41
Sure
Are the three machines for test purposes or production?
DaiYu
@daifish
Jan 20 2016 00:42
yes
Jean-François Im
@jfim
Jan 20 2016 00:43
If it's for testing, you can deploy the controller on one machine, the broker on another and the server on the last machine
If you only have three servers to work with in a production environment, I would deploy all three components (broker, server and controller) on all three
so that if any server fails, Pinot still works
DaiYu
@daifish
Jan 20 2016 00:45
Do you mean that on each machine I should run the startServer/Broker/Controller commands?
Even though there are two slaves?
Jean-François Im
@jfim
Jan 20 2016 00:49
yes
Pinot is not master-slave
DaiYu
@daifish
Jan 20 2016 00:51
OK, I will try it later. By the way, I keep wondering how I can prove that I have deployed it successfully. Is there any evidence I can check?
Jean-François Im
@jfim
Jan 20 2016 00:52
Well, you can check that your deployment works by sending queries to it
DaiYu
@daifish
Jan 20 2016 00:54
Sending queries? You mean that I run one controller, one server and one broker on each machine and then send queries? If I can check the results, does that prove the deployment succeeded?
@jfim Anyway, thanks very much for your help. I will try the setup you described; I have been puzzled for almost a week :) Thanks again for your time
Jean-François Im
@jfim
Jan 20 2016 00:58
If you configure it so that each machine has one controller, one server and one broker and they're all connected to the same zookeeper, you can query any one of the machines and you will get the same query results
so this way, if any machine fails, everything still works
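A minimal sketch of that check, assuming each broker accepts a JSON body of the form {"pql": ...} on an HTTP /query endpoint at port 8099. The host names, the endpoint path and the timeUsedMs field name are assumptions, not something confirmed in this chat, so adjust them to your deployment:

```python
import json
import urllib.request

# Hypothetical broker addresses; replace with your own machines.
BROKERS = ["machine1:8099", "machine2:8099", "machine3:8099"]


def post_pql(broker, pql, timeout=10):
    """POST a query to one broker and return the decoded JSON response.

    Assumes the broker accepts {"pql": ...} on /query (an assumption).
    """
    request = urllib.request.Request(
        "http://%s/query" % broker,
        data=json.dumps({"pql": pql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def strip_fields(result, ignore=("timeUsedMs",)):
    """Drop fields that may differ between runs (the field name is assumed)."""
    return {key: value for key, value in result.items() if key not in ignore}


def brokers_agree(brokers, pql, fetch=post_pql):
    """Return True if every broker returns the same result for the query."""
    results = [strip_fields(fetch(broker, pql)) for broker in brokers]
    return all(result == results[0] for result in results)


# Example call against a running cluster:
# print(brokers_agree(BROKERS, "select count(*) from flight"))
```

If the call returns True for a few different queries, every broker is serving the same data. Stopping one machine and running it again with the remaining two addresses is a simple way to check the failover behaviour described above.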
DaiYu
@daifish
Jan 20 2016 00:59
@jfim ok, i will try that :)
Jean-François Im
@jfim
Jan 20 2016 00:59
Sure, ping me if you need help :)
Sorry, the documentation is a bit lacking
Jean-François Im
@jfim
Jan 20 2016 01:04
Let me add a short link to an introductory presentation on Pinot
DaiYu
@daifish
Jan 20 2016 01:07
Many thanks, so kind of you @jfim
DaiYu
@daifish
Jan 20 2016 03:55
Hi @jfim, I have deployed the three machines with one controller, one server and one broker, and it looks perfect. With the query: select sum(Delayed) from flight group by Year, Month, Dest, it only takes 600ms, and the numTotal is 1 million. Then I deployed all three components (broker, server and controller) on all three machines, and the same query looks slow, almost 2000ms. i wonder when the
I also wonder about operations like addSchema and addTable: should I run them on all three machines, or just on one?
Jean-François Im
@jfim
Jan 20 2016 04:30
@daifish How much memory do you have on your machines?
DaiYu
@daifish
Jan 20 2016 05:08
About 2 GB each; I'm using VMs @jfim
Jean-François Im
@jfim
Jan 20 2016 05:18
Ah okay, then that's probably why it's slower if you put all three in the same node
in that case, it's better to have only one service per machine
DaiYu
@daifish
Jan 20 2016 05:20
OK, for now I'm just testing on VMs; I will deploy on Amazon later. In addition, should the cluster keep at least two machines alive?
DaiYu
@daifish
Jan 20 2016 05:26
Also, when using the cluster, I only run queries with bin/pinot-admin.sh PostQuery -query "select count(*) from flight", which uses the default -brokerHost and -brokerPort.
The defaults are -brokerHost localhost -brokerPort 8099. If I deploy three brokers, one on each of the three machines, should I specify -brokerHost and -brokerPort when I run the query?
Jean-François Im
@jfim
Jan 20 2016 05:31
Yes, you'll need to specify it in that case
You can query any broker if they're part of the same cluster
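As a rough sketch, the PostQuery command mentioned above can be built for each broker with the -brokerHost and -brokerPort flags set explicitly. The machine names below are hypothetical, and the script path assumes you run from the Pinot distribution directory:

```python
import subprocess

# Hypothetical host names; replace with the machines running your brokers.
BROKER_HOSTS = ["machine1", "machine2", "machine3"]


def post_query_command(broker_host, query, broker_port=8099,
                       admin_script="bin/pinot-admin.sh"):
    """Build the PostQuery command line for one specific broker."""
    return [
        admin_script, "PostQuery",
        "-brokerHost", broker_host,
        "-brokerPort", str(broker_port),
        "-query", query,
    ]


def run_on_each_broker(hosts, query, runner=subprocess.run):
    """Send the same query through every broker, one after the other."""
    for host in hosts:
        runner(post_query_command(host, query), check=True)


# Example call against a running cluster:
# run_on_each_broker(BROKER_HOSTS, "select count(*) from flight")
```

Passing the arguments as a list avoids shell quoting problems with the query string.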
DaiYu
@daifish
Jan 20 2016 05:32
Will the speed be any different?
I mean the query speed
Jean-François Im
@jfim
Jan 20 2016 05:40
No, the query speed should be the same
DaiYu
@daifish
Jan 20 2016 05:42
OK, I see. One last thing: I think I will have almost one hundred million records. If I use three machines, could you give me some suggestions about the machine configuration and the number of servers and brokers?
Sorry, I mean one hundred million records per month
Jean-François Im
@jfim
Jan 20 2016 05:56
It really depends on the number of queries you're getting
how many queries per second, roughly, are you planning on handling?
DaiYu
@daifish
Jan 20 2016 06:05
My team lead says it will be on the order of one hundred.
Currently we use it for statistics, but in the future we will use it for some realtime applications.
Jean-François Im
@jfim
Jan 20 2016 06:06
one hundred queries per second?
DaiYu
@daifish
Jan 20 2016 06:07
yeah
if i use three machines
Jean-François Im
@jfim
Jan 20 2016 06:09
you'll probably need more than three vms :)
DaiYu
@daifish
Jan 20 2016 06:10
Ah okay, I mean the machines are not VMs; they are real machines, like Amazon :)
Jean-François Im
@jfim
Jan 20 2016 18:25
It really depends on the data and the queries you're sending, there's no easy answer
The best way is to deploy it on a couple of servers and benchmark it
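One possible way to run such a benchmark is sketched below. It times repeated calls to a run_query callable that you supply (a hypothetical placeholder for whatever sends one query to your broker) and reports simple latency statistics. It issues queries one at a time, so it measures latency rather than sustained queries per second:

```python
import statistics
import time


def benchmark(run_query, repetitions=50, clock=time.perf_counter):
    """Time repeated calls to run_query and summarize latencies in milliseconds."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    latencies = []
    for _ in range(repetitions):
        start = clock()
        run_query()
        latencies.append((clock() - start) * 1000.0)
    latencies.sort()
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (95 * len(latencies) + 99) // 100 - 1
    return {
        "runs": len(latencies),
        "mean_ms": statistics.mean(latencies),
        "median_ms": statistics.median(latencies),
        "p95_ms": latencies[p95_index],
        "max_ms": latencies[-1],
    }


# Example call, where send_flight_query is your own function that sends
# one query to a broker:
# print(benchmark(send_flight_query, repetitions=100))
```

Running it with your real queries against a small deployment, then again after adding machines, gives a concrete basis for sizing instead of a guess.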