These are chat archives for getredash/redash

3rd
Jun 2015
DBY
@Dnile
Jun 03 2015 07:04
hey @arikfr
we're experiencing something very weird with the celery workers
they refuse to take work, even after cleaning redis and restarting them
any idea where to poke?
Arik Fraimovich
@arikfr
Jun 03 2015 07:05
@Dnile that's weird. I witnessed issues with celery workers stopping working, and might have found a workaround. But I've never seen something like that. Are you sure they use the correct redis settings?
if you check the logs it will show the settings it uses
although you might need to increase the logging level
this is done by editing /opt/redash/.env
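For reference, the change being described might look like this — a sketch only; the `REDASH_LOG_LEVEL` variable name is an assumption based on re:dash reading its log level from the environment:

```shell
# /opt/redash/.env -- sketch; the exact variable name is an assumption,
# based on re:dash reading its log level from the environment
export REDASH_LOG_LEVEL="DEBUG"   # was "INFO"
```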
DBY
@Dnile
Jun 03 2015 07:06
checking
change INFO to DEBUG?
Arik Fraimovich
@arikfr
Jun 03 2015 07:07
yep
DBY
@Dnile
Jun 03 2015 07:08
cool
Arik Fraimovich
@arikfr
Jun 03 2015 07:10
btw, to prevent other issues with celery add --maxtasksperchild=10 -Ofair to the celery start command
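A sketch of what the full start command could look like with those flags added; the path, app name, and queue list are assumptions based on the stock re:dash supervisor setup:

```shell
# --maxtasksperchild=10 recycles each worker process after 10 tasks
# (guards against slow memory leaks); -Ofair prevents a long-running
# query from starving tasks prefetched by the same worker.
/opt/redash/current/bin/run celery worker --app=redash.worker --beat \
  -Qqueries,celery,scheduled_queries --maxtasksperchild=10 -Ofair
```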
DBY
@Dnile
Jun 03 2015 07:10
celery log looks the same
api_error too
Arik Fraimovich
@arikfr
Jun 03 2015 07:11
and adjust the number of workers while you're at it
(if you haven't already)
DBY
@Dnile
Jun 03 2015 07:11
cool
how many is a fair number of workers?
(we don't have that configured, i see now)
Arik Fraimovich
@arikfr
Jun 03 2015 07:16
what kind of instance is this?
DBY
@Dnile
Jun 03 2015 07:16
m2.xlarge
Arik Fraimovich
@arikfr
Jun 03 2015 07:17
dedicated for re:dash?!
DBY
@Dnile
Jun 03 2015 07:17
indeed.
Arik Fraimovich
@arikfr
Jun 03 2015 07:17
wow, that's huge.
DBY
@Dnile
Jun 03 2015 07:17
blame @srockets
Arik Fraimovich
@arikfr
Jun 03 2015 07:17
we use m3.medium or m3.large I don't remember
it's funny that you have this beast and you use only 2 workers :) (by default celery will start 1 worker per core)
DBY
@Dnile
Jun 03 2015 07:18
i love that you call it a beast
Arik Fraimovich
@arikfr
Jun 03 2015 07:18
you know, everything is relative
DBY
@Dnile
Jun 03 2015 07:18
we have some i24xlrg ones too :P
Arik Fraimovich
@arikfr
Jun 03 2015 07:19
anyway, because the work is IO bound, the real barrier for # of workers is memory. and if you use it with Redshift, the # of slots you have allocated
(when you use it only w/ redshift, I mean)
so 8 is a safe number. but obviously it can go much higher
-c8
DBY
@Dnile
Jun 03 2015 07:20
i'll go with 8 then
thanks
Arik Fraimovich
@arikfr
Jun 03 2015 07:21
I think we have 15 but split across different queues (so mysql queries won't get stuck behind the redshift ones)
DBY
@Dnile
Jun 03 2015 07:21
that sounds like a smart thing to do
we only use redshift though
mysqls are not queried by analytics
Arik Fraimovich
@arikfr
Jun 03 2015 07:23
redash doesn't have to be used only by analytics ;)
but ok
DBY
@Dnile
Jun 03 2015 07:23
true
product can use it too :P
btw
is elasticsearch pr happening?
really looking forward to it
maybe i can get rid of kibana...
Arik Fraimovich
@arikfr
Jun 03 2015 07:25
hmm.. I know that @erans started working on it. not sure what's up with it though.
DBY
@Dnile
Jun 03 2015 07:25
interesting
i'll ask him
Arik Fraimovich
@arikfr
Jun 03 2015 07:25
there are some things in Kibana that are easier than in re:dash (like changing time frames...)
DBY
@Dnile
Jun 03 2015 07:25
true
so it looks like a lot of the queries are shown in api_error.log
with a [DEBUG][peewee] prefix on them
Arik Fraimovich
@arikfr
Jun 03 2015 07:28
yeah, peewee is very chatty...
(in debug)
did celery start taking work?
DBY
@Dnile
Jun 03 2015 07:29
i think it picked up only some of the work
Arik Fraimovich
@arikfr
Jun 03 2015 07:29
and you have 8 workers already?
maybe there is some heavy query that's blocking the others?
DBY
@Dnile
Jun 03 2015 07:29
yea
no wait
thats what i thought
kicking another query to verify
Query in queue… 08:16:02
cant see it running on the cluster
Arik Fraimovich
@arikfr
Jun 03 2015 07:36
try running a completely new query, something like: "select 83484398934"
is it stuck in queue as well?
DBY
@Dnile
Jun 03 2015 07:36
checking
yea :\
Arik Fraimovich
@arikfr
Jun 03 2015 07:37
but did it start counting the queue from 00:00:00?
DBY
@Dnile
Jun 03 2015 07:38
yes
perhaps i should try cleaning the queue and restart?
Arik Fraimovich
@arikfr
Jun 03 2015 07:40
yes, stop celery (and make sure it really stopped) & then flush the redis db -> start celery. see if it still happens
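That sequence, sketched as shell commands — the supervisor program name and redis db number are assumptions, adjust to your setup:

```shell
# stop celery via supervisor, then verify nothing survived the stop
sudo supervisorctl stop redash_celery
pgrep -f "celery worker" && pkill -f "celery worker"

# flush only the redis db celery uses (db 0 here is an assumption)
redis-cli -n 0 flushdb

# bring the workers back up
sudo supervisorctl start redash_celery
```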
DBY
@Dnile
Jun 03 2015 07:40
on it
well, there are a lot of celery processes after the supervisor stop
so i'll kill all of them first
cool
DBY
@Dnile
Jun 03 2015 07:46
done that, that query ran immediately
the other one - not so much
DBY
@Dnile
Jun 03 2015 08:07
so it seems to only pick up work after restarts...
Arik Fraimovich
@arikfr
Jun 03 2015 08:16
And if you run the first one again? Do they get stuck in the queue, or does execution not finish?
DBY
@Dnile
Jun 03 2015 08:16
correct
wait!
i got executing query now
i restarted redash and celery again a while ago
Arik Fraimovich
@arikfr
Jun 03 2015 08:21
I'm not following -- did it execute that query again?
(without restarting)
DBY
@Dnile
Jun 03 2015 08:21
without restarting it kept staying in the queue
and now
it says executing query - but i don't see it on the cluster
Arik Fraimovich
@arikfr
Jun 03 2015 08:28
right
might be the mtu issue
one minute
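The MTU check being hinted at can be done like this; eth0 is an assumption about the interface name, and 9001 (the jumbo-frame default on some EC2 setups) was reportedly the problematic value, with 1500 the safe one:

```shell
# print the interface MTU; eth0 is an assumption about the interface name
ip link show eth0 | grep -o 'mtu [0-9]*'

# if needed, drop it to 1500 (non-persistent; lasts until reboot)
# sudo ip link set dev eth0 mtu 1500
```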
DBY
@Dnile
Jun 03 2015 08:28
k
DBY
@Dnile
Jun 03 2015 08:29
it's on 1500
Arik Fraimovich
@arikfr
Jun 03 2015 08:30
:(
can you send me over your celery logs?
DBY
@Dnile
Jun 03 2015 08:30
4 sure
err log ?
Arik Fraimovich
@arikfr
Jun 03 2015 08:31
both
DBY
@Dnile
Jun 03 2015 08:35
sending you everything, you choose
keep in mind timestamps are pacific time
10 hrs behind IL
DBY
@Dnile
Jun 03 2015 08:41
email/g-drive?
Arik Fraimovich
@arikfr
Jun 03 2015 08:46
whatever. email works
DBY
@Dnile
Jun 03 2015 08:47
sent
thanks for the assistance man
Arik Fraimovich
@arikfr
Jun 03 2015 08:48
sure thing
btw, is your server timezone set to pacific time or utc?
DBY
@Dnile
Jun 03 2015 08:48
pacific :)
Arik Fraimovich
@arikfr
Jun 03 2015 08:49
so this EverythingMe/redash#442 should be relevant to you
in theory scheduled queries aren't supposed to work for you
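A quick way to see which timezone a server is actually on — relevant here because a non-UTC server clock is what trips the naive-timestamp comparison in that issue:

```shell
# local time vs UTC; a gap other than your expected offset means the
# server tz is not what you think it is
date
date -u

# the configured zone, whichever of these the distro provides
cat /etc/timezone 2>/dev/null || timedatectl 2>/dev/null || true
```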
DBY
@Dnile
Jun 03 2015 08:51
interesting
Arik Fraimovich
@arikfr
Jun 03 2015 08:51
but maybe it was working for you, if you were using an older version from before we changed the retrieved_at field in pg to have a time zone
DBY
@Dnile
Jun 03 2015 08:52
yea, i upgraded last week
Arik Fraimovich
@arikfr
Jun 03 2015 08:52
so you're affected
but it shouldn't be related to what you experience now
because those are adhoc queries
DBY
@Dnile
Jun 03 2015 08:52
indeed
DBY
@Dnile
Jun 03 2015 08:59
so..looking at that fix
says the path is /opt/redash/current/tasks.py
mine is /opt/redash/current/redash/tasks.py
Arik Fraimovich
@arikfr
Jun 03 2015 09:51
@Dnile oops. you're right
DBY
@Dnile
Jun 03 2015 09:51
:thumbsup:
Arik Fraimovich
@arikfr
Jun 03 2015 11:11
@Dnile just realized that for celery to have a higher logging level, you need to pass -l{level} (e.g. -ldebug) to the celery command. at first I didn't understand why there was so little in the logs
@Dnile also, this issue of queries not running - started all of a sudden?
DBY
@Dnile
Jun 03 2015 11:11
gotcha - on it
and yes, all of a sudden :(
command=/opt/redash/current/bin/run celery worker --app=redash.worker --beat -Qqueries,celery,scheduled_queries --maxtasksperchild=10 -c8 -ldebug --concurrency 6
something is wrong there. not sure what.
Arik Fraimovich
@arikfr
Jun 03 2015 11:15
-c8 and --concurrency 6 seem contradictory
DBY
@Dnile
Jun 03 2015 11:15
huh.
getting rid of the latter
ran the command manually because something is messy there.
this is some of the stuff
[2015-06-03 04:19:09,663: INFO/MainProcess] Task redash.tasks.record_event[0251b405-b6a0-4c7c-8733-16c6e8c2ea91] succeeded in 0.0363939013332s: None
[2015-06-03 04:19:19,367: INFO/Beat] Scheduler: Sending due task refresh_queries (redash.tasks.refresh_queries)
DBY
@Dnile
Jun 03 2015 11:22
this is the command currently run by supervisor:
/opt/redash/current/bin/run celery worker --app=redash.worker --beat -Qqueries,celery,scheduled_queries --maxtasksperchild=10 -c8 -ldebug
DBY
@Dnile
Jun 03 2015 11:50
sent the same logs :|
resending in a minute
DBY
@Dnile
Jun 03 2015 11:58
done
Arik Fraimovich
@arikfr
Jun 03 2015 12:42
k, will take a look soonish
DBY
@Dnile
Jun 03 2015 12:43
thanks again
DBY
@Dnile
Jun 03 2015 14:29
got one of those thrown randomly to stdout:
[2015-06-03 07:18:19,817: ERROR/Worker-8] table 418500 dropped by concurrent transaction
Traceback (most recent call last):
  File "/opt/redash/redash.0.6.1.b840/redash/query_runner/pg.py", line 122, in run_query
    _wait(connection)
  File "/opt/redash/redash.0.6.1.b840/redash/query_runner/pg.py", line 34, in _wait
    state = conn.poll()
InternalError: table 418500 dropped by concurrent transaction
Arik Fraimovich
@arikfr
Jun 03 2015 14:33
WTF :O
but it might be a good clue to what's going on
DBY
@Dnile
Jun 03 2015 14:34
thought so
martin sarsale
@runa
Jun 03 2015 14:37
@arikfr aloha
Arik Fraimovich
@arikfr
Jun 03 2015 15:08
@runa hi. did you see the pull request?
martin sarsale
@runa
Jun 03 2015 15:15
@arikfr no! letme see
Arik Fraimovich
@arikfr
Jun 03 2015 15:16
@runa #442 see the comment I wrote for you there
martin sarsale
@runa
Jun 03 2015 15:16
@arikfr yep, shall I restart redash_celery afterwards?
Arik Fraimovich
@arikfr
Jun 03 2015 15:17
yes
martin sarsale
@runa
Jun 03 2015 15:18
@arikfr tellme what should I expect now? I have the select NOW() query refreshing every minute. should I expect to see the 'NOW query updated' html5 notification every minute? or should I reload (f5) the page to see the changes?
Arik Fraimovich
@arikfr
Jun 03 2015 15:22
@runa it doesn't refresh automatically, so you need to f5 the page. but you should see that: 1) it updated; 2) the "last updated" timestamp is correct
martin sarsale
@runa
Jun 03 2015 15:22
ok, letme see
I only restarted redash_celery
maybe I should restart everything?
if I hit f5, I still see the old data, but a "NOW updated" html5 notif appears
Arik Fraimovich
@arikfr
Jun 03 2015 15:26
the html5 notif appears every time the data is loaded (sort of a bug)
@runa did you manually run the query once, or waited for the auto refresh to kick in?
(run once since the restart, I mean)
martin sarsale
@runa
Jun 03 2015 15:27
@arikfr waited for the refresh
Arik Fraimovich
@arikfr
Jun 03 2015 15:28
@runa it will only refresh in an hour (because it still has the bad timestamps). try running it once manually and see if it starts refreshing properly
martin sarsale
@runa
Jun 03 2015 15:29
@arikfr ok. sleep(1.minutes)
@arikfr YES :)
@arikfr thanks :)))
@arikfr this is still a minor bug: g
martin sarsale
@runa
Jun 03 2015 15:59
@arikfr btw,
@arikfr I gave access to our redash to one of our guys, a journalist who works on our blog
@arikfr a week ago, he asked me how to do something, and I gave him a tutorial on SQL
martin sarsale
@runa
Jun 03 2015 16:20
@arikfr today he showed me some queries he wrote and how he was using redash
@arikfr it was pretty cool!
Arik Fraimovich
@arikfr
Jun 03 2015 16:24
@runa about "in 2 minutes" -- any chance there is time difference between your machine and the server? otherwise I'm not sure how to explain this :|
@runa about the journalist : this is awesome :) last time I gave someone a tutorial on SQL, he came back months later studying Python. so beware :)
martin sarsale
@runa
Jun 03 2015 16:26
@arikfr the server is not in the same TZ as my desktop, that's for sure
ah
I understood what you mean
Arik Fraimovich
@arikfr
Jun 03 2015 16:27
@runa I guess you figured this out, but I meant if there is difference in minutes
martin sarsale
@runa
Jun 03 2015 16:30
@arikfr yep, letme check
indeed ;)
Arik Fraimovich
@arikfr
Jun 03 2015 16:39
indeed there was ?
martin sarsale
@runa
Jun 03 2015 16:41
@arikfr yep :)
Arik Fraimovich
@arikfr
Jun 03 2015 16:49
@runa great :) one day I will add a way to compensate for these differences. until then...
@runa btw, I'm still waiting for a logo :)
martin sarsale
@runa
Jun 03 2015 16:50
@arikfr oh. I forgot :)