These are chat archives for getredash/redash

2nd Dec 2016
jonmauney
@jonmauney
Dec 02 2016 16:22
Hi folks, I'm a first-time poster here, glad to have found the community. I apologize in advance if I violate any etiquette tenets, please let me know if I do! I have a question concerning concurrent queries. I followed the steps listed here: http://docs.redash.io/en/latest/usage/maintenance.html (adding -c nworkers to the celery config) and restarted everything. I have seen a couple queries running at the same time, but recently saw one long-running query lock out many more: (will try to post a screenshot, never used gitter before)
[screenshot]
/opt/redash/supervisord/supervisord.conf:
[supervisord]
nodaemon=false
logfile=/opt/redash/logs/supervisord.log
pidfile=/opt/redash/supervisord/supervisord.pid
directory=/opt/redash/current

[inet_http_server]
port = 127.0.0.1:9001

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[program:redash_server]
command=/opt/redash/current/bin/run gunicorn -b 127.0.0.1:5000 --name redash -w 4 redash.wsgi:app
process_name=redash_server
numprocs=1
priority=999
autostart=true
autorestart=true
stdout_logfile=/opt/redash/logs/api.log
stderr_logfile=/opt/redash/logs/api_error.log

# There are two queue types here: one for ad-hoc queries and one for refreshing scheduled queries
# (note that "scheduled_queries" appears only in the queue list of "redash_celery_scheduled").
# The default concurrency level for each is 2 (-c2); you can increase it based on your machine's resources.

[program:redash_celery]
command=/opt/redash/current/bin/run celery worker --app=redash.worker --beat -c2 -Qqueries,celery -c 16
process_name=redash_celery
numprocs=1
priority=999
autostart=true
autorestart=true
stdout_logfile=/opt/redash/logs/celery.log
stderr_logfile=/opt/redash/logs/celery_error.log

[program:redash_celery_scheduled]
command=/opt/redash/current/bin/run celery worker --app=redash.worker -c2 -Qscheduled_queries
process_name=redash_celery_scheduled
numprocs=1
priority=999
autostart=true
autorestart=true
stdout_logfile=/opt/redash/logs/celery.log
stderr_logfile=/opt/redash/logs/celery_error.log
Can anyone help me troubleshoot why 1 query blocked 32 more?
oh man, maybe i see my problem
-c2
mmmk i'll show myself out.
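(For anyone hitting the same thing: the redash_celery command above ends up with two concurrency flags, the original -c2 plus the added -c 16, which makes it easy to misread the effective concurrency. A cleaned-up stanza with a single flag might look like the sketch below; the value 8 is only an illustrative number, not a recommendation.)
[program:redash_celery]
# -c 8 is an example value; size it to your machine and data sources
command=/opt/redash/current/bin/run celery worker --app=redash.worker --beat -Qqueries,celery -c 8
process_name=redash_celery
numprocs=1
priority=999
autostart=true
autorestart=true
stdout_logfile=/opt/redash/logs/celery.log
stderr_logfile=/opt/redash/logs/celery_error.log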
jonmauney
@jonmauney
Dec 02 2016 16:27
ooh perhaps I now have a real problem- after fixing my now-I-realize obvious issue above, I attempted to restart celery and got a connection refused error on the supervisor port:
ubuntu@ip-xxx:~$ sudo supervisorctl restart redash_celery
http://localhost:9001 refused connection
Arik Fraimovich
@arikfr
Dec 02 2016 16:31
-c2 means you can have at most 2 concurrent queries, so in theory you should've seen more than 1 query anyway.
re. refused connection - it might be that the supervisor daemon was killed by the oom (out of memory) killer.
jonmauney
@jonmauney
Dec 02 2016 16:38
thanks arik- so I should just try starting supervisor?
starting it up got me back on track. Only problem now is I have a bunch of (what appear to be) orphaned celery workers. @arikfr I'm on ec2- would you recommend just rebooting the instance?
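(For anyone following along: if supervisord itself has died, supervisorctl has nothing to talk to, which is the refused connection above. Bringing it back on a layout like this one looks roughly like the sketch below; the exact service setup varies by install, so treat the commands as a starting point rather than a recipe.)
# check whether supervisord is actually running
ps aux | grep '[s]upervisord'
# if not, start it against the Redash config (or use your init system's service, if one is set up)
sudo supervisord -c /opt/redash/supervisord/supervisord.conf
# confirm the managed programs came back (supervisorctl may need the same -c flag on some installs)
sudo supervisorctl status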
jonmauney
@jonmauney
Dec 02 2016 17:36
(I restarted- looks good to go now.) Thanks for the help- i’ll leave my silly question in the chat history as a badge of honor.
Arik Fraimovich
@arikfr
Dec 02 2016 20:05
I don't think it's silly :) In the future you can just kill them all, but I guess an instance restart works too.
I think that in your case supervisord died and later Celery did too, so there was nothing to bring it back. I wonder if there is a way to configure the OOM killer not to kill supervisord.
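(A rough sketch of both suggestions, assuming a typical Linux host: leftover workers can be killed by matching their command line, and supervisord can be exempted from the OOM killer via oom_score_adj. The pkill pattern below also matches workers supervisord still manages, so expect to restart them with supervisorctl afterwards.)
# kill any celery worker processes, managed or orphaned
pkill -f 'celery worker'
# exempt supervisord from the OOM killer (-1000 marks the process as unkillable)
echo -1000 | sudo tee /proc/$(pgrep -o supervisord)/oom_score_adj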
jonmauney
@jonmauney
Dec 02 2016 20:19
makes sense. It’s also entirely possible I improperly killed some things when tooling around on the server :smile:
Allen Short
@washort
Dec 02 2016 20:36
@arikfr Any objections to dropping flask_script in favor of click? I don't see any good way to write tests for flask_script commands and I'd like to have some coverage on them
Arik Fraimovich
@arikfr
Dec 02 2016 20:56
@washort nope, as long as it doesn't complicate things. Also I think I saw that click is officially supported in the next version of Flask.
Allen Short
@washort
Dec 02 2016 21:01
@arikfr That was what motivated me, plus seeing that click specifically has support for testing.
Converting the CLI stuff wasn't too hard, I've got a couple tests written and passing
I'll put up a PR against master and then merge it into the sqlalchemy branch afterwards
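(For context on the click testing support mentioned above, here is a minimal sketch of a click command plus a CliRunner-based test; the create_user command is hypothetical and not Redash's actual CLI.)
import click
from click.testing import CliRunner

@click.command()
@click.argument("name")
def create_user(name):
    """Hypothetical command, just to show the shape of a click CLI."""
    click.echo("created user %s" % name)

def test_create_user():
    # CliRunner invokes the command in-process and captures its output
    runner = CliRunner()
    result = runner.invoke(create_user, ["alice"])
    assert result.exit_code == 0
    assert "created user alice" in result.output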
Arik Fraimovich
@arikfr
Dec 02 2016 21:14
Why not make the change in the sqla branch?
Allen Short
@washort
Dec 02 2016 21:16
Paranoia about making a branch too big :)
Allen Short
@washort
Dec 02 2016 21:34
... aaaand i've already found and fixed a bug. yay tests.