Avinash Pallerlamudi
@stpavinash
But I see this error on my Airflow UI: No module named 'airflow.contrib.operators.mssql_to_gcs'
can someone help me with this issue?
tunafish0805
@tunafish0805

Hi, has anyone seen these errors on the scheduler? We are running 1.10.4 with the Celery Executor.

 [2019-08-26 23:14:28,985] {{timeout.py:42}} ERROR - Process timed out, PID: 28688
[2019-08-26 23:14:28,985] {{timeout.py:42}} ERROR - Process timed out, PID: 28687
[2019-08-26 23:14:28,985] {{timeout.py:42}} ERROR - Process timed out, PID: 28689
[2019-08-26 23:14:35,082] {{timeout.py:42}} ERROR - Process timed out, PID: 28736

or

File "/usr/local/lib/python2.7/site-packages/sqlalchemy/event/attr.py", line 297, in __call__
    fn(*args, **kw)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 189, in on_connect
    do_on_connect(conn)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 846, in on_connect
    fn(conn)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 813, in on_connect
    hstore_oids = self._hstore_oids(conn)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 876, in oneshot
    result = fn(self, *args, **kw)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 899, in _hstore_oids
    oids = extras.HstoreAdapter.get_oids(conn)
  File "/usr/local/lib/python2.7/site-packages/psycopg2/extras.py", line 917, in get_oids
    """ % typarray)
  File "<string>", line 8, in __new__
  File "/usr/local/lib/python2.7/site-packages/airflow/utils/timeout.py", line 43, in handle_timeout
    raise AirflowTaskTimeout(self.error_message)
AirflowTaskTimeout: Timeout, PID: 28574
Andrew Desousa
@andrewldesousa
How would I run the airflow flask app in development mode?
Normally, if I run a standard Flask app I would have to 'export FLASK_ENV=development' before running the app
Also, I am using Docker which is probably an important detail
Julien Debbia
@n4rk0o
Hi,
Is it possible to send a Jinja2 template value to the queue parameter that is inherited from the BaseOperator? Basically, I need to dynamically define which queue I want to use, and to achieve it I send the value to a custom operator with a Jinja2 template value, but it doesn't work :/
for example:
task = myCustomOperator(task_id="test", queue="{{ task_instance.xcom_pull(task_ids='get_queue') }}")
Bruno Ambrozio
@bambrozio
Hello guys -
Is there any documentation about best practices for DAG deployment? E.g. versioning?
I saw some people version the DAGs (and the DAG file names) in order to avoid conflicts, and somehow they try to automate the "turn off" of the previous DAG version.
I also see some people simply orchestrate the CD pipeline so that DAGs are not deployed while they are running.
Wondering if there are any best practices on this?
tunafish0805
@tunafish0805
@n4rk0o have you set queue as a templated field in your myCustomOperator class? This is how it is achieved in the BashOperator:
template_fields = ('bash_command', 'env')
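Tunafish0805's suggestion can be sketched without Airflow installed. The stub below mimics the mechanism only: in real Airflow your class extends `BaseOperator` and Jinja2 does the rendering; `BaseOperatorStub`, its naive `render`, and the `MyCustomOperator` name are all illustrative assumptions, not Airflow APIs.

```python
# Airflow-free sketch of how template_fields works: before execution,
# Airflow walks the names in template_fields and renders each attribute
# against the task context. str.replace stands in for Jinja2 here.

class BaseOperatorStub:
    template_fields = ()

    def render(self, context):
        for name in self.template_fields:
            raw = getattr(self, name)
            for key, value in context.items():
                raw = raw.replace('{{ %s }}' % key, str(value))
            setattr(self, name, raw)


class MyCustomOperator(BaseOperatorStub):
    # Declaring "queue" here is what makes the Jinja value render.
    template_fields = ('queue',)

    def __init__(self, queue):
        self.queue = queue


task = MyCustomOperator(queue='{{ dynamic_queue }}')
task.render({'dynamic_queue': 'gpu_workers'})
print(task.queue)  # -> gpu_workers
```

One caveat, hedged: the scheduler may read `queue` when it hands the task to the executor, which can happen before template rendering, so this approach may still not work for `queue` specifically even though it works for fields like `bash_command`.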
Zachary Jablons
@zmjjmz
Hey there, I have a weird problem that may be due to how our Airflow instance is setup, but I need help debugging it mostly. Is it possible to see what python path the main airflow instance is running dags with?
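One low-tech way to answer Zachary's question: have the scheduler/worker report its own interpreter. Dropping something like this in a DAG file (or running it via a PythonOperator) and checking the task/scheduler logs shows exactly which Python and `sys.path` Airflow is using. Pure stdlib; the function name is just for illustration.

```python
import sys


def report_python_env():
    # Collect the interpreter binary and module search path of
    # whichever process imports this file.
    info = {
        'executable': sys.executable,
        'path': list(sys.path),
    }
    for key, value in info.items():
        print('%s: %s' % (key, value))
    return info


report = report_python_env()
```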
Brian Nutt
@bnutt
Has anyone seen issues with the k8s executor on 1.10.4? We have a DAG with 10k+ tasks made up of k8s pod operator tasks, and it used to schedule just fine in 1.10.3, but now the tasks are halted from being scheduled due to Waiting for <Process(DagFileProcessor2982-Process, stopped)>. It works fine for DAGs with a smaller number of tasks.
SOUVIK GHOSH
@souvikg10
Hi,
My company's vulnerability review found issues with flask-appbuilder 1.13.1 and 1.12.5, both required for Airflow. Any idea if this is supposed to change to support flask-appbuilder 2 or greater?
Ash Berlin-Taylor
@ashb
It may work. Try it and see.
What was the issue with 1.13.1?
SOUVIK GHOSH
@souvikg10
It uses a jQuery package which is vulnerable to prototype pollution. The jQuery.extend and jQuery.fn.extend functions allow an untrusted object to extend Object.prototype; the CVSS vulnerability score is 9.8 from Sonatype.
I tried with Flask, but Airflow's setup.py strictly pins >=1.12.5 and <2.0.0.
I think flask-appbuilder 2.0 has some breaking changes, and hence maybe Airflow doesn't want to update. Are there any Airflow developers here? Do you have any plans to update to flask-appbuilder 2.0?
Ash Berlin-Taylor
@ashb
I'm the only Airflow developer in here. Most everyone else has moved to slack (I'm in both)
That prototype pollution vulnerability doesn't affect Airflow - we don't use that feature, nor do we accept JSON from a third party site or query strings/post forms
SOUVIK GHOSH
@souvikg10
Is there a reason for not using app builder 2.0? Thanks for your reply btw. It helps understanding and perhaps to convince my security experts
I can probably build from source and try to update the package to 2.0 perhaps
In the setup.py
Ash Berlin-Taylor
@ashb
No reason other than "we haven't gotten around to it yet"
Less work would probably be to update the version of jquery in use
SOUVIK GHOSH
@souvikg10
Isn't that coming from flask-appbuilder though? I can try to build Airflow from source, updating the appbuilder version to 2.1 (which doesn't have any vulnerability), and let you know if it breaks. If it works, then perhaps it could be put in the latest release.
Ash Berlin-Taylor
@ashb
It's too late for 1.10.5 (which is already in RC stage) but otherwise sounds good!
SOUVIK GHOSH
@souvikg10
When would be the next release? Timing wise say October?
Ash Berlin-Taylor
@ashb
About, yeah.
SOUVIK GHOSH
@souvikg10
Okay. I will check it out tonight. I really hope it works. Personally I have used it, but I am trying to do the advocacy at work. We have several requirements with respect to RBAC, master/slave workers, and department-based data pipelines for analytics. I find Airflow to be the perfect fit; unfortunately our cyber security team is not so convinced.
Ash Berlin-Taylor
@ashb
I work for Astronomer.io, if we can be of help convincing the security team.
SOUVIK GHOSH
@souvikg10
Interesting. I will definitely take a look. Though we are not cloud native and rhel 7 based
Ash Berlin-Taylor
@ashb
We do consulting and training as well as deployments on Kube
(I'll help how I can here too)
SOUVIK GHOSH
@souvikg10
Okay. We don't have Kubernetes, just Linux machines, and a deployment on 5 workers with Chinese-wall-like security and no movement of data between folders. I will talk to my manager and write to your company's contact page in case they would like to talk further.
qinzl1
@qinzl1
When I set "default_timezone = system", the web UI time is still UTC, not the system timezone.
Shuai Liang
@slcode
I would like an SSH connection to accept only one task at a time. How do I set the parameters? Thanks a lot.
Patrick Guiran
@Tauop

Hello,
I am looking for help and/or explanations on the Airflow scheduler; it seems the scheduler takes time to create and queue new tasks.
The average time between the end of a task (airflow_db.task_instances.end_date) and the queued time of the next task (airflow_db.task_instances.queued_dttm) is more than 2 minutes.
How can I speed up the scheduler (or lower this time interval)?

I am currently backfilling a DAG whose total task duration is around 70 sec, yet the total DAG execution duration is more than 10 min because of the speed of the scheduler.
(Sorry for my poor english and thank you for your help :) )

ChrisOriginal
@ChrisOriginal
Hi there, I have a small system where web users generate tickets for machine learning jobs. Then I have a bunch of physical machines with GPUs (workers or slaves) that are capable of executing these tickets. I'm looking for some out-of-the-box solution that could orchestrate that. Is Apache Airflow the right tool to do that?
kr
@xtrntr
Do airflow backfill / external DAG triggers respect the parallelism parameters of the DAG? I can't verify this in the documentation: https://stackoverflow.com/a/45765784
japrogramer
@japrogramer
Hello, I am running Airflow in a venv and trying to get my DAGs (just the example DAGs) to execute, but my scheduler doesn't execute them. Both are running on the same machine.
japrogramer
@japrogramer

The only error is this, on the scheduler: [2019-09-11 16:05:47,337] {dag_processing.py:748} ERROR - Cannot use more than 1 thread when using sqlite. Setting parallelism to 1
And the webserver

[2019-09-11 16:15:10 -0500] [10894] [INFO] Handling signal: winch
[2019-09-11 16:15:17 -0500] [10894] [INFO] Handling signal: ttin
[2019-09-11 16:15:17 -0500] [10959] [INFO] Booting worker with pid: 10959
[2019-09-11 16:15:17,770] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-09-11 16:15:18,045] {dagbag.py:90} INFO - Filling up the DagBag from /home/archangel/airflow/dags
[2019-09-11 16:15:18 -0500] [10894] [INFO] Handling signal: ttou
[2019-09-11 16:15:18 -0500] [10898] [INFO] Worker exiting (pid: 10898)

But the task remains un-executed.
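The log above shows SQLite plus the SequentialExecutor, which runs at most one task at a time; for tasks to run concurrently, the usual fix is switching to the LocalExecutor backed by Postgres or MySQL. Airflow reads config overrides from `AIRFLOW__<SECTION>__<KEY>` environment variables, so a sketch of that switch (the Postgres URL below is a placeholder, not a real connection string) looks like:

```python
import os

# Override airflow.cfg via environment variables: section CORE,
# keys EXECUTOR and SQL_ALCHEMY_CONN. Set these before starting
# the scheduler/webserver, then run "airflow initdb" (1.10.x).
os.environ['AIRFLOW__CORE__EXECUTOR'] = 'LocalExecutor'
os.environ['AIRFLOW__CORE__SQL_ALCHEMY_CONN'] = (
    'postgresql+psycopg2://user:pass@localhost:5432/airflow'  # placeholder
)

print(os.environ['AIRFLOW__CORE__EXECUTOR'])
```

With the SequentialExecutor it is also worth checking that the example DAGs are actually unpaused in the UI, since paused DAGs are never scheduled.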

japrogramer
@japrogramer
Also, how do I version control new DAGs that have been added via the web interface?
arnekaiser-bt
@arnekaiser-bt
Hi, does anybody use xcom_push with Cloud Composer in GCP?
I wonder whether it works with xcom push from a K8s cluster other than the Composer cluster itself?
Ash Berlin-Taylor
@ashb
Is it listening on HTTP? It sounds like it's not
Is that on the same port for both?
Chris Howard
@ckhoward

Hey everyone, hope you’re having a nice Friday. I was hoping I could get some help with configuring a Connection from an environment variable.

I understand that the value needs to have a URI format, that the path is defined in 'extra,' and that takes the form of a query parameter. In my Kubernetes template I have tried variations close to this:

- name: AIRFLOW_CONN_SOURCE_DIR
  value: fs://?path=/etc/source

and likewise, I have tried:

export AIRFLOW_CONN_SOURCE_DIR=fs://?path=/etc/source

Is this value missing anything obvious?
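For what it's worth, the URI itself decomposes the way Chris describes, which can be checked with only the stdlib: the scheme becomes the connection type and the query string is what Airflow folds into the connection's `extra` field. This is a sketch of the parsing, not Airflow's actual code path.

```python
from urllib.parse import urlparse, parse_qs

uri = 'fs://?path=/etc/source'
parsed = urlparse(uri)

conn_type = parsed.scheme          # 'fs' -> File (path) connection type
extra = parse_qs(parsed.query)     # {'path': ['/etc/source']}

print(conn_type, extra)
```

Two common gotchas to rule out, hedged as guesses: the env var name maps to a lower-cased conn_id (`AIRFLOW_CONN_SOURCE_DIR` is looked up as conn_id `source_dir`), and in Kubernetes the variable must be set on every container that resolves the connection (scheduler and workers, not just the webserver).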