Ash Berlin-Taylor
We can't serialize custom operators, nor (reliably) serialize callbacks or code for PythonOperator, so we decided not to change how the workers operate right now
Joao Da Silva
@ashb thank you


If I deploy Airflow with Authentication:


I receive an error:

DAG "as400_csv_daily" seems to be missing.

This error disappears if I set this variable:


But then the problem moves to the scheduler, when the Kubernetes worker has to execute a DAG:

ModuleNotFoundError: No module named 'dns_operators'

This dns_operators is a Python package inside the project.

Any idea?


@ashb could it be because of the serialized DAGs? I am using a custom operator with the Kubernetes Executor.

I imagine the better option would be not to serialise the DAGs, but then the authentication fails

Ash Berlin-Taylor
Serialisation only affects the webserver - executors and scheduler still use the DAG files
The problem sounds like the missing module

Thanks @ashb, you have helped me a lot just by ruling out scenarios. In this case the problem comes from custom modules defined inside the dags folder.

The DAG execution works fine when everything is deployed standalone or from Docker, but with the Kubernetes executor the following happens:

  1. It works fine for a simple DAG that uses BashOperators.
  2. It does not work with custom modules embedded in the dags folder.

When the DAG is sent to the worker, is only the Python file that contains the DAG sent, or the whole dags folder?

This is the Kubernetes configuration I set for the workers:

      worker_container_repository = apache/airflow
      worker_container_tag = 1.10.10-python3.7
      worker_container_image_pull_policy = IfNotPresent

What I noticed, and I think is part of the problem, is this log:

[2020-06-29 17:48:47,124] {dagbag.py:396} INFO - Filling up the DagBag from /git/dags/dags/isa_store_zoning_daily.py

Why is the DagBag being filled from the specific DAG file, and not from /git/dags/dags/?

In the scheduler config:
dags_folder = /git/dags/dags
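Given the ModuleNotFoundError above, one thing worth checking is whether the custom package is importable on the worker at all: Airflow adds the dags folder to sys.path when parsing, so sibling packages like dns_operators only resolve if that folder is really present in the worker pod. A minimal sketch of the check (the path mirrors the config above and is an assumption about this deployment):

```python
import sys

# Assumed path, taken from the dags_folder setting above. The worker can
# only `import dns_operators` if this directory is on sys.path *and* the
# directory actually exists in the pod (git-sync volume or baked into
# the worker image).
dags_folder = "/git/dags/dags"

if dags_folder not in sys.path:
    sys.path.insert(0, dags_folder)

print(dags_folder in sys.path)  # True
```

If the import still fails, the folder most likely was not shipped into the worker image at all, which matches the symptoms described here.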
I have a plugin registered with Airflow. Even after removing the plugin script from the plugins folder, it is still available. How do I deregister a plugin or remove its dependencies? Even after removing all its contents, it still loads the data from the __pycache__ folder (despite removing that too).
Joao Da Silva
Hello, I'm using the remote logging option with remote_base_log_folder storing the log files on S3. This adds the logs to S3 but does not remove them locally. I was wondering how other people deal with the large amount of log files filling up their disks in production. Any tips, please?
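Remote logging indeed only uploads copies; nothing in Airflow deletes the local files. A common approach is a scheduled cleanup job. A minimal sketch (the base directory, retention window, and function name are illustrative, not Airflow settings):

```python
import time
from pathlib import Path

def find_old_logs(base_dir, max_age_days=7):
    """Return *.log files under base_dir older than max_age_days.

    Cleanup sketch, not an Airflow feature: once the S3 copy is
    confirmed, a maintenance DAG or cron job can unlink each returned
    path. Both the directory and the 7-day retention are assumptions.
    """
    cutoff = time.time() - max_age_days * 86400
    return [p for p in Path(base_dir).rglob("*.log")
            if p.is_file() and p.stat().st_mtime < cutoff]
```

An equally common variant is a BashOperator or cron entry running something like `find $AIRFLOW_HOME/logs -mtime +7 -delete`.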
How do I fix this error from the production Docker images: ModuleNotFoundError: No module named 'airflow.providers'
I have Admin role in Airflow. Recently, I just noticed that I cannot pause/unpause a DAG in the Airflow UI. I am using v 1.10.10.
I can pause or unpause it using the airflow cli.
Hi, how do we pass a parameter from Jenkins?

adan_geno Hi everybody,

check out the new platform:

Hi all, my Airflow version is 1.10.8 and we have 3000 DAGs. The gap between one task finishing and the next starting is about 3 minutes, which is a bit too long for us, so we want to optimize the platform. Any ideas? These are my current scheduler settings:

      run_duration = -1
      num_runs = -1
      processor_poll_interval = 1
      min_file_process_interval = 0
      dag_dir_list_interval = 300
      store_serialized_dags = True
      min_serialized_dag_update_interval = 600
I found a bug in versions above 1.10.8: if, from the UI, you run a no_status task whose upstream tasks have already succeeded, you get a "dependencies not met" error. Has anyone else hit this?
I updated the Airflow source code:
vim airflow/serialization/serialized_objects.py +582
and replaced the line: dag.task_dict[task_id]._upstream_task_ids.add(task_id)  # pylint: disable=protected-access
Hi All, I have a question regarding the ExternalTaskSensor. The sensor requires the parent DAG's run date to exactly match the child DAG's run date (including the timestamp); otherwise it keeps poking the parent task. Is that correct? I was thinking it would consider the most recent run for the same date if no timedelta/execution_date_fn is provided.
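That is correct: without execution_delta or execution_date_fn, the sensor looks for the parent run whose execution_date exactly equals the child's, and it never falls back to the most recent run. A toy sketch of the date it looks up (the function name is illustrative, not the sensor's API):

```python
from datetime import datetime, timedelta

def looked_up_parent_date(child_execution_date, execution_delta=None):
    # Mirrors the sensor's rule: subtract execution_delta if given,
    # otherwise use the child's execution_date verbatim.
    if execution_delta is not None:
        return child_execution_date - execution_delta
    return child_execution_date

child = datetime(2020, 8, 1, 6, 0)
print(looked_up_parent_date(child))                      # 2020-08-01 06:00:00
print(looked_up_parent_date(child, timedelta(hours=2)))  # 2020-08-01 04:00:00
```

So if the parent's schedule is offset from the child's, that offset has to be passed explicitly, or the sensor pokes forever.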
Mohit Bhandari
Hi everyone, good evening. I am new to apache-airflow. I have set it up on my local env, but when I run a DAG from the UI it gets stuck at "Adding to queue":
[2020-08-14 19:30:09,730] {base_executor.py:58} INFO - Adding to queue: ['airflow', 'run', 'hello_world', 'hello_task', '2020-08-14T14:00:07.217726+00:00', '--local', '--pool', 'default_pool', '-sd', '/Users/user/airflow/dags/hello_world.py']
The same task runs fine from the command line.
Any help?
Hi All, is anybody deploying dags folders from AWS S3 with the Celery executor? Which strategy are you using: rsync, cp, or mounting the S3 folder on the Airflow server?
Kenney He
Can anyone help me with the Bitnami Airflow OAuth setup? I used webserver_config.py following the Airflow webserver_config instructions and environment variables, but it still fails.
Ankur Nayyar
Hello, in Airflow how do I capture the LDAP user name?
Can somebody help me understand what the point of the official Airflow Docker image is?
From my understanding I still need to do my own orchestration
of which daemons I want to start, etc.
Ashish Mishra
I just want to confirm one thing. Suppose a dag.py file contains, say, two tasks A and B. Is dag.py parsed/run for each task? I see "Filling up the DagBag from /usr/local/airflow/dags/dag.py" in the log of each task. Is that correct, or have we configured something wrong?
If dag.py contains sleep(3000), will it sleep for both task runs, or does parsing happen only once?
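That log is expected: each task process re-parses the DAG file before executing, so module-level code in dag.py runs once per task invocation, not once per DAG run. A toy sketch of the consequence (PARSE_COUNT and parse_dag_file are illustrative names, not Airflow APIs):

```python
PARSE_COUNT = {"n": 0}

def parse_dag_file():
    """Stand-in for what each `airflow run <dag> <task>` does before
    executing the task: it re-imports the DAG file ("Filling up the
    DagBag"), so every module-level statement in dag.py runs again -
    a top-level time.sleep(3000) would therefore stall *every* task."""
    PARSE_COUNT["n"] += 1

# Two tasks (A and B) in the same DAG run mean two separate parses:
for task_id in ("A", "B"):
    parse_dag_file()

print(PARSE_COUNT["n"])  # 2
```

This is why slow logic belongs inside the callables the operators run, not at the top level of the DAG file.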
Thiago Salgado
Hello all, I have a new Docker image on Docker Hub. Does anyone know how I can deploy this image correctly?
Elhay Efrat
Has anyone seen this issue when running long tasks? [2020-09-14 16:45:26,189] {{taskinstance.py:1150}} ERROR - (psycopg2.OperationalError) could not translate host name "airflow-dev-postgresql" to address: Temporary failure in name resolution
Nikolaas Steenbergen

Hi all, I'm trying to run a complete dag from within a unit test.
For a simple sample graph it works; however, for the actual test graph I get:

[2020-09-24 16:55:57,343] {backfill_job.py:461} ERROR - Task instance <TaskInstance: dag_name.print_the_context 2020-05-22 12:00:00+00:00 [failed]> with failed state

Is there a default log directory with more information on what exactly goes wrong?
I cannot seem to find anything.

Nikolaas Steenbergen
OK, it seems it's very important to call .clear() on the DAG before execution; then this won't fail.
Daniel Papp
I deployed airflow via the helm chart from the stable repository.
It seems to be working fine
I just don't know how to use the airflow cli tool to upload a dag
Can someone help me with that?
Akshay Krishnan R
Hi, what is bind_password in Apache Airflow? And why does the official documentation ask to set it to an insecure value?
Ganapathi Chidambaram
Can anybody please guide me on getting filenames matching a pattern with a FileSensor?
I need to watch a particular local folder, get the list of filenames, and pass it to another task to process those files.
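A FileSensor only waits for a path to exist; it does not return the matched names. A common workaround is a task that globs the folder itself and passes the list downstream, for example as the return value of a PythonOperator callable (which Airflow pushes to XCom automatically). A minimal sketch, with assumed folder and pattern names:

```python
from pathlib import Path

def list_matching_files(folder, pattern="*.csv"):
    """Return sorted file paths in `folder` matching `pattern`.

    Sketch for the use case above: run this as the PythonOperator
    callable after the sensor fires; a downstream task can then pull
    the list from XCom and process each file. The folder and the
    "*.csv" pattern are placeholders, not values from this chat.
    """
    return sorted(str(p) for p in Path(folder).glob(pattern))
```

The sensor then only has to guard that the folder (or a marker file) exists; the actual filename discovery happens in this task.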
Ganapathi Chidambaram
Please help me write a plugin for Airflow 1.10.6.
What will Airflow do if a task completes but the SQLAlchemy connection is unavailable? Our DB is going down for maintenance for a couple of hours tomorrow. I'm planning to stop the scheduler so nothing is running then, but I'm wondering what would happen if we just let it go.
Emma Grotto
Hey Everyone!
I downgraded airflow from 1.10.12 to 1.10.10 and I am getting:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateColumn) column "operator" of relation "task_instance" already exists
running resetdb is not an option for me. Does anyone know how to fix this?
Hello team,