Tanuj Gupta
@guptakumartanuj
:-(
Tanuj Gupta
@guptakumartanuj
Hi everyone, Airflow depends on Flask-AppBuilder for any RBAC-related change; all the RBAC metadata comes from Flask-AppBuilder. I just wanted to check: do we include any Alembic migration when they change their metadata? As far as I can see, Airflow doesn't keep any Alembic migration for this. Please confirm.
@ashb Can you please give any insights on this?
Ash Berlin-Taylor
@ashb
@guptakumartanuj, correct, we don't include migrations for those. We let FAB manage it itself. Did you have something in mind?
Tanuj Gupta
@guptakumartanuj
@ashb Yeah, actually when we enable RBAC, a few new tables prefixed with ab_ are added to the database. One of them, ab_view_menu, has a name column that maps to our dag_id. Per the metadata, dag_id is 250 characters in the dag table but name is only 100 characters in ab_view_menu. So whenever a DAG ID longer than 100 characters is inserted into ab_view_menu it gets truncated, and two DAG IDs that share the same first 100 characters end up colliding on the same name. So ultimately, to keep everything safe, I wanted to increase that column's size.
Ash Berlin-Taylor
@ashb
Yes, sensible ask. This is the first time it's come up
Tanuj Gupta
@guptakumartanuj
So that's why I have raised a PR against the Flask-AppBuilder repo. It would be good if you could have a look there.
Daniel, the owner of the Flask-AppBuilder repo, doesn't have any problem with this. But yeah, I just wanted to cross-check with the Airflow community.
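For reference, a minimal sketch of the kind of column change being discussed (this is not the actual Flask-AppBuilder source; the sizes come from the conversation above):

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class ViewMenu(Base):
    # mirrors Flask-AppBuilder's ab_view_menu table
    __tablename__ = "ab_view_menu"
    id = Column(Integer, primary_key=True)
    # currently String(100) in Flask-AppBuilder; widened to 250 here so a
    # full Airflow dag_id fits without truncation
    name = Column(String(250), unique=True, nullable=False)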
Ash Berlin-Taylor
@ashb
Got it, thanks, have commented there
Nicke Alves
@nickedev
I need help with rbac = True: after enabling it I can't enable any DAG, the toggle doesn't work. Why?
Vinayak Mehta
@vinayak-mehta
It's possible that the role associated with your user might not have the necessary permissions to pause / unpause a dag.
You can try creating a user with the admin role using airflow create_user.
Vinayak Mehta
@vinayak-mehta
You can check out the permissions for all default roles here: https://airflow.apache.org/docs/stable/security.html#default-roles
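For example, something along these lines works with the 1.10 CLI (username, email and password here are placeholders):

airflow create_user -r Admin -u admin -e admin@example.com -f Admin -l User -p admin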
aldissatriya
@aldissatriya

hi all, a question regarding task parallelism.
my configuration:
parallelism = 2000
dag_concurrency = 100
worker_concurrency = 100
max_threads = 20
workers = 20

but why is the max number of running tasks in the DAG 14?

Ash Berlin-Taylor
@ashb
Show us the dag? Have you set concurrency=14 at the dag level?
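For reference, DAG-level concurrency caps the number of running tasks for that one DAG regardless of the global settings quoted above; a minimal sketch (names hypothetical):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="example_concurrency",       # hypothetical name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    concurrency=14,                     # no more than 14 tasks of this DAG run at once,
)                                       # even if parallelism/dag_concurrency are higher

tasks = [
    BashOperator(task_id="task_{}".format(i), bash_command="sleep 30", dag=dag)
    for i in range(100)
]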
iam432
@iam432
Hi All,
iam432
@iam432
We recently upgraded to Airflow version 1.10.7, and after migrating I am seeing an issue with DAG permissions: DAG code modifications are not reflected in the web UI until I run the airflow sync_perm command manually. Am I missing anything? Moreover, when I schedule the airflow sync_perm command via a cron job it does not execute and throws the message "The sync_perm command only works for rbac UI". I also cannot see anything in the UI related to this command, either in roles or settings, to automate it. After struggling for some time I found a blog post mentioning that with 1.10.x we can use Flask-AppBuilder instead of the flask-admin UI. Can someone help with my issue? Is it something I missed during the upgrade?
iam432
@iam432
Hi all, can anyone help me with the above error?
Armel
@Armelabdelkbir
hello community, I'm new to Airflow since I will be using it to run some Spark jobs. I'm creating my first DAG and I want to use BashOperator with a wget command, but in the logs it seems wget is not found:

t1 = BashOperator(
    task_id="check_file_existe",
    bash_command="mkdir ~/test && cd $ && wget https://github.com/Armelabdelkbir/homecloud/blob/master/deploy.xml -O test.xml ",
    retries=2,
    retry_delay=timedelta(seconds=15),
    dag=dag,
)
ankurdhir
@ankurdhir
Hello,
I recently upgraded to airflow 1.10.10 from airflow 1.10.2
After the upgrade, the logs from any operator are not being printed to stdout but are instead redirected to the scheduler logs.
The logs are not visible in the UI because of that, as I have redirected the scheduler logs to another file.
I am using local logging; remote logging is off.
Any help would be appreciated!
Prashant Koti 🇮🇳
@PrashantGKoti_twitter
@ankurdhir
Even I'm facing the same issue. Please share the solution, if any.
Avinash Pallerlamudi
@stpavinash
Hello All,
I created a policy tag (column-level security) on one of my PII columns in a table in Google BigQuery yesterday, and this morning I loaded the same table (full refresh) using GoogleCloudStorageToBigQueryOperator. I see that my policy tags are gone. Will WRITE_TRUNCATE remove all the column-level security on every refresh/load?
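For context, a minimal sketch of the kind of load being described; the operator lives in airflow.contrib in 1.10 and all the values below are hypothetical:

from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

dag = DAG(dag_id="bq_full_refresh", start_date=datetime(2020, 1, 1), schedule_interval="@daily")  # hypothetical

load_table = GoogleCloudStorageToBigQueryOperator(
    task_id="gcs_to_bq_full_refresh",                                   # hypothetical
    bucket="my-bucket",                                                 # hypothetical
    source_objects=["exports/table.csv"],                               # hypothetical
    destination_project_dataset_table="my-project.my_dataset.my_table", # hypothetical
    write_disposition="WRITE_TRUNCATE",  # overwrites the destination table on each load
    dag=dag,
)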
Naresh-Edla
@Naresh-Edla
Hello All
I'm seeing the error below frequently. Does anyone know a workaround for this issue?
Executor reports task instance <TaskInstance: [queued]> finished (failed) although the task says its queued. Was the task killed externally?
Awaish Kumar
@AwaishK
Hi
I am trying to create a Docker image using this Dockerfile, but I am getting the error PermissionError: [Errno 13] Permission denied: '/home/airflow'
# VERSION 1.10.9
# AUTHOR: Matthieu "Puckel_" Roisil
# DESCRIPTION: Basic Airflow container
# BUILD: docker build --rm -t puckel/docker-airflow .
# SOURCE: https://github.com/puckel/docker-airflow

FROM python:3.7-slim-buster
LABEL maintainer="Puckel_"

# Never prompt the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux

# Airflow
ARG AIRFLOW_VERSION=1.10.10
ARG AIRFLOW_USER_HOME=/usr/local/airflow
ARG AIRFLOW_DEPS=""
ARG PYTHON_DEPS=""
ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME}

# Define en_US.
ENV LANGUAGE en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
ENV LC_CTYPE en_US.UTF-8
ENV LC_MESSAGES en_US.UTF-8

# Disable noisy "Handling signal" log messages:
# ENV GUNICORN_CMD_ARGS --log-level WARNING

COPY id_rsa ~/.ssh/id_rsa
COPY id_rsa.pub ~/.ssh/id_rsa.pub
RUN set -ex \
    && buildDeps=' \
        freetds-dev \
        libkrb5-dev \
        libsasl2-dev \
        libssl-dev \
        libffi-dev \
        libpq-dev \
        git \
    ' \
    && apt-get update -yqq \
    && apt-get upgrade -yqq \
    && apt-get install -yqq --no-install-recommends \
        $buildDeps \
        freetds-bin \
        build-essential \
        default-libmysqlclient-dev \
        apt-utils \
        curl \
        rsync \
        netcat \
        locales \
        git  \
        openssh-server \
    && sed -i 's/^# en_US.UTF-8 UTF-8$/en_US.UTF-8 UTF-8/g' /etc/locale.gen \
    && locale-gen \
    && update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
    && useradd -ms /bin/bash -d ${AIRFLOW_USER_HOME} airflow \
    && pip install -U pip setuptools wheel \
    && pip install pytz \
    && pip install pyOpenSSL \
    && pip install ndg-httpsclient \
    && pip install pyasn1 \
    && pip install dataclasses \
    && pip install apache-airflow==1.10.10 \
 --constraint https://raw.githubusercontent.com/apache/airflow/1.10.10/requirements/requirements-python3.7.txt \
    && pip install 'redis==3.2' \
    && if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \
    && apt-get purge --auto-remove -yqq $buildDeps \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf \
        /var/lib/apt/lists/* \
        /tmp/* \
        /var/tmp/* \
        /usr/share/man \
        /usr/share/doc \
        /usr/share/doc-base

COPY entrypoint.sh /entrypoint.sh
COPY airflow.cfg ${AIRFLOW_USER_HOME}/airflow.cfg


RUN chown -R airflow: ${AIRFLOW_USER_HOME}

EXPOSE 8080 5555 8793

USER airflow
WORKDIR ${AIRFLOW_USER_HOME}
ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"]
error
File "/usr/local/lib/python3.7/logging/config.py", line 571, in configure
webserver_1  |     '%r' % name) from e
webserver_1  | ValueError: Unable to configure handler 'processor'
dvirgiln
@dvirgiln

Hi, I am receiving an error in the web server UI:

DAG "as400_csv_daily" seems to be missing.

It happens every time I click on a dag to see its details.

The dags folder is different from the default one:

dags_folder = /git/dags/dags

I went into the container and checked that the dags are placed in that folder.

I can see in the logs just this entry:

Filling up the DagBag from /dev/null
Joao Da Silva
@jsilva
Hi, I am looking at the docs at https://airflow.apache.org/docs/stable/executor/celery.html, specifically the section "Workers –> DAG files - Reveal the DAG structure and execute the tasks", and I would like to know if workers can handle serialised DAGs as well? Thanks
Marwane Bel-lahcen
@bellahcenmarwane
hello
I get this error when sending mail:
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
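That socket.gaierror from getaddrinfo means the configured SMTP host name could not be resolved from the Airflow host. For reference, these are the airflow.cfg keys involved (values here are placeholders):

[smtp]
# placeholder host; the gaierror above means this name could not be resolved
smtp_host = smtp.example.com
smtp_starttls = True
smtp_ssl = False
smtp_port = 587
smtp_mail_from = airflow@example.com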
Ash Berlin-Taylor
@ashb
@jsilva No, workers don't currently operate on serialized dags; they still need the real dag files to work against.
We can't serialize custom operators, nor (reliably) callbacks or code for PythonOperator, so we took the decision not to change how the workers operate right now
Joao Da Silva
@jsilva
@ashb thank you
dvirgiln
@dvirgiln

Hi,

If I deploy Airflow with Authentication:

  AIRFLOW__WEBSERVER__RBAC: True
  AIRFLOW__WEBSERVER__AUTHENTICATE: True

I receive an error:

DAG "as400_csv_daily" seems to be missing.

This error disappears if I set this variable:

AIRFLOW__CORE__STORE_SERIALIZED_DAGS: True

But then the problem is in the scheduler when the Kubernetes worker has to execute a dag:

ModuleNotFoundError: No module named 'dns_operators'

This dns_operators is a Python package folder inside the project.

Any idea?

dvirgiln
@dvirgiln

@ashb could it be because of the serialized dags? I am using a custom operator with the Kubernetes executor.

I imagine it would be better not to serialise the dags, but then the authentication fails.

Ash Berlin-Taylor
@ashb
Serialisation only affects webserver - executors and scheduler still use dag files
Problem sounds like the missing module
dvirgiln
@dvirgiln

Thanks @ashb, you have helped me a lot just by ruling out scenarios. In this case the problem comes from custom modules defined inside the dags folder.

The DAG execution works fine when everything is deployed standalone or from Docker, but with the Kubernetes executor this is what happens:

  1. It works fine for a simple Dag that uses BashOperators.
  2. It does not work with custom modules that are embedded in the dags folder.

When the dag is sent to the worker, is only the Python file that contains the dag sent, or is the whole dags folder sent?

This is the Kubernetes configuration I set for the workers:

      worker_container_repository = apache/airflow
      worker_container_tag = 1.10.10-python3.7
      worker_container_image_pull_policy = IfNotPresent
dvirgiln
@dvirgiln

What I noticed, and I think it is part of the problem, is this log:

[2020-06-29 17:48:47,124] {dagbag.py:396} INFO - Filling up the DagBag from /git/dags/dags/isa_store_zoning_daily.py

Why is the DagBag being filled from the specific dag file, and not from /git/dags/dags/?

in the scheduler config:
dags_folder = /git/dags/dags
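For context, Airflow puts the dags folder itself on sys.path when it parses DAG files, which is why a sibling package such as dns_operators imports cleanly wherever the whole folder is available. A minimal, hypothetical sketch of that import inside a DAG file:

# Hypothetical layout under dags_folder = /git/dags/dags:
#   isa_store_zoning_daily.py       <- the DAG file
#   dns_operators/__init__.py       <- the custom module mentioned above
try:
    from dns_operators.my_custom_operator import MyCustomOperator  # names hypothetical
except ImportError as err:
    # On a Kubernetes worker this fails unless the dns_operators directory
    # is shipped into the pod together with the DAG file.
    print("custom module not importable here: {}".format(err))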
Sri-nidhi
@Sri-nidhi
I have a plugin registered with Airflow. Even after removing the plugin script from the plugins folder, it's still available. How do I deregister a plugin or remove its dependencies? Even after removing all its contents, it still picks up the data from the __pycache__ folder (in spite of removing that too).
Joao Da Silva
@jsilva
Hello, I'm using the remote logging option with remote_base_log_folder storing the log files on S3. This adds the logs to S3 but does not remove them locally. I was wondering how other people are dealing with large amounts of log files filling up their disks in production. Any tips, please?
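For reference, the 1.10 settings in question look roughly like this (bucket and connection names are placeholders); as noted above, Airflow only uploads the files, so pruning the local copies has to be handled separately, e.g. with a periodic cleanup job:

[core]
remote_logging = True
remote_base_log_folder = s3://my-airflow-logs/logs
remote_log_conn_id = my_s3_conn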
simisoz
@simisoz
How to fix this error from the production Docker images: ModuleNotFoundError: No module named 'airflow.providers'
@alltej
@llntjn_twitter
I have the Admin role in Airflow. I just noticed that I cannot pause/unpause a DAG in the Airflow UI. I am using v1.10.10.
I can pause or unpause it using the airflow cli.