airflow create_user
Hi all, a question regarding task parallelism.
My configuration:
parallelism = 2000
dag_concurrency = 100
worker_concurrency = 100
max_threads = 20
workers = 20
but why is the maximum number of running tasks in a DAG only 14?
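For reference, besides the scheduler/worker settings above, Airflow 1.10 also caps running tasks per DAG (`dag_concurrency`, overridable per DAG via `concurrency`), per number of active DAG runs (`max_active_runs`), and per pool, so any one of those can hold the observed count well below `parallelism`. A minimal sketch of the DAG-level knobs (names and numbers are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Hypothetical DAG showing the per-DAG limits that apply on top of the
# global `parallelism` and Celery `worker_concurrency` settings.
dag = DAG(
    dag_id="concurrency_example",        # hypothetical name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    concurrency=16,       # max task instances of this DAG running at once
    max_active_runs=1,    # max simultaneous runs of this DAG
)

for i in range(50):
    DummyOperator(
        task_id="task_{}".format(i),
        pool="default_pool",   # pool slots are yet another ceiling on running tasks
        dag=dag,
    )
```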
PermissionError: [Errno 13] Permission denied: '/home/airflow'
# VERSION 1.10.10
# AUTHOR: Matthieu "Puckel_" Roisil
# DESCRIPTION: Basic Airflow container
# BUILD: docker build --rm -t puckel/docker-airflow .
# SOURCE: https://github.com/puckel/docker-airflow
FROM python:3.7-slim-buster
LABEL maintainer="Puckel_"
# Never prompt the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux
# Airflow
ARG AIRFLOW_VERSION=1.10.10
ARG AIRFLOW_USER_HOME=/usr/local/airflow
ARG AIRFLOW_DEPS=""
ARG PYTHON_DEPS=""
ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME}
# Define en_US.
ENV LANGUAGE en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
ENV LC_CTYPE en_US.UTF-8
ENV LC_MESSAGES en_US.UTF-8
# Disable noisy "Handling signal" log messages:
# ENV GUNICORN_CMD_ARGS --log-level WARNING
# NOTE: "~" is not expanded by Dockerfile COPY; use an absolute destination
COPY id_rsa ${AIRFLOW_USER_HOME}/.ssh/id_rsa
COPY id_rsa.pub ${AIRFLOW_USER_HOME}/.ssh/id_rsa.pub
RUN set -ex \
&& buildDeps=' \
freetds-dev \
libkrb5-dev \
libsasl2-dev \
libssl-dev \
libffi-dev \
libpq-dev \
git \
' \
&& apt-get update -yqq \
&& apt-get upgrade -yqq \
&& apt-get install -yqq --no-install-recommends \
$buildDeps \
freetds-bin \
build-essential \
default-libmysqlclient-dev \
apt-utils \
curl \
rsync \
netcat \
locales \
git \
openssh-server \
&& sed -i 's/^# en_US.UTF-8 UTF-8$/en_US.UTF-8 UTF-8/g' /etc/locale.gen \
&& locale-gen \
&& update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
&& useradd -ms /bin/bash -d ${AIRFLOW_USER_HOME} airflow \
&& pip install -U pip setuptools wheel \
&& pip install pytz \
&& pip install pyOpenSSL \
&& pip install ndg-httpsclient \
&& pip install pyasn1 \
&& pip install dataclasses \
&& pip install apache-airflow==${AIRFLOW_VERSION} \
--constraint "https://raw.githubusercontent.com/apache/airflow/${AIRFLOW_VERSION}/requirements/requirements-python3.7.txt" \
&& pip install 'redis==3.2' \
&& if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \
&& apt-get purge --auto-remove -yqq $buildDeps \
&& apt-get autoremove -yqq --purge \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/* \
/usr/share/man \
/usr/share/doc \
/usr/share/doc-base
COPY entrypoint.sh /entrypoint.sh
COPY airflow.cfg ${AIRFLOW_USER_HOME}/airflow.cfg
RUN chown -R airflow: ${AIRFLOW_USER_HOME}
EXPOSE 8080 5555 8793
USER airflow
WORKDIR ${AIRFLOW_USER_HOME}
ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"]
File "/usr/local/lib/python3.7/logging/config.py", line 571, in configure
webserver_1 | '%r' % name) from e
webserver_1 | ValueError: Unable to configure handler 'processor'
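A ValueError like this usually just wraps an underlying PermissionError: the 'processor' handler writes DAG-processing logs under the base log folder, and if the Airflow home/log directory is not writable by the user running the process, handler setup fails. A simplified sketch of the failure mode (not Airflow's actual logging code; the path mirrors the error pasted above):

```python
import logging
import os

# Simplified illustration: Airflow's file-based log handlers create their
# log folder on setup. If AIRFLOW_HOME resolves to a directory the process
# cannot write to (here /home/airflow, as in the PermissionError above),
# makedirs raises PermissionError and logging.config reports it as
# "Unable to configure handler 'processor'".
base_log_folder = "/home/airflow/logs"   # hypothetical value for illustration

os.makedirs(base_log_folder, exist_ok=True)   # -> PermissionError: [Errno 13]
handler = logging.FileHandler(os.path.join(base_log_folder, "processor.log"))  # never reached
```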
Hi, I am receiving an error in the web server UI:
DAG "as400_csv_daily" seems to be missing.
It happens every time I click on a DAG to see its details.
The DAGs folder is different from the default one:
dags_folder = /git/dags/dags
I entered the container and checked that the DAGs are placed in that folder.
The only entry I can see in the logs is:
Filling up the DagBag from /dev/null
Workers -> DAG files - Reveal the DAG structure and execute the tasks
and I would like to know if workers can handle serialised DAGs as well? Thanks
Hi,
If I deploy Airflow with Authentication:
AIRFLOW__WEBSERVER__RBAC: True
AIRFLOW__WEBSERVER__AUTHENTICATE: True
I receive an error:
DAG "as400_csv_daily" seems to be missing.
This error disappears if I set this variable:
AIRFLOW__CORE__STORE_SERIALIZED_DAGS: True
But then the problem appears in the scheduler when the Kubernetes worker has to execute a DAG:
ModuleNotFoundError: No module named 'dns_operators'
This dns_operators is a Python package folder inside the project.
Any idea?
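For illustration, a sketch of the kind of import that triggers this error (the module and operator names below are hypothetical): the DAG file imports a sibling package from the DAGs folder, which resolves wherever the whole folder is present, but fails on a Kubernetes worker whose image or mount does not contain that package.

```python
# /git/dags/dags/as400_csv_daily.py -- hypothetical layout matching the dags_folder above
from datetime import datetime

from airflow import DAG

# The dags_folder is put on sys.path, so a sibling package such as
# dns_operators/ is importable wherever the *whole* folder is available.
# On a Kubernetes worker that only has the base image (or a partial mount),
# this line raises: ModuleNotFoundError: No module named 'dns_operators'
from dns_operators.example_operator import ExampleOperator  # hypothetical module/class

dag = DAG(
    dag_id="as400_csv_daily",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
)

task = ExampleOperator(task_id="run_export", dag=dag)  # hypothetical task
```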
Thanks @ashb, you have helped me a lot just by ruling out scenarios. In this case the problem comes from custom modules that are defined inside the dags folder.
DAG execution works fine when everything is deployed standalone or from Docker, but with the Kubernetes executor the question is:
When the DAG is sent to the worker, is only the Python file that contains the DAG sent, or is the whole dags folder sent?
This is the Kubernetes configuration I set for the workers:
worker_container_repository = apache/airflow
worker_container_tag = 1.10.10-python3.7
worker_container_image_pull_policy = IfNotPresent
dags_folder = /git/dags/dags