Hi Experts,
I have Apache Airflow running on an EC2 instance (Ubuntu). Everything is running fine. The DB is SQLite and the executor is the Sequential Executor (the provided default). But now I would like to run some DAGs that need to run at the same time, every hour and every 2 minutes. My question is: how can I upgrade my current setup to the Celery executor and a Postgres DB to take advantage of parallel execution?
Will it work if I install and set up Postgres, RabbitMQ, and Celery, and then make the necessary changes in the airflow.cfg configuration file?
Or do I need to reinstall everything from scratch (including Airflow)?
Please guide me on this.
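For reference, a minimal sketch of the airflow.cfg changes such an upgrade usually involves; all connection strings below are placeholders, and depending on your Airflow version sql_alchemy_conn may live under [database] instead of [core]:

[core]
executor = CeleryExecutor
# metadata DB moves from SQLite to Postgres
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow

[celery]
# RabbitMQ as the message broker, Postgres as the Celery result backend
broker_url = amqp://guest:guest@localhost:5672//
result_backend = db+postgresql://airflow_user:airflow_pass@localhost:5432/airflow

You would also need the Celery and Postgres extras installed (e.g. pip install 'apache-airflow[celery,postgres]') and the metadata DB re-initialised against Postgres (airflow db init on 2.x, airflow initdb on 1.10.x).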
Hi all -
I'm trying to deploy Airflow on Kubernetes, and I'm running into this error:
Traceback (most recent call last):
File "/app/py3ven/bin/airflow", line 21, in <module>
from airflow import configuration
File "/app/py3ven/lib/python3.6/site-packages/airflow/__init__.py", line 31, in <module>
from airflow.utils.log.logging_mixin import LoggingMixin
File "/app/py3ven/lib/python3.6/site-packages/airflow/utils/__init__.py", line 24, in <module>
from .decorators import apply_defaults as _apply_defaults
File "/app/py3ven/lib/python3.6/site-packages/airflow/utils/decorators.py", line 34, in <module>
from airflow import settings
File "/app/py3ven/lib/python3.6/site-packages/airflow/settings.py", line 83, in <module>
prefix=conf.get('scheduler', 'statsd_prefix'))
File "/app/py3ven/lib/python3.6/site-packages/statsd/client/udp.py", line 35, in __init__
host, port, fam, socket.SOCK_DGRAM)[0]
File "/usr/local/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -5] No address associated with hostname
My config is as follows:
base_url = http://localhost:8080
web_server_host = 0.0.0.0
web_server_port = 8080
Does anybody know what could be wrong?
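The traceback is raised while settings.py reads the StatsD options (conf.get('scheduler', 'statsd_prefix')), so the relevant airflow.cfg keys are the StatsD ones rather than the webserver block quoted above. A sketch of those keys with placeholder values, assuming this Airflow version keeps them under [scheduler]:

[scheduler]
# statsd_host must resolve from inside the pod; the gaierror suggests it currently does not,
# but that is an inference from the stack trace, not something visible in the config shown
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow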
To run the jar in this way, you need to:
Either change the Spark master address in the template projects or simply delete it. Currently it is hard-coded to local[4], which means run locally with 4 cores.
Change the dependency packaging scope of Apache Spark from "compile" to "provided" (see the sketch after this list). This is a common packaging strategy in Maven and SBT that keeps Spark out of your fat jar; otherwise you may end up with a huge jar and version conflicts.
Make sure the dependency versions in build.sbt and pom.xml are consistent with your Spark version.
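For illustration, one way the "provided" scope looks in build.sbt; the artifact and version are placeholders to be matched to the Spark you actually run against:

// Spark is supplied by the cluster at runtime, so it is left out of the fat jar
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided"

The Maven equivalent is setting <scope>provided</scope> on the same Spark dependencies in pom.xml.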
I'm trying to run Airflow 2.0 as a service (systemd) on Ubuntu 18.04.
Service file path:
sudo vi /etc/systemd/system/airflow-webserver.service
This is my unit file:
[Unit]
Description=Airflow webserver daemon
After=network.target
#Wants=postgresql-10.service
#After=network.target mysql.service
#Wants=mysql.service
[Service]
#EnvironmentFile=/usr/local/bin/airflow
#EnvironmentFile=/etc/environment
#Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
#User=airflow
#Environment="PATH=/home/ubuntu/python/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
#Group=airflow
#RuntimeDirectory=/home/ubuntu/airflow
#RuntimeDirectoryMode=0775
Environment=PATH=/home/airflow/.local/bin:$PATH
PIDFile=/home/ubuntu/airflow/airflow.pid
Type=simple
#ExecStart=/home/ubuntu/python/envs/airflow/bin/airflow webserver -p 8080 --pid /home/ubuntu/airflow/airflow-webserver.pid
ExecStart=/usr/local/bin/airflow webserver --pid /home/ubuntu/airflow/airflow.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
This is the error I'm getting:
● airflow-webserver.service - Airflow webserver daemon
Loaded: loaded (/etc/systemd/system/airflow-webserver.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-02-02 12:22:49 UTC; 4s ago
Main PID: 15693 (airflow)
Tasks: 2 (limit: 4402)
CGroup: /system.slice/airflow-webserver.service
└─15693 /usr/bin/python3 /usr/local/bin/airflow webserver --pid /home/ubuntu/airflow/airflow.pid
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: [SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?, ?, ?, ?, ?, ?, ?)]
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: [parameters: ('2021-02-02 12:22:51.772511', None, None, 'cli_webserver', None, 'root', '{"host_name": "ip-150-31-11-187", "full_c
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: (Background on this error at: http://sqlalche.me/e/13/e3q8)
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: ____________ _____________
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: ____ |__( )_________ __/__ /________ __
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: [2021-02-02 12:22:51,815] {dagbag.py:440} INFO - Filling up the DagBag from /dev/null
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: [2021-02-02 12:22:51,867] {manager.py:727} WARNING - No user yet created, use flask fab command to do it.
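Two details of the unit file may be worth checking, though this is a guess from the log (the failing INSERT looks like it is going to a SQLite metadata DB) rather than a confirmed diagnosis: systemd does not expand $PATH inside Environment=, and nothing sets AIRFLOW_HOME, so the service may be reading a different airflow.cfg than your interactive shell does. A sketch of the [Service] lines that would pin both down, with paths assumed from the unit file above:

[Service]
# systemd does not expand $PATH, so spell the search path out in full
Environment="PATH=/home/airflow/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# point the service at the same AIRFLOW_HOME the CLI uses
Environment="AIRFLOW_HOME=/home/ubuntu/airflow"
ExecStart=/usr/local/bin/airflow webserver --pid /home/ubuntu/airflow/airflow.pid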
@groodt @srikrishnavamsi_twitter
airflow-init_1 | Traceback (most recent call last):
airflow-init_1 | File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1 | from airflow.__main__ import main
airflow-init_1 | ModuleNotFoundError: No module named 'airflow'
(the same traceback repeats several times)
Hello guys, I'm new to Airflow. I'm trying to integrate Airflow with RBAC + LDAP but am facing the error below.
airflow version : 2.0.1
mysql version : 5.7.25
DEBUG - LDAP indirect bind with: CN=abcd,OU=SERVICE ACCT,DC=,DC=,DC=*
[2021-03-29 19:03:29,860] {manager.py:834} DEBUG - LDAP BIND indirect OK
[2021-03-29 19:03:29,861] {manager.py:853} DEBUG - LDAP bind failure: user not found
[2021-03-29 19:03:29,932] {manager.py:226} INFO - Updated user Sameer Sharma
[2021-03-29 19:03:29,932] {manager.py:928} WARNING - Login Failed for user: Sameer3.Sharma
Can anyone help with this one?
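For comparison, a minimal webserver_config.py sketch for RBAC + LDAP; every server, DN, and attribute value below is a placeholder. The "LDAP bind failure: user not found" line usually points at the search base / UID-field pair not matching the account being looked up, but that is an assumption from the log rather than a certainty:

# webserver_config.py -- placeholder values, adjust to your directory layout
from flask_appbuilder.security.manager import AUTH_LDAP

AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SERVER = "ldap://ldap.example.com:389"
AUTH_LDAP_SEARCH = "DC=example,DC=com"            # base DN the user search runs under
AUTH_LDAP_UID_FIELD = "sAMAccountName"            # attribute matched against the login name
AUTH_LDAP_BIND_USER = "CN=abcd,OU=SERVICE ACCT,DC=example,DC=com"
AUTH_LDAP_BIND_PASSWORD = "change-me"
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Viewer"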
/home/ubuntu/.pyenv/versions/3.8.10/envs/airflowenv/lib/python3.8/site-packages/pandas/compat/__init__.py:109: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
[Airflow ASCII-art banner]
[2021-07-08 10:00:01,260] {dagbag.py:496} INFO - Filling up the DagBag from /dev/null
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
I set the DAGs folder path in the airflow.cfg file as:
[core]
dags_folder = /home/ubuntu/airflow/dags
[2021-08-17 07:07:15,579] {jobs.py:1109} INFO - Tasks up for execution:
<TaskInstance: Dashboard_C2C.Dashboard_C2C 2021-08-17 06:40:14.195012+00:00 [scheduled]>
[2021-08-17 07:07:15,583] {jobs.py:1144} INFO - Figuring out tasks to run in Pool(name=None) with 128 open slots and 1 task instances in queue
[2021-08-17 07:07:15,586] {jobs.py:1180} INFO - DAG Dashboard_C2C has 0/2 running and queued tasks
[2021-08-17 07:07:15,586] {jobs.py:1218} INFO - Setting the follow tasks to queued state:
<TaskInstance: Dashboard_C2C.Dashboard_C2C 2021-08-17 06:40:14.195012+00:00 [scheduled]>
[2021-08-17 07:07:15,597] {jobs.py:1301} INFO - Setting the follow tasks to queued state:
<TaskInstance: Dashboard_C2C.Dashboard_C2C 2021-08-17 06:40:14.195012+00:00 [queued]>
[2021-08-17 07:07:15,597] {jobs.py:1343} INFO - Sending ('Dashboard_C2C', 'Dashboard_C2C', datetime.datetime(2021, 8, 17, 6, 40, 14, 195012, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) to executor with priority 1 and queue default
[2021-08-17 07:07:15,598] {base_executor.py:56} INFO - Adding to queue: airflow run Dashboard_C2C Dashboard_C2C 2021-08-17T06:40:14.195012+00:00 --local -sd /root/airflow-dags/dags/Dashboard_C2C.py
Process QueuedLocalWorker-2:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/lib/python3.6/dist-packages/airflow/executors/local_executor.py", line 113, in run
key, command = self.task_queue.get()
File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
TypeError: __init__() missing 5 required positional arguments: 'tz', 'utc_offset', 'is_dst', 'dst', and 'abbrev'
Hello All,
Looking for some help with the issue I'm facing after migrating from Airflow 1.10.11 to 2.2.3. I'm unable to execute the DAGs from the UI, as the tasks get stuck in the queued state. I created the pod_template_file as suggested in the migration steps but am still getting the below error:
"airflow.exceptions.AirflowException: Dag 'xxxxxxx' could not be found; either it does not exist or it failed to parse"
But I can see the DAGs inside the webserver and scheduler, in the exact location the task is trying to read from, which is "/opt/airflow/dags/repo/". Surprisingly, I'm able to trigger the DAG from the webserver's command line and the DAG completes successfully. Any help please?
Hi all, looking for some help troubleshooting my Airflow setup.
I have set up Airflow in Azure Cloud (Azure Container Apps) and attached an Azure File Share as an external mount/volume.
The airflow.cfg and webserver_config.py files live in AIRFLOW_HOME (/opt/airflow), which is actually the Azure-mounted file system. The webserver hits the following error on its airflow-webserver.pid file in the same AIRFLOW_HOME (/opt/airflow):
PermissionError: [Errno 1] Operation not permitted: '/opt/airflow/airflow-webserver.pid'
Attached screenshot for reference
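One workaround sketch, assuming the PermissionError comes from the pid file living on the Azure File Share (such shares often reject the chmod/lock operations a pid file needs) rather than from Airflow itself: keep AIRFLOW_HOME on the share but point the pid file at local container storage, using the same --pid flag that appears in the systemd example above.

airflow webserver --port 8080 --pid /tmp/airflow-webserver.pid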