Is there an Airflow CLI tool to upload a DAG?
What is bind_password in Apache Airflow? And why is the official documentation asking to set it to insecure?
Hi Experts,
I have Apache Airflow running on an EC2 instance (Ubuntu). Everything is running fine. The DB is SQLite and the executor is the default SequentialExecutor. But now I would like to run some DAGs that need to run at the same time, every hour and every 2 minutes. My question is: how can I upgrade my current setup to the CeleryExecutor and a Postgres DB to get the advantage of parallel execution?
Will it work if I install and set up Postgres, RabbitMQ and Celery, and then make the necessary changes in the airflow.cfg configuration file?
Or do I need to re-install everything from scratch (including Airflow)?
Please guide me on this.
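For context, here is a minimal sketch of the airflow.cfg changes such a switch typically involves; the Postgres and RabbitMQ connection strings are placeholders for this example, and exact section/key names can differ slightly between Airflow versions:

[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

[celery]
broker_url = amqp://guest:guest@localhost:5672//
result_backend = db+postgresql://airflow:airflow@localhost:5432/airflow

After pointing Airflow at the new database you would also initialise it (airflow db init in 2.x, airflow initdb in 1.10) and start one or more Celery workers (airflow celery worker in 2.x, airflow worker in 1.10) alongside the scheduler and webserver.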
Hi all -
I'm trying to deploy Airflow on Kubernetes, and I'm running into this error:
Traceback (most recent call last):
File "/app/py3ven/bin/airflow", line 21, in <module>
from airflow import configuration
File "/app/py3ven/lib/python3.6/site-packages/airflow/__init__.py", line 31, in <module>
from airflow.utils.log.logging_mixin import LoggingMixin
File "/app/py3ven/lib/python3.6/site-packages/airflow/utils/__init__.py", line 24, in <module>
from .decorators import apply_defaults as _apply_defaults
File "/app/py3ven/lib/python3.6/site-packages/airflow/utils/decorators.py", line 34, in <module>
from airflow import settings
File "/app/py3ven/lib/python3.6/site-packages/airflow/settings.py", line 83, in <module>
prefix=conf.get('scheduler', 'statsd_prefix'))
File "/app/py3ven/lib/python3.6/site-packages/statsd/client/udp.py", line 35, in __init__
host, port, fam, socket.SOCK_DGRAM)[0]
File "/usr/local/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -5] No address associated with hostname
My config is as follows:
base_url = http://localhost:8080
web_server_host = 0.0.0.0
web_server_port = 8080
Does anybody know what could be wrong?
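For what it's worth, the traceback shows the StatsD client being constructed while settings are loaded, so the keys to check are the statsd ones; in this Airflow version they live in the [scheduler] section. A sketch of the relevant block (the Kubernetes service name below is only an example; any hostname that resolves from inside the pod will do, or the feature can simply be switched off):

[scheduler]
statsd_on = False
# or, to keep metrics, point it at a resolvable endpoint:
# statsd_on = True
# statsd_host = statsd.default.svc.cluster.local
# statsd_port = 8125
# statsd_prefix = airflow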
To run the jar this way, you need to:
Either change the Spark master address in the template projects or simply delete it. Currently it is hard-coded to local[4], which means run locally with 4 cores.
Change the dependency packaging scope of Apache Spark from "compile" to "provided". This is a common packaging strategy in Maven and SBT that means Spark is not packaged into your fat jar; otherwise you may end up with a huge jar and version conflicts (see the sketch after this list).
Make sure the dependency versions in build.sbt and pom.xml are consistent with your Spark version.
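For illustration, a minimal build.sbt fragment with Spark marked as provided (the Spark and Scala versions here are just examples; match them to your cluster):

scalaVersion := "2.12.15"
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.2" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql"  % "3.1.2" % "provided"

The Maven equivalent is adding <scope>provided</scope> to the spark-core / spark-sql dependencies in pom.xml.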
I'm trying to run Airflow 2.0 as a systemd service on Ubuntu 18.04.
Service file path:
sudo vi /etc/systemd/system/airflow-webserver.service
This is my unit file:
[Unit]
Description=Airflow webserver daemon
After=network.target
#Wants=postgresql-10.service
#After=network.target mysql.service
#Wants=mysql.service
[Service]
#EnvironmentFile=/usr/local/bin/airflow
#EnvironmentFile=/etc/environment
#Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
#User=airflow
#Environment="PATH=/home/ubuntu/python/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
#Group=airflow
#RuntimeDirectory=/home/ubuntu/airflow
#RuntimeDirectoryMode=0775
Environment=PATH=/home/airflow/.local/bin:$PATH
PIDFile=/home/ubuntu/airflow/airflow.pid
Type=simple
#ExecStart=/home/ubuntu/python/envs/airflow/bin/airflow webserver -p 8080 --pid /home/ubuntu/airflow/airflow-webserver.pid
ExecStart=/usr/local/bin/airflow webserver --pid /home/ubuntu/airflow/airflow.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
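For completeness, the unit is loaded and started with the usual systemd commands (nothing Airflow-specific here):

sudo systemctl daemon-reload
sudo systemctl enable airflow-webserver.service
sudo systemctl restart airflow-webserver.service
sudo systemctl status airflow-webserver.service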
This is the error I'm getting:
● airflow-webserver.service - Airflow webserver daemon
Loaded: loaded (/etc/systemd/system/airflow-webserver.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-02-02 12:22:49 UTC; 4s ago
Main PID: 15693 (airflow)
Tasks: 2 (limit: 4402)
CGroup: /system.slice/airflow-webserver.service
└─15693 /usr/bin/python3 /usr/local/bin/airflow webserver --pid /home/ubuntu/airflow/airflow.pid
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: [SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?, ?, ?, ?, ?, ?, ?)]
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: [parameters: ('2021-02-02 12:22:51.772511', None, None, 'cli_webserver', None, 'root', '{"host_name": "ip-150-31-11-187", "full_c
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: (Background on this error at: http://sqlalche.me/e/13/e3q8)
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: ____________ _____________
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: ____ |__( )_________ __/__ /________ __
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: [2021-02-02 12:22:51,815] {dagbag.py:440} INFO - Filling up the DagBag from /dev/null
Feb 02 12:22:51 ip-150-31-11-187 airflow[15693]: [2021-02-02 12:22:51,867] {manager.py:727} WARNING - No user yet created, use flask fab command to do it.
@groodt @srikrishnavamsi_twitter
airflow-init_1 | Traceback (most recent call last):
airflow-init_1 | File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1 | from airflow.__main__ import main
airflow-init_1 | ModuleNotFoundError: No module named 'airflow'
Hello guys, I'm new to Airflow and trying to integrate it with RBAC + LDAP, but I'm facing the error below.
airflow version : 2.0.1
mysql version : 5.7.25
DEBUG - LDAP indirect bind with: CN=abcd,OU=SERVICE ACCT,DC=,DC=,DC=*
[2021-03-29 19:03:29,860] {manager.py:834} DEBUG - LDAP BIND indirect OK
[2021-03-29 19:03:29,861] {manager.py:853} DEBUG - LDAP bind failure: user not found
[2021-03-29 19:03:29,932] {manager.py:226} INFO - Updated user Sameer Sharma
[2021-03-29 19:03:29,932] {manager.py:928} WARNING - Login Failed for user: Sameer3.Sharma
Can anyone help with this one?
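For reference, this is roughly the shape of the LDAP block in webserver_config.py that the Airflow 2.0 RBAC UI (Flask-AppBuilder) reads; the server, search base and bind DN below are placeholders, and a "user not found" bind failure like the one above usually means AUTH_LDAP_SEARCH / AUTH_LDAP_UID_FIELD do not match how the user entries are stored in the directory:

from flask_appbuilder.security.manager import AUTH_LDAP

AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SERVER = "ldap://ldap.example.com:389"    # placeholder
AUTH_LDAP_SEARCH = "OU=Users,DC=example,DC=com"     # base DN where user entries live
AUTH_LDAP_UID_FIELD = "sAMAccountName"              # attribute matched against the login name
AUTH_LDAP_BIND_USER = "CN=abcd,OU=SERVICE ACCT,DC=example,DC=com"   # service account for the indirect bind
AUTH_LDAP_BIND_PASSWORD = "change-me"               # its password
AUTH_USER_REGISTRATION = True                       # create the Airflow user on first successful login
AUTH_USER_REGISTRATION_ROLE = "Viewer"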
/home/ubuntu/.pyenv/versions/3.8.10/envs/airflowenv/lib/python3.8/site-packages/pandas/compat/__init__.py:109: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
(Airflow ASCII-art banner)
[2021-07-08 10:00:01,260] {dagbag.py:496} INFO - Filling up the DagBag from /dev/null
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
I set the DAG folder path in airflow.cfg like this:
[core]
dags_folder = /home/ubuntu/airflow/dags
[2021-08-17 07:07:15,579] {jobs.py:1109} INFO - Tasks up for execution:
<TaskInstance: Dashboard_C2C.Dashboard_C2C 2021-08-17 06:40:14.195012+00:00 [scheduled]>
[2021-08-17 07:07:15,583] {jobs.py:1144} INFO - Figuring out tasks to run in Pool(name=None) with 128 open slots and 1 task instances in queue
[2021-08-17 07:07:15,586] {jobs.py:1180} INFO - DAG Dashboard_C2C has 0/2 running and queued tasks
[2021-08-17 07:07:15,586] {jobs.py:1218} INFO - Setting the follow tasks to queued state:
<TaskInstance: Dashboard_C2C.Dashboard_C2C 2021-08-17 06:40:14.195012+00:00 [scheduled]>
[2021-08-17 07:07:15,597] {jobs.py:1301} INFO - Setting the follow tasks to queued state:
<TaskInstance: Dashboard_C2C.Dashboard_C2C 2021-08-17 06:40:14.195012+00:00 [queued]>
[2021-08-17 07:07:15,597] {jobs.py:1343} INFO - Sending ('Dashboard_C2C', 'Dashboard_C2C', datetime.datetime(2021, 8, 17, 6, 40, 14, 195012, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) to executor with priority 1 and queue default
[2021-08-17 07:07:15,598] {base_executor.py:56} INFO - Adding to queue: airflow run Dashboard_C2C Dashboard_C2C 2021-08-17T06:40:14.195012+00:00 --local -sd /root/airflow-dags/dags/Dashboard_C2C.py
Process QueuedLocalWorker-2:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/lib/python3.6/dist-packages/airflow/executors/local_executor.py", line 113, in run
key, command = self.task_queue.get()
File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
TypeError: __init__() missing 5 required positional arguments: 'tz', 'utc_offset', 'is_dst', 'dst', and 'abbrev'