A Look at Containerized Installation of Airflow 2.2.3
The previous post gave a quick overview of Airflow's concepts and use cases. Today we'll install Airflow with Docker and, by actually using it, get a closer look at what it can do.
Airflow Containerized Deployment
Host environment on Alibaba Cloud:
- OS: Ubuntu 20.04.3 LTS
- Kernel: Linux 5.4.0-91-generic
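You can confirm your own host matches with a quick check (the outputs shown are the versions above):

```bash
lsb_release -d   # Description: Ubuntu 20.04.3 LTS
uname -r         # 5.4.0-91-generic
```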
 
Install Docker
To install Docker, refer to the official documentation[1]. On a clean system there is no need to uninstall old versions first. Since this is a cloud platform, you can take a snapshot beforehand in case a configuration mistake breaks the environment.
```bash
# Update the package index and install prerequisites
sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# Set up the Docker stable repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# List the installable docker-ce versions
root@bigdata1:~# apt-cache madison docker-ce
docker-ce | 5:20.10.12~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.11~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.10~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.9~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages

# Install command format:
# sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io

# Install a specific version
sudo apt-get install docker-ce=5:20.10.12~3-0~ubuntu-focal docker-ce-cli=5:20.10.12~3-0~ubuntu-focal containerd.io
```
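Before moving on, it's worth a quick sanity check that the daemon runs, and optionally pinning the packages so a routine `apt-get upgrade` doesn't silently bump the Docker version (`hello-world` is Docker's standard smoke-test image):

```bash
# Confirm client and daemon are both up and report the expected version
sudo docker version

# Smoke test: pulls and runs Docker's hello-world image
sudo docker run --rm hello-world

# Optional: hold the packages at the version installed above
sudo apt-mark hold docker-ce docker-ce-cli
```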
 
Optimize the Docker Configuration
Write the following to /etc/docker/daemon.json. In registry-mirrors, fill in one or more registry accelerator addresses in place of https://****.mirror.aliyuncs.com, such as the one Alibaba Cloud assigns you:
```json
{
  "data-root": "/var/lib/docker",
  "exec-opts": [
    "native.cgroupdriver=systemd"
  ],
  "registry-mirrors": [
    "https://****.mirror.aliyuncs.com"
  ],
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
```
 
Configure Docker to start on boot
```bash
systemctl daemon-reload
systemctl enable --now docker.service
```
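If the Docker daemon was already running before daemon.json was edited (apt starts it on install), `enable --now` will not reload it, so restart explicitly and confirm the settings took effect:

```bash
sudo systemctl restart docker.service
systemctl is-enabled docker.service   # should print "enabled"

# Verify daemon.json was picked up
docker info | grep -E 'Storage Driver|Cgroup Driver|Docker Root Dir'
```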
 
Containerized Installation of Airflow
Choosing a Database
According to the official documentation, MySQL 8+ or PostgreSQL 9.6+ is recommended for the metadata database. The official docker-compose script[2] uses PostgreSQL, so to run on MySQL we need to adjust the contents of docker-compose.yaml.
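Before editing the compose file, prepare the host paths it will mount for MySQL. The sketch below also writes a minimal my.cnf that enables `explicit_defaults_for_timestamp`, which Airflow's MySQL setup guide calls for (it is already the default in MySQL 8, so this mainly guards against config drift; extend with your own tuning as needed):

```bash
# Host directory/file mounted by the mysql service below
sudo mkdir -p /apps/airflow/mysqldata8

# Minimal my.cnf sketch, assuming no other server tuning is needed
sudo tee /apps/airflow/my.cnf > /dev/null <<'EOF'
[mysqld]
explicit_defaults_for_timestamp = 1
EOF
```

The adjusted docker-compose.yaml: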
```yaml
---
version: '3'
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.2.3}
  # build: .
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:aaaa@mysql/airflow # changed to the MySQL connection string
    AIRFLOW__CELERY__RESULT_BACKEND: db+mysql://airflow:aaaa@mysql/airflow # changed to the MySQL connection string
    AIRFLOW__CELERY__BROKER_URL: redis://:xxxx@redis:6379/0 # Redis auth is enabled for safety; replace xxxx with the Redis password
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    mysql: # changed to depend on the mysql service
      condition: service_healthy

services:
  mysql:
    image: mysql:8.0.27 # the latest MySQL 8 image at the time of writing
    environment:
      MYSQL_ROOT_PASSWORD: bbbb # MySQL root password
      MYSQL_USER: airflow
      MYSQL_PASSWORD: aaaa # password of the airflow user
      MYSQL_DATABASE: airflow
    command:
      - --default-authentication-plugin=mysql_native_password # default authentication plugin
      - --collation-server=utf8mb4_general_ci # collation per the official recommendation
      - --character-set-server=utf8mb4 # character set per the official recommendation
    volumes:
      - /apps/airflow/mysqldata8:/var/lib/mysql # persist MySQL data
      - /apps/airflow/my.cnf:/etc/my.cnf # persist the MySQL config file
    healthcheck:
      test: mysql --user=$$MYSQL_USER --password=$$MYSQL_PASSWORD -e 'SHOW DATABASES;' # healthcheck command
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:6.2
    expose:
      - 6379
    command: redis-server --requirepass xxxx # start redis-server with password auth
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "xxxx", "ping"] # healthcheck using the password
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    environment:
      <<: *airflow-common-env
      # Required to handle warm shutdown of the celery workers properly
      # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
      DUMB_INIT_SETSID: "0"
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-triggerer:
    <<: *airflow-common
    command: triggerer
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    # yamllint disable rule:line-length
    command:
      - -c
      - |
        function ver() {
          printf "%04d%04d%04d%04d" $${1//./ }
        }
        airflow_version=$$(gosu airflow airflow version)
        airflow_version_comparable=$$(ver $${airflow_version})
        min_airflow_version=2.2.0
        min_airflow_version_comparable=$$(ver $${min_airflow_version})
        if (( airflow_version_comparable < min_airflow_version_comparable )); then
          echo
          echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
          echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
          echo
          exit 1
        fi
        if [[ -z "${AIRFLOW_UID}" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
          echo "If you are on Linux, you SHOULD follow the instructions below to set "
          echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
          echo "For other operating systems you can get rid of the warning with manually created .env file:"
          echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#setting-the-right-airflow-user"
          echo
        fi
        one_meg=1048576
        mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
        cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
        disk_available=$$(df / | tail -1 | awk '{print $$4}')
        warning_resources="false"
        if (( mem_available < 4000 )) ; then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
          echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
          echo
          warning_resources="true"
        fi
        if (( cpus_available < 2 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
          echo "At least 2 CPUs recommended. You have $${cpus_available}"
          echo
          warning_resources="true"
        fi
        if (( disk_available < one_meg * 10 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
          echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
          echo
          warning_resources="true"
        fi
        if [[ $${warning_resources} == "true" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
          echo "Please follow the instructions to increase amount of resources available:"
          echo "   https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#before-you-begin"
          echo
        fi
        mkdir -p /sources/logs /sources/dags /sources/plugins
        chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
        exec /entrypoint airflow version
    # yamllint enable rule:line-length
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
    user: "0:0"
    volumes:
      - .:/sources

  airflow-cli:
    <<: *airflow-common
    profiles:
      - debug
    environment:
      <<: *airflow-common-env
      CONNECTION_CHECK_MAX_COUNT: "0"
    # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
    command:
      - bash
      - -c
      - airflow

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
```
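One caveat worth noting: the file keeps the official default of an empty `AIRFLOW__CORE__FERNET_KEY`, which means credentials stored in Airflow connections are saved unencrypted. If you want them encrypted, you can generate a key with the command from the Airflow docs (requires the `cryptography` package) and set it in that variable:

```bash
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
```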
 
Only the x-airflow-common, MySQL, and Redis configuration were changed relative to the official docker-compose.yaml. The next step is to start the containers, but before doing so, a few directories need to be created for persistence:
```bash
mkdir -p ./dags ./logs ./plugins
# NOTE: AIRFLOW_UID must be the UID of a regular (non-root) user,
# and that user must have permission to create these persistence directories
echo -e "AIRFLOW_UID=$(id -u)" > .env
```
 
If it is not a regular user's UID, the containers will fail at startup with an error saying the airflow module cannot be found.
```bash
docker-compose up airflow-init   # initialize the database and create the tables
docker-compose up -d             # create and start the Airflow containers
```
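Once the stack is up, a quick way to confirm everything is healthy (the web UI listens on port 8080, and the credentials default to airflow/airflow via `_AIRFLOW_WWW_USER_USERNAME`/`_AIRFLOW_WWW_USER_PASSWORD` above):

```bash
docker-compose ps                      # every service should show healthy/running
curl -s http://localhost:8080/health   # the webserver health endpoint used by its healthcheck
```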
 
If a container's status shows as unhealthy, use docker inspect $container_name to find out what went wrong; a couple of starting points are sketched below. With that, the Airflow installation is complete.
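For example, assuming a container named like `airflow_airflow-webserver_1` (the exact name depends on your project directory):

```bash
# Dump the recorded health-check attempts and their output
docker inspect --format '{{json .State.Health}}' airflow_airflow-webserver_1

# Or tail the logs of a specific service
docker-compose logs --tail=100 airflow-webserver
```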
References
[1] Install Docker Engine on Ubuntu: https://docs.docker.com/engine/install/ubuntu/
[2] Official docker-compose.yaml: https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml