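Monitoring MySQL in the Cloud with Prometheus in Practice
I. Background
MySQL 8.0 is the release Oracle is currently promoting most heavily, with significant changes from architecture to performance. At the same time, as Kubernetes becomes widespread, running MySQL in the cloud is worth exploring as a way to improve resource utilization. How should the running state of MySQL in the cloud be monitored? The monitoring must meet three requirements: the data is real-time and accurate, alerting responds quickly, and centralized monitoring across sites is supported. This article explores a monitoring solution for MySQL in the cloud.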
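II. Comparing the Options
Option 1:
Zabbix is an open-source monitoring system written in C and PHP. It supports many collection methods, is widely used and mature, and has an active community. Its drawback is relatively poor support for containers.
Option 2:
Prometheus is an open-source monitoring system written in Go. It collects data by pull, with push supported through the Pushgateway, and it covers monitoring, alerting, visualization and cross-site data transfer. It is simple to configure and supports containers well.
Because the MySQL instances in use are deployed in the cloud, and the company's existing PaaS cloud monitoring is already based on Prometheus, option 2 is the better choice: it covers MySQL monitoring and makes full use of existing resources.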
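III. Overview of the Prometheus Monitoring System
Prometheus is an open-source monitoring and alerting solution developed in Go by SoundCloud. It is built around time-series data.
1. Components and architecture
Components:
- Prometheus server: scrapes and stores time-series data.
- Exporters: act as agents that collect data and expose it to the Prometheus server; different kinds of data are collected by different exporters.
- Pushgateway: lets short-lived and batch jobs push their data to it; Prometheus then scrapes the data from the Pushgateway.
- Alertmanager: implements alerting for Prometheus.

The relationship between the components is shown below:
Figure 1: Component architecture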
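2. Prometheus features
- Metric collection: the Prometheus server defines configuration entries called targets, which describe what to scrape and how.
- Service discovery: the resources to monitor can be supplied in several ways, including static lists, file-based discovery and automatic discovery.
- Aggregation and alerting: time-series data can be queried and aggregated on the server. Recording rules store frequently used queries and aggregations. Alerting rules can be defined; when a rule's condition is met, an alert fires and is pushed to Alertmanager.
- Autonomy: there is no dependency on distributed storage; each server node is autonomous.
- Redundancy and high availability: several Prometheus servers can be deployed to make the monitoring system highly available.
- Query language: the Prometheus server provides PromQL for filtering and computing over time-series data.
- Visualization: the built-in expression browser provides basic visualization, and Prometheus can be combined with Grafana to build dashboards.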
 
IV. Monitoring the MySQL Database
1. Monitoring approach
The Prometheus project provides the mysqld_exporter exporter for monitoring MySQL. It connects to the database as a MySQL user, queries the relevant tables and status information, and exposes the monitoring data over HTTP.
Limitation: the exporter covers single-node and master-slave replication metrics, but it does not yet support MGR (MySQL Group Replication) metrics well.
Improvement: Prometheus provides client libraries for collecting custom metrics, so a custom Python script can collect the MGR data. Together, mysqld_exporter and the Python script export all the monitoring information required.
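2. Deployment approach
There are two ways to deploy MySQL monitoring on the PaaS cloud:
Option 1:
MySQL, mysqld_exporter and the my_exporter_python monitoring script are all built into one image; running that container provides MySQL monitoring.
Option 2:
MySQL, mysqld_exporter and the my_exporter_python monitoring script each live in their own image. The main MySQL container and the monitoring containers are started in order, and the monitoring containers access MySQL as sidecars.
Comparison:
The MySQL database service is a critical part of the application, so it must be safe and reliable. With option 1, if MySQL misbehaves or fails, the exporters running in the same container can get in the way of diagnosis and troubleshooting, which hurts later operation and maintenance. With option 2, the three parts run in separate containers and cannot interfere with one another, so option 2 is the better choice.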
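V. Implementation
1. Create the MySQL monitoring user and grant privileges
2. The my_exporter_python script
The script does the following:
- Serves metrics over HTTP on port 9000: start_http_server(9000)
- Defines the Gauge objects
- Connects to MySQL and runs the queries
- Sets the MGR-related metrics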
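The steps above can be sketched as follows. This is a minimal illustration, not the author's actual my_exporter_python script: the MGR_QUERIES mapping, the collect() helper, the member_id label, the 15-second refresh interval and the connection credentials are all assumptions made for the example. Only start_http_server(9000), the use of Gauge objects and the query itself come from this article.

```python
import time

# Hypothetical mapping of metric name -> SQL. Each query is expected to
# return rows of (member_id, numeric_value).
MGR_QUERIES = {
    "mysql_mgr_cert_queue": (
        "SELECT MEMBER_ID, Count_Transactions_in_queue "
        "FROM performance_schema.replication_group_member_stats "
        "WHERE member_id = @@GLOBAL.server_uuid"
    ),
}


def collect(cursor, queries=MGR_QUERIES):
    """Run every query and return {metric_name: [(member_id, value), ...]}."""
    samples = {}
    for name, sql in queries.items():
        cursor.execute(sql)
        samples[name] = [(str(mid), float(val)) for mid, val in cursor.fetchall()]
    return samples


def main():
    # Third-party imports stay inside main() so that collect() can be
    # exercised without a MySQL server or the client libraries installed.
    import pymysql
    from prometheus_client import Gauge, start_http_server

    start_http_server(9000)  # expose /metrics on port 9000
    gauges = {
        name: Gauge(name, "MGR metric " + name, ["member_id"])
        for name in MGR_QUERIES
    }
    # Placeholder credentials, assumed for illustration only.
    conn = pymysql.connect(host="localhost", port=3306, user="exporter",
                           password="userpassword", autocommit=True)
    while True:
        with conn.cursor() as cursor:
            for name, rows in collect(cursor).items():
                for member_id, value in rows:
                    gauges[name].labels(member_id=member_id).set(value)
        time.sleep(15)  # refresh interval, an arbitrary choice
```

Calling main() starts the HTTP endpoint and the refresh loop. Keeping the SQL in a dictionary means a new MGR metric only needs one more entry.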
3. Pulling and building the images
Pull the mysqld_exporter image:
docker pull prom/mysqld_exporter
Build the my_exporter_python image
Dockerfile contents:
FROM centos7_python36:v1
RUN pip install prometheus_client pymysql
RUN pip install requests
COPY ./my_exporter_python_v2.py /my_exporter_python_v2.py
WORKDIR /
EXPOSE 9000
CMD ["python","my_exporter_python_v2.py"]
4. Excerpt from the deployment YAML file:
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
    ......
          containers:
          - env:
            - name: TZ
              value: Asia/Shanghai
            - name: DATA_SOURCE_NAME
              value: 'exporter:userpassword@(localhost:3306)/'
            - name: TARGET
              value: 'http://localhost:9104/metrics'
            image: 'registry.paas.test.abc/library/mysqld-exporter-python:v5'
            imagePullPolicy: Always
            name: mysqld-python
            ports:
            - containerPort: 9000
              name: mysqld-python
              protocol: TCP
            resources:
              limits:
                cpu: '2'
                memory: 4Gi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          - env:
            - name: TZ
              value: Asia/Shanghai
            - name: DATA_SOURCE_NAME
              value: 'testuser:userpassword@(localhost:3306)/'
            image: 'registry.paas.test.abc/library/mysqld-exporter:latest'
            imagePullPolicy: Always
            name: mysqld-exporter
            ports:
            - containerPort: 9104
              name: mysqld-exporter
              protocol: TCP
            resources:
              limits:
                cpu: '2'
                memory: 4Gi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
    ......
 
5. Configuring the target on the Prometheus server
    - job_name: kubernetes-pods
      scrape_interval: 30s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      kubernetes_sd_configs:
      - api_server: null
        role: pod
        namespaces:
          names: []
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        separator: ;
        regex: "true"
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: $1
        action: replace
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        separator: ;
        regex: ([^:]+)(?::\d+)?;(\d+)
        target_label: __address__
        replacement: $1:$2
        action: replace
      - separator: ;
        regex: __meta_kubernetes_pod_label_(.+)
        replacement: $1
        action: labelmap
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: kubernetes_namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_name]
        separator: ;
        regex: (.*)
        target_label: kubernetes_pod_name
        replacement: $1
        action: replace
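The __address__ rule in this configuration joins the discovered pod address and the prometheus.io/port annotation with ";" and rewrites the result to host:port. The sketch below reproduces that rewrite in Python to make the regex easier to follow. The function name and the sample addresses are made up for illustration, and Python's re module only approximates the RE2 engine Prometheus uses, although this particular pattern behaves the same in both.

```python
import re

# Regex copied from the __address__ relabel rule. Prometheus anchors relabel
# regexes at both ends, which re.fullmatch mimics here.
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def relabel_address(address, port_annotation, separator=";"):
    """Return the rewritten __address__, or the original if nothing matches."""
    joined = separator.join([address, port_annotation])
    match = ADDRESS_RE.fullmatch(joined)
    if match is None:
        # A "replace" rule whose regex does not match leaves the label as is.
        return address
    return f"{match.group(1)}:{match.group(2)}"


print(relabel_address("10.0.0.5:3306", "9104"))  # port replaced by annotation
print(relabel_address("10.0.0.5", "9000"))       # port appended
print(relabel_address("10.0.0.5:3306", ""))      # no annotation: unchanged
```

The optional non-capturing group `(?::\d+)?` is what discards any port already present in the discovered address.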
 
VI. Collected Metrics Explained
- MySQL uptime

    mysql> show status like '%uptime%';
    +---------------------------+---------+
    | Variable_name             | Value   |
    +---------------------------+---------+
    | Uptime                    | 1284686 |
    | Uptime_since_flush_status | 1284686 |
    +---------------------------+---------+

Uptime is how long the MySQL server has been running, in seconds. The corresponding exported metric is mysql_global_status_uptime. Arithmetic on the metric converts it to other units, for example to days: mysql_global_status_uptime/60/60/24.
- MySQL service port

    mysql> show variables like 'port';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | port          | 3306  |
    +---------------+-------+

The corresponding exported metric is mysql_global_variables_port.
- Whether the MySQL server is online

If mysqld_exporter connects to the MySQL server successfully, the server is online; otherwise it is treated as offline. The corresponding exported metric is mysql_up: 1 means online and 0 means offline.
- Database connections

    mysql> show status like 'Threads%';
    +-------------------+-------+
    | Variable_name     | Value |
    +-------------------+-------+
    | Threads_cached    | 2     |
    | Threads_connected | 1     |
    | Threads_created   | 3     |
    | Threads_running   | 2     |
    +-------------------+-------+
    mysql> show variables like '%max_connection%';
    +------------------------+-------+
    | Variable_name          | Value |
    +------------------------+-------+
    | max_connections        | 151   |
    | mysqlx_max_connections | 100   |
    +------------------------+-------+
    mysql> show global status like 'max_used_connections';
    +----------------------+-------+
    | Variable_name        | Value |
    +----------------------+-------+
    | Max_used_connections | 3     |
    +----------------------+-------+

Threads_connected: the number of currently open connections. Exported metric: mysql_global_status_threads_connected.
Threads_running: the number of connections that are actively executing, that is, the current concurrency. Exported metric: mysql_global_status_threads_running.
Max_used_connections: the highest number of connections in use at the same time since the server started. Exported metric: mysql_global_status_max_used_connections.
max_connections: the maximum number of simultaneous client connections allowed. Exported metric: mysql_global_variables_max_connections.
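One common way to use these four values together is to express the open connections as a percentage of the configured limit. The helper below is only an illustration of that arithmetic; the function name is an assumption, and the inputs are the sample values from the output above.

```python
def connection_usage_percent(connections, max_connections):
    """Percentage of the max_connections limit that is in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return 100.0 * connections / max_connections


# Sample values from the output above: Threads_connected = 1,
# Max_used_connections = 3, max_connections = 151.
current = connection_usage_percent(1, 151)
peak = connection_usage_percent(3, 151)
print(f"current: {current:.2f}%  peak since start: {peak:.2f}%")
```

The same ratio can be written directly in PromQL as mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100.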
- Slow queries

    mysql> show global status like '%Slow_queries%';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | Slow_queries  | 0     |
    +---------------+-------+

The corresponding exported metric is mysql_global_status_slow_queries.
This metric is the cumulative number of slow queries so far. For a more useful view of slow-query activity, use PromQL to turn it into slow queries per second: irate(mysql_global_status_slow_queries[5m]). irate looks back over a 5-minute window and computes the per-second rate from the two most recent samples in it.
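To make the irate expression concrete, the sketch below computes a per-second rate from counter samples in two ways: from the last two samples, as irate does, and as the average over the whole window, which is roughly what rate does. It is a simplified model under stated assumptions: Prometheus' rate also extrapolates to the window boundaries, which is omitted here, and the function names and sample data are invented for the example.

```python
def instant_rate(samples):
    """Per-second rate from the last two (timestamp, value) samples."""
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    elapsed = t_last - t_prev
    if elapsed <= 0:
        return None
    # A counter that went down is treated as having been reset to zero.
    increase = v_last if v_last < v_prev else v_last - v_prev
    return increase / elapsed


def average_rate(samples):
    """Per-second rate averaged over all samples (no extrapolation)."""
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur if cur < prev else cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical slow-query counter scraped every 30 seconds.
window = [(0, 100), (30, 103), (60, 103), (90, 112)]
print(instant_rate(window), average_rate(window))
```

The instant rate reacts quickly to bursts, which suits dashboards, while the averaged rate is smoother and usually preferred in alerting rules.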
- QPS

    mysql> show global status like 'questions';
    +---------------+--------+
    | Variable_name | Value  |
    +---------------+--------+
    | Questions     | 407158 |
    +---------------+--------+

Questions: the total number of statements received from clients. Exported metric: mysql_global_status_questions. To get the number of requests per second, that is, QPS, use:
irate(mysql_global_status_questions[5m]), which looks back over 5 minutes and computes the per-second rate from the two most recent samples.
- InnoDB buffer pool hit ratio

    mysql> show global status like 'innodb_buffer_pool_read%';
    +---------------------------------------+-------+
    | Variable_name                         | Value |
    +---------------------------------------+-------+
    | Innodb_buffer_pool_read_ahead_rnd     | 0     |
    | Innodb_buffer_pool_read_ahead         | 0     |
    | Innodb_buffer_pool_read_ahead_evicted | 0     |
    | Innodb_buffer_pool_read_requests      | 19268 |
    | Innodb_buffer_pool_reads              | 887   |
    +---------------------------------------+-------+

Innodb_buffer_pool_reads: the number of reads that could not be served from the buffer pool and went directly to disk. Exported metric: mysql_global_status_innodb_buffer_pool_reads.
Innodb_buffer_pool_read_requests: the number of logical read requests. Exported metric: mysql_global_status_innodb_buffer_pool_read_requests.
The buffer pool hit ratio for logical reads, in percent, is: 100 - 100 * (mysql_global_status_innodb_buffer_pool_reads / mysql_global_status_innodb_buffer_pool_read_requests).
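As a worked example of the hit-ratio formula, the snippet below applies it to the two counters from the output above. The function name is an assumption for illustration; returning None when there have been no read requests is a choice made here to avoid a division by zero, not something the formula itself specifies.

```python
def buffer_pool_hit_ratio(disk_reads, read_requests):
    """Hit ratio in percent: 100 - 100 * (disk reads / logical read requests)."""
    if read_requests == 0:
        return None  # no reads yet, the ratio is undefined
    return 100.0 - 100.0 * disk_reads / read_requests


# Innodb_buffer_pool_reads = 887, Innodb_buffer_pool_read_requests = 19268
ratio = buffer_pool_hit_ratio(887, 19268)
print(f"{ratio:.2f}%")
```

With these sample counters the ratio is a little above 95%, meaning fewer than one in twenty logical reads had to go to disk.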
- Open tables

    mysql> show global status like 'open_tables';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | Open_tables   | 371   |
    +---------------+-------+

The corresponding exported metric is mysql_global_status_open_tables.
- Thread cache hit ratio

    mysql> show global status like 'threads_created';
    +-----------------+-------+
    | Variable_name   | Value |
    +-----------------+-------+
    | Threads_created | 3     |
    +-----------------+-------+
    mysql> show global status like 'connections';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | Connections   | 33479 |
    +---------------+-------+

Threads_created: the number of threads created to handle connections. Exported metric: mysql_global_status_threads_created.
Connections: the number of connection attempts made to the MySQL server. Exported metric: mysql_global_status_connections.
Because the ratio is built from thread and connection counters, it measures the thread cache rather than the table cache. The thread cache hit ratio, in percent, is: (1 - mysql_global_status_threads_created / mysql_global_status_connections) * 100.
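The ratio built from Threads_created and Connections can be checked by hand with the sample counters shown above. The helper below is an illustrative sketch with an assumed function name; it returns None when no connection attempts have been recorded.

```python
def thread_cache_hit_ratio(threads_created, connections):
    """Percentage of connections served without creating a new thread."""
    if connections == 0:
        return None  # no connection attempts yet
    return (1.0 - threads_created / connections) * 100.0


# Threads_created = 3, Connections = 33479
ratio = thread_cache_hit_ratio(3, 33479)
print(f"{ratio:.4f}%")
```

A value close to 100% means almost every connection reused a cached thread. A falling ratio is a hint that thread_cache_size may be too small for the connection churn.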
- Lock status

    mysql> show global status like 'table_locks%';
    +-----------------------+--------+
    | Variable_name         | Value  |
    +-----------------------+--------+
    | Table_locks_immediate | 156335 |
    | Table_locks_waited    | 0      |
    +-----------------------+--------+

Table_locks_immediate: the number of times a table lock request was granted immediately. Exported metric: mysql_global_status_table_locks_immediate. The per-second rate can be computed with:
irate(mysql_global_status_table_locks_immediate[5m]).
Table_locks_waited: the number of times a table lock request could not be granted immediately and had to wait. Exported metric: mysql_global_status_table_locks_waited.
- Temporary tables

    mysql> show global status like '%tmp%';
    +-------------------------+--------+
    | Variable_name           | Value  |
    +-------------------------+--------+
    | Created_tmp_disk_tables | 0      |
    | Created_tmp_files       | 6      |
    | Created_tmp_tables      | 111563 |
    +-------------------------+--------+

Created_tmp_disk_tables: the number of internal temporary tables created on disk. Exported metric: mysql_global_status_created_tmp_disk_tables.
Created_tmp_tables: the number of internal temporary tables created by the server. Exported metric: mysql_global_status_created_tmp_tables.
The share of temporary tables that went to disk is:
mysql_global_status_created_tmp_disk_tables / mysql_global_status_created_tmp_tables.
Metrics implemented by the Python script, with their SQL statements
MySQL Group Replication information is recorded in the replication_connection_status, replication_group_member_stats and replication_group_members tables of the performance_schema database. Joining these tables yields the Group Replication monitoring items.
- Transactions waiting to be applied on the current MySQL node

    SELECT
        @@GLOBAL.server_uuid,
        GTID_SUBTRACT(RECEIVED_TRANSACTION_SET, @@GLOBAL.GTID_EXECUTED)
    FROM
        performance_schema.replication_connection_status
    WHERE
        channel_name = 'group_replication_applier'

The query returns the set of GTIDs that have been received but not yet executed; the number of transactions in that set is the length of the apply queue. The corresponding exported metric is mysql_mgr_apply_queue.
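GTID_SUBTRACT returns a GTID set as a string rather than a number, so a script that exports mysql_mgr_apply_queue has to count the transactions in that string. The sketch below shows one way to do it. It is not taken from the author's script, the function name is an assumption, and it assumes the classic untagged format uuid:interval[:interval]... with comma-separated entries.

```python
def count_gtids(gtid_set):
    """Count the transactions in an untagged GTID set string.

    Example input: 'uuid-a:1-5:7,uuid-b:3' contains 5 + 1 + 1 = 7 transactions.
    """
    total = 0
    # MySQL may put a newline after each comma, so strip those first.
    for entry in gtid_set.replace("\n", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        # The first field is the server UUID, the rest are intervals.
        _uuid, *intervals = entry.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            first = int(start)
            last = int(end) if end else first
            total += last - first + 1  # intervals are inclusive
    return total


print(count_gtids("3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5:7"))
```

An empty string, which GTID_SUBTRACT returns when nothing is pending, counts as zero.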
- Transactions waiting for certification on the current MySQL node

    SELECT
        MEMBER_ID, Count_Transactions_in_queue
    FROM
        performance_schema.replication_group_member_stats
    WHERE
        member_id = @@GLOBAL.server_uuid

The corresponding exported metric is mysql_mgr_cert_queue.
- State of the current MySQL node: returns 1 for ONLINE, 2 for OFFLINE, 3 for ERROR, 4 for RECOVERING, and 0 for any other state.

    SELECT
        member_id,
        CASE
            WHEN MEMBER_STATE = 'ONLINE' THEN 1
            WHEN MEMBER_STATE = 'OFFLINE' THEN 2
            WHEN MEMBER_STATE = 'ERROR' THEN 3
            WHEN MEMBER_STATE = 'RECOVERING' THEN 4
            ELSE 0
        END AS MEMBER_STATE
    FROM
        performance_schema.replication_group_members
    WHERE
        MEMBER_ID = @@GLOBAL.server_uuid
        OR MEMBER_ID = ''

The corresponding exported metric is mysql_mgr_node_status.
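The CASE expression is a plain lookup table, and the same mapping is useful on the dashboard side when turning the numeric metric back into a readable state. A minimal Python equivalent, with assumed names, looks like this. The upper() call approximates MySQL's default case-insensitive string comparison.

```python
# Same codes as the CASE expression in the query above.
MEMBER_STATE_CODES = {"ONLINE": 1, "OFFLINE": 2, "ERROR": 3, "RECOVERING": 4}


def member_state_code(state):
    """Map a MEMBER_STATE string to its numeric code, 0 for anything else."""
    return MEMBER_STATE_CODES.get((state or "").upper(), 0)


for state in ("ONLINE", "RECOVERING", "UNREACHABLE"):
    print(state, member_state_code(state))
```

States outside the four listed ones, such as UNREACHABLE, fall through to 0 in the same way as the ELSE branch of the SQL.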
- Health of the current MySQL node: returns 1 if the node is ONLINE and fewer than half of the group members are in a state other than ONLINE, otherwise 0.

    SELECT
        member_id,
        IF(MEMBER_STATE = 'ONLINE'
                AND ((SELECT COUNT(*)
                      FROM performance_schema.replication_group_members
                      WHERE MEMBER_STATE != 'ONLINE')
                     >= ((SELECT COUNT(*)
                          FROM performance_schema.replication_group_members) / 2) = 0),
            '1',
            '0')
    FROM
        performance_schema.replication_group_members
            JOIN
        performance_schema.replication_group_member_stats USING (member_id)
    WHERE
        member_id = @@GLOBAL.server_uuid

The corresponding exported metric is mysql_mgr_node_health.
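The nested condition in the health query is easier to read when written out step by step. The sketch below mirrors its logic in Python: a node is healthy only if it is ONLINE itself and the members that are not ONLINE number fewer than half of the group. The function name and the sample state lists are assumptions for illustration.

```python
def node_is_healthy(own_state, all_states):
    """Mirror of the SQL: ONLINE and (not_online >= total / 2) is false."""
    not_online = sum(1 for state in all_states if state != "ONLINE")
    majority_lost = not_online >= len(all_states) / 2
    return own_state == "ONLINE" and not majority_lost


# Three-member group with one unreachable member: still healthy.
print(node_is_healthy("ONLINE", ["ONLINE", "ONLINE", "UNREACHABLE"]))
# Two of three members gone: the majority is lost.
print(node_is_healthy("ONLINE", ["ONLINE", "UNREACHABLE", "UNREACHABLE"]))
```

Note that with an even group size, losing exactly half of the members already counts as unhealthy, because the comparison is "greater than or equal to half".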
- Role of the current MySQL node: returns 1 if it is the primary and 0 otherwise. In multi-primary mode, every ONLINE member counts as a primary.

    SELECT
        @@server_uuid,
        IF(@@GLOBAL.group_replication_single_primary_mode,
            (SELECT COUNT(1)
             FROM performance_schema.global_status
             WHERE variable_value = @@server_uuid),
            (SELECT COUNT(1)
             FROM performance_schema.replication_group_members
             WHERE member_id = @@server_uuid
               AND member_state = 'ONLINE')) AS isPrimary

The corresponding exported metric is mysql_mgr_role.
- Large transactions

    SELECT
        @@GLOBAL.server_uuid, COUNT(trx_id)
    FROM
        information_schema.INNODB_TRX,
        sys.session AS se
    WHERE
        trx_mysql_thread_id = conn_id

The corresponding exported metric is mysql_big_trx. Because this query uses views in MySQL's sys schema, the exporter account needs additional privileges:

    grant usage on *.* to exporter@'%';
    grant select,execute on sys.* to exporter@'%';
 
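VII. Grafana Dashboards
Grafana is an open-source application for building dashboards. It is configured to pull metric data from the Prometheus server and supports querying and displaying time-series data. Its query editor can apply computations to the time series before visualizing them. Setup only requires pointing Grafana at the HTTP endpoint that Prometheus provides. The panels below show the MySQL monitoring metrics as charts.
(Figures: Grafana dashboard panels for the MySQL metrics)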