A Guide to Setting Up a ClickHouse Cluster
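ClickHouse is a column-oriented database with a native vectorized execution engine. In the big-data space it does not build on the Hadoop ecosystem; instead it uses locally attached storage, so its I/O is not bound by the limitations of the Hadoop stack. It can be run at fairly large scale in production, because its linear scalability and reliability come from native support for a shard + replication design. It also offers direct SQL interfaces and a fairly rich set of native clients.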
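Features of the ClickHouse database:
- Fast: ClickHouse outperforms most column-store databases on the market, and it is 100-1000 times faster than traditional databases, which is a very large advantage. On a 100-million-row dataset, ClickHouse is about 5 times faster than Vertica, 279 times faster than Hive, and 801 times faster than MySQL. On a 1-billion-row dataset, ClickHouse is about 5 times faster than Vertica, while MySQL and Hive can no longer complete the task.
 - Feature-rich: 1. SQL-like queries; 2. a large library of functions (for example IP conversion, URL parsing, and approximate calculation such as HyperLogLog); 3. arrays (Array) and nested data structures (Nested Data Structure); 4. replicated deployment across data centers.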
 
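Note that ClickHouse's fast queries still depend on system resources, so keep an eye on how much data each node stores and make sure every node has enough system resources. Because aggregation is done in memory at query time, the number of concurrent queries must not be too high, or the node's resources will be exhausted.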
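One way to keep concurrent queries under control on the client side is to send them through a bounded worker pool. The sketch below is only an illustration: `run_queries`, `MAX_CONCURRENT_QUERIES` and the stand-in `fake_run` are hypothetical names, and the limit of 4 is an assumption to be tuned to the memory of your nodes.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_QUERIES = 4  # assumed limit; tune it to the node's memory


def run_queries(queries, run_one, max_concurrent=MAX_CONCURRENT_QUERIES):
    """Run every query via run_one, never more than max_concurrent at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in the same order as the input queries
        return list(pool.map(run_one, queries))


if __name__ == "__main__":
    # Stand-in for a real call such as client.execute(sql)
    def fake_run(sql):
        return len(sql)

    print(run_queries(["select 1", "select 22"], fake_run))
```

In real use, `run_one` would wrap the actual database call; the pool size is then the upper bound on queries that this client has in flight at once.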
Environment configuration
Initialize the environment (all nodes)
- # Change the machine's hostname
 - vi /etc/hostname
 - # Configure hosts
 - vi /etc/hosts
 - 192.168.143.20 node1
 - 192.168.143.21 node2
 - 192.168.143.22 node3
 
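After making the changes, run hostname node1...3 (the matching name on each node) so that the new hostname takes effect without rebooting the machine.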
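With three nodes it is easy to miss one hosts entry. The following sketch lists which of the entries above are still missing from a hosts file. The helper names `hosts_lines` and `missing_entries` are made up for illustration, and the matching is deliberately simple: it only recognizes lines of the exact form "ip hostname".

```python
NODES = {
    "node1": "192.168.143.20",
    "node2": "192.168.143.21",
    "node3": "192.168.143.22",
}


def hosts_lines(nodes):
    """Return one hosts-file line per node, in the form '<ip> <hostname>'."""
    return [f"{ip} {name}" for name, ip in nodes.items()]


def missing_entries(hosts_text, nodes):
    """Return the expected lines that do not appear in hosts_text."""
    # Collapse runs of whitespace so 'ip   name' and 'ip name' compare equal
    present = {" ".join(line.split()) for line in hosts_text.splitlines()}
    return [line for line in hosts_lines(nodes) if line not in present]


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n192.168.143.20 node1\n"
    for line in missing_entries(sample, NODES):
        print("missing:", line)
```

On a real node, the text passed in would be the contents of /etc/hosts.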
Download and install ClickHouse (all nodes)
Four packages need to be downloaded:
- clickhouse-client
 - clickhouse-common-static
 - clickhouse-server
 - clickhouse-server-common
 
- rpm -ivh *.rpm
 
Install ZooKeeper (any one node)
- # node1 is used here
 - docker run -d --net host --name zookeeper zookeeper
 
Configure the cluster (all nodes)
Edit /etc/clickhouse-server/config.xml
- <!-- Uncomment the following line -->
 - <listen_host>::</listen_host>
 - <!-- Change the default data directory, e.g. to a clickhouse directory created under /home -->
 - <path>/var/lib/clickhouse/</path>
 - <!-- Change it to the following -->
 - <path>/home/clickhouse/</path>
 
Edit /etc/clickhouse-server/users.xml
- <!-- Memory available to a query; set it according to the machine's resources -->
 - <max_memory_usage>5000000000000</max_memory_usage>
 - <!-- Add the user configuration just before </users> -->
 - <root>
 - <!-- SHA-256 hash of the password, computed with a Linux command -->
 - <password_sha256_hex>xxxx...xxxx</password_sha256_hex>
 - <networks>
 - <ip>::/0</ip>
 - </networks>
 - <profile>default</profile>
 - <quota>default</quota>
 - </root>
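The value for password_sha256_hex can also be produced in Python instead of a shell command. This is a minimal sketch, assuming the password is hashed as plain UTF-8 text with no trailing newline; the function name `password_sha256_hex` is just a label chosen here to match the XML tag.

```python
import hashlib


def password_sha256_hex(password):
    """Return the hex SHA-256 digest of the UTF-8 encoded password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Paste the printed 64-character string into <password_sha256_hex>
    print(password_sha256_hex("123456"))
```

The password 123456 is the one used for the root user later in this article; substitute your own.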
 
Add the configuration file /etc/metrika.xml
- <yandex>
 - <!-- ClickHouse cluster nodes -->
 - <clickhouse_remote_servers>
 - <test_cluster>
 - <shard>
 - <internal_replication>true</internal_replication>
 - <replica>
 - <host>node1</host>
 - <port>9000</port>
 - <user>root</user>
 - <password>123456</password>
 - </replica>
 - </shard>
 - <shard>
 - <internal_replication>true</internal_replication>
 - <replica>
 - <host>node2</host>
 - <port>9000</port>
 - <user>root</user>
 - <password>123456</password>
 - </replica>
 - </shard>
 - <shard>
 - <internal_replication>true</internal_replication>
 - <replica>
 - <host>node3</host>
 - <port>9000</port>
 - <user>root</user>
 - <password>123456</password>
 - </replica>
 - </shard>
 - </test_cluster>
 - </clickhouse_remote_servers>
 - <!-- ZooKeeper settings -->
 - <zookeeper-servers>
 - <node index="1">
 - <host>node1</host>
 - <port>2181</port>
 - </node>
 - </zookeeper-servers>
 - <networks>
 - <ip>::/0</ip>
 - </networks>
 - <!-- Set this to the current node's own name on each node -->
 - <macros>
 - <replica>node1</replica>
 - </macros>
 - <!-- Compression settings -->
 - <clickhouse_compression>
 - <case>
 - <min_part_size>10000000000</min_part_size>
 - <min_part_size_ratio>0.01</min_part_size_ratio>
 - <method>lz4</method>
 - </case>
 - </clickhouse_compression>
 - </yandex>
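The three shard blocks differ only in the host name, so the cluster section can be generated rather than typed by hand. The sketch below builds that section with the standard library; `build_remote_servers` is a hypothetical helper, and its defaults (port 9000, user root, password 123456) simply repeat the values used in the file above.

```python
import xml.etree.ElementTree as ET


def build_remote_servers(cluster, hosts, port=9000, user="root", password="123456"):
    """Build a <clickhouse_remote_servers> element with one shard per host."""
    root = ET.Element("clickhouse_remote_servers")
    cluster_el = ET.SubElement(root, cluster)
    for host in hosts:
        shard = ET.SubElement(cluster_el, "shard")
        ET.SubElement(shard, "internal_replication").text = "true"
        replica = ET.SubElement(shard, "replica")
        fields = (("host", host), ("port", port), ("user", user), ("password", password))
        for tag, value in fields:
            ET.SubElement(replica, tag).text = str(value)
    return root


if __name__ == "__main__":
    section = build_remote_servers("test_cluster", ["node1", "node2", "node3"])
    ET.indent(section)  # pretty-printing helper, available from Python 3.9
    print(ET.tostring(section, encoding="unicode"))
```

The printed section would still need to be placed inside the yandex element of /etc/metrika.xml by hand.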
 
Restart the clickhouse service
- service clickhouse-server restart
 - # If that fails, start the server with the following command instead
 - nohup /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml &
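After a restart it can be useful to confirm that each server is accepting connections on port 9000, the port configured for the cluster above. The following is a rough sketch using only the standard library; `port_open` is a hypothetical helper, and an open port only shows that something is listening there, not that ClickHouse is healthy.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False
```

For example, calling `port_open("node1", 9000)` from any machine that can resolve the node names should return True once the server on node1 is up.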
 
Create the data table (all nodes)
Connect to each node with a GUI client and create a MergeTree table on it
- create database test;
 - create table test.data
 - (
 - country String,
 - province String,
 - value String
 - )
 - engine=MergeTree()
 - partition by (country, province)
 - order by value;
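To get a feel for what `partition by (country, province)` and `order by value` mean for the data, here is a simplified model in Python: rows are grouped by the partition key, and within each group they are kept sorted by the ordering key. It is only a mental model under those assumptions, not a description of how MergeTree stores parts on disk, and `layout` is a made-up name.

```python
from collections import defaultdict


def layout(rows):
    """Group (country, province, value) rows by partition key, sorted by value."""
    partitions = defaultdict(list)
    for country, province, value in rows:
        partitions[(country, province)].append(value)
    return {key: sorted(values) for key, values in partitions.items()}


if __name__ == "__main__":
    rows = [("CN", "GD", "b"), ("CN", "GD", "a"), ("CN", "HN", "c")]
    for key, values in layout(rows).items():
        print(key, values)
```

Every distinct (country, province) pair becomes its own partition, so a key with many distinct values produces many partitions.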
 
Create the distributed table (node1)
- create table test.mo as test.data ENGINE = Distributed(test_cluster, test, data, rand());
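The last argument of Distributed, `rand()`, is the sharding key. As far as the documented behaviour goes, a row is routed by taking the remainder of the key divided by the total shard weight and finding the shard whose weight range contains that remainder. The sketch below imitates this for three shards of equal weight; `pick_shard` and `simulate` are hypothetical names, and Python's random generator merely stands in for ClickHouse's `rand()`.

```python
import random
from collections import Counter


def pick_shard(key, weights):
    """Map a sharding-key value to a shard index by remainder over total weight."""
    remainder = key % sum(weights)
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise ValueError("unreachable when every weight is positive")


def simulate(rows, weights, seed=0):
    """Count how many of `rows` random keys land on each shard."""
    rng = random.Random(seed)
    # randrange(2**32) mimics the unsigned 32-bit range of rand()
    return Counter(pick_shard(rng.randrange(2**32), weights) for _ in range(rows))


if __name__ == "__main__":
    print(simulate(30000, [1, 1, 1]))
```

With a random key and equal weights, the rows should spread roughly evenly over the three shards, which is the reason `rand()` is a convenient default when no natural sharding column exists.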
 
Connect to clickhouse with Python
Install clickhouse-driver
- pip install clickhouse-driver
 
Run the commands
- from clickhouse_driver import Client
 - # Connect to the node on which the distributed table was created
 - client = Client('192.168.143.20', user='root', password='123456', database='test')
 - print(client.execute('select count(*) from mo'))
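The count above stays at zero until some rows are written. A minimal sketch of inserting through the distributed table follows. It assumes the `test.mo` table and column names created earlier; `make_rows` and `insert_rows` are hypothetical helpers, and the `client` argument is expected to be a clickhouse-driver `Client` like the one constructed above.

```python
def make_rows(records):
    """Convert (country, province, value) records to tuples of strings."""
    # All three columns are declared as String in test.data
    return [(str(c), str(p), str(v)) for c, p, v in records]


def insert_rows(client, records):
    """Insert records through the distributed table; return the number of rows sent."""
    rows = make_rows(records)
    client.execute("INSERT INTO test.mo (country, province, value) VALUES", rows)
    return len(rows)


# Usage, with the client from the snippet above:
#   insert_rows(client, [("China", "Guangdong", 1), ("China", "Hunan", 2)])
#   print(client.execute('select count(*) from mo'))
```

Writing through `test.mo` lets the Distributed engine decide which shard receives each row, so the client does not need to know the cluster layout.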
 