Exploring the Performance of a Distributed Search Database

Distributed search database products address many enterprise scenarios that demand high-speed retrieval: second-level search and full-text queries over massive single tables, full SQL syntax, support for inserting, updating, deleting, and querying data, and compatibility with the PostgreSQL wire protocol. An enterprise-grade distributed search database solves the problem of searching truly massive data sets.

Environment Preparation

The following configuration files must be modified, otherwise the cluster will not start properly.

vi /etc/sysctl.conf
vm.max_map_count=262144

sysctl -p

vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536

vi /etc/profile
export CRATE_HEAP_SIZE=100g

Create the data storage directories and a searchdb user; the CrateDB cluster is installed under the searchdb user.

wget https://cdn.crate.io/downloads/releases/crate-2.1.5.tar.gz -P /opt/

tar -zxvf crate-2.1.5.tar.gz

mkdir /disk0{1..4}/searchdb

useradd searchdb
chown searchdb:searchdb /disk0{1..4}/searchdb
chown searchdb:searchdb -R /opt/crate-2.1.5

rm -rf /disk0{1..4}/searchdb/*

Note: a JDK must be installed beforehand, with JAVA_HOME configured; in particular, the JDK must be version jdk1.8.0_45 or later.

Cluster Installation

Install the software on all three nodes and edit crate.yml. Note that network.host and node.name are set to each node's own hostname, and the cluster name (my_cluster here) must be unique on the network and shared by every node of the cluster.

cluster.name: my_cluster

node.name: node1

discovery.zen.ping.unicast.hosts:
  - node1:4300
  - node2:4300
  - node3:4300

discovery.zen.minimum_master_nodes: 2

gateway:
  recover_after_nodes: 2
  recover_after_time: 5m
  expected_nodes: 2

path.data: /disk01/searchdb,/disk02/searchdb,/disk03/searchdb,/disk04/searchdb

network.host: node1

psql.enabled: true
psql.port: 5432

  • Start the cluster

bin/crate -d

  • Connect to the cluster

bin/crash

cr> \c node1:4200
+-------------------+-----------+---------+-----------+---------+
| server_url        | node_name | version | connected | message |
+-------------------+-----------+---------+-----------+---------+
| http://node1:4200 | node1     | 2.1.5   | TRUE      | OK      |
+-------------------+-----------+---------+-----------+---------+

  • A quick test

create table tweets (
  created_at timestamp,
  id string primary key,
  retweeted boolean,
  source string INDEX using fulltext,
  text string INDEX using fulltext,
  user_id string
);

insert into tweets values (1394182937, '1', true, 'web', 'Dont panic', 'Douglas');

insert into tweets
  values (
    1394182938,
    '2',
    true,
    'web',
    'Time is an illusion. Lunchtime doubly so',
    'Ford'
  );

insert into tweets values (1394182937, '3', true, 'address', '中國,北京', '北京');

select * from tweets where id = '2';

select user_id, _score from tweets where match(text, 'is') order by _score desc;

select user_id, _score from tweets where match(text, '北京') order by _score desc;

select user_id, _score from tweets where match(text, '京城') order by _score desc;

DELETE FROM tweets where id = '3';

These statements run a quick sanity check with basic SQL, covering the built-in full-text search and tokenization capabilities as well as support for UPDATE and DELETE.
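The block above only exercises DELETE; here is a minimal sketch of the UPDATE support as well (same tweets table as above; the REFRESH step just makes the write visible to the verification query that follows):

-- Flip a flag on one row, refresh, then verify the change.
UPDATE tweets SET retweeted = false WHERE id = '1';

REFRESH TABLE tweets; -- writes become visible to queries after a refresh

SELECT id, retweeted FROM tweets WHERE id = '1';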

Performance Testing

Generate the test data: a books table averaging roughly 6.0K per record, 100 GB in total. A Python script (csv2json.py) converts the delimited text data into JSON.

nohup java -cp dbgen-1.0-jar.jar DBGen -p ./data -b 100 &   # Total Time: 2610 seconds

cat books | python csv2json.py --columns id:integer isbn:string category:string publish_date:string publisher:string price:float > books.json

Sample data:

$ head data/books/books
0|6-20386-216-4|STUDY-AIDS|1998-05-31|Gakken|166.99
1|0-60558-466-8|JUVENILE-NONFICTION|1975-02-12|Holtzbrinck|128.99
2|3-16551-636-9|POETRY|1988-01-24|Oxford University Press|155.99
3|4-75505-741-2|COMICS-GRAPHIC-NOVELS|1992-02-24|Saraiva|101.99
4|3-32982-589-8|PERFORMING-ARTS|2011-03-09|Cambridge University Press|183.99
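After conversion, books.json holds one JSON object per line; for the first sample row above the output would look roughly like this (illustrative, derived from the --columns mapping, not taken from the original run):

{"id": 0, "isbn": "6-20386-216-4", "category": "STUDY-AIDS", "publish_date": "1998-05-31", "publisher": "Gakken", "price": 166.99}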

Basic commands:

./crash --help
./crash --hosts node1 --sysinfo

./crash --hosts node1 -c "show tables"

Create the table and import the data.

CREATE TABLE books (
  id integer,
  isbn string,
  category string,
  publish_date string,
  publisher string,
  price float
);

COPY books FROM '/disk01/data/books/books.json';

The commands above can generate test data sets of different orders of magnitude; the dbgen parameters control how large each table is.

Test Scenario 1

  • Table size: tens of millions of rows
  • Ingest rate: 15M/s (per node)
  • JSON size: 6.6 GB
  • Size in CrateDB: 5.2 GB
  • Row count: 47,582,950
  • Shards: 6
  • Replicas: 2
  • Memory: 16 GB
  • vCPU: 24
  • Storage: 1.4 TB (4 × 280 GB)
  • Network: gigabit

The tests below focus on queries against a single table.

select category,count(*) from books group by category limit 100; -- 3.137 sec

select category,count(*) as num from books group by category order by num limit 100; -- 2.929 sec

select category,count(*) as num from books where category='SCIENCE' group by category order by num limit 100; -- 0.143 sec

select count(*) from books where category='SCIENCE' limit 100; -- 0.022 sec

select count(distinct category) from books limit 100; -- 2.990 sec

select distinct category from books limit 100; -- 3.032 sec

Change number_of_shards to see whether it improves performance:

ALTER TABLE books SET (number_of_shards = 48);

OPTIMIZE table books; -- quite useful: noticeably improves query performance

SELECT count(*) as num_shards, sum(num_docs) as num_docs FROM sys.shards WHERE schema_name = 'doc' AND table_name = 'books';

Test Scenario 2

  • Table size: hundreds of millions of rows
  • Import time: 17,285.888 sec
  • JSON size: 33 GB
  • Size in CrateDB: 27 GB
  • Row count: 235,265,838
  • Nodes: 3
  • Shards: 1024
  • Replicas: 2
  • Memory: 100 GB
  • vCPU: 24
  • Storage: 1.4 TB (4 × 280 GB)
  • Ingest rate: 13,610 rows/s
  • Network: gigabit

Create the table and import the data.

CREATE TABLE books_t1 (
  id integer,
  isbn string,
  category string INDEX using fulltext,
  publish_date string,
  publisher string INDEX using fulltext,
  price float
) CLUSTERED BY (category) INTO 1024 SHARDS with (number_of_replicas = 2, refresh_interval = 10000);

COPY books_t1 FROM '/disk01/data/books/books.json'; -- COPY OK, 235265838 rows affected (17285.888 sec)

Test the performance.

OPTIMIZE table books_t1;

select category,count(*) from books_t1 group by category limit 100; -- 2.556 sec

select category,count(*) as num from books_t1 group by category order by num limit 100; -- 2.763 sec

-- Problem: Error! SQLActionException[SQLParseException: Cannot GROUP BY 'category': grouping on analyzed/fulltext columns is not possible]

select count(*) from books_t1 where match(category, 'PERFORMING-ARTS'); -- 0.256 sec

select * from books_t1 where match(category, 'ARTS') limit 100; -- 0.928 sec

Note: columns indexed with fulltext cannot be used in aggregation or GROUP BY operations; an analyzed column only supports full-text MATCH search. The data had to be re-imported into a differently indexed table for the next round of testing.
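One way to get both behaviors on the same column (a sketch using CrateDB's named fulltext index columns; the table name books_t1b and index name category_ft are illustrative, not from the original test) is to leave the column itself unanalyzed and add a separate fulltext index over it:

CREATE TABLE books_t1b (
  id integer,
  isbn string,
  category string,   -- raw, unanalyzed column: still usable in GROUP BY
  publish_date string,
  publisher string,
  price float,
  INDEX category_ft USING FULLTEXT (category)   -- analyzed copy for MATCH queries
) CLUSTERED BY (category) INTO 1024 SHARDS;

-- Grouping works on the raw column...
SELECT category, count(*) FROM books_t1b GROUP BY category LIMIT 100;

-- ...while full-text search goes through the named index.
SELECT count(*) FROM books_t1b WHERE match(category_ft, 'PERFORMING-ARTS');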

Test Scenario 3

  • Table size: hundreds of millions of rows
  • Import time: 5,662.132 sec
  • JSON size: 33 GB
  • Size in CrateDB: 25.3 GB
  • Row count: 235,265,838
  • Nodes: 3
  • Shards: 1024
  • Replicas: 2
  • Memory: 100 GB
  • vCPU: 24
  • Storage: 1.4 TB (4 × 280 GB)
  • Ingest rate: ~41,551 rows/s (235,265,838 rows / 5,662.132 sec)
  • Network: gigabit

Create the table and insert the data.

CREATE TABLE books_t2 (
  id integer,
  isbn string,
  category string,
  publish_date string,
  publisher string,
  price float
) CLUSTERED BY (category) INTO 1024 SHARDS;

COPY books_t2 FROM '/disk01/data/books/books.json';

insert into books_t2 select * from books_t1; -- INSERT OK, 235265838 rows affected (5662.132 sec)

Performance test.

OPTIMIZE table books_t2;

select category,count(*) from books_t2 group by category limit 100; -- 3.994 sec

select category,count(*) as num from books_t2 group by category order by num limit 100; -- 4.159 sec

select category,count(*) as num from books_t2 where category='SCIENCE' group by category order by num limit 100; -- 1.731 sec

select count(*) from books_t2 where category='SCIENCE' limit 100; -- 0.001 sec

select count(distinct category) from books_t2 limit 100; -- 4.677 sec

select distinct category from books_t2 limit 100; -- 3.914 sec

select id,price,publisher from books_t2 where publish_date='1999-02-02' and category='SCIENCE' limit 100; -- 0.014 sec

Note: the excessive shard count kept Heap Usage persistently high, reaching 57%; this table was built with a global index across 1024 shards and 2 indexed columns.
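Heap pressure like this can also be watched from SQL through the sys schema; a minimal sketch (using the heap columns that sys.nodes exposes):

-- Per-node JVM heap usage; heap['used'] and heap['max'] are reported in bytes.
SELECT name,
       heap['used'] AS heap_used_bytes,
       heap['max'] AS heap_max_bytes,
       100.0 * heap['used'] / heap['max'] AS heap_used_pct
FROM sys.nodes
ORDER BY heap_used_pct DESC;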

Test Scenario 4

  • Table size: 1.5 billion rows
  • Ingest rate: 15M/s (per node)
  • JSON size: 215 GB
  • Size in CrateDB: 175.6 GB
  • Row count: 1,551,303,191
  • Shards: 500
  • Replicas: 2
  • Memory: 100 GB
  • vCPU: 24
  • Storage: 1.4 TB (4 × 280 GB), 4 nodes
  • Network: gigabit

Convert the generated text data into JSON format.

nohup cat /disk01/searchdb/data/books/books | python csv2json.py --columns id:integer isbn:string category:string publish_date:string publisher:string price:float > /disk02/searchdb/books.json &

Split the 215 GB data file into 22 files of 10 GB each so they can be imported in parallel.

split -b 10000m /disk02/searchdb/books.json -d -a 3 split_file

Create the table.

CREATE TABLE books (
  id integer,
  isbn string,
  category string,
  publish_date string,
  publisher string,
  price float
) CLUSTERED BY (category) INTO 500 SHARDS;

Bulk-import the data, running one COPY per split file.

/opt/crate-2.1.7/bin/crash --hosts node1 -c "COPY books FROM '/disk01/searchdb/split_file000'"
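Instead of issuing one COPY per split file, COPY FROM also accepts wildcards in the path; every node imports whichever matching files it finds locally, which is exactly what spreads the ingest I/O across machines (a sketch assuming the split layout above):

COPY books FROM '/disk01/searchdb/split_file*';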

For the query tests against this 1.5-billion-row single table, the table was optimized first; the queries and their timings are reported in Test Scenario 5 below.

OPTIMIZE table books;

Test Scenario 5

The test SQL is shown below: single-table performance tests, with no partitioning.

OPTIMIZE table books;

select category,count(*) from books group by category limit 100; -- 37.878 sec

select category,count(*) as num from books group by category order by num limit 100; -- 46.603 sec

select category,count(*) as num from books where category='SCIENCE' group by category order by num limit 100; -- 11.808 sec

select count(*) from books where category='SCIENCE' limit 100; -- 0.002 sec

select count(distinct category) from books limit 100; -- 44.924 sec

select distinct category from books limit 100; -- 44.335 sec

select id,price,publisher from books where publish_date='1999-02-02' and category='SCIENCE' limit 100; -- 0.347 sec

select price,count(publisher) from books where publish_date='1999-02-02' and category='SCIENCE' group by price order by price desc limit 100; -- 0.981 sec

select price,category from books where publisher='Kyowon' group by price,category order by price limit 100; -- 3.602 sec

select price,category,count(*) from books where publisher='Kyowon' group by price,category order by price limit 100; -- 1.406 sec

  • Case 1: rows 496,928,035; 4 nodes (mem: 128 GB, vcore: 24, storage: 4 × 250 GB); size 56.2 GB; shards: 500; avg 6.0K/record; network: gigabit
  • Case 2: rows 993,106,194; 4 nodes (mem: 128 GB, vcore: 24, storage: 4 × 250 GB); size 112 GB; shards: 500; avg 6.0K/record; network: gigabit
  • Case 3: rows 1,551,303,103; 4 nodes (mem: 128 GB, vcore: 24, storage: 4 × 250 GB); size 174.4 GB; shards: 500; avg 6.0K/record; network: gigabit

Note: these tests involved no specialized concurrency tuning; apart from heap memory, every parameter was left at its default.

With a single table of 1.5+ billion rows, five partitions, four servers, and a gigabit network, the performance on display is very strong: all kinds of single-table statistical analysis combined with full-text retrieval, which is why this product deserves to be called a distributed search database. Under the hood it uses many search-engine storage techniques: inverted indexes, data sharding, large heaps, and fine-grained indexes. In these tests the CPUs were never fully utilized while disk I/O and memory were saturated, so multi-instance deployment, if supported, would help. Because every column carries an index, ingestion is relatively slow, and loading from a single machine is the bottleneck; importing from several machines in parallel keeps the I/O from piling up on one node. Excessive load can also crash a node and knock it out of the cluster. Join performance was not explored in depth, so it is hard to judge.

  • Full-text search and tokenization, exposed through dedicated SQL syntax (MATCH), with complex statistical analysis available over the search results.
  • Geo search: complex algorithmic analysis of geo-location data, with fast responses.
  • Partitioned tables, so retrieval and analysis can be restricted to partitions (see the sketch after this list).
  • Multiple replicas, providing built-in fault tolerance and spreading query load to improve performance.
  • Real-time ingestion of massive data, with real-time retrieval and complex statistical analysis.
  • Blob (binary large object) storage and analysis.
  • Only JSON/CSV data can be imported.
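For example, a partitioned table and a blob table are created with plain SQL like the following (a sketch; the names books_part and book_covers and the column choices are illustrative, syntax per CrateDB 2.x):

-- Partitioned table: queries filtering on publish_year only touch the matching partitions.
CREATE TABLE books_part (
  id integer,
  isbn string,
  publish_year string,
  price float
) CLUSTERED INTO 6 SHARDS PARTITIONED BY (publish_year);

-- Blob table: binary objects (e.g. cover images) are stored and fetched
-- over HTTP, addressed by the SHA-1 hash of their content.
CREATE BLOB TABLE book_covers CLUSTERED INTO 3 SHARDS;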

A distributed search database that supports standard SQL and JDBC, and can replace ES for full-text retrieval while adding complex statistical analysis, has real practical value.

The largest ES cluster I have been part of had just 60 nodes and 30+ billion docs (3 master nodes, 6 TB across 2 SATA disks per node), about 400 TB of data in total; there was also a small 20-node cluster with 7+ billion docs (3 master nodes, 4 TB across 2 SSDs per node). Performance basically met our requirements: roughly the last 3 years of data stayed online, with historical data backed up to HDFS.



This is an original article; when reposting, please credit Itweet's blog.

The collected articles of this blog: itweet.cn/blog/archive/


