Elasticsearch: CRUD APIs for Documents (Part 3)

02-03

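This article shows how to use the APIs to create, retrieve, update and delete documents. We won't worry about querying them yet. For now the focus is on how documents are stored in Elasticsearch and how to get them back.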

The APIs covered are:

A. Single-document APIs

Index API, Get API, Delete API, Update API

B. Multi-document APIs

Multi Get API, Bulk API, Delete By Query API, Update By Query API, Reindex API

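1. Index API: indexing a document

Every index in Elasticsearch is divided into shards, and each shard can have multiple copies. These copies are called replicas, and they must be kept in sync as documents are added or removed. Elasticsearch's data replication model is based on the primary-backup model.

The basic indexing process:

When you send a document to be indexed, Elasticsearch uses the document's routing value to choose the shard that should hold it. The routing value defaults to the document ID. By default, Elasticsearch hashes that value and takes the result modulo the number of primary shards, which places the document on one of the index's primary shards. The Index API looks like this: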

PUT /{index}/{type}/{id}
{
  "field": "value",
  ...
}
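The following minimal sketch shows the shard-selection step described above: the routing value (the document ID by default) is hashed, and the hash is taken modulo the number of primary shards. The function name `select_shard` is hypothetical. Elasticsearch itself uses a Murmur3 hash. This sketch substitutes `zlib.crc32` only so that it runs with the standard library, so the shard numbers it produces will not match a real cluster.

```python
import zlib


def select_shard(routing: str, num_primary_shards: int) -> int:
    """Return the primary shard number for a routing value.

    The routing value defaults to the document _id. Simplified sketch:
    Elasticsearch uses a Murmur3 hash; zlib.crc32 is only a deterministic
    stand-in so the example runs with the standard library.
    """
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    digest = zlib.crc32(routing.encode("utf-8"))  # unsigned 32-bit integer
    return digest % num_primary_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3"]:
        print(f"doc {doc_id} -> shard {select_shard(doc_id, 5)}")
```

The modulo step is also why an index's primary shard count cannot simply be changed in place. The same routing value would then map to a different shard.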

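The following example indexes a document into the twitter index, with type tweet and ID 1:

curl -XPUT 'localhost:9200/twitter/tweet/1' -H 'Content-Type: application/json' -d '
{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}'

Elasticsearch replies with a response describing the result.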
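For readers calling Elasticsearch from code instead of curl, here is a sketch of the same index request built with Python's standard `urllib`. The helper `build_index_request` is a hypothetical name. It only constructs the request and does not send it.

```python
import json
import urllib.parse
import urllib.request


def build_index_request(host: str, index: str, doc_type: str,
                        doc_id: str, doc: dict) -> urllib.request.Request:
    """Build, without sending, the PUT /{index}/{type}/{id} request."""
    path = "/".join(urllib.parse.quote(part, safe="") for part in (index, doc_type, doc_id))
    return urllib.request.Request(
        f"http://{host}/{path}",
        data=json.dumps(doc).encode("utf-8"),          # the JSON document body
        headers={"Content-Type": "application/json"},  # ES 6.0+ requires an explicit content type
        method="PUT",
    )


if __name__ == "__main__":
    req = build_index_request("localhost:9200", "twitter", "tweet", "1", {
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "trying out Elasticsearch",
    })
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running cluster.
```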

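1.2 Automatic index creation

If the index does not exist yet, the index operation creates it automatically. It also creates the type and a dynamic mapping for it.

Disabling automatic index creation: set action.auto_create_index to false in the configuration file of every node, or change it through the cluster update settings API.

Disabling automatic mapping creation: set index.mapper.dynamic to false as a per-index setting.

Automatic index creation can also take a pattern-based whitelist/blacklist. For example, set action.auto_create_index to +aaa*,-bbb*,+ccc*,-* (+ means allowed, - means forbidden).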
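The sketch below evaluates such a pattern list for a given index name. It rests on assumptions: patterns are checked left to right, the first match decides, and a name that matches no pattern is not created. Check your Elasticsearch version's documentation before relying on this. The function `may_auto_create` is hypothetical, and `fnmatchcase` only approximates Elasticsearch's own `*` wildcard matching.

```python
from fnmatch import fnmatchcase


def may_auto_create(index_name: str, setting: str) -> bool:
    """Decide whether an index may be auto-created under action.auto_create_index.

    Simplified model (assumption): "true"/"false" switch the feature on or off;
    otherwise comma-separated patterns are checked left to right, the first
    matching pattern decides (+ allows, - forbids), and no match means "no".
    """
    value = setting.strip()
    if value == "true":
        return True
    if value == "false":
        return False
    for raw in value.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        allow = not pattern.startswith("-")  # a bare pattern counts as allowed
        if pattern[0] in "+-":
            pattern = pattern[1:]
        if fnmatchcase(index_name, pattern):
            return allow
    return False


if __name__ == "__main__":
    rule = "+aaa*,-bbb*,+ccc*,-*"
    for name in ["aaa1", "bbb1", "ccc1", "ddd1"]:
        print(name, may_auto_create(name, rule))
```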

2. Get API: retrieving a document

This works like indexing a document, except that PUT becomes GET:

curl -XGET 'localhost:9200/twitter/tweet/0?pretty'

{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "0",
  "_version" : 1,
  "found": true,
  "_source" : {
    "user" : "kimchy",
    "date" : "2009-11-15T14:12:12",
    "likes": 0,
    "message" : "trying out Elasticsearch"
  }
}

2.2 Retrieving part of a document: source filtering

Disable _source entirely:

curl -XGET 'localhost:9200/twitter/tweet/0?_source=false&pretty'

Return only the user field from _source:

curl -XGET 'localhost:9200/twitter/tweet/0?_source=user&pretty'
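Source filtering happens on the server. The client-side sketch below only illustrates which parts of `_source` the two requests above return. The helper `filter_source` is hypothetical. It handles top-level fields only, while Elasticsearch also accepts dotted paths into nested objects.

```python
from fnmatch import fnmatchcase
from typing import Optional


def filter_source(source: dict, param: str) -> Optional[dict]:
    """Approximate what the _source URL parameter returns (top-level fields only).

    "false" -> no _source at all, "true" -> the whole _source, otherwise a
    comma-separated list of field names that may contain * wildcards.
    """
    if param == "false":
        return None
    if param == "true":
        return dict(source)
    patterns = [p.strip() for p in param.split(",") if p.strip()]
    return {key: value for key, value in source.items()
            if any(fnmatchcase(key, p) for p in patterns)}


if __name__ == "__main__":
    doc = {"user": "kimchy", "date": "2009-11-15T14:12:12",
           "likes": 0, "message": "trying out Elasticsearch"}
    print(filter_source(doc, "user"))  # only the user field is kept
```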

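2.3 Routing: if a document was indexed with a custom routing value, pass the same value so that the request goes to the shard that holds it

curl -XGET 'localhost:9200/twitter/tweet/2?routing=user1&pretty'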

3. Delete API: deleting a document

The syntax is the same as before, except that the method is DELETE:

curl -XDELETE 'localhost:9200/twitter/tweet/1?pretty'

If the document is found, the response contains:

{
  ...
  "found" : true,
  ...
}

If it is not found, found is false.

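3.2 Timeout

The primary shard that should perform the delete may be unavailable when the request arrives. By default, the delete operation waits up to 1 minute for the primary shard to become available, and then fails and responds with an error. The timeout parameter sets this wait explicitly. The following example sets it to 5 minutes:

curl -XDELETE 'localhost:9200/twitter/tweet/1?timeout=5m&pretty'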

4. Delete By Query API: deleting documents that match a query

This deletes every document that matches the query:

curl -XPOST 'localhost:9200/twitter/_delete_by_query?pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}'

4.2 scroll_size: the scroll batch size

By default, _delete_by_query processes matches in scroll batches of 1000 documents. You can change the batch size with the scroll_size URL parameter:

curl -XPOST 'localhost:9200/twitter/_delete_by_query?scroll_size=5000&pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}'
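Conceptually, delete-by-query runs a sequence of scroll searches for matching documents. It then deletes each batch of up to scroll_size hits with a bulk request. The sketch below models only the batching step, turning a list of matching IDs into bulk "delete" bodies. The function names are hypothetical, and no search or HTTP call is made.

```python
import json
from typing import Iterable, Iterator, List


def bulk_delete_body(index: str, doc_type: str, ids: List[str]) -> str:
    """One bulk body containing a delete action per document ID."""
    lines = [json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}})
             for doc_id in ids]
    return "\n".join(lines) + "\n"  # bulk bodies must end with a newline


def delete_by_query_batches(index: str, doc_type: str, matching_ids: Iterable[str],
                            scroll_size: int = 1000) -> Iterator[str]:
    """Group matching IDs into batches of scroll_size; yield one bulk body per batch."""
    if scroll_size < 1:
        raise ValueError("scroll_size must be positive")
    batch: List[str] = []
    for doc_id in matching_ids:
        batch.append(doc_id)
        if len(batch) == scroll_size:
            yield bulk_delete_body(index, doc_type, batch)
            batch = []
    if batch:  # the final, partially filled batch
        yield bulk_delete_body(index, doc_type, batch)


if __name__ == "__main__":
    ids = [str(i) for i in range(2500)]
    for body in delete_by_query_batches("twitter", "tweet", ids, scroll_size=1000):
        print(len(body.splitlines()), "deletes in this batch")
```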

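5. Update API: updating a document

Documents in Elasticsearch are immutable. To change an existing document, we must reindex it or replace it.

Under the hood, an update is a retrieve-modify-reindex cycle: Elasticsearch fetches the current document, applies the change, and indexes the result as a new version.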
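That cycle can be sketched with an in-memory dictionary standing in for the index. The names `Store` and `partial_update` are hypothetical. The merge here is shallow for brevity, while the real partial-document merge is recursive for nested objects. The sketch also shows the noop case covered in 5.2: if the merge changes nothing, no new version is written.

```python
import copy
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for an index: doc ID -> (version, _source)
Store = Dict[str, Tuple[int, dict]]


def partial_update(store: Store, doc_id: str, partial: dict) -> str:
    """Apply a partial "doc" update as retrieve -> modify -> reindex.

    Returns "updated", or "noop" when the merge changes nothing.
    Raises KeyError for a missing document.
    """
    version, source = store[doc_id]        # 1. retrieve the current version
    merged = copy.deepcopy(source)         #    the stored document is never edited in place
    merged.update(partial)                 # 2. modify: shallow, top-level merge only
    if merged == source:
        return "noop"                      # nothing changed, so nothing is reindexed
    store[doc_id] = (version + 1, merged)  # 3. reindex as a brand-new version
    return "updated"


if __name__ == "__main__":
    index = {"1": (5, {"name": "old_name"})}
    print(partial_update(index, "1", {"name": "new_name"}), index["1"])
    print(partial_update(index, "1", {"name": "new_name"}), index["1"])
```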

5.1 Scripted update

curl -XPOST 'localhost:9200/test/type1/1/_update?pretty' -H 'Content-Type: application/json' -d '
{
  "script" : {
    "source": "ctx._source.counter += params.count",
    "lang": "painless",
    "params" : {
      "count" : 4
    }
  }
}'

The script field defines the operation to perform on the document, and it can be any script. The advantage of a scripted update is that you can apply extra logic while updating the document.

5.2 Updating fields with a partial document

Merge the given fields into the existing document:

curl -XPOST 'localhost:9200/test/type1/1/_update?pretty' -H 'Content-Type: application/json' -d '
{
  "doc" : {
    "name" : "new_name"
  }
}'

If name was already new_name before the request was sent, the whole update request is ignored, and the result element in the response is noop:

{
  "_shards": {
    "total": 0,
    "successful": 0,
    "failed": 0
  },
  "_index": "test",
  "_type": "type1",
  "_id": "1",
  "_version": 6,
  "result": "noop"
}

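5.3 Updating a document that may not exist

The upsert element defines a document to create when the target document does not exist yet. If the document exists, the script runs. If it does not, the contents of upsert are inserted as a new document:

curl -XPOST 'localhost:9200/test/type1/1/_update?pretty' -H 'Content-Type: application/json' -d '
{
  "script" : {
    "source": "ctx._source.counter += params.count",
    "lang": "painless",
    "params" : {
      "count" : 4
    }
  },
  "upsert" : {
    "counter" : 1
  }
}'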
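The branching between "run the script" and "insert the upsert document" can be sketched as follows. This is a simplified model: the Painless script is replaced by a plain Python function (`add_count`), and `update_with_upsert` and the dictionary store are hypothetical. It reflects the default behaviour, in which the script does not run when the upsert document is inserted.

```python
from typing import Callable, Dict

Script = Callable[[dict, dict], None]  # (ctx._source, params) -> None


def update_with_upsert(store: Dict[str, dict], doc_id: str, script: Script,
                       params: dict, upsert: dict) -> str:
    """Run the script on an existing document, or insert `upsert` if it is missing."""
    if doc_id not in store:
        store[doc_id] = dict(upsert)  # missing: store the upsert document; the script does not run
        return "created"
    script(store[doc_id], params)     # present: the script mutates _source in place
    return "updated"


def add_count(source: dict, params: dict) -> None:
    # Python stand-in for the Painless script "ctx._source.counter += params.count"
    source["counter"] += params["count"]


if __name__ == "__main__":
    docs: Dict[str, dict] = {}
    print(update_with_upsert(docs, "1", add_count, {"count": 4}, {"counter": 1}), docs)
    print(update_with_upsert(docs, "1", add_count, {"count": 4}, {"counter": 1}), docs)
```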

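6. Retrieving multiple documents with mget

6.1 Fetching documents one request at a time pays the network overhead on every request. Combining several gets into a single mget request avoids that cost and is faster:

curl -XGET 'localhost:9200/_mget?pretty' -H 'Content-Type: application/json' -d '
{
  "docs" : [
    {
      "_index" : "test",
      "_type" : "type",
      "_id" : "1"
    },
    {
      "_index" : "test",
      "_type" : "type",
      "_id" : "2"
    }
  ]
}'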

The response contains a docs array, in the same order as the request:

{
  "docs" : [
    {
      "_index" : "test",
      "_type" : "type",
      "_id" : "1",
      "found" : false
    },
    {
      "_index" : "test",
      "_type" : "type",
      "_id" : "2",
      "found" : false
    }
  ]
}

6.2 If all the documents you want live in the same index, or even the same type, put the index and type in the URL and pass just the IDs:

curl -XGET 'localhost:9200/test/type/_mget?pretty' -H 'Content-Type: application/json' -d '
{
  "ids" : ["1", "2"]
}'
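The choice between the two request shapes above can be sketched as a small helper. When every requested document shares one index and type, it uses the short `ids` form against `/{index}/{type}/_mget`. Otherwise it falls back to the full `docs` form against `/_mget`. The helper `build_mget` is hypothetical and only builds the path and body.

```python
from typing import Dict, List, Tuple

DocRef = Tuple[str, str, str]  # (_index, _type, _id)


def build_mget(docs: List[DocRef]) -> Tuple[str, Dict]:
    """Return (path, body) for an mget request covering the given documents."""
    targets = {(index, doc_type) for index, doc_type, _ in docs}
    if len(targets) == 1:
        # Every document lives in the same index/type: use the short "ids" form.
        index, doc_type = targets.pop()
        return f"/{index}/{doc_type}/_mget", {"ids": [doc_id for _, _, doc_id in docs]}
    # Mixed targets: spell out each document in the "docs" array.
    return "/_mget", {"docs": [{"_index": i, "_type": t, "_id": d} for i, t, d in docs]}


if __name__ == "__main__":
    print(build_mget([("test", "type", "1"), ("test", "type", "2")]))
```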

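7. Bulk API: batch operations

Just as mget retrieves multiple documents at once, bulk lets a single request perform index, create, update and delete operations on many documents.

This makes it a very efficient way to index data. Each action line and each document line must sit on a line of its own, and the body must end with a newline. Use --data-binary so that curl keeps the newlines:

curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/x-ndjson' --data-binary '{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
'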
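Writing the newline-delimited body by hand is error-prone, so here is a sketch that serializes a list of actions into the same body as the request above. The helper `bulk_body` and the `(action, metadata, source)` tuple format are hypothetical conventions chosen for this example. The helper enforces the line rules: delete has no source line, and the other actions have exactly one.

```python
import json
from typing import List, Optional, Tuple

# (action, metadata, source line or None)
BulkAction = Tuple[str, dict, Optional[dict]]


def bulk_body(actions: List[BulkAction]) -> str:
    """Serialize bulk actions into the newline-delimited JSON (NDJSON) body."""
    lines: List[str] = []
    for action, meta, source in actions:
        if action not in ("index", "create", "update", "delete"):
            raise ValueError(f"unknown bulk action: {action}")
        lines.append(json.dumps({action: meta}))  # action/metadata line
        if action == "delete":
            if source is not None:
                raise ValueError("delete takes no source line")
        elif source is None:
            raise ValueError(f"{action} needs a source line")
        else:
            lines.append(json.dumps(source))      # document or partial-doc line
    return "\n".join(lines) + "\n"                 # the body must end with a newline


if __name__ == "__main__":
    meta = {"_index": "test", "_type": "type1"}
    print(bulk_body([
        ("index", {**meta, "_id": "1"}, {"field1": "value1"}),
        ("delete", {**meta, "_id": "2"}, None),
        ("create", {**meta, "_id": "3"}, {"field1": "value3"}),
        ("update", {**meta, "_id": "1"}, {"doc": {"field2": "value2"}}),
    ]), end="")
```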

The response contains an items array that lists the result of each action, in order:

{
  "took": 30,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "test",
        "_type": "type1",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "created": true,
        "status": 201
      }
    },
    {
      "delete": {
        ...
      }
    },
    ...
  ]
}

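The supported actions are:

index: creates the document, or replaces it if it already exists; followed by a line with the document source,

create: creates the document only if it does not already exist, and fails otherwise; followed by a source line,

update: partially updates the document; followed by a line with the partial doc or a script,

delete: deletes the document; takes no source line.

Each action's metadata names the target _index, _type and _id. The index and type may instead be given in the URL, and for index the _id may be omitted so that Elasticsearch generates one.

Note that a bulk request is not atomic. Each action is processed separately, and the success or failure of one action does not affect the others.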
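Because bulk is not atomic, a client should check the response item by item instead of treating the whole request as failed or succeeded. A minimal sketch follows. It assumes, as in the response format above, that the top-level errors flag is true when any item failed, and that a failed item carries an error object. The function `failed_items` is hypothetical.

```python
from typing import List, Optional, Tuple


def failed_items(response: dict) -> List[Tuple[str, Optional[str], Optional[int]]]:
    """Return (action, _id, status) for every item of a _bulk response that failed."""
    if not response.get("errors"):
        return []  # fast path: no item reported an error
    failures = []
    for item in response.get("items", []):
        action, result = next(iter(item.items()))  # each item has a single action key
        if "error" in result:
            failures.append((action, result.get("_id"), result.get("status")))
    return failures


if __name__ == "__main__":
    sample = {
        "errors": True,
        "items": [
            {"index": {"_id": "1", "status": 201}},
            {"create": {"_id": "3", "status": 409,
                        "error": {"type": "version_conflict_engine_exception"}}},
        ],
    }
    print(failed_items(sample))
```

Only the failed actions then need to be retried or reported. The successful ones have already been applied.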

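8. Reindex API

Reindex does not try to set up the destination index, and it does not copy the settings of the source index. Set up the destination index before running _reindex, including its mappings, number of shards, replicas and so on.

In its most basic form, _reindex just copies documents from one index to another. This copies the documents in the twitter index into the new_twitter index:

curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}'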

You can also restrict which documents are reindexed with a query:

curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "twitter",
    "type": "tweet",
    "query": {
      "term": {
        "user": "kimchy"
      }
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}'

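8.2 Reindexing from a remote cluster

Reindex can copy documents from a remote Elasticsearch cluster. The remote host must be whitelisted with the reindex.remote.whitelist setting in elasticsearch.yml:

curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d '
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}'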

8.3 Using the Task API

You can use the Task API to get the status of all running reindex requests:

curl -XGET 'localhost:9200/_tasks?detailed=true&actions=*reindex&pretty'

Response:

{
  "nodes" : {
    "r1A2WoRbTwKZ516z6NEs5A" : {
      "name" : "r1A2WoR",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "attributes" : {
        "testattr" : "test",
        "portsfile" : "true"
      },
      "tasks" : {
        "r1A2WoRbTwKZ516z6NEs5A:36619" : {
          "node" : "r1A2WoRbTwKZ516z6NEs5A",
          "id" : 36619,
          "type" : "transport",
          "action" : "indices:data/write/reindex",
          "status" : {
            "total" : 6154,
            "updated" : 3500,
            "created" : 0,
            "deleted" : 0,
            "batches" : 4,
            "version_conflicts" : 0,
            "noops" : 0,
            "retries": {
              "bulk": 0,
              "search": 0
            },
            "throttled_millis": 0
          },
          "description" : ""
        }
      }
    }
  }
}
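The status object can be turned into a progress estimate. Adding updated, created and deleted gives the number of operations done so far, and the reindex finishes when that sum reaches total. The sketch below walks a Tasks API response shaped like the one above and reports a fraction per reindex task. The function name `reindex_progress` is hypothetical.

```python
from typing import Dict


def reindex_progress(tasks_response: dict) -> Dict[str, float]:
    """Map each running reindex task ID to the fraction of work done (0.0 to 1.0)."""
    progress: Dict[str, float] = {}
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if task.get("action") != "indices:data/write/reindex":
                continue  # ignore tasks that are not reindex requests
            status = task.get("status", {})
            total = status.get("total", 0)
            done = status.get("updated", 0) + status.get("created", 0) + status.get("deleted", 0)
            progress[task_id] = done / total if total else 0.0
    return progress


if __name__ == "__main__":
    response = {"nodes": {"r1A2WoRbTwKZ516z6NEs5A": {"tasks": {
        "r1A2WoRbTwKZ516z6NEs5A:36619": {
            "action": "indices:data/write/reindex",
            "status": {"total": 6154, "updated": 3500, "created": 0, "deleted": 0},
        }}}}}
    for task_id, fraction in reindex_progress(response).items():
        print(f"{task_id}: {fraction:.1%}")
```

The task ID keys returned here (node_id:task_number) are the IDs that the cancel request expects.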

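A running reindex can be cancelled with the Task Cancel API. Here task_id:1 is a placeholder for the ID of the task to cancel. That ID has the form node_id:task_number, for example r1A2WoRbTwKZ516z6NEs5A:36619 in the response above:

curl -XPOST 'localhost:9200/_tasks/task_id:1/_cancel?pretty'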

Thanks for reading. The next article covers queries in Elasticsearch.