Hive Basics, Chapter 2: Create, Read, Update, Delete

05-16

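Following on from Chapter 1, we now have the Hive data warehouse installed.

We know that Hive provides a SQL-like language, and that its processing layer translates the SQL into underlying MapReduce jobs (written in Java) to carry out the operations.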

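1. Running Linux commands inside Hive

When we want to look at a text file on Linux without leaving the Hive shell, we can type ! followed by the Linux command.

Clear the screen:

!clear;

List the distributed file system recursively (on newer Hadoop releases, -ls -R replaces the deprecated -lsr):

!hdfs dfs -lsr /;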

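2. create: creating tables

#2.2 Creating a partitioned table

Sometimes we do not want a query to scan the whole table. Partitioning lets a query read only the relevant partitions, which speeds it up. The partitioned by clause below declares the partition columns (the fields are assumed to be tab-separated here, so match the delimiter to your own file; '\n' is the only line terminator Hive accepts).

hive> create table myhive.test1(id int, name string, age int)
    > partitioned by (provience string, city string)
    > row format delimited
    > fields terminated by '\t'
    > lines terminated by '\n'
    > stored as textfile;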

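Add the partition manually with alter table:

alter table test1 add partition(provience='henan', city='nanyang');

On HDFS we can now see a directory for the test1 table, a provience=henan directory under it, and a city=nanyang directory under that.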
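To confirm that the partition was registered, one option is to ask Hive directly instead of browsing HDFS. A small sketch, assuming the test1 table and the partition values used above:

```sql
-- List every partition currently registered for test1 in the metastore.
show partitions test1;

-- Show the table's columns; the partition columns are listed in their own section.
describe test1;
```

Each partition is printed as a path-like string such as provience=henan/city=nanyang, which mirrors the directory layout on HDFS.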

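Adding data to the partitioned table.

First write a txt file containing the columns id, name and age, then upload it to the matching directory of the database on HDFS.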

hive> !hdfs dfs -put /home/hadoop/name.txt /user/hive/warehouse/myhive.db/test1/provience=henan/city=nanyang;

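As you can see, the uploaded file had only three columns, but a query now returns five: Hive appends the two partition columns, whose values come from the partition directory names.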
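A query along these lines shows the five columns and reads only the one partition directory. This is a sketch that assumes the test1 table and the henan/nanyang partition created above:

```sql
-- The data file holds only id, name and age; provience and city are
-- filled in from the partition directory names.
select id, name, age, provience, city
from test1
where provience = 'henan' and city = 'nanyang';
```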

New rows can also be added with an insert into statement:

hive> insert into test1 partition(provience='henan', city='nanyang') values(1001, 'bbq', 18);

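Everything above uses static partitions, where you name the partition values yourself (for example with alter ... add partition). Dynamic partitions can be used instead:

insert into table tablename partition(provience, city) select id, name, age, 'hebei' as provience, 'shijiazhuang' as city from tablename2;

To replace existing data rather than append to it, write insert overwrite table in place of insert into table. Note that because both partition columns in the statement above are dynamic, strict mode has to be turned off; just run the set command suggested in the error message.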
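The settings involved can be sketched as follows. The source table people_src is hypothetical and is assumed to already carry provience and city columns; the two properties are standard Hive settings:

```sql
-- Enable dynamic partitioning (on by default in recent Hive versions).
set hive.exec.dynamic.partition=true;
-- Allow every partition column to be dynamic (turns off strict mode).
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical source table:
--   people_src(id int, name string, age int, provience string, city string)
-- The partition columns must come last in the select list, in the same
-- order as in the partition clause.
insert into table test1 partition (provience, city)
select id, name, age, provience, city
from people_src;
```

Hive creates one partition directory for each distinct (provience, city) pair found in the source rows.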

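3. Importing and exporting data

#3.1 Import

Since this is a data warehouse, bulk loads in and out are unavoidable. As mentioned, the table has to be defined before data can be imported:

create table tablename (...) row format delimited fields terminated by ... lines terminated by ... stored as textfile;

Then load the file with load data local inpath 'local path' into table tablename;

or copy it in with hdfs dfs -put local-path dfs-path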
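Put together, the import might look like the sketch below. The table name people_csv and the comma delimiter are assumptions made for illustration; the file path is the one used earlier in this chapter:

```sql
-- Hypothetical staging table whose layout matches a comma-separated file.
create table if not exists people_csv (id int, name string, age int)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile;

-- Copy a local file into the table's directory.
-- Writing "overwrite into table" instead replaces the existing data.
load data local inpath '/home/hadoop/name.txt' into table people_csv;
```

Hive does not reformat the file on load, so the delimiters declared on the table have to match the file.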

#3.2 Export

insert overwrite local directory '/home/hadoop/temp' select * from test1 where provience='henan';

Once it has run successfully, the output file can be found under temp.
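By default the exported file separates columns with the non-printing ^A character. If a readable delimiter is preferred, the export can declare a row format, as in this sketch (the comma is an arbitrary choice):

```sql
-- Write the query result to a local directory as comma-separated text.
-- Note: "overwrite" replaces whatever is already in the target directory.
insert overwrite local directory '/home/hadoop/temp'
row format delimited
fields terminated by ','
select * from test1 where provience = 'henan';
```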

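4. At this point the basic operations are more or less covered. As you have probably noticed, the syntax is about 90% the same as SQL, so the rest is kept brief. If you have no SQL background, it is worth learning SQL first.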

#Queries

select * from table

where order by [desc] group by having limit

case when then else end
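As an illustration of how these clauses combine, here is a sketch against the test1 table from earlier. The thresholds and labels are made up for the example:

```sql
-- Count people per city in one province, keep cities with at least 2 rows,
-- and label each city by its average age.
select city,
       count(*) as people,
       round(avg(age), 1) as avg_age,
       case when avg(age) >= 30 then 'older' else 'younger' end as age_group
from test1
where provience = 'henan'
group by city
having count(*) >= 2
order by people desc
limit 10;
```

The clauses have to be written in this order: where, group by, having, order by, limit.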

#Aggregate functions

count() max min sum / * round

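#explode - expanding a collection into rows

Hive has array-like and dictionary-like (map) types, and explode expands their elements into separate rows.

select explode(array('tom','toms','tomslee'));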
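A few more variations, sketched with inline literals so that no extra table is needed (the values and aliases are illustrative):

```sql
-- explode on an array: one output row per element.
select explode(array('tom', 'toms', 'tomslee')) as name;

-- explode on a map: one row per entry, with a key column and a value column.
select explode(map('a', 1, 'b', 2)) as (k, v);

-- lateral view keeps other columns next to the exploded values.
select s.id, t.name
from (select 1 as id, array('tom', 'toms', 'tomslee') as names) s
lateral view explode(s.names) t as name;
```

explode on its own cannot be mixed with other columns in the select list, which is why the last query uses lateral view.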

#Deduplication: distinct

#Type conversion

select cast('120' as int);

#String concatenation

select concat('120','120');

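#Cases that avoid a MapReduce job

#1. A full-table scan with no where clause

#2. A where clause that filters only on partition columns

#3. limit

#4. Setting hive.exec.mode.local.auto=true (small jobs then run in local mode instead of on the cluster)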
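These cases can be tried out roughly as follows. Both properties are standard Hive settings, but their defaults and exact behavior vary by version, so treat this as a sketch rather than a guarantee:

```sql
-- Let Hive answer simple select / filter / limit queries with a fetch task
-- instead of launching a job ("more" is the default in many releases).
set hive.fetch.task.conversion=more;
-- Let Hive run small jobs in local mode automatically.
set hive.exec.mode.local.auto=true;

-- Filters only on partition columns and uses limit, so it is a candidate
-- for a fetch task.
select * from test1 where provience = 'henan' and city = 'nanyang' limit 10;

-- explain prints the plan, which shows whether a MapReduce stage is present.
explain select * from test1 where provience = 'henan' limit 10;
```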

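5. Views

Views have two common uses: 1. simplifying nested queries, 2. protecting the privacy of the underlying data.

SQL often involves nested queries, so we can save a nested subquery as a view (which behaves much like a table) and reuse it many times, saving effort. And when outsiders need to query the data, we can create a view that filters out the sensitive fields and give them access to that instead.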

create view viewname as query;
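For instance, a view that hides some columns could be sketched like this. The view name test1_public is hypothetical; test1 is the table from earlier:

```sql
-- Hypothetical view that exposes names and locations but hides id and age.
create view if not exists test1_public as
select name, provience, city
from test1;

-- Query the view exactly like a table.
select * from test1_public where provience = 'henan';
```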

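To clarify: a view only behaves like a table, it is not one. You will not find any data for view1 on HDFS. We can, however, materialize a view as a real table stored on HDFS. Use create table ... as select for this; create table ... like would copy only the column definitions, not the data.

create table tablename as select * from view2;

Dropping a view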

drop view if exists view1;

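6. Indexes

We know that an index helps locate rows quickly and so speeds up queries. Hive has neither the primary key nor the auto_increment of MySQL, but it does provide an index feature. (Indexes were removed in Hive 3.0, so this section applies to earlier versions.)

hive> create index test1_idx_id on table test1(id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild idxproperties('create'='me') in table test1_index comment 'this is index';

At this point the index exists only in the database metadata. Rebuilding it populates the index and makes it visible on HDFS.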

hive> alter index test1_idx_id on test1 rebuild;

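The index file now appears on HDFS: the first number on each line is the indexed id value, followed by the data file, and finally the offset, which is what locates the row.

7. Bucketing

Partitions were introduced earlier. A partition is a path, a directory, a logical separation of files, and it effectively reduces query time.

A bucket, by contrast, is a file. Rows are placed into buckets according to the hash of the column you specify. (As before, the fields are assumed to be tab-separated here.)

create table orderitems (id int, name string, oid int) clustered by (id) into 3 buckets row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;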

One caveat: load does not bucket the data. Only insert into distributes rows across the buckets.

insert into orderitems values(10, 'uuuu', 2);
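One way to see the buckets at work is to sample a single bucket. The sketch below assumes the orderitems table defined above; the extra rows are made-up values:

```sql
-- On Hive versions before 2.0, this makes inserts honor the bucket definition.
-- Hive 2.0 and later always enforce bucketing and no longer have this
-- property, so skip this line there.
set hive.enforce.bucketing=true;

insert into orderitems values (11, 'vvvv', 2), (12, 'wwww', 3);

-- Read only the first of the 3 buckets, chosen by hashing id.
select * from orderitems tablesample (bucket 1 out of 3 on id);
```

After the insert, the table directory on HDFS should hold one file per non-empty bucket.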