HIVE中關(guān)于collect_set與explode函數(shù)妙用
hive的復(fù)合數(shù)據(jù)類型
Hive中的列支持使用三類復(fù)雜的集合數(shù)據(jù)類型,即:array,map及struct,這些類型的名稱是保留字,具體用法可參見該篇博文,里面有關(guān)于三類基本集合數(shù)據(jù)類型的操作實(shí)例,注:map中可嵌套array類型。
例如,定義表:
- create table example (
- device_id string,
- login_ip array<string>,
- user_info map<string,array<string>>
- address struct<street:string,city:string,state:string>
- )
- row format delimited
- fields terminated by '\001'
- collection items terminated by '\002'
- map keys terminated by '\003'
- lines terminated by '\n'
- stored as RCFile;
假設(shè)這樣的數(shù)據(jù)類型以分區(qū)表存儲,你要統(tǒng)計(jì)一段時間類no=1下的去重score,那么該怎么辦了?這里可配合使用lateral view首先實(shí)現(xiàn)列轉(zhuǎn)行的功能,如下所示:
select no,score from tablaa lateral view explode(score_set) xxx as score;
注:xxx代表虛表名稱,不能缺少。
進(jìn)一步深化上述代碼解決統(tǒng)計(jì)一段時間的去重值,可寫為:
select no,collect_set(score) from tablaa lateral view explode(score_set) xxx as score group by no;
這樣,將兩個函數(shù)結(jié)合實(shí)現(xiàn)了行轉(zhuǎn)列或列轉(zhuǎn)行的妙用。