A Casual Look at Using Cassandra Clients
The 51CTO database channel previously ran a feature titled "NoSQL: The Terminator of Relational Databases?", which we hope helps readers understand NoSQL in more depth.
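I recently spent some time trying out Cassandra by importing data from Oracle into it. I ran into problems and solved them along the way, and learned a lot. Apart from designing a sound data model, the main task in this process was interacting with Cassandra through its API.
Cassandra was designed from the start to support Thrift, which means we can develop against it in many languages.
For Cassandra development itself, this is the benefit of using Thrift: multi-language support. The drawback is just as obvious: the Thrift API is too basic to be used directly in a production environment.
The Cassandra wiki also lists higher-level APIs built on top of the Thrift API, for a variety of languages. For details, see: http://wiki.apache.org/cassandra/ClientExamples.
This article only covers the following two Java clients:
1 Thrift Java API
2 Hector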
Thrift Java API
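This is the simplest API, and it ships with Cassandra: it is included in apache-cassandra-0.5.1.jar and can be used directly. Alternatively, we can install Thrift ourselves and generate the code from the cassandra.thrift file.
If you are going to use Cassandra, you need to understand the Thrift API. After all, every higher-level API is a wrapper around it.
Provided functionality
Inserting data
Inserting data requires specifying the keyspace, ColumnFamily, Column, key, value, timestamp, and consistency level. (If you need background on Cassandra's data model, see the article "Talking About the Cassandra Data Model".)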
    /**
     * Insert a Column consisting of (column_path.column, value, timestamp) at the given
     * column_path.column_family and optional column_path.super_column. Note that
     * column_path.column is here required, since a SuperColumn cannot directly contain binary
     * values -- it can only contain sub-Columns.
     *
     * @param keyspace
     * @param key
     * @param column_path
     * @param value
     * @param timestamp
     * @param consistency_level
     */
    public void insert(String keyspace, String key, ColumnPath column_path, byte[] value,
            long timestamp, int consistency_level)
            throws InvalidRequestException, UnavailableException, TimedOutException, TException;

    /**
     * Insert Columns or SuperColumns across different Column Families for the same row key.
     * batch_mutation is a map<string, list<ColumnOrSuperColumn>> -- a map which pairs column
     * family names with the relevant ColumnOrSuperColumn objects to insert.
     *
     * @param keyspace
     * @param key
     * @param cfmap
     * @param consistency_level
     */
    public void batch_insert(String keyspace, String key,
            Map<String,List<ColumnOrSuperColumn>> cfmap, int consistency_level)
            throws InvalidRequestException, UnavailableException
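To make the role of each insert argument more concrete, here is a minimal in-memory sketch of the nesting keyspace -> ColumnFamily -> key -> Column -> (value, timestamp). It is not Cassandra's implementation: the class ToyStore and its methods are hypothetical names invented for this illustration, there is only one copy of the data (so the consistency level is left out), and ties between equal timestamps are simply resolved in favor of the existing value. What it does reflect is that Cassandra uses the client-supplied timestamp to decide which write to a column wins.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model for illustration only; not part of Cassandra or its API.
class ToyStore {
    // A stored value plus the client-supplied timestamp used to order writes.
    static final class Cell {
        final byte[] value;
        final long timestamp;

        Cell(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // keyspace -> column family -> row key -> (column name -> cell)
    private final Map<String, Map<String, Map<String, TreeMap<String, Cell>>>> data = new HashMap<>();

    // Finds the row, creating empty containers along the way if needed.
    private TreeMap<String, Cell> row(String keyspace, String columnFamily, String key) {
        return data.computeIfAbsent(keyspace, k -> new HashMap<>())
                   .computeIfAbsent(columnFamily, k -> new HashMap<>())
                   .computeIfAbsent(key, k -> new TreeMap<>());
    }

    // Mirrors the arguments of insert(), minus the consistency level.
    void insert(String keyspace, String columnFamily, String key, String column,
                byte[] value, long timestamp) {
        TreeMap<String, Cell> row = row(keyspace, columnFamily, key);
        Cell existing = row.get(column);
        // Only a strictly newer timestamp replaces what is already stored.
        if (existing == null || timestamp > existing.timestamp) {
            row.put(column, new Cell(value, timestamp));
        }
    }

    // Returns the value decoded as UTF-8, or null if the column does not exist.
    String get(String keyspace, String columnFamily, String key, String column) {
        Cell cell = row(keyspace, columnFamily, key).get(column);
        return cell == null ? null : new String(cell.value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.insert("Keyspace1", "Standard1", "row1", "url",
                "new".getBytes(StandardCharsets.UTF_8), 2L);
        // This write carries an older timestamp, so it is ignored.
        store.insert("Keyspace1", "Standard1", "row1", "url",
                "old".getBytes(StandardCharsets.UTF_8), 1L);
        System.out.println(store.get("Keyspace1", "Standard1", "row1", "url")); // prints "new"
    }
}
```

The practical consequence for client code is that the timestamp is part of the data: a client with a clock that runs behind can have its writes silently lose to older data.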
 
Reading data
Fetch a single value identified by an exact query.
    /**
     * Get the Column or SuperColumn at the given column_path. If no value is present,
     * NotFoundException is thrown. (This is the only method that can throw an exception
     * under non-failure conditions.)
     *
     * @param keyspace
     * @param key
     * @param column_path
     * @param consistency_level
     */
    public ColumnOrSuperColumn get(String keyspace, String key, ColumnPath column_path,
            int consistency_level)
            throws InvalidRequestException, NotFoundException, UnavailableException,
                   TimedOutException, TException;

    /**
     * Perform a get for column_path in parallel on the given list<string> keys. The return
     * value maps keys to the ColumnOrSuperColumn found. If no value corresponding to a key
     * is present, the key will still be in the map, but both the column and super_column
     * references of the ColumnOrSuperColumn object it maps to will be null.
     *
     * @param keyspace
     * @param keys
     * @param column_path
     * @param consistency_level
     */
    public Map<String,ColumnOrSuperColumn> multiget(String keyspace, List<String> keys,
            ColumnPath column_path, int consistency_level)
            throws InvalidRequestException
 
Fetch the data under a given keyspace, key, ColumnFamily, and SuperColumn (which must be specified if there is one), returning only the columns whose names match a condition (the SlicePredicate).
    /**
     * Get the group of columns contained by column_parent (either a ColumnFamily name or a
     * ColumnFamily/SuperColumn name pair) specified by the given SlicePredicate. If no
     * matching values are found, an empty list is returned.
     *
     * @param keyspace
     * @param key
     * @param column_parent
     * @param predicate
     * @param consistency_level
     */
    public List<ColumnOrSuperColumn> get_slice(String keyspace, String key,
            ColumnParent column_parent, SlicePredicate predicate, int consistency_level)
            throws InvalidRequestException, UnavailableException, TimedOutException, TException;

    /**
     * Performs a get_slice for column_parent and predicate for the given keys in parallel.
     *
     * @param keyspace
     * @param keys
     * @param column_parent
     * @param predicate
     * @param consistency_level
     */
    public Map<String,List<ColumnOrSuperColumn>> multiget_slice(String keyspace,
            List<String> keys, ColumnParent column_parent, SlicePredicate predicate,
            int consistency_level)
            throws InvalidRequestException, UnavailableException, TimedOutException, TException;
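A slice by column name can be pictured as a bounded walk over the sorted columns of one row. The sketch below assumes the four parts of a SliceRange that the sample program in this article passes to its constructor (start, finish, reversed, count) and models them over a plain sorted map. SliceSketch and slice are hypothetical names, column names are compared as Java strings rather than with the ColumnFamily's configured comparator, an empty string stands for "no bound" (as the empty byte arrays do in the sample program), and start is assumed not to sort after finish in the direction of iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of slicing one row's columns by name; not Cassandra code.
class SliceSketch {
    static List<String> slice(NavigableMap<String, String> columns, String start,
                              String finish, boolean reversed, int count) {
        // Walk the columns in the requested direction.
        NavigableMap<String, String> view = reversed ? columns.descendingMap() : columns;
        // An empty bound means "unbounded" on that side; both bounds are inclusive here.
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);
        }
        if (!finish.isEmpty()) {
            view = view.headMap(finish, true);
        }
        // Stop after at most `count` column names.
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() >= count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String name : new String[] {"a", "b", "c", "d", "e"}) {
            row.put(name, "value-" + name);
        }
        System.out.println(slice(row, "", "", false, 10));  // prints [a, b, c, d, e]
        System.out.println(slice(row, "b", "d", false, 10)); // prints [b, c, d]
        System.out.println(slice(row, "d", "b", true, 10));  // prints [d, c, b]
    }
}
```

Note that in the reversed case the sketch expects start to be the larger name, because iteration begins at start and moves toward finish.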
 
Query a range of keys (this feature requires the order-preserving partitioner).
    /**
     * @deprecated; use get_range_slice instead
     *
     * @param keyspace
     * @param column_family
     * @param start
     * @param finish
     * @param count
     * @param consistency_level
     */
    public List<String> get_key_range(String keyspace, String column_family,
            String start, String finish, int count, int consistency_level)
            throws InvalidRequestException, UnavailableException, TimedOutException, TException;

    /**
     * returns a subset of columns for a range of keys.
     *
     * @param keyspace
     * @param column_parent
     * @param predicate
     * @param start_key
     * @param finish_key
     * @param row_count
     * @param consistency_level
     */
    public List<KeySlice> get_range_slice(String keyspace, ColumnParent column_parent,
            SlicePredicate predicate, String start_key, String finish_key, int row_count
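The requirement for an order-preserving partitioner can be illustrated with a small sketch. When rows are ordered by the key itself, a key range is one contiguous run and can be scanned directly. When rows are ordered by a hash of the key, neighbouring keys end up scattered, so a contiguous run in storage order no longer corresponds to a key range. The class KeyRangeSketch and its methods are hypothetical, the inclusive bounds are an assumption of this sketch, and the MD5-based token stands in for hash-based placement in general rather than reproducing Cassandra's exact token calculation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of key ordering under two placement schemes; not Cassandra code.
class KeyRangeSketch {
    // Order-preserving placement: keys are sorted as-is, so a range scan is a sub-set walk.
    static List<String> keyRange(TreeSet<String> keys, String start, String finish, int count) {
        List<String> result = new ArrayList<>();
        for (String key : keys.subSet(start, true, finish, true)) {
            if (result.size() >= count) {
                break;
            }
            result.add(key);
        }
        return result;
    }

    // Hash-based placement: keys are ordered by a token derived from a hash of the key.
    static List<String> tokenOrder(Iterable<String> keys) {
        TreeMap<BigInteger, String> byToken = new TreeMap<>();
        for (String key : keys) {
            byToken.put(token(key), key);
        }
        return new ArrayList<>(byToken.values());
    }

    // Interprets the MD5 digest of the key as a non-negative integer.
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(
                Arrays.asList("user1", "user2", "user3", "user4", "user5"));
        System.out.println("key order range: " + keyRange(keys, "user2", "user4", 10));
        System.out.println("token order:     " + tokenOrder(keys));
    }
}
```

Running the example prints the keys in token order next to the sorted range, which makes it easy to see that the two orders have nothing to do with each other.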
 
Querying system information.
    /**
     * get property whose value is of type string.
     *
     * @param property
     */
    public String get_string_property(String property) throws TException;

    /**
     * get property whose value is list of strings.
     *
     * @param property
     */
    public List<String> get_string_list_property(String property) throws TException;

    /**
     * describe specified keyspace
     *
     * @param keyspace
     */
    public Map<String,Map<String,String>> describe_keyspace(String keyspace)
            throws NotFoundException, TException;
 
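With these operations, we can learn about the state of the system.
One of the more interesting pieces of information is the token map, which tells us which Cassandra services are available to serve requests.
Deleting data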
    /**
     * Remove data from the row specified by key at the granularity specified by column_path,
     * and the given timestamp. Note that all the values in column_path besides
     * column_path.column_family are truly optional: you can remove the entire row by just
     * specifying the ColumnFamily, or you can remove a SuperColumn or a single Column by
     * specifying those levels too.
     *
     * @param keyspace
     * @param key
     * @param column_path
     * @param timestamp
     * @param consistency_level
     */
    public void remove(String keyspace, String key, ColumnPath column_path,
            long timestamp, int consistency_level)
            throws InvalidRequestException, UnavailableException
 
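Note that, because of how consistency works, a delete does not immediately remove the data from every machine. The replicas will, however, eventually become consistent.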
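One way to picture this behaviour is to treat a delete as a timestamped marker (Cassandra calls such a marker a tombstone) that is written like any other value and later propagated to replicas that missed it. The sketch below is a deliberately simplified model: TombstoneSketch, Replica, and reconcile are hypothetical names, there are only two replicas, and the single reconcile step stands in for whatever repair mechanisms the real system uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-replica model of timestamped deletes; not Cassandra code.
class TombstoneSketch {
    static final class Cell {
        final String value;   // null marks a tombstone, i.e. a recorded delete
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static final class Replica {
        final Map<String, Cell> columns = new HashMap<>();

        // Accepts a write or a delete only if it is strictly newer than what is stored.
        void apply(String column, Cell incoming) {
            Cell current = columns.get(column);
            if (current == null || incoming.timestamp > current.timestamp) {
                columns.put(column, incoming);
            }
        }

        // Returns the live value, or null if the column is absent or deleted.
        String read(String column) {
            Cell cell = columns.get(column);
            return cell == null ? null : cell.value;
        }
    }

    // Exchanges cells in both directions so each replica keeps the newest cell per column.
    static void reconcile(Replica a, Replica b) {
        for (Map.Entry<String, Cell> e : a.columns.entrySet()) {
            b.apply(e.getKey(), e.getValue());
        }
        for (Map.Entry<String, Cell> e : b.columns.entrySet()) {
            a.apply(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();
        a.apply("url", new Cell("http://example.invalid", 1L));
        b.apply("url", new Cell("http://example.invalid", 1L));

        a.apply("url", new Cell(null, 2L));  // the delete reaches replica a only
        System.out.println(b.read("url"));   // prints "http://example.invalid"

        reconcile(a, b);                     // the replicas exchange their newest cells
        System.out.println(b.read("url"));   // prints "null"
    }
}
```

Because the marker carries a timestamp, a stale write that arrives after the delete does not bring the data back, which is also why remove takes a timestamp argument.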
Sample program
    import java.util.List;
    import java.io.UnsupportedEncodingException;
    import org.apache.thrift.transport.TTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.protocol.TProtocol;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.TException;
    import org.apache.cassandra.service.*;

    public class CClient {
        public static void main(String[] args)
                throws TException, InvalidRequestException, UnavailableException,
                       TimedOutException, UnsupportedEncodingException, NotFoundException {
            // Open a Thrift connection to a Cassandra node on the local machine.
            TTransport tr = new TSocket("localhost", 9160);
            TProtocol proto = new TBinaryProtocol(tr);
            Cassandra.Client client = new Cassandra.Client(proto);
            tr.open();

            // Row key (the author's blog name).
            String key_user_id = "逖靖寒的世界";

            // insert data: two columns, "網址" (URL) and "作者" (author)
            long timestamp = System.currentTimeMillis();
            client.insert("Keyspace1", key_user_id,
                    new ColumnPath("Standard1", null, "網址".getBytes("UTF-8")),
                    "http://gpcuster.cnblogs.com".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);
            client.insert("Keyspace1", key_user_id,
                    new ColumnPath("Standard1", null, "作者".getBytes("UTF-8")),
                    "逖靖寒".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);

            // read single column; it must be one that was inserted above,
            // otherwise get() throws NotFoundException
            ColumnPath path = new ColumnPath("Standard1", null, "網址".getBytes("UTF-8"));
            System.out.println(client.get("Keyspace1", key_user_id, path, ConsistencyLevel.ONE));

            // read entire row: empty start and finish mean no bounds, at most 10 columns
            SlicePredicate predicate = new SlicePredicate(null,
                    new SliceRange(new byte[0], new byte[0], false, 10));
            ColumnParent parent = new ColumnParent("Standard1", null);
            List<ColumnOrSuperColumn> results = client.get_slice("Keyspace1", key_user_id,
                    parent, predicate, ConsistencyLevel.ONE);
            for (ColumnOrSuperColumn result : results) {
                Column column = result.column;
                System.out.println(new String(column.name, "UTF-8") + " -> "
                        + new String(column.value, "UTF-8"));
            }

            tr.close();
        }
    }
 
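Pros and cons
Pros: simple and efficient.
Cons: limited functionality. It provides no connection pooling, error handling, or similar features, so it is not suitable for direct use in a production environment.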
Hector
Hector is a Java client that wraps the Thrift Java API and provides a higher-level abstraction.
Sample program
    package me.prettyprint.cassandra.service;

    import static me.prettyprint.cassandra.utils.StringUtils.bytes;
    import static me.prettyprint.cassandra.utils.StringUtils.string;

    import org.apache.cassandra.service.Column;
    import org.apache.cassandra.service.ColumnPath;

    public class ExampleClient {
        public static void main(String[] args)
                throws IllegalStateException, PoolExhaustedException, Exception {
            CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
            CassandraClient client = pool.borrowClient("localhost", 9160);
            // A load balanced version would look like this:
            // CassandraClient client = pool.borrowClient(new String[] {"cas1:9160", "cas2:9160", "cas3:9160"});
            try {
                Keyspace keyspace = client.getKeyspace("Keyspace1");
                // Column named "網址" (URL) in the Standard1 column family.
                ColumnPath columnPath = new ColumnPath("Standard1", null, bytes("網址"));
                // insert
                keyspace.insert("逖靖寒的世界", columnPath, bytes("http://gpcuster.cnblogs.com"));
                // read
                Column col = keyspace.getColumn("逖靖寒的世界", columnPath);
                System.out.println("Read from cassandra: " + string(col.getValue()));
            } finally {
                // return client to pool. do it in a finally block to make sure it's executed
                pool.releaseClient(client);
            }
        }
    }
 
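Pros
1 Provides connection pooling.
2 Provides error handling: when an operation fails, Hector uses system information (the token map) to connect to another Cassandra service automatically.
3 The programming interface is easy to use.
4 Supports JMX.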
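The failover idea in point 2 can be sketched in a few lines. This is not Hector's code: FailoverSketch and withFailover are hypothetical names, the host list is a fixed list of strings rather than something derived from the token map, and the "operation" is any function that takes a host and either returns a result or throws.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of trying one host after another; not Hector code.
class FailoverSketch {
    // Runs the operation against each host in turn and returns the first successful result.
    static <T> T withFailover(List<String> hosts, Function<String, T> operation) {
        RuntimeException last = null;
        for (String host : hosts) {
            try {
                return operation.apply(host);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next host
            }
        }
        // Every host failed, so the operation as a whole fails.
        throw new IllegalStateException("all hosts failed", last);
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("cas1:9160", "cas2:9160", "cas3:9160");
        String result = withFailover(hosts, host -> {
            if (host.startsWith("cas1")) {
                throw new RuntimeException(host + " is down"); // simulated failure
            }
            return "answered by " + host;
        });
        System.out.println(result); // prints "answered by cas2:9160"
    }
}
```

The final throw shows the limit of this approach: when every host is unavailable or overloaded, retrying elsewhere cannot help and the caller still has to handle the failure.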
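Cons
1 No support for multi-threaded environments.
2 The keyspace wrapper does too much (data validation and repackaging of data). When performing a large number of data operations, this overhead needs to be taken into account.
3 Error handling is not very user-friendly: if every Cassandra service is very busy, then after several failed attempts the operation as a whole simply fails.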
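Summary
Hector is already a basically sufficient Java client, but it still lacks some features, for example:
1 Thread safety.
2 Automatic multi-threaded queries and inserts to improve throughput.
3 A friendlier error-handling mechanism.
4 Less wrapping overhead.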
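As an illustration of what point 2 might look like from the caller's side, the sketch below fans a list of key lookups out over a fixed thread pool and collects the results in key order. It is only a sketch under stated assumptions: ParallelFetchSketch and fetchAll are hypothetical names, and fetchOne stands for whatever single-key read the application already has. Since thread safety is listed above as missing, fetchOne is assumed to obtain its own client for each call rather than share one across threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration of fanning out reads over a thread pool; not Hector code.
class ParallelFetchSketch {
    static Map<String, String> fetchAll(List<String> keys, Function<String, String> fetchOne,
                                        int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per key; the pool runs up to `threads` of them at once.
            List<Future<String>> futures = new ArrayList<>();
            for (String key : keys) {
                Callable<String> task = () -> fetchOne.apply(key);
                futures.add(executor.submit(task));
            }
            // Collect the results in the original key order.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < keys.size(); i++) {
                results.put(keys.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // The lambda is a stand-in for a real single-key read.
        Map<String, String> results =
                fetchAll(Arrays.asList("k1", "k2", "k3"), key -> "value-of-" + key, 2);
        System.out.println(results); // prints {k1=value-of-k1, k2=value-of-k2, k3=value-of-k3}
    }
}
```

Creating a pool per call keeps the sketch short. A real client would more likely keep one long-lived pool and size it against the number of pooled connections.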
Original title: 談談Cassandra的客戶端 ("Talking About Cassandra Clients")
Link: http://www.cnblogs.com/gpcuster/archive/2010/03/23/1692794.html