Today, we will take another look at another big data technology.
Apache HBase is the topic for today and before we dip our toe into Apache HBase, let's find out what actually is
Apache HBase.
Apache HBase [1] is an open-source, distributed, versioned, column-oriented store modeled after Google' Bigtable: A Distributed Storage System for Structured Data by Chang et al.[2] Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop [3].
In this article, we can setup a single node for this adventure. Before we begin, let's download a copy of Apache HBase
here. Once downloaded, extract the compressed content. At the time of this writing, I'm using Apache HBase version 1.1.1 for this learning experience.
user@localhost:~/Desktop/hbase-1.1.1$ ls
bin CHANGES.txt conf docs hbase-webapps lib LICENSE.txt NOTICE.txt README.txt
If you have not install java, go ahead and install it. Pick a recent java or at least java7. Make sure terminal prompt the correct version of java. An example would be as of following
user@localhost:~/Desktop/hbase-1.1.1$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
If you cannot change system configuration for this java, then in the HBase configuration file,
conf/hbase-env.sh, uncomment
JAVA_HOME variable and set to the java that you installed. The main configuration file for hbase is
conf/hbase-site.xml and we will now edit this file so it became such as following. Change to your environment as required.
user@localhost:~/Desktop/hbase-1.1.1$ cat conf/hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///home/user/Desktop/hbase-1.1.1</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/user/zookeeper</value>
</property>
</configuration>
Okay, we are ready to start hbase. start it with a helpful script
bin/start-hbase.sh
user@localhost:~/Desktop/hbase-1.1.1$ bin/start-hbase.sh
starting master, logging to /home/user/Desktop/hbase-1.1.1/bin/../logs/hbase-user-master-localhost.out
user@localhost:~/Desktop/hbase-1.1.1/logs$ tail -F hbase-user-master-localhost.out SecurityAuth.audit hbase-user-master-localhost.log
==> hbase-user-master-localhost.out <==
==> SecurityAuth.audit <==
2015-08-18 17:49:41,533 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.1.1 port: 36745 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
2015-08-18 17:49:46,812 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.0.1 port: 53042 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
2015-08-18 17:49:48,309 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.0.1 port: 53043 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
2015-08-18 17:49:49,317 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.0.1 port: 53044 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
==> hbase-user-master-localhost.log <==
2015-08-18 17:49:49,281 INFO [StoreOpener-78a2a3664205fcf679d2043ac3259648-1] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=831688, freeSize=808983544, maxSize=809815232, heapSize=831688, minSize=769324480, minFactor=0.95, multiSize=384662240, multiFactor=0.5, singleSize=192331120, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
2015-08-18 17:49:49,282 INFO [StoreOpener-78a2a3664205fcf679d2043ac3259648-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000
2015-08-18 17:49:49,295 INFO [RS_OPEN_REGION-localhost:60631-0] regionserver.HRegion: Onlined 78a2a3664205fcf679d2043ac3259648; next sequenceid=2
2015-08-18 17:49:49,303 INFO [PostOpenDeployTasks:78a2a3664205fcf679d2043ac3259648] regionserver.HRegionServer: Post open deploy tasks for hbase:namespace,,1439891388424.78a2a3664205fcf679d2043ac3259648.
2015-08-18 17:49:49,322 INFO [PostOpenDeployTasks:78a2a3664205fcf679d2043ac3259648] hbase.MetaTableAccessor: Updated row hbase:namespace,,1439891388424.78a2a3664205fcf679d2043ac3259648. with server=localhost,60631,1439891378840
2015-08-18 17:49:49,332 INFO [AM.ZK.Worker-pool3-t6] master.RegionStates: Transition {78a2a3664205fcf679d2043ac3259648 state=OPENING, ts=1439891389276, server=localhost,60631,1439891378840} to {78a2a3664205fcf679d2043ac3259648 state=OPEN, ts=1439891389332, server=localhost,60631,1439891378840}
2015-08-18 17:49:49,603 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14f4036b87d0000 type:create cxid:0x1d5 zxid:0x44 txntype:-1 reqpath:n/a Error Path:/hbase/namespace/default Error:KeeperErrorCode = NodeExists for /hbase/namespace/default
2015-08-18 17:49:49,625 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14f4036b87d0000 type:create cxid:0x1d8 zxid:0x46 txntype:-1 reqpath:n/a Error Path:/hbase/namespace/hbase Error:KeeperErrorCode = NodeExists for /hbase/namespace/hbase
2015-08-18 17:49:49,639 INFO [localhost:51452.activeMasterManager] master.HMaster: Master has completed initialization
2015-08-18 17:49:49,642 INFO [localhost:51452.activeMasterManager] quotas.MasterQuotaManager: Quota support disabled
and you notice, log file is also available and
jps shown a
HMaster is running.
user@localhost: $ jps
22144 Jps
21793 HMaster
okay, let's experience apache hbase using a hbase shell.
user@localhost:~/Desktop/hbase-1.1.1$ ./bin/hbase shell
2015-08-18 17:55:25,134 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015
hbase(main):001:0>
A help command show very helpful description such as the followings.
hbase(main):001:0> help
HBase Shell, version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, split, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: list_quotas, set_quota
Group name: security
Commands: grant, revoke, user_permission
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
hbase(main):002:0>
To create a table (column family),
hbase(main):002:0> create 'test', 'cf'
0 row(s) in 1.5700 seconds
=> Hbase::Table - test
hbase(main):003:0>
list information about a table.
hbase(main):001:0> list 'test'
TABLE
test
1 row(s) in 0.3530 seconds
=> ["test"]
let's put something into the table we have just created.
hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.2280 seconds
hbase(main):003:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0140 seconds
hbase(main):004:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0060 seconds
hbase(main):005:0>
Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.
To select the row from the table, use scan.
hbase(main):005:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1439892359305, value=value1
row2 column=cf:b, timestamp=1439892363921, value=value2
row3 column=cf:c, timestamp=1439892369775, value=value3
3 row(s) in 0.0420 seconds
hbase(main):006:0>
To get a row only.
hbase(main):006:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1439892359305, value=value1
1 row(s) in 0.0340 seconds
hbase(main):007:0>
Something really interesting about apache hbase, say if you want to delete or change settings of a table, you need to disable it first. After that, you can enable it back.
hbase(main):007:0> disable 'test'
0 row(s) in 2.3610 seconds
hbase(main):008:0> enable 'test'
0 row(s) in 1.2790 seconds
hbase(main):009:0>
okay, now, let's delete this table.
hbase(main):009:0> drop 'test'
ERROR: Table test is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled:
hbase> drop 't1'
hbase> drop 'ns1:t1'
hbase(main):010:0> disable 'test'
0 row(s) in 2.2640 seconds
hbase(main):011:0> drop 'test'
0 row(s) in 1.2800 seconds
hbase(main):012:0>
Okay, we are done for this basic learning. Let's quit for now.
hbase(main):012:0> quit
user@localhost:~/Desktop/hbase-1.1.1$
To stop apache hbase instance,
user@localhost:~/Desktop/hbase-1.1.1$ ./bin/stop-hbase.sh
stopping hbase.................
user@localhost:~/Desktop/hbase-1.1.1$ jps
23399 Jps
5445 org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar
If you like me who came from apache cassandra, apache hbase looks very similar. If this interest you, I shall leave you with the following three links which will get you further.
http://hbase.apache.org/book.html
http://wiki.apache.org/hadoop/Hbase
https://blogs.apache.org/hbase/