Showing posts with label virtual machine. Show all posts

Sunday, August 30, 2015

First look into Cloudera Impala

Let's look at a vendor big data technology today: Cloudera Impala. So what is Impala all about?

The Wikipedia definition:

Cloudera Impala is Cloudera's open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.[1]

And the definition from the official GitHub repository:

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters. 
Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources:

Let us download a virtual machine image. This is convenient because Impala works integrated with Hadoop, and if you have no Hadoop knowledge, you would otherwise have to set up a Hadoop cluster before integrating it with Impala. With the virtual machine image, it is as easy as importing the image into the host and powering it up. It also saves setup time and reduces errors.

With that said, I'm downloading the VirtualBox image. Once it has downloaded, extract it to a directory. If you have not installed VirtualBox yet, install it now with apt-get install virtualbox virtualbox-guest-additions-iso, and make sure the virtualbox service, which loads the VirtualBox kernel modules, is running:

 root@localhost:~# /etc/init.d/virtualbox status  
 ● virtualbox.service - LSB: VirtualBox Linux kernel module  
   Loaded: loaded (/etc/init.d/virtualbox)  
   Active: active (exited) since Thu 2015-08-20 17:07:43 MYT; 2min 36s ago  
    Docs: man:systemd-sysv-generator(8)  
  Process: 29390 ExecStop=/etc/init.d/virtualbox stop (code=exited, status=0/SUCCESS)  
  Process: 29425 ExecStart=/etc/init.d/virtualbox start (code=exited, status=0/SUCCESS)  
 Aug 20 17:07:43 localhost systemd[1]: Starting LSB: VirtualBox Linux kernel module...  
 Aug 20 17:07:43 localhost systemd[1]: Started LSB: VirtualBox Linux kernel module.  
 Aug 20 17:07:43 localhost virtualbox[29425]: Starting VirtualBox kernel modules.  
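If you would rather script this check than read the status output by eye, the sketch below is one way to do it. It assumes a systemd host (the output above comes from systemd's LSB wrapper) and relies on `systemctl is-active --quiet`, which reports the unit's state only through its exit status; the helper name `unit_active` is my own.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the given systemd unit is active.
# Assumes a systemd host, as in the status output above.
unit_active() {
    systemctl is-active --quiet "$1" 2>/dev/null
}

if unit_active virtualbox; then
    echo "virtualbox service is active"
else
    # The start command is the ExecStart line from the status output above.
    echo "virtualbox service is not active; try: sudo /etc/init.d/virtualbox start"
fi
```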

Launch VirtualBox and add the virtual machine image to a new instance, as shown in the screenshot below.

Now power this virtual machine up! Please be patient, as it takes a long time to boot, at least on my PC; you might want to get a drink in the meantime. The rest of this article follows this tutorial. However, I gave up because the select statement took a very long time; Impala is very slow in a virtual environment, at least for me. Still, I will illustrate the steps up to the point where it became slow.

First, you need to copy the CSV files (tab1.csv and tab2.csv) into the virtual machine.

Then you can load the script containing the SQL that creates the tables and loads the CSV data into them. However, the example given in the tutorial does not create a database, so I suggest adding these two lines to the script before loading it:

 create database testdb;  
 use testdb;  
 -- The EXTERNAL clause means the data is located outside the central location  
 -- for Impala data files and is preserved when the associated Impala table is dropped.  
 -- We expect the data to already ex  
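Since the tutorial script is loaded as a whole, one way to add those lines without editing the file by hand is to prepend them with a small shell function. This is a sketch: `with_database` and the file names in the usage line are hypothetical, and it uses `CREATE DATABASE IF NOT EXISTS` instead of a plain `create database` so the patched script can be loaded more than once.

```shell
#!/bin/sh
# Hypothetical helper: copy an SQL script from stdin to stdout, prefixed
# with statements that create (if needed) and select a database.
# The database name defaults to testdb, as suggested above.
with_database() {
    db="${1:-testdb}"
    printf 'CREATE DATABASE IF NOT EXISTS %s;\nUSE %s;\n' "$db" "$db"
    cat
}
```

Usage: `with_database testdb < tutorial.sql > tutorial_with_db.sql`, then load the result with `impala-shell -f tutorial_with_db.sql`.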

After that, you can run impala-shell and issue SQL queries, but as you can see, the select statement just hangs there forever.
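To tell a query that is merely very slow from one that never returns, it may help to bound the wait. A minimal sketch, assuming GNU coreutils `timeout` (which exits with status 124 when the limit is hit) and impala-shell's `-q` option for running a single query; `run_bounded` is a hypothetical helper, and the table name `tab1` in the usage line is assumed from the CSV file name.

```shell
#!/bin/sh
# Hypothetical helper: run a command, giving up after a number of seconds.
# GNU timeout exits with status 124 when the limit is reached.
run_bounded() {
    secs="$1"
    shift
    timeout "$secs" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${secs}s: $*" >&2
    fi
    return "$status"
}
```

Usage: `run_bounded 300 impala-shell -q 'SELECT COUNT(*) FROM testdb.tab1'`. This only limits how long you wait; it does not make the query faster.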

Not a good experience, but if Impala is what you need, find out what the problem is and let me know. :-)

Friday, January 30, 2015

Initial study of Docker

Docker has been making a lot of fuss lately, so today we are going to look into it. Let's start with something basic: what actually is Docker? According to the definition from the official site:
Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud.

And the explanation from Wikipedia:
Docker is an open-source project that automates the deployment of applications inside software containers, by providing an additional layer of abstraction and automation of operating system–level virtualization on Linux.[2] Docker uses resource isolation features of the Linux kernel such as cgroups and kernel namespaces to allow independent "containers" to run within a single Linux instance, avoiding the overhead of starting virtual machines.[3]
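You can see both mechanisms from a shell on Linux. The sketch below lists the current process's namespaces and cgroups from `/proc` and applies a rough heuristic: with cgroup v1 and Docker's default cgroupfs driver, a process inside a container usually has `/docker/<container-id>` in its cgroup paths. This is an assumption-laden check, not a reliable container detector (cgroup v2 or the systemd cgroup driver change the paths), and the function names are my own.

```shell
#!/bin/sh
# Namespaces: each entry links to an ID such as pid:[4026531836].
# Processes sharing a namespace show the same ID, so comparing the output
# inside and outside a container shows which namespaces differ.
show_namespaces() {
    ls -l /proc/self/ns
}

# cgroups: each line of /proc/<pid>/cgroup is hierarchy-ID:controllers:path.
show_cgroups() {
    cat "${1:-/proc/self/cgroup}"
}

# Rough heuristic (cgroup v1, cgroupfs driver): a /docker/ path segment.
looks_like_docker() {
    grep -q '/docker/' "${1:-/proc/self/cgroup}" 2>/dev/null
}

if looks_like_docker; then
    echo "cgroup paths mention docker: probably inside a container"
else
    echo "no docker cgroup path found"
fi
```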

Okay, that's the theory. If you want to quickly get an idea of how Docker works, you can try it here!

People who have run virtual machines before may think: hey, isn't this very similar to a virtual machine? But they really are not the same. See the software stacks of virtual machines versus Docker below.


Next, we will install Docker locally; the illustration below uses Debian sid. If you run another Linux distribution, you should read this page. First we install Docker and then start bash in an Ubuntu container. Note that pulling the ubuntu image may take some time, depending on your internet speed.
root@localhost:~# apt-get install docker.io
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
aufs-tools cgroupfs-mount libnih-dbus1 libnih1 makedev mountall plymouth
Suggested packages:
btrfs-tools debootstrap lxc rinse plymouth-themes
The following NEW packages will be installed:
aufs-tools cgroupfs-mount docker.io libnih-dbus1 libnih1 makedev mountall plymouth
0 upgraded, 8 newly installed, 0 to remove and 557 not upgraded.
Need to get 4,360 kB of archives.
After this operation, 21.6 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 unstable/main makedev all 2.3.1-93 [42.6 kB]
Get:2 unstable/main plymouth amd64 0.9.0-9 [189 kB]
Get:3 unstable/main libnih1 amd64 1.0.3-4.3 [127 kB]
Get:4 unstable/main libnih-dbus1 amd64 1.0.3-4.3 [97.1 kB]
Get:5 unstable/main mountall amd64 2.54 [68.3 kB]
Get:6 unstable/main aufs-tools amd64 1:3.2+20130722-1.1 [92.9 kB]
Get:7 unstable/main cgroupfs-mount all 1.1 [4,572 B]
Get:8 unstable/main docker.io amd64 1.3.3~dfsg1-2 [3,739 kB]
Fetched 4,360 kB in 44s (97.8 kB/s)
Selecting previously unselected package makedev.
(Reading database ... 324961 files and directories currently installed.)
Preparing to unpack .../makedev_2.3.1-93_all.deb ...
Unpacking makedev (2.3.1-93) ...
Selecting previously unselected package plymouth.
Preparing to unpack .../plymouth_0.9.0-9_amd64.deb ...
Unpacking plymouth (0.9.0-9) ...
Selecting previously unselected package libnih1.
Preparing to unpack .../libnih1_1.0.3-4.3_amd64.deb ...
Unpacking libnih1 (1.0.3-4.3) ...
Selecting previously unselected package libnih-dbus1.
Preparing to unpack .../libnih-dbus1_1.0.3-4.3_amd64.deb ...
Unpacking libnih-dbus1 (1.0.3-4.3) ...
Selecting previously unselected package mountall.
Preparing to unpack .../mountall_2.54_amd64.deb ...
Unpacking mountall (2.54) ...
Selecting previously unselected package aufs-tools.
Preparing to unpack .../aufs-tools_1%3a3.2+20130722-1.1_amd64.deb ...
Unpacking aufs-tools (1:3.2+20130722-1.1) ...
Selecting previously unselected package cgroupfs-mount.
Preparing to unpack .../cgroupfs-mount_1.1_all.deb ...
Unpacking cgroupfs-mount (1.1) ...
Selecting previously unselected package docker.io.
Preparing to unpack .../docker.io_1.3.3~dfsg1-2_amd64.deb ...
Unpacking docker.io (1.3.3~dfsg1-2) ...
Processing triggers for man-db ( ...
Processing triggers for dbus (1.8.12-3) ...
Setting up makedev (2.3.1-93) ...
/run/udev or .udevdb or .udev presence implies active udev. Aborting MAKEDEV invocation.
/run/udev or .udevdb or .udev presence implies active udev. Aborting MAKEDEV invocation.
/run/udev or .udevdb or .udev presence implies active udev. Aborting MAKEDEV invocation.
Setting up plymouth (0.9.0-9) ...
update-initramfs: deferring update (trigger activated)
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Setting up libnih1 (1.0.3-4.3) ...
Setting up libnih-dbus1 (1.0.3-4.3) ...
Setting up mountall (2.54) ...
Setting up aufs-tools (1:3.2+20130722-1.1) ...
Setting up docker.io (1.3.3~dfsg1-2) ...
Adding group `docker' (GID 139) ...
Processing triggers for dbus (1.8.12-3) ...
Setting up cgroupfs-mount (1.1) ...
Processing triggers for initramfs-tools (0.117) ...
update-initramfs: Generating /boot/initrd.img-3.9-1-amd64
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
W: mdadm: no arrays defined in configuration file.
Processing triggers for libc-bin (2.19-13) ...

jason@localhost:~$ docker run -i -t ubuntu /bin/bash
2015/01/08 16:27:07 Post http:///var/run/docker.sock/v1.15/containers/create: dial unix /var/run/docker.sock: permission denied
jason@localhost:~$ sudo docker run -i -t ubuntu /bin/bash
Unable to find image 'ubuntu' locally
Pulling repository ubuntu
8eaa4ff06b53: Download complete
511136ea3c5a: Download complete
3b363fd9d7da: Download complete
607c5d1cca71: Download complete
f62feddc05dc: Download complete
Status: Downloaded newer image for ubuntu:latest
root@bedef9a17ac3:/# cat /etc/issue
Ubuntu 14.04.1 LTS \n \l

root@bedef9a17ac3:/# exit
jason@localhost:~$ sudo docker run ubuntu /bin/echo "hello world"
hello world
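The first attempt above failed with "permission denied" on /var/run/docker.sock, which is why the rest of the session uses sudo. That socket is normally owned by root and by the docker group the package created during installation. A minimal sketch of checking access before calling docker follows; `docker_socket_usable` is a hypothetical helper. The usual alternative to sudo is adding your user to the docker group and logging in again, but be aware that membership in that group is effectively equivalent to root access on the host.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the path is a socket we can read and write.
docker_socket_usable() {
    sock="${1:-/var/run/docker.sock}"
    [ -S "$sock" ] && [ -r "$sock" ] && [ -w "$sock" ]
}

if docker_socket_usable; then
    echo "docker socket is accessible; sudo should not be needed"
else
    echo "no access to the docker socket; use sudo, or join the docker group:"
    echo "  sudo usermod -aG docker \"\$USER\"   (then log out and back in)"
fi
```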

One might ask: why should I switch from VirtualBox to Docker? There are four main points, as outlined in this article:

  • Faster delivery of your applications

  • Deploy and scale more easily

  • Get higher density and run more workloads

  • Faster deployment makes for easier management

If you find the above points attractive, perhaps you should consider Docker. I leave these additional materials for your further exploration:

Docker 101 video presentation.
Remember to sign up.
Get the images from Docker Hub.
Last but not least, the documentation.