Friday, August 28, 2015

A light first look at Apache Storm, part 1

Today we will look at another piece of software, Apache Storm. According to the official Apache Storm GitHub repository:

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation.

Well, if you, like me, are new to Apache Storm, this description may seem a bit vague about what Apache Storm actually does. Fear not: in this series we will go through some Apache Storm basics, such as installing Storm, setting up a Storm cluster, and running a Storm "hello world". This is also a good video that gives an introduction to Apache Storm.

If you study Storm, you will come across three fundamental terms: spouts, bolts, and topologies. The definitions below are excerpted from this site.

There are just three abstractions in Storm: spouts, bolts, and topologies. A spout is a source of streams in a computation. Typically a spout reads from a queueing broker such as Kestrel, RabbitMQ, or Kafka, but a spout can also generate its own stream or read from somewhere like the Twitter streaming API. Spout implementations already exist for most queueing systems.
A bolt processes any number of input streams and produces any number of new output streams. Most of the logic of a computation goes into bolts, such as functions, filters, streaming joins, streaming aggregations, talking to databases, and so on.
A topology is a network of spouts and bolts, with each edge in the network representing a bolt subscribing to the output stream of some other spout or bolt. A topology is an arbitrarily complex multi-stage stream computation. Topologies run indefinitely when deployed.
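To make these three terms a little more concrete, here is a minimal sketch in plain Java. It does not use the Storm library at all: ToySpout, ToyBolt and ToyTopology are hypothetical names I made up for illustration, and they are not Storm's real interfaces. The sketch also assumes a finite list of words as the source, whereas a real topology runs indefinitely once deployed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Toy stand-ins for Storm's abstractions. These are NOT Storm's real interfaces.
interface ToySpout {
    // Emits the next tuple to the collector; returns false once the source is exhausted.
    boolean nextTuple(Consumer<String> collector);
}

interface ToyBolt {
    // Processes one input tuple and emits zero or more output tuples.
    void execute(String tuple, Consumer<String> collector);
}

class ToyTopology {
    // Wires one spout into a chain of bolts and runs until the spout is exhausted.
    static List<String> run(ToySpout spout, List<ToyBolt> bolts) {
        List<String> source = new ArrayList<>();
        while (spout.nextTuple(source::add)) {
            // keep pulling tuples from the spout
        }
        List<String> stream = source;
        for (ToyBolt bolt : bolts) {
            List<String> next = new ArrayList<>();
            for (String tuple : stream) {
                bolt.execute(tuple, next::add);
            }
            stream = next; // this bolt's output is the next bolt's input
        }
        return stream;
    }

    // Hypothetical demo: a word spout, a filter bolt and an upper-casing bolt.
    static List<String> demo() {
        Iterator<String> words = Arrays.asList("spout", "", "bolt", "topology").iterator();
        ToySpout spout = collector -> {
            if (!words.hasNext()) {
                return false;
            }
            collector.accept(words.next());
            return true;
        };
        ToyBolt dropEmpty = (tuple, collector) -> {
            if (!tuple.isEmpty()) {
                collector.accept(tuple);
            }
        };
        ToyBolt upperCase = (tuple, collector) -> collector.accept(tuple.toUpperCase());
        return run(spout, Arrays.asList(dropEmpty, upperCase));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [SPOUT, BOLT, TOPOLOGY]
    }
}
```

The spout is the only place data enters, each bolt only sees what the previous stage emitted, and the "topology" is just the wiring between them. Real Storm distributes this wiring across a cluster and processes tuples continuously rather than stage by stage.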


Let's first download and install Apache Storm. Pick a stable version here, download it, and then extract it. Your directory should now look similar to the one below. I'm using Apache Storm 0.9.5 for this learning experience.

 user@localhost:~/Desktop/apache-storm-0.9.5$ ls   
 bin CHANGELOG.md conf DISCLAIMER examples external lib LICENSE logback     NOTICE     public     README.markdown RELEASE SECURITY.md  
 user@localhost:~/Desktop/apache-storm-0.9.5$   

In the next article, we will set up a Storm cluster.
