Information Technology Blogs

Thursday, June 1, 2017

Investigating Apache Cassandra 1.2.19 sstable corruption

Previously, I investigated sstable corruption in Apache Cassandra 1.0.8 and on a fresh node after it was upgraded to Apache Cassandra 1.2. Both of the articles can be found here and here. Today, we will take another look, this time at a running Apache Cassandra 1.2.19 node that encountered sstable corruption. Below is the stack trace found in Cassandra's system.log.

 org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data/<KEYSPACE>/<COLUMN_FAMILY>/<KEYSPACE>-<CF>-ic-112-Data.db): corruption detected, chunk at 19042661 of length 27265.  
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:89)  
     at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:45)  
     at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:355)  
     at java.io.RandomAccessFile.readFully(RandomAccessFile.java:444)  
     at java.io.RandomAccessFile.readFully(RandomAccessFile.java:424)  
     at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:380)  
     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:391)  
     at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:370)  
     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:175)  
     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)  
     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)  
     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)  
     at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)  
     at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122)  
     at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96)  
     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)  
     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)  
     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)  
     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)  
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)  
     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)  
     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)  
     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208)  
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)  
     at java.util.concurrent.FutureTask.run(FutureTask.java:262)  
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)  
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)  
     at java.lang.Thread.run(Thread.java:745)  
 Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data/<KEYSPACE>/<COLUMN_FAMILY>/<KEYSPACE>-<CF>-ic-112-Data.db): corruption detected, chunk at 19042661 of length 27265.  
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:128)  
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:85)  
     ... 27 more

Okay, let's trace through the stack trace and study what actually causes this and what sstable corruption means.

     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)  
     at java.util.concurrent.FutureTask.run(FutureTask.java:262)  
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)  
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)  
     at java.lang.Thread.run(Thread.java:745)

Simple: a worker thread from the executor's thread pool runs the task.
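To see where those five frames come from, here is a minimal sketch using only the JDK. The class name ExecutorFrameSketch and the method framesSeenByTask are my own, for illustration; this is not Cassandra code. It submits a Runnable to a thread pool and records the classes on the stack while the task runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ExecutorFrameSketch
{
    // Runs a task on a one-thread pool and returns the class names found on the
    // task's call stack, innermost frame first.
    static List<String> framesSeenByTask()
    {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        List<String> classNames = new ArrayList<>();
        try
        {
            // submit() wraps the Runnable in a FutureTask; a pool worker thread runs it
            Future<?> done = pool.submit(() -> {
                for (StackTraceElement frame : Thread.currentThread().getStackTrace())
                    classNames.add(frame.getClassName());
            });
            done.get(5, TimeUnit.SECONDS); // wait for the task to finish
            return classNames;
        }
        catch (Exception e)
        {
            throw new IllegalStateException("task did not complete", e);
        }
        finally
        {
            pool.shutdown();
        }
    }
}
```

The returned list should include java.util.concurrent.FutureTask and java.util.concurrent.ThreadPoolExecutor, the same classes that sit at the bottom of the Cassandra stack trace.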

     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)  
     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)  
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)  
     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)  
     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)  
     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208)

We see that a background compaction task is started. The code is below.

   // the actual sstables to compact are not determined until we run the BCT; that way, if new sstables  
   // are created between task submission and execution, we execute against the most up-to-date information  
   class BackgroundCompactionTask implements Runnable  
   {  
     private final ColumnFamilyStore cfs;  
   
     BackgroundCompactionTask(ColumnFamilyStore cfs)  
     {  
       this.cfs = cfs;  
     }  
   
     public void run()  
     {  
       compactionLock.readLock().lock();  
       try  
       {  
         logger.debug("Checking {}.{}", cfs.table.name, cfs.columnFamily); // log after we get the lock so we can see delays from that if any  
         if (!cfs.isValid())  
         {  
           logger.debug("Aborting compaction for dropped CF");  
           return;  
         }  
   
         AbstractCompactionStrategy strategy = cfs.getCompactionStrategy();  
         AbstractCompactionTask task = strategy.getNextBackgroundTask(getDefaultGcBefore(cfs));  
         if (task == null)  
         {  
           logger.debug("No tasks available");  
           return;  
         }  
         task.execute(metrics);  
       }  
       finally  
       {  
         compactingCF.remove(cfs);  
         compactionLock.readLock().unlock();  
       }  
       submitBackground(cfs);  
     }  
   }

     at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)  
     at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122)  
     at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96)  
     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)  
     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)

Here we see a merge iterator going over the sstables selected for compaction.
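The idea of merging several sorted inputs can be sketched as below. This is a simplified illustration and not Cassandra's MergeIterator: the names MergeSketch and Candidate are my own, keys are plain strings, and equal keys are not reduced into one row the way a real compaction would.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class MergeSketch
{
    // One candidate per input: the iterator and the key it currently points at.
    private static final class Candidate implements Comparable<Candidate>
    {
        final Iterator<String> source;
        String head;

        Candidate(Iterator<String> source) { this.source = source; }

        // Pull the next key from the source; false once the source is exhausted.
        boolean advance()
        {
            if (!source.hasNext())
                return false;
            head = source.next();
            return true;
        }

        public int compareTo(Candidate other) { return head.compareTo(other.head); }
    }

    // Merge several individually sorted key lists into one sorted list.
    static List<String> merge(List<List<String>> sortedInputs)
    {
        PriorityQueue<Candidate> queue = new PriorityQueue<>();
        for (List<String> input : sortedInputs)
        {
            Candidate c = new Candidate(input.iterator());
            if (c.advance())
                queue.add(c);
        }

        List<String> out = new ArrayList<>();
        while (!queue.isEmpty())
        {
            Candidate smallest = queue.poll(); // input with the smallest current key
            out.add(smallest.head);
            if (smallest.advance())            // this is where the next read from that input happens
                queue.add(smallest);
        }
        return out;
    }
}
```

In the stack trace, the corruption surfaces inside the advance step: advancing one candidate asks its sstable scanner for the next key, and that read hits the bad chunk.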

     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:175)  
     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)  
     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)  
     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)

   DecoratedKey key = sstable.decodeKey(ByteBufferUtil.readWithShortLength(dfile));

Here we see that what actually gets iterated is the sstable, and in particular its keys.
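As the method name suggests, the key is stored as a short length followed by that many bytes. A minimal sketch of that layout, assuming a 2-byte unsigned length prefix; ShortLengthSketch and decodeFirstKey are hypothetical names of mine and this is not the Cassandra implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ShortLengthSketch
{
    // Reads a 2-byte unsigned length, then exactly that many bytes.
    static ByteBuffer readWithShortLength(DataInputStream in) throws IOException
    {
        int length = in.readUnsignedShort();
        byte[] bytes = new byte[length];
        in.readFully(bytes); // throws EOFException if the stream ends early
        return ByteBuffer.wrap(bytes);
    }

    // Convenience wrapper for the example: decode the first key in a byte array as UTF-8.
    static String decodeFirstKey(byte[] serialized)
    {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized)))
        {
            return StandardCharsets.UTF_8.decode(readWithShortLength(in)).toString();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, the bytes 0, 3, 'k', 'e', 'y' decode to the key "key"; anything after those five bytes belongs to whatever follows the key in the file.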

     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:89)  
     at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:45)  
     at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:355)  
     at java.io.RandomAccessFile.readFully(RandomAccessFile.java:444)  
     at java.io.RandomAccessFile.readFully(RandomAccessFile.java:424)  
     at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:380)  
     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:391)  
     at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:370)

Here, the code references jump across a few files because of method overrides; nonetheless, the important part is the method reBuffer.

   @Override  
   protected void reBuffer()  
   {  
     try  
     {  
       decompressChunk(metadata.chunkFor(current));  
     }  
     catch (CorruptBlockException e)  
     {  
       throw new CorruptSSTableException(e, getPath());  
     }  
     catch (IOException e)  
     {  
       throw new FSReadError(e, getPath());  
     }  
   }  
     
   private void decompressChunk(CompressionMetadata.Chunk chunk) throws IOException  
   {  
     if (channel.position() != chunk.offset)  
       channel.position(chunk.offset);  
   
     if (compressed.capacity() < chunk.length)  
       compressed = ByteBuffer.wrap(new byte[chunk.length]);  
     else  
       compressed.clear();  
     compressed.limit(chunk.length);  
   
     if (channel.read(compressed) != chunk.length)  
       throw new CorruptBlockException(getPath(), chunk);  
   
     // technically flip() is unnecessary since all the remaining work uses the raw array, but if that changes  
     // in the future this will save a lot of hair-pulling  
     compressed.flip();  
     try  
     {  
       validBufferBytes = metadata.compressor().uncompress(compressed.array(), 0, chunk.length, buffer, 0);  
     }  
     catch (IOException e)  
     {  
       throw new CorruptBlockException(getPath(), chunk);  
     }  
   
     if (metadata.parameters.getCrcCheckChance() > FBUtilities.threadLocalRandom().nextDouble())  
     {  
       checksum.update(buffer, 0, validBufferBytes);  
   
       if (checksum(chunk) != (int) checksum.getValue())  
         throw new CorruptBlockException(getPath(), chunk);  
   
       // reset checksum object back to the original (blank) state  
       checksum.reset();  
     }  
   
     // buffer offset is always aligned  
     bufferOffset = current & ~(buffer.length - 1);  
   }

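We read that if the chunk's stored checksum is not the same as the CRC32 checksum computed over the decompressed data, this is considered sstable corruption. The same CorruptBlockException is also thrown when fewer bytes than the chunk length can be read, or when decompression itself fails.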
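The checksum comparison can be sketched with the JDK's java.util.zip.CRC32. The class ChunkChecksumSketch and its method names are my own, for illustration; the sketch only mirrors the comparison in decompressChunk and leaves out the compression and the crc check chance sampling.

```java
import java.util.zip.CRC32;

final class ChunkChecksumSketch
{
    // CRC32 over the first `length` bytes, narrowed to int like the cast in decompressChunk.
    static int crcOf(byte[] buffer, int length)
    {
        CRC32 checksum = new CRC32();
        checksum.update(buffer, 0, length);
        return (int) checksum.getValue();
    }

    // A chunk counts as corrupt when the stored checksum differs from the recomputed one.
    static boolean isCorrupt(byte[] uncompressed, int length, int storedChecksum)
    {
        return storedChecksum != crcOf(uncompressed, length);
    }
}
```

A single flipped bit on disk or in memory is enough to change the CRC32 value, which is consistent with hardware turning out to be the culprit in my case.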

For this type of exception, I remember trying several things, listed below.

1. Tried online nodetool scrub; it did not work.
2. Tried offline sstablescrub; it did not work.
3. Wiped out the node and rebuilt it; it did not work.

In the end we had to change the hardware altogether. After that, we did not see the problem anymore.

Saturday, July 30, 2016

A brief walkthrough of fuglu, a mail scanner daemon

This is long overdue: I promised a good, dear friend that I would write a blog post about his Python mail scanner daemon. Today I finally found the spare time to review Fuglu. First, what is Fuglu?

Fuglu is a mail scanner daemon written in Python. Installed as a postfix before-queue or after-queue filter fuglu can be used to filter spam, viruses, unwanted attachments etc.

As I'm not a mail admin, this article will skip some mail-related features. Let's install fuglu by cloning it from the GitHub repository.

 $ git clone https://github.com/gryphius/fuglu.git  
 $ cd fuglu/fuglu  
 $ sudo python setup.py install

If the installation is successful, we move on to the basic configuration by running fuglu --lint.

 root@localhost:/etc/fuglu# fuglu --lint  
 Could not drop privileges to nobody/nobody : Can not drop privileges, user nobody or group nobody does not exist  
 Fuglu 0.6.6-2-ge18e56b  
 ---------- LINT MODE ----------  
 Checking dependencies...  
 sqlalchemy: not installed Optional dependency, required if you want to enable any database lookups  
 BeautifulSoup: V4 installed  
 magic: not installed Optional dependency, without python-file or python-magic the attachment plugin's automatic file type detection will easily be fooled  
 Loading extensions...  
 fuglu.extensions.sql: disabled (sqlalchemy not installed)  
 Loading plugins...  
 Plugin loading complete  
 Linting main configuration  
 OK  
   
 Linting Plugin Archive Config section: ArchivePlugin  
 SuspectFilter file not found: /etc/fuglu/archive.regex  
 ERROR  
   
 Linting Plugin Attachment Blocker Config section: FiletypePlugin  
 python libmagic bindings (python-file or python-magic) not available. Will only do content-type checks, no real file analysis  
 ERROR  
   
 Linting Plugin Debugger Config section: debug  
 OK  
   
 Linting Plugin Plugin Skipper Config section: PluginSkipper  
 SuspectFilter file not found: /etc/fuglu/skipplugins.regex  
 ERROR  
 3 plugins reported errors.  
   
 WARNING:  
 Skipping logging configuration check because I could not switch to user 'nobody' earlier.  
 please re-run fuglu --lint as privileged user  
 (problems in the logging configuration could prevent the fuglu daemon from starting up)  
 root@localhost:/etc/fuglu#

You should really fix the lint errors above before continuing, so that you have a good setup for the features to come. The solution is different for each lint error, but for mine, I did the following.

 root@localhost:/etc/fuglu# cp archive.regex.dist archive.regex  
 root@localhost:/etc/fuglu# cp fuglu_mrtg.cfg.dist fuglu_mrtg.cfg  
 root@localhost:/etc/fuglu# cp logging.conf.dist logging.conf  
 root@localhost:/etc/fuglu# cp skipplugins.regex.dist skipplugins.regex  
 root@localhost:/etc/fuglu# apt-get install python-sqlalchemy python-magic   
 root@localhost:/etc/fuglu# groupadd nobody

Run the fuglu lint command again.

 root@localhost:/etc/fuglu# fuglu --lint  
 Fuglu 0.6.6-2-ge18e56b  
 ---------- LINT MODE ----------  
 Checking dependencies...  
 sqlalchemy: installed  
 BeautifulSoup: V4 installed  
 magic: found python-file/libmagic bindings (http://www.darwinsys.com/file/)  
 Loading extensions...  
 fuglu.extensions.sql: enabled (available)  
 Loading plugins...  
 Plugin loading complete  
 Linting main configuration  
 OK  
   
 Linting Plugin Archive Config section: ArchivePlugin  
 OK  
   
 Linting Plugin Attachment Blocker Config section: FiletypePlugin  
 Found python-file/libmagic bindings (http://www.darwinsys.com/file/)  
 No database configured. Using per user/domain file configuration from /etc/fuglu/rules  
 rarfile library not found, RAR support disabled  
 Archive scan, available file extensions: ['z', 'zip']  
 OK  
   
 Linting Plugin Debugger Config section: debug  
 OK  
   
 Linting Plugin Plugin Skipper Config section: PluginSkipper  
 OK  
 0 plugins reported errors.  
   
 Checking logging configuration....  
 OK

Perfect! Everything is OK. We are ready to move on. As for the fuglu postfix configuration, you can refer here. I will skip this for the aforementioned reason. To start fuglu, just run the command fuglu and it will run in the background. Give -f if you want to quickly test it.

 root@localhost:~# fuglu  
 root@localhost:~#   
 root@localhost:~# tail -F /var/log/fuglu/fuglu.log  
 2016-06-23 20:44:15,178 root    : INFO FuGLU Version 0.6.6-2-ge18e56b starting up  
 2016-06-23 20:44:15,289 fuglu.MainController: INFO Init Stat Engine  
 2016-06-23 20:44:15,290 fuglu.MainController: INFO Init Threadpool  
 2016-06-23 20:44:15,290 fuglu.MainController: INFO Starting interface sockets...  
 2016-06-23 20:44:15,291 fuglu.MainController: INFO starting connector smtp/10025  
 2016-06-23 20:44:15,291 fuglu.incoming.10025: INFO SMTP (After Queue) Server running on port 10025  
 2016-06-23 20:44:15,291 fuglu.MainController: INFO starting connector smtp/10099  
 2016-06-23 20:44:15,292 fuglu.incoming.10099: INFO SMTP (After Queue) Server running on port 10099  
 2016-06-23 20:44:15,292 fuglu.MainController: INFO starting connector smtp/10888  
 2016-06-23 20:44:15,292 fuglu.incoming.10888: INFO SMTP (After Queue) Server running on port 10888  
 2016-06-23 20:44:15,293 fuglu.control.fuglu_control.sock: INFO Control/Info Server running on port /tmp/fuglu_control.sock  
 2016-06-23 20:44:15,293 fuglu.MainController: INFO Startup complete

Started okay. Now let us print out some fuglu status.

 $ sudo fuglu_control stats  
 Fuglu statistics  
 ---------------  
 Uptime:          0:00:20.218391  
 Avg scan time:     0  
 Total msgs:     0  
 Ham:          0  
 Spam:          0  
 Virus:          0  
   
   
 $ sudo fuglu_control workerlist  
 Total 2 Threads  
   
 [1]: waiting for task  
 *******  
 [2]: waiting for task

The response is fast and the statistics are simple to understand. If you are a mail admin, I suggest you also set up fuglu so that the statistics can be plotted in MRTG. Fuglu supports MRTG natively; you can find out how here.

To take a step further, a mail admin should really read the plugin page. It contains all the various goodies you will require, and if you want more, you can write your own plugin and integrate it into fuglu. You can read here for an example of how to write a plugin.

Last but not least, if you have questions, you can reach the author through many different channels. You can find them here.

If you are a mail admin, consider fuglu as your mail scanner daemon!

Friday, July 29, 2016

Sharing thoughts on article 'Agile Vs. Lean: Yeah Yeah, What’s the Difference?'

In this article, I take a slightly different approach from the usual coverage of software technology. Today, as I read an article about the difference between lean and agile, I found that some points really need extra explanation. I felt compelled to complement the original article, as some non-I.T. staff may not really understand it.

2. Build Quality In
4. Defer Commitment
5. Deliver Fast

Many people, especially in management and the sales department, have NO idea about these three points (other than "we want it now, or yesterday"). Well, yes, the article lists them as priorities 2, 4 and 5. But in an actual software house, how do you achieve this whilst making the non-I.T. staff in the company understand? In my opinion, nothing is immediate; the software is delivered when it is ready. Software development is an iterative process, and hence it takes time. Software without quality, that is, software that has not gone through testing (manual or automated), cannot be profitable. IMO, a lot of things may be discussed during a scrum stand-up meeting, and to deliver on the spot is extremely impractical. I think deferring such a commitment to a later date and time gives a positive result. That is, individuals have the time to think, work and prepare rather than just bullshit to answer the 'deliver fast' point. In order to achieve quality in software and to deliver fast, automated testing is the only path to choose. With it, a quality end product becomes achievable. If you are in a software company, you should know this needs time to accomplish.

6. Respect People

In a house, we have elders and youngsters; in a company, we have different levels of ranking; and in a country, we have different societies with various lifestyles. But some companies really practise ranking habits, and worse, some promise a flat organization but show signs of ranking practice in their actual day-to-day working style. IMO, in order to respect people, and to actually act in line with what was promised, ranking/position MUST be abolished. In order to respect people, there should be no ranking in principle or in practice. Only by respecting people and removing ranking can a team be formed. If you have never felt what a team is like, please contact me.

Eliminating waste means eliminating useless meetings, tasks and documentation.

I have heard this countless times during my past working life or by word of mouth from friends. I'm sure you have heard something like, "Oh, I have meetings from 9 to 11 and in the afternoon from 2 to 4." I never discount the importance of a meeting; don't get me wrong, I have no ill feelings toward meetings. But a meeting should be concise and agenda oriented. A meeting should be a place where people discuss a predefined topic and questions which cannot be solved over other communication channels.

It also means eliminating inefficient ways of working – like multitasking (!) – so we can deliver fast.

Seriously, adapting computer multitasking to humans is, imho, just stupid. Rather than a staff member who produces many things without quality, I would choose one who is focused on the work and produces it on time and with quality. But really, look at Malaysian companies: how many really understand the actual definition of multitasking, and how do they adapt when things do not work in actual working life?

For example, many managers want to “optimize” individual developers by ensuring they’re always at 100% – but most of the time, this is actually counter-productive. Let’s not have people coding something that isn’t needed (or fully defined yet) just for the sake of coding, because that actually creates more work for us in the future

Generally I agree with this statement, but there should be some spare time. This spare time can be used privately by the team to do something fun (games/hobbies) or, better yet, to code something that keeps their minds sharp. There are many sites that provide such facilities, such as HackerRank and Stack Overflow, where you can answer questions, which is really beneficial to both the company and the individual. It's a win-win situation, really.

Along those lines, Lean says to respect that the people doing the work are the ones that best know how to do it. Give them what they need to be effective and then trust them to do it. Software development is about learning, so structure the work to ensure we’re continuously learning.

Countless times I have seen a manager and/or senior interrupt a coder/programmer who is actively solving a problem. Many problems come from this. First, when a programmer is actively solving a problem, an interruption of even 5 minutes can take away something the programmer might forget later. Second, continuous interruptions mean the programmer cannot concentrate fully on their work, and this damages the company as a whole. It becomes worse if only the programmer knows how to code and the manager or the management staff do not. There are numerous other justifications, but suffice it to say that empowering and trusting staff to do what they are supposed to do can produce many positive outcomes. Even when a feature or bug fix gets screwed up, it should be a learning experience: the staff should inform the manager accordingly, and the manager/management should protect the team as a whole. This is a classic example of something you can hardly find in practice. Remember, trust is earned and built over time; it will collapse quickly if it is not maintained.

3. Frequent delivery of software

Continuous delivery of software actually means that a coder adds a feature or fixes a problem, the changes get committed into a software repository, sufficient test coverage is in place, the build is okay, and the result is delivered to the customer right after. But for a small company, if any part of the aforementioned is not done properly, it only brings disaster. Trust me, I have seen this shit. It's better to understand yourself and adapt adequately than to pursue whatever technology is out there.

6. Face-to-face conversation is best

Not necessarily. I have been using voice communication for years and I think voice communication works best. Why? Because software work normally requires referring to the work itself. Text communication is even more vital than voice communication, but the two complement each other. Think about it: you don't need to look at your partner's face to discuss things. However, you both need to look at the code and the problem being discussed. We need to read the code on the display and listen to understand the problem at hand, not look at each other. ;)

12. Regular reflection & adaptation

I agree that regular reflection and change are important. This also serves as a time when everyone can sit (or stand?) to listen and learn from each other. This is something companies seldom practise. It is another ingredient in forming a great team too. In essence, reflection and adaptation should not be missed.

Last but not least, all these software development methodologies might work fully in company A or company B, but that does not mean a company like yours should blindly follow them. But if you are a startup and have no clue, it is best to learn from them, use them as a guide, and improve and self-reflect along the way. Most importantly, forming a team is more important than anything else.

Sunday, July 17, 2016

AngelHack Kuala Lumpur 2016

Coding is something I am passionate about, and if you have been following my blog, you should know most of the articles are related to big data technologies. Two weeks ago as of this write-up, a professor from MMU Malaysia asked me if I would like to join his team for a hackathon event held in the city centre, in the Times Square mall.

I took a look at the event details, which can be found here and here. To my astonishment, it is a paid coding event priced in United States dollars even though we are in Malaysia. This is something I had never done before, that is, going to a physical location and coding a project in competition with others, and that was one of my deciding factors. And of course, I also wanted to see who is in the local development community, get to know more developers, and have some fun!

There are three main challenges this year, which are
* big data analytics
* o2o commerce
* smart living

and a grand prize, which is an exclusive invite to a hackcelerator program and a chance to fly to Silicon Valley to compete further! Of course, each category comes with a prize pool!

As I have a practical background in big data technology and startups, the professor thought that by inviting me he could form a team of five to compete in the big data analytics category. With that, I decided to give it a try and see how it goes. The event was held on 4 and 5 June 2016, on a weekend of course, which fits nicely with a working professional's schedule.

For my part, I brought in big data technology, and this time I selected Elasticsearch as the NoSQL store, which matches the challenge category we were aiming for. Unfortunately, since much of my time has been devoted to middle-tier and backend development, my user interface development skills have declined tremendously. (Note to self: start coding GUIs again!) Fortunately, to fill that gap, there is a good piece of software that comes with Elasticsearch, known as Kibana, which covered what I did not bring to the table. With the combination of Elasticsearch and Kibana, we were not only able to ingest the 65m of iproperty data into the machine, we were also able to query it and represent it nicely in colour graphs.

As for the rest of the team, they brought in skillsets like deep learning and predictive analysis, which address the 'predictive' and 'prescriptive' parts. The idea of our product is that a user can upload a picture of a property through a web interface (taken with a phone camera), and the system will learn from that image and output the result, using the iproperty dataset stored in Elasticsearch, on another page.

Below are some interesting pictures taken during the events.

the stage

panorama view

and the developers

As you can see from the pictures above, there are plenty of coders, each skilful at what they are good at! There were an estimated 370 participants, and that includes female coders too! Overall I think the organizer did a really good job for this two-day event. The participants got power, cool air, water, food, network devices and cables, junk food, a T-shirt, a cup and a writing notebook, and the staff were very helpful.

One of them actually came to me and asked what I would like, and handed me a paper cup of coffee as I requested. I could not be more grateful, since I got there around 8am and that's early for a weekend. ;-) But I would like to point out that there are definite areas to improve for next year.

* warm air on day one; it was not continuous, but you could feel it.
* some of the schedule (the judging and presentations) written on paper was way off from the actual flow of events.
* power and internet connection dropped from time to time over the two days.
* less junk food, perhaps replaced with fruit.
* the tables were too small for computer devices as well as for personal comfort.

But all in all, it was a good job to complete this two-day event in a safe and sound manner. If you know me and the motto of this website, striving for contribution and mutual benefit, you will understand my disappointment in finding that some, if not many, of the developers are still very selfish in terms of code sharing and/or code learning. For instance, all my code and infrastructure are available online and can be found here. But in return, many of them made excuses not to share with one another, which really goes against my principles. I know that an event such as this is a competition against other teams, but I think what is more valuable than all the monetary rewards is innovation and cultivating a passion for coding. That is something I still did not see this year. I don't believe we can thrive in a closed environment, and innovation definitely does not grow under such a shallow attitude.

hackathon ending in 5 minutes

pitching schedule

the team during pitching session.

Many coders actually slept in the mall, but for me, home sweet home. I had a self-reflection session with the team, and the professor gave valuable insight that night on Slack and over the following days. That is something I have not experienced since leaving academia. The coding session stopped at 12pm on day 2, and then participants were free for lunch whilst the organizer and judges prepared for the participants' pitching session. There were two rounds of pitching, and each team got just about 2-3 minutes to explain their product and 1-2 minutes for questions and answers.

It is a sad reality that, when the shortlisted teams pitched again on the stage, many judges kept using words such as intellectual property and monetization, or asking how you sell your product or how you make money out of it. Honestly, these types of questions should be the responsibility of business professionals. An event such as this should have focused on the coding spirit and on valuing the products that actually meet the set criteria. Yes, no doubt money is a topic for any company, but it should be a secondary issue; what matters more is that companies should be the main catalyst driving the country to become a better one, not just projecting a money-driven society. Remember, the participants are as young as 15.

team effort

solo view

By the time the big data category announcement was made, I was dead tired. It was interesting to learn from other teams about user interface design as well as how an idea is transformed into a mockup prototype in just two days or less. To my surprise, we actually won the second prize for the challenge we competed in! That's worth 3000MYR, and what's more valuable is the chance to be invited to the Big App Challenge 3.0 Semi Finals. Hehe, whilst I fully acknowledge this reward rightfully belongs to the team, a solo picture taken to acknowledge my contribution is not too much to ask for.

Till then, keep coding, and if you are interested in coding, please feel free to contact me about any possibility of working together.

Saturday, July 16, 2016

Initial look into Apache Cassandra Paxos

Recently I have been reading about lightweight transactions in Apache Cassandra 2.0 and got interested in how they are implemented at the code level. From the end user's perspective, when you manipulate data with an insert using IF NOT EXISTS or an update using an IF condition, a Paxos operation will be used internally.

An example of lightweight transaction.

 INSERT INTO USERS (login, email, name, login_count) values ('jbellis', 'jbellis@datastax.com', 'Jonathan Ellis', 1) IF NOT EXISTS;  
   
 UPDATE users SET reset_token = null, password = 'newpassword' WHERE login = 'jbellis' IF reset_token = 'some-generated-reset-token';
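Both statements are conditional writes: the change is applied only if the condition holds at that moment. As a rough single-process analogy (not how Cassandra does it, which needs Paxos because the replicas live on different machines), the same compare-and-set semantics look like this with a ConcurrentHashMap. The class ConditionalWriteSketch and its methods are hypothetical names of mine.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConditionalWriteSketch
{
    // login -> reset token, standing in for one column of the users table
    private final ConcurrentMap<String, String> resetTokens = new ConcurrentHashMap<>();

    // Analogue of INSERT ... IF NOT EXISTS: applied only when the key is absent.
    boolean insertIfNotExists(String login, String token)
    {
        return resetTokens.putIfAbsent(login, token) == null;
    }

    // Analogue of UPDATE ... IF reset_token = expected: applied only when the current value matches.
    boolean updateIfEquals(String login, String expected, String replacement)
    {
        return resetTokens.replace(login, expected, replacement);
    }
}
```

Each method returns true when the write was applied and false when the condition failed, which is similar in spirit to the applied flag a lightweight transaction reports back.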

Essentially, the Paxos operation is concentrated in the class StorageProxy. Its code documentation reads:

There are three phases to Paxos:
1. Prepare: the coordinator generates a ballot (timeUUID in our case) and asks replicas to (a) promise
not to accept updates from older ballots and (b) tell us about the most recent update it has already
accepted.
2. Accept: if a majority of replicas reply, the coordinator asks replicas to accept the value of the
highest proposal ballot it heard about, or a new value if no in-progress proposals were reported.
3. Commit (Learn): if a majority of replicas acknowledge the accept request, we can commit the new
value.
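The three phases can be sketched in a few lines of Java. This is only a minimal, single-threaded illustration and not Cassandra's code: the names PaxosSketch, Replica and propose are made up for this post, ballots are plain long values instead of timeUUIDs, and the replicas live in memory. It is also single-decree, so once a value has been accepted, later rounds re-propose that value instead of a new one.

```java
import java.util.ArrayList;
import java.util.List;

final class PaxosSketch {
    /** Hypothetical in-memory replica; not a Cassandra class. */
    static final class Replica {
        long promised = -1;        // highest ballot this replica has promised
        long acceptedBallot = -1;  // ballot of the most recently accepted proposal
        String acceptedValue;      // value of that proposal, or null if none

        /** Phase 1: promise not to accept updates from older ballots. */
        boolean prepare(long ballot) {
            if (ballot <= promised) return false;
            promised = ballot;
            return true;
        }

        /** Phase 2: accept the proposal unless a newer ballot was promised. */
        boolean accept(long ballot, String value) {
            if (ballot < promised) return false;
            promised = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return true;
        }
    }

    /** Runs one round as coordinator; returns the committed value, or null without a majority. */
    static String propose(List<Replica> replicas, long ballot, String newValue) {
        int majority = replicas.size() / 2 + 1;

        // 1. Prepare: collect promises and the most recent accepted proposal.
        List<Replica> promisers = new ArrayList<>();
        long highestSeen = -1;
        String inProgress = null;
        for (Replica r : replicas) {
            if (!r.prepare(ballot)) continue;
            promisers.add(r);
            if (r.acceptedValue != null && r.acceptedBallot > highestSeen) {
                highestSeen = r.acceptedBallot;
                inProgress = r.acceptedValue;
            }
        }
        if (promisers.size() < majority) return null;

        // 2. Accept: finish the in-progress proposal if there is one, else use the new value.
        String value = (inProgress != null) ? inProgress : newValue;
        int accepted = 0;
        for (Replica r : promisers) {
            if (r.accept(ballot, value)) accepted++;
        }
        if (accepted < majority) return null;

        // 3. Commit (learn): a majority acknowledged, so the value is chosen.
        return value;
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>();
        for (int i = 0; i < 3; i++) replicas.add(new Replica());
        System.out.println(propose(replicas, 1L, "jbellis"));
    }
}
```

Even in this toy form, every write needs at least two passes over the replicas before it can commit, which is the extra cost a lightweight transaction pays compared with a plain insert.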

So it involves several operations before an insert or update can be performed, and this is not something you want to use as a replacement for bulk operation calls. In the class StorageService, several Paxos verbs are registered. You can find their handlers mostly in the paxos package. The following are some useful Paxos classes.

https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/service/paxos/PaxosState.java

https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/service/paxos/Commit.java

https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/service/paxos/PrepareVerbHandler.java

https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/service/paxos/ProposeVerbHandler.java

https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/service/paxos/CommitVerbHandler.java

https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/service/paxos/AbstractPaxosCallback.java

https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/service/paxos/ProposeCallback.java

https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/service/paxos/PrepareCallback.java

https://github.com/apache/cassandra/blob/cassandra-2.0.7/src/java/org/apache/cassandra/service/paxos/PrepareResponse.java

Interestingly, in the class PaxosState, I noticed the following:

locks - an array of length 1024

calls to the SystemKeyspace class to load and save Paxos state.
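A fixed-size array of lock objects like this is typically used for lock striping: each row key is hashed to one of the slots, so operations on the same key are serialized while different keys rarely contend. The sketch below shows that idea only. The class StripedLocks and its method lockFor are names made up for this post, and the assumption that PaxosState picks its lock this way should be checked against the source linked above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class StripedLocks {
    private final Object[] locks;

    StripedLocks(int stripes) {
        locks = new Object[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    /** Maps a key to one lock object; the mask clears the sign bit so the index is never negative. */
    Object lockFor(ByteBuffer key) {
        return locks[(key.hashCode() & 0x7FFFFFFF) % locks.length];
    }

    public static void main(String[] args) {
        StripedLocks striped = new StripedLocks(1024);
        ByteBuffer key = ByteBuffer.wrap("jbellis".getBytes(StandardCharsets.UTF_8));
        synchronized (striped.lockFor(key)) {
            // In a real system, this is where the state for the key would be
            // loaded, compared against the incoming ballot, and saved again.
            System.out.println("holding the stripe for this key");
        }
    }
}
```

The trade-off is memory versus contention: 1024 lock objects are cheap, and two unrelated keys only block each other when they happen to hash to the same slot.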

When you inspect Cassandra using cqlsh, you will find the following.

 cqlsh:jw_schema1> use system;  
 cqlsh:system> desc tables;  
   
 available_ranges     peers        paxos      range_xfers  
 batches          compaction_history batchlog    local     
 "IndexInfo"        sstable_activity  size_estimates hints     
 views_builds_in_progress peer_events     built_views    
   
 cqlsh:system> select * from paxos;  
   
  row_key | cf_id | in_progress_ballot | most_recent_commit | most_recent_commit_at | most_recent_commit_version | proposal | proposal_ballot | proposal_version  
 ---------+-------+--------------------+--------------------+-----------------------+----------------------------+----------+-----------------+------------------  
   
 (0 rows)

You can also read the unit test for Paxos in Cassandra 2.0.17 here.

This compare-and-set (CAS) operation looks interesting if you want to ensure that a value is only created if it does not already exist, a nifty feature found in Apache Cassandra 2.0 onward. Feel free to explore further!

Friday, July 15, 2016

learning and trying on apache ivy

Over years of Java programming with Ant build files, many have claimed that Ant is an ancient build automation tool and that you should use Maven 2 instead. Why? Because, they claim, Ant does not automatically resolve the libraries that your project requires. So instead of switching to Maven 2 entirely, today we will take a look at Apache Ivy.

Apache Ivy is a transitive dependency manager. It is a sub-project of the Apache Ant project, with which Ivy works to resolve project dependencies. An external XML file defines project dependencies and lists the resources necessary to build a project. Ivy then resolves and downloads resources from an artifact repository: either a private repository or one publicly available on the Internet.

That is really good news! Instead of starting all over with Maven, library dependencies can now be resolved via Ivy in your existing Ant build file. What a relief! Okay, let's go into the basics of Apache Ivy, starting with the terminology that is widely used when you deal with it. You can reference this link for more detailed explanations. For the concepts behind how Apache Ivy works, read this helpful link.

terminology	explanation
Organisation	An organisation is either a company, an individual, or simply any group of people that produces software.
Module	A module is a self-contained, reusable unit of software that, as a whole unit, follows a revision control scheme.
Module Descriptor	A module descriptor is a generic way of identifying what describes a module: the identifier (organisation, module name, branch and revision), the published artifacts, possible configurations and their dependencies.
Artifact	An artifact is a single file ready for delivery with the publication of a module revision, as a product of development.
Type of an artifact	The artifact type is a category of a particular kind of artifact specimen.
Artifact file name extension	In some cases the artifact type already implies its file name extension, but not always.
Module Revision	A unique revision number or version name is assigned to each delivered unique state of a module.
Branch	A branch corresponds to the standard meaning of a branch (or sometimes stream) in source control management tools.
Status of a revision	A module's status indicates how stable a module revision can be considered.
Configurations of a module	A module configuration is a way to use or construct a module.
Ivy Settings	Ivy settings files are xml files used to configure ivy to indicate where the modules can be found and how.
Repository	What is called a repository in Ivy is a distribution site location where Ivy is able to find your required modules' artifacts and descriptors (i.e. Ivy files in most cases).

When you add the Ivy namespace to your Ant build file, you can call Ivy tasks from it. An example of the Ivy integration in an Ant build file is shown below.


 <project xmlns:ivy="antlib:org.apache.ivy.ant" name="go-ivy" default="go">  
 <!--  
    
     this build file is a self contained project: it doesn't require anything else   
     than ant 1.6.2 or greater and java 1.4 or greater properly installed.  
       
     It is used to showcase how easy and straightforward it can be to use Ivy.  
       
     This is not an example of the best practice to use in a project, especially  
     for the java source code "generation" :-) (see generate-src target)  
       
     To run copy this file in an empty directory, open a shell or a command window  
     in this directory and run "ant". It will download ivy and then use it to resolve   
     the dependency of the class which is itself "contained" in this build script.  
       
     After a successful build run "ant" again and you will see the build will be  
     much faster.  
       
     More information can be found at http://ant.apache.org/ivy/  
       
 -->  
 <!--  
  here is the version of ivy we will use. change this property to try a newer   
      version if you want   
 -->  
 <property name="ivy.install.version" value="2.4.0"/>  
 <property name="ivy.jar.dir" value="${basedir}/ivy"/>  
 <property name="ivy.jar.file" value="${ivy.jar.dir}/ivy.jar"/>  
 <property name="build.dir" value="build"/>  
 <property name="src.dir" value="src"/>  
 <target name="download-ivy" unless="skip.download">  
 <mkdir dir="${ivy.jar.dir}"/>  
 <!--  
  download Ivy from web site so that it can be used even without any special installation   
 -->  
 <echo message="installing ivy..."/>  
 <get src="https://repo1.maven.org/maven2/org/apache/ivy/ivy/${ivy.install.version}/ivy-${ivy.install.version}.jar" dest="${ivy.jar.file}" usetimestamp="true"/>  
 </target>  
 <!--  
  =================================   
      target: install-ivy       
       this target is not necessary if you put ivy.jar in your ant lib directory  
       if you already have ivy in your ant lib, you can simply remove this  
       target and the dependency the 'go' target has on it  
      =================================   
 -->  
 <target name="install-ivy" depends="download-ivy" description="--> install ivy">  
 <!--  
  try to load ivy here from local ivy dir, in case the user has not already dropped  
         it into ant's lib dir (note that the latter copy will always take precedence).  
         We will not fail as long as local lib dir exists (it may be empty) and  
         ivy is in at least one of ant's lib dir or the local lib dir.   
 -->  
 <path id="ivy.lib.path">  
 <fileset dir="${ivy.jar.dir}" includes="*.jar"/>  
 </path>  
 <taskdef resource="org/apache/ivy/ant/antlib.xml" uri="antlib:org.apache.ivy.ant" classpathref="ivy.lib.path"/>  
 </target>  
 <!--  
  =================================   
      target: go  
           Go ivy, go!  
      =================================   
 -->  
 <target name="go" depends="install-ivy, generate-src" description="--> resolve dependencies, compile and run the project">  
 <echo message="using ivy to resolve commons-lang 2.1..."/>  
 <!--  
  here comes the magic line: asks ivy to resolve a dependency on   
        commons-lang 2.1 and to build an ant path with it from its cache   
 -->  
 <ivy:cachepath organisation="commons-lang" module="commons-lang" revision="2.1" pathid="lib.path.id" inline="true"/>  
 <echo message="compiling..."/>  
 <mkdir dir="${build.dir}"/>  
 <javac srcdir="${src.dir}" destdir="${build.dir}" classpathref="lib.path.id"/>  
 <echo>  
 We are now ready to execute our simple program with its dependency on commons-lang. Let's go!  
 </echo>  
 <java classname="example.Hello">  
 <classpath>  
 <path refid="lib.path.id"/>  
 <path location="${build.dir}"/>  
 </classpath>  
 </java>  
 </target>  
 <!--  
  =================================   
      target: generate-src  
       'Generates' the class source. It actually just echo a simple java   
       source code to a file. In real life this file would already be  
       present on your file system, and this target wouldn't be necessary.  
      =================================   
 -->  
 <target name="generate-src">  
 <mkdir dir="${src.dir}/example"/>  
 <echo file="${src.dir}/example/Hello.java">  
 package example;
 import org.apache.commons.lang.WordUtils;
 public class Hello {
     public static void main(String[] args) {
         String message = "hello ivy !";
         System.out.println("standard message : " + message);
         System.out.println("capitalized by " + WordUtils.class.getName()
                 + " : " + WordUtils.capitalizeFully(message));
     }
 }
 </echo>  
 </target>  
 <!--  
  =================================   
      target: clean         
      =================================   
 -->  
 <target name="clean" description="--> clean the project">  
 <delete includeemptydirs="true" quiet="true">  
 <fileset dir="${src.dir}"/>  
 <fileset dir="${build.dir}"/>  
 </delete>  
 </target>  
 <!--  
  =================================   
      target: clean-ivy         
      =================================   
 -->  
 <target name="clean-ivy" description="--> clean the ivy installation">  
 <delete dir="${ivy.jar.dir}"/>  
 </target>  
 <!--  
  =================================   
      target: clean-cache         
      =================================   
 -->  
 <target name="clean-cache" depends="install-ivy" description="--> clean the ivy cache">  
 <ivy:cleancache/>  
 </target>  
 </project>

As the saying goes, try it and you will better understand how Apache Ivy helps you in your Ant build process. Once you get the hang of it and want more advanced features, I suggest you take a look at the Ivy settings file. This link provides comprehensive coverage of the configuration that you can use in advanced use cases.

If you are still looking for what else you can do with Apache Ivy, take a look at this link. If you just want to use Ivy quickly, you can run it as a standalone jar file.

I have not mentioned ivy.xml, and I think if you have reached this section, you should know what the ivy.xml file is for and what it contains. I hope you found something useful in this quick tutorial.

Sunday, July 3, 2016

apache cassandra 1.0.8 on READ_STAGE threads reference on sstables and so compaction cannot remove the sstables.

Back when I was administering an Apache Cassandra 1.0.8 cluster, I noticed that a few sstables did not get removed even after compaction was done. The leftover sstables caused some administrative problems, and I suspect the cause could be that sstables which are still being read do not get removed.

 DataTracker.java  
   
   private void replace(Collection<SSTableReader> oldSSTables, Iterable<SSTableReader> replacements)  
   {  
     View currentView, newView;  
     do  
     {  
       currentView = view.get();  
       newView = currentView.replace(oldSSTables, replacements);  
     }  
     while (!view.compareAndSet(currentView, newView));  
   
     addNewSSTablesSize(replacements);  
     removeOldSSTablesSize(oldSSTables);  
   
     cfstore.updateCacheSizes();  
   }

I suppose that during the replacement of the view and sstables, everything is atomic, and hence a read will get the new sstables. But I don't have enough high-level knowledge of how the various subsystems work in Cassandra. If you have an idea, please do leave a comment below.
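The replace method above uses a compare-and-set retry loop, and the same pattern can be tried out in isolation. The sketch below is a stand-in and not Cassandra's DataTracker: the class ViewSwapSketch is made up for this post, sstables are represented by plain strings, and the view is an immutable set held in an AtomicReference. It does not show how Cassandra tracks references held by in-flight reads.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

final class ViewSwapSketch {
    // Immutable snapshot of the live sstable names (hypothetical stand-in for a view).
    private final AtomicReference<Set<String>> view =
            new AtomicReference<>(Collections.<String>emptySet());

    /** Returns the snapshot that is current at the moment of the call. */
    Set<String> current() {
        return view.get();
    }

    /** Swaps old sstables for their replacements, retrying if another thread swapped first. */
    void replace(Collection<String> oldTables, Collection<String> replacements) {
        Set<String> currentView;
        Set<String> newView;
        do {
            currentView = view.get();
            Set<String> copy = new HashSet<>(currentView);
            copy.removeAll(oldTables);
            copy.addAll(replacements);
            newView = Collections.unmodifiableSet(copy);
        } while (!view.compareAndSet(currentView, newView));
    }

    public static void main(String[] args) {
        ViewSwapSketch tracker = new ViewSwapSketch();
        tracker.replace(Collections.<String>emptyList(), Collections.singleton("ic-1"));
        Set<String> seenByReader = tracker.current();   // a read starts here
        tracker.replace(Collections.singleton("ic-1"), Collections.singleton("ic-2"));
        System.out.println("reader still sees " + seenByReader);
        System.out.println("new reads see " + tracker.current());
    }
}
```

The swap itself is atomic, but only for callers that fetch the view afterwards. A reader that grabbed the old snapshot before the swap keeps pointing at the old sstables until it finishes, which at least fits the suspicion that reads in progress could delay their removal.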

This problem seemed to go away after we upgraded the cluster to 1.1. I know that by now (April 2016) Cassandra 1.0, 1.1 or even 1.2 is ancient, but if you are on 1.0 or pre-1.0, you should really start using Cassandra 3.x or at least 2.x.
