Friday, July 15, 2016

learning and trying on apache ivy

Over years of java programming with ant build files, many have claimed that ant is an ancient build tool and that you should use maven2 instead. Why? Because, they claim, ant does not automatically resolve the libraries your project requires. So instead of switching to maven2 entirely, today we will take a look at apache ivy.

Apache Ivy is a transitive dependency manager. It is a sub-project of the Apache Ant project, with which Ivy works to resolve project dependencies. An external XML file defines project dependencies and lists the resources necessary to build a project. Ivy then resolves and downloads resources from an artifact repository: either a private repository or one publicly available on the Internet.
That is really good news! Instead of starting all over with maven, library dependencies can now be resolved via ivy in your existing ant build file. What a relief! Okay, let's go into the basics of apache ivy, starting with the terminology that is widely used when you deal with it. You can reference this link for a more detailed explanation. For the concepts behind how apache ivy works, read this helpful link.

terminology: explanation
Organisation: An organisation is either a company, an individual, or simply any group of people that produces software.
Module: A module is a self-contained, reusable unit of software that, as a whole unit, follows a revision control scheme.
Module Descriptor: A module descriptor is a generic way of identifying what describes a module: the identifier (organisation, module name, branch and revision), the published artifacts, possible configurations and their dependencies.
Artifact: An artifact is a single file ready for delivery with the publication of a module revision, as a product of development.
Type of an artifact: The artifact type is a category of a particular kind of artifact specimen.
Artifact file name extension: In some cases the artifact type already implies its file name extension, but not always.
Module Revision: A unique revision number or version name is assigned to each delivered unique state of a module.
Branch: A branch corresponds to the standard meaning of a branch (or sometimes stream) in source control management tools.
Status of a revision: A module's status indicates how stable a module revision can be considered.
Configurations of a module: A module configuration is a way to use or construct a module.
Ivy Settings: Ivy settings files are xml files used to configure ivy to indicate where the modules can be found and how.
Repository: What is called a repository in Ivy is a distribution site location where Ivy is able to find your required modules' artifacts and descriptors (i.e. Ivy files in most cases).



When you add the ivy namespace to your ant build file, you can call ivy tasks from it. An example of ivy integrated into an ant build file is shown below.


 <project xmlns:ivy="antlib:org.apache.ivy.ant" name="go-ivy" default="go">  
 <!--  
    
     this build file is a self contained project: it doesn't require anything else   
     than ant 1.6.2 or greater and java 1.4 or greater properly installed.  
       
     It is used to showcase how easy and straightforward it can be to use Ivy.  
       
     This is not an example of the best practice to use in a project, especially  
     for the java source code "generation" :-) (see generate-src target)  
       
     To run copy this file in an empty directory, open a shell or a command window  
     in this directory and run "ant". It will download ivy and then use it to resolve   
     the dependency of the class which is itself "contained" in this build script.  
       
     After a successful build run "ant" again and you will see the build will be  
     much faster.  
       
     More information can be found at http://ant.apache.org/ivy/  
       
 -->  
 <!--  
  here is the version of ivy we will use. change this property to try a newer   
      version if you want   
 -->  
 <property name="ivy.install.version" value="2.4.0"/>  
 <property name="ivy.jar.dir" value="${basedir}/ivy"/>  
 <property name="ivy.jar.file" value="${ivy.jar.dir}/ivy.jar"/>  
 <property name="build.dir" value="build"/>  
 <property name="src.dir" value="src"/>  
 <target name="download-ivy" unless="skip.download">  
 <mkdir dir="${ivy.jar.dir}"/>  
 <!--  
  download Ivy from web site so that it can be used even without any special installation   
 -->  
 <echo message="installing ivy..."/>  
 <get src="https://repo1.maven.org/maven2/org/apache/ivy/ivy/${ivy.install.version}/ivy-${ivy.install.version}.jar" dest="${ivy.jar.file}" usetimestamp="true"/>  
 </target>  
 <!--  
  =================================   
      target: install-ivy       
       this target is not necessary if you put ivy.jar in your ant lib directory  
       if you already have ivy in your ant lib, you can simply remove this  
       target and the dependency the 'go' target has on it  
      =================================   
 -->  
 <target name="install-ivy" depends="download-ivy" description="--> install ivy">  
 <!--  
  try to load ivy here from local ivy dir, in case the user has not already dropped  
         it into ant's lib dir (note that the latter copy will always take precedence).  
         We will not fail as long as local lib dir exists (it may be empty) and  
         ivy is in at least one of ant's lib dir or the local lib dir.   
 -->  
 <path id="ivy.lib.path">  
 <fileset dir="${ivy.jar.dir}" includes="*.jar"/>  
 </path>  
 <taskdef resource="org/apache/ivy/ant/antlib.xml" uri="antlib:org.apache.ivy.ant" classpathref="ivy.lib.path"/>  
 </target>  
 <!--  
  =================================   
      target: go  
           Go ivy, go!  
      =================================   
 -->  
 <target name="go" depends="install-ivy, generate-src" description="--> resolve dependencies, compile and run the project">  
 <echo message="using ivy to resolve commons-lang 2.1..."/>  
 <!--  
  here comes the magic line: asks ivy to resolve a dependency on   
        commons-lang 2.1 and to build an ant path with it from its cache   
 -->  
 <ivy:cachepath organisation="commons-lang" module="commons-lang" revision="2.1" pathid="lib.path.id" inline="true"/>  
 <echo message="compiling..."/>  
 <mkdir dir="${build.dir}"/>  
 <javac srcdir="${src.dir}" destdir="${build.dir}" classpathref="lib.path.id"/>  
 <echo>  
 We are now ready to execute our simple program with its dependency on commons-lang. Let's go!  
 </echo>  
 <java classname="example.Hello">  
 <classpath>  
 <path refid="lib.path.id"/>  
 <path location="${build.dir}"/>  
 </classpath>  
 </java>  
 </target>  
 <!--  
  =================================   
      target: generate-src  
       'Generates' the class source. It actually just echoes a simple java   
       source code to a file. In real life this file would already be  
       present on your file system, and this target wouldn't be necessary.  
      =================================   
 -->  
 <target name="generate-src">  
 <mkdir dir="${src.dir}/example"/>  
 <echo file="${src.dir}/example/Hello.java">  
 package example;

 import org.apache.commons.lang.WordUtils;

 public class Hello {
     public static void main(String[] args) {
         String message = "hello ivy !";
         System.out.println("standard message : " + message);
         System.out.println("capitalized by " + WordUtils.class.getName()
                 + " : " + WordUtils.capitalizeFully(message));
     }
 }
 </echo>  
 </target>  
 <!--  
  =================================   
      target: clean         
      =================================   
 -->  
 <target name="clean" description="--> clean the project">  
 <delete includeemptydirs="true" quiet="true">  
 <fileset dir="${src.dir}"/>  
 <fileset dir="${build.dir}"/>  
 </delete>  
 </target>  
 <!--  
  =================================   
      target: clean-ivy         
      =================================   
 -->  
 <target name="clean-ivy" description="--> clean the ivy installation">  
 <delete dir="${ivy.jar.dir}"/>  
 </target>  
 <!--  
  =================================   
      target: clean-cache         
      =================================   
 -->  
 <target name="clean-cache" depends="install-ivy" description="--> clean the ivy cache">  
 <ivy:cleancache/>  
 </target>  
 </project>  

As the saying goes, try it and you will understand better how apache ivy helps your ant build process. Once you get the hang of it and want more advanced features, I suggest you take a look at the ivy settings file. This link provides comprehensive coverage of the configuration that you can use in advanced use cases.

If you are still looking for what else you can do with apache ivy, take a look at this link. If you just want to use ivy quickly, you can run it as a standalone jar file.

I have not mentioned ivy.xml, but I think if you have reached this section, you should know what the ivy.xml file is for and what it contains. I hope you found something useful in this quick tutorial.

Sunday, July 3, 2016

apache cassandra 1.0.8: READ_STAGE threads hold references on sstables, so compaction cannot remove the sstables

Back when I was administering an apache cassandra 1.0.8 cluster, I noticed that a few sstables did not get removed even after compaction was done. The leftover sstables caused some administrative problems, and I suspect they were not removed because they were still being read at the time.

 DataTracker.java  
   
   private void replace(Collection<SSTableReader> oldSSTables, Iterable<SSTableReader> replacements)  
   {  
     View currentView, newView;  
     do  
     {  
       currentView = view.get();  
       newView = currentView.replace(oldSSTables, replacements);  
     }  
     while (!view.compareAndSet(currentView, newView));  
   
     addNewSSTablesSize(replacements);  
     removeOldSSTablesSize(oldSSTables);  
   
     cfstore.updateCacheSizes();  
   }  

I suppose that the replacement of the view and sstables is atomic, and hence reads will pick up the new sstables. But I don't have enough high-level knowledge of how the various subsystems work in cassandra. If you have an idea, please leave a comment below.
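To make the compare-and-set loop above a bit more concrete, here is a minimal, self-contained sketch of the same pattern. It is not cassandra code: the View class, the string sstable names and the class name ViewSwapSketch are all made up for illustration. It only shows that the swap itself is atomic, while a reader that grabbed the old view just before the swap still holds the old list, which would be one way for old sstables to stay referenced for a while.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class ViewSwapSketch {
    // Immutable snapshot of the live sstable names (hypothetical stand-in for a view).
    static final class View {
        final List<String> sstables;

        View(List<String> sstables) {
            this.sstables = Collections.unmodifiableList(new ArrayList<>(sstables));
        }

        // Builds a new View; the current one is never modified.
        View replace(List<String> oldOnes, List<String> replacements) {
            List<String> next = new ArrayList<>(sstables);
            next.removeAll(oldOnes);
            next.addAll(replacements);
            return new View(next);
        }
    }

    static final AtomicReference<View> view =
            new AtomicReference<>(new View(Arrays.asList("a", "b", "c")));

    // Same shape as the loop above: retry until no other thread changed the view in between.
    static void replace(List<String> oldOnes, List<String> replacements) {
        View current, next;
        do {
            current = view.get();
            next = current.replace(oldOnes, replacements);
        } while (!view.compareAndSet(current, next));
    }

    public static void main(String[] args) {
        View readerSnapshot = view.get();                       // a reader grabs the view
        replace(Arrays.asList("a", "b"), Arrays.asList("ab"));  // a "compaction" swaps it
        System.out.println("reader snapshot: " + readerSnapshot.sstables); // [a, b, c]
        System.out.println("current view:    " + view.get().sstables);     // [c, ab]
    }
}
```

New readers see the new view immediately, but the old snapshot stays usable until the reader that holds it lets go. Whether this is what actually kept the sstables around on 1.0.8 is only my guess.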

This problem seemed to go away after we upgraded the cluster to 1.1. I know that by now (april 2016), cassandra 1.0, 1.1 or even 1.2 are ancient, but if you are on 1.0 or pre-1.0, you should really move to cassandra 3.x or at least 2.x.

Saturday, July 2, 2016

quick maven note for myself

If you are a java developer, maven should be very familiar. I was surprised that I had never written a blog post about maven, so I might as well write a short one now. This is a note to remind myself, one I can revisit later.

To start a new maven java project, specify groupId, artifactId and archetypeArtifactId.
 start a mvn project  
 $ mvn archetype:generate -DgroupId=co.weetech -DartifactId=my-webapp -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false  

Once you have the directory built, start adding code, and when it is time to compile, let's do it!

 compile the project  
 $ mvn compile  

Sometimes you have unit tests, written under the path ./src/test, and you can run them all using this command.

 test the project  
 $ mvn test

And if everything compiles and the tests pass, you can start packaging. Well, maven does that as well.

 package the project  
 $ mvn package  

The built library should be in the target directory. Finally, if you want to install the package into your local repository, under .m2 in your home directory, you can run the following command.

 to install the package, into your home .m2 directory  
 $ mvn install  

And if your project grows bigger and you want to edit your source code in eclipse, run the following command to generate the eclipse project files (.project and .classpath).

 maven eclipse  
 $ mvn eclipse:eclipse  

That's it. These should get you started, and if you encounter problems, google is your friend :-D and, of course, the maven documentation.

Friday, July 1, 2016

Yet another sstable corruption - EOFException

While starting up an apache cassandra 1.2 instance, I noticed the following error in the log.

 INFO 10:38:23,334 Opening /var/lib/cassandra/data/MYKEYSPACE/MYCOLUMNFAMILY/MYKEYSPACE-COLUMNFAMILY-hf-2508 (2275767 bytes)  
 ERROR 10:38:23,467 Exception in thread Thread[SSTableBatchOpen:2,5,RMI Runtime]  
 org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException  
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)  
    at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)  
    at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)  
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:418)  
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:209)  
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)  
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:273)  
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)  
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)  
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)  
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)  
    at java.lang.Thread.run(Thread.java:745)  
 Caused by: java.io.EOFException  
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)  
    at java.io.DataInputStream.readUTF(DataInputStream.java:589)  
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)  
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)  
     ... 11 more  

Yes, as you may have noticed, the sstable version is hf, which belongs to cassandra 1.1. This node had just been upgraded to cassandra 1.2, and this was its first boot on 1.2.

Tracing the stack trace above through the cassandra 1.2 source code, it turned out that the compression metadata could not be opened due to file corruption. I tried nodetool upgradesstables, scrub and restarting the cassandra instance, but the error persisted. I guessed that nothing could really help in this case, so I ended up stopping the cassandra instance, removing this data sstable together with its metadata files, and then starting it up again. The error was gone and I ran a repair.
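The bottom of the stack trace is plain java.io: DataInputStream.readUTF first reads a two-byte length and then that many bytes, and throws EOFException if the stream ends before either is complete. Here is a small sketch of that behaviour, assuming nothing about cassandra's file layout; the class name and the "example-header" string are placeholders I made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

public class TruncatedReadUtfSketch {
    // Returns true if readUTF() runs out of bytes before a full string is read.
    static boolean failsWithEof(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF("example-header"); // placeholder content, not a real metadata value
        }
        byte[] intact = buf.toByteArray();
        byte[] cutShort = Arrays.copyOf(intact, intact.length / 2); // length prefix promises more than is there
        byte[] empty = new byte[0];                                 // not even the two length bytes

        System.out.println("intact    -> EOF? " + failsWithEof(intact));   // false
        System.out.println("cut short -> EOF? " + failsWithEof(cutShort)); // true
        System.out.println("empty     -> EOF? " + failsWithEof(empty));    // true
    }
}
```

In the trace above the exception comes out of readUnsignedShort, which matches the "empty" case here: the stream ended before the length prefix could be read.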

I hope you find this useful in your situation too.

Sunday, June 19, 2016

Investigating into apache cassandra 1.2 jmx metrics connection type warn logging

Recently I got the opportunity to upgrade a production cassandra cluster from 1.1.12 to 1.2.19, and in the midst of the upgrade, I noticed the following in the log file during boot up of a cassandra 1.2 instance.

 WARN 10:30:14,987 Error processing org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=Timeouts
 javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=Timeouts
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
    at com.yammer.metrics.reporting.JmxReporter.registerBean(JmxReporter.java:462)
    at com.yammer.metrics.reporting.JmxReporter.processMeter(JmxReporter.java:412)
    at com.yammer.metrics.reporting.JmxReporter.processMeter(JmxReporter.java:16)
    at com.yammer.metrics.core.Meter.processWith(Meter.java:131)
    at com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
    at com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
    at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
    at com.yammer.metrics.core.MetricsRegistry.newMeter(MetricsRegistry.java:240)
    at com.yammer.metrics.Metrics.newMeter(Metrics.java:245)
    at org.apache.cassandra.metrics.ConnectionMetrics.<init>(ConnectionMetrics.java:106)
    at org.apache.cassandra.net.OutboundTcpConnectionPool.<init>(OutboundTcpConnectionPool.java:53)
    at org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:493)
    at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:507)
    at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:640)
    at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:614)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:59)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

As the logging level is WARN, I did not worry too much. Digging into the code, it turns out that cassandra 1.2 adds a set of metrics known as ConnectionMetrics. The metric in this warning is under the domain org.apache.cassandra.metrics, with type Connection and name Timeouts. It is not available in cassandra 1.1.

The same applies to CommandPendingTasks, ResponseCompletedTasks, ResponsePendingTasks and CommandCompletedTasks.


 WARN 10:38:58,079 Error processing org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=CommandPendingTasks
 javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=CommandPendingTasks
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
    at com.yammer.metrics.reporting.JmxReporter.registerBean(JmxReporter.java:462)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:438)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:16)
    at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
    at com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
    at com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
    at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
    at com.yammer.metrics.core.MetricsRegistry.newGauge(MetricsRegistry.java:79)
    at com.yammer.metrics.Metrics.newGauge(Metrics.java:70)
    at org.apache.cassandra.metrics.ConnectionMetrics.<init>(ConnectionMetrics.java:71)
    at org.apache.cassandra.net.OutboundTcpConnectionPool.<init>(OutboundTcpConnectionPool.java:53)
    at org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:493)
    at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:507)
    at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:640)
    at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:614)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:59)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

 WARN 07:52:19,882 Error processing org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=ResponseCompletedTasks
 javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=ResponseCompletedTasks
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
    at com.yammer.metrics.reporting.JmxReporter.registerBean(JmxReporter.java:462)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:438)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:16)
    at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
    at com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
    at com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
    at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
    at com.yammer.metrics.core.MetricsRegistry.newGauge(MetricsRegistry.java:79)
    at com.yammer.metrics.Metrics.newGauge(Metrics.java:70)
    at org.apache.cassandra.metrics.ConnectionMetrics.<init>(ConnectionMetrics.java:99)
    at org.apache.cassandra.net.OutboundTcpConnectionPool.<init>(OutboundTcpConnectionPool.java:53)
    at org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:493)
    at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:507)
    at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:640)
    at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:614)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:59)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

 WARN 09:06:07,059 Error processing org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=ResponsePendingTasks
 javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=ResponsePendingTasks
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
    at com.yammer.metrics.reporting.JmxReporter.registerBean(JmxReporter.java:462)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:438)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:16)
    at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
    at com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
    at com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
    at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
    at com.yammer.metrics.core.MetricsRegistry.newGauge(MetricsRegistry.java:79)
    at com.yammer.metrics.Metrics.newGauge(Metrics.java:70)
    at org.apache.cassandra.metrics.ConnectionMetrics.<init>(ConnectionMetrics.java:92)
    at org.apache.cassandra.net.OutboundTcpConnectionPool.<init>(OutboundTcpConnectionPool.java:53)
    at org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:493)
    at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:507)
    at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:640)
    at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:614)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:59)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

 WARN 02:13:09,861 Error processing org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=CommandCompletedTasks
 javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=CommandCompletedTasks
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
    at com.yammer.metrics.reporting.JmxReporter.registerBean(JmxReporter.java:462)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:438)
    at com.yammer.metrics.reporting.JmxReporter.processGauge(JmxReporter.java:16)
    at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
    at com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
    at com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
    at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
    at com.yammer.metrics.core.MetricsRegistry.newGauge(MetricsRegistry.java:79)
    at com.yammer.metrics.Metrics.newGauge(Metrics.java:70)
    at org.apache.cassandra.metrics.ConnectionMetrics.<init>(ConnectionMetrics.java:78)
    at org.apache.cassandra.net.OutboundTcpConnectionPool.<init>(OutboundTcpConnectionPool.java:53)
    at org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:493)
    at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:507)
    at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:640)
    at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:614)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:59)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
This is indeed nothing to worry about. Once you are done upgrading all the nodes in the cluster, do another round of restarts and this type of warning will disappear.
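In all of the traces above, the exception is thrown from unregisterMBean. With plain JMX, that is what you get when you unregister an ObjectName that is not registered on the MBean server. Below is a minimal sketch of just that behaviour. It uses only the JDK, not the yammer metrics reporter, and the Timeouts bean and class names are made up for illustration; only the ObjectName string is taken from the log.

```java
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

public class UnregisterMissingMBeanSketch {
    // Minimal standard MBean: the interface name is the class name plus "MBean".
    public interface TimeoutsMBean {
        long getCount();
    }

    public static class Timeouts implements TimeoutsMBean {
        public long getCount() {
            return 0L;
        }
    }

    // Returns true if unregistering fails because nothing is registered under the name.
    static boolean unregisterFailsWhenMissing(MBeanServer server, ObjectName name) throws Exception {
        try {
            server.unregisterMBean(name);
            return false;
        } catch (InstanceNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // A private MBean server, so the sketch does not touch the platform one.
        MBeanServer server = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName(
                "org.apache.cassandra.metrics:type=Connection,scope=1.2.3.4,name=Timeouts");

        System.out.println(unregisterFailsWhenMissing(server, name)); // true: nothing registered yet
        server.registerMBean(new Timeouts(), name);
        System.out.println(unregisterFailsWhenMissing(server, name)); // false: it was registered
    }
}
```

This does not explain why the reporter tries to unregister first; it only shows that the exception itself is the ordinary JMX response to a missing name.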


Saturday, June 18, 2016

yet another upgrade to cassandra virtual nodes fail

Recently I was assigned a project to upgrade cassandra from 1.1 to 1.2 (I know it is an ancient cassandra, but who cares? We just want it to work, and cassandra delivers just that), and one of the main features of cassandra 1.2 is virtual nodes.

Although there is a red warning note in this instruction, I took some time to investigate it anyway, knowing that we had not enabled any bleeding edge features or made home-grown customizations to the cassandra code. If you are picking a cassandra 1.2 release for your upgrade and you want to try the virtual nodes upgrade as well, choose a version one below 1.2.19. Why? Read here: https://github.com/apache/cassandra/blob/cassandra-1.2.19/NEWS.txt#L19-L23

I started a three-node cassandra 1.2.18 cluster in a sandbox environment where I could safely test the cassandra upgrade from 1.1 to 1.2 and, after that, the upgrade to virtual nodes.

 [user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 create
 Token                                      From            To
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~
 73107539768170373009709388315418951678     127.0.0.4       127.0.0.3
 169033493463981801837600797832317151914    127.0.0.2       127.0.0.3
 136467407567251362951457524855448709801    127.0.0.2       127.0.0.3
 133808951575681531205649910734888020649    127.0.0.2       127.0.0.3
 75544457760442718776699701259266250066     127.0.0.4       127.0.0.3


 [user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 enable
 [user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 ls
 Token                                      Endpoint        Requested at
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 159285821494892418769639546056927958356    127.0.0.3       Tue Feb 02 16:12:41 MYT 2016
 91938269708456681209179988336057166505     127.0.0.3       Tue Feb 02 16:12:41 MYT 2016
 74436767763955288882613195375699296254     127.0.0.3       Tue Feb 02 16:12:41 MYT 2016
 103901321670520924065314251878580267688    127.0.0.3       Tue Feb 02 16:12:41 MYT 2016

 [user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h 127.0.0.2 -p 7210 status
 Datacenter: datacenter1
 ===================
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 -- Address    Load      Tokens  Owns (effective)  Host ID                               Rack
 UN 127.0.0.2  20.01 GB  256     100.0%            ba12301a-2e3f-49d2-bb5b-e125e91fcd1b  rack1
 UN 127.0.0.3  20.01 GB  1       100.0%            e09705bf-01b1-423a-863c-c425a7796e51  rack1
 UN 127.0.0.4  20.01 GB  1       100.0%            937f97ce-a55f-4785-8636-809456123a63  rack1
 [user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h 127.0.0.2 -p 7210 ring

 Datacenter: datacenter1
 ==========
 Replicas: 257

 Address    Rack  Status  State   Load      Owns     Token
                                                     169919645461171745752870002539170714965
 127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  0
 127.0.0.3  1e    Up      Normal  20.01 GB  100.00%  56713727820156410577229101238628035242
 127.0.0.4  1e    Up      Normal  20.01 GB  100.00%  113427455640312821154458202477256070485
 127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  113648993639610307133275503653969461247
 127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  113870531638907793112092804830682852010
 127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114092069638205279090910106007396242772
 127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114313607637502765069727407184109633535
 127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114535145636800251048544708360823024297
 127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114756683636097737027362009537536415060
 127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114978221635395223006179310714249805823
 ...
 ...


 shuffling ongoing
 [user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 ls | wc -l
 767
 [user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 ls | wc -l
 764

 [user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h 127.0.0.2 -p 7210 netstats
 Mode: RELOCATING
 Not sending any streams.
 Not receiving any streams.
 Read Repair Statistics:
 Attempted: 0
 Mismatch (Blocking): 0
 Mismatch (Background): 0
 Pool Name   Active  Pending  Completed
 Commands    n/a     0        154
 Responses   n/a     0        3310
 [user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h 127.0.0.2 -p 7210 compactionstats
 pending tasks: 0
 Active compaction remaining time :    n/a
 [user@localhost ~]$

As you can see above, I created the shuffle operations and enabled them. The token count on 127.0.0.2 changed to 256 and the number of pending shuffle entries started to come down (767, then 764). I thought, hey man, this can actually work! I happily announced to the team that it looked like we would be able to migrate to Cassandra vnodes.

However, the next morning, when I checked on the upgrade process, oh gosh, it seemed to have gone into a loop.

 WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:29,594 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)
 WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:30,836 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)
 WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:32,667 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)
 WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:33,339 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)
 WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:34,582 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)


     if (res.size() < 1)
     {
         LOG.debug("No queued ranges to transfer");
         return;
     }

     if (!isReady())
         return;

     UntypedResultSet.Row row = res.iterator().next();

     Date requestedAt = row.getTimestamp("requested_at");
     ByteBuffer tokenBytes = row.getBytes("token_bytes");
     Token token = StorageService.getPartitioner().getTokenFactory().fromByteArray(tokenBytes);

     LOG.info("Initiating transfer of {} (scheduled at {})", token, requestedAt.toString());
     try
     {
         StorageService.instance.relocateTokens(Collections.singleton(token));
     }
     catch (Exception e)
     {
         LOG.error("Error removing {}: {}", token, e);
     }
     finally
     {
         LOG.debug("Removing queued entry for transfer of {}", token);
         processInternal(String.format("DELETE FROM system.%s WHERE token_bytes = '%s'",
                                       SystemTable.RANGE_XFERS_CF,
                                       ByteBufferUtil.bytesToHex(tokenBytes)));
     }
 }

 private boolean isReady()
 {
     int targetTokens = DatabaseDescriptor.getNumTokens();
     int highMark = (int)Math.ceil(targetTokens + (targetTokens * .10));
     int actualTokens = StorageService.instance.getTokens().size();

     if (actualTokens >= highMark)
     {
         LOG.warn("Pausing until token count stabilizes (target={}, actual={})", targetTokens, actualTokens);
         return false;
     }

     return true;
 }
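
To see why those WARN lines keep repeating, here is a small Python sketch that replays the arithmetic of isReady() with the numbers from the log. The function names high_mark and is_ready are made up for illustration; only the formula comes from the Java excerpt above.

```python
import math


def high_mark(target_tokens: int) -> int:
    # Mirrors (int) Math.ceil(targetTokens + (targetTokens * .10)) in the excerpt.
    return math.ceil(target_tokens + target_tokens * 0.10)


def is_ready(target_tokens: int, actual_tokens: int) -> bool:
    # The Java code pauses (returns false) when actualTokens >= highMark.
    return actual_tokens < high_mark(target_tokens)


print(high_mark(256))      # high mark for num_tokens = 256
print(is_ready(256, 282))  # the values reported in the WARN lines
```

With target=256 the high mark works out to 282, so a node reporting actual=282 fails the check and the method returns before any transfer is initiated. As long as the token count stays at 282, every pass logs the same warning, which matches what the log shows.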

The shuffle count stayed at 744, so unfortunately we have to stay without vnodes. If you have successfully upgraded to virtual nodes, please leave a comment below describing the version path you took and the shuffling steps you used to upgrade your C* cluster to vnodes.

I will end this article with the steps I took. If you intend to upgrade to vnodes, I suggest you don't waste time on it; you might as well spin up a new cluster, since more and more changes cannot be done through an upgrade. The ones that come to mind now are the partitioner change (random to murmur3) and vnodes.


  • stop automatic cassandra maintenance.
  • make sure the data is consistent.
  • make sure all SSTables are at version ic.


1. change cassandra.yaml on all servers
num_tokens: 256
initial_token left empty
rolling restart of all servers

2. cassandra-shuffle create

3. cassandra-shuffle enable

4. cassandra-shuffle ls

5. periodic checks.
check the log,
check nodetool netstats,
check the remaining shuffle count:
[user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 ls | wc -l
759
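
The periodic check in step 5 could be scripted. What follows is only a rough Python sketch under assumptions: it assumes the cassandra-shuffle script sits in the current directory and accepts the same -h, -p and ls arguments as the command above. The helper names count_lines, pending_entries and is_stalled are hypothetical.

```python
import subprocess


def count_lines(output: str) -> int:
    """Same idea as `wc -l`: count newline characters."""
    return output.count("\n")


def pending_entries(host: str = "127.0.0.2", port: str = "7210",
                    binary: str = "./cassandra-shuffle") -> int:
    """Run `cassandra-shuffle -h HOST -p PORT ls` and count its output lines.

    Assumption: the script is runnable from the current directory. The
    original command was run with sudo, which is left out here.
    """
    result = subprocess.run([binary, "-h", host, "-p", port, "ls"],
                            capture_output=True, text=True, check=True)
    return count_lines(result.stdout)


def is_stalled(samples: list, window: int = 3) -> bool:
    """True when the last `window` samples all have the same value."""
    if len(samples) < window:
        return False
    return len(set(samples[-window:])) == 1
```

The idea is to append one pending_entries() sample to a list every few minutes and look at is_stalled(samples). A count that keeps dropping (767, 764, 759) means the shuffle is moving; a count that sits at one value, like the 744 mentioned above, means it is time to read the log.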

Friday, June 17, 2016

trying out aggregations on mongodb

Today, we will look into MongoDB again, this time on the specific topic of aggregation.

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

Let's start with some sample aggregations on the zip code data set. Importing 29353 objects took less than a second. Blazing fast, maybe because I'm using an SSD, heh.

 user@localhost:~$ mongoimport --collection zipcodes < ~/Desktop/zips.json   
 connected to: 127.0.0.1  
 Tue Mar 8 20:21:37.950 check 9 29353  
 Tue Mar 8 20:21:38.099 imported 29353 objects  
   
 > db.zipcodes.aggregate( [ { $group: { _id: "$state", totalPop: { $sum: "$pop" } } }, { $match: { totalPop: { $gte: 10*1000*1000 } } } ] )  
 {  
    "result" : [  
       {  
          "_id" : "IL",  
          "totalPop" : 11427576  
       },  
       {  
          "_id" : "OH",  
          "totalPop" : 10846517  
       },  
       {  
          "_id" : "FL",  
          "totalPop" : 12686644  
       },  
       {  
          "_id" : "NY",  
          "totalPop" : 17990402  
       },  
       {  
          "_id" : "PA",  
          "totalPop" : 11881643  
       },  
       {  
          "_id" : "TX",  
          "totalPop" : 16984601  
       },  
       {  
          "_id" : "CA",  
          "totalPop" : 29754890  
       }  
    ],  
    "ok" : 1  
 }  
 > db.zipcodes.aggregate( [ { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } }, { $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } } ] )  
 {  
    "result" : [  
       {  
          "_id" : "NH",  
          "avgCityPop" : 5232.320754716981  
       },  
       {  
          "_id" : "MA",  
          "avgCityPop" : 14855.37037037037  
       },  
       {  
          "_id" : "ME",  
          "avgCityPop" : 3006.4901960784314  
       },  
       {  
          "_id" : "NY",  
          "avgCityPop" : 13131.680291970803  
       },  
       {  
          "_id" : "VT",  
          "avgCityPop" : 2315.8765432098767  
       },  
       {  
          "_id" : "PA",  
          "avgCityPop" : 8679.067202337472  
       },  
       {  
          "_id" : "DE",  
          "avgCityPop" : 14481.91304347826  
       },  
       {  
          "_id" : "DC",  
          "avgCityPop" : 303450  
       },  
       {  
          "_id" : "VA",  
          "avgCityPop" : 8526.177931034483  
       },  
       {  
          "_id" : "SC",  
          "avgCityPop" : 11139.626198083068  
       },  
       {  
          "_id" : "FL",  
          "avgCityPop" : 27400.958963282937  
       },  
       {  
          "_id" : "AL",  
          "avgCityPop" : 7907.2152641878665  
       },  
       {  
          "_id" : "NJ",  
          "avgCityPop" : 15775.89387755102  
       },  
       {  
          "_id" : "WV",  
          "avgCityPop" : 2771.4775888717154  
       },  
       {  
          "_id" : "TN",  
          "avgCityPop" : 9656.350495049504  
       },  
       {  
          "_id" : "OH",  
          "avgCityPop" : 12700.839578454332  
       },  
       {  
          "_id" : "MD",  
          "avgCityPop" : 12615.775725593667  
       },  
       {  
          "_id" : "MN",  
          "avgCityPop" : 5372.21375921376  
       },  
       {  
          "_id" : "ND",  
          "avgCityPop" : 1645.0309278350514  
       },  
       {  
          "_id" : "NC",  
          "avgCityPop" : 10622.815705128205  
       },  
       {  
          "_id" : "MT",  
          "avgCityPop" : 2593.987012987013  
       },  
       {  
          "_id" : "IL",  
          "avgCityPop" : 9954.334494773519  
       },  
       {  
          "_id" : "MO",  
          "avgCityPop" : 5672.195338512764  
       },  
       {  
          "_id" : "KS",  
          "avgCityPop" : 3819.884259259259  
       },  
       {  
          "_id" : "LA",  
          "avgCityPop" : 10465.496277915632  
       },  
       {  
          "_id" : "AR",  
          "avgCityPop" : 4175.355239786856  
       },  
       {  
          "_id" : "CO",  
          "avgCityPop" : 9981.075757575758  
       },  
       {  
          "_id" : "IN",  
          "avgCityPop" : 9271.130434782608  
       },  
       {  
          "_id" : "KY",  
          "avgCityPop" : 4767.164721141375  
       },  
       {  
          "_id" : "OK",  
          "avgCityPop" : 6155.743639921722  
       },  
       {  
          "_id" : "ID",  
          "avgCityPop" : 4320.811158798283  
       },  
       {  
          "_id" : "WY",  
          "avgCityPop" : 3384.5373134328356  
       },  
       {  
          "_id" : "UT",  
          "avgCityPop" : 9518.508287292818  
       },  
       {  
          "_id" : "NV",  
          "avgCityPop" : 18209.590909090908  
       },  
       {  
          "_id" : "NE",  
          "avgCityPop" : 3034.882692307692  
       },  
       {  
          "_id" : "RI",  
          "avgCityPop" : 19292.653846153848  
       },  
       {  
          "_id" : "NM",  
          "avgCityPop" : 5872.360465116279  
       },  
       {  
          "_id" : "CA",  
          "avgCityPop" : 27756.42723880597  
       },  
       {  
          "_id" : "AZ",  
          "avgCityPop" : 20591.16853932584  
       },  
       {  
          "_id" : "HI",  
          "avgCityPop" : 15831.842857142858  
       },  
       {  
          "_id" : "IA",  
          "avgCityPop" : 3123.0821147356583  
       },  
       {  
          "_id" : "MS",  
          "avgCityPop" : 7524.023391812865  
       },  
       {  
          "_id" : "WI",  
          "avgCityPop" : 7323.00748502994  
       },  
       {  
          "_id" : "TX",  
          "avgCityPop" : 13775.02108678021  
       },  
       {  
          "_id" : "SD",  
          "avgCityPop" : 1839.6746031746031  
       },  
       {  
          "_id" : "MI",  
          "avgCityPop" : 12087.512353706112  
       },  
       {  
          "_id" : "GA",  
          "avgCityPop" : 11547.62210338681  
       },  
       {  
          "_id" : "OR",  
          "avgCityPop" : 8262.561046511628  
       },  
       {  
          "_id" : "CT",  
          "avgCityPop" : 14674.625  
       },  
       {  
          "_id" : "WA",  
          "avgCityPop" : 12258.670025188916  
       },  
       {  
          "_id" : "AK",  
          "avgCityPop" : 2976.4918032786886  
       }  
    ],  
    "ok" : 1  
 }  
 > db.zipcodes.aggregate( [ { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } }, { $sort: { pop: 1 } }, { $group: { _id : "$_id.state", biggestCity: { $last: "$_id.city" }, biggestPop:  { $last: "$pop" }, smallestCity: { $first: "$_id.city" }, smallestPop: { $first: "$pop" } } }, { $project: { _id: 0, state: "$_id", biggestCity: { name: "$biggestCity", pop: "$biggestPop" }, smallestCity: { name: "$smallestCity", pop: "$smallestPop" } } } ] )  
 {  
    "result" : [  
       {  
          "biggestCity" : {  
             "name" : "NEWARK",  
             "pop" : 111674  
          },  
          "smallestCity" : {  
             "name" : "BETHEL",  
             "pop" : 108  
          },  
          "state" : "DE"  
       },  
       {  
          "biggestCity" : {  
             "name" : "SAINT LOUIS",  
             "pop" : 397802  
          },  
          "smallestCity" : {  
             "name" : "BENDAVIS",  
             "pop" : 44  
          },  
          "state" : "MO"  
       },  
       {  
          "biggestCity" : {  
             "name" : "CHICAGO",  
             "pop" : 2452177  
          },  
          "smallestCity" : {  
             "name" : "ANCONA",  
             "pop" : 38  
          },  
          "state" : "IL"  
       },  
       {  
          "biggestCity" : {  
             "name" : "CLEVELAND",  
             "pop" : 536759  
          },  
          "smallestCity" : {  
             "name" : "ISLE SAINT GEORG",  
             "pop" : 38  
          },  
          "state" : "OH"  
       },  
       {  
          "biggestCity" : {  
             "name" : "MANCHESTER",  
             "pop" : 106452  
          },  
          "smallestCity" : {  
             "name" : "WEST NOTTINGHAM",  
             "pop" : 27  
          },  
          "state" : "NH"  
       },  
       {  
          "biggestCity" : {  
             "name" : "WASHINGTON",  
             "pop" : 606879  
          },  
          "smallestCity" : {  
             "name" : "PENTAGON",  
             "pop" : 21  
          },  
          "state" : "DC"  
       },  
       {  
          "biggestCity" : {  
             "name" : "GRAND FORKS",  
             "pop" : 59527  
          },  
          "smallestCity" : {  
             "name" : "TROTTERS",  
             "pop" : 12  
          },  
          "state" : "ND"  
       },  
       {  
          "biggestCity" : {  
             "name" : "BALTIMORE",  
             "pop" : 733081  
          },  
          "smallestCity" : {  
             "name" : "ANNAPOLIS JUNCTI",  
             "pop" : 32  
          },  
          "state" : "MD"  
       },  
       {  
          "biggestCity" : {  
             "name" : "MINNEAPOLIS",  
             "pop" : 344719  
          },  
          "smallestCity" : {  
             "name" : "JOHNSON",  
             "pop" : 12  
          },  
          "state" : "MN"  
       },  
       {  
          "biggestCity" : {  
             "name" : "SALT LAKE CITY",  
             "pop" : 186346  
          },  
          "smallestCity" : {  
             "name" : "MODENA",  
             "pop" : 9  
          },  
          "state" : "UT"  
       },  
       {  
          "biggestCity" : {  
             "name" : "CHEYENNE",  
             "pop" : 70185  
          },  
          "smallestCity" : {  
             "name" : "LOST SPRINGS",  
             "pop" : 6  
          },  
          "state" : "WY"  
       },  
       {  
          "biggestCity" : {  
             "name" : "PHOENIX",  
             "pop" : 890853  
          },  
          "smallestCity" : {  
             "name" : "HUALAPAI",  
             "pop" : 2  
          },  
          "state" : "AZ"  
       },  
       {  
          "biggestCity" : {  
             "name" : "BRIDGEPORT",  
             "pop" : 141638  
          },  
          "smallestCity" : {  
             "name" : "EAST KILLINGLY",  
             "pop" : 25  
          },  
          "state" : "CT"  
       },  
       {  
          "biggestCity" : {  
             "name" : "SEATTLE",  
             "pop" : 520096  
          },  
          "smallestCity" : {  
             "name" : "BENGE",  
             "pop" : 2  
          },  
          "state" : "WA"  
       },  
       {  
          "biggestCity" : {  
             "name" : "BIRMINGHAM",  
             "pop" : 242606  
          },  
          "smallestCity" : {  
             "name" : "ALLEN",  
             "pop" : 0  
          },  
          "state" : "AL"  
       },  
       {  
          "biggestCity" : {  
             "name" : "LAS VEGAS",  
             "pop" : 597557  
          },  
          "smallestCity" : {  
             "name" : "TUSCARORA",  
             "pop" : 1  
          },  
          "state" : "NV"  
       },  
       {  
          "biggestCity" : {  
             "name" : "OMAHA",  
             "pop" : 358930  
          },  
          "smallestCity" : {  
             "name" : "LAKESIDE",  
             "pop" : 5  
          },  
          "state" : "NE"  
       },  
       {  
          "biggestCity" : {  
             "name" : "MIAMI",  
             "pop" : 825232  
          },  
          "smallestCity" : {  
             "name" : "CECIL FIELD NAS",  
             "pop" : 0  
          },  
          "state" : "FL"  
       },  
       {  
          "biggestCity" : {  
             "name" : "SIOUX FALLS",  
             "pop" : 102046  
          },  
          "smallestCity" : {  
             "name" : "ZEONA",  
             "pop" : 8  
          },  
          "state" : "SD"  
       },  
       {  
          "biggestCity" : {  
             "name" : "HOUSTON",  
             "pop" : 2095918  
          },  
          "smallestCity" : {  
             "name" : "FULTON",  
             "pop" : 0  
          },  
          "state" : "TX"  
       },  
       {  
          "biggestCity" : {  
             "name" : "MILWAUKEE",  
             "pop" : 597324  
          },  
          "smallestCity" : {  
             "name" : "CLAM LAKE",  
             "pop" : 2  
          },  
          "state" : "WI"  
       },  
       {  
          "biggestCity" : {  
             "name" : "JACKSON",  
             "pop" : 204788  
          },  
          "smallestCity" : {  
             "name" : "CHUNKY",  
             "pop" : 79  
          },  
          "state" : "MS"  
       },  
       {  
          "biggestCity" : {  
             "name" : "DES MOINES",  
             "pop" : 148155  
          },  
          "smallestCity" : {  
             "name" : "DOUDS",  
             "pop" : 15  
          },  
          "state" : "IA"  
       },  
       {  
          "biggestCity" : {  
             "name" : "HONOLULU",  
             "pop" : 396643  
          },  
          "smallestCity" : {  
             "name" : "NINOLE",  
             "pop" : 0  
          },  
          "state" : "HI"  
       },  
       {  
          "biggestCity" : {  
             "name" : "CHARLOTTE",  
             "pop" : 465833  
          },  
          "smallestCity" : {  
             "name" : "GLOUCESTER",  
             "pop" : 0  
          },  
          "state" : "NC"  
       },  
       {  
          "biggestCity" : {  
             "name" : "BILLINGS",  
             "pop" : 78805  
          },  
          "smallestCity" : {  
             "name" : "MOSBY",  
             "pop" : 7  
          },  
          "state" : "MT"  
       },  
       {  
          "biggestCity" : {  
             "name" : "TULSA",  
             "pop" : 389072  
          },  
          "smallestCity" : {  
             "name" : "SOUTHARD",  
             "pop" : 8  
          },  
          "state" : "OK"  
       },  
       {  
          "biggestCity" : {  
             "name" : "BOISE",  
             "pop" : 165522  
          },  
          "smallestCity" : {  
             "name" : "KEUTERVILLE",  
             "pop" : 0  
          },  
          "state" : "ID"  
       },  
       {  
          "biggestCity" : {  
             "name" : "INDIANAPOLIS",  
             "pop" : 348868  
          },  
          "smallestCity" : {  
             "name" : "WESTPOINT",  
             "pop" : 145  
          },  
          "state" : "IN"  
       },  
       {  
          "biggestCity" : {  
             "name" : "LOUISVILLE",  
             "pop" : 288058  
          },  
          "smallestCity" : {  
             "name" : "BROWDER",  
             "pop" : 0  
          },  
          "state" : "KY"  
       },  
       {  
          "biggestCity" : {  
             "name" : "BURLINGTON",  
             "pop" : 39127  
          },  
          "smallestCity" : {  
             "name" : "UNIV OF VERMONT",  
             "pop" : 0  
          },  
          "state" : "VT"  
       },  
       {  
          "biggestCity" : {  
             "name" : "PHILADELPHIA",  
             "pop" : 1610956  
          },  
          "smallestCity" : {  
             "name" : "HAMILTON",  
             "pop" : 0  
          },  
          "state" : "PA"  
       },  
       {  
          "biggestCity" : {  
             "name" : "NEW ORLEANS",  
             "pop" : 496937  
          },  
          "smallestCity" : {  
             "name" : "FORDOCHE",  
             "pop" : 0  
          },  
          "state" : "LA"  
       },  
       {  
          "biggestCity" : {  
             "name" : "COLUMBIA",  
             "pop" : 269521  
          },  
          "smallestCity" : {  
             "name" : "QUINBY",  
             "pop" : 0  
          },  
          "state" : "SC"  
       },  
       {  
          "biggestCity" : {  
             "name" : "WICHITA",  
             "pop" : 295115  
          },  
          "smallestCity" : {  
             "name" : "ARNOLD",  
             "pop" : 0  
          },  
          "state" : "KS"  
       },  
       {  
          "biggestCity" : {  
             "name" : "NEWARK",  
             "pop" : 275572  
          },  
          "smallestCity" : {  
             "name" : "IMLAYSTOWN",  
             "pop" : 17  
          },  
          "state" : "NJ"  
       },  
       {  
          "biggestCity" : {  
             "name" : "HUNTINGTON",  
             "pop" : 75343  
          },  
          "smallestCity" : {  
             "name" : "MOUNT CARBON",  
             "pop" : 0  
          },  
          "state" : "WV"  
       },  
       {  
          "biggestCity" : {  
             "name" : "MEMPHIS",  
             "pop" : 632837  
          },  
          "smallestCity" : {  
             "name" : "ALLRED",  
             "pop" : 2  
          },  
          "state" : "TN"  
       },  
       {  
          "biggestCity" : {  
             "name" : "VIRGINIA BEACH",  
             "pop" : 385080  
          },  
          "smallestCity" : {  
             "name" : "WALLOPS ISLAND",  
             "pop" : 0  
          },  
          "state" : "VA"  
       },  
       {  
          "biggestCity" : {  
             "name" : "ANCHORAGE",  
             "pop" : 183987  
          },  
          "smallestCity" : {  
             "name" : "CHEVAK",  
             "pop" : 0  
          },  
          "state" : "AK"  
       },  
       {  
          "biggestCity" : {  
             "name" : "BROOKLYN",  
             "pop" : 2300504  
          },  
          "smallestCity" : {  
             "name" : "RAQUETTE LAKE",  
             "pop" : 0  
          },  
          "state" : "NY"  
       },  
       {  
          "biggestCity" : {  
             "name" : "DENVER",  
             "pop" : 451182  
          },  
          "smallestCity" : {  
             "name" : "CHEYENNE MTN AFB",  
             "pop" : 0  
          },  
          "state" : "CO"  
       },  
       {  
          "biggestCity" : {  
             "name" : "DETROIT",  
             "pop" : 963243  
          },  
          "smallestCity" : {  
             "name" : "LELAND",  
             "pop" : 0  
          },  
          "state" : "MI"  
       },  
       {  
          "biggestCity" : {  
             "name" : "PORTLAND",  
             "pop" : 518543  
          },  
          "smallestCity" : {  
             "name" : "KENT",  
             "pop" : 0  
          },  
          "state" : "OR"  
       },  
       {  
          "biggestCity" : {  
             "name" : "ATLANTA",  
             "pop" : 609591  
          },  
          "smallestCity" : {  
             "name" : "FORT STEWART",  
             "pop" : 0  
          },  
          "state" : "GA"  
       },  
       {  
          "biggestCity" : {  
             "name" : "CRANSTON",  
             "pop" : 176404  
          },  
          "smallestCity" : {  
             "name" : "CLAYVILLE",  
             "pop" : 45  
          },  
          "state" : "RI"  
       },  
       {  
          "biggestCity" : {  
             "name" : "ALBUQUERQUE",  
             "pop" : 449584  
          },  
          "smallestCity" : {  
             "name" : "ALGODONES",  
             "pop" : 0  
          },  
          "state" : "NM"  
       },  
       {  
          "biggestCity" : {  
             "name" : "LOS ANGELES",  
             "pop" : 2102295  
          },  
          "smallestCity" : {  
             "name" : "TWIN BRIDGES",  
             "pop" : 0  
          },  
          "state" : "CA"  
       },  
       {  
          "biggestCity" : {  
             "name" : "LITTLE ROCK",  
             "pop" : 192895  
          },  
          "smallestCity" : {  
             "name" : "TOMATO",  
             "pop" : 0  
          },  
          "state" : "AR"  
       },  
       {  
          "biggestCity" : {  
             "name" : "WORCESTER",  
             "pop" : 169856  
          },  
          "smallestCity" : {  
             "name" : "BUCKLAND",  
             "pop" : 16  
          },  
          "state" : "MA"  
       },  
       {  
          "biggestCity" : {  
             "name" : "PORTLAND",  
             "pop" : 63268  
          },  
          "smallestCity" : {  
             "name" : "BUSTINS ISLAND",  
             "pop" : 0  
          },  
          "state" : "ME"  
       }  
    ],  
    "ok" : 1  
 }  
 >   
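
To make the pipeline stages a little less magical, here is a plain Python sketch of what the three queries compute. It is only an emulation for illustration, not how MongoDB executes an aggregation. The five documents in ZIPS are invented (states AA and BB are not in the real data set) and all helper names are hypothetical.

```python
from collections import defaultdict

# Invented sample documents in the shape of the zipcodes collection (not real data).
ZIPS = [
    {"_id": "00001", "city": "ALPHA", "state": "AA", "pop": 6_000_000},
    {"_id": "00002", "city": "ALPHA", "state": "AA", "pop": 3_000_000},
    {"_id": "00003", "city": "BETA", "state": "AA", "pop": 2_000_000},
    {"_id": "00004", "city": "GAMMA", "state": "BB", "pop": 500},
    {"_id": "00005", "city": "DELTA", "state": "BB", "pop": 1_500},
]


def group_sum(docs, key_fn, field):
    """Like a $group stage with a $sum accumulator: one total per distinct key."""
    totals = defaultdict(int)
    for doc in docs:
        totals[key_fn(doc)] += doc[field]
    return dict(totals)


def states_over(docs, threshold):
    """First query: $group by state with $sum of pop, then $match totalPop >= threshold."""
    totals = group_sum(docs, lambda d: d["state"], "pop")
    return {state: total for state, total in totals.items() if total >= threshold}


def avg_city_pop(docs):
    """Second query: $group by (state, city) with $sum, then $group by state with $avg."""
    city_totals = group_sum(docs, lambda d: (d["state"], d["city"]), "pop")
    per_state = defaultdict(list)
    for (state, _city), pop in city_totals.items():
        per_state[state].append(pop)
    return {state: sum(pops) / len(pops) for state, pops in per_state.items()}


def biggest_and_smallest(docs):
    """Third query: sort city totals ascending, then keep $first and $last per state."""
    city_totals = group_sum(docs, lambda d: (d["state"], d["city"]), "pop")
    ordered = sorted(city_totals.items(), key=lambda item: item[1])  # like $sort: {pop: 1}
    result = {}
    for (state, city), pop in ordered:
        # The first city seen for a state is the smallest ($first).
        entry = result.setdefault(state, {"smallestCity": {"name": city, "pop": pop}})
        # Overwritten on every pass, so it ends up as the largest ($last).
        entry["biggestCity"] = {"name": city, "pop": pop}
    return result


print(states_over(ZIPS, 10 * 1000 * 1000))
print(avg_city_pop(ZIPS))
print(biggest_and_smallest(ZIPS))
```

Each function maps onto one query from the shell session: the output of one stage is simply the input of the next, which is all that "pipeline" means here.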
   

All the queries in the examples work. It's amazing that all three queries return results within a second! This is only a short article to convince you to use aggregation on MongoDB, and if you have been convinced, you should really read the following useful links too.


https://docs.mongodb.org/manual/core/map-reduce/

https://docs.mongodb.org/manual/reference/aggregation/

Good luck!