Sunday, July 30, 2017

First try-out of DateTieredCompactionStrategy

DateTieredCompactionStrategy (DTCS) was introduced in Cassandra 2.0 and is meant for time series data, such as temperature readings or device metrics recorded over time. I tested it with Cassandra 3.0.11, and it held up well in the simple tests below, with no exceptions or errors.
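The "date tiered" part of DateTieredCompactionStrategy (DTCS) refers to grouping SSTables into time windows that get wider the further back in time they are. The Python sketch below is a simplified model of that window layout, written as an assumption about how DTCS behaves rather than a copy of its code: the newest window is base_time_seconds wide, and whenever the current window index is a multiple of min_threshold, the next older window is min_threshold times wider. The function name dtcs_windows is hypothetical, and the sketch ignores max_sstable_age_days and how SSTables are actually picked for compaction.

```python
"""Simplified model of DTCS-style tiered time windows (not Cassandra's code)."""


def dtcs_windows(now, base, min_threshold, count):
    """Return `count` (start, end) windows, newest first, walking back from `now`.

    All arguments use the same time unit. The newest window is `base` wide;
    whenever the current window index is a multiple of `min_threshold`, the
    next older window is `min_threshold` times wider.
    """
    size = base
    index = now // base  # index of the window that contains `now`
    windows = []
    for _ in range(count):
        windows.append((index * size, (index + 1) * size))
        if index % min_threshold > 0:
            # Same tier: step back one window of the current size.
            index -= 1
        else:
            # Tier boundary: the next older window is min_threshold times wider.
            size *= min_threshold
            index = index // min_threshold - 1
    return windows


if __name__ == "__main__":
    # base_time_seconds=3600 and min_threshold=4, as in the table definition below;
    # `now` is an arbitrary example value in seconds.
    for start, end in dtcs_windows(now=100_000, base=3600, min_threshold=4, count=7):
        print(f"[{start:>6}, {end:>6})  {(end - start) // 3600} h")
```

With base_time_seconds 3600 and min_threshold 4, this model yields up to four 1-hour windows, then 4-hour windows, then 16-hour windows, each one ending exactly where the next newer one starts.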

 cqlsh:jw_schema1> desc table temperature;  
   
 CREATE TABLE jw_schema1.temperature (  
   weatherstation_id text,  
   event_time timestamp,  
   temperature text,  
   PRIMARY KEY (weatherstation_id, event_time)  
 ) WITH CLUSTERING ORDER BY (event_time ASC)  
   AND bloom_filter_fp_chance = 0.01  
   AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}  
   AND comment = ''  
   AND compaction = {'base_time_seconds': '3600', 'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 'max_sstable_age_days': '365', 'max_threshold': '32', 'min_threshold': '4', 'timestamp_resolution': 'SECONDS'}  
   AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}  
   AND crc_check_chance = 1.0  
   AND dclocal_read_repair_chance = 0.1  
   AND default_time_to_live = 0  
   AND gc_grace_seconds = 864000  
   AND max_index_interval = 2048  
   AND memtable_flush_period_in_ms = 0  
   AND min_index_interval = 128  
   AND read_repair_chance = 0.0  
   AND speculative_retry = '99PERCENTILE';  

Above is the table definition. Then I added some sample data (the 2017-03-06 row was already in the table):

 cqlsh:jw_schema1> insert into temperature (weatherstation_id, event_time, temperature) values ('1', '2017-03-07 20:38:20', '38');  
 cqlsh:jw_schema1> select * from temperature;  
   
  weatherstation_id | event_time               | temperature
 -------------------+--------------------------+-------------
                  1 | 2017-03-06 16:00:00+0000 |          37
                  1 | 2017-03-07 12:38:20+0000 |          38
   
 (2 rows)  
 cqlsh:jw_schema1> insert into temperature (weatherstation_id, event_time, temperature) values ('1', '2017-03-07 20:39:45', '36');  
 cqlsh:jw_schema1> select * from temperature;  
   
  weatherstation_id | event_time               | temperature
 -------------------+--------------------------+-------------
                  1 | 2017-03-06 16:00:00+0000 |          37
                  1 | 2017-03-07 12:38:20+0000 |          38
                  1 | 2017-03-07 12:39:45+0000 |          36
   
 (3 rows)  
   
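The event_time values come back eight hours earlier than the literals in the INSERT statements. This is consistent with cqlsh printing timestamps in UTC while a timestamp literal without an explicit offset is interpreted in the time zone of the coordinator node, which here would be UTC+8. That offset is inferred from the output, not shown in the session. A small Python check of the conversion, where NODE_TZ and as_cqlsh_utc are hypothetical names:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the node's local time zone is UTC+8, inferred from the cqlsh output.
NODE_TZ = timezone(timedelta(hours=8))


def as_cqlsh_utc(literal: str) -> str:
    """Read a 'YYYY-MM-DD HH:MM:SS' literal in NODE_TZ and format it in UTC like cqlsh."""
    local = datetime.strptime(literal, "%Y-%m-%d %H:%M:%S").replace(tzinfo=NODE_TZ)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S%z")


if __name__ == "__main__":
    for literal in ("2017-03-07 20:38:20", "2017-03-07 20:39:45"):
        print(literal, "->", as_cqlsh_utc(literal))
```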

Then I went a little further and altered the compaction parameters:

   
 cqlsh:jw_schema1> ALTER TABLE jw_schema1.temperature WITH compaction = { 'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 'timestamp_resolution': 'MICROSECONDS', 'base_time_seconds': '10', 'max_sstable_age_days': '1' };  
 cqlsh:jw_schema1> desc table jw_schema1.temperature;  
   
 CREATE TABLE jw_schema1.temperature (  
   weatherstation_id text,  
   event_time timestamp,  
   temperature text,  
   PRIMARY KEY (weatherstation_id, event_time)  
 ) WITH CLUSTERING ORDER BY (event_time ASC)  
   AND bloom_filter_fp_chance = 0.01  
   AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}  
   AND comment = ''  
   AND compaction = {'base_time_seconds': '10', 'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 'max_sstable_age_days': '1', 'max_threshold': '32', 'min_threshold': '4', 'timestamp_resolution': 'MICROSECONDS'}  
   AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}  
   AND crc_check_chance = 1.0  
   AND dclocal_read_repair_chance = 0.1  
   AND default_time_to_live = 0  
   AND gc_grace_seconds = 864000  
   AND max_index_interval = 2048  
   AND memtable_flush_period_in_ms = 0  
   AND min_index_interval = 128  
   AND read_repair_chance = 0.0  
   AND speculative_retry = '99PERCENTILE';  
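timestamp_resolution tells DateTieredCompactionStrategy which unit the write timestamps are in, and durations such as base_time_seconds and max_sstable_age_days are compared against timestamps in that unit. Cassandra write timestamps are in microseconds by default, so MICROSECONDS is the matching setting, while the SECONDS value in the first definition does not match them. The Python sketch below shows the unit conversion and a simplified "too old to compact" check for the altered settings, assuming DTCS leaves alone SSTables whose newest write is older than max_sstable_age_days; to_resolution and too_old_to_compact are hypothetical helpers.

```python
from datetime import timedelta

# How many microseconds one unit of each timestamp_resolution value represents.
MICROS_PER_UNIT = {"MICROSECONDS": 1, "MILLISECONDS": 1_000, "SECONDS": 1_000_000}


def to_resolution(seconds: float, resolution: str) -> int:
    """Convert a duration in seconds into the table's timestamp_resolution unit."""
    return round(seconds * 1_000_000) // MICROS_PER_UNIT[resolution]


def too_old_to_compact(max_ts: int, now_ts: int, max_sstable_age_days: float,
                       resolution: str) -> bool:
    """Simplified age check: the SSTable's newest write is older than the cutoff."""
    max_age = to_resolution(timedelta(days=max_sstable_age_days).total_seconds(), resolution)
    return max_ts < now_ts - max_age


if __name__ == "__main__":
    # Values after the ALTER TABLE: base_time_seconds=10, max_sstable_age_days=1.
    print("base window:", to_resolution(10, "MICROSECONDS"), "microseconds")
    now = 1_488_890_000_000_000  # arbitrary "now" in microseconds since the epoch
    two_days_ago = now - to_resolution(2 * 86_400, "MICROSECONDS")
    print("2-day-old SSTable skipped:", too_old_to_compact(two_days_ago, now, 1, "MICROSECONDS"))
```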
   

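Then I ran nodetool flush and nodetool compact on the table. Both worked without any exception or error: the flush wrote a new SSTable (generation 10) next to the existing generation 9, and the compaction merged the two into a single SSTable, generation 11.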

 user@localhost:/var/lib/cassandra/data/jw_schema1$ find temperature-0049c010ff6211e6b4aa1d269322be24/  
 temperature-0049c010ff6211e6b4aa1d269322be24/  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Statistics.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/backups  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-CompressionInfo.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-TOC.txt  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Digest.crc32  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Index.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Filter.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Data.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Summary.db  
 user@localhost:/var/lib/cassandra/data/jw_schema1$ nodetool -h localhost flush jw_schema1 temperature  
 user@localhost:/var/lib/cassandra/data/jw_schema1$ find temperature-0049c010ff6211e6b4aa1d269322be24/  
 temperature-0049c010ff6211e6b4aa1d269322be24/  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-10-big-CompressionInfo.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-10-big-Summary.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-10-big-Data.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-10-big-Filter.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Statistics.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/backups  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-10-big-Statistics.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-10-big-Index.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-CompressionInfo.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-10-big-Digest.crc32  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-TOC.txt  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Digest.crc32  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Index.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Filter.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Data.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-10-big-TOC.txt  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Summary.db  
 user@localhost:/var/lib/cassandra/data/jw_schema1$ nodetool -h localhost compact jw_schema1 temperature  
 user@localhost:/var/lib/cassandra/data/jw_schema1$ find temperature-0049c010ff6211e6b4aa1d269322be24/  
 temperature-0049c010ff6211e6b4aa1d269322be24/  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-11-big-Data.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-11-big-Statistics.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-11-big-Digest.crc32  
 temperature-0049c010ff6211e6b4aa1d269322be24/backups  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-11-big-TOC.txt  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-11-big-Summary.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-11-big-Filter.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-11-big-Index.db  
 temperature-0049c010ff6211e6b4aa1d269322be24/mc-11-big-CompressionInfo.db  
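SSTable file names follow the pattern <version>-<generation>-<format>-<component>, so mc-10-big-Data.db is the Data.db component of generation 10 (mc is the SSTable format version, big the format type). The hypothetical Python helper below groups a find listing by generation, which makes the listings above easier to compare: generation 9 alone, then 9 and 10 after the flush, then only 11 after the compaction.

```python
import re
from collections import defaultdict
from typing import Dict, List

# SSTable component files look like <version>-<generation>-<format>-<component>,
# e.g. "mc-11-big-Data.db".
SSTABLE_FILE = re.compile(
    r"(?P<version>[a-z]+)-(?P<generation>\d+)-(?P<format>[a-z]+)-(?P<component>.+)$"
)


def generations(listing: str) -> Dict[int, List[str]]:
    """Group the SSTable component files in a `find` listing by generation."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        name = line.strip().rsplit("/", 1)[-1]  # keep only the file name
        match = SSTABLE_FILE.match(name)
        if match:  # the table directory and backups/ do not match
            groups[int(match["generation"])].append(match["component"])
    return dict(groups)


if __name__ == "__main__":
    after_flush = """temperature-0049c010ff6211e6b4aa1d269322be24/
temperature-0049c010ff6211e6b4aa1d269322be24/mc-10-big-Data.db
temperature-0049c010ff6211e6b4aa1d269322be24/mc-9-big-Data.db
temperature-0049c010ff6211e6b4aa1d269322be24/backups"""
    print(sorted(generations(after_flush)))
```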
   

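Then I inserted a row with a TTL of 5 seconds, flushed and compacted again, and again there was no exception or error. The TTL row (12:54:59) shows up in the first select below and is gone from the following ones; the 12:52:59 row comes from an insert that is not shown here.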

 cqlsh:jw_schema1> insert into temperature (weatherstation_id, event_time, temperature) values ('1', '2017-03-07 20:54:59', '37') using ttl 5;  
 cqlsh:jw_schema1> select * from temperature;  
   
  weatherstation_id | event_time               | temperature
 -------------------+--------------------------+-------------
                  1 | 2017-03-06 16:00:00+0000 |          37
                  1 | 2017-03-07 12:38:20+0000 |          38
                  1 | 2017-03-07 12:39:45+0000 |          36
                  1 | 2017-03-07 12:52:59+0000 |          37
                  1 | 2017-03-07 12:54:59+0000 |          37
   
 (5 rows)  
 cqlsh:jw_schema1>   
 cqlsh:jw_schema1> select * from temperature;  
   
  weatherstation_id | event_time               | temperature
 -------------------+--------------------------+-------------
                  1 | 2017-03-06 16:00:00+0000 |          37
                  1 | 2017-03-07 12:38:20+0000 |          38
                  1 | 2017-03-07 12:39:45+0000 |          36
                  1 | 2017-03-07 12:52:59+0000 |          37
   
 (4 rows)  
 cqlsh:jw_schema1> select * from temperature;  
   
  weatherstation_id | event_time               | temperature
 -------------------+--------------------------+-------------
                  1 | 2017-03-06 16:00:00+0000 |          37
                  1 | 2017-03-07 12:38:20+0000 |          38
                  1 | 2017-03-07 12:39:45+0000 |          36
                  1 | 2017-03-07 12:52:59+0000 |          37
   
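The selects match how a TTL is expected to behave: the row written with USING TTL 5 is returned until 5 seconds after the write and not afterwards. Once expired, the cell is no longer returned but stays on disk like a tombstone, and a compaction can only purge it after gc_grace_seconds (864000 seconds, i.e. 10 days, for this table) have also passed. A minimal Python model of that timeline; ttl_state is a hypothetical name, times are whole seconds, and real purging has further conditions this model ignores.

```python
def ttl_state(write_time: int, ttl: int, gc_grace_seconds: int, now: int) -> str:
    """Simplified lifecycle of a cell written with USING TTL (all values in seconds)."""
    expires_at = write_time + ttl
    if now < expires_at:
        return "live"  # still returned by SELECT
    if now <= expires_at + gc_grace_seconds:
        return "expired"  # no longer returned, but kept on disk like a tombstone
    return "purgeable"  # a compaction may now drop it for good


if __name__ == "__main__":
    # USING TTL 5 with this table's gc_grace_seconds = 864000 (10 days).
    for now in (2, 6, 864_004, 864_006):
        print(now, ttl_state(write_time=0, ttl=5, gc_grace_seconds=864_000, now=now))
```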

That's it. If you are planning to use DateTieredCompactionStrategy, better don't: Cassandra 3.8 deprecated it in favor of TimeWindowCompactionStrategy.
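For comparison, TimeWindowCompactionStrategy (TWCS) replaces the growing tiers of DateTieredCompactionStrategy with fixed, equally sized windows configured through compaction_window_unit and compaction_window_size, and groups SSTables by the window their newest write timestamp falls into. A rough Python sketch of the window assignment, assuming a window simply starts at the timestamp rounded down to a multiple of the window length; twcs_window is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Tuple

# Window length in seconds for each compaction_window_unit value.
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def twcs_window(ts_seconds: int, unit: str, size: int) -> Tuple[int, int]:
    """Return the (start, end) of the fixed-size window that contains ts_seconds."""
    length = UNIT_SECONDS[unit] * size
    start = ts_seconds - ts_seconds % length  # round down to a window boundary
    return start, start + length


if __name__ == "__main__":
    # Illustrative only: TWCS looks at write timestamps, not the event_time column.
    for text in ("2017-03-07 12:38:20", "2017-03-07 12:54:59"):
        ts = int(datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
                 .replace(tzinfo=timezone.utc).timestamp())
        start, end = twcs_window(ts, "HOURS", 1)
        print(text, "->", datetime.fromtimestamp(start, timezone.utc), "to",
              datetime.fromtimestamp(end, timezone.utc))
```

Unlike the tiered layout, every window here has the same length, so older data never ends up in ever larger windows.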



