Although the instructions carry a red warning note, I took some time to investigate anyway, knowing that we do not run bleeding-edge features or a home-customized build of Cassandra. If you are choosing a Cassandra 1.2 release for your upgrade and also want to try the virtual nodes upgrade, pick the version just below 1.2.19, that is 1.2.18. Why? Read here: https://github.com/apache/cassandra/blob/cassandra-1.2.19/NEWS.txt#L19-L23
I started a three-node Cassandra 1.2.18 cluster in a sandbox environment where I could safely test the upgrade from 1.1 to 1.2 and, once that was done, the upgrade to virtual nodes.
[user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 create
Token                                      From            To
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~
73107539768170373009709388315418951678     127.0.0.4       127.0.0.3
169033493463981801837600797832317151914    127.0.0.2       127.0.0.3
136467407567251362951457524855448709801    127.0.0.2       127.0.0.3
133808951575681531205649910734888020649    127.0.0.2       127.0.0.3
75544457760442718776699701259266250066     127.0.0.4       127.0.0.3

[user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 enable
[user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 ls
Token                                      Endpoint        Requested at
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
159285821494892418769639546056927958356    127.0.0.3       Tue Feb 02 16:12:41 MYT 2016
91938269708456681209179988336057166505     127.0.0.3       Tue Feb 02 16:12:41 MYT 2016
74436767763955288882613195375699296254     127.0.0.3       Tue Feb 02 16:12:41 MYT 2016
103901321670520924065314251878580267688    127.0.0.3       Tue Feb 02 16:12:41 MYT 2016
[user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h 127.0.0.2 -p 7210 status
Datacenter: datacenter1
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.2  20.01 GB  256     100.0%            ba12301a-2e3f-49d2-bb5b-e125e91fcd1b  rack1
UN  127.0.0.3  20.01 GB  1       100.0%            e09705bf-01b1-423a-863c-c425a7796e51  rack1
UN  127.0.0.4  20.01 GB  1       100.0%            937f97ce-a55f-4785-8636-809456123a63  rack1
[user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h 127.0.0.2 -p 7210 ring

Datacenter: datacenter1
==========
Replicas: 257

Address    Rack  Status  State   Load      Owns     Token
                                                    169919645461171745752870002539170714965
127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  0
127.0.0.3  1e    Up      Normal  20.01 GB  100.00%  56713727820156410577229101238628035242
127.0.0.4  1e    Up      Normal  20.01 GB  100.00%  113427455640312821154458202477256070485
127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  113648993639610307133275503653969461247
127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  113870531638907793112092804830682852010
127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114092069638205279090910106007396242772
127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114313607637502765069727407184109633535
127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114535145636800251048544708360823024297
127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114756683636097737027362009537536415060
127.0.0.2  1e    Up      Normal  20.01 GB  100.00%  114978221635395223006179310714249805823
...
...

# shuffling ongoing
[user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 ls | wc -l
767
[user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 ls | wc -l
764

[user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h 127.0.0.2 -p 7210 netstats
Mode: RELOCATING
Not sending any streams.
Not receiving any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0            154
Responses                       n/a         0           3310
[user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h 127.0.0.2 -p 7210 compactionstats
pending tasks: 0
Active compaction remaining time : n/a
[user@localhost ~]$
As you can see above, I created the shuffle schedule and enabled it. The token count on the first node changed to 256 and the number of queued relocations started coming down. I thought, hey, this can actually work! I happily announced to the team that it looked like we would be able to migrate to Cassandra vnodes.
However, the next morning, when I checked on the process, oh gosh, it seemed to have gone into a loop.
WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:29,594 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)
WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:30,836 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)
WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:32,667 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)
WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:33,339 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)
WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:34,582 ScheduledRangeTransferExecutorService.java (line 120) Pausing until token count stabilizes (target=256, actual=282)

The warning is logged by isReady() in ScheduledRangeTransferExecutorService.java, which is checked before each queued transfer is started:

    if (res.size() < 1)
    {
        LOG.debug("No queued ranges to transfer");
        return;
    }

    if (!isReady())
        return;

    UntypedResultSet.Row row = res.iterator().next();

    Date requestedAt = row.getTimestamp("requested_at");
    ByteBuffer tokenBytes = row.getBytes("token_bytes");
    Token token = StorageService.getPartitioner().getTokenFactory().fromByteArray(tokenBytes);

    LOG.info("Initiating transfer of {} (scheduled at {})", token, requestedAt.toString());
    try
    {
        StorageService.instance.relocateTokens(Collections.singleton(token));
    }
    catch (Exception e)
    {
        LOG.error("Error removing {}: {}", token, e);
    }
    finally
    {
        LOG.debug("Removing queued entry for transfer of {}", token);
        processInternal(String.format("DELETE FROM system.%s WHERE token_bytes = '%s'",
                                      SystemTable.RANGE_XFERS_CF,
                                      ByteBufferUtil.bytesToHex(tokenBytes)));
    }
}

private boolean isReady()
{
    int targetTokens = DatabaseDescriptor.getNumTokens();
    int highMark = (int)Math.ceil(targetTokens + (targetTokens * .10));
    int actualTokens = StorageService.instance.getTokens().size();

    if (actualTokens >= highMark)
    {
        LOG.warn("Pausing until token count stabilizes (target={}, actual={})", targetTokens, actualTokens);
        return false;
    }

    return true;
}
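To see why the log repeats forever, the threshold arithmetic in isReady() can be replayed on its own. The following is a minimal sketch, not Cassandra code: the class name ShuffleThreshold and its methods are made up for illustration, and only the formula is copied from the excerpt above. With num_tokens set to 256, the high mark works out to 282, which is exactly the token count reported in the warning, so the check fails on every run.

```java
// Illustrative only: ShuffleThreshold is a hypothetical helper, not part of Cassandra.
final class ShuffleThreshold {
    // Same formula as the excerpt: target plus 10%, rounded up.
    static int highMark(int targetTokens) {
        return (int) Math.ceil(targetTokens + (targetTokens * .10));
    }

    // Mirrors the excerpt's check: transfers pause while actual >= highMark.
    static boolean isReady(int targetTokens, int actualTokens) {
        return actualTokens < highMark(targetTokens);
    }

    public static void main(String[] args) {
        int target = 256;  // num_tokens from cassandra.yaml
        int actual = 282;  // value reported in the WARN lines
        System.out.println("highMark = " + highMark(target)); // prints "highMark = 282"
        System.out.println("ready    = " + isReady(target, actual)); // prints "ready    = false"
    }
}
```

In the excerpt the early return on !isReady() happens before the try/finally that deletes the queued entry, so while the node sits at 282 tokens nothing is removed from the queue. That would be consistent with the count from cassandra-shuffle ls no longer moving.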
The shuffle count stayed at 744, so unfortunately we have to stay on the non-vnode setup. If you have upgraded to virtual nodes successfully, please leave a comment below describing the version path you took and the shuffle steps you followed to move your C* cluster to vnodes.
I will end this article with the steps I took. If you intend to move to vnodes, I suggest not wasting time on this; you might as well spin up a new cluster, since more and more changes cannot be done as in-place upgrades. The ones that come to mind now are the partitioner (random to murmur3) and vnodes.
- Stop automatic Cassandra maintenance.
- Make sure the data is consistent.
- Make sure ALL SSTable versions are ic.
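The last prerequisite can be spot-checked from the data file names alone. The sketch below assumes the 1.2-era naming scheme `<keyspace>-<table>-<version>-<generation>-<component>`; the file names in it are made up, and SSTableVersionCheck and its methods are hypothetical helpers of mine, not a Cassandra API.

```java
// Illustrative only: a hypothetical helper, not part of Cassandra.
final class SSTableVersionCheck {
    // Extracts the version field, assuming <ks>-<table>-<version>-<generation>-<component>.
    // Returns null when the name does not have enough dash-separated parts.
    static String versionOf(String fileName) {
        String[] parts = fileName.split("-");
        if (parts.length < 5) {
            return null;
        }
        // Count from the end: component is last, generation before it, then the version.
        return parts[parts.length - 3];
    }

    // True only if every file name carries the expected version string.
    static boolean allAtVersion(String[] fileNames, String expected) {
        for (String name : fileNames) {
            if (!expected.equals(versionOf(name))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up file names; the second one carries a different version on purpose.
        String[] files = {
            "Keyspace1-Standard1-ic-5-Data.db",
            "Keyspace1-Standard1-hf-3-Data.db"
        };
        System.out.println("all ic = " + allAtVersion(files, "ic"));
    }
}
```

Files that report anything other than ic would still need to be rewritten, for instance with nodetool upgradesstables, before starting the shuffle.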
1. Change cassandra.yaml on all servers:
   num_tokens: 256
   initial_token: (leave empty)
   then do a rolling restart of all servers.
2. cassandra-shuffle create
3. cassandra-shuffle enable
4. cassandra-shuffle ls
5. Periodic checks:
   check the log,
   check nodetool netstats,
   check the remaining shuffle count:
   [user@localhost ~]$ sudo ./cassandra-shuffle -h 127.0.0.2 -p 7210 ls | wc -l
   759
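The periodic check in step 5 can be turned into a simple stall test: if the line count from cassandra-shuffle ls stops changing over several samples while entries remain, the shuffle is probably stuck rather than slow. A minimal sketch follows; ShuffleProgress, isStalled and the window size are hypothetical names and choices of mine, and the samples are assumed to be collected by hand or by a wrapper script from `cassandra-shuffle ls | wc -l`. The count includes the two header lines that ls prints, and the repeated 744 readings in the example stand in for the overnight plateau.

```java
// Illustrative only: a hypothetical helper, not part of Cassandra.
final class ShuffleProgress {
    // Header lines printed by "cassandra-shuffle ls" (title row and separator).
    static final int HEADER_LINES = 2;

    // True when the last `window` samples are identical and entries are still queued.
    static boolean isStalled(int[] lineCounts, int window) {
        if (window < 2 || lineCounts.length < window) {
            return false; // not enough samples to judge
        }
        int last = lineCounts[lineCounts.length - 1];
        if (last <= HEADER_LINES) {
            return false; // nothing left in the queue
        }
        for (int i = lineCounts.length - window; i < lineCounts.length; i++) {
            if (lineCounts[i] != last) {
                return false; // still making progress inside the window
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {767, 764, 759, 744, 744, 744};
        System.out.println("stalled = " + isStalled(samples, 3)); // prints "stalled = true"
    }
}
```

When this reports a stall, the log is the next place to look, since that is where the "Pausing until token count stabilizes" warning showed up in my case.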