-
Notifications
You must be signed in to change notification settings - Fork 390
Recovery procedure when contrail database is down for greater than gc_grace_seconds
When contrail-database has been down (either due to network partitioning or process stopped) for greater than gc_grace_seconds (defaulted to 10 days), the contrail-database init.d script will not start it. On issuing the service contrail-database start
command, user will see a message similar to
Cassandra has been down for at least 777600 seconds, not starting
In this scenario, if contrail-database is brought online without following the below procedure, then it can result in inconsistent configuration database.
To recover from this situation the following steps need to be followed:
For cassandra 1.2.x, the steps are at https://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_remove_node_t.html.
For cassandra 2.1.x, the steps are at https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRemoveNode.html
Note: The nodetool removenode
command mentioned in the steps above needs to run on the other cassandra nodes since cassandra is already stopped on the node to be removed.
For cassandra 1.2.x, the steps are at https://docs.datastax.com/en/cassandra/1.2/cassandra/reference/referenceClearCpkgData_t.html
For cassandra 2.1.x, the steps are at https://docs.datastax.com/en/cassandra/2.1/cassandra/reference/referenceClearCpkgData_t.html
rm -f /var/log/cassandra/status-up
service contrail-database start