Upgrade FROM (less than 3.0.0) TO (greater than or equal to 3.0.0 and less than 3.1.0)
This Contrail upgrade procedure supports rolling back to the old Contrail CONTROLLER in case of upgrade or post-upgrade performance issues with the new CONTROLLER; it also keeps data-traffic disruption to a minimum during the upgrade.
This is achieved by provisioning a new set of CONTROLLER nodes with the new Contrail release/build and then switching the XMPP connection of each compute node to the new control nodes one by one.
- CONTROLLER - config, database, collector, control and webui Nodes
- OLD_CONTROLLER_VIP – Virtual IP of the old controller nodes
- NEW_CONTROLLER_VIP – Virtual IP of the new controller nodes
- NEW_FAB_NODE – Node from which fab commands are triggered to provision new controllers (node that contains the new testbed.py)
- OLD_FAB_NODE – Existing node from which fab commands were triggered to provision the old Cluster
- OLD_DATABASE | OLD_DATABASE_IP – Nth old database node
- NEW_DATABASE | NEW_DATABASE_IP – Nth new database node
- OLD_CONFIG | OLD_CONFIG_IP – Nth old config node
- NEW_CONFIG | NEW_CONFIG_IP – Nth new config node
- OLD_CONTROL | OLD_CONTROL_IP – Nth old control node
- NEW_CONTROL | NEW_CONTROL_IP – Nth new control node
- BACKUP_NODE – Node in which the cassandra snapshots and zookeeper data dir is to be backed up
It is strongly recommended that Cassandra commit logs and Cassandra data be kept on different disks. The recommendation is the following:
- 2 separate local disks – one for commit logs and one for data. These can be set in testbed.py with the following parameters, so the fab scripts will do the provisioning appropriately:
- database_dir = '/cassandra'
- ssd_data_dir = '/commit_logs_data'
For example:
database_dir = '/var/lib/cassandra/mydata'
ssd_data_dir = '/var/lib/cassandra/commit_logs_data'
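The fab scripts only place the directories where testbed.py points them; it can be worth confirming that the two paths really sit on separate devices before provisioning. A minimal sketch, assuming the paths exist on the node (the helper name and the example paths are illustrative, not part of the procedure; differing `st_dev` values strictly prove separate filesystems/mounts, which is the usual proxy for separate disks):

```python
import os

def on_separate_disks(path_a, path_b):
    """Return True when the two paths are backed by different devices.

    os.stat().st_dev identifies the filesystem/device behind a path, so
    differing values mean separate mounts (typically separate disks).
    """
    return os.stat(path_a).st_dev != os.stat(path_b).st_dev

# Example check against the testbed.py values (illustrative paths):
# on_separate_disks('/cassandra', '/commit_logs_data')
```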
The steps to upgrade the Contrail cluster are explained below with fab commands and manual commands.
Following are the steps to bring up the new controllers. This involves installing the old database/config packages and then upgrading them to the newer version. This two-step method is used because, starting with Contrail release 3.0.0, Cassandra is upgraded from version 1.2.12 to 2.1.9 (which involves a Cassandra SSTables upgrade and an intermediate version (2.0.7) upgrade).
The old version of contrail-install-packages needs to be installed on one of the new config nodes to get the provisioning code with which the new controllers can be provisioned. Copy the old contrail-install-packages to one of the new config nodes.
Execute from NEW_FAB_NODE
dpkg -i contrail-install-packages
/opt/contrail/contrail_packages/setup.sh
A new testbed.py should be created on one of the new database nodes to provision the new Contrail databases. Populate the existing testbed.py on OLD_FAB_NODE with the parameters required for backup/restore of the databases (Cassandra/ZooKeeper). Following are the parameters:
backup_node="root@x.y.x.z"
cassandra_backup="custom"
backup_db_path = ["/root/"]
skip_keyspace=["DISCOVERY_SERVER", "system_traces", "system"] # Add ContrailAnalytics|ContrailAnalyticsCql if it needs to be skipped during backup/restore
Populate the 'backup' role in env.roledefs with backup_node and set the root password for backup_node in env.passwords.
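Putting the backup parameters together, the additions to testbed.py might look like the following sketch. The IP and password are placeholders, `env.passwords` is Fabric's host-to-password dict, and the surrounding testbed.py is assumed to have already defined `env.roledefs`; verify the names against your existing file:

```python
# Fragment of testbed.py -- backup/restore additions (placeholder values).
from fabric.api import env

backup_node = 'root@x.y.x.z'          # BACKUP_NODE from the glossary
cassandra_backup = 'custom'
backup_db_path = ['/root/']
# Add ContrailAnalytics|ContrailAnalyticsCql if it should be skipped too.
skip_keyspace = ['DISCOVERY_SERVER', 'system_traces', 'system']

# Add the backup role alongside the existing roles...
env.roledefs['backup'] = [backup_node]
# ...and make its root password available to fab.
env.passwords[backup_node] = '<backup-node-root-password>'
```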
Populate existing testbed.py with the parameters required for separate database and ssd dir locations
ssd_data_dir = '<commit-logs-partition>/commit_logs_data'
database_dir = '<cassandra-data-partition>/cassandra'
OLD_FAB_NODE# scp /opt/contrail/utils/fabfile/testbeds/testbed.py root@NEW_FAB_NODE:/opt/contrail/utils/fabfile/testbeds/testbed.py
NEW_FAB_NODE# vi fabfile/testbeds/testbed.py
Replace the "cfgm/collector/database/control/webui" nodes with the new node IPs/passwords
Set new contrail_internal_vip in the env.ha section
Set new contrail_external_vip in case of multi interface setup
Execute from NEW_FAB_NODE
fab install_pkg_node:<old contrail-install-pkg>,root@controller1,root@controller2,root@controller3
fab create_install_repo_node:root@controller1,root@controller2,root@controller3
fab install_database
fab install_cfgm
fab install_collector
Execute from NEW_FAB_NODE
fab setup_interface_node:root@<newcontroller1>,root@<newcontroller2>,root@<newcontroller3>
# (if vm cluster)
fab all_command:"cat /etc/network/interfaces.d/eth0.cfg >> /etc/network/interfaces"
fab all_command:"sed -i '/source*/d' /etc/network/interfaces"
fab setup_interface_node:root@controller1,root@controller2,root@controller3
fab setup_contrail_keepalived #verify new vip
fab setup_database
fab verify_database
fab setup_rabbitmq_cluster
fab setup_cfgm
fab verify_cfgm
fab setup_collector
fab verify_collector
Execute from OLD_CONFIG
cd /opt/contrail/utils
python provision_control.py --api_server_ip <OLD_CONTROLLER_VIP> --api_server_port 8082 --host_name <NEW_CONTROL1> --host_ip <NEW_CONTROL1_IP> --oper add --admin_user admin --admin_tenant_name admin --admin_password contrail123 --router_asn 64512
python provision_control.py --api_server_ip <OLD_CONTROLLER_VIP> --api_server_port 8082 --host_name <NEW_CONTROL2> --host_ip <NEW_CONTROL2_IP> --oper add --admin_user admin --admin_tenant_name admin --admin_password contrail123 --router_asn 64512
python provision_control.py --api_server_ip <OLD_CONTROLLER_VIP> --api_server_port 8082 --host_name <NEW_CONTROL3> --host_ip <NEW_CONTROL3_IP> --oper add --admin_user admin --admin_tenant_name admin --admin_password contrail123 --router_asn 64512
Stop the config services (supervisor-config and neutron-server) on the old config nodes so that no new configuration is created during the next step of database snapshot/restore.
Execute from OLD_FAB_NODE
fab stop_cfgm
NOTE: No CRUD operations are possible after stopping config services.
## 2. Backup/Restore Databases
Execute from OLD_FAB_NODE
fab backup_zookeeper_data
Execute from OLD_FAB_NODE
fab backup_cassandra_db
fab stop_database
Following are the steps to restore the Cassandra database from the old database nodes to the new database nodes.
Execute from BACKUP_NODE
cd /root
mv OLD_DB1_HOSTNAME_DIR/ NEW_DB1_HOSTNAME_DIR/
mv OLD_DB2_HOSTNAME_DIR/ NEW_DB2_HOSTNAME_DIR/
mv OLD_DB3_HOSTNAME_DIR/ NEW_DB3_HOSTNAME_DIR/
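The three mv commands above map each old database node's snapshot directory to its new counterpart's hostname. With more nodes, the renames can be scripted; a hypothetical sketch to run on BACKUP_NODE (the directory names in the mapping are the placeholders from the steps above; substitute your actual hostnames):

```python
import os

# Old-hostname snapshot directory -> new-hostname directory (placeholders).
rename_map = {
    'OLD_DB1_HOSTNAME_DIR': 'NEW_DB1_HOSTNAME_DIR',
    'OLD_DB2_HOSTNAME_DIR': 'NEW_DB2_HOSTNAME_DIR',
    'OLD_DB3_HOSTNAME_DIR': 'NEW_DB3_HOSTNAME_DIR',
}

def rename_snapshot_dirs(root, mapping):
    """Rename each backed-up snapshot directory found under `root`."""
    for old_name, new_name in mapping.items():
        old_path = os.path.join(root, old_name)
        if os.path.isdir(old_path):
            os.rename(old_path, os.path.join(root, new_name))

# On BACKUP_NODE the snapshots live under /root (backup_db_path):
# rename_snapshot_dirs('/root', rename_map)
```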
Execute from NEW_FAB_NODE
fab restore_zookeeper_data
fab -R database -- "cd <data_file_directories>; rm -rf config_db_uuid svc_monitor_keyspace to_bgp_keyspace dm_keyspace" # NOTE: <data_file_directories> can be found in /etc/cassandra/cassandra.yaml; search for data_file_directories
fab -R database -- 'mv /opt/contrail/utils/cass-db-restore.sh /opt/contrail/utils/cass-db-restore.sh.old'
fab -R database -- 'wget -O /opt/contrail/utils/cass-db-restore.sh https://raw.githubusercontent.com/Juniper/contrail-controller/R3.0/src/config/utils/cass-db-restore.sh'
fab -R database -- 'chmod 755 /opt/contrail/utils/cass-db-restore.sh'
fab restore_cassandra_db
Execute from NEW_FAB_NODE
fab verify_database
# Verify Cassandra cluster using node tool status
fab -R database -- "nodetool status"
fab restart_cfgm
fab verify_cfgm
Execute from NEW_CONFIG
# Make sure the objects created in old database are available in new cassandra after restore
curl -u <adminUser>:<adminPassword> http://localhost:8095/virtual-networks | python -m json.tool
curl -u <adminUser>:<adminPassword> http://localhost:8095/virtual-machines | python -m json.tool
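One way to make this check systematic is to capture the curl output before and after the restore and compare the object UUID sets. A hedged sketch: the response shape assumed here (a top-level list keyed by resource type, each entry carrying a `uuid`) matches the usual contrail-api list format, but verify it against your build before relying on it:

```python
import json

def uuid_set(response_text, resource_type):
    """Extract the set of object UUIDs from a contrail-api list response.

    Assumes the usual list shape:
    {"virtual-networks": [{"uuid": ..., "fq_name": [...]}, ...]}
    """
    body = json.loads(response_text)
    return {obj['uuid'] for obj in body.get(resource_type, [])}

# Example: compare snapshots saved before and after the restore.
before = '{"virtual-networks": [{"uuid": "a1"}, {"uuid": "b2"}]}'
after  = '{"virtual-networks": [{"uuid": "b2"}, {"uuid": "a1"}]}'
missing = uuid_set(before, 'virtual-networks') - uuid_set(after, 'virtual-networks')
# An empty `missing` set means every old virtual network survived the restore.
```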
Execute from NEW_FAB_NODE
fab install_pkg_node:<new contrail-install-pkg>,root@<new_controller1>,root@<new_controller2>,root@<new_controller3>
fab create_install_repo_node:root@<new_controller1>,root@<new_controller2>,root@<new_controller3>
fab upgrade_database:<from_rel>,<new contrail-install-pkg>
fab verify_database
# Make sure the objects created in old database are available in new after Cassandra upgrade
curl -u <adminUser>:<adminPassword> http://localhost:8095/virtual-networks | python -m json.tool
curl -u <adminUser>:<adminPassword> http://localhost:8095/virtual-machines | python -m json.tool
fab upgrade_config:<from_rel>,<new contrail-install-pkg>
fab restart_cfgm
fab verify_cfgm
fab install_control
fab upgrade_collector:<from_rel>,<new contrail-install-pkg>
fab verify_collector
fab install_webui
Execute from NEW_FAB_NODE
fab setup_ha
fab fixup_restart_haproxy_in_collector
fab setup_control
fab verify_control
fab setup_webui
fab verify_webui
fab prov_config
fab prov_database
fab prov_analytics
fab prov_control_bgp
fab prov_external_bgp
Set rabbit_host and the api-server address in /etc/heat/heat.conf on all OpenStack nodes to NEW_CONTROLLER_VIP (contrail_internal_vip)
Set plugin_dirs in heat.conf
plugin_dirs = /usr/lib/python2.7/dist-packages/vnc_api/gen/heat/resources,/usr/lib/python2.7/dist-packages/contrail_heat/resources
Restart all heat services
service heat-api restart; service heat-api-cfn restart; service heat-engine restart
Set rabbit_host and neutron_url in /etc/nova/nova.conf on all OpenStack nodes to NEW_CONTROLLER_VIP (contrail_internal_vip)
Restart all nova services
service nova-api restart
service nova-scheduler restart
service nova-conductor restart
Upgrade the compute nodes one by one: make the changes described in sections 3.2 and 3.3, then reboot the compute node. Make sure not to reboot an upgraded compute node before changing the discovery IP in its config files. Execute from NEW_FAB_NODE
fab upgrade_compute_node:<old_from_rel>,<path_to_new_package>,root@compute1
Set "server" in the [DISCOVERY] section of /etc/contrail/contrail-vrouter-agent.conf on all the compute nodes to NEW_CONTROLLER_VIP
Set "DISCOVERY" in /etc/contrail/contrail-vrouter-nodemgr.conf on all the compute nodes to NEW_CONTROLLER_VIP
Set rabbit_host and neutron_url in /etc/nova/nova.conf on all compute nodes to NEW_CONTROLLER_VIP
service nova-compute restart
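The per-compute edits above are plain INI changes, so they can be applied with sed or, as sketched below, with Python's configparser. The [DISCOVERY]/server names follow the files quoted above; the file paths and the assumption that configparser can parse your exact config (e.g. no duplicate keys) should be verified on your build:

```python
import configparser

def set_discovery_server(conf_path, new_vip):
    """Point the server option in the [DISCOVERY] section at new_vip."""
    parser = configparser.ConfigParser()
    parser.read(conf_path)
    if not parser.has_section('DISCOVERY'):
        parser.add_section('DISCOVERY')
    parser.set('DISCOVERY', 'server', new_vip)
    with open(conf_path, 'w') as handle:
        parser.write(handle)

# e.g. set_discovery_server('/etc/contrail/contrail-vrouter-agent.conf',
#                           '<NEW_CONTROLLER_VIP>')
```

The same helper works for the rollback in the later section, with OLD_CONTROLLER_VIP as the value.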
Repeat the steps in sections 5.1, 5.2, and 5.3 and reboot the compute nodes one by one to switch each compute from OLD_CONTROLLER to NEW_CONTROLLER; this way traffic on the other computes is not disturbed, and traffic from/to the rebooted compute is restored after the reboot.
reboot
or
service supervisor-vrouter stop; modprobe -r vrouter; modprobe vrouter; service supervisor-vrouter start; service nova-compute restart
Execute from NEW_FAB_NODE
fab upgrade_openstack:<old_from_rel>,/path/to/contrail/new/package
Execute from NEW_CONFIG
python provision_control.py --api_server_ip <NEW_CONTROLLER_VIP> --api_server_port 8082 --host_name <OLD_CONTROL1> --host_ip <OLD_CONTROL1_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123 --router_asn 64512
python provision_control.py --api_server_ip <NEW_CONTROLLER_VIP> --api_server_port 8082 --host_name <OLD_CONTROL2> --host_ip <OLD_CONTROL2_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123 --router_asn 64512
python provision_control.py --api_server_ip <NEW_CONTROLLER_VIP> --api_server_port 8082 --host_name <OLD_CONTROL3> --host_ip <OLD_CONTROL3_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123 --router_asn 64512
Execute from NEW_CONFIG
python provision_config_node.py --api_server_ip <NEW_CONTROLLER_VIP> --host_name <OLD_CONFIG1> --host_ip <OLD_CONFIG1_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123
python provision_config_node.py --api_server_ip <NEW_CONTROLLER_VIP> --host_name <OLD_CONFIG2> --host_ip <OLD_CONFIG2_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123
python provision_config_node.py --api_server_ip <NEW_CONTROLLER_VIP> --host_name <OLD_CONFIG3> --host_ip <OLD_CONFIG3_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123
python provision_database_node.py --api_server_ip <NEW_CONTROLLER_VIP> --host_name <OLD_DATABASE1> --host_ip <OLD_DATABASE1_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123
python provision_database_node.py --api_server_ip <NEW_CONTROLLER_VIP> --host_name <OLD_DATABASE2> --host_ip <OLD_DATABASE2_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123
python provision_database_node.py --api_server_ip <NEW_CONTROLLER_VIP> --host_name <OLD_DATABASE3> --host_ip <OLD_DATABASE3_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123
python provision_analytics_node.py --api_server_ip <NEW_CONTROLLER_VIP> --host_name <OLD_ANALYTICS1> --host_ip <OLD_ANALYTICS1_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123
python provision_analytics_node.py --api_server_ip <NEW_CONTROLLER_VIP> --host_name <OLD_ANALYTICS2> --host_ip <OLD_ANALYTICS2_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123
python provision_analytics_node.py --api_server_ip <NEW_CONTROLLER_VIP> --host_name <OLD_ANALYTICS3> --host_ip <OLD_ANALYTICS3_IP> --oper del --admin_user admin --admin_tenant_name admin --admin_password contrail123
In case of post-upgrade issues, we can roll back to the old version of the Contrail controllers, since the steps above detached them from the cluster without disturbing their configs while introducing the new version of the controllers.
Execute from OLD_FAB_NODE
fab restart_database
fab restart_cfgm
Execute from NEW_CONFIG
python provision_control.py --api_server_ip <NEW_CONTROLLER_VIP> --api_server_port 8082 --host_name <OLD_CONTROL1> --host_ip <OLD_CONTROL1_IP> --oper add --admin_user admin --admin_tenant_name admin --admin_password contrail123 --router_asn 64512
python provision_control.py --api_server_ip <NEW_CONTROLLER_VIP> --api_server_port 8082 --host_name <OLD_CONTROL2> --host_ip <OLD_CONTROL2_IP> --oper add --admin_user admin --admin_tenant_name admin --admin_password contrail123 --router_asn 64512
python provision_control.py --api_server_ip <NEW_CONTROLLER_VIP> --api_server_port 8082 --host_name <OLD_CONTROL3> --host_ip <OLD_CONTROL3_IP> --oper add --admin_user admin --admin_tenant_name admin --admin_password contrail123 --router_asn 64512
Set rabbit_host and neutron_url in /etc/nova/nova.conf on all OpenStack nodes to **OLD_CONTROLLER_VIP**
Restart all nova services
Set "server" in the [DISCOVERY] section of /etc/contrail/contrail-vrouter-agent.conf on all the compute nodes to **OLD_CONTROLLER_VIP**
Set "DISCOVERY" in /etc/contrail/contrail-vrouter-nodemgr.conf on all the compute nodes to **OLD_CONTROLLER_VIP**
Set rabbit_host and neutron_url in /etc/nova/nova.conf on all compute nodes to OLD_CONTROLLER_VIP
Repeat the steps in sections 9.3 and 9.5 and restart the compute services on the compute nodes one by one to switch each compute from **NEW_CONTROLLER** to **OLD_CONTROLLER**; this way traffic on the other computes is not disturbed, and traffic from/to the restarted compute is restored after the restart.
service supervisor-vrouter restart; service nova-compute restart