GracefulRestart support in control-node (Phase 1)
At present, whenever the XMPP connection to an agent goes down, all routes
learned from that agent are deleted and withdrawn from all other peers. This
causes end-to-end traffic loss even if the vrouter associated with the agent
going down retains routes in the data plane for hitless forwarding until a new
session comes up and routes are re-learned and re-advertised to other peers.

With this change,

o Added support in control-node to retain routes learned from agents when the
  XMPP session goes down (in the hope that the session comes back up)
o Used a mark-and-sweep approach to retain and later purge routes; a minimal
  sketch of this flow follows this list. Routes are purged after the graceful
  restart (GR) timer expires
    1. Whenever a session is closed, all routes learned from the agent going
       down are marked as stale (but retained in the routing table and still
       eligible for best-path election, outbound advertisement, etc.). A GR
       timer is also started, set to expire after a minute or so
    2. If and when the session comes back up and some/all paths are
       re-learned, the stale flag is cleared (only for those re-learned paths)
    3. When the GR timer expires, the table is walked again and any paths that
       are still marked stale are swept (deleted) from the table
    4. During the GR wait time (before the timer fires), subsequent session
       flaps are considered double failures and all paths are deleted
       (thereby not doing GR)
o When the new session comes up, most of the old connection's data structures
  are retained.
o Only XmppSession and XmppStateMachine are switched from the new connection to
  the old
o Code is mainly tested via unit tests
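
The flow in (1)-(4) can be summarized with a small sketch. This is purely
illustrative: the class and method names below are invented for this
description and are not the actual PeerCloseManager interface.

#include <string>
#include <vector>

// Hypothetical stand-in for a path entry carrying a stale flag.
struct PathSketch {
    std::string prefix;
    bool stale;
};

class GrSketch {
public:
    GrSketch() : gr_timer_running_(false) { }

    // (1) Session closed: mark all paths stale and arm the GR timer.
    // Stale paths stay in the table and remain eligible for best-path
    // election and outbound advertisement.
    void HandleSessionClose() {
        for (size_t i = 0; i < paths_.size(); ++i)
            paths_[i].stale = true;
        gr_timer_running_ = true;  // real code arms a ~1 minute timer
    }

    // (2) Session came back and a path was re-learned: clear the stale
    // flag for that path only.
    void HandlePathRelearned(const std::string &prefix) {
        for (size_t i = 0; i < paths_.size(); ++i)
            if (paths_[i].prefix == prefix)
                paths_[i].stale = false;
    }

    // (3) GR timer expired: sweep (delete) paths still marked stale.
    void HandleGrTimerExpiry() {
        std::vector<PathSketch> kept;
        for (size_t i = 0; i < paths_.size(); ++i)
            if (!paths_[i].stale)
                kept.push_back(paths_[i]);
        paths_.swap(kept);
        gr_timer_running_ = false;
    }

    // (4) Another flap while the timer is running is a double failure:
    // delete everything instead of doing GR (relaxed later for LLGR).
    void HandleNestedSessionClose() {
        if (gr_timer_running_)
            paths_.clear();
    }

private:
    std::vector<PathSketch> paths_;
    bool gr_timer_running_;
};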

TODO (in subsequent phases)
o Fix code and add tests for configuration changes in the midst of GR
  (e.g. routing-instance deletion)
o EndOfRib marker (instead of always waiting for the GR timer to expire)
o Add GR support to BGP IPeers as well
o Handle route-targets cleanup properly
o Re-enable a couple of the connection endpoint's unit tests
o Enable GR for the control-node process (and run systests)
o Enable GR in bgp_stress_test
o Configurable GR timer on a per-peer basis
o Configurable GR ability on a per-peer basis
o Do not do (4) above. Instead, retain routes and remain in GR mode even after
  multiple session closures (working towards LLGR)

-----------------------------------------------------------------

GracefulRestart Phase 2 -- Handle routing-instance deletions during GR

When GR is in progress, all routes and subscription states (in BgpXmppChannel)
are retained in the hope that the agent re-subscribes after GR. But if the
config changes between the time the agent went down and the time it came back
up, the new agent may not re-subscribe to all the instances it had subscribed
to before. Such sticky subscriptions must also be cleaned up properly using
the mark-and-sweep approach (similar to how routes are handled).

The mark-and-sweep approach is already implemented for routes in
PeerCloseManager. BgpXmppChannel rides on this to get the necessary callbacks
for (sketched below):

o When to mark all subscription states as stale
o When the agent re-subscribes after restart, so that the stale flag can be
  cleared in the subscription state
o When to sweep all subscription states (and delete entries that are still
  stale)

Only after all (still) stale subscription states are deleted can the
routing-instance deletion process resume and complete.
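
A sketch of these callbacks (illustrative only; the names below are invented
and do not match the actual BgpXmppChannel code):

#include <map>
#include <string>

class SubscriptionGrSketch {
public:
    // Callback 1: GR started -- mark every subscription state stale.
    void StaleAll() {
        for (StateMap::iterator it = states_.begin();
             it != states_.end(); ++it)
            it->second = true;
    }

    // Callback 2: agent re-subscribed to an instance after restart --
    // clear the stale flag for that instance only.
    void Resubscribe(const std::string &instance) {
        states_[instance] = false;
    }

    // Callback 3: sweep -- erase subscription states that are still
    // stale. Only once these are gone can a pending routing-instance
    // deletion resume and complete.
    void SweepAll() {
        for (StateMap::iterator it = states_.begin();
             it != states_.end(); ) {
            if (it->second)
                states_.erase(it++);
            else
                ++it;
        }
    }

private:
    typedef std::map<std::string, bool> StateMap;  // instance -> stale?
    StateMap states_;
};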

Note that during GR, all route-targets are retained as-is, similar to how
routes are retained. At the moment, the rtarget entries are not individually
marked stale (and then swept). Instead, this is handled by marking the
subscription states, which are already maintained on a per-instance basis in
BgpXmppChannel.

Added test cases to cover many (but not all) scenarios

-----------------------------------------------------------------

GracefulRestart Phase 3 -- Send and process EoR marker

After the agent restarts and sends routes to all instances across all
address families, it can send an EoR marker to trigger earlier termination
of the GR process. This prevents possible traffic black-holing until GR
timer expiry, which could potentially be a minute or so.

At the moment, an empty publish and an empty collection are used to denote
the EoR marker (a small detection sketch follows the examples below).

e.g.

<?xml version="1.0"?>
<iq type="set" from="agent0@vnsw.contrailsystems.com" to="network-control@contrailsystems.com/bgp-peer">
  <pubsub xmlns="http://jabber.org/protocol/pubsub">
    <publish/>
  </pubsub>
</iq>
<?xml version="1.0"?>
<iq type="set" from="agent0@vnsw.contrailsystems.com" to="network-control@contrailsystems.com/bgp-peer">
  <pubsub xmlns="http://jabber.org/protocol/pubsub">
    <collection/>
  </pubsub>
</iq>
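
A minimal sketch of this convention (hypothetical helper, not the actual
XMPP parsing code in BgpXmppChannel): a publish or collection element that
carries no payload is interpreted as the EoR marker, which lets the GR sweep
run immediately instead of waiting for the timer.

#include <cstddef>
#include <string>

// Hypothetical parsed form of the pubsub payload.
struct PubSubNode {
    std::string name;    // "publish" or "collection"
    size_t child_count;  // number of payload children under the element
};

bool IsEorMarker(const PubSubNode &node) {
    return (node.name == "publish" || node.name == "collection") &&
           node.child_count == 0;
}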

Added test cases to verify this part.

-----------------------------------------------------------------

GracefulRestart Phase 4 -- Handle some of the initial review comments

o Increase WAIT_FOR_IDLE time for certain stressful unit tests
o Rename some functions from delete to close where applicable
o Simplify route stale logic in BgpTable::InputCommon()
o Restore the asserts disabled in previous commits
o Stabilize tests
o Add code to support LLGR (nested closures restart GR afresh)

-----------------------------------------------------------------

GracefulRestart Phase 5 -- Handle LLGR cases

o When sessions flap in the GR_TIMER state, cancel the timer and restart GR
  all over again. This is required in order to mark newly sent (partial)
  routes as stale (see the sketch after this list)
o Modify tests to handle the nested closure cases as well
o Eventually, whenever the GR timer fires, routes are retained and swept if
  the session is established, and deleted otherwise
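
A sketch of this LLGR behavior (hypothetical names; the stubs below stand in
for the real table walks and timer plumbing in PeerCloseManager):

// Illustrative only -- not the actual PeerCloseManager state machine.
enum GrState { NONE, GR_TIMER };

struct LlgrSketch {
    GrState state;

    LlgrSketch() : state(NONE) { }

    void OnSessionClose() {
        if (state == GR_TIMER)
            CancelTimer();   // nested closure: discard the running timer
        MarkAllPathsStale(); // re-stale everything, including routes the
                             // restarted agent had already re-sent
        StartTimer();        // restart GR afresh
        state = GR_TIMER;
    }

    void OnTimerExpiry(bool session_established) {
        if (session_established)
            SweepStalePaths();  // keep fresh paths, delete still-stale ones
        else
            DeleteAllPaths();   // no session came back: delete everything
        state = NONE;
    }

    // Stubs for illustration.
    void CancelTimer() { }
    void StartTimer() { }
    void MarkAllPathsStale() { }
    void SweepStalePaths() { }
    void DeleteAllPaths() { }
};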

-----------------------------------------------------------------

Change-Id: Ie589d69b6390356d4a052cc4415bff4b5dabd499
Partial-Bug: #1537933
ananth-at-camphor-networks committed Mar 2, 2016
1 parent 4a81880 commit 1b62a30
Showing 32 changed files with 2,366 additions and 562 deletions.
4 changes: 4 additions & 0 deletions ci_unittests.json
@@ -53,6 +53,9 @@
 {
     "tuples" : [
         "NO_HEAPCHECK=TRUE",
+        "TASK_UTIL_DEFAULT_RETRY_COUNT=6000",
+        "TASK_UTIL_WAIT_TIME=10000",
+        "WAIT_FOR_IDLE=60",
         "LOG_DISABLE=TRUE"
     ],
     "tests" : [
@@ -71,6 +74,7 @@
         ".*/bgp/test/bgp_xmpp_inet6vpn_test$",
         ".*/bgp/test/bgp_xmpp_mcast_test$",
         ".*/bgp/test/bgp_xmpp_rtarget_test$",
+        ".*/bgp/test/graceful_restart_test$",
         ".*/bgp/test/service_chain_test",
         ".*/bgp/test/svc_static_route_intergration_test",
         ".*/bgp/test/xmpp_ecmp_test$"
4 changes: 2 additions & 2 deletions src/base/test/task_test_util.h
@@ -59,8 +59,8 @@ do { \
 #define TASK_UTIL_WAIT_MSG(cnt, expected, actual, wait, type, msg) \
 do { \
     ostream << __FILE__ << ":" << __FUNCTION__ << "():" << __LINE__; \
-    ostream << ": " << msg << ": Waiting for " << actual << type; \
-    ostream << expected << "\n"; \
+    ostream << ": " << msg << ": Waiting for " << (actual) << type; \
+    ostream << (expected) << "\n"; \
     log4cplus::Logger logger = log4cplus::Logger::getRoot(); \
     LOG4CPLUS_DEBUG(logger, ostream.str()); \
 } while (false)
66 changes: 24 additions & 42 deletions src/bgp/bgp_peer.cc
@@ -49,67 +49,45 @@ class BgpPeer::PeerClose : public IPeerClose {
         return peer_->ToString();
     }

+    // If the peer is deleted or administratively held down, do not attempt
+    // graceful restart
     virtual bool IsCloseGraceful() {
-
-        //
-        // If the peer is deleted or administratively held down, do not attempt
-        // graceful restart
-        //
-        if (peer_->IsDeleted() || peer_->IsAdminDown()) return false;
-
+        if (peer_->IsDeleted() || peer_->IsAdminDown())
+            return false;
         return peer_->server()->IsPeerCloseGraceful();
     }

-    virtual void CustomClose() {
-        return peer_->CustomClose();
-    }
+    virtual void CustomClose() { return peer_->CustomClose(); }
+    virtual void GracefulRestartStale() { }
+    virtual void GracefulRestartSweep() { }

-    // CloseComplete
-    //
-    // Close process for this peer is complete. Restart the state machine and
-    // attempt to bring up session with the neighbor
-    //
-    virtual bool CloseComplete(bool from_timer, bool gr_cancelled) {
+    virtual void CloseComplete() {
+        peer_->server()->decrement_closing_count();
+        if (!peer_->IsAdminDown())
+            peer_->state_machine_->Initialize();
+    }
+    virtual void Delete() {
         if (!peer_->IsDeleted()) {
-
-            //
-            // If this closure is off graceful restart timer, nothing else to
-            // do as we retain the peer based on the configuration
-            //
-            if (from_timer) return false;
-
-            //
-            // Reset peer's state machine
-            //
-            if (!peer_->IsAdminDown()) peer_->state_machine_->Initialize();
-
-            return false;
+            CloseComplete();
+            return;
         }

-        //
-        // This peer is deleted. Timer should have already been cancelled
-        //
-        assert(!from_timer);
-
+        peer_->server()->decrement_closing_count();
         peer_->deleter()->RetryDelete();
         is_closed_ = true;
-        return true;
     }

-    bool IsClosed() const {
-        return is_closed_;
-    }
-
-    virtual PeerCloseManager *close_manager() {
-        return manager_.get();
-    }
+    bool IsClosed() const { return is_closed_; }
+    virtual PeerCloseManager *close_manager() { return manager_.get(); }

     void Close() {
-        if (!is_closed_ && !manager_->IsCloseInProgress()) {
-            manager_->Close();
+        if (!manager_->IsCloseInProgress())
             peer_->server()->increment_closing_count();
-        }
+        manager_->Close();
     }

 private:
@@ -1659,6 +1637,10 @@ static void FillSocketStats(const IPeerDebugStats::SocketStats &socket_stats,
     }
 }

+void BgpPeer::FillCloseInfo(BgpNeighborResp *resp) const {
+    peer_close_->close_manager()->FillCloseInfo(resp);
+}
+
 void BgpPeer::FillBgpNeighborDebugState(BgpNeighborResp *bnr,
     const IPeerDebugStats *peer_state) {
     bnr->set_last_state(peer_state->last_state());
@@ -1722,7 +1704,7 @@ void BgpPeer::FillNeighborInfo(const BgpSandeshContext *bsc,
     bnr->set_instance_name(rtinstance_->name());
     bnr->set_peer(peer_basename_);
     bnr->set_deleted(IsDeleted());
-    bnr->set_deleted_at(UTCUsecToString(deleter_->delete_time_stamp_usecs()));
+    bnr->set_closed_at(UTCUsecToString(deleter_->delete_time_stamp_usecs()));
     bnr->set_admin_down(admin_down_);
     bnr->set_passive(passive_);
     bnr->set_peer_address(peer_address_string());
1 change: 1 addition & 0 deletions src/bgp/bgp_peer.h
@@ -336,6 +336,7 @@ class BgpPeer : public IPeer {
     void CustomClose();

     void FillBgpNeighborFamilyAttributes(BgpNeighborResp *nbr) const;
+    void FillCloseInfo(BgpNeighborResp *resp) const;

     std::string BytesToHexString(const u_int8_t *msg, size_t size);

18 changes: 17 additions & 1 deletion src/bgp/bgp_peer.sandesh
@@ -41,11 +41,26 @@ struct ShowBgpNeighborFamily {
     3: u32 prefix_limit;
 }

+struct PeerCloseInfo {
+    1: string state;
+    2: bool close_again;
+    3: u64 init;
+    4: u64 close;
+    5: u64 nested;
+    6: u64 deletes;
+    7: u64 stale;
+    8: u64 sweep;
+    9: u64 gr_timer;
+    10: u64 deleted_state_paths;
+    11: u64 deleted_paths;
+    12: u64 marked_state_paths;
+}
+
 struct BgpNeighborResp {
     53: string instance_name;
     1: string peer (link="BgpNeighborReq"); // Peer name
     36: bool deleted; // Deletion in progress
-    43: string deleted_at;
+    43: string closed_at;
     47: bool admin_down;
     48: bool passive;
     2: string peer_address (link="BgpNeighborReq");
@@ -60,6 +75,7 @@
     4: string local_address; // local ip address and port
     26: string local_id;
     5: u32 local_asn;
+    54: optional PeerCloseInfo peer_close_info;
     9: optional string send_state; // in sync/not in sync
     10: optional string last_event;
     11: optional string last_state;
