GracefulRestart support in control-node (Phase 1)
At present, whenever the XMPP connection to an agent goes down, all routes
learned from that agent are deleted and withdrawn from all other peers. This
causes end-to-end traffic loss even if the vrouter associated with the agent
going down retains routes in the data plane for hitless forwarding until a new
session comes up and routes are re-learned and re-advertised to other peers.

With this change,

o Added support in control-node to retain routes learned from agents when the
  XMPP session goes down (in the hope that the session comes back up)
o Used a mark-and-sweep approach to retain and later purge routes; a minimal
  sketch of this flow follows this list. Routes are purged after the graceful
  restart (GR) timer expires
    1. Whenever a session is closed, all routes learned from the agent going
       down are marked as stale (but retained in the routing table and still
       eligible for best-path election, outbound advertisement, etc.). A GR
       timer is also started, set to expire after a minute or so
    2. If and when the session comes back up and some/all paths are
       re-learned, the stale flag is cleared (only for those re-learned paths)
    3. When the GR timer expires, the table is walked again and any paths that
       are still marked stale are swept (deleted) from the table
    4. During the GR wait time (before the timer fires), subsequent session
       flaps are considered double failures and all paths are deleted
       (thereby not doing GR)
o When the new session comes up, most of the old connection's data structures
  are retained.
o Only XmppSession and XmppStateMachine are switched from the new connection to
  the old
o Code is mainly tested via unit tests
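
The flow in (1)-(4) can be summarized with a small sketch. This is purely
illustrative: the class and method names below are invented for this
description and are not the actual PeerCloseManager interface.

#include <string>
#include <vector>

// Hypothetical stand-in for a path entry carrying a stale flag.
struct PathSketch {
    std::string prefix;
    bool stale;
};

class GrSketch {
public:
    GrSketch() : gr_timer_running_(false) { }

    // (1) Session closed: mark all paths stale and arm the GR timer.
    // Stale paths stay in the table and remain eligible for best-path
    // election and outbound advertisement.
    void HandleSessionClose() {
        for (size_t i = 0; i < paths_.size(); ++i)
            paths_[i].stale = true;
        gr_timer_running_ = true;  // real code arms a ~1 minute timer
    }

    // (2) Session came back and a path was re-learned: clear the stale
    // flag for that path only.
    void HandlePathRelearned(const std::string &prefix) {
        for (size_t i = 0; i < paths_.size(); ++i)
            if (paths_[i].prefix == prefix)
                paths_[i].stale = false;
    }

    // (3) GR timer expired: sweep (delete) paths still marked stale.
    void HandleGrTimerExpiry() {
        std::vector<PathSketch> kept;
        for (size_t i = 0; i < paths_.size(); ++i)
            if (!paths_[i].stale)
                kept.push_back(paths_[i]);
        paths_.swap(kept);
        gr_timer_running_ = false;
    }

    // (4) Another flap while the timer is running is a double failure:
    // delete everything instead of doing GR (relaxed later for LLGR).
    void HandleNestedSessionClose() {
        if (gr_timer_running_)
            paths_.clear();
    }

private:
    std::vector<PathSketch> paths_;
    bool gr_timer_running_;
};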

TODO (in subsequent phases)
o Fix code and add tests for configuration changes in the midst of GR
  (e.g. routing-instance deletion)
o EndOfRib marker (instead of always waiting for the GR timer to expire)
o Add GR support to BGP IPeers as well
o Handle route-targets cleanup properly
o Re-enable a couple of the connection endpoint's unit tests
o Enable GR for the control-node process (and run systests)
o Enable GR in bgp_stress_test
o Configurable GR timer on a per-peer basis
o Configurable GR ability on a per-peer basis
o Do not do (4) above. Instead, retain routes and remain in GR mode even after
  multiple session closures (working towards LLGR)

-----------------------------------------------------------------

GracefulRestart Phase 2 -- Handle routing-instance deletions during GR

When GR is in progress, all routes and subscription states (in BgpXmppChannel)
are retained in the hope that the agent re-subscribes after GR. But if the
config changes between the time the agent went down and the time it came back
up, the new agent may not re-subscribe to all the instances it had subscribed
to before. Such sticky subscriptions must also be cleaned up properly using
the mark-and-sweep approach (similar to how routes are handled).

The mark-and-sweep approach is already implemented for routes in
PeerCloseManager. BgpXmppChannel rides on this to get the necessary callbacks
for (sketched below):

o When to mark all subscription states as stale
o When the agent re-subscribes after restart, so that the stale flag can be
  cleared in the subscription state
o When to sweep all subscription states (and delete entries that are still
  stale)

Only after all (still) stale subscription states are deleted can the
routing-instance deletion process resume and complete.
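
A sketch of these callbacks (illustrative only; the names below are invented
and do not match the actual BgpXmppChannel code):

#include <map>
#include <string>

class SubscriptionGrSketch {
public:
    // Callback 1: GR started -- mark every subscription state stale.
    void StaleAll() {
        for (StateMap::iterator it = states_.begin();
             it != states_.end(); ++it)
            it->second = true;
    }

    // Callback 2: agent re-subscribed to an instance after restart --
    // clear the stale flag for that instance only.
    void Resubscribe(const std::string &instance) {
        states_[instance] = false;
    }

    // Callback 3: sweep -- erase subscription states that are still
    // stale. Only once these are gone can a pending routing-instance
    // deletion resume and complete.
    void SweepAll() {
        for (StateMap::iterator it = states_.begin();
             it != states_.end(); ) {
            if (it->second)
                states_.erase(it++);
            else
                ++it;
        }
    }

private:
    typedef std::map<std::string, bool> StateMap;  // instance -> stale?
    StateMap states_;
};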

Note that during GR, all route-targets are retained as-is, similar to how
routes are retained. At the moment, the rtarget entries are not individually
marked stale (and then swept). Instead, this is handled by marking the
subscription states, which are already maintained on a per-instance basis in
BgpXmppChannel.

Added test cases to cover many (but not all) scenarios

-----------------------------------------------------------------

GracefulRestart Phase 3 -- Send and process EoR marker

After the agent restarts and sends routes to all instances across all
address families, it can send an EoR marker to trigger earlier termination
of the GR process. This prevents possible traffic black-holing until GR
timer expiry, which could potentially be a minute or so.

At the moment, an empty publish and an empty collection are used to denote
the EoR marker (a small detection sketch follows the examples below).

e.g.

<?xml version="1.0"?>
<iq type="set" from="agent0@vnsw.contrailsystems.com" to="network-control@contrailsystems.com/bgp-peer">
  <pubsub xmlns="http://jabber.org/protocol/pubsub">
    <publish/>
  </pubsub>
</iq>
<?xml version="1.0"?>
<iq type="set" from="agent0@vnsw.contrailsystems.com" to="network-control@contrailsystems.com/bgp-peer">
  <pubsub xmlns="http://jabber.org/protocol/pubsub">
    <collection/>
  </pubsub>
</iq>
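
A minimal sketch of this convention (hypothetical helper, not the actual
XMPP parsing code in BgpXmppChannel): a publish or collection element that
carries no payload is interpreted as the EoR marker, which lets the GR sweep
run immediately instead of waiting for the timer.

#include <cstddef>
#include <string>

// Hypothetical parsed form of the pubsub payload.
struct PubSubNode {
    std::string name;    // "publish" or "collection"
    size_t child_count;  // number of payload children under the element
};

bool IsEorMarker(const PubSubNode &node) {
    return (node.name == "publish" || node.name == "collection") &&
           node.child_count == 0;
}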

Added test cases to verify this part.

-----------------------------------------------------------------

GracefulRestart Phase 4 -- Handle some of the initial review comments

o Increase WAIT_FOR_IDLE time for certain stressful unit tests
o Rename some functions from delete to close where applicable
o Simplify route stale logic in BgpTable::InputCommon()
o Restore the asserts disabled in previous commits
o Stabilize tests
o Add code to support LLGR (nested closures restart GR afresh)

-----------------------------------------------------------------

GracefulRestart Phase 5 -- Handle LLGR cases

o When sessions flap in the GR_TIMER state, cancel the timer and restart GR
  all over again. This is required in order to mark newly sent (partial)
  routes as stale (see the sketch after this list)
o Modify tests to handle the nested closure cases as well
o Eventually, whenever the GR timer fires, routes are retained and swept if
  the session is established, and deleted otherwise
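
A sketch of this LLGR behavior (hypothetical names; the stubs below stand in
for the real table walks and timer plumbing in PeerCloseManager):

// Illustrative only -- not the actual PeerCloseManager state machine.
enum GrState { NONE, GR_TIMER };

struct LlgrSketch {
    GrState state;

    LlgrSketch() : state(NONE) { }

    void OnSessionClose() {
        if (state == GR_TIMER)
            CancelTimer();   // nested closure: discard the running timer
        MarkAllPathsStale(); // re-stale everything, including routes the
                             // restarted agent had already re-sent
        StartTimer();        // restart GR afresh
        state = GR_TIMER;
    }

    void OnTimerExpiry(bool session_established) {
        if (session_established)
            SweepStalePaths();  // keep fresh paths, delete still-stale ones
        else
            DeleteAllPaths();   // no session came back: delete everything
        state = NONE;
    }

    // Stubs for illustration.
    void CancelTimer() { }
    void StartTimer() { }
    void MarkAllPathsStale() { }
    void SweepStalePaths() { }
    void DeleteAllPaths() { }
};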

-----------------------------------------------------------------

Change-Id: Ie589d69b6390356d4a052cc4415bff4b5dabd499
Partial-Bug: #1537933
ananth-at-camphor-networks committed Mar 2, 2016
1 parent 4a81880 commit 1b62a30
Showing 32 changed files with 2,366 additions and 562 deletions.
4 changes: 4 additions & 0 deletions ci_unittests.json
@@ -53,6 +53,9 @@
 {
     "tuples" : [
         "NO_HEAPCHECK=TRUE",
+        "TASK_UTIL_DEFAULT_RETRY_COUNT=6000",
+        "TASK_UTIL_WAIT_TIME=10000",
+        "WAIT_FOR_IDLE=60",
         "LOG_DISABLE=TRUE"
     ],
     "tests" : [
@@ -71,6 +74,7 @@
         ".*/bgp/test/bgp_xmpp_inet6vpn_test$",
         ".*/bgp/test/bgp_xmpp_mcast_test$",
         ".*/bgp/test/bgp_xmpp_rtarget_test$",
+        ".*/bgp/test/graceful_restart_test$",
         ".*/bgp/test/service_chain_test",
         ".*/bgp/test/svc_static_route_intergration_test",
         ".*/bgp/test/xmpp_ecmp_test$"
4 changes: 2 additions & 2 deletions src/base/test/task_test_util.h
@@ -59,8 +59,8 @@ do { \
 #define TASK_UTIL_WAIT_MSG(cnt, expected, actual, wait, type, msg) \
 do { \
     ostream << __FILE__ << ":" << __FUNCTION__ << "():" << __LINE__; \
-    ostream << ": " << msg << ": Waiting for " << actual << type; \
-    ostream << expected << "\n"; \
+    ostream << ": " << msg << ": Waiting for " << (actual) << type; \
+    ostream << (expected) << "\n"; \
     log4cplus::Logger logger = log4cplus::Logger::getRoot(); \
     LOG4CPLUS_DEBUG(logger, ostream.str()); \
 } while (false)
66 changes: 24 additions & 42 deletions src/bgp/bgp_peer.cc
@@ -49,67 +49,45 @@ class BgpPeer::PeerClose : public IPeerClose {
         return peer_->ToString();
     }

+    // If the peer is deleted or administratively held down, do not attempt
+    // graceful restart
     virtual bool IsCloseGraceful() {
-
-        //
-        // If the peer is deleted or administratively held down, do not attempt
-        // graceful restart
-        //
-        if (peer_->IsDeleted() || peer_->IsAdminDown()) return false;
-
+        if (peer_->IsDeleted() || peer_->IsAdminDown())
+            return false;
         return peer_->server()->IsPeerCloseGraceful();
     }

-    virtual void CustomClose() {
-        return peer_->CustomClose();
-    }
+    virtual void CustomClose() { return peer_->CustomClose(); }
+    virtual void GracefulRestartStale() { }
+    virtual void GracefulRestartSweep() { }

-    // CloseComplete
-    //
-    // Close process for this peer is complete. Restart the state machine and
-    // attempt to bring up session with the neighbor
-    //
-    virtual bool CloseComplete(bool from_timer, bool gr_cancelled) {
+    virtual void CloseComplete() {
+        peer_->server()->decrement_closing_count();
+        if (!peer_->IsAdminDown())
+            peer_->state_machine_->Initialize();
+    }
+    virtual void Delete() {
         if (!peer_->IsDeleted()) {
-
-            //
-            // If this closure is off graceful restart timer, nothing else to
-            // do as we retain the peer based on the configuration
-            //
-            if (from_timer) return false;
-
-            //
-            // Reset peer's state machine
-            //
-            if (!peer_->IsAdminDown()) peer_->state_machine_->Initialize();
-
-            return false;
+            CloseComplete();
+            return;
         }

-        //
-        // This peer is deleted. Timer should have already been cancelled
-        //
-        assert(!from_timer);
-
+        peer_->server()->decrement_closing_count();
         peer_->deleter()->RetryDelete();
         is_closed_ = true;
-        return true;
     }

-    bool IsClosed() const {
-        return is_closed_;
-    }
-
-    virtual PeerCloseManager *close_manager() {
-        return manager_.get();
-    }
+    bool IsClosed() const { return is_closed_; }
+    virtual PeerCloseManager *close_manager() { return manager_.get(); }

     void Close() {
-        if (!is_closed_ && !manager_->IsCloseInProgress()) {
-            manager_->Close();
+        if (!manager_->IsCloseInProgress())
             peer_->server()->increment_closing_count();
-        }
+        manager_->Close();
     }

 private:
@@ -1659,6 +1637,10 @@ static void FillSocketStats(const IPeerDebugStats::SocketStats &socket_stats,
     }
 }

+void BgpPeer::FillCloseInfo(BgpNeighborResp *resp) const {
+    peer_close_->close_manager()->FillCloseInfo(resp);
+}
+
 void BgpPeer::FillBgpNeighborDebugState(BgpNeighborResp *bnr,
     const IPeerDebugStats *peer_state) {
     bnr->set_last_state(peer_state->last_state());
@@ -1722,7 +1704,7 @@ void BgpPeer::FillNeighborInfo(const BgpSandeshContext *bsc,
     bnr->set_instance_name(rtinstance_->name());
     bnr->set_peer(peer_basename_);
     bnr->set_deleted(IsDeleted());
-    bnr->set_deleted_at(UTCUsecToString(deleter_->delete_time_stamp_usecs()));
+    bnr->set_closed_at(UTCUsecToString(deleter_->delete_time_stamp_usecs()));
     bnr->set_admin_down(admin_down_);
     bnr->set_passive(passive_);
     bnr->set_peer_address(peer_address_string());
1 change: 1 addition & 0 deletions src/bgp/bgp_peer.h
@@ -336,6 +336,7 @@ class BgpPeer : public IPeer {
     void CustomClose();

     void FillBgpNeighborFamilyAttributes(BgpNeighborResp *nbr) const;
+    void FillCloseInfo(BgpNeighborResp *resp) const;

     std::string BytesToHexString(const u_int8_t *msg, size_t size);

18 changes: 17 additions & 1 deletion src/bgp/bgp_peer.sandesh
@@ -41,11 +41,26 @@ struct ShowBgpNeighborFamily {
     3: u32 prefix_limit;
 }

+struct PeerCloseInfo {
+    1: string state;
+    2: bool close_again;
+    3: u64 init;
+    4: u64 close;
+    5: u64 nested;
+    6: u64 deletes;
+    7: u64 stale;
+    8: u64 sweep;
+    9: u64 gr_timer;
+    10: u64 deleted_state_paths;
+    11: u64 deleted_paths;
+    12: u64 marked_state_paths;
+}
+
 struct BgpNeighborResp {
     53: string instance_name;
     1: string peer (link="BgpNeighborReq"); // Peer name
     36: bool deleted; // Deletion in progress
-    43: string deleted_at;
+    43: string closed_at;
     47: bool admin_down;
     48: bool passive;
     2: string peer_address (link="BgpNeighborReq");
@@ -60,6 +75,7 @@
     4: string local_address; // local ip address and port
     26: string local_id;
     5: u32 local_asn;
+    54: optional PeerCloseInfo peer_close_info;
     9: optional string send_state; // in sync/not in sync
     10: optional string last_event;
     11: optional string last_state;
