From 156ad0b760f9b532572116d813d7afa695555bea Mon Sep 17 00:00:00 2001
From: Atul Moghe
Date: Mon, 21 Dec 2015 14:29:14 -0800
Subject: [PATCH] Cherry pick controller commits from R2.20 to R2.22.x

updating version.info from 2.22 to 2.23 in 2.20 branch
Closes-Bug: #1528370
Change-Id: Ic649422979a926cc5f5b8457c01610b848dc206b

Storage stats daemon fix
Partial-Bug: #1528327
Fixed latency monitor code based on the Ceph 0.94.3 version.
Fixed issues in OSD throughput/IOPS calculation.
Updated code based on the latest Sandesh APIs.
Change-Id: I12caf951f84c8b213b1b5ec01371bb68b4c48cb3

Fix contrail-collector back pressure mechanism
The contrail-collector DB queue back pressure mechanism was not working: the DB drop level is initialized to INVALID and the watermark levels are also INVALID, so the defer/undefer callbacks were never called.
Change-Id: Ib28141a69aeed3c4ad6f50abbaed2a285e3e7db2
Partial-Bug: #1528380

Fix Agent crash for flow index tree management
Issue:
------
During a flow index change, vrouter-agent triggers a delete on the index tree using the new flow handle instead of the currently held flow_handle, resulting in the flow entry getting associated with two slots in the flow index tree. A later flow entry delete due to aging or eviction then never releases the slot for the old flow handle, causing failures for further insertions into the flow index tree.
Fix:
----
Avoid taking the flow handle as an argument to DeleteByIndex and use the currently associated flow_handle to remove the entry from the tree.
Add an assert in DeleteByIndex to catch delete failures.
Avoid deleting from the index tree in code paths other than flow entry index update or flow entry delete.
Add logic for KSync Sock User to mock vrouter behavior, returning the index for an entry if it is already allocated instead of allocating a new one.
Closes-Bug: 1527425
Change-Id: I10e77fb59650acfdd924a5f1d35d6b8dea03a3f0

Fix discovery dependency issue.
Originally made in master branch via https://review.opencontrail.org/#/c/15749
Change-Id: I5d874de3714074c66fa73bfd7c9119772dc681fd
Partial-Bug: #1530186

Avoid calling get_routing_instances on VN object
Calling get_routing_instances could trigger another read of the VN if the VN has no routing instance. This is not only inefficient, but could also cause an exception if the VN has disappeared. We can avoid this by calling getattr.
Change-Id: Ie5500585b9e6c578576276c2c04ec03f32c75112
Partial-Bug: 1528950

Fix CentOS 6.5 agent compilation issues.
Closes-Bug: #1532159
Change-Id: Ia8b77619c80737000d5bd949534c9e0a16967359

Closes-Bug: #1524063, contrail-status is showing contrail-web-ui even when it is not configured, in case of SMLite
Change-Id: I55afc19140b1ce52b3b529a644124705de5ce6a8

Fix a corner case with routing instance delete
Sequence of events that causes the crash:
1. Static route config deleted
2. Static Route manager triggers resolve_trigger_ to re-evaluate static route config
3. Before the resolve trigger is invoked, the routing instance is deleted
The resolve trigger calls ProcessStaticRouteConfig to apply any pending static route config, and ProcessStaticRouteConfig accesses the NULL config pointer of the routing instance.
Fix:
1. Check whether the routing instance is deleted in ProcessStaticRouteConfig
2. Reset the resolve_trigger_ in the StaticRouteMgr destructor
3. Add an API to disable resolve_trigger_ and add a UT to test delayed processing of resolve_trigger_
Change-Id: Icb1b9bad340ccefc9fbab75188034ade79a6193a
Closes-bug: #1533435
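As an illustration of the getattr pattern in the get_routing_instances change above, here is a hypothetical, minimal sketch (VirtualNetworkStub and its behavior are made-up stand-ins for the vnc_api object, not the real class):

    class VirtualNetworkStub(object):
        """Made-up stand-in for a vnc_api VirtualNetwork object."""
        def get_routing_instances(self):
            # The generated accessor may re-read the VN from the API server
            # when the backref list is empty, and raises if the VN is gone.
            raise Exception("simulated read of a deleted VN")

    vn = VirtualNetworkStub()
    # getattr only inspects what the object already holds locally, so a VN
    # with no routing instances yields an empty iteration -- no extra read
    # and no exception.
    for ri in getattr(vn, 'routing_instances', None) or []:
        print(ri['to'][-1])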
Fix scons failure due to pip upgrade
Closes-Bug: 1536541
Change-Id: I5138f6fb7d073beddafc9b3c686ef968f21e7e31

Fix uninitialized vrouter params
Issue:
------
An uninitialized variable results in a bad calculation of the number of MPLS labels for multicast.
Fix:
----
Initialize vrouter params to 0 and handle the case where vrouter_max_labels is less than the fixed unicast range.
Conflicts: src/vnsw/agent/cmn/agent.cc
Closes-Bug: 1535735
Change-Id: Id4a7b74f12728e78dcb5b8b24d0848e560ff0138
(cherry picked from commit eb0488b9280871229122e71692c12fe92364fa8f)

Update task policy for contrail-dns.
When the XMPP channel with an agent closes, the records exported from it are removed in the cleaner task context. Update the task policy to ensure the config and bind tasks do not run in parallel with it.
Change-Id: Ic92f147060c39e452742aa67ace7e3a6f3ddc5a3
closes-bug: 1533811

Fix corner case in join processing
This problem can happen when a number of vRouter agents restart in quick succession, subscribe to a table and advertise routes into the table. Consider the following sequence of events on a route in foo.inet.0:
- Route is added with a path with attribute A and nexthop N1
- XMPP peer P1 subscribes to the table
- As a result a RouteUpdate is created in QBULK (the join queue). The RouteUpdate has an UpdateInfo with a bitset P1 and a RibOutAttr with attribute A and nexthop N1
- Another path for the same route is added with nexthop N2
- This new path is ecmp eligible, i.e. has the same local preference as the best path
- XMPP peer P2 subscribes to the table
- As a result the existing RouteUpdate in QBULK gets a new UpdateInfo. A new UpdateInfo is created because the RibOutAttr for P2 has attribute A and nexthops (N1, N2). Note that a different UpdateInfo is required for a different RibOutAttr. This UpdateInfo has a bitset P2.
Notice that we now have a RouteUpdate with 2 UpdateInfos that have the same attribute A, but different RibOutAttrs (by virtue of having different forwarding nexthops).
The UpdateQueue maintains a set (attr_set_ of type UpdatesByAttr) of UpdateInfos keyed by BgpAttr and timestamp. The label/nexthops in the RibOutAttr are not included in the key, to achieve optimal packing of bgp updates by attribute. As a result, both UpdateInfos are inserted (or rather attempted to be inserted) into attr_set_ with the same key. This causes a crash when we later try to erase both UpdateInfos from the attr_set_ when doing export processing for the route.
Note that we run into this case only if join processing for P2 happens before export processing for the route after the 2nd path got added. If export processing happens before the join for P2, the RouteUpdate would move from QBULK to QUPDATE and would have only 1 UpdateInfo with attribute A and nexthops (N1, N2).
The fix consists of 2 parts:
1. Use the UpdateInfo pointer itself as the final tie-breaker in the key for UpdateQueue::UpdatesByAttr to ensure we have no duplicates.
2. When traversing the UpdatesByAttr set to build update messages, fix UpdateQueue::AttrNext to not return an UpdateInfo for the same RouteUpdate as the current UpdateInfo. Returning such an UpdateInfo would invalidate the locking design in RibOutUpdates and result in a deadlock.
Add unit tests to recreate the above scenario and verify the fix.
Change-Id: I45ce1bbd72d8b6a163a5aa61358491cc1d3f6a93
Closes-Bug: 1536729
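The duplicate-key collision above can be sketched in a few lines of Python (plain tuples and id() are stand-ins for the BgpAttr/RouteUpdate pointers and the C++ set ordering; this is not the actual intrusive container):

    # Two UpdateInfos for the same RouteUpdate with the same attribute A
    # and timestamp, differing only in label -- which is not in the key.
    u1 = {'attr': 'A', 'tstamp': 1, 'update': 'rt1', 'label': 100}
    u2 = {'attr': 'A', 'tstamp': 1, 'update': 'rt1', 'label': 200}

    old_key = lambda u: (u['attr'], u['tstamp'], u['update'])
    assert len({old_key(u1), old_key(u2)}) == 1   # u2 collides with u1

    # With the UpdateInfo identity as the final tie-breaker (id() plays
    # the role of the pointer comparison), both entries stay distinct.
    new_key = lambda u: (u['attr'], u['tstamp'], u['update'], id(u))
    assert len({new_key(u1), new_key(u2)}) == 2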
Create only 1 ksync-socket
We were creating as many ksync sockets as the number of TBB threads even though we use only the first socket for all operations. Modified the code to create only one ksync socket.
Change-Id: I2f1bf8558c219fc97402f8192c3d9d6cebacaf98
Fixes-Bug: #1533495

Fix ToR agent crash for duplicate VxLAN-ID
Issue:
------
The ToR agent does not program/expect QFX to have two logical switches with the same VxLAN ID at any point of time. Observing the same in certain negative test scenarios does not allow the ToR agent to recover the OVSDB database to a sane state.
Fix:
----
Allow creation of a stale entry with a duplicate VxLAN ID. Even though this is not expected, allowing the creation helps recover the OVSDB database to a sane state via deletion of the stale entry on stale entry timeout.
Closes-Bug: 1535093
Change-Id: I32a4fbab665f433d6a5dae7eb185be8e50de53d0
(cherry picked from commit c1cfe79983da1b3391aef4205deb6e723be449c7)

Handle VMs with whitespace in the name
The vrouter-port-control script fails when there is whitespace in the VM name (in fact, in any of the arguments). This patch adds a regex-based split to vrouter-port-control so that it passes arguments containing whitespace correctly.
Change-Id: Ibf52dc23321d1c4c7f231cb5cd386afa495de0aa
Fixes-Bug: #1519768
Signed-off-by: hkumarmk

* Add exclusion between flow table and flow stats collector
The reference to a flow entry can be released by the flow stats collector, resulting in the flow being deleted from the flow tree and in parallel modification of the flow tree from the flow table and flow stats collector contexts. Fixing the same.
Closes-bug: #1535040
Change-Id: I8e7c18aaacbe1ed16639917dc51480af55b2da86
---
 src/analytics/db_handler.cc | 7 +-
 src/api-lib/tools/install_venv_common.py | 2 +-
 src/base/version.info | 2 +-
 src/bgp/bgp_update_queue.cc | 9 +
 src/bgp/bgp_update_queue.h | 14 +-
 src/bgp/routing-instance/static_route.cc | 2 +
 src/bgp/routing-instance/static_route.h | 4 +
 src/bgp/test/bgp_export_rtupdate_test.cc | 165 ++++++++++-
 src/bgp/test/bgp_export_test.h | 70 ++++-
 src/bgp/test/static_route_test.cc | 142 ++++++++++
 src/config/schema-transformer/to_bgp.py | 2 +-
 src/config/utils/contrail-status.py | 7 +-
 src/discovery/SConscript | 9 +
 src/discovery/requirements.txt | 2 -
 src/discovery/test-requirements.txt | 6 +-
 src/dns/cmn/dns.cc | 3 +
 src/ksync/SConscript | 4 +
 src/ksync/ksync_sock.cc | 12 +-
 src/ksync/ksync_sock_user.cc | 26 +-
 src/ksync/ksync_sock_user.h | 17 ++
 .../stats_daemon/storage_nodemgr.py | 262 ++++++++++--------
 src/vnsw/agent/cmn/agent.cc | 1 +
 src/vnsw/agent/controller/controller_init.cc | 2 +-
 .../ovsdb_client/logical_switch_ovsdb.cc | 14 +-
 .../ovsdb_client/logical_switch_ovsdb.h | 1 +
 src/vnsw/agent/pkt/flow_table.cc | 16 +-
 src/vnsw/agent/pkt/flow_table.h | 2 +-
 src/vnsw/agent/port_ipc/vrouter-port-control | 5 +-
 src/vnsw/agent/vrouter/ksync/ksync_init.cc | 2 +-
 29 files changed, 657 insertions(+), 153 deletions(-)

diff --git a/src/analytics/db_handler.cc b/src/analytics/db_handler.cc index b14a6a7ce05..bc6d6a65d33 100644 --- a/src/analytics/db_handler.cc +++ b/src/analytics/db_handler.cc @@ -119,9 +119,10 @@ void DbHandler::SetDropLevel(size_t queue_count, SandeshLevel::type level, Sandesh::LevelToString(level) << "], DB QUEUE COUNT: " << queue_count); drop_level_ = level; - if (!cb.empty()) { - cb(); - } + } + // Always invoke the callback + if (!cb.empty()) { + cb(); } } diff --git a/src/api-lib/tools/install_venv_common.py b/src/api-lib/tools/install_venv_common.py index
469eed0ae13..d767ae2403c 100644 --- a/src/api-lib/tools/install_venv_common.py +++ b/src/api-lib/tools/install_venv_common.py @@ -122,7 +122,7 @@ def install_dependencies(self, find_links): # First things first, make sure our venv has the latest pip and # setuptools and pbr - self.pip_install(find_links, 'pip>=1.4') + self.pip_install(find_links, 'pip<=7.1') self.pip_install(find_links, 'setuptools') self.pip_install(find_links, 'pbr') diff --git a/src/base/version.info b/src/base/version.info index 4699fb07e80..aac433c1a6b 100644 --- a/src/base/version.info +++ b/src/base/version.info @@ -1 +1 @@ -2.22 +2.23 diff --git a/src/bgp/bgp_update_queue.cc b/src/bgp/bgp_update_queue.cc index 4ecedda64cd..e8101399f14 100644 --- a/src/bgp/bgp_update_queue.cc +++ b/src/bgp/bgp_update_queue.cc @@ -115,6 +115,12 @@ void UpdateQueue::AttrDequeue(UpdateInfo *current_uinfo) { // // Returns NULL if there are no more updates with the same BgpAttr. // +// Also return NULL if the next UpdateInfo is for the same RouteUpdate. This +// can happen in corner cases where the label (or the set for ecmp nexthops +// in case of an XMPP ribout) for a route changes between Join operations for +// 2 different sets of IPeerUpdates. Returning such an UpdateInfo breaks the +// locking design in RibOutUpdates and results in a deadlock. +// UpdateInfo *UpdateQueue::AttrNext(UpdateInfo *current_uinfo) { tbb::mutex::scoped_lock lock(mutex_); UpdatesByAttr::iterator iter = attr_set_.iterator_to(*current_uinfo); @@ -123,6 +129,9 @@ UpdateInfo *UpdateQueue::AttrNext(UpdateInfo *current_uinfo) { return NULL; } UpdateInfo *next_uinfo = iter.operator->(); + if (next_uinfo->update == current_uinfo->update) { + return NULL; + } if (next_uinfo->roattr.attr() == current_uinfo->roattr.attr()) { return next_uinfo; } diff --git a/src/bgp/bgp_update_queue.h b/src/bgp/bgp_update_queue.h index 897b48438f6..c94030e4521 100644 --- a/src/bgp/bgp_update_queue.h +++ b/src/bgp/bgp_update_queue.h @@ -18,6 +18,12 @@ // Looks at the BgpAttr, Timestamp and the associated RouteUpdate but not // the Label, in order to achieve optimal packing of BGP updates. // +// Compare the UpdateInfo pointers themselves as the final tie-breaker to +// handle the case where there are 2 UpdateInfos with the same BgpAttr in +// the same RouteUpdate. This can happen if the label (or the set for ecmp +// nexthops in case of an XMPP ribout) for a route changes between the Join +// operations for 2 different sets of IPeerUpdates. 
+// struct UpdateByAttrCmp { bool operator()(const UpdateInfo &lhs, const UpdateInfo &rhs) const { if (lhs.roattr.attr() < rhs.roattr.attr()) { @@ -32,7 +38,13 @@ struct UpdateByAttrCmp { if (lhs.update->tstamp() > rhs.update->tstamp()) { return false; } - return (lhs.update < rhs.update); + if (lhs.update < rhs.update) { + return true; + } + if (lhs.update > rhs.update) { + return false; + } + return (&lhs < &rhs); } }; diff --git a/src/bgp/routing-instance/static_route.cc b/src/bgp/routing-instance/static_route.cc index 3020f0bb322..55e975a13cd 100644 --- a/src/bgp/routing-instance/static_route.cc +++ b/src/bgp/routing-instance/static_route.cc @@ -676,6 +676,7 @@ void StaticRouteMgr::RemoveStaticRoutePrefix(const Ip4Prefix &static_route) { void StaticRouteMgr::ProcessStaticRouteConfig() { CHECK_CONCURRENCY("bgp::Config"); + if (routing_instance()->deleted() || !routing_instance()->config()) return; const BgpInstanceConfig::StaticRouteList &list = routing_instance()->config()->static_routes(); typedef BgpInstanceConfig::StaticRouteList::const_iterator iterator_t; @@ -690,6 +691,7 @@ bool StaticRouteMgr::ResolvePendingStaticRouteConfig() { } StaticRouteMgr::~StaticRouteMgr() { + resolve_trigger_->Reset(); if (static_route_queue_) delete static_route_queue_; } diff --git a/src/bgp/routing-instance/static_route.h b/src/bgp/routing-instance/static_route.h index c04efa6bae9..39e18f071b7 100644 --- a/src/bgp/routing-instance/static_route.h +++ b/src/bgp/routing-instance/static_route.h @@ -75,6 +75,10 @@ class StaticRouteMgr { RoutingInstance *instance_; BgpConditionListener *listener_; StaticRouteMap static_route_map_; + + void DisableResolveTrigger() { resolve_trigger_->set_disable(); } + void EnableResolveTrigger() { resolve_trigger_->set_enable(); } + void DisableQueue() { static_route_queue_->set_disable(true); } void EnableQueue() { static_route_queue_->set_disable(false); } bool IsQueueEmpty() { return static_route_queue_->IsQueueEmpty(); } diff --git a/src/bgp/test/bgp_export_rtupdate_test.cc b/src/bgp/test/bgp_export_rtupdate_test.cc index ee85d09ce6f..8b96ff1ecb7 100644 --- a/src/bgp/test/bgp_export_rtupdate_test.cc +++ b/src/bgp/test/bgp_export_rtupdate_test.cc @@ -15,7 +15,11 @@ using namespace std; class BgpExportRouteUpdateCommonTest : public BgpExportTest { protected: - BgpExportRouteUpdateCommonTest() : rt_update_(NULL), count_(0) { + BgpExportRouteUpdateCommonTest() + : qid_(RibOutUpdates::QUPDATE), + rt_update_(NULL), + tstamp_(0), + count_(0) { } void InitAdvertiseInfo(BgpAttrPtr attrX, int start_idx, int end_idx) { @@ -34,6 +38,12 @@ class BgpExportRouteUpdateCommonTest : public BgpExportTest { InitUpdateInfoCommon(attrX, start_idx, end_idx, uinfo_slist_); } + void InitUpdateInfo(BgpAttrPtr attrX, uint32_t label, + int start_idx, int end_idx, int qid = RibOutUpdates::QUPDATE) { + qid_ = qid; + InitUpdateInfoCommon(attrX, label, start_idx, end_idx, uinfo_slist_); + } + void InitUpdateInfo(BgpAttrPtr attr_blk[], int start_idx, int end_idx, int qid = RibOutUpdates::QUPDATE) { qid_ = qid; @@ -750,6 +760,159 @@ TEST_F(BgpExportRouteUpdateTest1, JoinMerge4) { } } +// +// Description: Join processing for route that already has join pending updates +// for some peers. Export policy accepts the route for other join +// peers with same attribute, but different label. +// Common attribute for initial join peers and new join peers. +// Different labels for initial join peers and new join peers. +// +// Old DBState: RouteUpdate in QBULK. +// No AdvertiseInfo. 
+// UpdateInfo peer x=[vJoinPeerCount,kPeerCount-1], +// attr A + label 100. +// Join Peers: Peers x=[0,vJoinPeerCount-1]. +// Export Rslt: Accept peer x=[0,vJoinPeerCount-1], attr A + label 200. +// New DBState: RouteUpdate in QBULK. +// No AdvertiseInfo. +// UpdateInfo peer x=[vJoinPeerCount,kPeerCount-1], +// attr A + label 100. +// UpdateInfo peer x=[0,vJoinPeerCount-1], attr A + label 200. +// +TEST_F(BgpExportRouteUpdateTest1, JoinMerge5a) { + int qid = RibOutUpdates::QBULK; + for (int vJoinPeerCount = 1; vJoinPeerCount < kPeerCount; + vJoinPeerCount++) { + InitUpdateInfo(attrA_, 100, vJoinPeerCount, kPeerCount-1, qid); + Initialize(); + + RibPeerSet join_peerset; + BuildPeerSet(join_peerset, 0, vJoinPeerCount-1); + BuildExportResult(attrA_, 0, vJoinPeerCount-1, 200); + RunJoin(join_peerset); + table_.VerifyExportResult(true); + + RouteUpdate *rt_update = ExpectRouteUpdate(&rt_, qid); + VerifyRouteUpdateDequeueEnqueue(rt_update); + EXPECT_EQ(rt_update_, rt_update); + VerifyUpdates(rt_update, attrA_, 100, vJoinPeerCount, kPeerCount-1, 2); + VerifyUpdates(rt_update, attrA_, 200, 0, vJoinPeerCount-1, 2); + VerifyHistory(rt_update); + + DrainAndDeleteRouteState(&rt_); + } +} + +// +// Description: Join processing for route that already has join pending updates +// for some peers. Export policy accepts the route for other join +// peers with same attribute, but different label. +// Common attribute for initial join peers and new join peers. +// Different labels for initial join peers and new join peers. +// After this the route is exported to all peers with the same +// attribute and the label. The attribute and label are same as +// that for the new join peers. +// +// Old DBState: RouteUpdate in QBULK. +// No AdvertiseInfo. +// UpdateInfo peer x=[vJoinPeerCount,kPeerCount-1], +// attr A + label 100. +// Join Peers: Peers x=[0,vJoinPeerCount-1]. +// Export Rslt1:Accept peer x=[0,vJoinPeerCount-1], attr A + label 200. +// Export Rslt2:Accept peer x=[0,kPeerCount-1], attr A + label 200. +// New DBState: RouteUpdate in QBULK. +// No AdvertiseInfo. +// UpdateInfo peer x=[0,kPeerCount-1], attr A + label 200. +// +TEST_F(BgpExportRouteUpdateTest1, JoinMerge5b) { + int qid = RibOutUpdates::QBULK; + for (int vJoinPeerCount = 1; vJoinPeerCount < kPeerCount; + vJoinPeerCount++) { + InitUpdateInfo(attrA_, 100, vJoinPeerCount, kPeerCount-1, qid); + Initialize(); + + RibPeerSet join_peerset; + BuildPeerSet(join_peerset, 0, vJoinPeerCount-1); + BuildExportResult(attrA_, 0, vJoinPeerCount-1, 200); + RunJoin(join_peerset); + table_.VerifyExportResult(true); + + RouteUpdate *rt_update = ExpectRouteUpdate(&rt_, qid); + VerifyRouteUpdateDequeueEnqueue(rt_update); + EXPECT_EQ(rt_update_, rt_update); + VerifyUpdates(rt_update, attrA_, 100, vJoinPeerCount, kPeerCount-1, 2); + VerifyUpdates(rt_update, attrA_, 200, 0, vJoinPeerCount-1, 2); + VerifyHistory(rt_update); + + BuildExportResult(attrA_, 0, kPeerCount-1, 200); + RunExport(); + table_.VerifyExportResult(true); + + rt_update = ExpectRouteUpdate(&rt_, RibOutUpdates::QUPDATE); + VerifyRouteUpdateDequeueEnqueue(rt_update); + EXPECT_EQ(rt_update_, rt_update); + VerifyUpdates(rt_update, attrA_, 200, 0, kPeerCount-1); + VerifyHistory(rt_update); + + DrainAndDeleteRouteState(&rt_); + } +} + +// +// Description: Join processing for route that already has join pending updates +// for some peers. Export policy accepts the route for other join +// peers with same attribute, but different label. +// Common attribute for initial join peers and new join peers. 
+// Different labels for initial join peers and new join peers. +// After this the route is exported to all peers with the same +// attribute and the label. The attribute and label are different +// than those for both the initial and new join peers. +// +// Old DBState: RouteUpdate in QBULK. +// No AdvertiseInfo. +// UpdateInfo peer x=[vJoinPeerCount,kPeerCount-1], +// attr A + label 100. +// Join Peers: Peers x=[0,vJoinPeerCount-1]. +// Export Rslt1:Accept peer x=[0,vJoinPeerCount-1], attr A + label 200. +// Export Rslt2:Accept peer x=[0,kPeerCount-1], attr B + label 300. +// New DBState: RouteUpdate in QBULK. +// No AdvertiseInfo. +// UpdateInfo peer x=[0,kPeerCount-1], attr B + label 300. +// +TEST_F(BgpExportRouteUpdateTest1, JoinMerge5c) { + int qid = RibOutUpdates::QBULK; + for (int vJoinPeerCount = 1; vJoinPeerCount < kPeerCount; + vJoinPeerCount++) { + InitUpdateInfo(attrA_, 100, vJoinPeerCount, kPeerCount-1, qid); + Initialize(); + + RibPeerSet join_peerset; + BuildPeerSet(join_peerset, 0, vJoinPeerCount-1); + BuildExportResult(attrA_, 0, vJoinPeerCount-1, 200); + RunJoin(join_peerset); + table_.VerifyExportResult(true); + + RouteUpdate *rt_update = ExpectRouteUpdate(&rt_, qid); + VerifyRouteUpdateDequeueEnqueue(rt_update); + EXPECT_EQ(rt_update_, rt_update); + VerifyUpdates(rt_update, attrA_, 100, vJoinPeerCount, kPeerCount-1, 2); + VerifyUpdates(rt_update, attrA_, 200, 0, vJoinPeerCount-1, 2); + VerifyHistory(rt_update); + + BuildExportResult(attrB_, 0, kPeerCount-1, 300); + RunExport(); + table_.VerifyExportResult(true); + + rt_update = ExpectRouteUpdate(&rt_, RibOutUpdates::QUPDATE); + VerifyRouteUpdateDequeueEnqueue(rt_update); + EXPECT_EQ(rt_update_, rt_update); + VerifyUpdates(rt_update, attrB_, 300, 0, kPeerCount-1); + VerifyHistory(rt_update); + + DrainAndDeleteRouteState(&rt_); + } +} + // // Description: Leave processing for route that has scheduled updates or join // pending. 
The leave peerset does not overlap with the peers that diff --git a/src/bgp/test/bgp_export_test.h b/src/bgp/test/bgp_export_test.h index 61a51f7ab8b..c4a270656ed 100644 --- a/src/bgp/test/bgp_export_test.h +++ b/src/bgp/test/bgp_export_test.h @@ -65,7 +65,11 @@ class BgpTestPeer : public IPeerUpdate { class InetTableMock : public InetTable { public: - InetTableMock(DB *db, const std::string &name) : InetTable(db, name) { } + InetTableMock(DB *db, const std::string &name) + : InetTable(db, name), + executed_(false), + reach_(false) { + } virtual ~InetTableMock() { return; } virtual bool Export(RibOut *ribout, Route *route, @@ -156,6 +160,7 @@ class BgpExportTest : public ::testing::Test { protected: typedef std::vector<BgpAttrPtr> BgpAttrVec; + typedef std::vector<uint32_t> LabelVec; typedef std::vector<RibPeerSet *> RibPeerSetVec; BgpExportTest() @@ -281,6 +286,14 @@ class BgpExportTest : public ::testing::Test { peerset_vec.push_back(peerset); } + void BuildVectors(BgpAttrVec &attr_vec, LabelVec &label_vec, + RibPeerSetVec &peerset_vec, BgpAttrPtr attrX, uint32_t label, + RibPeerSet *peerset) { + attr_vec.push_back(attrX); + label_vec.push_back(label); + peerset_vec.push_back(peerset); + } + void BuildVectors(BgpAttrVec &attr_vec, RibPeerSetVec &peerset_vec, BgpAttrPtr attr_blk[], int start_idx, int end_idx) { for (int idx = start_idx; idx <= end_idx; idx++) { @@ -301,6 +314,20 @@ class BgpExportTest : public ::testing::Test { } } + void BuildUpdateInfo(BgpAttrVec *attr_vec, LabelVec *label_vec, + RibPeerSetVec *peerset_vec, UpdateInfoSList &uu_slist) { + ASSERT_EQ(attr_vec->size(), peerset_vec->size()); + for (size_t idx = 0; idx < attr_vec->size(); idx++) { + UpdateInfo *uinfo = new UpdateInfo; + if (attr_vec->at(idx)) { + uinfo->roattr.set_attr(attr_vec->at(idx).get(), + label_vec->at(idx)); + } + uinfo->target = *peerset_vec->at(idx); + uu_slist->push_front(*uinfo); + } + } + void InitUpdateInfoCommon(BgpAttrPtr attrX, int start_idx, int end_idx, UpdateInfoSList &uinfo_slist) { RibPeerSet uu_peerset; @@ -311,6 +338,19 @@ class BgpExportTest : public ::testing::Test { BuildUpdateInfo(&uu_attr_vec, &uu_peerset_vec, uinfo_slist); } + void InitUpdateInfoCommon(BgpAttrPtr attrX, uint32_t label, + int start_idx, int end_idx, UpdateInfoSList &uinfo_slist) { + RibPeerSet uu_peerset; + BgpAttrVec uu_attr_vec; + LabelVec uu_label_vec; + RibPeerSetVec uu_peerset_vec; + BuildPeerSet(uu_peerset, start_idx, end_idx); + BuildVectors(uu_attr_vec, uu_label_vec, uu_peerset_vec, attrX, label, + &uu_peerset); + BuildUpdateInfo(&uu_attr_vec, &uu_label_vec, &uu_peerset_vec, + uinfo_slist); + } + void InitUpdateInfoCommon(BgpAttrPtr attr_blk[], int start_idx, int end_idx, UpdateInfoSList &uinfo_slist) { BgpAttrVec uu_attr_vec; @@ -390,14 +430,23 @@ class BgpExportTest : public ::testing::Test { return uplist; } - void BuildExportResult(BgpAttrPtr attrX, int start_idx, int end_idx) { + void BuildExportResult(BgpAttrPtr attrX, int start_idx, int end_idx, + uint32_t label = 0) { UpdateInfoSList res_slist; RibPeerSet res_peerset; BuildPeerSet(res_peerset, start_idx, end_idx); BgpAttrVec res_attr_vec; RibPeerSetVec res_peerset_vec; - BuildVectors(res_attr_vec, res_peerset_vec, attrX, &res_peerset); - BuildUpdateInfo(&res_attr_vec, &res_peerset_vec, res_slist); + if (label) { + LabelVec res_label_vec; + BuildVectors(res_attr_vec, res_label_vec, res_peerset_vec, + attrX, label, &res_peerset); + BuildUpdateInfo(&res_attr_vec, &res_label_vec, &res_peerset_vec, + res_slist); + } else { + BuildVectors(res_attr_vec, res_peerset_vec, attrX,
&res_peerset); + BuildUpdateInfo(&res_attr_vec, &res_peerset_vec, res_slist); + } table_.SetExportResult(res_slist); } @@ -468,6 +517,19 @@ class BgpExportTest : public ::testing::Test { EXPECT_TRUE(uinfo->target == uinfo_peerset); } + void VerifyUpdates(RouteUpdate *rt_update, BgpAttrPtr attr, uint32_t label, + int start_idx, int end_idx, int count = 1) { + RibPeerSet uinfo_peerset; + BuildPeerSet(uinfo_peerset, start_idx, end_idx); + EXPECT_TRUE(rt_update != NULL); + EXPECT_EQ(count, rt_update->Updates()->size()); + RibOutAttr roattr; + roattr.set_attr(attr, label); + const UpdateInfo *uinfo = rt_update->FindUpdateInfo(roattr); + EXPECT_TRUE(uinfo != NULL); + EXPECT_TRUE(uinfo->target == uinfo_peerset); + } + void VerifyUpdates(RouteUpdate *rt_update, BgpAttrPtr attr_blk[], int start_idx, int end_idx, int count = 0) { if (!count) count = end_idx - start_idx + 1; diff --git a/src/bgp/test/static_route_test.cc b/src/bgp/test/static_route_test.cc index bb61f0aef45..2f169c31a48 100644 --- a/src/bgp/test/static_route_test.cc +++ b/src/bgp/test/static_route_test.cc @@ -153,6 +153,19 @@ class StaticRouteTest : public ::testing::Test { return table->Size(); } + void DisableResolveTrigger(const string &instance_name) { + RoutingInstance *rtinstance = + ri_mgr_->GetRoutingInstance(instance_name); + rtinstance->static_route_mgr()->DisableResolveTrigger(); + } + + void EnableResolveTrigger(const string &instance_name) { + RoutingInstance *rtinstance = + ri_mgr_->GetRoutingInstance(instance_name); + if (rtinstance) + rtinstance->static_route_mgr()->EnableResolveTrigger(); + } + void DisableStaticRouteQ(const string &instance_name) { RoutingInstance *rtinstance = ri_mgr_->GetRoutingInstance(instance_name); @@ -1607,6 +1620,135 @@ TEST_F(StaticRouteTest, DeleteRoutingInstance) { task_util::WaitForIdle(); } +// +// Delete the static route config and instance with resolve_trigger disabled +// Allow the routing instance to get deleted with Resolve trigger +// +TEST_F(StaticRouteTest, DeleteRoutingInstance_DisabledResolveTrigger) { + vector<string> instance_names = list_of("blue")("nat"); + multimap<string, string> connections; + this->NetworkConfig(instance_names, connections); + task_util::WaitForIdle(); + + std::auto_ptr<autogen::StaticRouteEntriesType> params = + this->GetStaticRouteConfig( + "controller/src/bgp/testdata/static_route_1.xml"); + + ifmap_test_util::IFMapMsgPropertyAdd(&this->config_db_, "routing-instance", + "nat", "static-route-entries", params.release(), 0); + task_util::WaitForIdle(); + + // Add Nexthop Route + this->AddInetRoute(NULL, "nat", "192.168.1.254/32", 100, "2.3.4.5"); + task_util::WaitForIdle(); + + // Check for Static route + TASK_UTIL_WAIT_NE_NO_MSG(this->InetRouteLookup("blue", "192.168.1.0/24"), + NULL, 1000, 10000, "Wait for Static route in blue.."); + + // Disable resolve trigger + this->DisableResolveTrigger("nat"); + + // Delete the configuration for the nat instance.
+ ifmap_test_util::IFMapMsgPropertyDelete( + &this->config_db_, "routing-instance", + "nat", "static-route-entries"); + + // Delete nexthop route + this->DeleteInetRoute(NULL, "nat", "192.168.1.254/32"); + task_util::WaitForIdle(); + + // Check for Static route + TASK_UTIL_WAIT_EQ_NO_MSG( + this->InetRouteLookup("blue", "192.168.1.0/24"), + NULL, 1000, 10000, "Wait for Static route in blue.."); + + ifmap_test_util::IFMapMsgUnlink( + &this->config_db_, "routing-instance", "nat", + "virtual-network", "nat", "virtual-network-routing-instance"); + ifmap_test_util::IFMapMsgUnlink(&this->config_db_, "routing-instance", + "nat", "route-target", "target:64496:2", "instance-target"); + ifmap_test_util::IFMapMsgNodeDelete( + &this->config_db_, "virtual-network", "nat"); + ifmap_test_util::IFMapMsgNodeDelete( + &this->config_db_, "routing-instance", "nat"); + ifmap_test_util::IFMapMsgNodeDelete( + &this->config_db_, "route-target", "target:64496:2"); + task_util::WaitForIdle(); + + this->EnableResolveTrigger("nat"); +} + +// +// Delete the static route config and instance with resolve_trigger disabled +// Routing instance is not destroyed when the task trigger is enabled. +// Verify that enabling the task trigger ignores the deleted routing instance +// +TEST_F(StaticRouteTest, DeleteRoutingInstance_DisabledResolveTrigger_1) { + vector<string> instance_names = list_of("blue")("nat"); + multimap<string, string> connections; + this->NetworkConfig(instance_names, connections); + task_util::WaitForIdle(); + + std::auto_ptr<autogen::StaticRouteEntriesType> params = + this->GetStaticRouteConfig( + "controller/src/bgp/testdata/static_route_1.xml"); + + ifmap_test_util::IFMapMsgPropertyAdd(&this->config_db_, "routing-instance", + "nat", "static-route-entries", params.release(), 0); + task_util::WaitForIdle(); + + // Add Nexthop Route + this->AddInetRoute(NULL, "nat", "192.168.1.254/32", 100, "2.3.4.5"); + task_util::WaitForIdle(); + + // Check for Static route + TASK_UTIL_WAIT_NE_NO_MSG(this->InetRouteLookup("blue", "192.168.1.0/24"), + NULL, 1000, 10000, "Wait for Static route in blue.."); + + // Disable resolve trigger + this->DisableResolveTrigger("nat"); + + // Delete the configuration for the nat instance.
+ ifmap_test_util::IFMapMsgPropertyDelete( + &this->config_db_, "routing-instance", + "nat", "static-route-entries"); + + // Check for Static route + TASK_UTIL_WAIT_EQ_NO_MSG( + this->InetRouteLookup("blue", "192.168.1.0/24"), + NULL, 1000, 10000, "Wait for Static route in blue.."); + + // Delete the nat routing instance + ifmap_test_util::IFMapMsgUnlink( + &this->config_db_, "routing-instance", "nat", + "virtual-network", "nat", "virtual-network-routing-instance"); + ifmap_test_util::IFMapMsgUnlink(&this->config_db_, "routing-instance", + "nat", "route-target", "target:64496:2", "instance-target"); + ifmap_test_util::IFMapMsgNodeDelete( + &this->config_db_, "virtual-network", "nat"); + ifmap_test_util::IFMapMsgNodeDelete( + &this->config_db_, "routing-instance", "nat"); + ifmap_test_util::IFMapMsgNodeDelete( + &this->config_db_, "route-target", "target:64496:2"); + task_util::WaitForIdle(); + + RoutingInstance *nat_inst = this->ri_mgr_->GetRoutingInstance("nat"); + TASK_UTIL_WAIT_EQ_NO_MSG(nat_inst->deleted(), + true, 1000, 10000, "Wait for nat instance to be marked deleted"); + // + // Since the nexthop route is not yet deleted, routing instance is + // not destroyed + // + this->EnableResolveTrigger("nat"); + + // Delete nexthop route + this->DeleteInetRoute(NULL, "nat", "192.168.1.254/32"); + task_util::WaitForIdle(); + TASK_UTIL_WAIT_EQ_NO_MSG(this->ri_mgr_->GetRoutingInstance("nat"), + NULL, 1000, 10000, "Wait for nat instance to get destroyed"); +} + // // Add the routing instance that imports the static route after the static // route has already been added. Objective is to check that the static route diff --git a/src/config/schema-transformer/to_bgp.py b/src/config/schema-transformer/to_bgp.py index c9414ad262e..73aa9587848 100644 --- a/src/config/schema-transformer/to_bgp.py +++ b/src/config/schema-transformer/to_bgp.py @@ -214,7 +214,7 @@ def __init__(self, name, obj=None, acl_dict=None, ri_dict=None): if remote_ri_fq_name[-1] == remote_ri_fq_name[-2]: self.connections.add(':'.join(remote_ri_fq_name[0:-1] )) - for ri in self.obj.get_routing_instances() or []: + for ri in getattr(self.obj, 'routing_instances', None) or []: ri_name = ri['to'][-1] if ri_name not in self.rinst: sc_id = self._get_service_id_from_ri(ri_name) diff --git a/src/config/utils/contrail-status.py b/src/config/utils/contrail-status.py index a158b637c6f..d4254b2187c 100755 --- a/src/config/utils/contrail-status.py +++ b/src/config/utils/contrail-status.py @@ -380,7 +380,8 @@ def main(): analytics = package_installed('contrail-analytics') agent = package_installed('contrail-vrouter') capi = package_installed('contrail-config') - cwebui = package_installed('contrail-web-core') + cwebui = package_installed('contrail-web-controller') + cwebstorage = package_installed('contrail-web-storage') database = (package_installed('contrail-openstack-database') or package_installed('contrail-database')) storage = package_installed('contrail-storage') @@ -406,8 +407,8 @@ def main(): if capi: supervisor_status('config', options) - - if cwebui: + + if cwebui or cwebstorage: supervisor_status('webui', options) if database: diff --git a/src/discovery/SConscript b/src/discovery/SConscript index ddb94aaea65..5e6cd6ba4cb 100644 --- a/src/discovery/SConscript +++ b/src/discovery/SConscript @@ -119,6 +119,15 @@ cov_cmd = env.Command('coveragetest.log', sdist_gen, test_cmd = env.Command('test.log', sdist_gen, 'bash -c "set -o pipefail && cd ' + Dir(top_dir).path + ' && python setup.py run_tests 2>&1 | tee test.log"') +test_depends = 
['/config/common/dist/cfgm_common-0.1dev.tar.gz', + '/api-lib/dist/vnc_api-0.1dev.tar.gz', + '/discovery/client/dist/discoveryclient-0.1dev.tar.gz', + '/tools/sandesh/library/python/dist/sandesh-0.1dev.tar.gz', + '/sandesh/common/dist/sandesh-common-0.1dev.tar.gz', + '/config/api-server/dist/vnc_cfg_api_server-0.1dev.tar.gz',] + +env.Depends(test_cmd, [env['TOP']+x for x in test_depends]) +env.Depends(cov_cmd, [env['TOP']+x for x in test_depends]) env.Alias('controller/src/discovery:test', test_cmd) env.Alias('controller/src/discovery:coverage', cov_cmd) diff --git a/src/discovery/requirements.txt b/src/discovery/requirements.txt index c466cd06157..012113b3312 100644 --- a/src/discovery/requirements.txt +++ b/src/discovery/requirements.txt @@ -1,6 +1,4 @@ gevent==1.1a2 pycassa -sandesh -sandesh-common ConfigParser xmltodict diff --git a/src/discovery/test-requirements.txt b/src/discovery/test-requirements.txt index 320ca42a73d..38e72d363ca 100644 --- a/src/discovery/test-requirements.txt +++ b/src/discovery/test-requirements.txt @@ -9,7 +9,9 @@ distribute>=0.7.3 lxml kombu kazoo -vnc_api -cfgm_common python-novaclient +cfgm_common +vnc_api discoveryclient +sandesh +sandesh-common diff --git a/src/dns/cmn/dns.cc b/src/dns/cmn/dns.cc index 9cdc9ba871b..ac5c2550916 100644 --- a/src/dns/cmn/dns.cc +++ b/src/dns/cmn/dns.cc @@ -81,6 +81,9 @@ void Dns::SetTaskSchedulingPolicy() { exclude_io); const char *garbage_exclude_list[] = { + "dns::Config", + "dns::BindStatus", + "db::DBTable", "bgp::Config", "xmpp::StateMachine", }; diff --git a/src/ksync/SConscript b/src/ksync/SConscript index 61ab721b6c1..f49a8549af6 100644 --- a/src/ksync/SConscript +++ b/src/ksync/SConscript @@ -8,10 +8,14 @@ Import('BuildEnv') env = BuildEnv.Clone() env.Append(CPPPATH = '#vrouter/include') +env.Append(CPPPATH = env['TOP']) env.Append(CPPPATH = env['TOP'] + '/vrouter/sandesh') env.Append(CPPPATH = env['TOP'] + '/vnsw/agent') +env.Append(CPPPATH = env['TOP'] + '/vnsw/agent/cmn/') +env.Append(CPPPATH = env['TOP'] + '/vnsw/agent/oper/') env.Append(CPPPATH = env['TOP'] + '/sandesh') env.Append(CPPPATH = env['TOP'] + '/ksync') +env.Append(CPPPATH = env['TOP'] + '/schema') env.CppEnableExceptions() diff --git a/src/ksync/ksync_sock.cc b/src/ksync/ksync_sock.cc index 0ed3c1ec37f..c68318de707 100644 --- a/src/ksync/ksync_sock.cc +++ b/src/ksync/ksync_sock.cc @@ -218,7 +218,8 @@ void KSyncSockNetlink::Receive(mutable_buffers_1 buf) { //Udp socket class for interacting with kernel KSyncSockUdp::KSyncSockUdp(boost::asio::io_service &ios, int port) - : sock_(ios, ip::udp::endpoint(ip::udp::v4(), 0)), server_ep_(ip::address::from_string("127.0.0.1"), port) { + : sock_(ios, boost::asio::ip::udp::endpoint(boost::asio::ip::udp::v4(), 0)), + server_ep_(boost::asio::ip::address::from_string("127.0.0.1"), port) { //sock_.open(ip::udp::v4()); } @@ -285,18 +286,18 @@ size_t KSyncSockUdp::SendTo(const char *data, uint32_t data_len, } void KSyncSockUdp::AsyncReceive(mutable_buffers_1 buf, HandlerCb cb) { - ip::udp::endpoint ep; + boost::asio::ip::udp::endpoint ep; sock_.async_receive_from(buf, ep, cb); } void KSyncSockUdp::Receive(mutable_buffers_1 buf) { - ip::udp::endpoint ep; + boost::asio::ip::udp::endpoint ep; sock_.receive_from(buf, ep); } //TCP socket class for interacting with vrouter KSyncSockTcp::KSyncSockTcp(EventManager *evm, - ip::address ip_address, int port) : TcpServer(evm), evm_(evm), + boost::asio::ip::address ip_address, int port) : TcpServer(evm), evm_(evm), session_(NULL), server_ep_(ip_address, port), 
connect_complete_(false) { session_ = CreateSession(); Connect(session_, server_ep_); @@ -626,7 +627,8 @@ void KSyncSockUdp::Init(io_service &ios, int count, int port) { } } -void KSyncSockTcp::Init(EventManager *evm, int count, ip::address ip_addr, +void KSyncSockTcp::Init(EventManager *evm, int count, + boost::asio::ip::address ip_addr, int port) { KSyncSock::Init(count); SetNetlinkFamilyId(10); diff --git a/src/ksync/ksync_sock_user.cc b/src/ksync/ksync_sock_user.cc index 404e7686ef1..52d07687289 100644 --- a/src/ksync/ksync_sock_user.cc +++ b/src/ksync/ksync_sock_user.cc @@ -614,9 +614,9 @@ void KSyncSockTypeMap::Init(boost::asio::io_service &ios, int count) { assert(singleton_ == NULL); singleton_ = new KSyncSockTypeMap(ios); - singleton_->local_ep_.address(ip::address::from_string("127.0.0.1")); + singleton_->local_ep_.address(boost::asio::ip::address::from_string("127.0.0.1")); singleton_->local_ep_.port(0); - singleton_->sock_.open(ip::udp::v4()); + singleton_->sock_.open(boost::asio::ip::udp::v4()); singleton_->sock_.bind(singleton_->local_ep_); singleton_->local_ep_ = singleton_->sock_.local_endpoint(); @@ -689,24 +689,40 @@ void KSyncUserSockFlowContext::Process() { uint16_t flags = 0; int flow_error = sock->GetKSyncError(KSyncSockTypeMap::KSYNC_FLOW_ENTRY_TYPE); + FlowKey key(req_->get_fr_flow_nh_id(), + Ip4Address(ntohl(req_->get_fr_flow_sip())), + Ip4Address(ntohl(req_->get_fr_flow_dip())), + req_->get_fr_flow_proto(), + ntohs(req_->get_fr_flow_sport()), + ntohs(req_->get_fr_flow_dport())); + flags = req_->get_fr_flags(); //delete from map if command is delete if (!flags) { sock->flow_map.erase(req_->get_fr_index()); + sock->flow_index_map.erase(key); //Deactivate the flow-entry in flow mmap KSyncSockTypeMap::SetFlowEntry(req_, false); } else { /* Send reverse-flow index as one more than fwd-flow index */ uint32_t fwd_flow_idx = req_->get_fr_index(); if (fwd_flow_idx == 0xFFFFFFFF) { - if (flow_error == 0) { - /* Allocate entry only of no error case */ - fwd_flow_idx = rand() % 50000; + KSyncSockTypeMap::FlowIndexMap::iterator it = + sock->flow_index_map.find(key); + if (it != sock->flow_index_map.end()) { + fwd_flow_idx = it->second; req_->set_fr_index(fwd_flow_idx); + } else { + if (flow_error == 0) { + /* Allocate entry only of no error case */ + fwd_flow_idx = rand() % 50000; + req_->set_fr_index(fwd_flow_idx); + } } } if (fwd_flow_idx != 0xFFFFFFFF) { + sock->flow_index_map[key] = fwd_flow_idx; //store info from binary sandesh message vr_flow_req flow_info(*req_); diff --git a/src/ksync/ksync_sock_user.h b/src/ksync/ksync_sock_user.h index 5f7b8b09291..f226e2d0af6 100644 --- a/src/ksync/ksync_sock_user.h +++ b/src/ksync/ksync_sock_user.h @@ -12,6 +12,7 @@ #include #include +#include "pkt/flow_table.h" #include "ksync_sock.h" #include "vr_types.h" @@ -92,6 +93,7 @@ class KSyncSockTypeMap : public KSyncSock { ~KSyncSockTypeMap() { assert(nh_map.size() == 0); assert(flow_map.size() == 0); + assert(flow_index_map.size() == 0); assert(if_map.size() == 0); assert(rt_tree.size() == 0); assert(mpls_map.size() == 0); @@ -105,6 +107,21 @@ class KSyncSockTypeMap : public KSyncSock { ksync_map_nh nh_map; typedef boost::unordered_map ksync_map_flow; ksync_map_flow flow_map; + + struct FlowKeyCmp { + bool operator() (const FlowKey &a, const FlowKey &b) const { + return a.IsLess(b); + } + }; + + // map to maintain flow key to index association to mimic + // vrouter behavior of returning already allocated index + // on request for index instead of allocating a new index + // which can 
result in pending flow entries in test case + // as a false failure + typedef std::map<FlowKey, uint32_t, FlowKeyCmp> FlowIndexMap; + FlowIndexMap flow_index_map; + typedef std::map ksync_map_if; ksync_map_if if_map; typedef std::set ksync_rt_tree; diff --git a/src/storage/stats-daemon/stats_daemon/storage_nodemgr.py b/src/storage/stats-daemon/stats_daemon/storage_nodemgr.py index 1be38947c05..6e79dec13f0 100644 --- a/src/storage/stats-daemon/stats_daemon/storage_nodemgr.py +++ b/src/storage/stats-daemon/stats_daemon/storage_nodemgr.py @@ -7,7 +7,8 @@ """ import sys from gevent import monkey -monkey.patch_all(thread=not 'unittest' in sys.modules) +from gevent.subprocess import Popen, PIPE +monkey.patch_all() import os import glob import socket @@ -101,8 +102,8 @@ def __init__(self, node_type): self.call_subprocess(pattern) def exec_local(self, arg): - ret = subprocess.Popen('%s' %(arg), shell=True, - stdout=subprocess.PIPE).stdout.read() + ret = Popen('%s' %(arg), shell=True, + stdout=PIPE).stdout.read() ret = ret[:-1] return ret @@ -117,12 +118,11 @@ def init_units(self): ''' This function is a wrapper for subprocess call. Timeout functionality - is used to timeout after 3 seconds of no response from subprocess call + is used to timeout after 5 seconds of no response from subprocess call and the corresponding cmd will be logged into syslog ''' def call_subprocess(self, cmd): times = datetime.datetime.now() - # latest 14.0.4 requires "HOME" env variable to be passed # copy current environment variables and add "HOME" variable # pass the newly created environment variable to Popen subprocess @@ -131,20 +131,23 @@ def call_subprocess(self, cmd): # stdout and stderr are redirected. # stderr not used (stdout validation is done so stderr check is # is not needed) - p = subprocess.Popen(cmd, stdout=subprocess.PIPE, \ - stderr=subprocess.PIPE, shell=True, env=env_home) - - while p.poll() is None: - time.sleep(0.1) - now = datetime.datetime.now() - diff = now - times - if diff.seconds > 3: - os.kill(p.pid, signal.SIGKILL) - os.waitpid(-1, os.WNOHANG) - message = "command:" + cmd + " ---> hanged" - ssdlog = StorageStatsDaemonLog(message = message) - self.call_send(ssdlog) - return None + try: + p = Popen(cmd, stdout=PIPE, \ + stderr=PIPE, shell=True, env=env_home) + while p.poll() is None: + gevent.sleep(0.1) + now = datetime.datetime.now() + diff = now - times + if diff.seconds > 5: + os.kill(p.pid, signal.SIGKILL) + os.waitpid(-1, os.WNOHANG) + message = "command:" + cmd + " ---> hanged" + ssdlog = StorageStatsDaemonLog(message = message) + self.call_send(ssdlog) + return None + except: + pass + return None # stdout is used return p.stdout.read() @@ -170,20 +173,28 @@ def create_and_send_pool_stats(self): result = re.sub( '\s+', ' ', line).strip() arr1 = result.split() + READS = 7 + READKBS = 8 + WRITES = 9 + WRITEKBS = 10 + if len(arr1) == 10: + READS = 6 + READKBS = 7 + WRITES = 8 + WRITEKBS = 9 if arr1[0] != "total": cs_pool = ComputeStoragePool() cs_pool.name = self._hostname + ':' + arr1[0] pool_stats = PoolStats() - pool_stats.reads = int(arr1[7]) - pool_stats.read_kbytes = int(arr1[8]) - pool_stats.writes = int(arr1[9]) - pool_stats.write_kbytes = int(arr1[10]) + pool_stats.reads = int(arr1[READS]) + pool_stats.read_kbytes = int(arr1[READKBS]) + pool_stats.writes = int(arr1[WRITES]) + pool_stats.write_kbytes = int(arr1[WRITEKBS]) cs_pool.info_stats = [pool_stats] pool_stats_trace = ComputeStoragePoolTrace(data=cs_pool) self.call_send(pool_stats_trace) - def populate_osd_total_stats(self, osdname, osd_stats,
prev_osd_latency): ceph_name = "ceph-" + osdname + ".asok" cmd = "ceph --admin-daemon /var/run/ceph/" + ceph_name + \ @@ -195,6 +206,7 @@ def populate_osd_total_stats(self, osdname, osd_stats, prev_osd_latency): res1 = self.call_subprocess(cmd) if res1 is None: return False + osd_stats.stats_time = datetime.datetime.now() arr1 = res1.splitlines() for line1 in arr1: result = re.sub('\s+', ' ', line1).strip() @@ -226,36 +238,40 @@ def populate_osd_total_stats(self, osdname, osd_stats, prev_osd_latency): return True def diff_read_kbytes(self, line, osd_stats, temp_osd_stats, - osd_prev_stats): + osd_prev_stats, diff_time): # 'line' format : " xyz," self.curr_read_kbytes += int(line.rstrip(",").strip(' ')) / 1024 temp_osd_stats.read_kbytes = self.curr_read_kbytes osd_stats.read_kbytes = self.curr_read_kbytes - \ - osd_prev_stats.read_kbytes + osd_prev_stats.read_kbytes + osd_stats.read_kbytes = int(osd_stats.read_kbytes / diff_time) def diff_write_kbytes(self, line, osd_stats, temp_osd_stats, - osd_prev_stats): + osd_prev_stats, diff_time): # 'line' format : " xyz," self.curr_write_kbytes += int(line.rstrip(",").strip(' ')) / 1024 temp_osd_stats.write_kbytes = self.curr_write_kbytes osd_stats.write_kbytes = self.curr_write_kbytes - \ - osd_prev_stats.write_kbytes + osd_prev_stats.write_kbytes + osd_stats.write_kbytes = int(osd_stats.write_kbytes / diff_time) def diff_read_cnt(self, line, osd_stats, temp_osd_stats, - osd_prev_stats): + osd_prev_stats, diff_time): # 'line' format : " xyz," self.curr_reads += int(line.rstrip(",").strip(' ')) temp_osd_stats.reads = self.curr_reads osd_stats.reads = self.curr_reads - \ - osd_prev_stats.reads + osd_prev_stats.reads + osd_stats.reads = int(osd_stats.reads / diff_time) def diff_write_cnt(self, line, osd_stats, temp_osd_stats, - osd_prev_stats): + osd_prev_stats, diff_time): # 'line' format : " xyz," self.curr_writes += int(line.rstrip(",").strip(' ')) temp_osd_stats.writes = self.curr_writes osd_stats.writes = self.curr_writes - \ - osd_prev_stats.writes + osd_prev_stats.writes + osd_stats.writes = int(osd_stats.writes / diff_time) def populate_osd_diff_stats(self, osdname, osd_stats, @@ -270,6 +286,11 @@ def populate_osd_diff_stats(self, osdname, osd_stats, res1 = self.call_subprocess(cmd) if res1 is None: return False + stats_time = datetime.datetime.now() + diff_time = stats_time - osd_prev_stats.stats_time + fdiff_time = float(diff_time.seconds) + \ + float(diff_time.microseconds)/1000000 + temp_osd_stats.stats_time = stats_time arr1 = res1.splitlines() for line1 in arr1: result = re.sub('\s+', ' ', line1).strip() @@ -278,54 +299,61 @@ def populate_osd_diff_stats(self, osdname, osd_stats, if line2[0].find('subop_r_out_bytes') != -1 or \ line2[0].find('op_r_out_bytes') != -1: self.diff_read_kbytes(line2[1], - osd_stats, - temp_osd_stats, - osd_prev_stats) + osd_stats, + temp_osd_stats, + osd_prev_stats, + fdiff_time) elif line2[0].find('subop_w_in_bytes') != -1 or \ line2[0].find('op_w_in_bytes') != -1: self.diff_write_kbytes(line2[1], - osd_stats, - temp_osd_stats, - osd_prev_stats) + osd_stats, + temp_osd_stats, + osd_prev_stats, + fdiff_time) elif line2[0].find('subop_r') != -1 or \ - line2[0].find('op_r') != -1: + line2[0].find('op_r') != -1: self.diff_read_cnt(line2[1], - osd_stats, - temp_osd_stats, - osd_prev_stats) + osd_stats, + temp_osd_stats, + osd_prev_stats, + fdiff_time) elif line2[0].find('subop_w') != -1 or \ line2[0].find('op_w') != -1: self.diff_write_cnt(line2[1], osd_stats, temp_osd_stats, - osd_prev_stats) + osd_prev_stats, + 
fdiff_time) except: pass return True - def compute_read_latency(self, arr, line, index, osd_stats, + def compute_read_latency(self, arr, osd_stats, prev_osd_latency, op_flag): - # 'line' format : " xyz," - avgcount = int(line.rstrip(",").strip(' ')) + # 'line' format : ['op_read_latency', 'avgcount', '2822,', 'sum', '240.2423},'] + + avgcount = int(arr[2].rstrip(",")) # 'arr' format : "'sum': xyz.yzw}," - sum_rlatency = int( - float(arr[index + 1].split(":")[1].strip().rstrip("},"))) + sum_rlatency = int(float(arr[4].rstrip("},"))) + # sum_rlatency is in seconds # multiplied by 1000 to convert seconds to milliseconds if avgcount != 0: # op_flag = 1 indicates replica osd read latency if op_flag == 1: - osd_stats.op_r_latency += ((sum_rlatency * 1000) - \ - (prev_osd_latency.prev_subop_rsum * 1000)) / \ - (avgcount - prev_osd_latency.prev_subop_rcount) + if(avgcount > prev_osd_latency.prev_subop_rcount): + osd_stats.op_r_latency += ((sum_rlatency * 1000) - \ + (prev_osd_latency.prev_subop_rsum * 1000)) / \ + (avgcount - prev_osd_latency.prev_subop_rcount) prev_osd_latency.prev_subop_rsum = sum_rlatency prev_osd_latency.prev_subop_rcount = avgcount # op_flag = 2 indicates primary osd read latency if op_flag == 2: - osd_stats.op_r_latency += ((sum_rlatency * 1000) - \ - (prev_osd_latency.prev_op_rsum * 1000)) / \ - (avgcount - prev_osd_latency.prev_op_rcount) + if(avgcount > prev_osd_latency.prev_op_rcount): + osd_stats.op_r_latency += ((sum_rlatency * 1000) - \ + (prev_osd_latency.prev_op_rsum * 1000)) / \ + (avgcount - prev_osd_latency.prev_op_rcount) prev_osd_latency.prev_op_rsum = sum_rlatency prev_osd_latency.prev_op_rcount = avgcount else: @@ -339,28 +367,30 @@ def compute_read_latency(self, arr, line, index, osd_stats, prev_osd_latency.prev_op_rsum = 0 prev_osd_latency.prev_op_rcount = 0 - def compute_write_latency(self, arr, line, index, osd_stats, + def compute_write_latency(self, arr, osd_stats, prev_osd_latency, op_flag): - # line format : " xyz," - avgcount = int(line.rstrip(",").strip(' ')) - # arr format : "'sum': xyz.yzw}," - sum_wlatency = int( - float(arr[index + 1].split(":")[1].strip().rstrip("},"))) + # 'line' format : ['op_read_latency', 'avgcount', '2822,', 'sum', '240.2423},'] + + avgcount = int(arr[2].rstrip(",")) + # 'arr' format : "'sum': xyz.yzw}," + sum_wlatency = int(float(arr[4].rstrip("},"))) # sum_wlatency is in seconds # multiplied by 1000 to convert seconds to milliseconds if avgcount != 0: # op_flag = 1 indicates replica osd write latency if op_flag == 1: - osd_stats.op_w_latency += ((sum_wlatency * 1000) - \ - (prev_osd_latency.prev_subop_wsum * 1000)) / \ - (avgcount - prev_osd_latency.prev_subop_wcount) + if(avgcount > prev_osd_latency.prev_subop_wcount): + osd_stats.op_w_latency += ((sum_wlatency * 1000) - \ + (prev_osd_latency.prev_subop_wsum * 1000)) / \ + (avgcount - prev_osd_latency.prev_subop_wcount) prev_osd_latency.prev_subop_wsum = sum_wlatency prev_osd_latency.prev_subop_wcount = avgcount # op_flag = 2 indicates primary osd write latency if op_flag == 2: - osd_stats.op_w_latency += ((sum_wlatency * 1000) - \ - (prev_osd_latency.prev_op_wsum * 1000)) / \ - (avgcount - prev_osd_latency.prev_op_wcount) + if(avgcount > prev_osd_latency.prev_op_wcount): + osd_stats.op_w_latency += ((sum_wlatency * 1000) - \ + (prev_osd_latency.prev_op_wsum * 1000)) / \ + (avgcount - prev_osd_latency.prev_op_wcount) prev_osd_latency.prev_op_wsum = sum_wlatency prev_osd_latency.prev_op_wcount = avgcount else: @@ -375,41 +405,40 @@ def compute_write_latency(self, arr, 
line, index, osd_stats, prev_osd_latency.prev_op_wcount = 0 def populate_osd_latency_stats(self, osdname, osd_stats, prev_osd_latency): - ceph_name = "ceph-" + osdname + ".asok" - cmd2 = "ceph --admin-daemon /var/run/ceph/" + ceph_name + \ - " perf dump | egrep -A 1 -w \"\\\"" +\ - "op_r_latency\\\":|\\\"subop_r_latency\\\":|" + \ - "\\\"op_w_latency\\\":|\\\"" + \ - "subop_w_latency\\\":\"" - try: - res2 = self.call_subprocess(cmd2) - if res2 is None: + lat_list={"op_r_latency","subop_r_latency","op_w_latency","subop_w_latency"} + for entry in lat_list: + ceph_name = "ceph-" + osdname + ".asok" + cmd = ('ceph --admin-daemon /var/run/ceph/%s perf dump | \ + egrep -A5 -w %s |tr \"\\\"\" \" \" | \ + awk \'BEGIN{start=0;title=\"\";avgcount=\"\";sum=\"\"} \ + {i=1;while (i<=NF) {if($i == \"{\"){start=1} \ + if($i == \"}\" && start==1){break} \ + if($i==\"%s\"){title=$i} \ + if($i==\"avgcount\"){i=i+2;avgcount=$i} \ + if($i==\"sum\"){i=i+2;sum=$i}i=i+1}} \ + END{print title \" avgcount \" avgcount \" sum \" sum}\'' + %(ceph_name, entry, entry)) + res = self.call_subprocess(cmd) + if res is None: return False - arr2 = res2.splitlines() - for index in range(len(arr2)): - # replace multiple spaces - # to single space here - result = re.sub('\s+', ' ', arr2[index]).strip() - line2 = result.split(":") - if len(line2) != 0: - # subop_r_latency: replica osd read latency value - if line2[0].find('subop_r_latency') != -1: - self.compute_read_latency(arr2, - line2[2], index, osd_stats, prev_osd_latency, 1) - # op_r_latency: primary osd read latency value - elif line2[0].find('op_r_latency') != -1: - self.compute_read_latency(arr2, - line2[2], index, osd_stats, prev_osd_latency, 2) - # subop_w_latency: replica osd write latency value - elif line2[0].find('subop_w_latency') != -1: - self.compute_write_latency(arr2, - line2[2], index, osd_stats, prev_osd_latency, 1) - # op_w_latency: primary osd write latency value - elif line2[0].find('op_w_latency') != -1: - self.compute_write_latency(arr2, - line2[2], index, osd_stats, prev_osd_latency, 2) - except: - pass + res = res.lstrip(' ') + line = res.split(' ') + # subop_r_latency: replica osd read latency value + if line[0] == 'subop_r_latency': + self.compute_read_latency(line, + osd_stats, prev_osd_latency, 1) + # op_r_latency: primary osd read latency value + elif line[0] == 'op_r_latency': + self.compute_read_latency(line, + osd_stats, prev_osd_latency, 2) + # subop_w_latency: replica osd write latency value + elif line[0] == 'subop_w_latency': + self.compute_write_latency(line, + osd_stats, prev_osd_latency, 1) + # op_w_latency: primary osd write latency value + elif line[0] == 'op_w_latency': + self.compute_write_latency(line, + osd_stats, prev_osd_latency, 2) return True @@ -531,7 +560,9 @@ def compute_usage(self, disk_usage_obj, unit): def create_and_send_disk_stats(self): # iostat to get the raw disk list - res = self.call_subprocess('iostat') + cmd = 'iostat 4 2 | awk \'{arr[NR]=$0} \ + END{for(i=NR/2;i<=NR;i++) print arr[i]}\'' + res = self.call_subprocess(cmd) diff --git a/src/vnsw/agent/controller/controller_init.cc b/src/vnsw/agent/controller/controller_init.cc --- a/src/vnsw/agent/controller/controller_init.cc +++ b/src/vnsw/agent/controller/controller_init.cc - if (agent_->vrouter_max_labels() == 0) { + if (agent_->vrouter_max_labels() <= MIN_UNICAST_LABEL_RANGE) { str << 0 << "-" << 0; fabric_multicast_label_range_[idx].start = 0; fabric_multicast_label_range_[idx].end = 0; diff --git a/src/vnsw/agent/ovs_tor_agent/ovsdb_client/logical_switch_ovsdb.cc b/src/vnsw/agent/ovs_tor_agent/ovsdb_client/logical_switch_ovsdb.cc index 354976b0c2d..de2ca47e41c 100644 --- a/src/vnsw/agent/ovs_tor_agent/ovsdb_client/logical_switch_ovsdb.cc +++ b/src/vnsw/agent/ovs_tor_agent/ovsdb_client/logical_switch_ovsdb.cc @@ -216,8 +216,17 @@
KSyncEntry *LogicalSwitchEntry::UnresolvedReference() { // while creating stale entry we should not wait for physical // switch object since it will not be available till config // comes up + // for stale entry we should always be able to acquire vxlan id - assert(res_vxlan_id_.AcquireVxLanId((uint32_t)vxlan_id_)); + // However in certain cases, where OVSDB database is already + // in a state where two Logical switch entries exists with + // same VxLAN ID, we need to recover by deleting the entry + // from OVSDB database + bool ret = res_vxlan_id_.AcquireVxLanId((uint32_t)vxlan_id_); + if (!ret) { + SendTrace(DUP_TUNNEL_KEY_ADD); + } + return NULL; } @@ -297,6 +306,9 @@ void LogicalSwitchEntry::SendTrace(Trace event) const { case DEL_ACK: info.set_op("Delete Received"); break; + case DUP_TUNNEL_KEY_ADD: + info.set_op("Add Request with Duplicate tunnel key"); + break; default: info.set_op("unknown"); } diff --git a/src/vnsw/agent/ovs_tor_agent/ovsdb_client/logical_switch_ovsdb.h b/src/vnsw/agent/ovs_tor_agent/ovsdb_client/logical_switch_ovsdb.h index 1383624443f..0ed6e9f23af 100644 --- a/src/vnsw/agent/ovs_tor_agent/ovsdb_client/logical_switch_ovsdb.h +++ b/src/vnsw/agent/ovs_tor_agent/ovsdb_client/logical_switch_ovsdb.h @@ -62,6 +62,7 @@ class LogicalSwitchEntry : public OvsdbDBEntry { DEL_REQ, ADD_ACK, DEL_ACK, + DUP_TUNNEL_KEY_ADD, }; LogicalSwitchEntry(OvsdbDBObject *table, const std::string &name); LogicalSwitchEntry(OvsdbDBObject *table, const LogicalSwitchEntry *key); diff --git a/src/vnsw/agent/pkt/flow_table.cc b/src/vnsw/agent/pkt/flow_table.cc index 634b6e9dbf9..b713a9e6dde 100644 --- a/src/vnsw/agent/pkt/flow_table.cc +++ b/src/vnsw/agent/pkt/flow_table.cc @@ -1047,7 +1047,7 @@ void FlowTable::Add(FlowEntry *flow, FlowEntry *rflow, bool update) { ksync_obj->Delete(rflow->ksync_entry_); rflow->ksync_entry_ = NULL; } - DeleteByIndex(rflow->flow_handle_, rflow); + DeleteByIndex(rflow); rflow->flow_handle_ = FlowEntry::kInvalidFlowHandle; rflow->data().vrouter_evicted_flow_ = vrouter_evicted_flow; } @@ -1734,7 +1734,7 @@ void FlowTable::DeleteInternal(FlowEntryMap::iterator &it, uint64_t time, fe->set_reverse_flow_entry(NULL); DeleteFlowInfo(fe); - DeleteByIndex(fe->flow_handle_, fe); + DeleteByIndex(fe); FlowTableKSyncEntry *ksync_entry = fe->ksync_entry_; KSyncEntry::KSyncEntryPtr ksync_ptr = ksync_entry; @@ -2106,10 +2106,10 @@ void FlowTable::InsertByIndex(uint32_t flow_handle, FlowEntry *flow) { } } -void FlowTable::DeleteByIndex(uint32_t flow_handle, FlowEntry *fe) { - if (flow_handle != FlowEntry::kInvalidFlowHandle) { - if (flow_index_tree_[flow_handle].get() == fe) { - flow_index_tree_[flow_handle] = NULL; +void FlowTable::DeleteByIndex(FlowEntry *fe) { + if (fe->flow_handle() != FlowEntry::kInvalidFlowHandle) { + if (flow_index_tree_[fe->flow_handle()].get() == fe) { + flow_index_tree_[fe->flow_handle()] = NULL; } } } @@ -2873,8 +2873,6 @@ void FlowTable::DeleteFlowInfo(FlowEntry *fe) DeleteVmFlowInfo(fe); // Remove from RouteFlowTree DeleteRouteFlowInfo(fe); - //Remove from flow handle tree - DeleteByIndex(fe->flow_handle_, fe); } void FlowTable::DeleteVnFlowInfo(FlowEntry *fe) @@ -3346,7 +3344,7 @@ void FlowTable::AddIndexFlowInfo(FlowEntry *fe, uint32_t flow_handle) { } if (flow_handle != fe->flow_handle_) { - DeleteByIndex(flow_handle, fe); + DeleteByIndex(fe); } FlowEntry *flow = FindByIndex(flow_handle); diff --git a/src/vnsw/agent/pkt/flow_table.h b/src/vnsw/agent/pkt/flow_table.h index ebe277efad9..c7c0e60933e 100644 --- a/src/vnsw/agent/pkt/flow_table.h +++ 
b/src/vnsw/agent/pkt/flow_table.h @@ -772,7 +772,7 @@ class FlowTable { // Update flow port bucket information void NewFlow(const FlowEntry *flow); void DeleteFlow(const FlowEntry *flow); - void DeleteByIndex(uint32_t flow_handle, FlowEntry *flow); + void DeleteByIndex(FlowEntry *flow); void InsertByIndex(uint32_t flow_handle, FlowEntry *flow); FlowEntry *FindByIndex(uint32_t flow_handle); void DeleteVrouterEvictedFlow(FlowEntry *flow); diff --git a/src/vnsw/agent/port_ipc/vrouter-port-control b/src/vnsw/agent/port_ipc/vrouter-port-control index e9eb0de83ca..73197be5d72 100755 --- a/src/vnsw/agent/port_ipc/vrouter-port-control +++ b/src/vnsw/agent/port_ipc/vrouter-port-control @@ -11,6 +11,7 @@ import json import os import errno import datetime +import re sys.path.insert(1, sys.path[0]+'/../api-venv/lib/python2.7/site-packages') from vnc_api.vnc_api import * @@ -78,7 +79,9 @@ class VrouterPortControl(object): # Turn off help, so we print all options in response to -h conf_parser = argparse.ArgumentParser(add_help = False) - args, remaining_argv = conf_parser.parse_known_args(args_str.split()) + args, remaining_argv = conf_parser.parse_known_args( + re.compile("\s+(?=\-\-)").split(args_str) + ) # Don't suppress add_help here so it will handle -h parser = argparse.ArgumentParser( diff --git a/src/vnsw/agent/vrouter/ksync/ksync_init.cc b/src/vnsw/agent/vrouter/ksync/ksync_init.cc index b6b6a05eea7..fc848ff2969 100644 --- a/src/vnsw/agent/vrouter/ksync/ksync_init.cc +++ b/src/vnsw/agent/vrouter/ksync/ksync_init.cc @@ -94,7 +94,7 @@ void KSync::NetlinkInit() { event_mgr = agent_->event_manager(); boost::asio::io_service &io = *event_mgr->io_service(); - KSyncSockNetlink::Init(io, DB::PartitionCount(), NETLINK_GENERIC); + KSyncSockNetlink::Init(io, 1, NETLINK_GENERIC); KSyncSock::SetAgentSandeshContext(new KSyncSandeshContext( flowtable_ksync_obj_.get())); GenericNetlinkInit();
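For reference, the effect of the vrouter-port-control regex split above can be checked with a small standalone snippet (the argument values are made up for illustration):

    import re

    # Split only on whitespace that is followed by "--", so spaces inside
    # an argument value (e.g. a VM name) stay attached to their option.
    args_str = '--vm_name My Test VM --port_type NovaVMPort'
    print(re.compile(r'\s+(?=\-\-)').split(args_str))
    # ['--vm_name My Test VM', '--port_type NovaVMPort']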