
R4.1 #101

Open
wants to merge 130 commits into master

Conversation

olayinkaoladimeji

No description provided.

vmahuli and others added 30 commits September 25, 2017 00:17
Change-Id: I12ffeccc1dec08705ac16a6a5bd1f998920b9b46
Partial-bug: #1696209
DPDK 17.02 has changed the RTE_LOG API. RTE_LOG_DP needs to be used to
avoid compilation of the log function.
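
As a rough illustration (the log type, message and function name below are
placeholders, not taken from the vrouter sources), a data-path log statement
under DPDK 17.02+ looks like:

    #include <rte_log.h>

    /* RTE_LOG_DP() constant-folds the call away when the level is more
     * verbose than RTE_LOG_DP_LEVEL, which plain RTE_LOG() no longer does
     * for data-path code after the 17.02 API change. */
    static inline void log_drop_sketch(unsigned int vif_id)
    {
        RTE_LOG_DP(DEBUG, USER1, "vif %u: dropping packet\n", vif_id);
    }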

Change-Id: I674154764a4fa3529f28d383f334b83c71f7040a
To convert a hash table entry to a fragment entry, hash table APIs like
vr_find_free_hentry() are passed directly to the fragment macro
VR_FRAGMENT_FROM_HENTRY(). As the function call gets passed to the macro
and expanded more than once, the function gets invoked twice, leading to
issues like the fragment never getting added to the fragment table.

As a fix, the hash table API's return value is stored and that value is
passed to the fragment macro.
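
A small self-contained sketch of the failure mode (illustrative names only;
the macro body is not the real VR_FRAGMENT_FROM_HENTRY()):

    #include <stdio.h>

    /* Illustrative macro that, like the fragment macro, expands its
     * argument more than once. */
    #define FROM_HENTRY(he) ((he) ? (he) : (he))

    static int calls;
    static int *find_free_hentry(void) { calls++; static int entry; return &entry; }

    int main(void)
    {
        calls = 0;
        (void)FROM_HENTRY(find_free_hentry());   /* buggy: the call runs twice */
        printf("call passed to macro : %d invocations\n", calls);

        calls = 0;
        int *he = find_free_hentry();            /* fix: store the return value */
        (void)FROM_HENTRY(he);                   /* then pass the stored pointer */
        printf("stored return value  : %d invocation\n", calls);
        return 0;
    }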

Change-Id: I9350551f6bc3a6b28eaf8acea8df2d2ec42c3eae
closes-bug: #1721251
Partial-bug: #1724326
Used the TCP_NODELAY option on the TCP socket between
the agent and the dpdk vrouter to avoid inconsistent delays.
These delays were causing huge variation in hold entries.
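
A minimal sketch of the socket option in question (standard POSIX; the fd
and helper name are placeholders for the already-connected agent-to-vrouter
TCP socket):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Disable Nagle so small agent-to-vrouter messages are sent immediately
     * instead of being coalesced, which is what introduced the jitter. */
    static int enable_tcp_nodelay(int fd)
    {
        int one = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }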

Change-Id: Ide2bcfaa28c992e31129a25b444028efce2f3c15
If ACLs are configured on the Vhost interface, the packets need to be
subjected to flow processing. These packets would be un-tunneled
packets. The Vhost route in VRF 0 points to the L3 receive NH. This NH
cannot be marked with the policy bit, as we do not want to create flows
for the outer IP fields of tunneled packets, which also get processed by
the L3 receive NH. Also, if policy on Vhost is disabled and the L3 receive
NH is enabled with relaxed policy, packets have to be flow processed for
link-local processing. For this reason, the policy bit on the receive NH's
outgoing interface is checked to decide whether to invoke flow processing
for packets.

Change-Id: I86b8ea2ecdf1460bb0d3599675d84ad46dfcf1e8
closes-bug: #1711045
For Vxlan tunneled packets received on the Fabric interface, the RPF
callback was missing, leading to no RPF validation of the Vxlan packets.

Due to Tor Evpn support, Tors can be in Ecmp, which is a composite Ecmp
nexthop in Vrouter. When the first packet corresponding to a unique 5-tuple
is received on the Fabric (rather than from a VM) from one of the Ecmp
sources of the Ecmp composite nexthop, the component nexthop is pinned to
the source only if an RPF callback exists. The lack of this RPF callback
was making the Ecmp go wrong.

As a fix, the Vxlan RPF callback is provided.

Change-Id: If53a0bc76398cfc8c176a3a94a0aede8b26262b4
closes-bug: #1724681
Change-Id: I2135446723dddf526babf1f033dd1492e22e9cf3
closes-bug: #1729818
Vrouter currently allocates contiguous memory for the Flow and Bridge
tables, and this forces the module to be inserted into the kernel at boot
time, as large contiguous memory is likely to be present only at boot of
the system. This requirement is taken out with the huge pages support.

Provisioning:
If huge page support is intended in the system, provisioning of the
compute node takes care of enabling the required number of huge pages in
the system. After enabling the huge pages successfully, the hugetlbfs
file system is mounted at a specific location and the required files are
created (2 files as of now - vrouter_mem1 and vrouter_mem2). The location
of these files is made available to Agent.

Agent changes:
Agent looks for the huge page files at the specific path provided by
provisioning (through the agent configuration file) and mmap()s 1G of
data. If this succeeds, Agent shares this virtual memory address with the
Vrouter kernel module through the netlink socket. If the huge page files
do not exist, or if the mmap fails, it still communicates the failure to
the Vrouter kernel module. This messaging is done in a blocking way, so
that no other configuration is added to Vrouter till this is communicated.
Along with the 1G pages, Agent attempts to allocate the same number of 2MB
pages, which Vrouter uses internally. These page addresses are also sent
to the Vrouter kernel module using the same sandesh message.
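
A sketch of the agent-side mapping under those assumptions (the file path
and 1G size come from the provisioning described above; VR_MEM_1G and
map_hugepage_file() are placeholder names, and error handling is minimal):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define VR_MEM_1G (1UL << 30)

    /* Map one of the hugetlbfs-backed files (e.g. vrouter_mem1) and return
     * the virtual address that is later shared with the kernel module. */
    static void *map_hugepage_file(const char *path)
    {
        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
            return NULL;

        void *addr = mmap(NULL, VR_MEM_1G, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        close(fd);                /* the mapping survives the close() */
        return (addr == MAP_FAILED) ? NULL : addr;
    }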

Vrouter changes:
1) The module_init routine is changed in such a way that modules like
   fib and flow, which require a huge amount of contiguous memory, do not
   allocate any memory in their init routines.

2) A new callback "mem" is introduced which allocates the required
   memory and is invoked after successfully initialising the huge pages
   when Agent communicates them.

3) Modules like interface etc. are initialised as usual, so that
   fabric and Vhost interfaces are created normally and
   pushed to cross connect mode till the other configuration is
   available.

4) When Agent communicates the virtual memory address, Vrouter pins
   these pages using get_user_pages(). The "mem" callback is invoked,
   which allocates the memory either from huge pages or using a regular
   btable. The memory required to hold this huge page in terms of 4KB
   pages is another huge page of size 2MB. This 2MB huge page is also
   sent by Agent. If no such 2MB huge page is sent by Agent, Vrouter
   allocates the memory using regular malloc.

5) As the memory allocated from the huge pages is intended for the
   entire lifetime of the Vrouter module, free() routines are not provided
   for these memory segments. These pages are freed from the system when
   the module is removed from the kernel.

6) If Agent restarts, memory is not reinitialised again and proper
   return codes are shared with Agent.

Change-Id: I942ed42463d7a0aba7bf857df7c62d9c95b83056
closes-bug: #1728925
Fortville NICs do not allow setting the MTU while the NIC is running.

Change-Id: I4280715db2f3085041aa7bb5796665a65409effd
Closes-Bug: 1729742
Currently the healthcheck is broken if policy is enabled on the Vhost
interface. Health check packets are destined to metadata IPs. The
metadata routes point to VMIs in the Vhost VRF. Regular VM routes are
present in the VN's VRF. If a VM is detected unreachable, HC withdraws
the routes in the VRF corresponding to the VM, but the metadata routes
are not withdrawn. HC packets should still be routed using these metadata
routes even if the VM routes are withdrawn. If Vhost has policy enabled,
flow processing happens earlier than route lookup and the destination
metadata IPs are NATed to the VM's IP. These NATed IPs would be looked up
in the VN's VRF, which results in a drop nexthop as the routes are withdrawn.

The solution is not to NAT the packet till the route lookup is complete
for these metadata IPs if policy is enabled. The required flow
processing would be completed if the nexthop is marked policy enabled.

Change-Id: I144c36faf39b062026316a067e912eed5a2fa792
closes-bug: #1724945
Currently the Ecmp component NH is calculated for IP/IP6 packets only, as
Ecmp has been supported only for L3 packets. With the support of L2 Ecmp,
the component NH needs to be chosen even for L2 packets. As there is no
Flow available for L2 packets, the hash is calculated on the Ethernet
destination, source and VRF.
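
A rough sketch of the idea only (the mixing function and names below are
placeholders, not the hash vrouter actually uses): the component member for
an L2 packet is derived from the Ethernet addresses and the VRF, since
there is no flow entry to remember the choice.

    #include <stdint.h>

    static unsigned int
    l2_ecmp_member(const uint8_t dmac[6], const uint8_t smac[6],
                   unsigned int vrf, unsigned int num_members)
    {
        uint32_t h = vrf;
        for (int i = 0; i < 6; i++)
            h = h * 31 + dmac[i];       /* Ether dst */
        for (int i = 0; i < 6; i++)
            h = h * 31 + smac[i];       /* Ether src */
        return num_members ? h % num_members : 0;   /* pick the component NH */
    }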

Change-Id: I61085c729da9633630088604c1a6b8db5897bba8
closes-bug: #1732285
Currently, if there is a sub-interface created with a Vlan tag on any
parent interface and mirroring is enabled on the sub-interface, the Vlan
tag of the sub-interface is also carried in the mirrored packet. In case
of a Vmware compute, all the VMIs are configured in Vrouter as
sub-interfaces with a Pvlan. If mirroring is enabled on these VMIs, since
the VMIs are sub-interfaces, these Vlans are also carried in the mirrored
packets. But the expectation is not to carry these Vlan tags, as they are
Contrail specific and not seen by the user.

As a fix, a new VIF flag "VIF_FLAG_MIRROR_NOTAG" is introduced, which is
set by Agent on Vmware VMIs only. If this flag is seen on a VIF, Vrouter
discards the Vlan tag before mirroring the packets.

Change-Id: I9bd8c33c735159937f4b325a6ee67540f4f15f39
closes-bug: #1711459
This is a sandesh change for using a different multicast VRF in the interface request for a VN with a provider network.

Change-Id: I0c0310232b5e44eec780ad54d552e6750d42b9a6
partial-bug: #1728545
Currently, if a VN is configured with a provider network, the VRF of the
VMIs belonging to that VN is seen as the Fabric VRF in Vrouter. But the
Multicast/Broadcast tree is built in the VRF corresponding to the VN. When
BUM traffic is received on such a VMI, the traffic needs to be replicated
as per the multicast tree of the VN's VRF rather than the Fabric VRF. To
achieve this, the VN's VRF is added to the VMI as a different VRF and the
bridge lookup for this traffic is done in this new VRF.

Change-Id: I38721e27cebe80c7f4d14937588ba5b6c180b112
closes-bug: #1728545
With the introduction of XPS (Transmit Packet Steering), the sender_cpu
needs to be cleared before dev_queue_xmit(). Not doing this results
in accessing the netdev's XPS map at a wrong location, resulting in a
crash. Clearing this makes the hard xmit path recalculate the sender_cpu.
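
A minimal sketch of the idea, assuming a recent 4.x kernel built with
CONFIG_XPS (the function name is illustrative):

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    static int vr_dev_xmit_sketch(struct sk_buff *skb, struct net_device *dev)
    {
        skb->dev = dev;
    #ifdef CONFIG_XPS
        /* Reset any sender_cpu recorded against a previous device so the
         * XPS queue selection in dev_queue_xmit() recomputes it for the
         * current CPU instead of indexing this netdev's map wrongly. */
        skb->sender_cpu = 0;
    #endif
        return dev_queue_xmit(skb);
    }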

Change-Id: Ifb0757ffdaa3e27b15261a9944281d9224afa8f0
closes-bug: #1733431
When L2 Ecmp is not in place, packets to/from an Ecmp source are forced to
be L3 packets by replying to ARP requests using Vrouter's and Vhost's MAC.
This is no longer required once we have L2 Ecmp in place.

So all ARP processing which specifically deals with the Ecmp source side is
no longer required. This processing is removed from the current ARP
handling.

Change-Id: I484da69b3c173891b915fb86b81c2e57d711b18a
closes-bug: #1733811
In case of a TCP session to a VRRP address, if VRRP mastership changes,
the flow's RPF nexthop changes, as the RPF nexthop needs to start pointing
to the new master. But the TCP peer might send a FIN/FIN-ACK to the old
VRRP master with the intention of tearing down the session. Due to RPF
failure these packets get dropped. Before the packets are dropped, the
Flow gets marked with the required flags. In case the last FIN-ACK packet
is dropped, the Flow gets marked as Dead, but eviction does not get kicked
off, as eviction is not invoked for a packet that is dropped. Due to this,
the flow never gets evicted, and Agent also might not delete this flow as
part of Aging if these flows are for BGPaaS. To fix this, eviction is
kicked off if the Flow is marked Dead, even if the packet is dropped.

Change-Id: Ib4a527477403f40bc3016c7ea58813f168a81e02
closes-bug: #1733608
rt --monitor can be used to watch route creations and deletions live.
They are broadcast on Netlink by the vrouter kernel module.
Only routes of type AF_INET and AF_INET6 are fully printed, while
routes of type AF_BRIDGE are not generated yet.

The output format is 'jsonline' so it can be easily parsed by external
tools.

This is an example of the output:
{"operation":"add","family":"AF_INET","vrf_id":2,"prefix":32,"address":"20.1.1.254","nh":12,"flags":{"label_valid":false, "arp_proxy":true, "arp_trap":true, "arp_flood":false}}

Change-Id: I8cb8556f1c4adda0bb0ef10f98fc38702b2942c4
Closes-bug: #1650316
(cherry picked from commit 12b9b49)
If fragments are being processed, the fragment assembler maintains
statistics of the fragments in the fragment entry to decide when to delete
the fragment entry. When mirroring is enabled, the mirrored packet gets
processed earlier than the real packet, and these fragment calculations are
invoked on the mirrored packet, leading to the fragment entry being deleted
if the mirrored packet is the last fragment of the packet. Once the
fragment entry is deleted and the mirrored packet is out of the system, the
real packet gets processed and will not have a matching fragment entry for
flow processing, resulting in the packet being discarded.

As a fix, the fragment calculations are invoked only when the real
packets are processed, not for mirrored packets.

Change-Id: I30b3092622f8c661c6bebaf5ccbff6c9621cc3dc
closes-bug: #1739602
(cherry picked from commit c9bf938)
Closes-bug: #1734994
Send netlink updates to agent for the VM's port in
dpdk mode of operation when VM state changes.

Change-Id: I9bc3cf8da01ed97ea2409ea2c16239447a867924
VIF_FLAG_GRO_NEEDED and VIF_FLAG_ETREE_ROOT flags were conflicting
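
For illustration only (the bit values below are made up, not the real
vrouter definitions), the conflict amounts to two flags sharing a bit, so
setting one implied the other; the fix gives each flag its own bit:

    /* Illustrative values only - not the real vrouter definitions. */

    /* before the fix: both names resolved to the same bit */
    #define VIF_FLAG_GRO_NEEDED_OLD   0x00100000
    #define VIF_FLAG_ETREE_ROOT_OLD   0x00100000   /* conflict */

    /* after the fix: distinct bits, so the flags can be set independently */
    #define VIF_FLAG_GRO_NEEDED_NEW   0x00100000
    #define VIF_FLAG_ETREE_ROOT_NEW   0x00200000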

Change-Id: Iefa51d3561fc208690fef9b1e6942167a3d925a9
Closes-Bug: 1748261
…from the vrouter,

instead of reading each flow entry and counting hold entries based on its status.
Now with this change, flow -s and flow -r will directly read the hold entries from the vrouter.

Change-Id: I2d873583ac9668d9bc38c2305d86f3256ab29d94
Closes-Bug: 1738282
…ntries from the vrouter, instead of reading each flow entries and counting hold entries based on its status. Now with this change flow -s and flow -r will directly read the hold entries from the vrouter." into R4.1
Closes-bug: #1750711
When the dpdk vrouter tries to forward traffic
after the vif gets deleted, the vif data structure
is accessed to update the statistics. This was causing
the vrouter to crash. Fixed by removing the stats
update after the vif is deleted.

Change-Id: Id3dcf31a8cdf98d6a6cc92ff552963f490476c68
anandrao79 and others added 30 commits January 31, 2019 13:57
In BGPaaS in pkt mode, the pkt will be L3 routed with the NH
pointing to the pkt0 or GW interface at one stage (before NAT).
In the single hop BGPaaS case, since the TTL is decremented in
vr_forward() during L3 routing, the pkt is dropped later
since the TTL becomes 0. This leads to the BGP session not
coming up at all.

Change-Id: I959dcf4e1b316a76698f2b5d954bbaf0f197b64e
closes-jira-bug: JCB-199874
Issue fixed by setting the GRO and merge buffer flags in the vif_set_flags
API. If those flags are already set, they will be retained; otherwise ignored.
Closes Jira Bug: JCB-218956

Change-Id: I45020c3231fe79ea7b4b629471c234d2cb87d1e9
After an agent soft reset, vif 0 gets deleted and added back. But due to
a timing issue, before it gets added back, the vhost0 MTU notification
used to arrive, resulting in the MTU not getting set (since the vif is not
added yet). The fix is to query the MTU from the PMD during vif 0 addition.

Change-Id: I2b102b82e21fcc137e62db748d8982bdbbdf87e2
Closes-Bug: 1795839
(cherry picked from commit 514d42f)
Fixed the pull counter issue by adding a check for GRE IP fragmentation.
If the outer header is GRE and the packet is fragmented, we return 0 so
that it will be queued for further processing.
Closes Jira Bug: JCB-218856

Change-Id: I805c7d66d2dad20c15dfe2656a8fe24e746b1c99
closes-jira-bug: CEM-2807
There is a limitation in rte_port_ethdev_writer_tx_bulk: the pkts_mask
field can describe only 64 segments per call. The segmentation
code can send a max of 128 segments, which was causing the
mbuf leak. Fixed the code by calling the tx_bulk function in
a loop.
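
A sketch of the shape of the fix (the port/ops pointers are assumed to come
from the already-created ethdev writer port; names other than the DPDK ones
are placeholders): the 64-bit pkts_mask can only describe 64 mbufs per
call, so longer segment lists are pushed in chunks.

    #include <stdint.h>
    #include <rte_mbuf.h>
    #include <rte_port.h>

    static void
    tx_segments_in_chunks(const struct rte_port_out_ops *ops, void *port,
                          struct rte_mbuf **segs, unsigned int nb_segs)
    {
        unsigned int off = 0;

        while (off < nb_segs) {
            unsigned int n = nb_segs - off;
            if (n > 64)
                n = 64;
            /* bit i of the mask marks segs[off + i] as valid */
            uint64_t mask = (n == 64) ? UINT64_MAX : ((1ULL << n) - 1);
            ops->f_tx_bulk(port, &segs[off], mask);
            off += n;
        }
    }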

Change-Id: I81212f1ae32e47dd80c0938631e466db98242a3e
closes-jira-bug: CEM-2807
tx_bulk can support a burst of up to 64 packets.
Change the send size to 64 instead of 32.

Change-Id: Ifbf45e9fc23dd4d8e0f69fefc745730d3fbf2726
closes-jira-bug: CEM-2996
Added a null check for variable.

Change-Id: Ieab7662d27340bddd2c36a30bfa7aad3a4b3f67c
…:IPv4

when mirroring is configured at policy level to a physical analyzer

    Root cause: the issue is seen when mirroring an ingress packet while
    trying to form the overlay ethernet header. It tries to form it from
    the next hop encap data, and that is always IPv4.
    Fixed the issue by checking the packet type and updating the overlay
    ethernet header protocol.

Change-Id: Ie998d7de7f7d0869f46a646e4a147d2cea638565
closes-jira-bug: CEM-4574
Change-Id: Ic31876d87610e1998593431053593b7f17bae207
closes-jira-bug: CEM-5420
closes-jira-bug: CEM-4659
It seems like DPDK 18.05 has an issue when sending 64
packets in a burst. Reverting back to using VR_DPDK_TX_BURST_SZ.

Change-Id: I4d08be695a5a7968f6d07be2688bf1243b399f82
(cherry picked from commit a61b3dc)
Updated the defensive check to overcome this issue. If an actual interface
is provided to "vifdump stop <Id>", an error message will be displayed.
closes-jira-bug: CEM-5980

Change-Id: I2e5ff28e7da15527c9ac4628ebff5aa1dee5f233
Root cause:
===========
This is due to a race condition between the setting of the
Evict Candidate flag and the Eviction of the flow.

We have 2 threads here, say Thread1 and Thread2.

Thread1 is in the flow defer_callback function,
while Thread2 is in the flow mark evict function.
Now consider the following sequence (Time T0 to T4)
which leads to the non-eviction of flow 100.
T0 and T3 are executed in Thread1, while T1, T2 and T4
are executed as part of Thread2.

Let's assume the flow indices for the flows are 100 and 200.

Thread 1                               Thread 2
-------------                          -------------
Defer_cb_Func()                        Mark_Evict_Func()

Time T0 -  CheckEvictFlow(100)  –
           No op, as Evict candidate
           flag is not set for flow 100

                                      Time T1 – Set Evict Candidate
                                                for flow(100)
                                      Time T2 – Set Evict Candidate
                                                for flow(200)

Time T3 – CheckEvictFlow(200) –
          Now flow 200 would get
          evicted since Evict
          Candidate flag is set.

                                      Time T4 – Schedule Defer_cb for
                                                flow(200)

Now, since flow 200 already got evicted at time T3,
the callback would never be scheduled at time T4.
Hence, flow 100 would never get evicted.
The problem was reproduced by simulating the above order
using instrumented code and a scapy script.

The problem is with Defer_cb_func() which should do the eviction
only when the Evict Candidate flag is set for both flows.

Fix:
====
The fix is to make sure the eviction is done only after both
flows' evict candidate flags are set.
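
A self-contained sketch of the corrected check (types, flag value and names
are placeholders, not the vrouter flow API):

    #include <stdbool.h>

    #define FLOW_EVICT_CANDIDATE  0x1        /* placeholder flag value */

    struct flow_entry {                       /* placeholder type */
        unsigned int flags;
    };

    /* The deferred callback may run before the reverse flow has been
     * marked, so it evicts the pair only when *both* entries carry the
     * evict-candidate flag; otherwise it leaves the work to the callback
     * scheduled after the second flag is set. */
    static bool can_evict_pair(const struct flow_entry *fe,
                               const struct flow_entry *rfe)
    {
        return (fe->flags & FLOW_EVICT_CANDIDATE) &&
               (rfe->flags & FLOW_EVICT_CANDIDATE);
    }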

Testing:
========
- Done UT for the code changes
- QA has qualified the fix

Change-Id: I07730ab190646260d08de6ca4fdf9bc1caf16d6e
closes-jira-bug: CEM-4275
(cherry picked from commit e04c2b0)
…to Type:IPv4 when mirroring is configured at policy level to a physical analyzer" into R4.1
closes-jira-bug: CEM-6709

Change-Id: If74a824cf7493a3e5c5152902c49a8a0dd88f9b3
closes-jira-bug: CEM-5251
Disabling promiscuous mode should be done based on the result
of adding the unicast address.
Drivers such as i40e don't support the APIs to set the multicast
address. But the bond driver is made to enable all-multicast for
all the slave devices, so this shouldn't be an issue.

Change-Id: Ie754fad59a215462b62da6e2ab309de13422d1fd
(cherry picked from commit 7a20fc6)
When vr_send_broadcast() is called during a route add or delete,
there will be a 2048-byte slab memory leak. This is because
response->vr_message_buf is getting set to NULL and so vr_message_free()
will not free response->vr_message_buf.

response->vr_message_buf is only applicable to the unicast netlink case and
not to multicast netlink. The fix is to move it to the unicast case only.

Closes-Jira-bug: CEM-5343
Change-Id: Ia4d035e88765c55f65fedce1963000c0bced55c1
As per the Linux kernel git log, get_user_pages() was changed in 4.4.168,
which also maps to Ubuntu 4.4.0-143.

Fixed by checking for kernel version 4.4.168.
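
The shape of such a check in a kernel module is the standard
LINUX_VERSION_CODE guard; the call sites themselves are only indicated
here, since the exact get_user_pages() argument lists differ between the
two branches:

    #include <linux/version.h>

    #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 168)
        /* use the get_user_pages() prototype introduced by the 4.4.168
         * stable backport (also shipped as Ubuntu 4.4.0-143) */
    #else
        /* use the older pre-4.4.168 prototype */
    #endif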

Verified by building vrouter.ko on a CI VM with Ubuntu 4.4.0-143
and on a regular CentOS setup.

Change-Id: Ie759e567a6f5f82d1c4d1dd6ea0f21c23f7287ff
closes-jira-bug: CEM-4223
(cherry picked from commit c33e99f)
A spurious vr_uvh_cl_timer_setup() call was happening, which was leaking
fds.

Closes-Jira-Bug: CEM-10799
Change-Id: I1a02ba31b747469f8ebab79c603f80b8102238a4
(cherry picked from commit ec8ede8)
This reverts commit 33778eb.
Closes-Jira-Bug: CEM-11181

Change-Id: I78b909a442df6b2ec6e77520fe38b3ee17518f95
Root cause:

The issue was reproduced with the following sequence of route
adds and deletes.

addr = 10.60.7.0 plen = 25 vrf = 4 operation = ADD/CHANGE nh_idx = 38 label = 410760
addr = 10.60.0.0 plen = 23 vrf = 4 operation = ADD/CHANGE nh_idx = 38 label = 410760
addr = 10.60.7.128 plen = 25 vrf = 4 operation = ADD/CHANGE nh_idx = 38 label = 410760
addr = 10.60.0.0 plen = 20 vrf = 4 operation = ADD/CHANGE nh_idx = 63 label = 609700
addr = 10.60.0.0 plen = 20 vrf = 4 operation = DELETE nh_idx = 51

After executing this sequence, it was observed that the 10.60.7.0/25 and
10.60.7.128/25 routes were getting deleted as part of the 10.60.0.0/20
delete operation. This was because, as part of the delete, we were deleting
the mtrie bucket containing the 10.60.7.0/25 and 10.60.7.128/25 routes.
The mtrie bucket was being deleted unconditionally and wrongly.

Fix:

The mtrie bucket deletion logic is changed to delete the bucket using one
of the entries in the bucket instead of deleting it using the values from
the route delete request. This fixes the issue of a bucket with a
non-matching prefix length being deleted.

Verification:

The fix was verified by rerunning the same sequence and checking whether
the 10.60.7.0/25 and 10.60.7.128/25 routes are still present.
The fix also passed full vrouter regression.

closes-jira-bug: CEM-11421
Change-Id: I4e879a1753f273a8c23b1baf5e82b8d56f675e98
When a VIF is added but not connected, vru_cl->vruc_fd is not added to the
fd list even though it is created. This can happen when the VM is stopped.
In this case, vr_uvhost_del_fds_by_arg() would not close
the fd. This leads to a socket leak, so an fcntl() call was added in
vr_uvhost_del_client() to check whether vruc_fd is closed or not.
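
A minimal sketch of that kind of check (plain POSIX; the helper name is
illustrative): fcntl(fd, F_GETFD) fails with EBADF when the descriptor is
no longer open.

    #include <fcntl.h>
    #include <unistd.h>

    /* Close the descriptor only if it is still open, i.e. it was never
     * handed over to (and closed through) the fd list. */
    static void close_if_still_open(int fd)
    {
        if (fd >= 0 && fcntl(fd, F_GETFD) != -1)
            close(fd);
    }
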
Closes-jira-bug: CEM-9285

Change-Id: Ia55e44b8830b96e8313691dc77873bfd133e1469
(cherry picked from commit 194c8bc)
Return the index modulo the hash table size during scanning instead of a
monotonically increasing number. With the latter, there can be an integer
overflow after some time, resulting in the value incorrectly being handled
as an error. Due to this, scanning of the hash table will stall.
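
A sketch of the idea only (the table size and names are placeholders):
report the position modulo the table size so the marker always stays in
range, instead of an ever-growing counter that can overflow into a value
the caller treats as an error.

    #include <stdint.h>

    #define HASH_TABLE_SIZE  4096u   /* placeholder size */

    static unsigned int scan_marker(uint64_t entries_scanned)
    {
        /* always in [0, HASH_TABLE_SIZE), so it can never overflow into
         * an error-looking value */
        return (unsigned int)(entries_scanned % HASH_TABLE_SIZE);
    }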

Change-Id: Ia55da9a6d7e66db21638c4c71f50d7940b02138a
Closes-Jira-Bug: CEM-17147
(cherry picked from commit 9195829)
Issue:
When the VM is rebooted, the server stops running and the health check fails momentarily, due to which the route to the VM points to nh_discard. When the first TCP SYN packet is sent, it is trapped by the agent while the flow is programmed. Before the packet is enqueued, the NH of the packet is set to NULL. After flushing the packet, vr_inet_route_lookup() is done on the packet with the NAT IP, whose route already points to nh_discard. Due to this, the first SYN packet is dropped. By the time this packet is retransmitted, the health check timeout of 1 sec is reached and the connection closes. This keeps repeating.

Fix:
After flushing the packet, before the IP NAT happens, we do a vr_inet_route_lookup() which gives us the NH associated with the non-NAT IP, which prevents the first SYN packet drop.

Closes-Jira-Bug: CEM-11226
Change-Id: I9193242918d42bcf1bc7b6590884a3da9f785d20
… Listening after closure of connections; once max connections are reached.

 New implementation shall ensure that.

Change-Id: Ief2e6ba15f7da127b4d488484bff346fec00374e
Closes-jira-bug: CEM-18916
(cherry picked from commit 385c774)