R4.1 #101
Open
olayinkaoladimeji wants to merge 130 commits into master from R4.1
Conversation
Change-Id: I12ffeccc1dec08705ac16a6a5bd1f998920b9b46
Partial-bug: #1696209 DPDK 17.02 has changed the RTE_LOG API. Need to use RTE_LOG_DP to avoid compilation of the log function. Change-Id: I674154764a4fa3529f28d383f334b83c71f7040a
To convert a hash table entry to a fragment entry, hash table APIs like vr_find_free_hentry() were passed directly to the fragment macro VR_FRAGMENT_FROM_HENTRY(). As the function call is passed to the macro, the function gets invoked twice, leading to issues like a fragment never getting added to the fragment table. As a fix, the hash table API's return value is stored and that value is passed to the fragment macro. Change-Id: I9350551f6bc3a6b28eaf8acea8df2d2ec42c3eae closes-bug: #1721251
Partial-bug: #1724326 Used the TCP_NODELAY option on the TCP socket between agent and dpdk vrouter to avoid inconsistent delays. This was causing huge variation in hold entries. Change-Id: Ide2bcfaa28c992e31129a25b444028efce2f3c15
If ACLs are configured on the Vhost interface, the packets need to be subjected to flow processing. These packets would be un-tunneled packets. The Vhost route in VRF 0 points to the L3 Receive NH. This NH cannot be marked with the policy bit, as we do not want to create flows for the outer IP fields of tunneled packets, which also get processed by the L3 Receive NH. Also, if policy on Vhost is disabled and the L3 Receive NH is enabled with relaxed policy, packets have to be flow processed for link-local processing. For this reason, the policy bit on the Receive NH's outgoing interface is verified to decide whether to invoke flow processing for packets. Change-Id: I86b8ea2ecdf1460bb0d3599675d84ad46dfcf1e8 closes-bug: #1711045
For Vxlan tunneled packets that are received on the Fabric interface, the RPF callback was missing, leading to no RPF validation of the Vxlan packets. Due to ToR EVPN support, ToRs can be in Ecmp, which is a composite Ecmp nexthop in Vrouter. When the first packet corresponding to a unique 5-tuple is received on Fabric (rather than from a VM) from one of the Ecmp sources of the Ecmp composite nexthop, the component nexthop is pinned to the source only if the RPF callback exists. Lack of this RPF callback was making the Ecmp go wrong. As a fix, the Vxlan RPF callback is provided. Change-Id: If53a0bc76398cfc8c176a3a94a0aede8b26262b4 closes-bug: #1724681
Change-Id: I2135446723dddf526babf1f033dd1492e22e9cf3 closes-bug: #1729818
Vrouter currently allocates contiguous memory for the Flow and Bridge tables, and this forces the module to be inserted into the kernel at boot time, as huge contiguous memory is only likely to be present at system boot. This requirement is removed with the huge pages support. Provisioning: If huge page support is intended in the system, provisioning of the compute node takes care of enabling the required number of huge pages. After enabling the huge pages successfully, the hugetlbfs file system is mounted at a specific location and the required files are created (2 files as of now - vrouter_mem1 and vrouter_mem2). The location of these files is made available to Agent. Agent changes: Agent looks for the huge page files at the specific path provided by provisioning (through the agent configuration file) and mmap()s 1G of data. If this succeeds, Agent shares this virtual memory address with the Vrouter kernel module through a netlink socket. If the huge page files do not exist, or if the mmap fails, it still communicates this failure to the Vrouter kernel module. This messaging is done in a blocking way, so that no other configuration is added to Vrouter until this is communicated. Along with 1G pages, Agent attempts to allocate the same number of 2Mb pages, which Vrouter uses internally. Even these page addresses are sent to the Vrouter kernel module using the same sandesh message. Vrouter changes: 1) The module_init routine is changed such that modules like fib and flow, which require a huge contiguous amount of memory, do not allocate any memory in their init routines. 2) A new callback "mem" is introduced which allocates the required memory and is invoked after successfully initialising the huge pages when Agent communicates them.
3) Modules like interface etc. are initialised as usual, so that the Fabric and Vhost interfaces are created normally and pushed to cross-connect mode till the other configuration is available. 4) When Agent communicates the virtual memory address, Vrouter pins these pages using get_user_pages(). The "mem" callback is invoked, which allocates the memory either from huge pages or using the regular btable. The memory required to hold this huge page in terms of 4Kb pages is another huge page of size 2Mb. This 2Mb huge memory is also sent by Agent. If no such huge page of size 2Mb is sent by Agent, Vrouter allocates the memory using regular malloc. 5) As the memory allocated from the huge pages is intended for the entire lifetime of the Vrouter module, free() routines are not provided for these memory segments. These pages are freed from the system when the module is removed from the kernel. 6) If Agent restarts, memory is not reinitialised again and proper return codes are shared with Agent. Change-Id: I942ed42463d7a0aba7bf857df7c62d9c95b83056 closes-bug: #1728925
Fortville NICs do not allow setting the MTU while the NIC is running. Change-Id: I4280715db2f3085041aa7bb5796665a65409effd Closes-Bug: 1729742
Currently the healthcheck is broken if policy is enabled on the Vhost interface. Health check packets are destined to metadata IPs. The metadata routes point to VMIs in the Vhost VRF. Regular VM routes are present in the VN's VRF. If a VM is detected unreachable, HC withdraws routes in the VRF corresponding to the VM, but metadata routes are not withdrawn. HC packets should be routed using these metadata routes if routes are withdrawn. If Vhost has policy enabled, flow processing happens earlier than route lookup and destination metadata IPs are NATed to the VM's IP. These NATed IPs would be looked up in the VN's VRF, which results in a drop nexthop as routes are withdrawn. The solution is to not NAT the packet till the route lookup is complete for these metadata IPs if policy is enabled. The required flow processing will be completed if the nexthop is marked policy enabled. Change-Id: I144c36faf39b062026316a067e912eed5a2fa792 closes-bug: #1724945
Currently the Ecmp component NH is calculated for IP/IP6 packets only, as Ecmp has been supported only for L3 packets. With the support of L2 Ecmp, the component NH needs to be chosen even for L2 packets. As there is no flow available for L2 packets, the hash is calculated on Ether dst, src and VRF. Change-Id: I61085c729da9633630088604c1a6b8db5897bba8 closes-bug: #1732285
Currently, if there is a sub-interface created with a Vlan tag on any parent interface and mirroring is enabled on the sub-interface, the Vlan tag of the sub-interface is also carried in the mirrored packet. In case of a VMware compute, all the VMIs are configured in Vrouter as sub-interfaces with Pvlan. And if mirroring is enabled on these VMIs, as the VMIs are sub-interfaces, even these Vlans are carried in the mirroring. But the expectation is not to carry these Vlan tags, as they are Contrail specific and not seen by the user. As a fix, a new VIF flag "VIF_FLAG_MIRROR_NOTAG" is introduced, which is set by Agent on VMware VMIs only. If this flag is seen on a VIF, Vrouter discards the Vlan tag before mirroring the packets. Change-Id: I9bd8c33c735159937f4b325a6ee67540f4f15f39 closes-bug: #1711459
This is the sandesh change for using a different multicast VRF in the interface req for a VN with provider network. Change-Id: I0c0310232b5e44eec780ad54d552e6750d42b9a6 partial-bug: #1728545
Currently, if a VN is configured with a provider network, the VRF of the VMIs belonging to that VN is seen as the Fabric VRF in Vrouter. But the Multicast/Broadcast tree is built in the VRF corresponding to the VN. When BUM traffic is received on such a VMI, traffic needs to be replicated as per the multicast tree of the VN's VRF rather than the Fabric VRF. To achieve this, the VN's VRF is added to the VMI as a different VRF and the bridge lookup is done for this traffic in this new VRF. Change-Id: I38721e27cebe80c7f4d14937588ba5b6c180b112 closes-bug: #1728545
With the introduction of XPS (Transmit Packet Steering), the sender_cpu needs to be cleared before dev_queue_xmit(). Not doing this results in accessing the netdev's XPS map at a wrong location, resulting in a crash. Clearing this makes the hard xmit recalculate the sender_cpu. Change-Id: Ifb0757ffdaa3e27b15261a9944281d9224afa8f0 closes-bug: #1733431
When L2 Ecmp is not in place, packets to/from an Ecmp source are forced to be L3 packets by replying to ARP requests using Vrouter's and Vhost's MAC. This is no longer required once we have L2 Ecmp in place, so all the ARP processing that specially deals with the Ecmp source side is not required any more. This processing is removed from the current ARP handling. Change-Id: I484da69b3c173891b915fb86b81c2e57d711b18a closes-bug: #1733811
In case of a TCP session to a VRRP address, if VRRP mastership changes, the flow's RPF nexthop changes, as the RPF nexthop needs to start pointing to the new master. But the TCP other end might send a FIN/FIN-ACK to the old VRRP master with the intention of tearing down the session. Due to RPF failure these packets get dropped. Before the packets are dropped, the flow gets marked with the required flags. In case of the last FIN-ACK packet being dropped, the flow gets marked as Dead but eviction does not get kicked off, as eviction is not invoked for a packet that is dropped. Due to this, the flow never gets evicted, and Agent also might not delete this flow as part of aging if these flows are for BGPaaS. To fix this, eviction is kicked off if the flow is marked Dead, even if the packet is dropped. Change-Id: Ib4a527477403f40bc3016c7ea58813f168a81e02 closes-bug: #1733608
rt --monitor can be used to get route creations and deletions live. They are broadcast on Netlink by the vrouter kernel module. Only routes of type AF_INET and AF_INET6 are fully printed, while routes of type AF_BRIDGE are not generated yet. The output format is 'jsonline' so it can easily be parsed by external tools. This is an example of the output: {"operation":"add","family":"AF_INET","vrf_id":2,"prefix":32,"address":"20.1.1.254","nh":12,"flags":{"label_valid":false, "arp_proxy":true, "arp_trap":true, "arp_flood":false}} Change-Id: I8cb8556f1c4adda0bb0ef10f98fc38702b2942c4 Closes-bug: #1650316 (cherry picked from commit 12b9b49)
If fragments are being processed, the fragment assembler maintains statistics of fragments in the fragment entry to decide when to delete the fragment. When mirroring is enabled, the mirrored packet gets processed earlier than the real packet, and these fragment calculations are invoked on the mirrored packet, leading to the fragment entry being deleted if the mirrored packet is the last fragment of the packet. Once the fragment entry is deleted and the mirrored packet is out of the system, the real packet gets processed, and this packet will not have a matching fragment entry to do the flow processing, resulting in the packet being discarded. As a fix, the fragment calculations are invoked only when the real packets are processed rather than for mirrored packets. Change-Id: I30b3092622f8c661c6bebaf5ccbff6c9621cc3dc closes-bug: #1739602 (cherry picked from commit c9bf938)
Closes-bug: #1734994 Send netlink updates to agent for the VM's port in dpdk mode of operation when VM state changes. Change-Id: I9bc3cf8da01ed97ea2409ea2c16239447a867924
VIF_FLAG_GRO_NEEDED and VIF_FLAG_ETREE_ROOT flags were conflicting Change-Id: Iefa51d3561fc208690fef9b1e6942167a3d925a9 Closes-Bug: 1748261
…from the vrouter, instead of reading each flow entry and counting hold entries based on its status. Now with this change flow -s and flow -r will directly read the hold entries from the vrouter. Change-Id: I2d873583ac9668d9bc38c2305d86f3256ab29d94 Closes-Bug: 1738282
…ntries from the vrouter, instead of reading each flow entry and counting hold entries based on its status. Now with this change flow -s and flow -r will directly read the hold entries from the vrouter." into R4.1
Closes-bug: #1750711 When the dpdk vrouter tries to forward traffic after the vif gets deleted, the vif data structure is accessed to update the statistics. This was causing the vrouter to crash. Fixed by removing the stats update after the vif is deleted. Change-Id: Id3dcf31a8cdf98d6a6cc92ff552963f490476c68
In BGPaaS in pkt mode, the pkt will be L3 routed with the NH pointing to pkt0 or GW interface at one stage (before NAT). In single hop BGPaaS case, since the TTL is decremented in vr_forward() during L3 routing, the pkt is dropped later since the TTL becomes 0. This leads to BGP session not coming up at all. Change-Id: I959dcf4e1b316a76698f2b5d954bbaf0f197b64e closes-jira-bug: JCB-199874
Issue fixed by setting the GRO and merge-buffer flags in the vif_set_flags API. If those flags are already set, they are retained; otherwise they are ignored. Closes Jira Bug: JCB-218956 Change-Id: I45020c3231fe79ea7b4b629471c234d2cb87d1e9
After an agent soft reset, vif 0 gets deleted and added back. But due to a timing issue, before it gets added back, the vhost0 MTU notification used to come resulting in MTU not getting set (Since vif is not added yet). Fix is to query the MTU from PMD during vif 0 addition. Change-Id: I2b102b82e21fcc137e62db748d8982bdbbdf87e2 Closes-Bug: 1795839 (cherry picked from commit 514d42f)
Fixed a pull counter issue by adding a check for GRE IP fragmentation. If the outer header is GRE and the packet is fragmented, we return 0 so that it will be queued for further processing. Closes Jira Bug: JCB-218856 Change-Id: I805c7d66d2dad20c15dfe2656a8fe24e746b1c99
closes-jira-bug: CEM-2807 Limitation in rte_port_ethdev_writer_tx_bulk: the pkts_mask field allows transferring only 64 segments. The segmentation code can send a max of 128 segments, which was causing the mbuf leak. Fixed the code by calling the tx_bulk function in a loop. Change-Id: I81212f1ae32e47dd80c0938631e466db98242a3e
closes-jira-bug: CEM-2807 tx_bulk can support up to a burst of 64 packets. Change the send size to 64 instead of 32. Change-Id: Ifbf45e9fc23dd4d8e0f69fefc745730d3fbf2726
closes-jira-bug: CEM-2996 Added a null check for variable. Change-Id: Ieab7662d27340bddd2c36a30bfa7aad3a4b3f67c
…:IPv4 when mirroring is configured at policy level to a physical analyzer Root cause: the issue is seen when mirroring an ingress packet while trying to form the overlay Ethernet header. It tries to form it from nexthop encap data, which is always IPv4. Fixed the issue by checking the packet type and updating the overlay Ethernet header protocol. Change-Id: Ie998d7de7f7d0869f46a646e4a147d2cea638565 closes-jira-bug: CEM-4574
Change-Id: Ic31876d87610e1998593431053593b7f17bae207 closes-jira-bug: CEM-5420
closes-jira-bug: CEM-4659 It seems the 18.05 dpdk has an issue when sending 64 packets in a burst. Reverting to using VR_DPDK_TX_BURST_SZ. Change-Id: I4d08be695a5a7968f6d07be2688bf1243b399f82 (cherry picked from commit a61b3dc)
Updated defensive check to overcome this issue: if an actual interface is provided for "vifdump stop <Id>", an error message is displayed. closes-jira-bug: CEM-5980 Change-Id: I2e5ff28e7da15527c9ac4628ebff5aa1dee5f233
Root cause:
===========
This is due to a race condition between the setting of the Evict Candidate flag and the eviction of the flow. We have 2 threads here, say Thread1 and Thread2. Thread1 is in the flow defer-callback function, while Thread2 is in the flow mark-evict function. Consider the following sequence (time T0 to T4), which leads to the non-eviction of flow 100. T0 and T3 are executed in Thread1, while T1, T2 and T4 are executed as part of Thread2. Let's assume the flow indices are 100 and 200.
- T0 (Thread1, Defer_cb_func): CheckEvictFlow(100) - no-op, as the Evict Candidate flag is not set for flow 100
- T1 (Thread2, Mark_Evict_func): set Evict Candidate for flow 100
- T2 (Thread2, Mark_Evict_func): set Evict Candidate for flow 200
- T3 (Thread1, Defer_cb_func): CheckEvictFlow(200) - flow 200 gets evicted, since its Evict Candidate flag is set
- T4 (Thread2, Mark_Evict_func): schedule Defer_cb for flow 200
Since flow 200 already got evicted at T3, the callback is never scheduled at T4, and hence flow 100 never gets evicted. Was able to reproduce the problem by simulating the above order using instrumented code and a scapy script. The problem is with Defer_cb_func(), which should do the eviction only when the Evict Candidate flag is set for both flows.
Fix:
====
The fix is to make sure the eviction is done only after both flows' Evict Candidate flags are set.
Testing:
========
- Done UT for the code changes
- QA has qualified the fix
Change-Id: I07730ab190646260d08de6ca4fdf9bc1caf16d6e closes-jira-bug: CEM-4275 (cherry picked from commit e04c2b0)
…to Type:IPv4 when mirroring is configured at policy level to a physical analyzer" into R4.1
closes-jira-bug: CEM-6709 Change-Id: If74a824cf7493a3e5c5152902c49a8a0dd88f9b3
closes-jira-bug: CEM-5251 Disabling promiscuous mode should be done based on the result of adding the unicast address. Drivers such as i40e don't support the APIs to set the multicast address. But the bond driver is made to enable all-multicast for all the slave drivers, so this shouldn't be an issue. Change-Id: Ie754fad59a215462b62da6e2ab309de13422d1fd (cherry picked from commit 7a20fc6)
When vr_send_broadcast() is called during a route add or delete, there is a 2048-byte slab memory leak. This is because response->vr_message_buf gets set to NULL, so vr_message_free() will not free it. Setting response->vr_message_buf to NULL is only applicable to the unicast netlink case, not to multicast netlink. The fix is to move it to the unicast case only. Closes-Jira-bug: CEM-5343 Change-Id: Ia4d035e88765c55f65fedce1963000c0bced55c1
As per linux kernel git log, the get_user_pages() was changed in 4.4.168 which also maps to Ubuntu 4.4.0-143. Fixed by checking kernel version 4.4.168. Verified by building vrouter.ko on CI VM with ubuntu 4.4.0-143 and regular centos setup. Change-Id: Ie759e567a6f5f82d1c4d1dd6ea0f21c23f7287ff closes-jira-bug: CEM-4223 (cherry picked from commit c33e99f)
A spurious vr_uvh_cl_timer_setup() call was happening which was leaking fds Closes-Jira-Bug: CEM-10799 Change-Id: I1a02ba31b747469f8ebab79c603f80b8102238a4 (cherry picked from commit ec8ede8)
This reverts commit 33778eb. Closes-Jira-Bug: CEM-11181 Change-Id: I78b909a442df6b2ec6e77520fe38b3ee17518f95
Root cause: The issue was reproduced with the following sequence of route adds and deletes:
addr = 10.60.7.0   plen = 25 vrf = 4 operation = ADD/CHANGE nh_idx = 38 label = 410760
addr = 10.60.0.0   plen = 23 vrf = 4 operation = ADD/CHANGE nh_idx = 38 label = 410760
addr = 10.60.7.128 plen = 25 vrf = 4 operation = ADD/CHANGE nh_idx = 38 label = 410760
addr = 10.60.0.0   plen = 20 vrf = 4 operation = ADD/CHANGE nh_idx = 63 label = 609700
addr = 10.60.0.0   plen = 20 vrf = 4 operation = DELETE     nh_idx = 51
After executing this sequence, it was observed that the 10.60.7.0/25 and 10.60.7.128/25 routes were getting deleted as part of the 10.60.0.0/20 delete operation. This was because, as part of the delete, we were deleting the mtrie bucket containing the 10.60.7.0/25 and 10.60.7.128/25 routes. The mtrie bucket was being deleted unconditionally and wrongly. Fix: The mtrie bucket deletion logic is changed to delete the bucket using one of the entries in the bucket instead of using the values from the route delete request. This fixes the issue of a bucket with a non-matching prefix length being deleted. Verification: The fix was verified by rerunning the same sequence and checking whether the 10.60.7.0/25 and 10.60.7.128/25 routes are present. The fix also passed full vrouter regression. closes-jira-bug: CEM-11421 Change-Id: I4e879a1753f273a8c23b1baf5e82b8d56f675e98
When a VIF is added but not connected, vru_cl->vruc_fd is not added to the fd list even though it is created. This can happen when the VM is stopped. In this case, vr_uvhost_del_fds_by_arg() would not close the fd. This leads to a socket leak, so an fcntl() call is added in vr_uvhost_del_client() to check whether vruc_fd is closed or not. Closes-jira-bug: CEM-9285 Change-Id: Ia55e44b8830b96e8313691dc77873bfd133e1469 (cherry picked from commit 194c8bc)
Return the mod of the hash table index during scanning, instead of a monotonically increasing number. With the latter, there can be an integer overflow after some time, which is incorrectly handled as an error. Due to this, scanning of the hash table will stall. Change-Id: Ia55da9a6d7e66db21638c4c71f50d7940b02138a Closes-Jira-Bug: CEM-17147 (cherry picked from commit 9195829)
Issue: When the VM is rebooted, the server stops running and the health check fails momentarily, due to which the route to the VM points to nh_discard. When the first TCP SYN packet is sent, it is trapped by agent while the flow is programmed. Before the packet is enqueued, the NH of the packet is set to NULL. After flushing the packet, vr_inet_route_lookup() is done on the packet with the NAT IP, whose route already points to nh_discard. Due to this, the first SYN packet is dropped. By the time retransmission of this packet happens, it reaches the health check timeout of 1 sec and the connection closes. This keeps repeating. Fix: After flushing the packet, before the IP NAT happens, we do a vr_inet_route_lookup() which gives us the NH associated with the non-NAT IP, which prevents the first SYN packet drop. Closes-Jira-Bug: CEM-11226 Change-Id: I9193242918d42bcf1bc7b6590884a3da9f785d20
… Listening after closure of connections; once max connections are reached. New implementation shall ensure that. Change-Id: Ief2e6ba15f7da127b4d488484bff346fec00374e Closes-jira-bug: CEM-18916 (cherry picked from commit 385c774)
No description provided.