Upgrading from 1.0.4 to 1.0.18: lldpd is stuck in a loop after systemctl restart lldpd #639

Open
kanna-uk opened this issue Mar 14, 2024 · 5 comments

kanna-uk commented Mar 14, 2024

Bug description

I recently upgraded my local code from the 1.0.4 release to 1.0.18.
Once the endpoints are connected, lldpctl shows all neighbours correctly.
Here is the output before the restart:

root@tiger:mgmt:~# systemctl status lldpd.service 
● lldpd.service - LLDP daemon
     Loaded: loaded (/lib/systemd/system/lldpd.service; enabled; preset: enabled)
     Active: active (running) since Thu 2024-03-14 12:01:09 UTC; 7min ago
       Docs: man:lldpd(8)
   Main PID: 54215 (lldpd)
      Tasks: 2 (limit: 9081)
     Memory: 2.2M
        CPU: 135ms
     CGroup: /system.slice/lldpd.service
             ├─54215 "lldpd: monitor. "
             └─54218 "lldpd: 6 neighbors."

lldpd[54218]: unable to send packet on real device for swp1s3: Network is down
lldpd[54218]: unable to send packet on real device for swp31s3: Network is down
lldpd[54218]: unable to send packet on real device for swp1s1: Network is down
lldpd[54218]: unable to send packet on real device for swp31s2: Network is down
lldpd[54218]: unable to send packet on real device for swp31s2: Network is down
lldpd[54218]: unable to send packet on real device for swp1s1: Network is down
lldpd[54218]: unable to send packet on real device for swp31s3: Network is down
lldpd[54218]: unable to send packet on real device for swp1s0: Network is down
lldpd[54218]: unable to send packet on real device for swp1s3: Network is down
lldpd[54218]: unable to send packet on real device for swp1s2: Network is down

The actual problem starts when we run "systemctl restart lldpd.service".
The daemon gets stuck in a loop, and when I attached gdb it showed the following backtrace:

#0  0x00005623b826cbfd in netlink_parse_rtattr (tb=tb@entry=0x7ffdc2458330, max=max@entry=44, rta=<optimized out>,
--
rta@entry=0x5623b9f71d00, len=<optimized out>) at ./src/daemon/netlink.c:289
#1  0x00005623b826d9ae in netlink_parse_linkinfo (len=<optimized out>, rta=0x5623b9f71cf0, iff=0x5623b9f79bd0)
at ./src/daemon/netlink.c:355
#2  netlink_parse_link (iff=0x5623b9f79bd0, msg=0x5623b9f71a90) at ./src/daemon/netlink.c:489
#3  netlink_recv (cfg=cfg@entry=0x5623b9f6fef0, s=<optimized out>, ifs=ifs@entry=0x5623b9f6f7f0, ifas=ifas@entry=0x0)
at ./src/daemon/netlink.c:752
#4  0x00005623b826e688 in netlink_initialize (cfg=cfg@entry=0x5623b9f6fef0) at ./src/daemon/netlink.c:1029
#5  0x00005623b826e941 in netlink_get_interfaces (cfg=cfg@entry=0x5623b9f6fef0) at ./src/daemon/netlink.c:1092
#6  0x00005623b826c17b in interfaces_update (cfg=cfg@entry=0x5623b9f6fef0) at ./src/daemon/interfaces-linux.c:1013
#7  0x00005623b826345d in lldpd_update_localports (cfg=cfg@entry=0x5623b9f6fef0) at ./src/daemon/lldpd.c:1402
#8  0x00005623b8263682 in lldpd_loop (cfg=cfg@entry=0x5623b9f6fef0) at ./src/daemon/lldpd.c:1420
#9  0x00005623b827a911 in levent_loop (cfg=cfg@entry=0x5623b9f6fef0) at ./src/daemon/event.c:581
#10 0x00005623b8264750 in lldpd_main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
at ./src/daemon/lldpd.c:2148
#11 0x00007f4d4256324a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x00007f4d42563305 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x00005623b8260b81 in _start ()
root@tiger:~$:~# lldpcli -vv
lldpcli 0.7.16-1002-g8dfaf5c6-dirty
  Built on 2024-03-11T01:11:45Z

Additional output formats:   TEXT, KV, JSON, XML

C compiler command: C compiler command is not available for reproducible builds
Linker command:     Linker compiler command is not available for reproducible builds
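
Frame #0 of the backtrace above is in netlink_parse_rtattr. As background only, here is a minimal Python sketch of how a run of netlink route attributes (struct rtattr from linux/rtnetlink.h: a 16-bit length, a 16-bit type, then a payload padded to 4 bytes) is generally walked. This is not lldpd's code. The names parse_rtattrs and rta_align and the sample buffer are made up for illustration, and the sketch assumes host byte order. The length checks mirror what the kernel's RTA_OK macro tests, and they are what make such a walk terminate on malformed input.

```python
import struct

# struct rtattr header: rta_len, rta_type (both unsigned 16-bit, host byte order)
RTA_HDR = struct.Struct("=HH")
RTA_ALIGNTO = 4


def rta_align(length):
    """Round a length up to the next multiple of RTA_ALIGNTO."""
    return (length + RTA_ALIGNTO - 1) & ~(RTA_ALIGNTO - 1)


def parse_rtattrs(buf, max_type):
    """Walk a buffer of rtattr records and return {type: payload bytes}.

    Stops at the first record whose length is shorter than the header or
    longer than the remaining data, so a zero-length record cannot make
    the walk spin forever.
    """
    table = {}
    offset = 0
    remaining = len(buf)
    while remaining >= RTA_HDR.size:
        rta_len, rta_type = RTA_HDR.unpack_from(buf, offset)
        if rta_len < RTA_HDR.size or rta_len > remaining:
            break  # malformed record: stop instead of looping
        if rta_type <= max_type and rta_type not in table:
            # keep the first occurrence of each attribute type
            table[rta_type] = bytes(buf[offset + RTA_HDR.size:offset + rta_len])
        step = rta_align(rta_len)
        offset += step
        remaining -= step
    return table


# Hypothetical sample: type 1 with a 3-byte payload (padded), type 2 with 4 bytes.
sample = (RTA_HDR.pack(7, 1) + b"abc" + b"\x00"
          + RTA_HDR.pack(8, 2) + b"\x01\x02\x03\x04")
print(parse_rtattrs(sample, 44))
```

The max_type value of 44 is taken from the max argument shown in frame #0. Everything else in the sample is invented.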

Steps to reproduce the problem

Bring up the LLDP neighbours and confirm they appear in lldpctl.
Immediately after running systemctl restart lldpd.service, the lldpd process is stuck in a loop, using 100% CPU.

Current outcome

Additional information

To investigate, I started lldpd with the -dddd option:

-dddd -D send -D main -D interfaces -D lldp -D decode -D event -D receive -D alloc -D netlink c -M 4
With the above options, systemctl restart lldpd.service works and the neighbours are updated.
What is the difference here?

How can I resolve this issue?

  • Output of ps -fp $(pgrep -d, -x lldpd):
  • BEFORE RESTART
root@tigert:/# ps -fp $(pgrep -d, -x lldpd)
UID          PID    PPID  C STIME TTY          TIME CMD
_lldpd     54215       1  0 12:01 ?        00:00:00 lldpd: monitor. 
_lldpd     54218   54215  0 12:01 ?        00:00:00 lldpd: 6 neighbors.
  • Output of uname -sro:

Linux 6.1.0-cl-1-amd64 GNU/Linux

@kanna-uk kanna-uk added the bug label Mar 14, 2024
@vincentbernat
Member

What's the output of systemctl cat lldpd?

@kanna-uk
Author

kanna-uk commented Mar 15, 2024

Hi Vincent,
Thank you for the prompt reply.
Here is the output:
root@tiger:~# systemctl cat lldpd.service

# /lib/systemd/system/lldpd.service

[Unit]
Description=LLDP daemon
Documentation=man:lldpd(8)
After=network-online.target networking.service syslog.service
RequiresMountsFor=/var/run/lldpd

[Service]
Type=notify
NotifyAccess=main
EnvironmentFile=-/etc/default/lldpd
EnvironmentFile=-/etc/sysconfig/lldpd
ExecStart=/usr/sbin/lldpd $DAEMON_ARGS $LLDPD_OPTIONS
Restart=always
StartLimitInterval=180
StartLimitBurst=3
TimeoutStartSec=30
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=yes
ProtectSystem=full

# systemd v232 and higher, only.

#ProtectKernelTunables=yes
#ProtectControlGroups=yes
#ProtectKernelModules=yes
#ProtectSystem=full

# Security hardening:

NoNewPrivileges=true
RestrictRealtime=true
LockPersonality=true
MemoryDenyWriteExecute=true
ProtectKernelLogs=true
ProtectKernelTunables=true
ProtectProc=invisible
ProtectKernelModules=true
ProtectControlGroups=true
ProtectHostname=true
ProtectClock=yes
UMask=0077
RestrictSUIDSGID=true
RestrictNamespaces=true
PrivateDevices=true
SystemCallFilter=~@clock @debug @reboot @raw-io @swap @module @obsolete @cpu-emulation
SystemCallArchitectures=native

[Install]
WantedBy=multi-user.target

@kanna-uk
Author

Just to add, the following is the config that was in place before hitting the loop:
DAEMON_ARGS="-c -i -M 4"
configure lldp tx-interval 30
configure lldp tx-hold 4

The interesting part is that if I set DAEMON_ARGS as follows:
DAEMON_ARGS="-dddd -D send -D main -D interfaces -D lldp -D decode -D event -D receive -D alloc -D loop c -M 4"

and then run systemctl restart lldpd, the neighbours come through. However, systemctl status lldpd then shows an extra line in addition to the following lines:

CGroup: /system.slice/lldpd.service
├─54215 "lldpd: monitor. "
└─54218 "lldpd: 6 neighbors."
|---- dddd -D send -D main -D interfaces -D lldp -D decode -D event -D receive -D alloc -D loop c. -----> similar to this

I would really like to understand what the issue is here.
Thank you in advance.
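
To make the difference between the two configurations explicit, the two DAEMON_ARGS strings from this thread can be compared token by token. The following is a small sketch that only compares the text of the two strings. It does not interpret what each lldpd option does. The helper name token_diff is made up for illustration.

```python
import shlex

# Both strings are copied verbatim from this thread.
LOOPING_ARGS = "-c -i -M 4"
WORKING_ARGS = ("-dddd -D send -D main -D interfaces -D lldp -D decode "
                "-D event -D receive -D alloc -D loop c -M 4")


def token_diff(first, second):
    """Return (tokens only in first, tokens only in second), in order of appearance."""
    a = shlex.split(first)
    b = shlex.split(second)
    # dict.fromkeys() drops repeated tokens (such as -D) while keeping their order.
    only_first = [tok for tok in dict.fromkeys(a) if tok not in b]
    only_second = [tok for tok in dict.fromkeys(b) if tok not in a]
    return only_first, only_second


only_looping, only_working = token_diff(LOOPING_ARGS, WORKING_ARGS)
print("only in the looping configuration:", only_looping)
print("only in the working configuration:", only_working)
```

Besides the added -dddd and -D options, the working string as posted no longer contains -c or -i as options. It contains only a bare c token, which is kept here exactly as written.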

@kanna-uk
Author

Hi Vincent,
Could you please let me know?

@vincentbernat
Member

The .service file is not the one shipped by lldpd. I suppose it was tweaked by Cumulus Linux. Notably, there are many restrictions. Also, in general, I don't think it is a good idea to replace the Cumulus version of lldpd with the upstream one. You are likely to run into more problems doing so.
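
One way to see which restrictions a tweaked unit adds is to list the [Service] directives that are present in it but absent from a reference unit. The following is a minimal sketch of that comparison. POSTED is a short excerpt of the unit shown earlier in this thread. REFERENCE is a hypothetical stand-in, not the actual unit shipped by lldpd. The helper names are made up for illustration.

```python
def service_directives(unit_text):
    """Return (key, value) pairs from the [Service] section, in file order."""
    section = None
    pairs = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            pairs.append((key.strip(), value.strip()))
    return pairs


def only_in_first(first, second):
    """Directive names set in `first` but never set in `second`."""
    second_keys = {key for key, _ in service_directives(second)}
    result = []
    for key, _ in service_directives(first):
        if key not in second_keys and key not in result:
            result.append(key)
    return result


# Excerpt of the unit posted in this thread.
POSTED = """\
[Service]
Type=notify
NotifyAccess=main
ExecStart=/usr/sbin/lldpd $DAEMON_ARGS $LLDPD_OPTIONS
ProtectSystem=full
RestrictNamespaces=true
PrivateDevices=true
SystemCallFilter=~@clock @debug @reboot @raw-io @swap @module @obsolete @cpu-emulation
"""

# Hypothetical reference unit, for illustration only.
REFERENCE = """\
[Service]
Type=notify
NotifyAccess=main
ExecStart=/usr/sbin/lldpd $DAEMON_ARGS $LLDPD_OPTIONS
"""

print(only_in_first(POSTED, REFERENCE))
```

To compare real files, read the output of systemctl cat lldpd and the unit from the lldpd source tree into the two strings instead of the samples.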
