Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LSP: _changetracking.lua:154: Invalid buffer id: (race condition in 'initialize' RPC?) #28575

Closed
sapient-cogbag opened this issue Apr 30, 2024 · 29 comments · Fixed by #28914 or #29029
Closed
Labels
bug issues reporting wrong behavior lsp
Milestone

Comments

@sapient-cogbag
Copy link

sapient-cogbag commented Apr 30, 2024

Problem

Problem

When using a system to automatically attach buffers to a language server that reinitializes itself, or responds to the initialize RPC query slowly, or something similar, if one of those buffers becomes invalid then the _on_attach method of LSP Clients produces an invalid buffer-number error.

This problem is compounded if for one reason or another the initialize query is repeatedly responded to or multiple buffers attached to a client become invalid over time (for example, if significant numbers of ephemeral preview buffers are attached to a language server and then removed in short succession), as all of these invalid buffers are provided to _on_attach on what I assume are repeated initialize responses from the LSP server, every time, creating a large error log spam which greatly interferes with typing >.<

While this is most acute when using a system that happens to attach and invalidate ephemeral buffers, it is (again, as far as I can tell) a race condition that should apply in any case where an LSP server reinitializes itself after any of it's attached buffers are rendered invalid. However, in practice it seems extremely difficult to reproduce....

Error executing vim.schedule lua callback: /usr/share/nvim/runtime/lua/vim/lsp/_changetracking.lua:154: Invalid buffer id: 44                                                                     
stack traceback:                                                                                                                                                                                  
        [C]: in function 'nvim_buf_get_name'                                                                                                                                                      
        /usr/share/nvim/runtime/lua/vim/lsp/_changetracking.lua:154: in function 'init'                                                                                                           
        /usr/share/nvim/runtime/lua/vim/lsp/client.lua:915: in function '_text_document_did_open_handler'                                                                                         
        /usr/share/nvim/runtime/lua/vim/lsp/client.lua:950: in function '_on_attach'                                                                                                              
        /usr/share/nvim/runtime/lua/vim/lsp/client.lua:623: in function ''                                                                                                                        
        vim/_editor.lua: in function <vim/_editor.lua:0> 

Originating Code Path

runtime/lua/vim/lsp/client.lua:608 - this is inside a callback passed to rpc.request with method initialize

    -- If server is being restarted, make sure to re-attach to any previously attached buffers.
    -- Save which buffers before on_init in case new buffers are attached.
    local reattach_bufs = vim.deepcopy(self.attached_buffers) -- <=== This is the source of the invalid bufnrs 

    self:_run_callbacks(self._on_init_cbs, lsp.client_errors.ON_INIT_CALLBACK_ERROR, self, result)

    for buf in pairs(reattach_bufs) do
      self:_on_attach(buf) -- <==== this is the source of the error(s) when the bufnr is invalid, which we will traverse into
    end

runtime/lua/vim/lsp/client.lua:938 - this is the actual _on_attach method. The error comes from inside _text_document_did_open_handler method

--- @package
--- Runs the on_attach function from the client's config if it was defined.
--- @param bufnr integer Buffer number
function Client:_on_attach(bufnr)
  self:_text_document_did_open_handler(bufnr) -- <=== Source of error. 

  lsp._set_defaults(self, bufnr)

  api.nvim_exec_autocmds('LspAttach', {
    buffer = bufnr,
    modeline = false,
    data = { client_id = self.id },
  })

runtime/lua/vim/lsp/client.lua:902

--- @package
--- Default handler for the 'textDocument/didOpen' LSP notification.
---
--- @param bufnr integer Number of the buffer, or 0 for current
function Client:_text_document_did_open_handler(bufnr)
  changetracking.init(self, bufnr)  -- <=== Error comes from here. Note that further down there is some kind of bufnr validation
  if not vim.tbl_get(self.server_capabilities, 'textDocumentSync', 'openClose') then
    return
  end
  if not api.nvim_buf_is_loaded(bufnr) then
    return
  end
  local filetype = vim.bo[bufnr].filetype

runtime/lua/vim/lsp/_changetracking.lua:149, inside the M.init module function.

  local buf_state = state.buffers[bufnr]
  if buf_state then
    buf_state.refs = buf_state.refs + 1
  else
    buf_state = {
      name = api.nvim_buf_get_name(bufnr), -- <=== The real source of the error ^.^
      lines = {},
      lines_tmp = {},
      pending_changes = {},
      needs_flush = false,
      refs = 1,
    }
    state.buffers[bufnr] = buf_state

Steps to reproduce using "nvim -u minimal_init.lua"

I could not get a minimal reproduction, unfortunately. I spent many many hours trying. However, minimal reproductions of race conditions are rather difficult. In particular, I've failed to get a minimal config to get an invalid bufnr into the attached_buffers of any LSP server, even though I know it is possible because my custom LSP setup (which is a bit of an abomination at this point...) has a status command which I modified to report on that (among other things like listing files with each language server client, filtering by language, etc.)

My final failed attempt at minimal_init.lua

local pattern = 'lua'
local cmd = { 'lua-language-server' }
-- Add files/folders here that indicate the root of a project
local root_markers = { '.git', '.editorconfig' }
-- Change to table with settings if required
local settings = vim.empty_dict()

-- LSP that we want to directly attach to
local client_id = nil

-- Files to open
--
-- First is the one put in a buffer to immediately wipe it out, second is to induce a re-initialize
local bufhidewipe_file = "dummy-lua.lua"
local reinit_trigger_file = "dummy-lua-two.lua"

-- Start the LSP with the bufnr, using the filename as the location...
--
-- Bufnr may not actually be a standard buffer constructed from something like `:edit` ^.^
local function do_start(bufnr, fname)
	local match = vim.fs.find(root_markers, { path = fname, upward = true })[1]
	local root_dir = match and vim.fn.fnamemodify(match, ':p:h') or nil
	return vim.lsp.start({
		name = 'lua-language-server',
		cmd = cmd,
		root_dir = root_dir,
		settings = settings,	
	}, { bufnr = bufnr })
end

--[[
vim.api.nvim_create_autocmd('FileType', {
	pattern = pattern,
	callback = function(args)
		client_id = do_start(0, args.file)
	end
})]]--



-- Open a file and heuristically locate it's buffer number
--
-- Note that this will totally break with duplicate files in different current working directories :p
-- It's just intended to be a very quick method of getting buffer numbers to work with :)
--
-- Optionally, specify `preview = true` to open in the preview window....
local function open_buffer_via_name(filename, preview)
	-- Open file in split.
	if not preview then
		vim.cmd.vsplit(filename)
	else
		vim.cmd.pedit(filename)
	end
	-- We need to find the bufnr for this buffer... can't find a way to actually directly create/edit a buffer in a way that gets the bufnr >.<
	local test_bufnr = nil
	for _, bufnr in ipairs(vim.api.nvim_list_bufs()) do
		local name = vim.api.nvim_buf_get_name(bufnr)
		-- Plain search
		if string.find(name, filename, 1, true) ~= nil then
			test_bufnr = bufnr
		end
	end
	return test_bufnr
end

-- Load a file into a scratch buffer. Will still have the filename as the name. Returns the bufnr
-- also allows you to set a filetype. Also reads contents from the file in question into the buffer.
--
-- This sets the scratch buffer's `bufhidden` option to `wipe`, which will wipe the buffer when it's not in a window
-- Importantly, this doesn't trigger autocommands! :p (i think this might be related).
--
-- It also creates a window (to prevent immediate wipe).
--
-- Returns the scratch buffer id as well as the scratch window id. The important thing is to destroy the buffer by hiding
-- the window and not deleting it directly ^.^

-- is in fzf-lua, which suggests this may be related to lsp??
local function open_scratch_buffer_via_name(filename, set_filetype)
	-- Read data into the buffer.
	--
	-- We read before creating the buffer so if there are errors no extranuous buffers
	-- are left hanging around :p
	local filelines = {}
	for l in io.lines(filename) do
		-- one-based indices 🤢
		filelines[#filelines + 1] = l
	end
	local scratch_buffer = vim.api.nvim_create_buf(false, true)
	vim.api.nvim_buf_set_lines(scratch_buffer, 0, 1, false, filelines)
	vim.api.nvim_buf_set_name(scratch_buffer, filename)
	-- This is probably the important bit. FZF sets `bufhidden=wipe` which does NOT CALL AUTOCMDS
	vim.api.nvim_set_option_value("bufhidden", "wipe", { buf = scratch_buffer })
	vim.api.nvim_set_option_value("filetype", set_filetype, { buf = scratch_buffer })

	-- Create the window. Doesn't need to be float or anything :p
	-- Automatically enter it.
	local scratch_window = vim.api.nvim_open_win(scratch_buffer, true, {
		relative = "cursor",
		row = 0,
		col = 0,
		height = 40,
		width = 30,
		noautocmd = true,
		border = "rounded"
	})
	return scratch_buffer, scratch_window
end


vim.api.nvim_create_user_command("ReproduceRaceCondition", function(args) 
	local bufhidewipe_bufnr, bufhidewipe_win_id = open_scratch_buffer_via_name(bufhidewipe_file, "lua")
	local ephemeral_client_id = do_start(bufhidewipe_bufnr, bufhidewipe_file)
	vim.defer_fn(function()
		vim.api.nvim_win_close(bufhidewipe_win_id, true)
		--[[vim.defer_fn(function() 
			local client = vim.lsp.get_client_by_id(ephemeral_client_id)
			client:stop()
		end)]]

		vim.defer_fn(function()
			local nonephemeral_bufnr = open_buffer_via_name(bufhidewipe_file, false)
			vim.lsp.buf_attach_client(nonephemeral_bufnr, ephemeral_client_id)
		end, 100)
	end, 50)
end, {
	nargs = 0
})

Folder Arrangement

Just placing minimal_init.lua in a folder with an empty .editorfile, and dummy-lua.lua (the .editorfile is for lsp sharing) will get the configuration i was using while writing this failed minimal config.

Actual Setup Behaviour

It's probably worth noting that the actual setup I have is more complex than this in certain ways that may be relevant. Importantly, the main LSP servers that I see in my listing (:LspStatus) command with invalid bufnrs is marksman (the markdown LSP I use).

My suspicion is that somehow the autocmd I have that automatically produces a popup when typing (which is constantly being refreshed) manages to trigger this race condition. But it also seems to occur more after I use preview windows with fzf-lua if they are scrolled through very rapidly (I use no plugins for the popup/documentation aspect, so it's not an fzf-lua thing). However, this is not marksman stuff, even if I havent seen it in any kind of :LspStatus listing yet :/

Logs

I have looked in logs, but there doesn't seem to be much useful there (though there are a signifcant number of errors from marksman and a few from my lua-language-server in some of the other earlier loglines. Still:

[START][2024-04-30 00:08:27] LSP logging initiated
[ERROR][2024-04-30 00:08:27] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:08:26 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:26:00] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:26:00 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:26:36] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:26:35 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:27:09] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:27:09 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:27:17] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:27:17 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:28:13] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:28:13 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:35:52] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:35:51 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:35:52] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:35:52 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:35:54] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:35:54 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:35:55] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:35:54 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:35:55] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:35:54 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:35:55] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:35:54 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:35:57] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:35:56 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:35:58] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:35:57 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:36:24] ...m/lsp/client.lua:980	"LSP[marksman]"	"on_error"	{  code = "ON_EXIT_CALLBACK_ERROR",  err = "vim/_editor.lua:0: E5560: Vimscript function must not be called in a lua loop callback"}
[ERROR][2024-04-30 00:36:24] ...m/lsp/client.lua:980	"LSP[marksman]"	"on_error"	{  code = "ON_EXIT_CALLBACK_ERROR",  err = "vim/_editor.lua:0: E5560: Vimscript function must not be called in a lua loop callback"}
[ERROR][2024-04-30 00:36:24] ...m/lsp/client.lua:980	"LSP[marksman]"	"on_error"	{  code = "ON_EXIT_CALLBACK_ERROR",  err = "vim/_editor.lua:0: E5560: Vimscript function must not be called in a lua loop callback"}
[ERROR][2024-04-30 00:36:24] ...m/lsp/client.lua:980	"LSP[marksman]"	"on_error"	{  code = "ON_EXIT_CALLBACK_ERROR",  err = "vim/_editor.lua:0: E5560: Vimscript function must not be called in a lua loop callback"}
[ERROR][2024-04-30 00:36:24] ...m/lsp/client.lua:980	"LSP[marksman]"	"on_error"	{  code = "ON_EXIT_CALLBACK_ERROR",  err = "vim/_editor.lua:0: E5560: Vimscript function must not be called in a lua loop callback"}
[ERROR][2024-04-30 00:36:24] ...m/lsp/client.lua:980	"LSP[marksman]"	"on_error"	{  code = "ON_EXIT_CALLBACK_ERROR",  err = "vim/_editor.lua:0: E5560: Vimscript function must not be called in a lua loop callback"}
[ERROR][2024-04-30 00:36:24] ...m/lsp/client.lua:980	"LSP[marksman]"	"on_error"	{  code = "ON_EXIT_CALLBACK_ERROR",  err = "vim/_editor.lua:0: E5560: Vimscript function must not be called in a lua loop callback"}
[ERROR][2024-04-30 00:36:24] ...m/lsp/client.lua:980	"LSP[marksman]"	"on_error"	{  code = "ON_EXIT_CALLBACK_ERROR",  err = "vim/_editor.lua:0: E5560: Vimscript function must not be called in a lua loop callback"}
[ERROR][2024-04-30 00:36:40] ...m/lsp/client.lua:980	"LSP[rust-analyzer]"	"on_error"	{  code = "ON_EXIT_CALLBACK_ERROR",  err = "vim/_editor.lua:0: E5560: Vimscript function must not be called in a lua loop callback"}
[ERROR][2024-04-30 00:36:59] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:36:59 INF] <LSP Entry> Starting Marksman LSP server: {}\n"
[ERROR][2024-04-30 00:36:59] .../vim/lsp/rpc.lua:752	"rpc"	"/usr/bin/marksman"	"stderr"	"[00:36:59 INF] <LSP Entry> Starting Marksman LSP server: {}\n"

(I don't know what exactly is causing the loop callback error and it seems to sometimes not appear... I think it is not related since it seems to only occur after the first epoch reload I do, whereas the actual issue I'm reporting occurs all of the time - and yes, I have an epoch system for reloading my config... it's about 10x as horrifying as it sounds especially because I was rushing at the time :p)

:LspStatus (my version) - probably not useful, but putting it here anyway in case it is potentially useful in figuring out the underlying causes

Client ID [1] did not have a client!                                                                                                                                                              
marksman [ID: 112]                                                                                                                                                                                
 Status:                                                                                                                                                                                          
  - Running                                                                                                                                                                                       
 Root Directory (config):                                                                                                                                                                         
  - (none)                                                                                                                                                                                        
 Workspace Directories (initial config):                                                                                                                                                          
  - (none)                                                                                                                                                                                        
 Workspace Directories (live):                                                                                                                                                                    
  - (none)                                                                                                                                                                                        
 Attached Buffers                                                                                                                                                                                 
  - [228] [INVALID BUFFER]                                                                                                                                                                        
rust-analyzer [ID: 111]                                                                                                                                                                           
 Status:                                                                                                                                                                                          
  - Running                                                                                                                                                                                       
 Root Directory (config):                                                                                                                                                                         
  - [anon]/sapient/self/limg                                                                                                                                                           
 Workspace Directories (initial config):                                                                                                                                                          
  - (none)                                                                                                                                                                                        
 Workspace Directories (live):                                                                                                                                                                    
  - file://[anon]/sapient/self/limg                                                                                                                                                    
 Attached Buffers                                                                                                                                                                                 
  - [1] [anon]/sapient/self/limg/src/lib.rs                                                                                                                                            
  - [27] [anon]/sapient/self/limg/src/bin/limg.rs                                                                                                                                      
  - [118] [anon]/sapient/self/limg/src/semantic_region.rs (current)                                                                                                                    
  - [72] [anon]/sapient/self/limg/src/locator.rs                                                                                                                                       
marksman [ID: 113]                                                                                                                                                                                
 Status:                                                                                                                                                                                          
  - Running                                                                                                                                                                                       
 Root Directory (config):                                                                                                                                                                         
  - (none)                                                                                                                                                                                        
 Workspace Directories (initial config):                                                                                                                                                          
  - (none)                                                                                                                                                                                        
 Workspace Directories (live):                                                                                                                                                                    
  - (none)                                                                                                                                                                                        
 Attached Buffers                                                                                                                                                                                 
  - [229] [INVALID BUFFER]                                                                                                                                                                        

Expected behavior

Handle invalid buffers gracefully in whatever way is most conceptually in line with how the LSP infrastructure is meant to work - I don't know the best place to put the fix, even if it is extremely simple (just a check with vim.api.nvim_buf_is_valid ^.^). It may also not be worth doing as it seems to be quite hard to produce without some pretty weird config stuff, although it might just be that I didn't put the language servers under enough pressure in my many attempts at minimal_init.lua?

It might be related to the todo in runtime/vim/lsp.lua:70 relating to resolve_bufnr on scratch buffers, but I really just don't know

From a practical perspective, fixing this issue could probably occur in two locations - the rpc.request call, and deep in the _changetracker init method - and it would be a very simple check.

Neovim version (nvim -v)

NVIM v0.10.0-dev-2995+ga1550dbf0a (but applies in all versions with relevant code)

Language server name/version

lua-language-server (3.8.3-1), marksman (2023_12_09-1), rust-analyzer (1.79.0-nightly), presumably others

Operating system/version

Arch Linux

Log file

No response

@sapient-cogbag sapient-cogbag added bug issues reporting wrong behavior lsp labels Apr 30, 2024
@justinmk justinmk added this to the backlog milestone May 1, 2024
@justinmk justinmk changed the title Buffer invalidation race condition in 'initialize' RPC in certain cases (I think!) LSP: _changetracking.lua:154: Invalid buffer id: (race condition in 'initialize' RPC?) May 1, 2024
@justinmk
Copy link
Member

justinmk commented May 1, 2024

From a practical perspective, fixing this issue could probably occur in two locations - the rpc.request call, and deep in the _changetracker init method - and it would be a very simple check.

Something like that.

Possibly related: #27383

@justinmk
Copy link
Member

Assuming this is fixed by #28875

@icholy
Copy link
Contributor

icholy commented May 21, 2024

@justinmk I'm not sure if this is the same bug.

@zeertzjq zeertzjq reopened this May 21, 2024
@icholy

This comment was marked as outdated.

@icholy
Copy link
Contributor

icholy commented May 22, 2024

You can consistently get a bad buffer into reattach_bufs by deleting the buffer in the on_init callback.

local dummy = vim.lsp.start_client({
	name = "dummy",
	cmd = { "dummylsp" },
	on_init = function()
		vim.api.nvim_buf_delete(0, {})
	end
})

vim.api.nvim_create_autocmd("FileType", {
	pattern = { "go" },
	callback = function()
		vim.lsp.buf_attach_client(0, dummy)
	end
})

This doesn't cause the reported error though. The nvim_buf_get_name and nvim_buf_get_lines succeed. The actual buffer deletion appears to be deferred:

edit: using nvim_buf_delete makes it fail.

@icholy
Copy link
Contributor

icholy commented May 22, 2024

@sapient-cogbag can you test out #28914 ?

@wookayin
Copy link
Member

runtime/lua/vim/lsp/client.lua:902

--- @package
--- Default handler for the 'textDocument/didOpen' LSP notification.
---
--- @param bufnr integer Number of the buffer, or 0 for current
function Client:_text_document_did_open_handler(bufnr)
  changetracking.init(self, bufnr)  -- <=== Error comes from here. Note that further down there is some kind of bufnr validation
  if not vim.tbl_get(self.server_capabilities, 'textDocumentSync', 'openClose') then
    return
  end
  if not api.nvim_buf_is_loaded(bufnr) then
    return
  end
  local filetype = vim.bo[bufnr].filetype

runtime/lua/vim/lsp/_changetracking.lua:149, inside the M.init module function.

  local buf_state = state.buffers[bufnr]
  if buf_state then
    buf_state.refs = buf_state.refs + 1
  else
    buf_state = {
      name = api.nvim_buf_get_name(bufnr), -- <=== The real source of the error ^.^
      lines = {},
      lines_tmp = {},
      pending_changes = {},
      needs_flush = false,
      refs = 1,
    }
    state.buffers[bufnr] = buf_state

Question: if changetracking.init is the culprit who doesn't correctly validate the buffer, wouldn't it suffice to perform a check earlier?

+   if not api.nvim_buf_is_loaded(bufnr) then
+     return
+   end
    changetracking.init(self, bufnr)  -- <=== Error comes from here. Note that further down there is some kind of bufnr validation
    if not vim.tbl_get(self.server_capabilities, 'textDocumentSync', 'openClose') then
      return
    end
-   if not api.nvim_buf_is_loaded(bufnr) then
-     return
-   end
    local filetype = vim.bo[bufnr].filetype

or having such a logic inside changetracking.init so that it doesn't work for a buffer that is no longer valid?

@icholy
Copy link
Contributor

icholy commented May 22, 2024

@wookayin see #28913

TLDR: right now it's failing at the changetracking.init, but if we put a guard in _text_document_did_open_handler, then _on_attach will just fail further down because it's still dealing with a bad buffer number.

@icholy
Copy link
Contributor

icholy commented May 23, 2024

@sapient-cogbag please report back if the issue persist after applying #28914

@sapient-cogbag
Copy link
Author

Sorry for the delay in reply, I appreciate the effort that has gone into solving this.

I just installed the latest neovim master branch on my machine (which includes the patch), and the issue does still persist (with the same error message as previously documented).

When I examine :LspStatus! now, I see a significant number of listings for language server processes with only a single (invalid, now deleted) buffer number still attached to them. The language server in this case is marksman, which makes sense since the main trigger for this issue for me is my autohover system which is of course creating large numbers of markdown buffers and then destroying them. The actual hover window is not created by any plugin, only the native neovim LSP hover system (vim.lsp.buf.hover() with some custom config and keybinds). The hover opts are:

    local hover_opts = {
        close_events = { "InsertLeavePre", "InsertEnter" },
        focus = false,
        border = "rounded",
        max_height = 20
    }

Somehow when the buffers are deleted they aren't getting detached from the language server when they're destroyed, which means that the check for attached_buffers[bufnr] seems not to work. I don't know how this is occurring >.<

I see: #28913, but I don't think it was merged to master, as I don't see any guards inside my version of changetracking.init (and as was mentioned it won't completely solve the issue, and it probably doesn't deal with the fundamental architectural issue). I made sure to check that #28914 was present in the lua files for my neovim version (pulled from the AUR with some modifications to make the PKGBUILD work ^.^).

Noticing that the buffer was not successfully detached, I also looked at the things I call on LspDetach, and this issue might be related to errors that occur during LspDetach callbacks or something associated with them? Though i have a system that explicitly manages all LspDetach callbacks and ensures that buffers are valid before the actual callbacks are called, in all cases (this system is actually designed so that I can call things only when the first LS is attached and the last LS is detached rather than for every LS attach/detach, but I put a guard in there a while back ^.^ - I checked every single reference to LspDetach in my config). Note though that some of my deletion/detach callbacks use vim.schedule for functions that might produce errors, which could be related.

If a language server then tries to send an initialize response (either the initial one or some kind of repeat) after some sort of failed detach, then that could still replicate the issue, especially if the failure to detach also means that the language server process isn't auto-stopped by someone's config when nothing is attached.

My config does not auto-stop any language servers though, unless I reload my config or manually stop them with my version of :LspStop or :LspStop!, anyhow.

Either that, or buffers that get deleted in certain ways don't invoke the detach machinery and, if a language server responds to initialize after that, it causes the problem. Just using :bdelete definitely properly detaches, though.

@sapient-cogbag
Copy link
Author

sapient-cogbag commented May 25, 2024

Also, I explicitly (myself) add an autocmd to all buffers that manually detaches buffers from clients when they are deleted (which actually doesn't check for buffer validity). This may also be related.

The relevant code is as follows:

    local augid = vim.api.nvim_create_augroup("lsp-client-detach-on-buffer-deletion", { clear = true })

    vim.api.nvim_create_autocmd({ "LspAttach" }, {
        group = augid,
        desc = "Add autocmds to attached buffers so their deletion detaches them from the LSP client",
        callback = function(event)
            local bufnr = event.buf
            local client = vim.lsp.get_client_by_id(event.data.client_id)
            if client ~= nil then
                -- Create the buffer-local autocmd
                --
                -- `BufUnload` is not something ok for this as unloading seems to be quite common and will result in weird duplication.
                vim.api.nvim_create_autocmd({ "BufDelete" }, {
                    group = augid,
                    buffer = bufnr,
                    desc = "Detach this buffer from LSP client - [" .. client.name .. "] - when the buffer is deleted",
                    callback = function(buflocal_event)
                        -- only detach if the buffer is actually still attached to the client.
                        if client.attached_buffers[bufnr] then
                            -- If the buffer is invalid, buf_detach_client won't work, so we just delete it from the array 
                            -- directly.
                            --[[if not vim.api.nvim_buf_is_valid(bufnr) then
                                client.attached_buffers[bufnr] = nil
                            else]]--
                            vim.lsp.buf_detach_client(bufnr, client.id)
                            -- end
                        end
                    end
                })
            end
        end
    })

Edit:
I tried removing this, and it made zero difference.

@sapient-cogbag
Copy link
Author

Looking inside lsp.lua it actually seems like it'd be theoretically possible for an LspDetach callback to be invoked before an LspAttach callback.

Moreover, this also applies to the functions that call those autocmds, which also manage the attached buffer information. The on_attach stuff seems to forcefully mark a bufnr as attached to a client, so if it was somehow invoked after the buffer deletion event but before the buffer was actually deleted/rendered invalid, it could end up re-inserting that buffer into a client's buffer list.

In particular, note the following:

  • Inside vim/lsp.lua:577, api.nvim_buf_attach is called with the on_detach callback itself calling buf_detach_client@vim/lsp.lua:506. buf_detach_client sets client.attached_buffers[bufnr] to nil, but only after invoking the LspDetach callbacks. If those somehow error, or something - or some of the other functions called inside buf_detach_client before it sets client.attached_buffers[bufnr] = nil fail - then this will not clear out bufnr from the attached_buffers table.
  • api.nvim_buf_attach (and the containing function - buf_attach@vim/lsp.lua:532) are invoked before Client:_on_attach inside vim.lsp.buf_attach_client:vim/lsp.lua:623 ^.^
  • Client:_on_attach is only invoked if the client is already initialized by the time of this call, otherwise it waits until the initialize RPC gets a reply (unless there are other ways Client:_on_attach is invoked ^.^).
  • Therefore, if the on_detach callback to api.nvim_buf_attach is ever invoked before some particular call to Client:_on_attach - which is then itself only called when the LS indicates initialization - and some error inside that stops the attached_buffers[bufnr] from getting set to nil - an invalid bufnr will remain inside the list of attached bufnrs for a client when Client:_on_attach checks it!

The sequence of events that triggers this issue - if my hypothesis is correct - is then as follows:

  • vim.lsp.buf_attach_client() is invoked - either directly or by vim.lsp.start. This immediately marks a buffer as attached to the client.
  • This invokes buf_attach, which uses api.nvim_buf_attach to register an on_detach callback that will automatically remove a bufnr from all clients.
  • Meanwhile, the client has sent the initialize RPC and is waiting for a response for the language server.
  • At some point here, before the LS has responded to the initialize request but after the api.nvim_buf_attach callbacks have been registered, the buffer is deleted and rendered invalid. The on_detach callback is invoked (or it might actually not be invoked, which would cause the same issue), but somehow there is an error that causes it not to reach the parts of the code that set client.attached_buffers[bufnr] = nil
  • Then, sometime later, the LS finally responds, invoking the initialize callback. An invalid bufnr remains in client.attached_buffers, and the error is induced in spite of the check ^.^

Hopefully this is helpful, as I suspect the real issue is now related to buffer deletion and the on_detach callback and the things inside of it.

@icholy
Copy link
Contributor

icholy commented May 25, 2024

@sapient-cogbag I cannot reproduce the error using your minimal_init.lua. I'm getting a different error: Vim:E95: Buffer with this name already exists.

@sapient-cogbag
Copy link
Author

Yes, the minimal_init.lua doesn't work, I just posted it to attempt to illustrate my attempt at reproducing the bug a while back. I spent a lot of time trying to do it but couldn't actually reproduce it :/

I have a better understanding of the actual issue now (documented above is my new hypothesis) after the recent commit, but I still don't know exactly what could cause the buffer detach-on-delete machinery to fail. I can dig around some more in the lsp.lua code later to try and figure it out, but i'm not really that familiar with the deeper internals of neovim (especially api.nvim_buf_attach >.<).

@sapient-cogbag
Copy link
Author

Until I know the root cause of the failure to remove from the attached_buffers inside on_detach, I probably can't reproduce this issue (or if on_detach is not being called, why that is the case for my setup and not other setups)

@icholy
Copy link
Contributor

icholy commented May 25, 2024

How consistently can you reproduce this using your config?

@sapient-cogbag
Copy link
Author

sapient-cogbag commented May 25, 2024

My own config produces this constantly (I have to turn my autohover/auto-documentation off to even be able to type something without 20 different errors getting in the way). I can work somewhat without the autohover but it also pops up with manual hover too, so every time I look at docs.

This is related to the fact that my markdown LS is attaching to the hover popups, but it works with other LSes as well e.g. sometimes it happens when using fzf-lua and preview mode (and having language server stuff there turns out to be surprisingly useful because I can scan through documents and see issues quickly), and it also sonetimes happens when I use the config reload mechanism.

I'll attempt a new minimal_init.lua later today with my new understanding from yesterday, try and figure out how the deletion mechanism is broken. Trying to minimally reproduce this has been pretty difficult, sorry >.<

For this I'll see about creating an LspDetach callback that errors, and some other things.

@icholy
Copy link
Contributor

icholy commented May 26, 2024

@sapient-cogbag if you share your config + some setup/repro steps I can take a look.

@sapient-cogbag
Copy link
Author

sapient-cogbag commented May 26, 2024

My config is here. I will warn you it's an absolute disaster though because it was made in a massive rush, and then I piled hacks on top of other hacks to fix issues but now it's... really something :/, honestly it's a bit embarrassing xP. Make sure to stay with the fix_epoch branch as that's the latest version of my config ^.^.

The main thing is that it still uses the old Packer plugin repo, and it should automatically bootstrap on that when you first load it, then you'll want to run :PackerSync. To be honest even I'm not sure if this is completely enough to initialize it. You can probably get rid of most of the old Packer plugin entries inside cfg/plugins.lua:381 and it should be fine as this issue is not really related to them - you just need to keep the cmp-nvim-lua stuff (of which I have a small fork on my repo) and the neodev stuff because that's used in language server initialization, all other plugin config is done via callbacks and will be fine if you just delete things, though you could also just get rid of some of that stuff too and not use the language server stuff it's relevant for.

It also produces a lot of log messages (based on it's discovery of various natively installed language servers). The main thing here is that it won't do the standard lspconfig thing of downloading language servers - it uses the native language servers installed via a package manager. This also applies if you keep the fzf-lua plugin, which uses the natively installed fzf binary and I don't know how that will react if it's not installed >.<

The most reliable way to trigger the issue is:

  • Install the marksman language server specifically (as the documentation popup is markdown, and that's by far the most reliable way of triggering the issue, though it has worked for FzfLua previews before I disabled language servers for those because it was even more unusable than the autohover being enabled).
  • Install some other language server for one of the languages in cfg/lsp/<language>.lua. lua-language-server, rust-analyzer, and clangd are easy options for this. You can also only install marksman and then open markdown files, the effect will still work fine ^.^
  • Open some project in one of the languages, and start typing or find somewhere you can get a documentation popup. Then run :LspHover on that location (or, if autohover is enabled, just start trying to type), and move the cursor until the hover window goes away. Wait a couple seconds and you should get serious error spam.

On another note, I put logging inside the runtime itself while I've been trying to create a minimal reproduction, and it seems like with my config enabled, on_detach just isn't getting called at all somehow (I checked with the minimal_init.lua efforts to ensure that the logging does work on things which don't induce the issue), which is why the invalid bufnrs are remaining inside client.attached_buffers.

One problem I have with creating a minimal reproduction is that my config is multi-file, so instead of trying to cut out bits of config and localise the issue, I'm stuck trying to rebuild the actual causative scenario from scratch. If you have any tips for reducing a large and complex multi-file config (where I can't just glob everything into one file, and modifying it in place would cause serious issues because I also need to use neovim to, well, edit and work on other stuff, including the minimal reproduction itself), I'd appreciate it ^.^ - in particular, how does -u and --no-plugins interact with lua imports?

I've found additional evidence that it's related to invalid bufnrs somehow remaining inside client.attached_buffers, as when I forcefully stop all LSP servers via :LspStop!, I get similar errors to the initial issue, but relating to attempts to run on_exit callbacks for the client - rather than issues in on_attach and changetracking.

EDIT: I've discovered NVIM_APPNAME, hopefully this will accelerate finding a minimal config.

@sapient-cogbag
Copy link
Author

Trying to make a minimal_init.lua seems almost impossible without me being able to reduce from my config rather than trying to build one up from scratch, though. I'm going to start digging through the neovim C codebase to see if I can find out what might be causing nvim_buf_attach not to call on_detach...

@sapient-cogbag
Copy link
Author

sapient-cogbag commented May 26, 2024

So, I've had a look for all references to BufUpdateCallbacks.on_detach in the C files. The only place where that callback is called is inside buf_updates_unload@src/nvim/buffer_updates.c:151. There is also an on_reload callback specified inside the lua code that controls on_detach, so the primary determinant as to whether or not on_detach is called is whether or not buf_updates_unload is called with can_reload = true or can_reload = false. i.e., if buf_updates_unload is even called, then in the error case, it's being unloaded with can_reload = true when it should be being called with can_reload = false ^.^

Either that, or a buffer is being detached when it should not be, then being somehow reattached. The detaching process cleans up all callbacks but leaves the list of callbacks in a valid state that could theoretically be re-appended to (just from a cursory glance, idk if this is true for sure :p)

The only places where that function is called with can_reload = true are:

  • src/nvim/fileio.c:3179 - in buf_reload, and only if the buffer is the current active buffer and valid, and it didn't save ok? there's a lot of conditionality going on in this function and it's beyond my ability to easily trace further back at the minute >.<
  • src/nvim/undo.c:2442 - inside undoredo, and specifically inside a condition on whether something was previously-reloaded (at least, I'm assuming that's what UH_RELOAD means, I really am not familiar with any of this).

There are also several places where can_reload = false, that may have some condition preventing them from being called properly, but I haven't looked at those yet >.<

Another note: Reloading a file with :e! will ensure that the buffer is detached and reattached just fine.

@sapient-cogbag
Copy link
Author

sapient-cogbag commented May 26, 2024

I managed to figure out a change to my (minimal) config that fixes the issue. So I can hopefully reproduce it now! (the short story is that having vim.lsp.start directly inside a filetype autogroup that is triggered on some sort of ephemeral buffer seems to cause this issue, as wrapping that vim.lsp.start in a vim.schedule seemed to fix the problem).

Hopefully, I will have a minimal_init.lua now I actually am pretty sure I know what the cause is! ^.^

EDIT: still having some trouble, but I will try and make it so the filetype autocmd is triggered during textlock, as my current setup of manually creating a scratchbuffer is not sufficient. I need to sleep though, but I think we might actually be able to reproduce this at somepoint! And even if not, I have managed to fix it for my own config by wrapping vim.lsp.start in vim.schedule.

Basically, in some situations, if you don't wrap vim.lsp.start in vim.schedule in certain environments, then it can leave the client.attached_buffers set contaminated with invalid bufnrs (or perhaps valid bufnrs that fail to register some on_detach hooks and can't notify the client when they become invalid), which won't immediately error but causes issues down the line, such as if an initialization reply is received after the buffers become invalid, or while your client is exiting ^.^

@TimelordUK
Copy link

TimelordUK commented May 26, 2024

guys,

thanks so much for looking into this, seems like this has fixed it - and even for say a file type lua which i do not even have a callback such as below.

image

e.g. https://github.com/TimelordUK/configs/blob/main/.config/nvim/lua/plugins/lspconfig.lua

       local function start_lsp(cfg)
         vim.schedule(function()
            vim.lsp.start(cfg)
         end)
      end

      vim.api.nvim_create_autocmd('FileType', {
         pattern = 'cpp',
         callback = function()
            start_lsp({
               name = 'clangd',
               cmd = { 'clangd' },
            })
         end,
      })

@icholy
Copy link
Contributor

icholy commented May 26, 2024

Here's a minimal repro: https://github.com/icholy/nvim-lsp-bug-4

@zeertzjq can you re-open this?

@icholy
Copy link
Contributor

icholy commented May 26, 2024

The on_detach isn't called because this condition is hit

return true -- detach

@mfussenegger
Copy link
Member

mfussenegger commented May 26, 2024

The on_detach isn't called because this condition is hit

return true -- detach

Can you try with #lsp.get_clients({ bufnr = bufnr, _uninitialized = true } in the condition instead?

Or actually, we shouldn't send changes for the uninitialized clients. But we shouldn't detach either.

@icholy
Copy link
Contributor

icholy commented May 26, 2024

@mfussenegger yeah that works.

diff --git a/runtime/lua/vim/lsp.lua b/runtime/lua/vim/lsp.lua
index 1592fd315..60b3f3e50 100644
--- a/runtime/lua/vim/lsp.lua
+++ b/runtime/lua/vim/lsp.lua
@@ -577,7 +577,8 @@ local function buf_attach(bufnr)
   api.nvim_buf_attach(bufnr, false, {
     on_lines = function(_, _, changedtick, firstline, lastline, new_lastline)
       if #lsp.get_clients({ bufnr = bufnr }) == 0 then
-        return true -- detach
+        -- detach if there are no clients
+        return #lsp.get_clients({ bufnr = bufnr, _uninitialized = true }) == 0
       end
       util.buf_versions[bufnr] = changedtick
       changetracking.send_changes(bufnr, firstline, lastline, new_lastline)

@icholy
Copy link
Contributor

icholy commented May 27, 2024

@sapient-cogbag I'm pretty sure it's actually fixed this time

@sapient-cogbag
Copy link
Author

This fixes the problem for me, I just pulled the latest version. Thanks :)

@wookayin wookayin modified the milestones: backlog, 0.10.1 May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment