
[RFC] provide Python/R implementations of all the built-in objectives? #6440

Open
jameslamb opened this issue May 2, 2024 · 1 comment

Comments

@jameslamb
Collaborator

Summary

Should we provide example Python (and maybe R) implementations of LightGBM's objective functions that exactly match the behavior of the built-in objectives on the C++ side?

Motivation

Over the years of maintaining LightGBM, I've seen significant interest in implementing LightGBM's built-in objective functions in Python, for purposes like:

  • learning how LightGBM works (for people who are not comfortable with C++)
  • making it easier to measure the difference between custom objectives and LightGBM's built-in ones
    • (e.g. if you have a Python function that exactly matches the built-in objective, you can modify it and know that any performance differences are due to your modifications)

See "References" for evidence.

Description

I am NOT proposing adding such implementations to any library that we publish.

Instead, I'm thinking of something like the following:

  • new directory in examples/ containing these implementations
  • tests that run in CI which compare the results to those calculated by the C++ side
  • implementations that account for the points that most often confuse people:
    • calculating an init_score if the Dataset doesn't have one
    • correctly using sample weights
    • correctly respecting boost_from_average
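As a rough illustration of those three concerns, here is a minimal NumPy sketch of two objectives, L2 regression and binary logloss, each with its weighted gradient/Hessian and its boost_from_average starting score. It assumes LightGBM's default sigmoid=1.0 and labels in {0, 1} for the binary case; the function names are hypothetical, and the formulas are the standard ones for these losses rather than a line-by-line port of the C++ code. It also assumes, as we understand it, that LightGBM does not apply boost_from_average when a custom objective is used, so the starting score would have to be supplied through the Dataset's init_score.

```python
import numpy as np


def _weights(label, weight):
    """Per-row weights as float64, defaulting to 1.0 when no weights are given."""
    label = np.asarray(label, dtype=np.float64)
    if weight is None:
        return np.ones_like(label)
    return np.asarray(weight, dtype=np.float64)


def l2_init_score(label, weight=None):
    """boost_from_average for L2 regression: the weighted mean of the label."""
    w = _weights(label, weight)
    y = np.asarray(label, dtype=np.float64)
    return float(np.sum(y * w) / np.sum(w))


def l2_grad_hess(score, label, weight=None):
    """Gradient and Hessian of 0.5 * w * (score - label)**2 with respect to score."""
    w = _weights(label, weight)
    grad = (np.asarray(score, dtype=np.float64) - np.asarray(label, dtype=np.float64)) * w
    hess = w.copy()
    return grad, hess


def binary_init_score(label, weight=None, eps=1e-15):
    """boost_from_average for binary logloss (sigmoid=1.0): log-odds of the weighted positive rate."""
    w = _weights(label, weight)
    y = np.asarray(label, dtype=np.float64)
    p = float(np.sum(y * w) / np.sum(w))
    p = min(max(p, eps), 1.0 - eps)  # clamp so the log-odds stay finite for all-0 or all-1 labels
    return float(np.log(p / (1.0 - p)))


def binary_grad_hess(score, label, weight=None):
    """Gradient and Hessian of weighted logloss with respect to the raw score (labels in {0, 1})."""
    w = _weights(label, weight)
    p = 1.0 / (1.0 + np.exp(-np.asarray(score, dtype=np.float64)))
    grad = (p - np.asarray(label, dtype=np.float64)) * w
    hess = p * (1.0 - p) * w
    return grad, hess


def l2_objective(preds, train_data):
    """Hypothetical adapter with the (preds, Dataset) -> (grad, hess) shape of a custom objective."""
    return l2_grad_hess(preds, train_data.get_label(), train_data.get_weight())
```

The adapter only calls Dataset.get_label() and Dataset.get_weight(), and follows the (preds, train_data) -> (grad, hess) shape that, as far as we know, the LightGBM 4.x Python package expects from a callable passed as params["objective"]. A cheap sanity check for both objectives: at the boost_from_average starting score, the weighted gradients sum to zero.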

Things that do not necessarily need to be in scope for the first versions of implementations:

  • distributed training / collective operations
  • respecting the deterministic parameter
  • anything related to quantized training
  • exact numerical precision (agreeing within, say, 1e-6 would probably be good enough to start)
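The CI comparison tests could put both models' predictions on the same scale before applying a tolerance like the one suggested above. This sketch assumes, as we understand LightGBM's behavior, that a model trained with a custom objective predicts raw scores that exclude any init_score set on the training Dataset, so the init_score is added back and the link function (a sigmoid for binary logloss) is applied before measuring the largest absolute difference from the built-in model's predictions. comparable_preds and max_abs_diff are hypothetical helper names.

```python
import numpy as np


def sigmoid(x):
    """Logistic link that maps raw binary-logloss scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=np.float64)))


def comparable_preds(raw_custom_preds, init_score=0.0, link=None):
    """Put custom-objective predictions on the same scale as the built-in model's.

    Assumes the raw scores exclude the init_score that was set on the training
    Dataset, so it is added back before the optional link function is applied.
    """
    scores = np.asarray(raw_custom_preds, dtype=np.float64) + init_score
    return link(scores) if link is not None else scores


def max_abs_diff(custom_preds, builtin_preds):
    """Largest absolute elementwise difference between two prediction arrays."""
    a = np.asarray(custom_preds, dtype=np.float64)
    b = np.asarray(builtin_preds, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.max(np.abs(a - b))) if a.size else 0.0
```

A test might then read assert max_abs_diff(comparable_preds(custom_raw, init, sigmoid), builtin_probs) <= 1e-6, where custom_raw, init, and builtin_probs are hypothetical values taken from the two trained models.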

References

GitHub posts that could be summarized as "how do I replicate a built-in LightGBM objective in Python?":

And Stack Overflow:

@shiyu1994
Collaborator

I agree with this proposal. But we should also note that there can be minor numerical differences between the Python (or R) and C++ implementations, which can cause slightly different results.
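One concrete source of such differences is floating-point width. Assuming a default build in which, as far as we know, LightGBM keeps gradients and Hessians in 32-bit floats (its score_t type), a float64 NumPy implementation can differ by up to one float32 rounding step per value, a relative error of at most 2**-24 (about 6e-8). The hypothetical helper below measures that gap; per-iteration gaps this small could still occasionally change a split choice and compound over many boosting rounds.

```python
import numpy as np


def float32_rounding_gap(values):
    """Largest relative change when float64 values are rounded to float32 and back."""
    v = np.asarray(values, dtype=np.float64)
    rounded = v.astype(np.float32).astype(np.float64)
    nonzero = v != 0.0  # relative error is undefined at exactly zero
    if not np.any(nonzero):
        return 0.0
    rel = np.abs(rounded[nonzero] - v[nonzero]) / np.abs(v[nonzero])
    return float(np.max(rel))


# Example: logloss gradients (p - y) computed in float64, as a NumPy port would.
scores = np.array([-2.0, -0.3, 0.1, 1.7])
labels = np.array([0.0, 1.0, 0.0, 1.0])
grads = 1.0 / (1.0 + np.exp(-scores)) - labels
print(float32_rounding_gap(grads))  # at most 2**-24, the float32 unit roundoff
```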
