Is your feature request related to a problem? Please describe.
I have a tiny lil computer and am always teetering on a 100% full disk, so I probably run ncdu more than the average person; bear that in mind.
A linkml clone is ~300MB, which is not huge, but it's also not small.
Thankfully it looks like there is a pretty clear culprit and a relatively lossless way of thinning the repo:
In-repo space usage
Running `ncdu` from the root on the `main` branch shows:
• tests - 167MB
• data - 109MB
• hp.dill - 63.4MB
• hp.ttl - 42.2MB
• .git - 113.9MB
So the major contributors there are the `hp.dill` and `hp.ttl` files, and searching for those filenames (and for `hp`) from within the `tests` directory doesn't match anything. Are those just vestigial historical files? If so, we could safely remove them, and if they aren't unique or likely to be depended on, we can recover the space.
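To double-check that nothing still uses those files, scanning the tree for their names is usually enough. Here's a minimal Python sketch; `find_references` is a hypothetical helper, and it only catches literal filename mentions (not paths assembled at runtime), so an empty result is a strong hint rather than proof.

```python
from pathlib import Path


def find_references(root: str, needles: list[str]) -> dict[str, list[str]]:
    """Map each needle to the text files under `root` that mention it."""
    hits: dict[str, list[str]] = {needle: [] for needle in needles}
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and the (large) data files themselves
        if not path.is_file() or path.name in needles:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file: not a code reference
        for needle in needles:
            if needle in text:
                hits[needle].append(path.as_posix())
    return hits
```

For example, `find_references("tests", ["hp.dill", "hp.ttl"])` should return empty lists for both names if the files really are unused.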
git history usage
Looking at the space usage within the `.git` directory shows that we have a few versions of the `hp*` files above, and nearly all the rest of the space in the git history comes from historically versioned, generated docs files: `linkml.generators.html`, `_modules/linkml_runtime/linkml_model/meta.html`, and so on.
Since the sources those docs are generated from are presumably already in the version history, we can also safely remove those generated html files from the git history.
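One way to reproduce that kind of per-file history breakdown (not necessarily the exact command used above) is to pipe `git rev-list --objects --all` into `git cat-file --batch-check`. The Python sketch below sums the uncompressed size of every blob version per path; the helper names are hypothetical. Note that `rev-list` reports each object once, under the first path it meets it at, and that swapping `%(objectsize)` for `%(objectsize:disk)` gives a rough on-disk (packed) size instead.

```python
import subprocess
from collections import defaultdict

# Object type, object id, uncompressed size, then the path rev-list reported
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def sum_blob_sizes(batch_output: str) -> dict[str, int]:
    """Total the size of every blob version, keyed by the path it was seen at."""
    totals: dict[str, int] = defaultdict(int)
    for line in batch_output.splitlines():
        parts = line.split(" ", 3)  # maxsplit keeps paths containing spaces intact
        if len(parts) == 4 and parts[0] == "blob" and parts[3]:
            totals[parts[3]] += int(parts[2])
    return dict(totals)


def history_usage(repo: str = ".") -> list[tuple[str, int]]:
    """Return (path, bytes) pairs across all refs, largest first."""
    # Every object reachable from any ref; blobs and trees come with a path
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Feed that list to cat-file to look up each object's type and size
    info = subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return sorted(sum_blob_sizes(info).items(), key=lambda kv: kv[1], reverse=True)
```

Printing `history_usage()[:10]` from the repo root lists the ten heaviest paths across all of history.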
mitigations
Removing files from git history is safer than it might appear, though it does require one leap-of-faith moment (and even that can be fully recovered from with a single backup).
`git filter-repo` is super easy to use; removing a file from the history takes a single command.
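As a hedged sketch, assuming `git filter-repo` is installed: its `--path` and `--path-glob` options select paths, and `--invert-paths` flips the selection so matching files are dropped from every commit. The Python wrapper below only assembles and runs that command; the helper names are hypothetical, and since the exact location of `hp.dill`/`hp.ttl` isn't given here, any path you pass is yours to fill in.

```python
from __future__ import annotations

import shlex
import subprocess


def filter_repo_args(paths: list[str], globs: list[str] | None = None) -> list[str]:
    """Build a git filter-repo call that drops the given paths from all history."""
    args = ["git", "filter-repo", "--invert-paths"]  # keep everything *except* matches
    for path in paths:
        args += ["--path", path]  # exact file, or a directory and everything under it
    for glob in globs or []:
        args += ["--path-glob", glob]  # e.g. a pattern matching generated docs
    return args


def remove_from_history(repo: str, paths: list[str], globs: list[str] | None = None) -> None:
    """Rewrite history in `repo`; run this on a fresh clone, never your working copy."""
    args = filter_repo_args(paths, globs)
    print("running:", shlex.join(args))
    subprocess.run(args, cwd=repo, check=True)
```

Building the argument list separately makes it easy to print and eyeball the command before anything is rewritten.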
You can test it out by cloning the repo, running the filter-repo command on the clone, and then diffing it against an uncleaned clone. You can validate the history by iterating through commits and diffing those too. If all went well, the only diff should be the files you removed.
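Because filter-repo leaves file contents untouched, blob ids for the surviving files should be identical in both clones, which makes the comparison mechanical. A minimal sketch (hypothetical helper names) that compares one revision's tree from each clone and reports any changed path not covered by the removal patterns; `fnmatch` patterns stand in for filter-repo's own path matching, so a removed directory needs a pattern like `docs/*`.

```python
import fnmatch
import subprocess


def parse_ls_tree(output: str) -> dict[str, str]:
    """Map path -> object id from `git ls-tree -r -z` output."""
    entries: dict[str, str] = {}
    for record in output.split("\x00"):
        if not record:
            continue
        meta, path = record.split("\t", 1)  # "<mode> <type> <oid>\t<path>"
        _mode, _type, oid = meta.split()
        entries[path] = oid
    return entries


def tree_entries(repo: str, rev: str = "HEAD") -> dict[str, str]:
    """List every file in `rev` of `repo`, NUL-separated so odd paths survive."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-z", rev],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_tree(out)


def unexpected_changes(before: dict[str, str], after: dict[str, str],
                       removed_patterns: list[str]) -> set[str]:
    """Paths that differ between the trees and weren't meant to be removed."""
    changed = {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}
    return {p for p in changed
            if not any(fnmatch.fnmatchcase(p, pat) for pat in removed_patterns)}
```

Run it for each branch, e.g. `unexpected_changes(tree_entries("original", "main"), tree_entries("cleaned", "main"), ["*hp.dill", "*hp.ttl"])`, where the clone directory names are placeholders; an empty set means only the intended files went away.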
If you want to be extra sure, you can make a fork and test force-pushing to that before doing so on the main repo; the final leap of faith is then force-pushing to `main`. Even if that were to go catastrophically wrong, as long as you made a local clone of the repo beforehand, you should be able to fully restore it with another force push.
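A minimal sketch of that safety net, assuming a `git clone --mirror` backup and a restore that force-pushes branches and tags with explicit refspecs. The helper names, the remote URL, and the backup directory are placeholders. Pushing heads and tags explicitly (rather than `git push --mirror`) is a deliberate choice: a mirror clone of a GitHub repo also contains read-only `refs/pull/*` refs that the server rejects.

```python
import subprocess


def backup_cmd(remote_url: str, backup_dir: str) -> list[str]:
    """Mirror-clone every ref of the remote into a bare backup repo."""
    return ["git", "clone", "--mirror", remote_url, backup_dir]


def restore_cmds(backup_dir: str, remote_url: str) -> list[list[str]]:
    """Force-push all branches and tags from the backup back to the remote."""
    return [
        ["git", "-C", backup_dir, "push", "--force", remote_url,
         f"refs/{kind}/*:refs/{kind}/*"]
        for kind in ("heads", "tags")
    ]


def run_all(cmds: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Take the backup with `run_all([backup_cmd(url, "linkml-backup.git")])` before the force push; `run_all(restore_cmds("linkml-backup.git", url))` puts the old branches and tags back if needed.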
Anyway, feel free to rapidly triage and close if this isn't something y'all are interested in. It would just be a minor quality-of-life improvement that lowers the barrier to contribution, and it's also sort of an aesthetic thing: we want new contributors to be delighted and pleasantly surprised, and starting with a big clone is a minor code smell. If not, totally cool.
How important is this feature? Select from the options below:
• Low - it's an enhancement but not crucial for work
When will use cases depending on this become relevant? Select from the options below:
• Long-term - 6 months - 1 year
I can PR to remove the files, that's np, but I can't PR to remove them from the git history, which is what would actually save the space :). I can't force push to main (and if I could, I would want that turned off lol, I don't need that kinda stress ;) )