Paperspace cell stopping #2793

Open
sixpt opened this issue Mar 17, 2024 · 16 comments

@sixpt

sixpt commented Mar 17, 2024

While running SD in a Paperspace session, everything starts fine, but the 'Start Stable Diffusion' cell repeatedly stops after running for a few minutes. I can get a generation or two off, then it quits. I can restart the cell and start generating again, but it keeps quitting on me. This started a couple of days ago and has been happening consistently.

@TheLastBen
Owner

TheLastBen commented Mar 18, 2024

Does the machine have enough VRAM? Is there any error log?

@AbyssalLizard

Same issue here. The cell runs but often randomly just stops after finishing an image in a batch. No logs or anything, at least none that I can see. I've never had VRAM problems at these resolutions either, so I can't imagine it being that. It's really only been happening for the past week or so.

@TheLastBen
Owner

Have you tried a different, slightly more powerful machine to see whether you can replicate the issue?

@sixpt
Author

sixpt commented Mar 19, 2024

AbyssalLizard describes the issue right - no log whatsoever, cell just stops running and generation stops. Sometimes during a multi-batch generation, sometimes between single batches before I start the next one. I'm using A4000 or RTX5000, the most powerful ones available in my paid tier. No VRAM errors indicated.

@sixpt
Author

sixpt commented Mar 19, 2024

Another wrinkle that leads me to believe that Paperspace may be at least part of the issue: I've now noticed that after running for an hour or so and having to restart the cell a few times, the A1111 tab eventually stops communicating with the Paperspace tab (e.g. clicking Generate does not change it to Interrupt and there's no new line in the running cell's log). If I open in JupyterLab, the file browser appears completely empty and refreshing does not make the files appear. If I close out of Chrome, start a new window, and log back into the same Paperspace session, JupyterLab loads my files properly and restarting the kernel allows it all to run again, with the same issue as in the initial post above.

@TheLastBen
Owner

@sixpt Thank you for the feedback. If you have any other insight, feel free to share it; hopefully it'll help find the cause of the issue.

@sixpt
Author

sixpt commented Mar 25, 2024

Still having this issue. Wiped my sd folder completely from the notebook, as I very recently switched to SDXL, and started fresh. Watching more closely, it seems that the cell is typically stopping at the very end of a generation, right before the image is saved, but I can't say for sure if this is every time.

@sixpt
Author

sixpt commented Mar 27, 2024

More details: I noticed my GPU usage was spiking at the end of generation, VAE maybe? To test, I dropped my resolution to 512x512 on an A4000 GPU; the first 3-6 images or so would generate with no problem, with only minor blips in GPU usage up to ~25%. Eventually a generation would spike me north of 85% GPU and the cell would stop running. Using ADetailer, no hires fix, just 512x512. Any ideas?

@TheLastBen
Owner

Try a completely new workspace and see if the issue persists, to make sure it isn't caused by some extension.

@sixpt
Author

sixpt commented Mar 27, 2024

Thanks, I coincidentally was setting up to do exactly this earlier today but got sidetracked. At first glance it appears to be much more stable with a new workspace, if not totally fixed. I've tried adding back ADetailer, since that seemed to exacerbate the issue in my old workspace, and it seems to be running fine. I'll add back my old extensions (only a few) one by one and see if I can reproduce the error and get back to you.

@sixpt
Author

sixpt commented Mar 28, 2024

I'm still able to get generations to quit. It happens far more often when I'm running small batch counts and/or interrupting/skipping generations. If I set batch count to 50 and hit Generate, it will generate all the images no problem. If I set batch count to 1-3 and use 'Generate Forever,' or start and stop small batches individually, I can generate a dozen or so images at most before the cell quits.

@sixpt
Author

sixpt commented Apr 1, 2024

Looking at the metrics tab in Paperspace, I noticed the RAM usage continually steps up with each generation and never empties. GPU doesn't seem to be the issue.
[Screenshot 2024-04-01 083119: Paperspace metrics chart of RAM usage]
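
For anyone who wants to confirm the same stepping pattern without the metrics tab, here is a minimal sketch that reads system RAM from `/proc/meminfo` and checks whether usage only ever climbs. It assumes a Linux machine; the helper names and the 50 MiB tolerance are made up for illustration, and inside a container the file may report the host's memory rather than the notebook's own limit.

```python
from typing import Dict, List


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in KiB}."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def used_mib(info: Dict[str, int]) -> float:
    """Used RAM in MiB, computed as MemTotal minus MemAvailable."""
    return (info["MemTotal"] - info["MemAvailable"]) / 1024


def never_released(samples: List[float], tolerance_mib: float = 50.0) -> bool:
    """True if usage never drops by more than the tolerance and ends higher than it began."""
    no_big_drop = all(b >= a - tolerance_mib for a, b in zip(samples, samples[1:]))
    return len(samples) >= 2 and no_big_drop and samples[-1] > samples[0]


def read_used_mib(path: str = "/proc/meminfo") -> float:
    """Take one reading from the live system (Linux only)."""
    with open(path) as f:
        return used_mib(parse_meminfo(f.read()))
```

Calling `read_used_mib()` once after each generation and passing the collected readings to `never_released` would flag the "steps up and never empties" pattern described above.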

@TheLastBen
Owner

From the chart you posted, the issue is definitely the RAM not being purged. I'll see if I can reproduce it.

@sixpt
Author

sixpt commented Apr 1, 2024

The memory usage ramp-up seems to depend on checkpoint size. Switching to the SD1.5 base checkpoint instead of SDXL made each increase in memory use much smaller. I used the same image generation parameters with SD1.5 and ran 'generate forever', and my usage looked like this about 20 minutes in:

[Screenshot 2024-04-01 122125: memory usage chart about 20 minutes into the SD1.5 run]

@TheLastBen
Owner

I removed the --medvram arg; let's see if that solves the issue.
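
For anyone running their own copy of the launch cell, a rough sketch of stripping `--medvram` from an argument string follows. The thread does not show how the notebook actually builds its launch command, so the `drop_flag` helper and the example argument string are hypothetical.

```python
import shlex


def drop_flag(args: str, flag: str) -> str:
    """Return the argument string with every exact occurrence of `flag` removed."""
    kept = [tok for tok in shlex.split(args) if tok != flag]
    return shlex.join(kept)


# Hypothetical launch arguments, for illustration only.
example_args = "--xformers --medvram --listen"
print(drop_flag(example_args, "--medvram"))  # prints: --xformers --listen
```

Matching whole tokens rather than substrings means a longer flag that merely starts with `--medvram` is left alone.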

@sixpt
Author

sixpt commented Apr 3, 2024

Looks like it's working! Thanks, glad it seems to have been a simple fix.

3 participants