[–] benjamin@lemmy.dbzer0.com 2 points 1 year ago

YOU GOT IT WORKING?

You are the first person to stick through to the end and do it. Seriously. Thank you so much for confirming that it works on some machine besides mine and monster servers in the cloud.

The configuration is obviously a pain point, but we're running along the cutting edge just by using TensorRT on Windows at all. I'm hoping Nvidia makes it easier soon, or at least relaxes the license so I'm not running afoul of it if I redistribute the required DLLs (for comparison, Nvidia publishes TensorRT binary libraries for Linux directly on pip, no license required.)

It's also a pain that 11.7 is the best CUDA version for Stable Diffusion with TensorRT. I couldn't even get 11.8, 12.0 or 12.1 to work at all on Windows with TensorRT (they work fine on their own.) On Linux, they would work, but would at best give me the same speed as regular GPU inference, and at worst would be slower, completely defeating the point.
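If anyone else is fighting this combination, a quick sanity check before attempting an engine build is to confirm which CUDA build your stack actually sees. This is just a generic check assuming a PyTorch install with TensorRT's Python bindings, nothing Enfugue-specific:

```python
# Confirm the CUDA/TensorRT combination before building engines.
# (11.7 is the version that worked for me; adjust for your setup.)
import torch

print("PyTorch built against CUDA:", torch.version.cuda)  # expect "11.7"
print("CUDA device available:", torch.cuda.is_available())

try:
    import tensorrt as trt
    print("TensorRT version:", trt.__version__)
except ImportError:
    print("TensorRT Python bindings are not installed.")
```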

[–] benjamin@lemmy.dbzer0.com 1 points 1 year ago

You're the best, thanks so much for trying it and getting it working!

I don't think chasing improved performance is ever not worth it, so I'm definitely going to continue looking for optimizations. While cannibalizing the code from Comfy and A1111, I saw a lot (and I mean a lot) of shortcuts taken relative to the official Stability code release that improve performance in specific situations. I'm going to see how I can turn some of those shortcuts into options the user can tune to their hardware.

This latest release has attracted some more developer attention (and also some inquiries from hosting providers about offering Enfugue in the cloud!) I'm hoping that some of the authors of those improvements find their way to the Enfugue repository and perhaps are inspired to contribute.

With that being said, if you've got the hardware for it, TensorRT will definitely knock your socks off in terms of speed if you haven't used it before. I'd be happy to troubleshoot whatever went wrong with your Windows install - there should be up to three enfugue-engine.log files in your ~/.cache/ directory with more detail about what went wrong, if you'd like to share them here (or we can start a thread on GitHub if you have an account there.)

Thank you again for all your help!

[–] benjamin@lemmy.dbzer0.com 0 points 1 year ago* (last edited 1 year ago)

I can't thank you enough for linking that!!!

It made me realize that there must be a way to effectively downcast without getting NaNs. There's just no way that app could work with SDXL on the devices it does without having figured that out, so I scoured the web for references and dug in to figure it out. I'm happy to say I got it working on my M1 Pro! That also means memory usage is cut by about a third, and speed is up by about 50% on Mac in general, thanks to being able to work in half precision instead of full.
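For anyone curious, the general shape of the trick is selective downcasting: run the model in half precision, but keep the numerically sensitive layers in full precision. This is just a minimal PyTorch sketch of that idea, not Enfugue's exact code:

```python
import torch
import torch.nn as nn

# Normalization layers are a common source of NaNs in fp16.
SENSITIVE_LAYERS = (nn.LayerNorm, nn.GroupNorm)

def downcast_safely(model: nn.Module) -> nn.Module:
    """Cast a model to float16, keeping normalization layers in float32."""
    model.half()
    for module in model.modules():
        if isinstance(module, SENSITIVE_LAYERS):
            module.float()
            # Upcast activations going in, downcast coming out, so the
            # float32 islands stay compatible with their fp16 neighbors.
            module.register_forward_pre_hook(
                lambda m, args: tuple(
                    a.float() if torch.is_tensor(a) else a for a in args
                )
            )
            module.register_forward_hook(
                lambda m, args, out: out.half() if torch.is_tensor(out) else out
            )
    return model
```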

I was able to do the same 512x512, 20 steps in 17 seconds using a fine-tuned model (Realistic Vision 5.) SDXL took its sweet time, coming in at almost 3 minutes, so it's probably not going to be in my usual workflow - but SDXL isn't in my usual workflow on my 3090 Ti Windows/Ubuntu hybrid machine either. I still use TensorRT and fine-tuned SD 1.5 models there - 512x512 takes roughly 3 seconds - but the beautiful part is a 2000-iteration upscale, where TensorRT caps out at ~30 it/s on Windows or ~40 it/s on Linux.

I have a little bit more testing to do for this, but I'm going to be releasing a 0.2.1 build in the next couple days. I would love it if you would give it another shot - I'll send you a message with a link, if that's okay with you!

With respect to AMD - that's a complicated question. I'm working with some AMD users to test out the combination of dependencies that will work for them. I'm not sure if anyone has managed to successfully use the GPU for AI on the Steam Deck, but I do know ROCm is officially unsupported on the Deck and will be for the foreseeable future. I've seen people successfully use Stable Diffusion with CPU inference on it, which Enfugue will allow - but those same people reported it took half an hour to generate a single image, so I'm not sure it's worth trying.
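For reference, CPU inference in the generic diffusers sense looks like the sketch below - this isn't Enfugue's internal code, and the checkpoint name is just an example - so you can gauge what those Deck users were actually doing:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pipeline entirely on the CPU; full precision is the safe
# default there, since fp16 on CPU is usually slower, not faster.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint
    torch_dtype=torch.float32,
)
pipe = pipe.to("cpu")

# Expect minutes (or on very weak hardware, much longer) per image.
image = pipe("a lighthouse at dusk", num_inference_steps=20).images[0]
image.save("output.png")
```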

[–] benjamin@lemmy.dbzer0.com 0 points 1 year ago

I had hoped that would sell a few people on it! I agree entirely on the motivation - thanks to the portable install working nicely, I was able to test it on work machines without even needing to log out of an unprivileged user. MPS is of course slower than an equivalent CUDA device, but I was able to ensure the entire E2E test plan passed on Mac, including all ControlNets, inpainting, schedulers, upscaling, etc.

If you want SDXL on Mac, your mileage will definitely vary. I ran out of memory while loading the checkpoint on my M1 Pro 12GB. It might have worked if I'd allotted it a dangerously large amount of memory, but that could also have crashed the machine, and I don't feel like bothering with that. In theory there's nothing stopping it from working; you just might need an M2 Max to get it off the ground.
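If anyone does want to risk it: PyTorch's MPS backend reads an environment variable that caps how much unified memory it will allocate, and setting it to 0.0 removes the cap entirely - which is exactly the "dangerously large amount of memory" scenario I'm describing, so proceed at your own peril:

```python
import os

# Must be set before PyTorch makes its first MPS allocation.
# 0.0 disables the high-watermark limit entirely - the machine can
# swap or lock up if the model really doesn't fit, so use with care.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

import torch  # import after setting the variable
```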

Please let me know if you encounter any unforeseen issues!


Hello everyone!

My name's Benjamin, and I'm the developer of ENFUGUE, a self-hosted Stable Diffusion web UI built around an intuitive canvas interface, while still trying to deliver the power and deep customization of the popular tab-and-slider web UIs.

I'm taking it out of Alpha and into Beta with the v0.2 release, which brings SDXL support while still maintaining most of the 1.5 feature set by allowing you to configure multiple checkpoints for various diffusion plans. It also has a ton of changes since 0.1 suggested by other users, like the ability to point ENFUGUE at the directories of other web UI installations to share models and other files.

This is not monetized software in any way; I simply built the tool I wanted to use, and wanted to share it. Thanks for taking a look!