Hm... Alright, I'll have to take another look at it. I kinda gave up, figuring my old server just didn't have the specs for it.
Specs? Try Mistral with llama.cpp.
It has an Intel Xeon E3-1225 V2, 20 GB of RAM, and a Strix GTX 970 with 4 GB of VRAM. I've actually tried Mistral 7B and Decapoda Llama 7B, running them in Python with Hugging Face's Transformers library (from local models).
Yeah, it's not a potato, but it's not that powerful either. Nonetheless, it should easily run 7B/8B/9B models, and maybe 13B ones too.
That's your problem right there. Python with Transformers is great for building LLMs but horrible at running them on modest hardware. With a computer as weak as yours, every bit of performance counts.
Just try ollama or llama.cpp. Their GitHub repos are also a goldmine for other projects you could try.
llama.cpp can run part of the model on the GPU and keep the rest on the CPU, which makes inference way faster.
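To give a feel for the partial offload, here's a rough back-of-the-envelope sketch for guessing how many layers might fit on the GPU. It assumes all layers are about the same size and that you keep some VRAM free for the context cache and the desktop. The function name, the 4.0 GB model size, the 32-layer count, and the 1 GB reserve are all just illustrative assumptions, so treat the result as a starting point and adjust it by trial and error (in llama.cpp the knob is the `-ngl` / `--n-gpu-layers` option).

```python
def layers_that_fit(model_size_gb: float, n_layers: int,
                    vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Estimate how many layers fit in VRAM.

    Assumption (not from llama.cpp itself): every layer takes an equal
    share of the model file, and reserve_gb of VRAM stays free for the
    KV cache and anything else using the GPU.
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Never report more layers than the model actually has.
    return min(n_layers, int(usable_gb // per_layer_gb))


# Hypothetical numbers: a 4-bit 7B model file of about 4.0 GB with
# 32 layers, on a card with 4 GB of VRAM.
print(layers_that_fit(model_size_gb=4.0, n_layers=32, vram_gb=4.0))
```

If the model then crashes or runs out of memory, lower the number; if there's VRAM to spare, raise it.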
Piper is a pretty decent, very lightweight TTS engine that can run directly on your CPU if you want to add TTS capabilities to your setup.
Good luck and happy tinkering!
Ah, that's good to know! I'll give those other options a shot. Thank you so much for taking the time to help me with that! I'm very new to the whole LLM thing, and sorta figuring it out as I go.
Completely forgot to tell you to only use quantized models. Your PC can run 4-bit quantized versions of the models I mentioned. That's the key to running LLMs on consumer-level hardware. Later on you can read more about the different quantizations and toy with other ones like Q5_K_M and such.
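The arithmetic behind that is simple enough to sketch: the weights take roughly (number of parameters) × (bits per weight) / 8 bytes. The helper below is only an illustration with a made-up name, it counts weights only and ignores the context cache and runtime overhead, and real quantized files come out a bit larger than the nominal bit width suggests because they also store scaling data.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations and quantization metadata, so the
    real footprint is somewhat higher than this estimate.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Compare a 7B model at 16-bit, 8-bit and 4-bit precision.
for bits in (16, 8, 4):
    print(f"7B at {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
```

At 16 bits a 7B model needs about 14 GB just for the weights, while at 4 bits it drops to about 3.5 GB, which is why quantization makes such a difference on a machine like yours.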
Just read that Phi-3 got released, and apparently it's a roughly 4B model that reaches GPT-3.5 level. Follow the news and wait for it to be added to ollama/llama.cpp.
I became fascinated with LLMs after the first AI boom, but all this knowledge is basically useless where I live, so I might as well make it useful by teaching people what I know.