
Best Upgrade Path for my Desktop (lemm.ee)

submitted 6 months ago by projectmoon@lemm.ee to c/localllama@sh.itjust.works


Current situation: I've got a desktop with 16 GB of DDR4 RAM, a 1st gen Ryzen CPU from 2017, and an AMD RX 6800 XT GPU with 16 GB VRAM. I can run 7-13b models extremely quickly using ollama with ROCm (19+ tokens/sec). I can run Beyonder 4x7b Q6 at around 3 tokens/second.

I want to get to a point where I can run Mixtral 8x7b at Q4 quant at an acceptable token speed (5+/sec). I can run Mixtral Q3 quant at about 2 to 3 tokens per second. Q4 takes an hour to load, and assuming I don't run out of memory, it also runs at about 2 tokens per second.

What's the easiest/cheapest way to get my system to be able to run the higher quants of Mixtral effectively? I know that I need more RAM. Another 16 GB should help. Should I upgrade the CPU?
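For a rough sanity check on the RAM question, here's a back-of-the-envelope sketch of how much of the model spills out of VRAM. The parameter count (about 46.7B for Mixtral 8x7b) and the average bits per weight for each quant are approximations I'm assuming, not measurements from this machine, and the estimate ignores context/KV cache and OS overhead.

```python
# Rough size estimate for quantized weights: parameters * bits per weight / 8.
# All constants below are assumptions for illustration, not measured values.

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

MIXTRAL_PARAMS_B = 46.7  # approximate total parameter count for Mixtral 8x7b
VRAM_GB = 16             # RX 6800 XT

# Assumed average bits per weight; real quant files vary by variant.
for name, bits in [("Q3", 3.5), ("Q4", 4.5)]:
    size = model_size_gb(MIXTRAL_PARAMS_B, bits)
    spill = max(0.0, size - VRAM_GB)  # portion that has to live in system RAM
    print(f"{name}: ~{size:.1f} GB weights, ~{spill:.1f} GB outside VRAM")
```

Under those assumptions Q3 comes out at roughly 20 GB and Q4 at roughly 26 GB, so Q4 would leave around 10 GB in system RAM. That's tight next to everything else on a 16 GB machine, and it would fit with the very slow load I'm seeing.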

As an aside, I also have an older Nvidia GTX 970 lying around that I might be able to stick in the machine. Not sure if ollama can split across different brand GPUs yet, but I know this capability is in llama.cpp now.

Thanks for any pointers!


[–] CaptDust@sh.itjust.works 2 points 6 months ago (1 children)

Good callout on not mixing CUDA and ROCm. I wasn't aware of this.

[–] OpticalMoose@discuss.tchncs.de 1 points 6 months ago

Yep, I had been hoping for the same thing.

Also, to @projectmoon@lemm.ee, you might want to wait and see what gets announced at Computex next month. Hopefully they announce some new stuff and the current gen prices drop.