this post was submitted on 02 Oct 2023

LocalLLaMA


Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.


Trying something new, going to pin this thread as a place for beginners to ask what may or may not be stupid questions, to encourage both the asking and answering.

Depending on activity level I'll either make a new one once in awhile or I'll just leave this one up forever to be a place to learn and ask.

When asking a question, try to make it clear what your current knowledge level is and where you may have gaps, should help people provide more useful concise answers!

[–] hendrik@palaver.p3x.de 2 points 1 day ago (last edited 1 day ago)

From what I know, I'd assume yes: generation speed should scale roughly linearly with model size, since every weight has to be read for each generated token. There may be some small additional overhead that makes it a bit faster or slower than that simple estimate. But I'm really not an expert on the maths, so don't take my word for it.
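As a very rough back-of-envelope sketch of that rule of thumb: if token generation is memory-bandwidth bound, then tokens per second is roughly memory bandwidth divided by model file size. The numbers below are made-up illustrations, not measurements from your hardware.

```python
# Rough back-of-envelope: single-token generation is usually memory-bandwidth
# bound, so time per token scales roughly linearly with model size.
# All numbers below are illustrative assumptions, not measurements.

def est_tokens_per_sec(model_size_gb: float, mem_bandwidth_gbs: float) -> float:
    """Upper-bound estimate: every weight must be read once per token."""
    return mem_bandwidth_gbs / model_size_gb

for size_gb in (4, 8, 16):  # e.g. roughly a 7B Q4, 13B Q4, 13B Q8 file
    print(f"{size_gb} GB model @ 50 GB/s: ~{est_tokens_per_sec(size_gb, 50):.1f} tok/s")
```

So doubling the model size should roughly halve the tokens per second, ignoring prompt processing and other overhead.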

And maybe have a look at this bug report: https://github.com/ggml-org/llama.cpp/issues/11332
I think it matches your situation. They resolve it by adjusting the batch size, and someone recommends not using Vulkan on an iGPU.
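If you want to experiment with that workaround, here is a minimal sketch using the llama-cpp-python bindings (the issue is about llama.cpp itself, so using the Python bindings is just my assumption for illustration). The model path and the batch value are placeholders; tune them for your own hardware, and parameter defaults can differ between versions.

```python
# Minimal sketch of the batch-size workaround discussed in the linked issue,
# via the llama-cpp-python bindings. Model path and values are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.gguf",  # placeholder path
    n_gpu_layers=0,   # keep everything on the CPU to sidestep the iGPU/Vulkan path
    n_batch=64,       # smaller prompt-processing batch, as suggested in the issue
)

print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])
```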

[–] corvus@lemmy.ml 1 point 1 day ago

Oh great, thanks