this post was submitted on 01 Feb 2025
157 points (100.0% liked)

TechTakes


Sam "wrong side of FOSS history" Altman must be pissing himself.

Direct Nitter Link:

https://nitter.lucabased.xyz/jiayi_pirate/status/1882839370505621655

[–] reallykindasorta@slrpnk.net 15 points 1 day ago* (last edited 1 day ago) (3 children)

Non-techie requesting a layman's explanation if anyone has time!

After reading a couple of "what makes nvidia's h100 chips so special" articles, I'm gathering that they were supposed to have significantly more computational capability than their competitors (which I'm taking to mean more computations per second). So the question with deepseek and similar is something like "how are they able to get the same results with fewer computations?" and the answer is speculated to be more efficient code/instructions for the AI model, so it can reach the same conclusions with fewer computations overall, potentially reducing the need for special jacked-up chips to run it?

[–] mountainriver@awful.systems 4 points 4 hours ago

Good question!

The guesses and rumours that you have got as replies make me lean towards "apparently no one knows".

And because it's slop machines (also referred to as "AI"), there is always a high probability of some sort of scam.

[–] justOnePersistentKbinPlease@fedia.io 14 points 1 day ago (4 children)

From a technical POV, from having read into it a little:

Deepseek devs worked in a very low-level language called Assembly. This language is unlike relatively newer languages like C in that it provides no guardrails at all and is basically CPU instructions in extreme shorthand. An "if" statement would be something like BEQ 1000, where it goes to a specific memory location (in this case address 1000) if two CPU registers are equal.

The advantage of using it is that it is considerably faster than C. However, it also means that the code is mostly locked to that specific hardware. If you add more memory or change CPUs, you have to refactor. This is one of the reasons the language was largely replaced with C and other languages.

Edit: to expound on this: "modern" languages are even slower, but more flexible in terms of hardware. This would be languages like Python, Java, and C#.

[–] froztbyte@awful.systems 6 points 8 hours ago (1 children)

for anyone reading this comment hoping for an actual eli5, the "technical POV" here is nonsense bullshit. you don't program GPUs with assembly.

the rest of the comment is the poster filling in bad comparisons with worse details

[–] justOnePersistentKbinPlease@fedia.io 0 points 5 hours ago (1 children)

For anyone reading this comment, that person doesnt know anything about assembly or C.

[–] froztbyte@awful.systems 3 points 4 hours ago* (last edited 4 hours ago)

yep, clueless. can't tell a register apart from a soprano. and allocs? the memory's right there in the machine, it has it already! why does it need an alloc!

fuckin' dipshit

next time you want to do a stupid driveby, pick somewhere else

[–] V0ldek@awful.systems 11 points 11 hours ago* (last edited 6 hours ago) (2 children)

This is a really weird comment. Assembly is not faster than C, that's a nonsensical statement, C compiles down to assembly. LLVM's optimizations will most likely outperform or directly match whatever hand-crafted assembly you write. Why would BEQ 1000 be "considerably faster" than if (x == y) goto L_1000;? This collapses even further if you consider any application larger than a few hundred lines of code, any sensible compiler is going to beat you on optimizations if you try to write hand-crafted assembly. Try loading up assembly code and manually performing intraprocedural optimizations, lol, there's a reason every compiled language goes through an intermediate representation.

Saying that C# is slower than C is also nonsensical, especially now that C# has built-in PGO; it's very likely it could outperform an application written in C. C#'s JIT compiler is not somehow slower because it's flexible in terms of hardware, if anything that's what makes it fast. For example you can write a vectorized loop that will be JIT-compiled to the ideal fastest instruction set available on the CPU running the program, whereas in C or assembly you'd have to manually write a version for each. There's no reason to think that manual implementation would be faster than what the JIT comes up with at runtime, though, especially with PGO.

It's kinda like you're saying that a V12 engine is faster than a Ferrari and that they are both faster than a spaceship because the spaceship doesn't have wheels.

I know you're trying to explain this to a non-technical person but what you said is so terribly misleading I cannot see educational value in it.

[–] froztbyte@awful.systems 6 points 8 hours ago

and one doesn't program GPUs with assembly (in the sense as it's used with CPUs)

[–] justOnePersistentKbinPlease@fedia.io 0 points 5 hours ago (1 children)

I have hand-crafted assembly instructions and made them faster than the same C code.

Particular to if statements, C will do things like push and pull values from the stack, which takes a small but occasionally noticeable number of cycles.

[–] khalid_salad@awful.systems 1 points 2 hours ago* (last edited 2 hours ago)

"python, what are you doing?"

"idk, I'm written in C, it does things like push and pull values from the stack, have you tried assembly, it's faster"

[–] msage@programming.dev 1 points 12 hours ago (2 children)

Putting Python, the slowest popular language, alongside Java and C# really irks me bad.

The real benefit of R1 is Mixture of Experts - the model is separated into smaller sections that are trained and used independently, meaning you don't need the entire model to be active all the time, just parts of it.

Meaning it uses fewer resources during training and general usage. For example, instead of all 671 billion parameters being active all the time, it can use about 37 billion for a specific question, and you can get away with using 2% of the hardware used by the competition.
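A minimal sketch of that routing idea, for anyone curious what it looks like in code (toy sizes in PyTorch; the class name and numbers here are illustrative, nothing like DeepSeek's actual implementation):

```python
# Mixture-of-Experts routing sketch (illustrative only, not DeepSeek's code).
# Sizes are toy numbers; R1's real router and experts are far more elaborate.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle.
        # That idleness is where the compute savings come from.
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

The point is just that the router wakes up top_k experts per token, so most of the weights sit idle on any given forward pass.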

[–] UndercoverUlrikHD@programming.dev 0 points 1 hour ago (1 children)

> Putting Python, the slowest popular language, alongside Java and C# really irks me bad.

I wouldn't call python the slowest language when the context is machine learning. It's essentially C.

[–] msage@programming.dev 1 points 1 hour ago (1 children)

Python is still the slowest, it just utilizes libraries written in C for this specific math.

And that maths happens to be 99% of the workload

I used them as they are well-known modern languages that the average person might have heard about.
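A rough illustration of the point above about the math living in C (a sketch; the timings are machine-dependent assumptions, the gap is the point):

```python
# "Slow Python" loop vs the same math dispatched to compiled C (numpy).
import time
import numpy as np

a = list(range(1_000_000))
b = list(range(1_000_000))

t0 = time.perf_counter()
dot_py = sum(x * y for x, y in zip(a, b))  # pure Python, interpreted step by step
t1 = time.perf_counter()

na = np.array(a, dtype=np.int64)
nb = np.array(b, dtype=np.int64)
t2 = time.perf_counter()
dot_np = int(na @ nb)                      # dot product runs in compiled C code
t3 = time.perf_counter()

assert dot_py == dot_np
print(f"pure Python: {t1 - t0:.3f}s, numpy: {t3 - t2:.4f}s")
```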

[–] fartsparkles@lemmy.world 7 points 1 day ago (1 children)

I'm sure that non-techie person understood every word of this.

[–] blakestacey@awful.systems 16 points 1 day ago

And I'm sure that your snide remark will both tell them what to simplify and explain how to do so.

Enjoy your free trip to the egress.

[–] fallowseed@lemmy.world 3 points 20 hours ago* (last edited 20 hours ago) (1 children)

i read that the chinese made alterations to the cards as well-- they dismantled them to access the chips themselves and were able to do more precise micromanagement that cuda doesn't support, for instance.. basically they took the training wheels off and used a more fine-tuned and hands-on approach that gave them some serious advantages

[–] froztbyte@awful.systems 5 points 16 hours ago (1 children)
[–] fallowseed@lemmy.world -1 points 15 hours ago (1 children)
[–] froztbyte@awful.systems 6 points 15 hours ago* (last edited 15 hours ago) (2 children)

okay so that post’s core supposition (“using ptx instead of cuda”) is just ~~fucking wrong~~ fucking weird and I’m not going to spend time on it, but it links to this tweet which has this:

> DeepSeek customized parts of the GPU's core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks

this still reads more like simply tuning allocation than outright scheduler and execution control (which your post alluded to)

[x] doubt

e: changed the original wording because cuda still uses ptx anyway, whereas this post looks like it's saying "they steered ptx directly". at first I read the tweet more like "asm vs python" but that doesn't appear to be what that part meant to convey. still doubting the core hypothesis tho

[–] froztbyte@awful.systems 6 points 15 hours ago* (last edited 15 hours ago) (1 children)

sidebar: I definitely wouldn't be surprised if this overall turns out to be a case of "a shop optimised by tuning, and then it suddenly turns out the entire industry has never tried to tune a thing ever"

because why try hard when the money taps are open and flowing free? velocity over everything! this is the bayfucker way.

[–] skillissuer@discuss.tchncs.de 3 points 7 hours ago (1 children)

ah yes the ultimate american NOBUS - we can throw money at the problem until it disappears

[–] froztbyte@awful.systems 3 points 7 hours ago (1 children)

it might disappear under the gigantic heap of money but gosh darn it we can KEEP HEAPING

[–] froztbyte@awful.systems 3 points 7 hours ago

I do sorta get the idea that this is (one of the reasons) exactly why ol' felon is trying to get his hands on all the funding faucets

[–] fallowseed@lemmy.world -3 points 15 hours ago (1 children)

well you're always free to doubt and do your own research-- as i mentioned, it is something i read, and between believing what the US tech bros are saying when all their money and hegemony is on the line vs what the chinese have given up for free use, i am going to go out on a limb and trust the chinese. you're free to make your own decisions in this regard, and kudos for having your own mind.

[–] froztbyte@awful.systems 7 points 14 hours ago (1 children)

mine isn’t a “USA v China: Jelly Wrestling Deluxe” comment and you’re not really understanding the point

[–] fallowseed@lemmy.world -3 points 14 hours ago (1 children)

what is your point? i thought i was giving an "explain like i'm 5" answer to a guy asking for one... you came along asking me to show sources... now this?

[–] froztbyte@awful.systems 7 points 14 hours ago (1 children)

the point is that your eli5 is unfounded rumour hearsay bullshit (and thus it's entirely pointless to spread it), and then, when given a relatively gentle indication of that, you decided to cosplay an ostrich

pro-tip: if it ain’t something you actually understand something about, probably best to avoid uncritically amplifying shit about it

[–] fallowseed@lemmy.world -3 points 14 hours ago (2 children)

so you're saying i'm wrong and i'm spreading misinfo... it's somehow wrong that china got more juice out of the cards by bypassing cuda to better micromanage some aspect of the process?

[–] self@awful.systems 5 points 14 hours ago (1 children)

> it's somehow wrong that china got more juice out of the cards by bypassing cuda to better micromanage some aspect of the process?

according to some shit called xataka, paraphrasing the parts of Nazi social media frequented by goofy-as-shit AI grifters

[–] fallowseed@lemmy.world -3 points 14 hours ago (2 children)

that was the first source i found, so i used it.. and anyway, you found the corroborating details yourself, which supported my claim..

[–] froztbyte@awful.systems 5 points 14 hours ago

I know you’ve already been made to fuck off but “corroborating details” looooooool

[–] self@awful.systems 4 points 14 hours ago (1 children)

????? are goofy as shit AI grifters “corroborating details”?

[–] fallowseed@lemmy.world -3 points 14 hours ago (1 children)

thought i was talking to the other guy, didn't know i was getting tag teamed, you understand. (or you can choose not to i guess)

[–] self@awful.systems 6 points 14 hours ago (1 children)

> didn't know i was getting tag teamed, you understand. (or you can choose not to i guess)

jesus christ you’re going to make this entire thread this exact type of tedious nonsense aren’t you? time for you to fuck off out of this thread. TechTakes isn’t the place for you and your friends to salivate over a supposedly less shitty LLM or repost horseshit from e/a Twitter

[–] self@awful.systems 5 points 14 hours ago

also, holy fuck, their entire post history on their 5-day-old account is just them doing this rapid-fire tedious shit and then throwing themselves dramatically on the ground when they get called out for it. good fucking riddance

[–] froztbyte@awful.systems 5 points 14 hours ago (1 children)

you are not tall enough for this ride, go try some candyfloss and walking the hall of mirrors instead

[–] fallowseed@lemmy.world -3 points 14 hours ago (1 children)
[–] froztbyte@awful.systems 5 points 14 hours ago

a klaxon going off for dumbasses would be a health-impacting constant noise, no thx