this post was submitted on 23 Jan 2024
15 points (77.8% liked)
ChatGPT
8909 readers
1 user here now
Unofficial ChatGPT community to discuss anything ChatGPT
founded 1 year ago
MODERATORS
Surprised nobody has mentioned this: most of these models use tokenization. Text gets broken into chunks of characters like "ea", "the", and "anti", called tokens - the model doesn't pick which individual key to press next, it picks which bunch of keys to press. I believe there are tokens a model essentially can't output, or that are extremely unlikely. I could imagine that "etc." and "...." are tokens with relatively high probabilities, but perhaps "etc..." doesn't break into a nice set of them (or the tokens it does break into all have extremely low weights for the model).
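If anyone wants to poke at this, here's a minimal sketch using OpenAI's tiktoken package to see how a GPT-style tokenizer splits these strings. It assumes the "cl100k_base" encoding (the one used by GPT-3.5/GPT-4); other models tokenize differently, so the exact splits you get may vary.

```python
# pip install tiktoken
import tiktoken

# Load the cl100k_base encoding (assumption: the model in question uses it)
enc = tiktoken.get_encoding("cl100k_base")

for text in ["etc.", "....", "etc...", "etc...."]:
    token_ids = enc.encode(text)
    # decode_single_token_bytes shows the raw bytes each token covers
    pieces = [enc.decode_single_token_bytes(t) for t in token_ids]
    print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")
```

Whatever the splits turn out to be, the point stands: the model only ever chooses among whole tokens, so a string that doesn't decompose into common tokens is just harder for it to emit.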