This is an automated archive.
The original was posted on /r/singularity by /u/flexaplext on 2023-08-10 14:36:31+00:00.
Consider this:
You train an LLM on an incredibly large sample of high-quality chess data: text covering countless positions and strategies, in a systematic, ordered format well suited to training. With a pipeline like that you could probably get the LLM to be reasonably good at chess. But then, what is the point?
Hikaru Nakamura, a well-known chess player and streamer, often says that chess players are generally stupid. The point being that chess is just a game: you get good at it and then you're just good at chess. It has little in the way of transferable skills. All you achieve is bloating your model with a practically useless skill, making it less efficient. It's actually counterproductive and harmful to train on this data.
The question becomes: What do you actually want your AI to know and be good at? That is the vital point. What aspects of 'G' do you actually want in your AGI?
Humans are good at all sorts of things, and different people are adept at many different ones. There are countless things to know and learn in the world. What we have is an incredible base for learning, onto which we can add almost any desired skill. Then billions of people wander around for years learning countless different skills. Put us all together in a society and you get a system that is incredibly effective.
I feel our AI training probably needs to head in the same sort of direction.
We make a very capable 'base model', like GPT, but far, far smaller, by curating the data to make it as small and efficient as possible: only the most useful data for the specific purpose of being this base platform.
Some data, like basic programming, language skills, logic puzzles and maths, is probably always useful. The key would be working out specifically which data is best to use. Sifting through all the available data by hand would be a treacherous job, though; it would be best to automate it with tests on smaller models, if that is possible. A number of people in the field are now starting to say this: the quality of the data is key and of utmost importance, not necessarily the quantity. However, more data is still always useful if it is of the right quality for the desired task. It's a fine balancing act.
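As a minimal sketch of what automating that curation might look like: rank every sample in a corpus with some cheap quality score and keep only the top fraction. The scoring function here is a toy stand-in (in practice you might use perplexity from a small judge model); the names and heuristics are illustrative, not any real pipeline.

```python
# Hypothetical sketch: filter a training corpus by a quality score.
# quality_score is a toy proxy; a real pipeline might use a small
# language model's perplexity or a learned classifier instead.
def quality_score(text: str) -> float:
    """Crude heuristic: reward lexical variety and longer words."""
    words = text.split()
    if not words:
        return 0.0
    unique_ratio = len(set(words)) / len(words)   # penalise repetition
    avg_len = sum(len(w) for w in words) / len(words)
    return unique_ratio * min(avg_len / 5.0, 1.0)

def curate(corpus: list[str], keep_fraction: float = 0.5) -> list[str]:
    """Keep only the highest-scoring fraction of samples."""
    ranked = sorted(corpus, key=quality_score, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

corpus = [
    "the the the the the",
    "Gradient descent minimises a loss function by iterative updates.",
    "aaa bbb aaa bbb",
    "A proof by induction establishes a base case and an inductive step.",
]
kept = curate(corpus, keep_fraction=0.5)
```

Running this keeps the two informative sentences and drops the repetitive junk, which is the whole idea: spend compute once on scoring so the expensive training run only ever sees the good half.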
But then the hard part, which we can't really do yet: the ability to add further data to that model without fully retraining it. This, I believe, will be one of the most important breakthroughs in the field if it's managed. Even better would be a small, efficient base model, kept separate and potentially local, onto which you simply download modules trained for specific roles, easily swapped out or mixed and matched on top of your base model.
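The swappable-module idea can be sketched very simply if we assume each module is stored as additive deltas on frozen base weights (in the spirit of LoRA-style adapters). Everything below is illustrative, not a real API: real adapters are low-rank matrices per layer, but the mechanics of loading and unloading are the same.

```python
# Minimal sketch of "base model + swappable modules", assuming modules
# are additive weight deltas over a frozen base (LoRA-like in spirit).
class BaseModel:
    def __init__(self, weights: dict[str, float]):
        self.base = dict(weights)     # frozen base weights, never mutated
        self.active = dict(weights)   # base weights + current module
        self.module_name = None

    def load_module(self, name: str, deltas: dict[str, float]) -> None:
        """Swap in a role-specific module without retraining the base."""
        self.active = {k: v + deltas.get(k, 0.0)
                       for k, v in self.base.items()}
        self.module_name = name

    def unload_module(self) -> None:
        """Return to the plain base model."""
        self.active = dict(self.base)
        self.module_name = None

model = BaseModel({"w1": 0.5, "w2": -0.2})
model.load_module("doctor", {"w1": 0.1})   # a downloaded 'Doctor GPT' module
model.load_module("lawyer", {"w2": 0.3})   # swapped instantly, base untouched
```

The key property is that `model.base` is never modified, so modules can be mixed, matched, or replaced at will, and a module is just a small file of deltas rather than a full copy of the model.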
If we could do that, we could branch the AI off into countless specialised models. Like a 'Doctor GPT', fed a mountain of medical data plus any other data that might transfer usefully. You curate the data to the specific task you want the model to be proficient at, as this creates the best and most efficient model for that purpose.
I see this as the future we will arrive at. You'll have a Doctor GPT, Electrician GPT, Geography Teacher GPT, Biology Teacher GPT, Lawyer GPT, etcetera. Just think of it like training towards the specific jobs that people currently do. What we have in society is a good and efficient working system.
Once we have the data curated for specific roles, we can keep adding to it and fine-tuning, and if the base model is ever updated and improved, we can simply reuse that same data on top of the new base model to retrain.
I see all this as the best way forward: it is a far more efficient system, and much smaller in size, making it practical to keep or download models locally. It greatly increases the chances of having local models on our PCs or smartphones that are truly capable and even world-leading in their domain. I can foresee these fine-tuned smaller models soon being more capable than the much larger general models like GPT in their respective fields.
But things like our current GPT will still be highly important and useful. Some jobs or roles will be harder to target specifically, but they could still be handled by an additional, separate, non-specialised and more general model. Things like a CEO GPT, an Investing GPT, an Inventor GPT, a Writer GPT, or a Conversational GPT to be people's friend. These likely need to be more generalised, like GPT is now, to perform such roles. This is indeed kind of what is being built now, and why we see current models being best suited to these sorts of tasks.
It is more difficult to ascertain what data will be useful for such roles. Ask yourself: what exactly would a model have to know in order to be a good friend, and interesting for you specifically to talk to? You would want it to know about certain things and talk to you in a certain way, while also not talking about certain things or talking to you in the wrong way. Care should still be taken to remove useless data from the training set, as huge gains can be made by doing so, but it's hard to say exactly what such general models should really be capable of and be trained to know.