this post was submitted on 26 Mar 2024

In the study, the UC Berkeley researchers used a video game called Overcooked, where two chefs divvy up tasks to prepare and serve meals, in this case soup, which earns them points. It’s a 2-D world, seen from above, filled with onions, tomatoes, dishes and a stove with pots. At each time step, each virtual chef can stand still, interact with whatever is in front of it, or move up, down, left or right.
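For readers who want to picture the action space described above, here is a minimal sketch of the six per-step choices. The names `Action`, `MOVES`, and `next_position` are made up for illustration, and the coordinate convention and the stay-in-bounds rule are assumptions, not details from the study or from the Overcooked environment's actual code.

```python
from enum import Enum


class Action(Enum):
    """The six choices a chef has at each time step, per the description above."""
    STAY = "stay"
    INTERACT = "interact"
    UP = "up"
    DOWN = "down"
    LEFT = "left"
    RIGHT = "right"


# (dx, dy) offsets. Assumption: y grows downward, as in many top-down grids.
MOVES = {
    Action.UP: (0, -1),
    Action.DOWN: (0, 1),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
}


def next_position(pos, action, width, height):
    """Return the chef's grid position after one time step.

    STAY and INTERACT leave the position unchanged. A move that would
    leave the grid is ignored (a hypothetical rule for this sketch).
    """
    dx, dy = MOVES.get(action, (0, 0))
    x, y = pos[0] + dx, pos[1] + dy
    if 0 <= x < width and 0 <= y < height:
        return (x, y)
    return pos
```

For example, `next_position((1, 1), Action.UP, 3, 3)` gives `(1, 0)`, while `Action.INTERACT` from the same cell leaves the chef at `(1, 1)`.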

The researchers first collected data from pairs of people playing the game. Then they trained AIs using offline reinforcement learning (RL) or one of three other methods for comparison. (In all methods, the AIs were built on a neural network, a software architecture intended to roughly mimic how the brain works.) In the first method, the AI just imitated the humans. In the second, it imitated only the best human performances. The third method ignored the human data and had AIs practice with each other. The fourth was offline RL, in which the AI does more than just imitate: it pieces together the best bits of what it sees, allowing it to perform better than the behavior it observes. It uses a kind of counterfactual reasoning, predicting what score it would have gotten had it followed different paths in certain situations, and then adapts.
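The "pieces together the best bits" idea can be illustrated with a toy example. This is a minimal sketch, not the researchers' method: it uses tabular Q-learning over a fixed log of transitions instead of a neural network, and the two-step task, the state and action names, and the `offline_q_learning` helper are all invented for illustration. Two logged episodes each score 1 point, but combining the good half of each scores 2.

```python
from collections import defaultdict

# Logged transitions: (state, action, reward, next_state).
# next_state None marks the end of an episode.
# Episode 1 plays "a" twice (1 + 0 points); episode 2 plays "b" twice (0 + 1 points).
dataset = [
    ("start", "a", 1.0, "mid"),
    ("mid", "a", 0.0, None),
    ("start", "b", 0.0, "mid"),
    ("mid", "b", 1.0, None),
]


def offline_q_learning(transitions, gamma=1.0, lr=0.5, sweeps=200):
    """Estimate action values from a fixed dataset, with no new play."""
    q = defaultdict(float)
    # Only consider actions that actually appear in the data for each state.
    seen = {}
    for s, a, _, _ in transitions:
        seen.setdefault(s, set()).add(a)

    for _ in range(sweeps):
        for s, a, r, s2 in transitions:
            # Counterfactual step: value of the best logged action at s2,
            # even if this particular episode did something else there.
            best_next = max((q[(s2, a2)] for a2 in seen.get(s2, ())), default=0.0)
            q[(s, a)] += lr * (r + gamma * best_next - q[(s, a)])

    policy = {s: max(acts, key=lambda a: q[(s, a)]) for s, acts in seen.items()}
    return dict(q), policy


q_values, policy = offline_q_learning(dataset)
```

Here `policy` picks "a" at `start` and "b" at `mid`, a combination neither logged episode contains. Pure imitation of either episode would top out at 1 point. Practical offline RL methods usually add safeguards against overvaluing actions the data barely covers; restricting the maximum to logged actions, as above, is only a very simple stand-in for that.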
