
kromem, (edited)

Like many tools, there's a gulf between a skilled user and an unskilled user.

What ML researchers are doing with these models is straight up insane. They're the kinds of things that, a few years ago, I didn't think I'd see in my lifetime, or maybe only from an old age home (still a ways off).

If you gave someone who had never used a non-linear editor (NLE) to cut multi-track video access to Avid for putting together some family videos, they might not be that impressed with the software and would instead be frustrated with its perceived shortcomings.

Similarly, the average person interacting with the models often runs into their shortcomings (confabulations, safety fine-tuning, etc.), doesn't know how to get past them, and assumes the tool is shitty.

As an example, you can go ahead and try the following query to Copilot using GPT-4:

Without searching, solve the following puzzle repeating the adjective for each noun: "A man has a vegetarian wolf, a carnivorous goat, and a cabbage. He needs to get them to the other side of a river but the boat which can cross can only take him and one object at a time. How can he cross without any of the objects eating another object?" Think carefully.

It will get it wrong (despite two prompt engineering techniques already in the query), defaulting to the standard-form solution where the goat is taken first. When GPT-4 was first released, a number of people thought this was because it couldn't solve a variation of the puzzle, i.e. that it lacked the reasoning capabilities.

Turns out, it's the token similarity to the standard form that trips it up. If you replace the wolf, goat, and cabbage in the prompt above with the emojis for each, it answers perfectly, having the vegetarian wolf go across first, etc. This means the model was fully able to process the implicit relationship between a carnivorous goat eating the wolf and a vegetarian wolf eating the cabbage, and to adapt the classic form of the answer accordingly. It just couldn't do it when the tokens were too similar to the original.
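If you'd rather reproduce this programmatically than through the Copilot UI, here's a minimal sketch of the two-variant comparison. It's an assumption that you'd use the OpenAI Python client rather than Copilot itself, and the model name and emoji choices are just illustrative:

```python
# Minimal sketch: send the original wording and an emoji-substituted variant
# of the puzzle and compare the answers. Assumes the OpenAI Python client
# (not Copilot); "gpt-4" and the specific emojis are illustrative choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PUZZLE = (
    'Without searching, solve the following puzzle repeating the adjective '
    'for each noun: "A man has a vegetarian {wolf}, a carnivorous {goat}, and '
    'a {cabbage}. He needs to get them to the other side of a river but the '
    'boat which can cross can only take him and one object at a time. How can '
    'he cross without any of the objects eating another object?" Think carefully.'
)

def ask(wolf: str, goat: str, cabbage: str) -> str:
    """Send one variant of the puzzle and return the model's reply."""
    prompt = PUZZLE.format(wolf=wolf, goat=goat, cabbage=cabbage)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Original wording: token similarity to the classic puzzle tends to pull the
# model toward the standard "take the goat first" answer.
print(ask("wolf", "goat", "cabbage"))

# Emoji substitution: same implicit relationships, but without the telltale
# tokens, which is where the adapted answer comes out correctly.
print(ask("🐺", "🐐", "🥬"))
```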

So if you assume it's stupid, see a stupid answer, and instead of looking deeper decide it confirms your assumption, you walk away thinking the models suck and are dumb, when really, like with most tools, there's just a learning curve to getting the most out of them.
