To wrap up the year we're talking about what's breaking the internet, again. Yes, we're talking about ChatGPT, and we're joined by our good friend Shawn "swyx" Wang. Between his writings on L-Space Diaries and his AI notes repo on GitHub, we had a lot to cover around the world of AI and what might be coming in 2023.
Also, we have one more show coming out before the end of the year - our 5th annual "State of the log" episode, where Adam and Jerod look back at the year, talk through their favorite episodes, and feature voices from the community. So, stay tuned for that next week.
Shawn Wang: Well, so I don't know if you know, but the Substack itself got its start because I listened to the Simon episode, and I was like, "No, no, no. Spellcasting is not the way to view this thing. It's not something we glorify." And that's why I wrote "Multiverse, not Metaverse", because the argument was that you can view prompting as a window into a different universe, with a different seed, and every seed is a different universe. And funny enough, there's a finite number of seeds, because basically Stable Diffusion has a 512x512 space that determines the total number of seeds.
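For a concrete picture of the seed-as-universe idea, here's a minimal sketch using the Hugging Face diffusers library; the model ID, prompt, and file names are illustrative assumptions, not anything from the episode. Fixing the seed pins down the initial noise, so prompt plus seed fully determines the image:

```python
# Sketch: the same prompt with different seeds yields different "universes",
# while re-running a fixed seed reproduces the exact same image.
# Assumes the Hugging Face `diffusers` library and a CUDA device.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

prompt = "a lighthouse in a thunderstorm"  # illustrative prompt

for seed in (0, 1, 2):
    # Each seed deterministically selects the initial latent noise,
    # and therefore the final image.
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"lighthouse_seed_{seed}.png")
```

Run it twice with the same seed and you get the same image back; change only the seed and you get a different "universe" for the same prompt.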
So yeah, prompt engineering [unintelligible 00:04:23.23] I have to say this is not my opinion; I'm just reporting on what the AI thought leaders are already saying, and I just happen to agree with it, which is that it's very, very brittle. The most interesting finding in the academic arena about prompt engineering is that they ran default GPT-3 against some benchmarks and it came up with a score of like 17 out of 100. So that's a pretty low score on what is just a logical, deductive reasoning type of intelligence test. But then you add the prompt "Let's think step by step" to it, and that increases the score from 17 to 83… Which is extremely - like, that sounds great. Like I said, it's a magic spell that I can just kind of throw onto any problem and make it think better… But if you think about it a little bit more - would you actually use this in a real work environment, if saying the wrong thing suddenly deteriorates the quality? That's not good, and that's not something that you want to have in any stable, robust product; you want robustness, you want natural language understanding - for it to understand what you want, not to react to random artifacts and keywords that you give it.
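To make that concrete, here's a rough sketch of the zero-shot chain-of-thought trick being discussed, written against the pre-1.0 OpenAI Python completions API of that era; the model name and the arithmetic question are assumptions for illustration. The only difference between the two calls is the appended magic phrase:

```python
# Sketch: zero-shot chain-of-thought prompting. The only difference
# between the two runs is the trailing "Let's think step by step."
# Assumes the pre-1.0 `openai` package with OPENAI_API_KEY set.
import openai

question = (
    "A juggler has 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

for suffix in ("", " Let's think step by step."):
    response = openai.Completion.create(
        model="text-davinci-003",  # illustrative model of the era
        prompt=f"Q: {question}\nA:{suffix}",
        max_tokens=256,
        temperature=0,
    )
    print(repr(suffix), "->", response.choices[0].text.strip())
```

The brittleness point is exactly this: the jump in quality hinges on a phrase the user has no reason to know about, and dropping or rewording it silently degrades the answers.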
Since then, we actually now know why "Let's think step by step" is a magic keyword, by the way, because - and this is part of the transformer architecture - the neural network has a very limited working memory, and if you ask a question that requires too many steps to calculate the end result, it doesn't have the working memory to store the result, therefore it makes one up. But if you give it the working memory - which is to ask for a longer answer - the longer answer stores the intermediate steps, therefore giving you the correct result.
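A toy sketch of that "working memory" intuition, purely for illustration: in an autoregressive transformer, every token generated so far is fed back in as input, so intermediate steps that get written out become context the model can read on later steps. The `next_token` callable below is a hypothetical stand-in for a real model's decoding step, not any actual API:

```python
# Toy illustration of emitted text as scratch space: each new token is
# predicted from the prompt PLUS everything generated so far, so writing
# out an intermediate result like "16 / 2 = 8" puts it back into the
# context for the next step. `next_token` is a hypothetical stand-in for
# a real language model; this only shows the information flow.
from typing import Callable

def generate(prompt: str, next_token: Callable[[str], str],
             max_tokens: int = 100) -> str:
    context = prompt
    for _ in range(max_tokens):
        token = next_token(context)  # sees all prior output, including its own steps
        if token == "<eos>":
            break
        context += token
    return context[len(prompt):]
```

Asked for the bare answer, the model has to carry every intermediate quantity inside a single forward pass; asked to think step by step, it gets to park those quantities in the output and read them back.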