AI Can Tweet Better Than You
GPT-3, a philosophizing robot, and what's under the hood of AI models
I decided to start this newsletter to capture and share some of what I'm learning as I dedicate more time to understanding technology, science and AI.
I’m in the middle of some fun experiments that I’ll be sharing over the coming months. If you want them in your inbox, you can subscribe below:
Now, on to the fun...
Adventures with GPT-3
Last summer, I couldn’t spend 5 minutes on Twitter without hearing about a new AI model called GPT-3. From simple plain-text inputs, it could write stories, answer questions, creatively brainstorm, and even write code.
How could it do all these things that felt so… human? Was it just reciting patterns it had picked up or was it truly thinking? (I sometimes wonder the same thing about my two year old daughter).
When I got accepted into the beta, the first thing I wanted to see was whether it could write. I decided to use book descriptions as a test because (a) they’re short and (b) they require lots of critical thinking and creativity.
Two immediate lessons I learned about using the model:
1) All GPT-3 does is text completion. Sometimes this felt natural, but sometimes I needed to get a little creative to coax the task I wanted into that format.
To have it write a story, I’d start the story and have it keep going — easy enough.
But to have it brainstorm ideas, or write something original, I needed to give it text such that completing the text accidentally answers the question. You’ll see the example for book descriptions below, but some people went as far as writing screenplay-style scripts where characters are solving problems, so that GPT-3 would fill in the rest of the scene with its answers.
2) You can’t (really) train GPT-3 for your specific task. To grossly oversimplify: most AI models have two phases: training (they take in data and learn how to do some task) and inference (they use what they’ve learned to solve problems on new data). But GPT-3’s training is done in a “task-agnostic” way, without knowing anything about what you’re going to ask it.
Rather than having another training phase, all you get to do is give it a text input, which acts both as a micro-training on the task you need, and the prompt for the inference you’d like it to do.
After a bit of testing, the best results I got took this form:
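Roughly, with a handful of real books and their published descriptions as the examples:

```
Book Title: <example book 1>
Author: <its author>
Bullets:
- <a few key points about the book>
Description: <that book's real, published description>

(...a few more examples in the same format...)

Book Title: <the new book>
Author: <the new author>
Bullets:
- <the bullet points you want written up>
Description:
```

GPT-3's "completion" of that final Description line is the output.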
Once it was spitting out intelligible descriptions, I built an app where our team could input bullet points, and the app would tack their bullet points onto my “training examples” and send the combined text to GPT-3. Within a minute, you’d get back a fully written description.
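Under the hood there isn't much to it. Here's a minimal sketch of the GPT-3 call, using the beta-era openai Python library (the engine name, settings, and file name here are illustrative, not my exact values):

```python
import openai

openai.api_key = "YOUR_API_KEY"

# The hand-written examples in the format shown above
TRAINING_EXAMPLES = open("examples.txt").read()

def write_description(title, author, bullets):
    # Tack the new book onto the examples so GPT-3 "completes" the description
    prompt = (
        TRAINING_EXAMPLES
        + f"\n\nBook Title: {title}\nAuthor: {author}\nBullets:\n"
        + "\n".join(f"- {b}" for b in bullets)
        + "\nDescription:"
    )
    response = openai.Completion.create(
        engine="davinci",       # the original GPT-3 base model
        prompt=prompt,
        max_tokens=300,
        temperature=0.7,
        stop="Book Title:",     # stop before it invents another example
    )
    return response.choices[0].text.strip()
```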
It was pretty cool! Its consistency wasn't great, but about 20% of the time it would come back with something so insightful that I couldn't imagine how it had gotten there from the bullet points I put in. We aren't using it as a company (we love our writers!) but it's fun to play around with, and we're considering using something similar for less important copy.
I made it public so you can mess around with it, if you're interested. Here's the link: https://description-bot.herokuapp.com/
The Wise Mthrfkr
My impression at that point (which I still feel is true) was that the model was good enough to brainstorm things that humans could filter, but not good enough to deliver consistently on its own. Still, I figured it would be fun to find out:
Could I train it to create content that people actually found valuable, without any human filtering?
I knew that without filtering, my robot friend would occasionally spit out complete gibberish. But what if I could find a place where his gibberish might fly, and even be viewed as poignant? I'm talking, of course, about the world of philosophical Twitter influencers.
I built a script that randomly shuffles a collection of a few hundred popular tweets from Naval, Ed Latimore, and others, feeds them to GPT-3 as a prompt, loads the tweets it generates into Ayrshare, and sends them out three times per day. Set it and forget it.
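The whole pipeline fits in a short script. A sketch, again assuming the beta-era openai client, plus Ayrshare's post endpoint (the settings and file names are illustrative):

```python
import random
import requests
import openai

openai.api_key = "OPENAI_API_KEY"
AYRSHARE_KEY = "AYRSHARE_API_KEY"

# A few hundred collected tweets, one per line
tweets = open("wise_tweets.txt").read().splitlines()

def generate_tweet():
    # Shuffle so the model sees a different style sample every run
    random.shuffle(tweets)
    prompt = "\n".join(tweets[:50]) + "\n"
    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=40,
        temperature=0.9,
        stop="\n",      # one line = one tweet
    )
    return response.choices[0].text.strip()

def post_tweet(text):
    # Ayrshare's social post endpoint
    requests.post(
        "https://app.ayrshare.com/api/post",
        headers={"Authorization": f"Bearer {AYRSHARE_KEY}"},
        json={"post": text, "platforms": ["twitter"]},
    )

post_tweet(generate_tweet())  # run on a schedule, three times a day
```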
The result was Wise Mthrfkr. I let him run for two months and he managed to amass almost 500 followers. Not bad for a robot. 🤖
More impressive to me was that some of the tweets actually felt insightful. Here are a few of my favorites:
At age 20, you're a complete idiot. At age 40, you're the result of your complete idiocy.
It is in doing that we become.
People only quit bad habits for a few seconds at a time.
Those who were happiest during their life will be forgiven in the afterlife.
Anything, done poorly enough, becomes Art.
In the end, I found GPT-3 fun to play with, but ultimately not as useful as I hoped. The biggest problem for most practical uses was that it couldn't be further trained. Unlike other AI models that can be tuned for exactly what you need, this model is a black box. It can do some pretty incredible things, but I think in the short term, it will serve more of a role in brainstorming than actually producing content of value.
(Along those lines, a version of the above I may whip up is to have it scrape your most successful past tweets and use those to generate other tweets you might write, and then email you a few as inspiration every morning. More brainstorming partner, less content creator. If I have some time this month, I'll build it and share it here next time.)
How Is All This Possible?
"Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
It can be easy to see something like GPT-3 and just assume it’s magic. How does it know what to say? How is any of this possible?
But if I had to summarize the overriding feeling from the AI studying I’ve done so far, it’s that the big idea is actually surprisingly simple. There are a ton of complicated problems to solve in the space, but the underlying technology is anything but magic.
The simple idea is this: In most computer programs, you start with a set of rules (the program). When new data comes in, it runs through the rules and produces some output. You can think of it like having a recipe — you add the ingredients, and you get the finished product.
With machine learning, the script is backwards. You start with many examples of ingredients and finished products, and the AI’s job is to find the recipe.
The process to do this was first laid out by Arthur Samuel in a 1962 paper. The key insight is that the ability to make a prediction can be boiled down to a series of weights (aka parameters) in a formula, and these parameters can be adjusted until they minimize the gap between prediction and reality.
Let’s think about a simple example: Imagine you’re trying to predict real estate prices in your city. There are any number of variables that could influence prices — bedrooms, bathrooms, square footage, neighborhood, etc. You could organize these variables into a formula like this:
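price = a × (bedrooms) + b × (bathrooms) + c × (square footage) + d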
You can guess at values for a, b, c, and d, and test your guesses against the houses you have data for. When you miss, you can use the derivative to find the direction to nudge each value toward ones that better approximate the real prices. Once you’ve done that enough times, you can apply those a, b, c, and d values to a new house whose price you don’t know, and get a decent estimate.
Your input data (bedrooms, bathrooms, etc.) are combined with your parameters (a, b, c, d) in a model (the linear formula above) to create predictions. The predictions are compared to the labels (the real prices) to find the loss (how much you missed by), and you use the loss to determine how to adjust a, b, c, and d. Then you do it again, over and over, until your parameter values are good at making predictions.
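Here's that whole loop as runnable code. A toy sketch with invented numbers, just to make the steps concrete:

```python
import numpy as np

# Invented example data: [bedrooms, bathrooms, square footage]
X = np.array([[3, 2, 1500],
              [4, 3, 2200],
              [2, 1, 900],
              [5, 3, 2800]], dtype=float)
prices = np.array([300_000, 450_000, 180_000, 560_000], dtype=float)

X[:, 2] /= 1000  # scale square footage so one learning rate suits all features

params = np.zeros(3)  # a, b, c -- start with a (bad) guess
bias = 0.0            # d

lr = 0.01
for step in range(50_000):
    preds = X @ params + bias    # the model: a*beds + b*baths + c*sqft + d
    error = preds - prices       # how much each prediction missed by
    # The derivative of the squared error says which way to nudge each value
    params -= lr * (2 * X.T @ error / len(X))
    bias -= lr * (2 * error.mean())

new_house = np.array([3, 2, 1.6])  # 3 bed, 2 bath, 1,600 sqft (scaled)
print(new_house @ params + bias)   # a decent estimate of its price
```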
Of course, this whole example is only capable of creating a linear model, and it only includes three variables. The real world is rarely so simple. The art of machine learning is in discovering the right variables and the right models, so that updating the parameters can find a mechanism that accurately predicts reality.
Neural networks are more complicated than a linear formula, but the same idea holds.
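For instance, here's the same training loop with the straight-line model swapped for a tiny two-layer network (a toy sketch with made-up data, nothing like GPT-3's actual scale):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))              # made-up inputs
y = X @ np.array([[2.0], [-1.0], [0.5]])  # made-up targets

# Two layers of parameters instead of a single a, b, c, d
W1, b1 = rng.normal(size=(3, 8)) * 0.5, np.zeros((1, 8))
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros((1, 1))

lr = 0.01
for _ in range(5000):
    # Forward pass: inputs + parameters -> predictions, same as before
    h = np.maximum(0, X @ W1 + b1)  # the nonlinearity is what makes it "neural"
    preds = h @ W2 + b2
    error = preds - y

    # Backward pass: derivatives say how to nudge every parameter
    dW2 = h.T @ error / len(X)
    db2 = error.mean(axis=0, keepdims=True)
    dh = (error @ W2.T) * (h > 0)
    dW1 = X.T @ dh / len(X)
    db1 = dh.mean(axis=0, keepdims=True)

    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g  # same adjust-and-repeat loop as the housing example
```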
So how does GPT-3 work? There’s lots of brilliance in how it was built and trained (the transformer architecture, pre-normalization, etc.), but at its core, it isn’t fundamentally different from the example above. What makes it special is its scale: the model has 175 billion parameters, adjusted over an enormous amount of text. Its structure isn’t that different from many of the simpler language models out there. It just did a lot more learning.
This isn’t at all obvious. There was no guarantee that more parameters alone would lead to a trained model with such wide-ranging abilities from so little additional training. It’s possible that we would have needed entirely new paradigms. But GPT-3 seems to have proven that, with enough extra parameters, surprisingly human-like behavior can naturally emerge. On the whole, this seems to be good news for AI, as it points to more doors naturally opening as computing power increases.
Cool Stuff
📖 Sum. If I know you well, I’ve probably already told you about this book. It’s a collection of 40 super short stories — thought experiments explored through the lens of what the afterlife could be like. They give me goosebumps. I still read it 1-2 times a year.
✏️ Expectations. Given how much I love Sum, I thought it would be fun to take a crack at writing my own “Sum-style” story. This was my first attempt. It was fun. I’ll probably do more of this.
🥘 Snow Days Healthy Pizza Bites. My friend Justin Mares recommended these in his newsletter last month. I innocently ordered a pack to try. Next thing I knew, my wife had ordered $80 worth of pizza bites. If you don’t want to be addicted, don’t try them.
📹 A fucking monkey played pong with its brain. What struck me about this is that it feels almost... too easy? We didn't need to decode the meaning of each neural firing to interpret them. We just needed to gather some data and pattern-match. A lot of AI gives me a similar feeling. It's inspiring (we can solve problems way beyond our understanding) but also kind of sad, like it's not as much of an achievement as I want it to be.
###
Thanks for reading! I had a lot of fun writing this and I’m excited to share more next month.
If you have an extra 30 seconds to spare, hit reply and let me know: What did you find interesting? What was boring? How could I make this more fun and valuable for you?
Until next month,
Zach
PS If you enjoyed this and want it in your inbox next month, subscribe here:
From a reader, Joshua Baron: I used the description machine for a book I wrote a few years ago. The description is a little generic, but way better than I expected:
Book Title: The Business of Criminal Law
Author: Joshua Baron
Bullets:
- You have to get really good at getting great results for your clients.
- Then you can charge premium prices.
- Once you're in control of your career, you should use that control to make the world a better place.
Description (generated by bot):
The Business of Criminal Law is a hard-hitting, no-nonsense, and honest look at the reality of being a criminal defense attorney.
In this entertaining and informative book, Joshua Baron, a former prosecutor and criminal defense attorney, takes you behind the scenes of the legal world.
He opens the door to the inner workings of defense attorneys, from the mindset of the criminal defendant to the ethical dilemmas defense attorneys face, to the financial realities of being a criminal defense attorney.
Whether you are a current or aspiring criminal defense attorney, or someone with a general interest in criminal law, The Business of Criminal Law will give you unprecedented access to the reality of criminal law as it is practiced today.
Two notes:
- How did it know that I'm a former prosecutor?
- I love your idea of sending Twitter inspiration drafts. I hate writing first drafts; I much prefer editing something already written. The Description Bot actually would have helped me write the back cover of my book by giving me something to start with.