We asked GPT-4o and its rising rival Claude 3.5 Sonnet to write us their best stories. Here’s what came out, along with our conclusions.
“Wait! I’ve got a joke to tell you!” I’ll grant that if someone called out to you like that in the street, you would most likely not wait a second… and rightfully so. But what if, instead of a person, it was an undead creature – called Bob – who told you that a few seconds before getting shot?1 Not quite the same, right?
You are probably wondering why I am sharing this wacky story with you and what it has to do with… AI. Well, I conducted a small experiment: asking both GPT-4o and Claude 3.5 Sonnet to write their best stories from the exact same prompts and comparing the results. Hoping that you’ll enjoy reading about it as much as I enjoyed running it, I invite you to walk back through this study with me and decide which of the two is the better storyteller.
The ultimate goal of the experiment was to assess the quality of Claude 3.5 Sonnet’s outputs and determine whether we should integrate this rising language model into the AI-driven products we are currently developing. To that end, I gave the two models – GPT and Claude – mini-synopses2 and asked them to invent stories and an article based on those ideas. The ideas were chosen to span different universes and explore different writing styles. I ran the experiment twelve times and already started to notice patterns and features characterizing each AI.
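For reference, here is a minimal sketch of how one round of the comparison could be reproduced, assuming the prompts are sent through the official OpenAI and Anthropic Python SDKs. The article does not say which interface was actually used, and the model identifiers and settings below are my assumptions, not a description of the original setup.

```python
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# One of the prompts from the experiment (see the list at the end of the article).
prompt = "Write me a coming-of-age story set in the heart of Paris."

# Same prompt, no system message, default sampling settings for both models.
gpt_story = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

claude_story = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model snapshot
    max_tokens=2048,
    messages=[{"role": "user", "content": prompt}],
).content[0].text

print("--- GPT-4o ---\n", gpt_story)
print("--- Claude 3.5 Sonnet ---\n", claude_story)
```

Repeating this with each of the twelve prompts and reading the two outputs side by side is essentially the whole protocol.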
GPT-4o’s poetic prose against Claude 3.5 Sonnet’s page-turners
As the results came in, GPT-4o immediately seemed more conventional, even monotonous. Whenever we ask it to write a piece of literature, it adopts a poetic tone that makes the storytelling less engaging and makes everything sound like a saccharine romance. By contrast, Claude 3.5 Sonnet comes across as far more engaging, offering a taste of more surprising and funnier adventures that keep you awake.
Indeed, just take a look at the first few outputs to get an idea of GPT-4o’s style. Most of them – 10 out of 12, to be precise – open with the thrilling and captivating… “In”. That makes for a rough contest and hands Claude an edge before the story has even begun.
Better storytelling, more captivating twists: Claude 3.5 Sonnet stays ahead of its competitor. But something else caught my attention: their similarities.
Echoes in the algorithms and shared vocabulary
I found a curious resemblance between GPT-4o’s and Claude 3.5 Sonnet’s answers. Each has its own way of presenting ideas, but they often use the exact same expressions, and that really caught my attention.3
For instance, for one prompt, one AI included a “worn, leather-bound journal with no title” and the other a “thin, leather-bound volume with no title”. For another, both protagonists were racing against time and needed “to keep [us] on the edge of [our] seat until the very last page”. Of course, both stories were titled “Whisper [something]”. On yet another run, both GPT and Claude mentioned the “spines of countless books”. Do they have overlapping training data sets? It certainly sounds like it.
The verdict
In summary, our experiment revealed distinct storytelling approaches between the two advanced language models, GPT-4o and Claude 3.5 Sonnet. GPT-4o demonstrated a tendency towards more conventional, structured narratives, which some readers might find predictable and monotonous. Despite this, I found its stories occasionally more engaging than its rival’s, though this preference was met with skepticism by everyone who heard it. Indeed, our entire team – myself excluded – unanimously agrees that Claude is by far the more talented writer.
Anyway, beyond revealing my bad taste – as the haters call it – this small study also highlights similarities between the two LLMs. This suggests that the fundamental architectures and training methodologies of large language models often lead to common patterns in their outputs and behaviors, even though they were developed by different organizations.
This quick investigation confirms that integrating Claude 3.5 Sonnet into our AI-driven products such as Ask Henry is definitely a decision worth exploring. It also suggests that this model might pleasantly surprise us with its ability to analyze and manipulate data. That remains to be seen.
Do you agree or disagree with these observations? Did Claude 3.5 Sonnet win your vote? I can’t wait to hear your reactions and your own analysis.
For the sake of completeness, and so that you can make an informed decision, here are the generated stories along with their corresponding prompts. I have also added the original French version of the experiment and its English translation at the very end; those were actually the first prompts I ran, and they will help you understand the introduction and enjoy our funniest AI-generated stories.
Promote a riding club in Martinique. This will be displayed on the first page of their website.
Write me a coming-of-age story set in the heart of Paris.
Give me the best synopsis, full of suspense about the story of your choice.
Give me the best synopsis, full of suspense on a mystery story full of twists and turns in 4 lines.
Enjoy your reading! 🍹
1. To read the full story generated by Claude 3.5: click here.
2. Most of the prompt ideas were found on socreate.it.
3. A completely subjective and inaccurate time quantifier for people who do not bother to count things properly.