Comparing Claude, Gemini, Llama and ChatGPT - who was the winner?

A question I get asked quite a bit these days is “Which model is best?”

By “model” I mean a large language model (LLM). It’s not the easiest question to answer, because different models are good at different things.

But you can compare models performing individual tasks using a tool called Airtrain. It's super helpful for seeing how various large language models perform side-by-side.

This super quick video covers:

  • Setting up a free Airtrain account 💻

  • Choosing models to compare (I used Claude, Gemini, ChatGPT, and Llama)

  • Adjusting settings like model versions and temperature

  • Running a prompt across all models simultaneously

  • Analysing the results for quality and creativity
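
If you'd rather see the idea behind those steps in code, here is a minimal Python sketch of sending one prompt, with one temperature setting, to several models at once. It is not Airtrain's API: `compare_models` and `make_stub` are hypothetical names, and the stub functions only stand in for real model calls so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical signature for a model call: (prompt, temperature) -> reply text.
ModelFn = Callable[[str, float], str]


def compare_models(
    models: Dict[str, ModelFn], prompt: str, temperature: float = 0.7
) -> Dict[str, str]:
    """Send the same prompt and temperature to every model in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {
            name: pool.submit(fn, prompt, temperature)
            for name, fn in models.items()
        }
        # Collect replies under the same model names, in the same order.
        return {name: future.result() for name, future in futures.items()}


def make_stub(name: str) -> ModelFn:
    """Build a placeholder model; swap in a real API call here."""
    def call(prompt: str, temperature: float) -> str:
        return f"[{name} @ T={temperature}] reply to: {prompt}"
    return call


if __name__ == "__main__":
    models = {n: make_stub(n) for n in ["Claude", "Gemini", "ChatGPT", "Llama"]}
    replies = compare_models(models, "Write a headline for a two-bedroom flat.")
    for name, reply in replies.items():
        print(f"{name}: {reply}")
```

Keeping the prompt and temperature identical for every model is what makes the side-by-side comparison fair; judging quality and creativity is still up to you.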

In this test, Gemini came out on top, but I have to say that the latest release of Claude (3.5 Sonnet) is really impressive, and I’ve been using it a lot lately. ChatGPT (OpenAI) still shines when you use custom GPTs, and Perplexity AI is my go-to for research questions that need sources (e.g., property data like days on market).

Remember, there's no "best" model overall: it depends on your specific needs. That's why it's worth keeping your options open and experimenting with different models for different tasks.

Happy hunting! 🤖

