Affordable LLMs: The Smart Choice for Production

Aakash Gupta
2 min read · Jul 25, 2024

In production, most people use lower-cost LLMs, not their expensive counterparts. Here’s how the latest models compare:

(This space is changing every 2–3 days)

1. Gemini Flash

Its ratings in text and multi-modal are middling, which might surprise you. But it has some advantages: in particular, video processing and its really long 1M-token context window. Its cost reflects that performance, coming in the middle.

2. GPT-4o Mini

Even cheaper and higher on the benchmarks is OpenAI’s latest model, released only last week. Until Llama’s release this weekend, GPT-4o Mini led all the benchmarks. Being the cheapest as well is quite a feat. It replaces GPT-3.5 Turbo quite well.

3. Claude 3 Haiku

Haiku is a middling player across all the metrics. When it comes to multi-modal in particular it lags. Haiku’s strength is coding. And text-wise, some prefer it. It’s more of a taste-based choice, with its high-end pricing.

4. Llama 3.1 70B

The latest entrant in the model game has impressive benchmarks. The fully open-source model has the leading text MMLU benchmark score. And it has open-source pricing to match, though your mileage may vary depending on where you host it. Overall, it’s a competitive player.

Why do I even bring up these cheaper LLMs, instead of talking about the big guys like Claude 3.5 Sonnet?

In this week’s deep dive, I sat down with Google Gemini PM Liam Bolling.

I learned that one of the best tips for saving money is to prototype and build on the bigger models, then fine-tune the cheaper class of models for scale.

And that is the industry practice among enterprise players.

Hence this comparison.
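
To get a feel for why the prototype-big, scale-small pattern saves money, here is a rough back-of-the-envelope sketch in Python. The prices and the workload are made-up placeholders, not real vendor quotes, and the names `ModelPrice` and `monthly_cost` are purely illustrative. Plug in current pricing and your own traffic before drawing any conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price per one million tokens in US dollars (placeholder values)."""
    input_per_million: float
    output_per_million: float


def monthly_cost(price, requests_per_month, input_tokens, output_tokens):
    """Estimate monthly spend, assuming every request has the same token counts."""
    input_cost = requests_per_month * input_tokens / 1_000_000 * price.input_per_million
    output_cost = requests_per_month * output_tokens / 1_000_000 * price.output_per_million
    return input_cost + output_cost


# Hypothetical prices for illustration only, NOT real vendor pricing.
large_model = ModelPrice(input_per_million=4.00, output_per_million=16.00)
small_model = ModelPrice(input_per_million=0.25, output_per_million=1.00)

# Hypothetical production workload: same prompt shape for both models.
workload = dict(requests_per_month=1_000_000, input_tokens=1_000, output_tokens=250)

large = monthly_cost(large_model, **workload)
small = monthly_cost(small_model, **workload)
print(f"large: ${large:,.2f}  small: ${small:,.2f}  ratio: {large / small:.1f}x")
```

The sketch ignores fine-tuning costs, hosting for open models, and any quality gap, so treat the ratio as a starting point for the conversation rather than a forecast.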

What do you use?
