Focus on efficiency: Compact language models from OpenAI and Google

In practice, however, there is now a clear trend towards specialised, more compact systems. Leading providers such as OpenAI and Google (with its Gemini range) are increasingly offering so-called Small Language Models (SLMs). These compact variants, which include GPT-4o mini and the Gemini Flash and Nano models, are gaining importance in real-world applications.

An overview of the models

The providers are pursuing different technical approaches with their more compact models:

OpenAI GPT-4o mini: This model succeeds older entry-level models such as GPT-3.5 Turbo. It processes both text and image input (multimodality) and achieves scores on reasoning benchmarks comparable to those of older, significantly larger models, at considerably lower operating costs.

Google Gemini Flash & Nano: Gemini Nano is designed to run locally on end devices such as smartphones and does not require an internet connection. The Gemini Flash series, by contrast, is optimised for cloud applications and is characterised by high processing speed and a large context window.

Economic and technical reasons for compact models

The benefits of these leaner models can be attributed to four key factors:

Cost-efficiency
Large language models require substantial computing power in data centres for every query. For businesses that handle a high daily volume of user queries, this adds up to a major cost factor. Compact models such as GPT-4o mini considerably reduce the cost per query, making the widespread use of AI applications economically viable.
Lower latency
Because fewer calculations need to be performed per generated token, compact models deliver responses more quickly. This is a crucial factor for real-time applications such as voice assistants or live translation during a conversation.
Local execution and data protection
Models such as Gemini Nano enable data to be processed directly on the user’s device. As the information does not need to leave the device, this approach facilitates compliance with data protection regulations. Furthermore, functionality is maintained even when no network connection is available.
Reduced energy consumption
Training and running large AI models consumes considerable energy. Smaller models need fewer computing resources for the same tasks, which lowers electricity use in data centres and improves their sustainability record.
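To make the cost factor concrete, the following Python sketch estimates daily spend from query volume and average token counts per query. The per-million-token prices are hypothetical placeholders, not published rates from OpenAI or Google, and the tier names `large-model` and `compact-model` are invented for illustration; real figures must come from each provider's current price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceTier:
    """Prices in US dollars per million tokens (hypothetical placeholders)."""
    name: str
    input_per_million: float
    output_per_million: float


def daily_cost(tier: PriceTier, queries_per_day: int,
               input_tokens: int, output_tokens: int) -> float:
    """Estimate daily spend for a fixed query volume and average token counts."""
    per_query = (input_tokens * tier.input_per_million
                 + output_tokens * tier.output_per_million) / 1_000_000
    return queries_per_day * per_query


# Placeholder prices only; they are NOT the providers' actual rates.
LARGE = PriceTier("large-model", input_per_million=5.00, output_per_million=15.00)
COMPACT = PriceTier("compact-model", input_per_million=0.20, output_per_million=0.80)

if __name__ == "__main__":
    # Assumed workload: 100,000 queries/day, 500 input and 200 output tokens each.
    volume, tokens_in, tokens_out = 100_000, 500, 200
    for tier in (LARGE, COMPACT):
        cost = daily_cost(tier, volume, tokens_in, tokens_out)
        print(f"{tier.name}: ${cost:,.2f} per day")
```

With these placeholder figures the large tier comes to about $550 per day and the compact tier to about $26 per day. The ratio depends entirely on the assumed prices; the point is that per-query savings scale linearly with query volume.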

Typical practical applications

In many scenarios, using a highly developed, large model is neither economically justified nor technically necessary. Companies are therefore increasingly opting for a model whose capacity matches the task at hand.

Development approach: For routine tasks, developers choose the least resource-intensive model that still delivers adequate results, which optimises efficiency and speed.
Automated process chains (agentic workflows): For complex tasks, multiple AI calls are often chained together, for example to analyse documents, filter data and generate reports. Because every step triggers its own model call, compact models keep the total cost of such multi-stage processes low.
Assistance systems in software development: Code-assistance tools use lightweight models to suggest syntax corrections or simple code completions directly as the programmer types.
Customer service: Standardised support enquiries, such as checking a delivery status or assisting with password resets, can be answered accurately and cost-effectively using compact models.
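The development approach and the chained workflows described above can be sketched as a simple router: routine task types go to a compact tier, everything else to a large tier, and the cost of a multi-stage pipeline is the sum of its individual calls. The task names, the routing table and the relative cost units below are hypothetical; a real system would base routing on measured quality for its own tasks rather than a fixed list.

```python
from enum import Enum


class Tier(Enum):
    COMPACT = "compact"  # small, cheap, low-latency model (hypothetical tier)
    LARGE = "large"      # reserved for open-ended or complex tasks


# Hypothetical routing table: task types assumed to be routine enough
# for the compact tier.
ROUTINE_TASKS = {"delivery_status", "password_reset", "syntax_fix",
                 "analyse_document", "filter_data"}

# Hypothetical relative cost per call (arbitrary units, not real prices).
COST_PER_CALL = {Tier.COMPACT: 1.0, Tier.LARGE: 20.0}


def choose_tier(task_type: str) -> Tier:
    """Pick the smallest tier assumed to be sufficient for the task type."""
    return Tier.COMPACT if task_type in ROUTINE_TASKS else Tier.LARGE


def pipeline_cost(steps: list[str]) -> float:
    """Sum per-call costs of a chained workflow, routing each step separately."""
    return sum(COST_PER_CALL[choose_tier(step)] for step in steps)


if __name__ == "__main__":
    workflow = ["analyse_document", "filter_data", "generate_report"]
    for step in workflow:
        print(f"{step} -> {choose_tier(step).value}")
    print("total cost units:", pipeline_cost(workflow))
```

Under these assumptions the three-step pipeline costs 22 units, compared with 60 if every step used the large tier. Keeping the routing rule in a single function makes it easy to move a task between tiers once evaluation shows whether the compact model is sufficient.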

Conclusion

Whilst very large language models remain necessary to push technological boundaries and solve highly complex, creative tasks, compact models are taking over everyday information processing. They enable the broad, cost-effective and rapid integration of AI functions into existing software infrastructures.