Dream team LLM & RAG: how the AI bot works

Modern RAG technology, scalable infrastructure and complete data control - a technically sound solution for companies.

The rms. AI Bot uses an innovative architecture that combines semantic search with generative AI. Using embeddings, a vector database and a powerful language model, it generates relevant, up-to-date and precise answers - directly from your company content.

Find out how our system is structured, which technology components it uses and how you can benefit from it.

What is RAG?

A technology that really understands content.

Traditional searches compare words. Our solution goes further: content is converted into vectors via so-called embeddings and stored in a vector database - for example ChromaDB, Pinecone, Milvus or Faiss. When a question is asked, it is embedded in the same way and compared against the stored vectors to find the most relevant documents - this content then serves as context for a language model that generates the final answer.
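
As an illustration, here is a minimal indexing sketch using ChromaDB's Python client (the collection name, documents and source URLs are invented for the example; any of the databases above would work similarly):

```python
import chromadb

# In-memory client for the example; production setups use a persistent
# or hosted instance.
client = chromadb.Client()

# Chroma embeds the documents automatically with its default embedding model.
collection = client.create_collection(name="company_content")

# Invented example content - real data would come from your website or docs.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Our support hotline is available Monday to Friday, 8 am to 6 pm.",
        "Returns are accepted within 30 days of purchase with a receipt.",
    ],
    metadatas=[
        {"source": "https://example.com/support"},
        {"source": "https://example.com/returns"},
    ],
)
```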

Advantages at a glance:

  • Security & data protection: your data remains in your database environment - no external training.

  • Cost & resource efficiency: No time-consuming fine-tuning required - faster, more cost-effective.

  • Timeliness & reliability: Answers are based on up-to-date data.

  • Flexibility: Changes to data sources are possible without having to retrain the model.

Advantages of RAG

Security and data protection

With RAG, proprietary data remains in your secure database environment, which enables tighter access controls. With fine-tuning, the data is baked into the model during training, which can potentially expose it more broadly.

Cost and resource efficiency

Fine-tuning is computationally intensive and time-consuming, as it requires extensive training phases and data preparation. RAG avoids this training effort by retrieving the data dynamically, which is more cost-effective and faster to scale.

Timeliness and reliability

RAG can always access up-to-date data and therefore provide more accurate and trustworthy answers. Fine-tuning is based on a static training data set and may contain outdated knowledge.

Flexibility

RAG is particularly well suited to applications where the underlying data changes or expands frequently, since the model never needs to be retrained. Fine-tuning is better for very specific, narrowly defined tasks, but requires retraining every time the data changes.

RAG search procedure

Enter the question

The visitor enters a question or search query into the system.

Vectorization of the question

The question is converted by a so-called embedding model into a vector that represents its semantic meaning.
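
As a sketch, the vectorization step might look like this with the sentence-transformers library (the model name is one common choice; ChromaDB performs this step internally with the same default model):

```python
from sentence_transformers import SentenceTransformer

# One common embedding model; it must match the model used at indexing time.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

question = "What are your support hours?"
# encode() returns a dense vector representing the question's meaning.
question_vector = embedder.encode(question)
print(question_vector.shape)  # (384,) for this model
```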

Semantic similarity search

The vector database is queried with the question vector to find the semantically most similar documents or text passages.
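
Continuing the sketch with the ChromaDB collection from above, the similarity search could look like this (n_results controls how many passages come back):

```python
# Query the collection with the question vector from the previous step.
# (Passing query_texts instead would let Chroma embed the question itself.)
results = collection.query(
    query_embeddings=[question_vector.tolist()],
    n_results=5,
)

documents = results["documents"][0]  # the 5 most similar passages
distances = results["distances"][0]  # smaller distance = more similar
```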

Output of the most relevant results

The k most relevant documents (e.g. top 5 or top 10) are returned from the vector database.
-> Optionally, a reranker can be integrated, which improves the precision of the results by up to 35%.
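
One common way to integrate such a reranker is a cross-encoder that rescores each question/passage pair; the sketch below continues the example and uses an illustrative model from sentence-transformers:

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores each (question, passage) pair jointly - slower
# than pure vector search, but considerably more precise.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# `documents` is the top-k list from the similarity search step.
scores = reranker.predict([(question, doc) for doc in documents])

# Reorder the passages by descending reranker score.
reranked = [doc for _, doc in
            sorted(zip(scores, documents), key=lambda p: p[0], reverse=True)]
```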

Combination of question, context and prompt

The original question, the retrieved documents and a prompt (the task instruction) are passed to the language model (LLM).
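
Continuing the sketch, this combination is plain string assembly; the wording of the instruction is illustrative:

```python
# Combine context, question and task instruction into one prompt.
context = "\n\n".join(reranked)

prompt = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say so.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
```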

Processing by the language model

The LLM generates an answer based on the question and the context information.
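
Assuming the OpenAI chat API as one possible backend (any hosted or self-hosted LLM works the same way), the generation step could look like this:

```python
from openai import OpenAI

llm = OpenAI()  # reads OPENAI_API_KEY from the environment

response = llm.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
answer = response.choices[0].message.content
```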

Output of the response

The generated answer is presented to the user. If required, links to relevant sources/pages are added.
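
Continuing the sketch, the source links can be taken from the metadata stored during indexing:

```python
# Chroma returns the stored metadata alongside each retrieved passage.
sources = {meta["source"] for meta in results["metadatas"][0]}

print(answer)
print("\nSources:")
for url in sorted(sources):
    print(f"- {url}")
```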