The rms. AI Bot.

ChatGPT doesn't know your products, our AI bot does.

Would you like customer inquiries on your website to be answered around the clock? Do your employees struggle to quickly find the right information in internal documents? Our AI bot answers questions about your products immediately and precisely, drawing specifically on your company's or topic-specific content. Unlike a traditional search, the input can be phrased as real questions, and a dialog can develop. The underlying data can be complete websites, PDFs, Word files, text documents or exports from databases.

The AI bot can communicate in "simple language" (configurable), integrates team solutions such as Slack and Confluence directly and also has JWT authentication so that access can be restricted to certain user groups.

Register now · Test the chatbot · Download whitepaper

- True multilingualism with DeepL integration
- Works with all current LLMs
- Transparent costs
- Widget integration
- 100% GDPR compliant
- Automatic capture of all content via scraping & crawling
- Setting for "simple language"
- Slack and Confluence integration
- JWT authentication
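
The JWT authentication mentioned above can be illustrated with a minimal sketch: an HS256-signed token is verified before a user group is granted access. This uses only Python's standard library with a hypothetical shared secret and illustrative claim names; it is not the bot's actual implementation.

```python
# Minimal JWT sketch (HS256) for restricting chatbot access to user groups.
# SECRET and all claim names are illustrative assumptions.
import base64, hashlib, hmac, json, time

SECRET = b"demo-secret"  # hypothetical shared secret

def _b64url(data: bytes) -> str:
    # Base64url without padding, as used in JWTs.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_token(payload: dict) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"

def verify_token(token: str):
    # Returns the claims if signature and expiry are valid, else None.
    try:
        header, body, sig = token.split(".")
    except ValueError:
        return None
    expected = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), sig):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    if payload.get("exp", 0) < time.time():
        return None
    return payload

token = sign_token({"sub": "employee-42", "group": "support", "exp": time.time() + 3600})
claims = verify_token(token)
```

In practice a widget would send such a token with each request, and the backend would answer only if the `group` claim matches an allowed user group.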

Application scenarios for the chatbot

Chat on a website

The AI Bot answers questions about all content that can be found on a website. If required, these can be enriched with PDF files or other content. The answers can contain links to content pages (products, offers, help, ...).

Search via internal company data

A bot makes internal company data easily accessible to employees. Manual searches in documents, knowledge databases, support databases, etc. are no longer necessary and knowledge transfer becomes more efficient.

Support

Customer inquiries can be compared with existing support databases and answers can be pre-formulated based on previous results.

Multilingualism in real time

If required, inquiries can be answered in other languages, even if the content of a website is only available in German.

Internal data transfer

The underlying technology of the bot can be operated on a company-internal server if required, which means that no data is transferred to external systems.

Slack and Confluence integration / desktop application

The AI Bot has dedicated interfaces to two central platforms for teamwork: Slack and Confluence. It can also be operated as an installable desktop application.

From November 2025: The rms. Phonebot

Improve your customer service with the Phonebot, a chatbot with a telephone system interface. It is available 24/7 and answers customer questions immediately, relieving your employees. The Phonebot reduces operational costs by automating standard processes. It uses artificial intelligence and speech recognition to answer calls and understand requests. If necessary, it seamlessly forwards calls to the right employee.

Request test access

Chatbot live demo: RAG, vectors and GDPR explained

The rms. Chatbot: Intelligent answers with full data control

Publicly accessible chatbots such as ChatGPT, Gemini or Perplexity have serious disadvantages when processing individual queries or questions on current topics: either they cannot answer them at all, because internal company data was not part of their training data or was not yet available at the time of training (e.g. current news), or they answer from general knowledge, which can lead to deviations or statements that do not match your own content. Other brand names may be mentioned, links may lead to competitors' sites, and so on. Questions about internal company data (support databases, internal documents, research databases, etc.) generally cannot be answered at all.

Added value for you and your customers

Personalized recommendations

The chatbot can analyze user behavior and previous requests to recommend products, services or content that are truly relevant. This increases relevance and sales.

Appointments & bookings

Customers can book appointments, reserve a table in a restaurant or request a callback directly via the chatbot. This makes the service accessible and convenient.

Real-time status updates

Whether it's about an order, the dispatch of a delivery or the processing status of an inquiry - the chatbot can inform the customer in real time without them having to search or call for a long time.

Multilingualism

In order to serve international markets or address a diverse customer base, the chatbot can communicate in several languages. This increases trust and reach.

Feedback interface

The chatbot can ask for feedback after an interaction. This provides valuable insights into customer satisfaction and helps to continuously improve the service.

Seamless transition to human employees

If the chatbot is unable to resolve a query, it intelligently forwards the customer to the right human contact - including the chat history. This ensures a smooth transition and prevents frustration.

Phone Agents

Expand the chatbot with a voice interface for the automatic processing of caller inquiries. This means your customer service is also available by phone around the clock.

Interfaces to Confluence, Office & Google Drive

Connect the chatbot directly to your internal knowledge platforms for precise access to articles, documents and specifications.

Setting for "Simple language"

The aim of plain language is to formulate texts and content in such a way that they can be understood quickly, easily and completely by the widest possible target group. It is a tool for accessibility and inclusion whose benefits extend far beyond people with reading difficulties. The rms. Chatbot gives you the option of outputting all responses in plain language (configurable via the chatbot backend).


Retrieval Augmented Generation (RAG)

In RAG, all available data is first converted into numerical vectors that capture the semantic meaning of the content, and these vectors are stored in dedicated vector databases; examples include Chroma, Pinecone, Faiss, Elasticsearch and Milvus. Vector databases enable semantic similarity search: they find content that is similar to the search query in meaning or context, even if the exact words do not match. This is comparable to searching for similar images, or to Shazam recognizing a song that is sung aloud.

In the RAG process, the vector database is searched for the documents most relevant to a user query. These documents are then passed to the LLM as context, making its response more informative and accurate. This significantly reduces the limitations of LLMs in terms of knowledge currency and source citation. If required, a RAG system can be operated entirely on your own server, without external LLM providers (e.g. OpenAI, Gemini or Perplexity).
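
The vectorization and similarity search at the heart of RAG can be illustrated with a toy sketch. Here a simple word-count vector stands in for a trained embedding model, and a sorted list stands in for a real vector database such as Chroma or Faiss; the documents are invented for the example.

```python
# Toy semantic similarity search, as used in the retrieval step of RAG.
# A real system would use a trained embedding model and a vector database.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Shipping costs and delivery times for all products",
    "How to reset your account password",
    "Warranty and return policy for hardware products",
]

def top_k(query: str, docs: list, k: int = 2) -> list:
    # Rank documents by similarity to the query vector, return the best k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

The documents returned by `top_k` are what the RAG process then hands to the LLM as context.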

Advantages of RAG

Security and data protection

With RAG, proprietary data remains in the secure database environment, enabling tighter access controls. With fine-tuning, the data is integrated into the model training, which can potentially lead to broader data access.

Cost and resource efficiency

Fine-tuning is computationally intensive and time-consuming, as it requires extensive training phases and data preparation. RAG avoids this training effort by retrieving the data dynamically, which is more cost-effective and faster to scale.

Timeliness and reliability

RAG can always access up-to-date data and therefore provide more accurate and trustworthy answers. Fine-tuning is based on a static training data set and may contain outdated knowledge.

Flexibility

RAG is particularly well suited to applications where the underlying data changes or expands frequently, without the need to retrain the model. Fine-tuning is better for very specific, narrowly defined tasks, but requires retraining every time the data changes.

RAG search procedure

Enter the question

The visitor enters a question or search query into the system.

Vectorization of the question

The question is converted into a vector by a so-called embedding model, which represents the semantic meaning of the question.

Semantic similarity search

The vector database is queried with the question vector to find the semantically most similar documents or text passages.

Output of the most relevant results

The X most relevant documents (e.g. top 5 or top 10) are returned from the vector database.

Combination of question, context and prompt

The original question, the documents found and a prompt (task) are transferred to the language model (LLM).

Processing by the language model

The LLM generates an answer based on the question and the context information.

Output of the response

The generated answer is presented to the user. If required, links to relevant sources/pages are added.
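
The final steps above, combining the question, the retrieved context and a prompt before handing them to the language model, can be sketched as follows. `build_prompt` and `call_llm` are illustrative placeholders, not the product's actual API; any LLM backend (OpenAI, Gemini, a locally hosted model) could sit behind `call_llm`.

```python
# Sketch of the last steps of the RAG search procedure: question + retrieved
# documents + task prompt are combined into the input for the LLM.

def build_prompt(question: str, context_docs: list) -> str:
    # Number each context document so the model can cite sources as [n].
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context_docs))
    return (
        "Answer the question using only the context below. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would send `prompt` to an LLM API.
    return "(model answer)"

docs = ["Opening hours: Mon-Fri 9-17", "Returns are accepted within 30 days"]
prompt = build_prompt("When are you open?", docs)
answer = call_llm(prompt)
```

Because the sources are numbered in the prompt, the answer can carry references back to the original pages, which is how links to relevant sources are added in the final step.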

Data protection

Data management can be 100% controlled using RAG. For example, the following scenarios can be mapped. It is also possible to use your own API keys.

API - OpenAI

- Storage of all vectorized data locally in a vector database
- Processing of the data (relevant results, question, prompt) within an OpenAI language model (GPT4, GPT4-mini, GPT4-nano)

API - Gemini

- Storage of all vectorized data locally in a vector database
- Processing of the data (relevant results, question, prompt) within a Gemini language model

API - Ionos

- Open source LLMs, hosting in Germany
- Storage of all vectorized data locally in a vector database
- Processing of the data (relevant results, question, prompt) within an open source language model hosted by Ionos in Germany (Llama 3.1 8B Instruct, Mistral 7B Instruct, Code Llama 13B Instruct, ...)

Dedicated AI server - e.g. Hetzner (open source LLMs, hosting in Germany)

- Storage of all vectorized data locally in a vector database
- Processing of the data (relevant results, question, prompt) within an open source language model operated on a dedicated server (phi4, Llama 3.1 8B Instruct, Mistral 7B Instruct, Code Llama 13B Instruct, or all models compatible with llama)

Local AI server - (open source LLMs, hosting in own data center / network)

- Storage of all vectorized data locally in a vector database
- Processing of the data (relevant results, question, prompt) within an open source language model operated on a dedicated server (phi4, Llama 3.1 8B Instruct, Mistral 7B Instruct, Code Llama 13B Instruct, or all models compatible with llama)
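
For illustration, the deployment scenarios above could be expressed as configuration: the vector database always stays local, and only the LLM endpoint changes. All field names, endpoints and values below are hypothetical, not the actual backend configuration.

```python
# Hypothetical configuration sketch for the data protection scenarios:
# local vector database in every case, LLM endpoint varies per scenario.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BotConfig:
    vector_db: str                 # always local, e.g. "chroma"
    llm_provider: str              # "openai", "gemini", "ionos", "dedicated", "local"
    llm_endpoint: str              # API URL or internal server address
    api_key: Optional[str] = None  # own API key where the provider requires one

SCENARIOS = {
    "openai": BotConfig("chroma", "openai", "https://api.openai.com/v1", api_key="..."),
    "local": BotConfig("chroma", "local", "http://10.0.0.5:11434"),  # in-house server
}
```

The pattern is the point: switching from an external API to a dedicated or in-house server changes only the endpoint and key, while the vectorized data never leaves the local database.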

Never be put on hold again: request access now.