1. Data pre-processing and extraction
The knowledge base is built from various document types, which are typically in PDF format, including policy documents, operating procedures, and general terms and conditions.
SIGNAL IDUNA uses a hybrid approach that combines the Layout Parser in Google Cloud Document AI with PDFPlumber to parse these PDFs and extract their text content. While the Layout Parser extracts the text segments, SIGNAL IDUNA enhances table extraction with PDFPlumber when the quality of the PDFs allows. The extracted texts are then cleaned, chunked, embedded with Google’s Gecko multilingual embedding model, and enriched with additional metadata, so the information can be processed and analyzed effectively later.
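The chunking strategy itself isn’t described in detail; a minimal sketch, assuming fixed-size character chunks with a small overlap (the sizes are illustrative, not SIGNAL IDUNA’s actual parameters):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split cleaned document text into overlapping chunks for embedding."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.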
For storing the vectorized texts, Google Cloud SQL for PostgreSQL is used with the pgvector PostgreSQL extension, which provides a highly effective vector database solution for our needs. By storing vectorized text chunks in Cloud SQL, SIGNAL IDUNA benefits from its scalability, reliability, and seamless integration with other Google Cloud services, while pgvector empowers efficient similarity search functionality.
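As a sketch of how such a similarity search might look with pgvector’s cosine-distance operator `<=>` (the `chunks` table and its columns are assumptions for illustration, not SIGNAL IDUNA’s actual schema):

```python
def to_pgvector(embedding: list[float]) -> str:
    """Format a Python float list as a pgvector literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{x:g}" for x in embedding) + "]"

# Parameterized top-k similarity query (psycopg-style %s placeholders);
# <=> is pgvector's cosine-distance operator.
TOP_K_QUERY = """
SELECT id, content, metadata,
       embedding <=> %s::vector AS cosine_distance
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""
```

An index such as HNSW or IVFFlat on the `embedding` column keeps this query fast as the knowledge base grows.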
2. Query augmentation
Query augmentation generates multiple queries to improve the formulation of user questions for both document retrieval from the vector store and answer generation. The original question is reformulated into several variants, creating three versions in total: the original query, a rewritten query, and an imitation query. These are then used to retrieve relevant documents and generate the final answer.
For the rewritten query, the system uses Gemini 1.5 Pro to correct spelling errors in the original question. Additionally, the query is expanded by adding synonyms for predefined terms and tagging specific terms (e.g., “remedies,” “assistive devices,” “wahlleistung/selective benefits”) with categories. The system also uses information about selected tariffs to enrich the query. For example, tariff attributes, such as brand or contract type, are extracted from a database and appended to the query in a structured format. These specific adjustments make it possible to handle special tariff codes and add further context based on tariff prefixes.
The imitation query uses Gemini 1.5 Pro to rephrase the question so that it mimics the language of technical insurance documents, improving the semantic similarity with the source material. It also considers conversation history and handles age formatting.
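The three-variant augmentation described above can be sketched as follows. The LLM calls are passed in as plain callables, and the synonym table and tariff-attribute format are illustrative assumptions, not the production logic:

```python
from typing import Callable

def augment_query(
    question: str,
    synonyms: dict[str, list[str]],
    tariff_attributes: dict[str, str],
    rewrite_llm: Callable[[str], str],
    imitate_llm: Callable[[str], str],
) -> dict[str, str]:
    """Build the three query variants used for retrieval and generation."""
    # Rewritten query: LLM spelling correction, then synonym expansion
    # and structured tariff context (key/value format is illustrative).
    rewritten = rewrite_llm(question)
    for term, syns in synonyms.items():
        if term in rewritten.lower():
            rewritten += " (" + ", ".join(syns) + ")"
    if tariff_attributes:
        context = "; ".join(f"{k}={v}" for k, v in tariff_attributes.items())
        rewritten += f" [tariff: {context}]"
    # Imitation query: LLM rephrases the question in the register
    # of technical insurance documents.
    imitation = imitate_llm(question)
    return {"original": question, "rewritten": rewritten, "imitation": imitation}
```

All three variants are retained so that retrieval can be run against each and the results merged.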
3. Retrieval
First, the system checks the query cache, which stores previously answered questions and their corresponding correct answers. If the question, or one very similar to it, has already been successfully resolved, the cached answer is returned immediately, providing quick access to information and avoiding redundant processing.
The accuracy of the cache is maintained through a user feedback loop, which identifies correctly answered questions to be stored in the cache through upvotes. A downvote on a cached answer triggers an immediate cache invalidation, ensuring only relevant and helpful responses are served. This dynamic approach improves the efficiency and accuracy of the system over time. If no matching questions are found in the query cache, the retrieval process falls back on the vector store, ensuring that the system can answer novel questions.
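A minimal sketch of such a feedback-driven query cache, assuming cache hits are decided by a cosine-similarity threshold over question embeddings (the threshold value and embedding function are placeholders):

```python
class QueryCache:
    """Semantic query cache with upvote/downvote feedback (illustrative)."""

    def __init__(self, embed, threshold: float = 0.95):
        self.embed = embed          # maps a question string to a vector
        self.threshold = threshold  # minimum cosine similarity for a hit
        self.entries = []           # list of (question, vector, answer)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    def lookup(self, question: str):
        """Return a cached answer for a sufficiently similar question, else None."""
        v = self.embed(question)
        for _, qv, answer in self.entries:
            if self._cosine(v, qv) >= self.threshold:
                return answer
        return None  # caller falls back to vector-store retrieval

    def upvote(self, question: str, answer: str):
        """An upvote marks this answer correct: store it in the cache."""
        self.entries.append((question, self.embed(question), answer))

    def downvote(self, question: str):
        """A downvote invalidates matching cached entries immediately."""
        v = self.embed(question)
        self.entries = [e for e in self.entries
                        if self._cosine(v, e[1]) < self.threshold]
```

In production the cache would live alongside the pgvector store rather than in memory, but the hit/upvote/invalidate flow is the same.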
After retrieving any relevant information chunks from the query cache or vector store, the system uses the Vertex AI ranking API to rerank them. The reranker scores each chunk’s relevance to the query and reorders the results so that the most accurate and helpful information is presented first.
Ensuring complete and accurate answers is paramount during retrieval, and SIGNAL IDUNA found that some queries required information beyond what was available in the source documents. To address this issue, the system uses keyword-based augmentations to supplement the final prompt, providing a more comprehensive context for generating responses.
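The keyword-based augmentations might work along these lines; the keyword table here is hypothetical, standing in for SIGNAL IDUNA’s domain-specific background texts:

```python
# Hypothetical keyword -> background-text table. The real augmentations
# come from SIGNAL IDUNA's insurance domain knowledge, not this example.
AUGMENTATIONS = {
    "wahlleistung": "Selective benefits are optional hospital services such as "
                    "private rooms or treatment by the chief physician.",
    "heilmittel": "Remedies include physiotherapy, speech therapy, and "
                  "occupational therapy prescribed by a physician.",
}

def keyword_augmentations(query: str) -> list[str]:
    """Collect extra context snippets triggered by keywords in the query."""
    q = query.lower()
    return [text for kw, text in AUGMENTATIONS.items() if kw in q]
```

The matched snippets are appended to the final prompt alongside the retrieved chunks.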
4. Generation
The answer generation process involves three key components: the user’s question with multiple queries, retrieved chunks of relevant information, and augmentations that add further context. These elements are combined to create the final response using a complex prompt template.
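A sketch of how these three elements could be combined into one prompt; the template wording is illustrative, not SIGNAL IDUNA’s actual prompt:

```python
PROMPT_TEMPLATE = """You are an assistant for insurance service agents.

Question (original): {original}
Question (rewritten): {rewritten}
Question (imitation): {imitation}

Retrieved context:
{chunks}

Additional background:
{augmentations}

Answer the original question using only the context above."""

def build_prompt(queries: dict, chunks: list[str], augmentations: list[str]) -> str:
    """Combine query variants, retrieved chunks, and augmentations."""
    return PROMPT_TEMPLATE.format(
        original=queries["original"],
        rewritten=queries["rewritten"],
        imitation=queries["imitation"],
        chunks="\n---\n".join(chunks) or "(none)",
        augmentations="\n".join(augmentations) or "(none)",
    )
```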
Delivering a near real-time experience is crucial for service agents, so SIGNAL IDUNA also streams the generated response. During development, minimizing latency across varying inputs posed a significant technical hurdle. To address this, SIGNAL IDUNA reduced processing times with asynchronous APIs that stream data and handle multiple requests concurrently. The system currently achieves an average response time of approximately 6 seconds, and SIGNAL IDUNA is experimenting with newer, faster models to reduce this time even further.
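The streaming approach can be sketched with Python’s `asyncio`, using a stand-in async generator in place of the real model stream:

```python
import asyncio

async def fake_model_stream(prompt: str):
    """Stand-in for a streaming model call; the real system streams from the LLM."""
    for token in ["The", " tariff", " covers", " it."]:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield token

async def stream_answer(prompt: str) -> str:
    """Forward tokens to the client as they arrive instead of waiting for the full answer."""
    answer = []
    async for token in fake_model_stream(prompt):
        answer.append(token)  # in the real app: push each token to the agent's UI here
    return "".join(answer)
```

Because each request is a coroutine, the same event loop can stream answers to many agents concurrently, which is what keeps perceived latency low.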
5. Evaluation
Rigorous evaluation is essential for optimizing Retrieval Augmented Generation (RAG) systems. SIGNAL IDUNA uses the Gen AI evaluation service in Vertex AI to automate the assessment of both response quality and the performance of all process components, such as retrieval. A comprehensive question set, created with input from SIGNAL IDUNA’s service agents, forms the basis of these automated tests.
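One building block of automated retrieval evaluation is recall@k over such a question set; a minimal sketch (the metric choice here is an assumption, not necessarily what the Gen AI evaluation service reports):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs found in the top-k retrieved results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)
```

Averaged over the agent-curated question set, such per-query scores become the trend lines tracked in the dashboards.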
The evaluation results flow seamlessly into Vertex AI Experiments and Google Cloud BigQuery. This enables SIGNAL IDUNA to visualize performance trends and gain actionable insights using dashboards with Looker on Google Cloud.