This application consults documentation before generating a response, a pattern known as retrieval-augmented generation (RAG). In other words, the application first searches for the information needed to answer the question, and only then is the question, together with that information, posed to the LLM.
Requirements: a deployed Azure LLM, a deployed embedding model, and an Azure Search Service.
Creating AI Objects
First, create your LLM object and prompt template as shown in the previous blog post, making sure the context can be added to your prompt in the same way as the question.
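As a rough sketch, assuming LangChain's Azure OpenAI wrapper (the endpoint, key, API version, and deployment name below are placeholders), this could look like:

```python
# Sketch only: assumes the langchain-openai package and an Azure OpenAI chat
# deployment; the endpoint, key, API version, and deployment name are placeholders.
from langchain_openai import AzureChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = AzureChatOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com/",
    api_key="<your-openai-key>",
    api_version="2024-02-01",
    azure_deployment="<your-gpt-deployment>",
)

# The template accepts the retrieved context in the same way as the question.
prompt = ChatPromptTemplate.from_template(
    """Answer the question using only the context below.

Context:
{context}

Question:
{question}"""
)
```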
Following this, create a cognitive search function. This starts with creating an embeddings object, similar to the previously mentioned LLM object. The embedding model turns the user's question into a vector, which is compared against the chunked information stored in your index so that the relevant documentation can be found and returned.
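A possible way to create that embeddings object, again assuming the LangChain wrapper (the embedding deployment name is a placeholder):

```python
# Sketch only: the embedding deployment name and credentials are placeholders.
from langchain_openai import AzureOpenAIEmbeddings

embeddings = AzureOpenAIEmbeddings(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com/",
    api_key="<your-openai-key>",
    api_version="2024-02-01",
    azure_deployment="<your-embedding-deployment>",
)
```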
Vector Store
Next, establish a vector store. This provides the connection to the index where your chunked data is stored. The process of chunking data and uploading it to your blob storage and index will be discussed in a following blog post; in short, chunking means that a document is split into blocks of text, which are then added to an index that the cognitive search runs against. This step requires an Azure Search Service: when creating your vector store, provide the endpoint and key of your search service, along with the index name and your embedding model. These elements link your vector store to your embeddings AI, your search service, and your data index.

You can then query the vector store by performing a similarity search. Pass in the question (the query parameter), the number of results to return (the k parameter), and the type of search to perform (the query_type parameter).
Here’s an example of how you can create a function to perform a cognitive search:
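The sketch below assumes the LangChain AzureSearch vector store (which requires the azure-search-documents package); the search endpoint, key, and index name are placeholders, and in this wrapper the type of search is passed as the search_type argument.

```python
# Sketch only: the endpoint, key, and index name are placeholders, and
# `embeddings` is the embeddings object created above.
from langchain_community.vectorstores.azuresearch import AzureSearch

vector_store = AzureSearch(
    azure_search_endpoint="https://<your-search-service>.search.windows.net",
    azure_search_key="<your-search-admin-key>",
    index_name="<your-index-name>",
    embedding_function=embeddings.embed_query,
)

def cognitive_search(question: str, k: int = 5):
    """Return the k document chunks most similar to the question."""
    return vector_store.similarity_search(
        query=question,          # the user's question
        k=k,                     # number of chunks to return
        search_type="similarity",  # the type of search to perform
    )
```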
Cognitive Search and Invoking the Chain
Now you can use this function to perform a cognitive search: call it, create your chain, and join the retrieved chunks into a single string using each chunk's page_content attribute.
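Continuing the sketch above (cognitive_search, prompt, and llm are the objects defined earlier; the example question is a placeholder):

```python
# Sketch only: the question is a placeholder.
question = "How do I deploy the gateway module?"

# Retrieve the most relevant chunks and join them into one context string.
docs = cognitive_search(question)
context = "\n\n".join(doc.page_content for doc in docs)

# Compose the chain: the prompt template piped into the LLM.
chain = prompt | llm
```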
Finally, invoke the chain and input the question and context.
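The invocation could then look like this:

```python
# Invoke the chain with the question and the retrieved context.
response = chain.invoke({"question": question, "context": context})
print(response.content)
```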
The process of posing a question, searching through the documents, and generating a response is relatively quick, even with many documents returned from the cognitive search.
Complete Code:
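Putting the pieces together, an end-to-end sketch could look like the following; every endpoint, key, deployment name, the index name, and the example question are placeholders.

```python
# Sketch only: assumes the langchain-openai and langchain-community packages
# (plus azure-search-documents); all endpoints, keys, deployment names, the
# index name, and the example question are placeholders.
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_core.prompts import ChatPromptTemplate

# 1. LLM and prompt template (context is injected alongside the question).
llm = AzureChatOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com/",
    api_key="<your-openai-key>",
    api_version="2024-02-01",
    azure_deployment="<your-gpt-deployment>",
)
prompt = ChatPromptTemplate.from_template(
    """Answer the question using only the context below.

Context:
{context}

Question:
{question}"""
)

# 2. Embeddings model used to vectorise the question.
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com/",
    api_key="<your-openai-key>",
    api_version="2024-02-01",
    azure_deployment="<your-embedding-deployment>",
)

# 3. Vector store pointing at the index that holds the chunked documents.
vector_store = AzureSearch(
    azure_search_endpoint="https://<your-search-service>.search.windows.net",
    azure_search_key="<your-search-admin-key>",
    index_name="<your-index-name>",
    embedding_function=embeddings.embed_query,
)

def cognitive_search(question: str, k: int = 5):
    """Return the k document chunks most similar to the question."""
    return vector_store.similarity_search(
        query=question, k=k, search_type="similarity"
    )

# 4. Retrieve context, build the chain, and invoke it.
question = "<your question about your own documentation>"
docs = cognitive_search(question)
context = "\n\n".join(doc.page_content for doc in docs)

chain = prompt | llm
response = chain.invoke({"question": question, "context": context})
print(response.content)
```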
Using cognitive search for RAG helps you improve your applications: you can now ask questions about your own documentation. The application retrieves the relevant information using a cognitive search and sends the retrieved documents, together with the query, to the LLM. The resulting answer properly addresses the question, provided the answer can be found in the documents.