Answers Generation
This section configures the type of answers presented to users. Two answer generation techniques are supported:
- Extractive Answers: The topmost chunk retrieved in response to the user query is presented directly to the user as the answer. Extractive answers reproduce the content of the chunks exactly, without any change to the text. Provide the following configuration for extractive answers:
- Response Length: This is the expected length of the answer, in tokens.
- Generative Answers: The top chunks retrieved in response to the user query are sent to the configured LLM, which generates a paraphrased answer from the content of the chunks. Integrate an LLM and enable Answer Generation in the Generative AI Tools configuration. Provide the following configurations for generative answers:
- Chunk Order: This configuration sets the order of qualified chunks sent to the LLM in the prompt. The order of data chunks can affect the context and thereby the results of a user query. The decision to use a specific chunk order should align with the goals of the task and the nature of the data being processed.
- Most to Least Relevant: In this case, the chunks are added in descending order of relevance, from the most relevant to the least relevant, followed by the query. For instance, if the top five chunks are to be sent to the LLM, the most relevant chunk is added first, and the least relevant chunk is added at the end.
- Least to Most Relevant: In this case, the chunks are added in ascending order of relevance. The least relevant chunk is added first, and the most relevant chunk is at the end, followed by the query.
- Max tokens for chunks: This parameter specifies the total number of tokens that can be included in the chunks sent to the LLM for processing. This allows users to fully utilize the LLM’s context-handling capabilities. It has a default value of 20,000 and can take a maximum value of 1,000,000.
The context size of an LLM refers to the maximum number of tokens the model can process in a single interaction. This includes:
- Tokens for the prompt, instructions, and context information sent to the LLM.
- Tokens for the output, i.e., the response generated by the LLM.
To determine the maximum value of this parameter, subtract the tokens used for the prompt and the output from the LLM's maximum context size. For instance, with a 4,096-token context, if the prompt uses 500 tokens and the response uses 500 tokens, the remaining 3,096 tokens can be used for sending the chunks; this is the maximum value the parameter can take. If each chunk is 500 tokens, the top six chunks are sent to the LLM as context. If, however, only a limited number of chunks are to be sent, say the top three, set this field to 1,500. A sketch after this list illustrates how the chunk order and this token budget shape the prompt.
- Select Generative Model: Select the model for generating answers. If multiple models are configured, all configured models will be listed.
- Answer Prompt: Select the prompt to be sent to the model to generate answers.
- Temperature: This parameter controls the randomness of the output. It affects how deterministic or creative the responses are, and tweaking the temperature can significantly change the generated answers. The lower the temperature, the lower the randomness.
- Response Length: This is the expected length of the answer, in tokens.
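To make the interplay between the Chunk Order and Max tokens for chunks settings concrete, here is a minimal Python sketch of how a prompt could be assembled from ranked chunks. The function names, the whitespace-based token count, and the prompt layout are illustrative assumptions for this example, not the platform's actual implementation.

```python
# Illustrative sketch only: build_prompt, count_tokens, and the prompt layout
# are assumptions for this example, not the platform's actual API.

def count_tokens(text: str) -> int:
    """Rough token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def build_prompt(query: str, ranked_chunks: list[str],
                 chunk_order: str = "most_to_least",
                 max_chunk_tokens: int = 20_000) -> str:
    """Keep as many top-ranked chunks as fit within the token budget,
    arrange them according to the configured chunk order, and append the query."""
    selected, used = [], 0
    for chunk in ranked_chunks:              # assumed sorted most-to-least relevant
        tokens = count_tokens(chunk)
        if used + tokens > max_chunk_tokens:
            break                            # budget exhausted; drop remaining chunks
        selected.append(chunk)
        used += tokens

    if chunk_order == "least_to_most":
        selected.reverse()                   # least relevant first, most relevant last

    # In both orders, the user query follows the chunks.
    return "\n\n".join(selected) + f"\n\nQuestion: {query}"
```

With 500-token chunks, setting max_chunk_tokens to 1,500 keeps only the top three chunks, while a budget of 3,096 admits the top six, matching the worked example above.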