Batch Testing - DialogGPT

Overview

Batch Testing is a comprehensive testing framework designed to evaluate and validate the intent detection accuracy of a virtual assistant. It enables users to systematically test their virtual assistant’s ability to understand user requests across multiple conversation types, including dialogs, FAQs, Knowledge (Search AI), and conversation intents. It also supports different model configurations and provides comprehensive performance metrics for both development and production environments.

Unlike traditional testing approaches, Batch Testing replicates the complete DialogGPT runtime pipeline, providing authentic performance insights that mirror real user interactions.

Key features

  • End-to-End Pipeline Testing: Processes each utterance through the full retrieval and LLM workflow, mirroring real-world behavior to uncover issues static testing might miss.
  • Model Configuration Flexibility: Supports testing across different combinations of embedding models and LLMs to identify the most effective configuration for your app.
  • Granular Performance Insights: Measures accuracy, precision, recall, and F1 score across all conversation types, including Dialogs, FAQs, Knowledge, and Conversation Intents.
  • Lifecycle Support: Enables batch testing for both in-development and published apps, supporting validation at any stage of the deployment lifecycle.

Supported Conversation Types

  • Single Intent
  • Multi Intent
  • Small Talk
  • Conversation Intent
  • No Intent
  • Ambiguous Intent
  • Answer Generation
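
For reference, these conversation types can be modeled as a simple enumeration. The identifiers below are illustrative only, not names from the DialogGPT API:

    from enum import Enum

    class ConversationType(Enum):
        """Conversation types evaluated by Batch Testing (illustrative names)."""
        SINGLE_INTENT = "Single Intent"
        MULTI_INTENT = "Multi Intent"
        SMALL_TALK = "Small Talk"
        CONVERSATION_INTENT = "Conversation Intent"
        NO_INTENT = "No Intent"
        AMBIGUOUS_INTENT = "Ambiguous Intent"
        ANSWER_GENERATION = "Answer Generation"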

How Batch Testing Works

Batch Testing replicates the actual runtime behavior by chaining retrieval and LLM calls, ensuring each test case goes through the complete conversation pipeline:

  1. Query Rephrasing (if enabled)
  2. Chunk Qualification from Dialogs, FAQs, and Search Index
  3. Semantic Similarity Matching based on configured thresholds
  4. LLM Processing for intent identification and fulfillment type determination

This approach provides dynamic testing that mirrors real user interactions, enabling accurate performance evaluation across different model configurations.
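
As a rough illustration of these four stages, here is a minimal Python sketch. Every name, threshold, and stub value in it is a hypothetical placeholder, not the DialogGPT API:

    # Minimal sketch of the four pipeline stages described above.
    # All names, thresholds, and stub data are illustrative placeholders.

    SIMILARITY_THRESHOLD = 0.75  # assumed stand-in for the configured threshold

    def rephrase_query(utterance: str, history: list[str]) -> str:
        """Stage 1 (stub): rewrite the utterance using conversation history."""
        return utterance

    def retrieve_chunks(query: str) -> list[dict]:
        """Stage 2 (stub): gather candidate chunks from dialogs, FAQs, and the search index."""
        return [{"source": "Dialog", "intent": "Book Flight", "score": 0.82}]

    def llm_identify_intent(query: str, chunks: list[dict]) -> dict:
        """Stage 4 (stub): have the LLM pick the intent and fulfillment type."""
        if not chunks:
            return {"intent": "No Intent", "fulfillment_type": None}
        top = max(chunks, key=lambda c: c["score"])
        return {"intent": top["intent"], "fulfillment_type": top["source"]}

    def run_test_case(utterance: str, history: list[str], rephrase: bool = True) -> dict:
        query = rephrase_query(utterance, history) if rephrase else utterance       # 1. query rephrasing
        candidates = retrieve_chunks(query)                                         # 2. chunk qualification
        qualified = [c for c in candidates if c["score"] >= SIMILARITY_THRESHOLD]   # 3. similarity matching
        return llm_identify_intent(query, qualified)                                # 4. LLM processing

    print(run_test_case("I want to book a flight", history=[]))

In the real pipeline each stub is replaced by actual retrieval and LLM calls; the sketch only fixes the order of the stages.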

To access Batch Testing, navigate to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing. The batch testing process involves the following three steps.

Step 1. Test Suite Creation

To conduct a batch test, you must create your own test suites. You can create test suites in two ways: by uploading a CSV or JSON file, or by adding test cases manually. Each test suite comprises multiple test cases, which include key fields such as user utterance, expected intent, and fulfillment type.

Uploading a File

This method enables you to add multiple test cases simultaneously. You can download the sample CSV or JSON files while creating the test suite.
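
For a sense of what such a file might contain, the following Python sketch writes an illustrative CSV. The column names are assumptions inferred from the fields listed above; the downloadable sample file defines the exact format:

    import csv

    # Illustrative test cases; column names are assumed from the fields
    # described above (user utterance, expected intent, fulfillment type).
    rows = [
        {"user_utterance": "I want to book a flight",
         "expected_intent": "Book Flight", "fulfillment_type": "Dialog"},
        {"user_utterance": "What is your refund policy?",
         "expected_intent": "Refund Policy", "fulfillment_type": "FAQ"},
        {"user_utterance": "Summarize the onboarding guide",
         "expected_intent": "Answer Generation", "fulfillment_type": "Answer Generation"},
    ]

    with open("test_suite.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)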

Follow these steps:

  1. Navigate to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing.
  2. Click +New test suite.
  3. Enter the test Name and Description.
  4. Click Upload File. Select the file to upload and click Add to Suite.
  5. Click Create Suite. The created test suite is displayed.

Quick Entry

Add one test case at a time using a form. The form includes mandatory fields like User Utterance, Fulfillment Type, and Expected Intent. You can review and edit the test case before adding it to the test suite. If the fulfillment type is answer generation, the expected intent is automatically selected as answer generation.

Follow these steps:

  1. Navigate to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing.
  2. Click +New test suite.
  3. Enter the test Name and Description.
  4. Click Quick Entry.
  5. Enter the User Utterance, select the Fulfillment Category and Expected Intent.
  6. Click Save and add another to add the next test case, or click Add to Suite.
  7. Click Create Suite. The created test suite is displayed.

Step 2. Run Test Suite

After you create a test suite, the execution stage runs it through the complete retrieval and LLM pipeline, simulating live interactions using a set of model configurations to obtain the test results. You can run the execution for both in-development and published versions. Additionally, you can add notes or reasons to record the purpose of the test run.

Follow these steps:

  1. Navigate to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing.
  2. Click Run Test Suite for the required suite.
  3. Select the App Version, Orchestration Model, Prompt, and add Notes if required.

    Note

    The embedding model cannot be changed. For testing purposes, the DialogGPT embedding model is used.

  4. Click Run Test to start batch test execution.
  5. Once a batch test is completed, the results are displayed.

Step 3. Results and Analysis

The Results and Analysis stage evaluates performance using standardized intent detection metrics. It presents the results of all batch tests conducted so far and enables users to compare different combinations of embedding and language models, making data-driven decisions using key metrics such as Accuracy, Precision, Recall, F1 Score, and more.
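
The metrics themselves are standard. As a quick reference, here is a minimal Python sketch that computes accuracy and per-intent precision, recall, and F1 from expected versus predicted intents; the example data is invented for illustration:

    from collections import Counter

    def intent_metrics(expected: list[str], predicted: list[str]) -> dict:
        """Compute overall accuracy plus per-intent precision, recall, and F1."""
        accuracy = sum(e == p for e, p in zip(expected, predicted)) / len(expected)
        tp, fp, fn = Counter(), Counter(), Counter()
        for e, p in zip(expected, predicted):
            if e == p:
                tp[e] += 1
            else:
                fp[p] += 1  # predicted p, but p was not the expected intent
                fn[e] += 1  # expected e, but e was not predicted
        per_intent = {}
        for intent in set(expected) | set(predicted):
            prec = tp[intent] / (tp[intent] + fp[intent]) if tp[intent] + fp[intent] else 0.0
            rec = tp[intent] / (tp[intent] + fn[intent]) if tp[intent] + fn[intent] else 0.0
            f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
            per_intent[intent] = {"precision": prec, "recall": rec, "f1": f1}
        return {"accuracy": accuracy, "per_intent": per_intent}

    # Invented example results from a batch run:
    expected = ["Book Flight", "Refund Policy", "Book Flight", "Small Talk"]
    predicted = ["Book Flight", "Refund Policy", "Refund Policy", "Small Talk"]
    print(intent_metrics(expected, predicted))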

Follow these steps:

  1. Navigate to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing and click the Test Suite to view the tests.
  2. Click the summary icon to view the result. You can also download the report as a CSV file or delete the results.
  3. The test result is displayed.
  4. Click Configure View to add or remove the displayed metrics. Select the metric and click Apply.
  5. Click any Intent to view Intent details.
  6. Click any Test Case to view test case details.
  7. Click Conversation Orchestration to view the request and response payloads.