
XO GPT: Conversation Summarization Model


Introduction

Large Language Models (LLMs) have revolutionized conversation summarization by leveraging deep learning to extract key details from dialogues, generating concise, context-aware summaries. These models integrate techniques like abstractive summarization, context analysis, and sentiment detection to transform lengthy interactions into actionable insights, proving invaluable in domains such as customer service, healthcare, and IT support. However, challenges like ambiguous language, privacy concerns, and balancing brevity with completeness remain. Kore.ai's XO GPT addresses these by fine-tuning LLMs to produce accurate, secure, and domain-specific summaries that align with conversational flow and industry needs.

Challenges with Commercial Models

  • Latency: The time commercial LLMs take to process and return a response can be significant, especially under high request volumes or in real-time applications. This directly impacts the user experience.
  • Cost: Commercial models typically charge per request, and costs rise quickly at scale, making cost management difficult for large-scale deployments.
  • Data Governance: Sending user queries to external models raises data privacy and security concerns. This is crucial in industries that involve sensitive or proprietary information.
  • Lack of Customization: Commercial models are not tailored to specific use cases or industries, leading to less accurate or relevant responses.
  • Limited Control: There is minimal control over the internal workings of commercial models, making it difficult to correct or refine their behavior when they generate incorrect or undesirable outputs.
  • Compliance and Regulatory Constraints: Certain industries have stringent compliance and regulatory requirements that may not be fully supported by commercial LLM providers, complicating their use in those sectors.

Key Assumptions

The following are a few key assumptions made for the XO GPT Summarization Model:

  • XO GPT Summarization Model is designed for summarizing text-based conversations only.
  • XO GPT Summarization Model assumes structured conversational data with clear speaker delineation, as illustrated below.
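
The example below sketches what such structured input might look like. The field names and payload shape are illustrative assumptions, not the actual XO GPT request schema.

```python
# Hypothetical structured-conversation format with clear speaker delineation.
# The actual XO GPT payload schema may differ; this only illustrates the
# assumption stated above.
conversation = [
    {"speaker": "Bot", "text": "Hello! How can I help you today?"},
    {"speaker": "Customer", "text": "I need to check the status of my order."},
    {"speaker": "Agent", "text": "Hi, this is John from XYZ Support."},
]

# Flatten to a plain transcript, one "Speaker: utterance" turn per line.
transcript = "\n".join(f"{turn['speaker']}: {turn['text']}" for turn in conversation)
print(transcript)
```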

Benefits of the XO GPT Summarization Model

The XO GPT summarization model offers several potential advantages for businesses seeking to enhance their summarization capabilities:

  • Consistent and Accurate

    XO GPT leverages advanced language generation techniques to deliver precise and contextually relevant summaries of conversation transcripts, ensuring consistent and reliable performance. Detailed benchmarking results, including latency and accuracy metrics compared to other models, are provided in the Model Benchmarks section below.

  • Cost-Effective Performance

    For customers in the Enterprise Tier, XO GPT eliminates commercial model usage costs entirely. The following illustration uses GPT-4 family pricing (note: actual costs vary with token usage). Assuming an average of 250 input tokens per conversation and 30,000 conversation summaries per day, each averaging 120 output tokens, the annual cost comparison across models is as follows:

| Model Name  | Input Cost / MTok | Output Cost / MTok | Total Cost / Annum |
|-------------|-------------------|--------------------|--------------------|
| GPT-4 Turbo | $30               | $60                | $160,965           |
| GPT-4       | $10               | $30                | $66,795            |
| GPT-4o Mini | $0.15             | $0.60              | $1,199             |
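
A minimal sketch of how these annual figures can be reproduced from per-MTok pricing (the function and its name are illustrative):

```python
def annual_llm_cost(convs_per_day: int, in_tokens: int, out_tokens: int,
                    in_price_per_mtok: float, out_price_per_mtok: float,
                    days: int = 365) -> float:
    """Estimate yearly LLM spend from per-million-token (MTok) pricing."""
    daily_input_cost = convs_per_day * in_tokens / 1e6 * in_price_per_mtok
    daily_output_cost = convs_per_day * out_tokens / 1e6 * out_price_per_mtok
    return days * (daily_input_cost + daily_output_cost)

# GPT-4 Turbo rates from the table above: $30 in / $60 out per MTok.
print(f"${annual_llm_cost(30_000, 250, 120, 30.0, 60.0):,.2f}")  # $160,965.00
```
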
  • Enhanced Data Security and Safety

    Our model is designed to safeguard information by ensuring that no client or user data is utilized for model retraining. Our systems are robust enough to handle both client and user data securely.

    Guardrails: XO GPT uses several key safety measures to ensure responsible and secure interactions:

    • Content Moderation: Detects and blocks harmful or inappropriate content.
    • Behavioral Guidelines: Maintains professionalism and appropriateness in responses.
    • Response Oversight: Monitors and addresses flagged or potentially problematic interactions.
    • Input Validation: Ensures inputs are appropriate and comply with usage guidelines.
    • Usage Controls: Applies limits to prevent misuse and support responsible operation.

    AI Safety Measures: XO GPT incorporates essential safety protocols to prevent harmful behaviors and maintain ethical standards:

    • Ethical Guidelines: Strict protocols ensure AI decisions align with ethical standards.
    • Bias Monitoring: Regular checks to prevent bias and ensure fairness in responses.
    • Transparency: Clear, understandable responses to promote trust and accountability.
    • Continuous Improvement: Ongoing updates to enhance safety and incorporate feedback.

Note

The exact performance, features, and language support may vary based on specific implementations and use cases. We recommend thorough testing in your specific environment to assess the model's suitability for your needs.


Use Cases

The following table presents use cases for applying the summarization model to agent-customer conversations across ten domains. The model distills lengthy dialogues into concise overviews, allowing businesses to quickly capture the key points of customer interactions. This supports faster response times, better customer service, and more efficient issue resolution. By applying this model, organizations can improve operational efficiency, ensure information accuracy, and deliver a better customer experience.

| Domain | Use Cases |
|---|---|
| Healthcare | Summarize patient inquiries about symptoms, medications, and follow-up instructions from healthcare providers. |
| Banking | Summarize conversations regarding account issues, transaction disputes, or loan applications. |
| E-commerce | Summarize customer inquiries about product availability, order status, returns, and refunds. |
| Insurance | Summarize policyholder interactions regarding claims processing, policy updates, and coverage questions. |
| IT Support | Summarize troubleshooting steps, error reports, and resolutions provided to customers for technical issues. |
| Telecommunications | Summarize customer complaints and service requests related to network issues, billing errors, and plan changes. |
| Travel and Hospitality | Summarize customer queries about booking modifications, cancellations, and special requests. |
| Retail | Summarize customer service interactions about store policies, promotions, and product exchanges. |
| Education | Summarize student or parent inquiries regarding course enrollments, schedules, and academic records. |
| Utilities | Summarize customer communications about service outages, bill inquiries, and usage reports. |

Sample Outputs

The following section presents examples of summaries generated by the XO GPT summarization model. Although the conversation below contains PII, the model keeps that information out of the generated summary, as shown in the sample output.

Agent Conversation

Bot: Hello! How can I help you today?

Customer: I need to check the status of my order.

Bot: Sure! Please provide your order reference number.

Customer: It’s 12345-67890.

Bot: Thanks! For verification, please provide the last four digits of your Social Security Number.

Customer: 9876.

Bot: Got it. To proceed, could you please confirm your password?

Customer: It’s Mark123.

Bot: Thank you. Your order is being processed and will ship in 48 hours. Would you like any further assistance?

Customer: Yes, I want to speak with an agent.

Bot: Alright, I’m transferring you to an agent now. Please hold.

Agent: Hi, this is John from XYZ Support. How can I assist you today?

Customer: Hi, John. I just wanted to confirm the shipping address on my order.

Agent: Of course! Could you please provide your full name and the last four digits of your Social Security Number?

Customer: Jane Doe, 9876.

Agent: Thank you, Jane. The shipping address we have on file is 123 Elm Street, Springfield, IL.

Customer: That’s correct. Thanks!

Agent: You’re welcome! Is there anything else I can do for you?

Customer: No, that’s all. Thanks!

Agent: Have a great day! Goodbye!

Customer: Goodbye!

Generated Responses

The customer contacted support to check the status of their order. The bot verified the customer's identity using their order reference number, partial social security number, and password. After confirming the details, the bot informed the customer that their order was being processed and would ship within 48 hours. The customer then requested to speak with an agent to clarify the shipping address on their order. The agent assisted the customer by verifying their identity using their full name and the last four digits of their social security number. The agent confirmed that the shipping address on file was correct. The customer thanked the agent, and the conversation ended.
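
For illustration, the sketch below shows one way PII could be redacted from a transcript with simple pattern matching. This is a hypothetical pre-processing example, not how XO GPT handles PII internally (which is a behavior of the fine-tuned model itself):

```python
import re

# Hypothetical redaction patterns for illustration only.
PII_PATTERNS = [
    (re.compile(r"\b\d{5}-\d{5}\b"), "[ORDER_REF]"),                  # order reference numbers
    (re.compile(r"(?i)(password\s*(?:is)?\s*)\S+"), r"\1[REDACTED]"),  # password disclosures
    (re.compile(r"\b\d{4}\b"), "[LAST4]"),                            # bare 4-digit fragments
]

def redact(text: str) -> str:
    """Apply each pattern in order, replacing matches with placeholders."""
    for pattern, replacement in PII_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("It's 12345-67890."))        # It's [ORDER_REF].
print(redact("My password is Mark123."))  # My password is [REDACTED].
```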


XO GPT - Model Building Process

The model-building process consists of several key stages that form the backbone of AI system development. To learn more, see Model Building Process.

Model Benchmarks

This section highlights the features, updates, and changes that vary between versions of the Conversation Summarization Model. It provides version-specific information to help identify what is unique to each version.

The following table summarizes the versions covered in this document:

| Model Version | Accuracy | Tokens / sec (TPS) | Latency (secs) | Benchmark Comparison | Test Data & Results |
|---|---|---|---|---|---|
| Version 1.0 | 98% | 40 | 3.04 | Benchmark summary | Test data and results spreadsheet |
| Version 2.0 | 100% | 71 | 2 | Benchmark summary | Test data and results spreadsheet |

Version 2.0

Model Choice

We evaluate various community models suitable for the summarization objective and fine-tune them with our proprietary data. One or more candidate models are used throughout the training and evaluation phase, and the model that performs best in terms of accuracy, safety, latency, and similar criteria is deployed. We continue to evaluate models as part of ongoing improvements and may choose a different base model in newer versions. Currently, we use Mistral 7B Instruct v0.2 as one of the base models for fine-tuning and deployment.

| Base Model | Developer | Language | Release Date | Status | Knowledge Cutoff |
|---|---|---|---|---|---|
| Mistral 7B Instruct v0.2 | Mistral AI | Multi-lingual | September, 2024 | Static | September, 2024 |
Fine-tuning Parameters

| Parameters | Description | Value |
|---|---|---|
| Fine-tuning Type | How the fine-tuning is done. | peft-qlora |
| Quantization | The number of bits used to load the parameters; reduces memory usage. | 4-bit |
| Rank | Determines the number of trainable parameters. | 32 |
| lora_dropout | Prevents co-adaptation, where the network becomes too reliant on particular connections. | 0.05 |
| lora_alpha | Scaling factor for the LoRA update. | - |
| Learning Rate | Controls how quickly or slowly the model reaches the minimum of loss. | 2e-4 (0.0002) |
| Batch Size | Number of examples the model learns from at once. | 2 |
| Epochs | Number of times the model sees the entire training data. | 3 |
| Warm-up Steps | Gradual start for the learning rate to help the model stabilize early on. | - |
| Weight Decay | Helps prevent overfitting by reducing the importance of large weights. | - |
| Dropout Rate | Randomly ignores some parts of the model during training to prevent overfitting. | - |
| Max Sequence Length | Maximum length of input data the model can handle. | - |
| Gradient Clipping | Limits the maximum change in weights to prevent instability. | - |
| Learning Rate Decay | Slowly reduces the learning rate over time to fine-tune the model. | - |
| Early Stopping | Stops training if the model stops improving, to prevent overfitting. | - |
| Optimizer | Algorithm that adjusts the model's learning. | paged_adamw_8bit |
| Layer-wise LR Decay | Uses different learning rates for different parts of the model to improve stability. | - |
| Learning Rate Scheduler | Adjusts the learning rate during training to improve performance. | - |
| Task Type | Type of task for which LoRA is applied; here, Causal Language Modeling. | CAUSAL_LM |
| Targeted Model Modules | Specific layers in the model where LoRA is applied. | ["up_proj", "o_proj", "down_proj", "gate_proj", "q_proj", "k_proj", "v_proj"] |

General Parameters

As a prerequisite, an Agent AI License is required, and the model is hosted on infrastructure with 2 A10 GPUs. Other general fine-tuning parameters include the following:

| Parameters | Description | Value |
|---|---|---|
| Learning Rate | Controls how quickly or slowly the model reaches the minimum of loss. | 2e-4 (0.0002) |
| Batch Size | Number of examples the model learns from at once. | 2 |
| Epochs | Number of times the model sees the entire training data. | 3 |
| Warm-up Steps | Gradual start for the learning rate to help the model stabilize early on. | - |
| Max Sequence Length | Maximum length of input data the model can handle. | 32768 |
| Early Stopping | Stops training if the model stops improving, to prevent overfitting. | - |
| Optimizer | Algorithm that adjusts the model's learning. | paged_adamw_8bit |
| Layer-wise LR Decay | Uses different learning rates for different parts of the model to improve stability. | - |
| Learning Rate Scheduler | Adjusts the learning rate during training to improve performance. | - |
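
These settings map naturally onto a transformers TrainingArguments object; a minimal sketch using the table values (the output path is hypothetical, and options the table leaves unspecified are omitted):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="xo-gpt-summarization",  # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    num_train_epochs=3,
    optim="paged_adamw_8bit",
)
# Note: the max sequence length (32768) is enforced at tokenization time,
# not via TrainingArguments.
```
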
AWQ Model Quantization

To optimize the performance and efficiency of the model, we apply AWQ (Activation-aware Weight Quantization), a technique designed to significantly reduce memory and computational requirements while maintaining accuracy. Below is a detailed breakdown of the quantization process and the AWQ parameters used for the Llama 3.1 8B Instruct model.

| Parameters | Description | Value |
|---|---|---|
| Zero Point | Whether to include a zero point in the quantization scheme for better weight representation. | True |
| Quantization Group Size | Size of the weight groups for quantization, balancing compression against computational overhead. | 128 |
| Weight Precision | Number of bits used to represent weights in the quantization process. | 4 |
| Quantization Version | The AWQ variant used, optimized for GEMM (General Matrix Multiplication) operations. | "GEMM" |
| Computation Data Type | Data type used for computation during inference with quantized weights. | torch.float16 |
| Model Loading | Configuration to load the model with reduced CPU memory usage for efficient deployment. | {"low_cpu_mem_usage": True} |
| Tokenizer Loading | Trust configuration for loading the tokenizer with remote code compatibility. | trust_remote_code=True |
| Quantization Benefits | AWQ enables a lower memory footprint and faster inference without significant loss of accuracy. | - |
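
A minimal sketch of how these settings could be applied with the open-source AutoAWQ library (the internal pipeline may differ; the checkpoint paths are placeholders):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/finetuned-model"   # placeholder checkpoint path
quant_path = "path/to/quantized-model"

# Quantization settings from the table above.
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)  # runs AWQ calibration
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```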

Benchmarks Summary

To compare and contrast the performance of the fine-tuned model, we have considered the following models:

  • Llama-8B: An open-source large language model with 8 billion parameters, known for its strong performance despite its relatively small size.
  • GPT-4: A large language model developed by OpenAI, known for its advanced capabilities across a wide range of tasks.
  • Claude 3 Sonnet: Part of the Claude 3 model family by Anthropic, designed to balance intelligence and speed for various tasks.


XO GPT has demonstrated outstanding performance in recent evaluations, achieving an impressive overall score that positions it alongside leading models like Llama and ahead of competitors such as Sonnet and GPT-4. This score reflects strong capabilities in accuracy, fluency, and robustness, particularly in multilingual contexts. XO GPT consistently delivers strong results across a variety of tasks, including bias detection, bribery identification, and sentiment analysis. It performs especially well in English, French, German, Japanese, Turkish, and Spanish, with notable strengths in areas such as customer requests and negation detection. For detailed test results, please refer to Test Data and Results V2.0.

Version 1.0

Model Choice

We evaluate various community models suitable for the summarization objective and fine-tune them with our proprietary data. One or more candidate models are used throughout the training and evaluation phase, and the model that performs best in terms of accuracy, safety, latency, and similar criteria is deployed. We continue to evaluate models as part of ongoing improvements and may choose a different base model in newer versions. Currently, we use Mistral 7B Instruct v0.2 as one of the base models for fine-tuning and deployment.

| Base Model | Developer | Language | Release Date | Status | Knowledge Cutoff |
|---|---|---|---|---|---|
| Mistral 7B Instruct v0.2 | Mistral AI | Multi-lingual | March, 2024 | Static | September, 2024 |
Fine-tuning Parameters

| Parameters | Description | Value |
|---|---|---|
| Fine-tuning Type | How the fine-tuning is done. | peft-qlora |
| Quantization | The number of bits used to load the parameters; reduces memory usage. | 4-bit |
| Rank | Determines the number of trainable parameters. | 32 |
| lora_dropout | Prevents co-adaptation, where the network becomes too reliant on particular connections. | 0.05 |
| Learning Rate | Controls how quickly or slowly the model reaches the minimum of loss. | 1e-3 (0.001) |
| Batch Size | Number of examples the model learns from at once. | 2 |
| Epochs | Number of times the model sees the entire training data. | 4 |
| Warm-up Steps | Gradual start for the learning rate to help the model stabilize early on. | - |
| Weight Decay | Helps prevent overfitting by reducing the importance of large weights. | - |
| Dropout Rate | Randomly ignores some parts of the model during training to prevent overfitting. | - |
| Max Sequence Length | Maximum length of input data the model can handle. | - |
| Gradient Clipping | Limits the maximum change in weights to prevent instability. | - |
| Learning Rate Decay | Slowly reduces the learning rate over time to fine-tune the model. | - |
| Early Stopping | Stops training if the model stops improving, to prevent overfitting. | - |
| Optimizer | Algorithm that adjusts the model's learning. | paged_adamw_8bit |
| Layer-wise LR Decay | Uses different learning rates for different parts of the model to improve stability. | - |
| Learning Rate Scheduler | Adjusts the learning rate during training to improve performance. | - |
General Parameters

As a prerequisite, an Agent AI License is required, and the model is hosted on infrastructure with 2 A10 GPUs. Other general fine-tuning parameters include the following:

| Parameters | Description | Value |
|---|---|---|
| Learning Rate | Controls how quickly or slowly the model reaches the minimum of loss. | 1e-3 (0.001) |
| Batch Size | Number of examples the model learns from at once. | 2 |
| Epochs | Number of times the model sees the entire training data. | 4 |
| Warm-up Steps | Gradual start for the learning rate to help the model stabilize early on. | - |
| Max Sequence Length | Maximum length of input data the model can handle. | 32768 |
| Early Stopping | Stops training if the model stops improving, to prevent overfitting. | - |
| Optimizer | Algorithm that adjusts the model's learning. | paged_adamw_8bit |
| Layer-wise LR Decay | Uses different learning rates for different parts of the model to improve stability. | - |
| Learning Rate Scheduler | Adjusts the learning rate during training to improve performance. | - |

Benchmarks Summary

To compare and contrast the performance of the fine-tuned model, we have considered the following other models:

  • Llama 3 8B (CTranslate): An open-source large language model with 8 billion parameters, known for its strong performance despite its relatively small size.
  • Sonnet 3.5: Part of the Claude model family by Anthropic, designed to balance intelligence and speed for various tasks.
  • GPT-4o: A large language model developed by OpenAI, known for its advanced capabilities across a wide range of tasks.


XO GPT has demonstrated outstanding performance in recent evaluations, achieving an impressive overall score that positions it alongside leading models like Llama and ahead of competitors such as Sonnet and GPT-4. This score reflects strong capabilities in accuracy, fluency, and robustness, particularly in multilingual contexts. XO GPT consistently delivers strong results across a variety of tasks, including bias detection, bribery identification, and sentiment analysis. It performs especially well in English, French, German, and Spanish, with notable strengths in areas such as customer requests and negation detection. For detailed test results, please refer to Test Data and Results V1.0.