AutoQA - GenAI By Question Metric Prompting Guide
Overview
This guide helps you craft effective prompts for LLM-based adherence detection that achieve high precision and recall. Well-structured prompts are specific, measurable, and account for real-world conversation variations.
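Precision and recall can be estimated by comparing the LLM's verdicts with human-reviewed labels on a sample of conversations. The sketch below is illustrative only: it assumes verdicts are plain "pass"/"fail" strings and treats "pass" (adherence) as the positive class. The function name and sample data are hypothetical and not part of AutoQA.

```python
def precision_recall(predicted, actual, positive="pass"):
    """Compare LLM verdicts with human labels for one question.

    `positive` is an assumption: here adherence ("pass") is the class
    being detected. Swap it to "fail" to measure non-adherence detection.
    """
    if len(predicted) != len(actual):
        raise ValueError("verdict lists must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    # Guard against division by zero when a class never appears.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical sample: five conversations scored by the LLM and by a reviewer.
llm_verdicts = ["pass", "pass", "fail", "pass", "fail"]
human_labels = ["pass", "fail", "fail", "pass", "pass"]
precision, recall = precision_recall(llm_verdicts, human_labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision suggests the pass criteria are too loose; low recall suggests they exclude acceptable variations.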
Prompt Architecture Framework
Core Components
Every effective AutoQA prompt should include:
- Context Setting - Define the evaluation scenario
- Pass Criteria - Specific behaviours that constitute success
- Fail Criteria - Clear indicators of non-compliance
Template Structure
CONTEXT: [Conversation type and evaluation scope]
PASS CRITERIA: [Specific behaviours indicating adherence]
- Look for: [required elements]
- Acceptable variations: [alternative approaches]
FAIL CRITERIA: [Behaviours indicating non-adherence]
- Missing: [critical elements]
- Inadequate: [insufficient attempts]
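If prompts are maintained in code, the template can be filled in programmatically so every question follows the same layout. The following is a minimal sketch, assuming a hypothetical `AdherencePrompt` class. Its field names mirror the template sections and are not an AutoQA API.

```python
from dataclasses import dataclass, field


@dataclass
class AdherencePrompt:
    """Hypothetical container mirroring the template's sections."""
    context: str
    look_for: list[str]
    acceptable_variations: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)
    inadequate: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the prompt text in the CONTEXT / PASS / FAIL layout."""
        def bullet(label: str, items: list[str]) -> str:
            return f"- {label}: " + "; ".join(items)

        lines = [
            f"CONTEXT: {self.context}",
            "PASS CRITERIA:",
            bullet("Look for", self.look_for),
            bullet("Acceptable variations", self.acceptable_variations),
            "FAIL CRITERIA:",
            bullet("Missing", self.missing),
            bullet("Inadequate", self.inadequate),
        ]
        return "\n".join(lines)


# Illustrative values only.
prompt = AdherencePrompt(
    context="Inbound support call; evaluate the agent's greeting only.",
    look_for=["welcome phrase", "company identification"],
    acceptable_variations=["elements in any order"],
    missing=["no company identification"],
    inadequate=["greeting cut off mid-sentence"],
)
print(prompt.render())
```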
Prompt Development Process
Step 1: Define Your Success Criteria
Ask yourself:
- What specific actions/words indicate adherence?
- What variations are acceptable?
- What constitutes a clear failure?
- How granular should the evaluation be?
Step 2: Create Measurable Standards
Ensure criteria are:
- Observable in conversation transcripts
- Objective rather than subjective
- Specific enough to avoid interpretation gaps
- Comprehensive enough to cover typical scenarios
Step 3: Test Specificity
Validate your prompt by asking:
- Could two evaluators reach different conclusions?
- Are there ambiguous terms that need clarification?
- Does it clearly differentiate between pass and fail scenarios?
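One way to check whether two evaluators could reach different conclusions is to have two reviewers apply the prompt to the same transcripts and measure how often they agree. The sketch below computes raw agreement and Cohen's kappa, which corrects for agreement expected by chance. The function names and sample labels are illustrative assumptions.

```python
def percent_agreement(a, b):
    """Share of conversations where both evaluators gave the same verdict."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for chance; 1.0 is perfect, 0.0 is chance level."""
    n = len(a)
    observed = percent_agreement(a, b)
    labels = set(a) | set(b)
    # Chance agreement: product of each evaluator's rate for every label.
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        return 1.0  # both evaluators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts from two reviewers on the same six transcripts.
reviewer_a = ["pass", "pass", "fail", "pass", "fail", "fail"]
reviewer_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(f"agreement={percent_agreement(reviewer_a, reviewer_b):.2f}")
print(f"kappa={cohens_kappa(reviewer_a, reviewer_b):.2f}")
```

Low agreement on a sample usually points to an ambiguous term in the prompt rather than to a careless evaluator.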
Example Prompts
Greeting Evaluation
Poor Prompt Examples (Low Precision/Recall)
Example 1: "Check if the agent was polite when greeting the customer."
Example 2: "Did the agent say hello properly?"
Example 3: "Evaluate the quality of the agent's opening statement."
Why these fail:
- "Polite" is subjective and unmeasurable
- "Properly" lacks specific criteria
- "Quality" provides no actionable evaluation framework
Robust Prompt (High Precision/Recall)
CONTEXT: Evaluate whether the agent provided a complete and professional greeting to establish the service interaction.
PASS CRITERIA:
Look for the presence of these four key elements:
- Acknowledgement/welcome phrase (e.g., "Hello", "Good morning", "Thank you for calling")
- Company/department identification (e.g., "ABC Company", "Customer Service", "Technical Support")
- Agent identification (name, employee ID, or role)
- Offer of assistance (e.g., "How can I help you?", "What can I do for you today?")
Acceptable variations:
- Elements may appear in any order within the greeting sequence
- Casual but professional tone is acceptable
- Abbreviated company name if commonly recognized by customers
- Combined elements (e.g., "This is Sarah from Tech Support, how can I help?")
FAIL CRITERIA:
- Any of the four key elements is missing
- Unprofessional language, slang, or inappropriate tone
- Generic greeting without company/agent identification
- No clear offer of assistance
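The decision rule in this prompt can be written out explicitly, which helps when checking that the pass and fail criteria do not contradict each other. The sketch below assumes some upstream step has already identified which elements appear in the greeting. The element keys and the `greeting_verdict` function are hypothetical names chosen for illustration.

```python
# The four key elements from the pass criteria, as illustrative keys.
GREETING_ELEMENTS = (
    "welcome_phrase",
    "company_identification",
    "agent_identification",
    "offer_of_assistance",
)


def greeting_verdict(found, unprofessional=False):
    """Return ("pass" | "fail", reasons) for a greeting.

    `found` is the set of element keys detected in the greeting, in any
    order. `unprofessional` flags slang or inappropriate tone.
    """
    reasons = [f"missing: {e}" for e in GREETING_ELEMENTS if e not in found]
    if unprofessional:
        reasons.append("unprofessional language or tone")
    return ("fail" if reasons else "pass", reasons)


# A greeting with a welcome, the agent's name, and an offer of help,
# but no company or department identification.
print(greeting_verdict({"welcome_phrase", "agent_identification", "offer_of_assistance"}))
```

Returning the reasons alongside the verdict makes it easier to audit why a conversation failed.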
Call Closing Evaluation
Poor Prompt Examples (Low Precision/Recall)
Example 1: "Check if the agent ended the call nicely."
Example 2: "Did the agent close the call professionally and make sure the customer was satisfied?"
Example 3: "Evaluate whether the call conclusion was appropriate."
Why these fail:
- "Nicely" and "appropriate" are subjective measures
- Example 2 combines multiple criteria (professional close, customer satisfaction) without defining either
- No clear framework for what constitutes professional closure
Robust Prompt (High Precision/Recall)
CONTEXT: Evaluate whether the agent provided a complete and professional closing that ensures customer satisfaction and clear resolution.
PASS CRITERIA:
Look for at least 3 of these 4 elements:
- Issue resolution summary or confirmation of next steps taken
- Satisfaction verification (e.g., "Does that resolve your concern?", "Is there anything else I can help with?")
- Appreciation statement (e.g., "Thank you for calling", "I appreciate your patience", "Thank you for choosing us")
- Professional sign-off (e.g., "Have a great day", "Take care", company-specific closing phrase)
Acceptable variations:
- Elements may be naturally integrated into the conversation flow
- Satisfaction verification can be treated as met if the customer explicitly expresses satisfaction first
- Concise closings are acceptable when issue resolution is straightforward
- Personal touches that maintain professionalism (e.g., referencing the customer's name)
FAIL CRITERIA:
- Fewer than 3 of the 4 elements present
- Leaving the customer with unresolved questions or unclear next steps
- Abrupt disconnection without closure attempt
- Unprofessional final statements or dismissive tone
- Failure to confirm customer understanding when complex solutions are provided
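Unlike the greeting prompt, this one combines a threshold (at least 3 of 4 elements) with disqualifying behaviours that fail the closing regardless of the element count. A minimal sketch of that logic follows, assuming the elements and disqualifiers have already been detected. The keys and the `closing_verdict` function are hypothetical.

```python
# The four closing elements from the pass criteria, as illustrative keys.
CLOSING_ELEMENTS = frozenset({
    "resolution_summary",
    "satisfaction_verification",
    "appreciation",
    "sign_off",
})


def closing_verdict(found, disqualifiers=()):
    """Return ("pass" | "fail", reasons) for a call closing.

    `found` holds the detected element keys. `disqualifiers` lists any
    fail-criteria behaviours observed, such as an abrupt disconnection.
    """
    present = CLOSING_ELEMENTS & set(found)
    reasons = list(disqualifiers)
    if len(present) < 3:
        reasons.append(f"only {len(present)} of 4 closing elements present")
    return ("fail" if reasons else "pass", reasons)


# Three elements and no disqualifier meets the threshold.
print(closing_verdict({"appreciation", "sign_off", "satisfaction_verification"}))
# All four elements, but the agent was dismissive at the end.
print(closing_verdict(CLOSING_ELEMENTS, ["dismissive tone"]))
```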
Quality Assurance Checklist
Before deploying your prompt:
- Measurable Criteria: Can each element be objectively identified?
- Complete Coverage: Are all success/failure scenarios addressed?
- Unambiguous Language: Would different evaluators reach consistent conclusions?
- Realistic Expectations: Are standards achievable for your agent population?
- Clear Boundaries: Is the distinction between pass and fail evident?
- Consistent Scoring: Does it align with your overall evaluation framework?
Common Pitfalls to Avoid
- Vague Descriptors: "Professional manner" vs. "Uses courteous language and acknowledges customer concerns".
- Subjective Judgments: "Friendly tone" vs. "Uses positive language markers and avoids negative phrasing".
- Compound Criteria: Mixing multiple evaluation points in a single statement without clear weighting.
- Cultural Assumptions: Assuming universal communication styles or expressions.
- Perfectionist Standards: Setting criteria that exclude natural conversation variations.
- Missing Specificity: Not defining what counts as successful completion of each element.
- Implicit Requirements: Assuming evaluators understand unstated expectations.
- Binary Oversimplification: Not accounting for partial completion or contextual appropriateness.
Key Success Factors
Specificity Over Generality
Replace broad concepts with concrete, observable behaviours:
- Instead of: "Agent was helpful".
- Use: "Agent acknowledged the customer's concern and provided specific action steps".
Observable Actions Over Intentions
Focus on what can be measured in conversation:
- Instead of: "Agent showed empathy".
- Use: "Agent used acknowledgement phrases, such as 'I understand' or 'That must be frustrating'".
Inclusive Criteria Design
Account for natural conversation variations:
- Allow multiple ways to meet the same requirement
- Define acceptable alternatives upfront
- Consider different communication styles while maintaining standards
Clear Failure Definition
Be explicit about what constitutes non-compliance:
- Define both missing elements and inadequate attempts
- Specify unacceptable alternatives
- Address common failure modes directly
Note: Effective prompts balance specificity with flexibility, ensuring consistent evaluation while accommodating the natural variations inherent in human communication.