Travel VA: Using Patterns for Intents & Entities¶
Using patterns can help improve NLP interpreter accuracy.
In this document, we elaborate on the various pattern syntaxes and how they can be used for intent detection and entity extraction.
Important
- Use patterns only as a last resort, for cases where the ML engine cannot be used, for example, to train the VA to recognize idiomatic or command-like utterances.
- Patterns are evaluated in the order in which they are listed. Once a match is found, the remaining patterns are not evaluated, so add patterns in order from most restrictive to least restrictive.
- Only one wildcard (*) is allowed in a pattern.
- While most features are supported in all languages, there are some exceptions; see here for more details.
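The first-match rule can be illustrated with a small Python sketch. It is not the Platform's matcher: the tokenizer, the fixed gap of up to 3 extra words, and the two intents are assumptions made for illustration, and canonical forms and concepts are ignored. It shows how listing a less restrictive pattern first hides a more restrictive one.

```python
import re
from typing import List, Optional, Tuple

MAX_GAP = 3  # assumption: default number of extra words allowed between pattern words


def tokenize(text: str) -> List[str]:
    """Lowercase the utterance and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches(pattern: str, utterance: str) -> bool:
    """True if the pattern words occur in order with at most MAX_GAP words between neighbours."""
    words = pattern.lower().split()
    tokens = tokenize(utterance)

    def search(i: int, prev: int) -> bool:
        if i == len(words):
            return True
        # The first word may appear anywhere; later words must follow within the gap.
        start = 0 if prev < 0 else prev + 1
        stop = len(tokens) if prev < 0 else min(prev + 2 + MAX_GAP, len(tokens))
        return any(tokens[p] == words[i] and search(i + 1, p) for p in range(start, stop))

    return search(0, -1)


def first_match(rules: List[Tuple[str, str]], utterance: str) -> Optional[str]:
    """Evaluate (pattern, intent) rules in the listed order and stop at the first hit."""
    for pattern, intent in rules:
        if matches(pattern, utterance):
            return intent
    return None


# Hypothetical intents: the more restrictive pattern is listed first.
rules = [("cancel flight booking", "Cancel Booking"), ("flight booking", "Book Flight")]
print(first_match(rules, "Cancel my flight booking"))        # Cancel Booking
print(first_match(rules[::-1], "Cancel my flight booking"))  # Book Flight
```

With the order reversed, the broader flight booking pattern wins, and the Cancel Booking pattern is never evaluated.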
Pattern Creation Guidelines¶
The following are some general guidelines for creating intent patterns:
- Use a minimum of 3 words.
- Use words in their canonical forms (e.g. infinitive verbs, singular nouns).
- Use lowercase both for words and their synonyms.
- Use the US spelling of words (e.g. normalize instead of normalise).
- Avoid using determiners and pronouns (the, a, my, that).
- Avoid using digits.
- Avoid using entity values in defining a task pattern.
- Don’t use elision (e.g. what’s).
- Don’t use special characters such as () & / $ [ ] + *.
- Don’t use punctuation such as – , . ! ? ‘ “.
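A hypothetical Python linter can check a plain-word pattern against several of these guidelines. The function name and warning texts are made up for illustration. It checks only the determiners and pronouns listed above, does not verify canonical forms or US spelling, and is meant for plain word patterns, since operators such as [ ] and * would trip the special-character check.

```python
import re
from typing import List

# Hypothetical linter. Only the determiners/pronouns the guidelines give as examples.
AVOID_WORDS = {"the", "a", "my", "that"}
SPECIAL_CHARS = set("()&/$[]+*")
PUNCTUATION = set("–,.!?‘“")


def lint_pattern(pattern: str) -> List[str]:
    """Return guideline warnings for a plain-word intent pattern (operators are not handled)."""
    warnings = []
    words = pattern.split()
    if len(words) < 3:
        warnings.append("use at least 3 words")
    if pattern != pattern.lower():
        warnings.append("use lowercase")
    if re.search(r"\d", pattern):
        warnings.append("avoid digits")
    if "'" in pattern or "’" in pattern:
        warnings.append("do not use elision")
    if AVOID_WORDS & {w.lower() for w in words}:
        warnings.append("avoid determiners and pronouns")
    if SPECIAL_CHARS & set(pattern):
        warnings.append("do not use special characters")
    if PUNCTUATION & set(pattern):
        warnings.append("do not use punctuation")
    return warnings


print(lint_pattern("book flight ticket"))  # []
print(lint_pattern("Book the flight"))     # ['use lowercase', 'avoid determiners and pronouns']
```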
Patterns for Intent Detection¶
The following table lists the pattern syntax that can be configured for intent detection.
PATTERN | DESCRIPTION
word1 word2 … wordn | All the listed words must appear in the user utterance in the same order. Up to 3 (language-specific) additional words are allowed between any two consecutive pattern words, and any number of words can appear before and after them.
word1_word2 | Enforces a phrase: no additional words are allowed between word1 and word2, so the sequence of tokens is read as a phrase. Usage is restricted to words; concepts are not allowed. Note: there should be no space between word1, word2, and the underscore. Also note that “_word1” ensures that word1 in the user utterance is not marked as Used Up by the Platform and is still considered for entity extraction. This is useful when entity words are used in the intent pattern.
word1 * word2 | Zero or more additional words between the specified words/phrases.
word1 *n word2 | Exactly n additional words between the specified words/phrases.
word1 *0 word2 | Disables wildcards between two tokens. Similar to the underscore between two words, but can be used between two concepts or within [ ] and { } groups (available from v7.1 onwards).
word1 < word2 | Indicates that the match for word2 should start from the beginning of a sentence. This is especially useful when word2 appears in the middle of the utterance. Add a space after the angle bracket.
word1 > word2 | Indicates the end of the sentence; no words are allowed after it. Add a space before the closing angle bracket.
!abc | The word/concept “abc” must not appear anywhere in the user utterance after this token. There is no space between ! and the word/concept.
!!abc | The very next word/concept must not be “abc”. There is no space between !! and the word/concept.
[ … ] | Defines a group of words/concepts; the match must be against exactly one of the words/concepts declared in [ ]. When a match is found, the rest of the group is ignored, so order the words accordingly. Note: do not attach the brackets to the words, i.e. keep a space between each bracket and the adjacent word. Because such groups are difficult to maintain and track, it is recommended to use a concept instead. This pattern also degrades the VA’s performance.
{ … } | Defines an optional group of words/concepts; the match is against zero or one of the words/concepts declared in { }. When a match is found, the rest of the group is ignored, so order the words accordingly. Note: do not attach the braces to the words, i.e. keep a space between each brace and the adjacent word.
( … ) | Contains a pattern: when a pattern or part of a pattern is enclosed in these parentheses, it is treated as a pattern, unlike [ ] and { }. This is the default, i.e. the pattern word1 word2 is treated as ( word1 word2 ). Commonly used explicitly to define a sub-pattern inside [ ] or { }.
<< … >> | Matches the words in any order. Because of the risk of false positives, avoid using this pattern.
'word1 | If you quote words or use words that are not in canonical form, the system restricts matching to exactly what you used in the pattern.
word1~concept2, ~concept1~concept2 (from v8.0) | A word (word1) or concept (concept1) matches only if it is also a member of another concept (concept2). The most common usage is with the system concepts that are dynamically added for each POS tag.
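The gap rules in the first rows of the table can be approximated with a short Python sketch. This is an assumption-laden model, not the Platform's implementation: it takes the default gap to be exactly 3 words, matches lowercase tokens literally (no canonical forms or concepts), ignores the Used Up marking behind “_word1”, and does not enforce the one-wildcard limit.

```python
import re
from typing import List, Optional, Tuple

DEFAULT_GAP = 3  # assumption: the language-specific default gap, taken as 3 words
ANY = 10**9      # stands in for "any number of words"


def tokenize(text: str) -> List[str]:
    """Lowercase the utterance and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def compile_pattern(pattern: str) -> List[Tuple[str, int, int]]:
    """Turn a pattern into (word, min_gap, max_gap) steps; the gap is counted before the word."""
    steps: List[Tuple[str, int, int]] = []
    gap: Optional[Tuple[int, int]] = None  # set by a wildcard token that precedes a word
    for token in pattern.lower().split():
        if token == "*":
            gap = (0, ANY)  # word1 * word2: zero or more words
        elif re.fullmatch(r"\*\d+", token):
            n = int(token[1:])
            gap = (n, n)  # word1 *n word2: exactly n words (*0 means none)
        else:
            parts = [p for p in token.split("_") if p]
            if not parts:
                continue
            lo, hi = gap if gap is not None else (0, DEFAULT_GAP)
            steps.append((parts[0], lo, hi))
            # word1_word2: the remaining parts must follow with no words in between.
            steps.extend((part, 0, 0) for part in parts[1:])
            gap = None
    return steps


def matches(pattern: str, utterance: str) -> bool:
    """True if the utterance satisfies every step of the compiled pattern."""
    steps = compile_pattern(pattern)
    tokens = tokenize(utterance)

    def search(i: int, prev: int) -> bool:
        if i == len(steps):
            return True
        word, lo, hi = steps[i]
        if prev < 0:
            candidates = range(len(tokens))  # any number of words may precede the match
        else:
            candidates = range(prev + 1 + lo, min(prev + 2 + hi, len(tokens)))
        return any(tokens[p] == word and search(i + 1, p) for p in candidates)

    return search(0, -1)


print(matches("cancel order", "cancel last weeks big order"))      # True (3 extra words)
print(matches("cancel order", "cancel my very last big order"))    # False (4 extra words)
print(matches("cancel * order", "cancel my very last big order"))  # True
```

The search backtracks over every occurrence of each word, so a repeated word early in the utterance does not prevent a later valid match.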
Pattern Operators¶
- AND: ( X Y ): An ordered relationship of words in sequence. This is the default, i.e. the pattern cancel order is the same as (cancel order).
For example, (Cancel Booking) matches Cancel my flight booking but doesn’t match I have a pending booking for flight, can I cancel? The XO Platform tries patterns with increasing numbers of wildcards between words (up to 3 for an intent), so the pattern Cancel Order can match:
- cancel order
- cancel my order
- cancel that last order
- cancel last weeks big order
- OR: [X Y Z]: Any of these words can be used interchangeably in the user utterance. For example, ([get make] me [food drink dessert]) matches any of the following utterances:
- Get me food
- Make me a drink
- Get me a drink
- Get me a dessert
- Make me some quick food
- NOT: !X: Words that must not appear in the user utterance for an intent match. For example, (!forecast) is marked as a pattern for an intent named Get current weather, and the assistant supports another intent called Get 3-day weather forecast.
- User utterance: Planning a trip to California get me the forecast.
- will not match Get current weather
- will match Get 3-day weather forecast
Note
!word means “not after this point”, so (!forecast the weather) and (get the weather !forecast) are different. The utterance Get the forecast for the weather matches the second but not the first.
- Optional: {X}: For example, with {phone} in the pattern, the Platform treats the user utterances Get me a phone number and Get me a number equally.
- Enforce Phrase: X_Y: Enforces the occurrence of the phrase as is in the user utterance, without any words in between. For example, with check_in, the utterances check in and I want to check in match, but Can you check me in for my flight? does not.
- Concepts: ~: The Platform has a large set of built-in concepts that developers can use to define a pattern. For example, (I [like love] ~world_country) matches:
- I like India
- I love traveling to Australia
- I would like to visit an African country
- Unordered: <<, >>: Used to find words in any order. For example, << cancel booking >> matches Cancel my flight booking and also I have a pending booking for a flight, can I cancel?
- Start/End of Statement: <, >: For example, ( check in > ) matches I want to check in, but not I want to check in now.
- Quote: ': If you quote words or use words that are not in canonical form, the system restricts matching to exactly what you used in the pattern. For example, (like to book a flight) matches I would like to book a flight but not I really liked how easy it was to book a flight on your app.
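The following Python sketch models a few of these operators (AND, [ ] groups, { } optional groups, !word and the > end marker) to make their semantics concrete. It is a hypothetical approximation, not the Platform's matcher: the 3-word gap is fixed, matching is literal on lowercase tokens, and concepts, quoting, !!, << >>, < and nested sub-patterns are not modeled.

```python
import re
from typing import List, Tuple, Union

MAX_GAP = 3  # assumption: default number of extra words between consecutive pattern words
Element = Tuple[str, Union[str, List[str], None]]


def tokenize(text: str) -> List[str]:
    """Lowercase the utterance and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def parse(pattern: str) -> List[Element]:
    """Parse words, [ ] groups, { } optional groups, !word and > into (kind, value) pairs."""
    elements: List[Element] = []
    group: List[str] = []
    group_kind = None  # "any" inside [ ], "opt" inside { }
    for tok in re.findall(r"[\[\]{}()]|[^\s\[\]{}()]+", pattern.lower()):
        if tok in ("(", ")"):
            continue  # parentheses are the default AND; nothing to record here
        if tok in ("[", "{"):
            group_kind, group = ("any" if tok == "[" else "opt"), []
        elif tok in ("]", "}"):
            elements.append((group_kind, group))
            group_kind = None
        elif group_kind is not None:
            group.append(tok)
        elif tok == ">":
            elements.append(("end", None))
        elif tok.startswith("!"):
            elements.append(("not", tok[1:]))
        else:
            elements.append(("any", [tok]))  # a plain word is a one-word group
    return elements


def matches(pattern: str, utterance: str) -> bool:
    """True if the utterance satisfies the parsed pattern, trying alternatives by backtracking."""
    elements = parse(pattern)
    tokens = tokenize(utterance)

    def window(prev: int) -> range:
        if prev < 0:
            return range(len(tokens))  # nothing matched yet: the first word may be anywhere
        return range(prev + 1, min(prev + 2 + MAX_GAP, len(tokens)))

    def search(i: int, prev: int) -> bool:
        if i == len(elements):
            return True
        kind, value = elements[i]
        if kind == "not":
            # !word: the word must not occur anywhere after the current point.
            return value not in tokens[prev + 1:] and search(i + 1, prev)
        if kind == "end":
            # >: the last matched word must be the last word of the utterance.
            return prev == len(tokens) - 1 and search(i + 1, prev)
        hit = any(tokens[p] in value and search(i + 1, p) for p in window(prev))
        if kind == "opt" and not hit:
            return search(i + 1, prev)  # { }: the optional group may be skipped
        return hit

    return search(0, -1)


print(matches("(!forecast the weather)", "Get the forecast for the weather"))      # False
print(matches("(get the weather !forecast)", "Get the forecast for the weather"))  # True
```

The two calls reproduce the “not after this point” distinction from the Note: in the first pattern the check covers the whole utterance, while in the second it only covers the words after weather. A pattern such as (get me {phone} number), an invented example, matches both Get me a phone number and get me a number under this model.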
Negative Patterns¶
Negative patterns can be used to eliminate intents that are detected when a given phrase is present. This helps filter false positives out of the matched intents.
User Utterance: “I was booking a flight when I got network failure error”
Intent Detected: Book a flight
Intended Intent: Issue Resolution
Add the negative patterns (network failure), (error), and (technical issue) to the intent Book a Flight.
User Utterances: “I was booking a flight when I got network failure error”, “I was booking a flight when I faced a technical issue”, or “I got an error while booking a flight.”
Intent Rejected: Book a Flight
Intent Triggered: Issue Resolution
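A minimal Python sketch of negative-pattern filtering, assuming intent detection has already produced a list of candidate intents. The data layout, the ordered matching with a 3-word gap, and the candidate list are assumptions for illustration only; the sketch shows only the filtering step.

```python
import re
from typing import Dict, List

MAX_GAP = 3  # assumption: same ordered, up-to-3-word-gap matching as intent patterns


def tokenize(text: str) -> List[str]:
    """Lowercase the utterance and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches(pattern: str, tokens: List[str]) -> bool:
    """True if the pattern words occur in order with at most MAX_GAP words between them."""
    words = pattern.lower().split()

    def search(i: int, prev: int) -> bool:
        if i == len(words):
            return True
        start = 0 if prev < 0 else prev + 1
        stop = len(tokens) if prev < 0 else min(prev + 2 + MAX_GAP, len(tokens))
        return any(tokens[p] == words[i] and search(i + 1, p) for p in range(start, stop))

    return search(0, -1)


def apply_negative_patterns(
    candidates: List[str], negatives: Dict[str, List[str]], utterance: str
) -> List[str]:
    """Drop every candidate intent for which one of its negative patterns matches."""
    tokens = tokenize(utterance)
    return [
        intent
        for intent in candidates
        if not any(matches(p, tokens) for p in negatives.get(intent, []))
    ]


negatives = {"Book a Flight": ["network failure", "error", "technical issue"]}
candidates = ["Book a Flight", "Issue Resolution"]  # hypothetical detector output
print(apply_negative_patterns(candidates, negatives, "I got an error while booking a flight."))
# ['Issue Resolution']
```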
Patterns for Entity Extraction¶
Patterns can be used to identify entity values in the user utterance based on their position and occurrence in the utterance.
Intent pattern operators such as {…}, […], !, and ~concepts can also be used for entity extraction. The following are some use cases where patterns can be applied.
Every entity pattern has to include a * (of some form) to indicate where the Platform should look for an entity value.
Let us continue with the Book Flight intent of the Travel Planning Assistant example. This intent needs two entities, DepartureCity and ArrivalCity. The following patterns show how to extract them.
Pattern 1: word1 * word2¶
This can be used as a positional wildcard that indicates the expected position of the entity.
Pattern for ArrivalCity entity: "to * from"
User Utterance: Book a flight to London.
Entity Extracted: ArrivalCity = London
User Utterance not resulting in entity extraction: “Book a flight for London”
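A minimal Python sketch of the positional wildcard word1 * word2, returning the words found between the two anchors. The helper is hypothetical, not a Platform API. It assumes both anchors occur in the utterance and does not model how the Platform handles an utterance that ends before word2 (as in the example above) or validation against the entity type. Because an entity must capture something, it requires at least one word between the anchors.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase the utterance and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_between(utterance: str, word1: str, word2: str) -> Optional[List[str]]:
    """Hypothetical helper: words between word1 and a later word2 ("word1 * word2"), or None."""
    tokens = tokenize(utterance)
    for i, tok in enumerate(tokens):
        if tok != word1:
            continue
        # The wildcard must capture at least one word, so word2 is searched from i + 2.
        for j in range(i + 2, len(tokens)):
            if tokens[j] == word2:
                return tokens[i + 1:j]
    return None


print(extract_between("Book a flight to London from Paris", "to", "from"))  # ['london']
```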
Pattern 2: word1 *n¶
This can be used as a positional wildcard * that indicates the expected position of the entity based on the number of words after the specified word1. That is, the n words after word1 are considered for the entity; if n words are not present, the Platform looks for the next occurrence of word1.
Pattern for DepartureCity entity: "from *2"
User Utterance: "Book flight from Paris."
Entity Extracted: DepartureCity = Paris
User Utterance not resulting in entity extraction: “Book flight flying from Paris towards London”
Extension to Pattern 2: word1 *~n¶
Similar to Pattern 2, but extracts up to n words, if that many words are available.
Note
Entities need to extract something, so *~1 is really the same as *1.
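The difference between *n and *~n can be sketched in Python as follows. The function and the sample utterances are hypothetical; the sketch counts raw tokens only and does not validate the captured words against the entity type, so its results on the example utterances above may differ from the Platform's.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase the utterance and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_after(utterance: str, word1: str, n: int, up_to: bool = False) -> Optional[List[str]]:
    """Hypothetical helper: "word1 *n" (exactly n words) or, with up_to=True, "word1 *~n"."""
    tokens = tokenize(utterance)
    for i, tok in enumerate(tokens):
        if tok != word1:
            continue
        following = tokens[i + 1:i + 1 + n]
        # *n needs exactly n words; *~n accepts 1 to n words (an entity must capture something).
        if len(following) == n or (up_to and following):
            return following
        # Otherwise fall through to the next occurrence of word1, as described for Pattern 2.
    return None


print(extract_after("Fly from Paris", "from", 2))              # None
print(extract_after("Fly from Paris", "from", 2, up_to=True))  # ['paris']
```

For n = 1 both branches accept exactly one word, which is why *~1 behaves like *1.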
Pattern 3: a combination of word1 * word2 and word3 *n¶
This can be used as a combination of patterns for the likely location in the user utterance that the entity value could be found and the number of words contributing to the entity.
Pattern for ArrivalCity entity: “to * from” and “from to *1”
Pattern for DepartureCity entity: “from * to” and “to from *2”
User Utterance: “Book flight to London from Paris.” or “Book flight from Paris to London.”
Entity Extracted: ArrivalCity = London and DepartureCity= Paris
User Utterance not resulting in entity extraction: “book flight for London from Paris.”
Pattern 4: [ word1 word2 ] *¶
This can be used for patterns with a group of words or concepts, at least one of which should be present in the utterance. The order within the group is important (see Patterns for Intent Detection above).
Pattern for ArrivalCity entity: “to * [ from ]” and “[ from ] to *1”
Pattern for DepartureCity entity: “[ from ] * to” and “to [ from ] *”
User Utterance: Book flight to London from Paris or Book flight from Paris to London.
Entity Extracted: ArrivalCity = London, DepartureCity = Paris
User Utterance not resulting in entity extraction: “Book flight for London from Paris.”
Pattern 5: ~CustomConcept *¶
This can be used with concepts. You can create your own custom concepts and use them to define patterns.
Pattern for ArrivalCity entity: “to * from” and “from to *”
Pattern for DepartureCity entity: “~in * to” and “to ~in *”
Custom Concept: ~in – (from)
User Utterance: Book flight landing in London and taking off from Paris or Book flight with takeoff from Paris and landing in London.
Entity Extracted: ArrivalCity = London, DepartureCity = Paris
User Utterance not resulting in entity extraction: “book flight landing in London taking off in Paris.”
Pattern 6: ~intent¶
Useful in entity patterns and custom entities.
Words that are used in the intent identification are dynamically marked with the ~intent concept. This can then be used as an anchor or reference point for some entity patterns.
Sample Pattern: “intenttrip~plural”
User Utterance not resulting in entity extraction: “show my trips.”
User Utterance that might mark the entity: “Show me the Trip of a Lifetime in-flight magazine.”
Pattern 7: $currentEntity¶
Useful for delaying the evaluation of a pattern until the entity is actually processed. Normally, entity patterns are evaluated when a dialog starts and on each new input, to see whether any words need to be protected until that entity is processed. This might not always be desirable, especially for strings.
Pattern: “$currentEntity=TaskTitle 'called *”
With this rule, the pattern is evaluated only when the dialog flow reaches the TaskTitle node.