Validator Guidelines
How to evaluate utterances
The Process
There are four fundamental questions that Validators need to ask when validating a given utterance:
1. Is the relation correct?
2. Is the utterance relevant to the domain and the topic subject?
3. Is the utterance unique in terms of sentence structure?
4. Is the utterance natural-sounding?
The questions are hierarchical, listed in order of importance. For example, making sure the utterance is in the correct category (general, specific, entailment) takes precedence over whether it sounds natural. Thus, an awkward-sounding sentence with the correct relation is acceptable, while a perfectly worded sentence with an incorrect relation is not. The evaluation rubric can be expressed as a flowchart like this:
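The same ordering can also be sketched in code. This is only a minimal illustration, not the project's actual tooling: the `Verdict` record, its field names, and the `first_failure` function are hypothetical names introduced here. It reports the highest-priority question that an utterance fails, checking the four questions in the order listed above.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record of a validator's answers to the four questions.
@dataclass
class Verdict:
    relation_correct: bool
    relevant: bool
    unique_pattern: bool
    natural: bool


# Criteria in order of importance, paired with the Verdict field they read.
CHECKS = [
    ("relation", "relation_correct"),
    ("relevancy", "relevant"),
    ("pattern diversity", "unique_pattern"),
    ("naturalness", "natural"),
]


def first_failure(verdict: Verdict) -> Optional[str]:
    """Return the highest-priority failed criterion, or None if all pass."""
    for name, field in CHECKS:
        if not getattr(verdict, field):
            return name
    return None
```

For an utterance that fails both the relation and the naturalness questions, `first_failure` names the relation issue, because that question is asked first.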
Relation issues (1)
Utterances with an incorrect relation (i.e. filed under the wrong category) should be rejected. For example, an utterance that is more specific than the topic subject should not be placed under the General category. This criterion is the most crucial of the four. Note: Some utterances may fall under both the specific and entailment categories.
Relevancy issues (2)
Utterances which are irrelevant to the topic subject should be rejected. An utterance can be irrelevant in two main ways: incorrect intent and insufficient context. Here's an example of incorrect intent:
And here's an example of insufficient context:
Finally, here is an example of how a builder might misinterpret the topic subject or domain and submit an utterance that doesn't fit at all.
Pattern diversity issues (3)
Pattern diversity has to do with the structure of a sentence (syntax). What we're looking for is a wide variety of sentence patterns that convey the same meaning. Consider:
Do you accept credit cards?
Can I pay with a credit card?
These two sentences exhibit pattern diversity. If the builder just replaces an article, pronoun or object, but otherwise leaves the sentence structure the same, then the new utterance doesn't teach the AI anything new and should be rejected. Consider:
Do you have clothes for boys?
Do you have clothes for girls?
The two sentences are identical except for a different object. Our AI will see these constructions as functionally equivalent and thus won't learn anything new. Utterances without pattern diversity should thus be rejected. Only the first submitted utterance of a given sentence pattern will be accepted based on the principle of first come, first served.
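As a rough illustration of this rule, the sketch below treats two utterances as sharing a pattern when they have the same number of words and differ in at most one of them, then keeps only the first utterance of each pattern. This is a deliberately crude heuristic and an assumption made for illustration; it is not how validators or the AI actually judge sentence structure, and the function names are hypothetical.

```python
import re
from typing import List


def tokens(utterance: str) -> List[str]:
    """Lowercase the utterance and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", utterance.lower())


def same_pattern(a: str, b: str) -> bool:
    """Crude check: same token count and at most one differing token."""
    ta, tb = tokens(a), tokens(b)
    if len(ta) != len(tb):
        return False
    return sum(x != y for x, y in zip(ta, tb)) <= 1


def filter_first_come(utterances: List[str]) -> List[str]:
    """Keep only the first submitted utterance of each pattern, in order."""
    accepted: List[str] = []
    for utterance in utterances:
        if not any(same_pattern(utterance, kept) for kept in accepted):
            accepted.append(utterance)
    return accepted
```

Applied to the four example sentences in submission order, `filter_first_come` keeps both credit card questions and "Do you have clothes for boys?", and drops "Do you have clothes for girls?" as a repeat of an existing pattern.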
Naturalness issues (4)
Utterances with grammatical problems, spelling errors, and other “unnatural” qualities should be rejected. Exceptions include chatspeak, or the informal language you would expect in conversations on social media platforms. Note that this criterion has the lowest priority.
Summary Chart
To summarize, validators need to review each utterance based on the following rubric: