
Validator Guidelines

How to evaluate utterances

The Process

There are four fundamental questions that validators need to ask when validating a given utterance:

  1. Is the relation correct?

  2. Is the utterance relevant to the domain and the topic subject?

  3. Is the utterance unique in terms of sentence structure?

  4. Is the utterance natural-sounding?

The questions are hierarchical, listed in descending order of importance. For example, making sure the utterance is in the correct category (general, specific, entailment) takes precedence over whether it sounds natural. Thus, an awkward-sounding sentence with the correct relation is acceptable, while a perfectly worded sentence with an incorrect relation is not. The evaluation rubric can be expressed as a flowchart like this:

Validation Flowchart
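
The ordering of the four questions can also be sketched in code. The following is only an illustration of the priority order described above, not a tool validators are expected to use: the `Judgement` structure, the field names, and the `first_failure` helper are all hypothetical. It reports the most important issue found and leaves the final accept or reject decision to the flowchart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Judgement:
    """A validator's answers to the four questions (hypothetical structure)."""
    relation_correct: bool   # question 1
    relevant: bool           # question 2
    unique_pattern: bool     # question 3
    natural: bool            # question 4


# Checks listed in descending order of importance, mirroring questions 1 to 4.
CHECKS = [
    ("relation_correct", "relation"),
    ("relevant", "relevancy"),
    ("unique_pattern", "pattern diversity"),
    ("natural", "naturalness"),
]


def first_failure(judgement: Judgement) -> Optional[str]:
    """Return the highest-priority issue found, or None if there is none."""
    for field_name, label in CHECKS:
        if not getattr(judgement, field_name):
            return label
    return None


# An utterance with both a relation issue and a naturalness issue is
# reported under the relation issue, because that check comes first.
issue = first_failure(Judgement(False, True, True, False))
```

Because the checks run in order, a lower-priority issue is only ever reported when every higher-priority question has been answered "yes".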

Relation issues (1)

Utterances with an incorrect relation (i.e. filed under the wrong category) should be rejected. For example, an utterance that is more specific than the topic subject does not belong under the General category. This criterion is the most important of the four. Note: Some utterances may fall under both the specific and entailment categories.

General Utterances

Relevancy issues (2)

Utterances which are irrelevant to the topic subject should be rejected. An utterance is typically irrelevant for one of two main reasons: incorrect intent or insufficient context. Here's an example of incorrect intent:

Incorrect Intent

And here's an example of insufficient context:

Insufficient Context

Finally, here is an example of how a builder might misinterpret the topic subject or domain and submit an utterance that doesn't fit at all.

Misinterpreted subject

Pattern diversity issues (3)

Pattern diversity concerns the structure of a sentence (its syntax). What we're looking for is a wide variety of sentence patterns that convey the same meaning. Consider:

  • Do you accept credit cards?

  • Can I pay with a credit card?

These two sentences exhibit pattern diversity. If the builder just replaces an article, pronoun or object, but otherwise leaves the sentence structure the same, then it doesn't teach the AI anything new and should be rejected. Consider:

  • Do you have clothes for boys?

  • Do you have clothes for girls?

The two sentences are identical except for the object. Our AI will see these constructions as functionally equivalent and thus won't learn anything new, so utterances without pattern diversity should be rejected. Only the first submitted utterance of a given sentence pattern will be accepted, on a first come, first served basis.

Pattern Diversity
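
As a rough illustration of the first come, first served rule, the sketch below collapses swappable words into placeholders and keeps only the first utterance for each resulting pattern. This is a deliberately crude approximation under stated assumptions: the `SLOT_WORDS` table, `pattern_signature`, and `first_come_first_served` are hypothetical names invented for this example, and the sketch does not describe how the AI or the validation platform actually compares sentences. Real pattern-diversity decisions still require a validator's judgment.

```python
import re

# Hypothetical table of words treated as interchangeable.
# Only a few objects and articles are listed, purely for illustration.
SLOT_WORDS = {
    "boys": "<OBJ>",
    "girls": "<OBJ>",
    "a": "<ART>",
    "an": "<ART>",
    "the": "<ART>",
}


def pattern_signature(utterance: str) -> str:
    """Lowercase, drop punctuation, and replace swappable words with slots."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return " ".join(SLOT_WORDS.get(token, token) for token in tokens)


def first_come_first_served(utterances):
    """Accept the first utterance of each pattern; reject later repeats."""
    seen = set()
    accepted, rejected = [], []
    for utterance in utterances:
        signature = pattern_signature(utterance)
        if signature in seen:
            rejected.append(utterance)
        else:
            seen.add(signature)
            accepted.append(utterance)
    return accepted, rejected


accepted, rejected = first_come_first_served([
    "Do you have clothes for boys?",
    "Do you have clothes for girls?",
    "Do you accept credit cards?",
    "Can I pay with a credit card?",
])
```

With the example sentences from this section, the "girls" utterance shares a signature with the earlier "boys" utterance and lands in `rejected`, while the two credit card questions have different signatures and are both kept.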

Naturalness issues (4)

Utterances with grammatical problems, spelling errors, and other “unnatural” qualities should be rejected. Exceptions include chatspeak, i.e. the informal language you would expect in conversations on social media platforms. Note that this criterion has the lowest priority.

Naturalness Issues

Summary Chart

To summarize, validators need to review each utterance based on the following rubric:

Summary Chart
