Synesis Academy

Create Quality Utterances


Last updated 1 year ago

Good-quality utterances have two characteristics: Typicality and Pattern Diversity.

Typicality

An utterance has Typicality when it cannot be interpreted outside the context of its topic subject. In other words, builders should check that each submission is relevant and sounds natural. An utterance is relevant when it clearly relates to the topic subject, and it is natural when a native speaker accepts it without any sense of awkwardness.

Note that what is ‘natural’ to each language user is different. While social and regional varieties are equally valid, we focus on collecting utterances that are in line with the most common linguistic standards.

Remember that all topic subjects are assigned to specific domains. Thus, whether an utterance is relevant and natural will be judged within the context of each related domain. Builders need to take into account the particular characteristics of the domains and natural language expressions when creating utterances.

Pattern Diversity

Utterances have Pattern Diversity when they use varied syntactic patterns. Pattern Diversity is an important standard of good-quality data because it means the utterances cover many different ways of expressing the same intent. Good data has pattern diversity and no duplicates. Changing a single word, or swapping one specific entity for another, DOES NOT count as pattern diversity. A set with bad pattern diversity shows no variation in sentence structure, only words swapped in and out.
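The "one word swapped" case can be screened for mechanically. The following is a minimal sketch, not the platform's actual check: the function name `differs_by_one_word` and the sample utterances are hypothetical and chosen only for illustration. It flags two utterances that share the same sentence structure and differ in exactly one word position.

```python
def differs_by_one_word(a: str, b: str) -> bool:
    """Return True when two utterances have the same number of words
    and differ in exactly one position (a simple word swap)."""
    words_a = a.lower().split()
    words_b = b.lower().split()
    if len(words_a) != len(words_b):
        return False
    mismatches = sum(1 for x, y in zip(words_a, words_b) if x != y)
    return mismatches == 1


# Hypothetical utterances, not taken from any campaign.
swap = differs_by_one_word("Can I reuse a sheet mask?", "Can I reuse a face mask?")
varied = differs_by_one_word("Can I reuse a sheet mask?", "Is reusing a sheet mask okay?")
```

Here `swap` is true (only "sheet" became "face"), while `varied` is false because the second utterance uses a different sentence structure. A check like this only catches the simplest swaps; judging real syntactic variety still needs a human reviewer.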

Types of Bad Data

Boolean and Multiple Intents

Conversational AI (CAI) understands natural language in standardized forms. However varied the form of a sentence, CAI relies on its core meaning. In computer science, Boolean is a data type that has two possible values. In natural language, combinations of sentences are mostly considered Boolean. Sentences that include such Boolean combinations tend to confuse CAI.

Examples

Is it safe and effective to use a sheet mask twice?

Is it safe to use a sheet mask twice?

Is it effective to use a sheet mask twice?

I need to have an eye check up, it's been hurting since I used the sheet mask

I need to have an eye check up

Eyes have been hurting since I used the sheet mask

When two sentences are combined like this, there is only a 50% chance that CAI understands either one of the separated sentences shown above. Therefore, include only one intent per sentence.
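As a rough illustration of the "one intent per sentence" advice, the sketch below flags utterances that may combine two intents. It is a naive heuristic under stated assumptions, not how CAI or the validators actually work: the function name `may_have_multiple_intents` is hypothetical, and treating "and", "or", and commas as warning signs is an assumption made for this example.

```python
import re

# Assumption: these conjunctions often join two separate intents.
CONJUNCTIONS = {"and", "or"}


def may_have_multiple_intents(utterance: str) -> bool:
    """Flag an utterance that contains a coordinating conjunction
    or a comma, either of which may join two intents."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    has_conjunction = any(token in CONJUNCTIONS for token in tokens)
    has_comma = "," in utterance
    return has_conjunction or has_comma


examples = [
    "Is it safe and effective to use a sheet mask twice?",
    "Is it safe to use a sheet mask twice?",
    "I need to have an eye check up, it's been hurting since I used the sheet mask",
    "I need to have an eye check up",
]
flags = [may_have_multiple_intents(e) for e in examples]
```

The two combined examples are flagged and the two single-intent versions are not. The heuristic over-flags harmless phrases such as "salt and pepper", so a flag is a prompt to re-read the utterance, not a verdict.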

Sentence Length

Builders and validators need to consider utterance length carefully. Relatively long utterances warrant closer review unless the extra length, such as adjunct (optional) elements, is needed to express the intent. Average-length utterances (3-8 tokens), on the other hand, promote efficient classification. Longer utterances typically contain semantically redundant components. Builders and validators are advised to pay close attention to the naturalness of the given utterances.

Submitting average-length sentences (3 to 8 words) is recommended, but it is not a strict rule. The ultimate goal of Synesis train2earn is to crowdsource a wide range of natural utterances that conversational AI can encounter across different domains and subjects.
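The length recommendation can be checked with a simple word count. This is a minimal sketch that assumes words are separated by whitespace. The function name `length_band` and the labels it returns are hypothetical; only the 3 to 8 range comes from the guideline above.

```python
def length_band(utterance: str, min_words: int = 3, max_words: int = 8) -> str:
    """Classify an utterance as 'short', 'average', or 'long'
    by counting whitespace-separated words."""
    count = len(utterance.split())
    if count < min_words:
        return "short"
    if count > max_words:
        return "long"
    return "average"


single = length_band("I need to have an eye check up")
combined = length_band(
    "I need to have an eye check up, it's been hurting since I used the sheet mask"
)
```

Here `single` is "average" (8 words) and `combined` is "long" (17 words). Because the range is a recommendation and not a strict rule, a "long" result only suggests a closer look for redundant components.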

Punctuation & Capitalization

Capitalization and punctuation are ignored when the AI interprets an utterance. For example:

The Hobbits were taken to Isengard.

The hobbits were taken to isengard.

The Hobbits were taken to Isengard!

To the AI, these sentences are the same. Utterances created by varying only spelling (such as capitalization) or punctuation will be rejected by the validators.
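One way to see why such variants count as duplicates is to normalize each utterance before comparing. The sketch below is illustrative only, with hypothetical function names `normalize` and `unique_utterances`. It assumes that lowercasing and removing ASCII punctuation approximates what the AI ignores.

```python
import string


def normalize(utterance: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(utterance.translate(table).lower().split())


def unique_utterances(utterances):
    """Keep the first utterance of each group that normalizes identically."""
    seen = set()
    kept = []
    for utterance in utterances:
        key = normalize(utterance)
        if key not in seen:
            seen.add(key)
            kept.append(utterance)
    return kept


variants = [
    "The Hobbits were taken to Isengard.",
    "The hobbits were taken to isengard.",
    "The Hobbits were taken to Isengard!",
]
kept = unique_utterances(variants)
```

All three variants normalize to the same string, so `kept` contains only the first one.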
