Technology

Fluent.ai is a leader in speech understanding and voice user interface solutions.

How do we do it?

Based on over nine years of research in machine learning and artificial intelligence, and multiple families of issued patents, Fluent.ai’s technology is unique and unmatched.

Conventional speech understanding solutions operate in two distinct steps: first transcribing speech into text in a target language, then applying natural language processing to the text to determine the user’s intent. This approach involves large data collection and labeling efforts and requires a large amount of computing power to develop models for even a single language. It also relies on a number of disjointed modules, such as an acoustic model and a language model, to map input speech to a string of words. Because these modules are not optimized together, they do not provide optimal speech recognition performance. This becomes particularly evident in noisy environments or when speaker accents vary.
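
As a rough illustration of the two-step design described above, the sketch below chains a stand-in transcription stage to a stand-in intent stage. Everything in it is hypothetical: the clip IDs, the hard-coded transcripts, and the keyword rules are toy placeholders, not any vendor's actual models. It only shows how an error in the first stage can carry through to the second.

```python
from typing import Optional

# Hypothetical stage-1 output: a real system would decode audio with an
# acoustic model and a language model; here the transcripts are hard-coded.
TOY_TRANSCRIPTS = {
    "clip_clean": "turn on the lights",
    "clip_noisy": "turn on the flights",  # simulated recognition error under noise
}

# Hypothetical stage-2 rules: keywords that must all appear in the transcript.
INTENT_RULES = {
    ("turn", "on", "lights"): "lights_on",
    ("turn", "off", "lights"): "lights_off",
}


def speech_to_text(clip_id: str) -> str:
    """Stage 1: transcribe speech into text (toy lookup)."""
    return TOY_TRANSCRIPTS[clip_id]


def text_to_intent(text: str) -> Optional[str]:
    """Stage 2: map the transcript to an intent, or None if nothing matches."""
    words = set(text.split())
    for keywords, intent in INTENT_RULES.items():
        if all(keyword in words for keyword in keywords):
            return intent
    return None


def two_step_pipeline(clip_id: str) -> Optional[str]:
    """Run both stages in sequence; stage 2 only ever sees stage 1's text."""
    return text_to_intent(speech_to_text(clip_id))


if __name__ == "__main__":
    for clip in ("clip_clean", "clip_noisy"):
        print(clip, "->", two_step_pipeline(clip))
```

In this toy setup the noisy clip yields no intent, because the second stage has no access to the audio and cannot recover from the mistranscribed word.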

Fluent.ai’s speech-to-intent technology employs unique neural network algorithms to map a user’s incoming speech directly to their intended action, without an intermediate speech-to-text transcription step. During training, Fluent.ai technology learns by directly associating semantic representations of a speaker’s intended actions with the spoken utterances. In this respect, our models are inspired by how humans acquire vocabulary and language. Unlike conventional automatic speech recognition (ASR), Fluent.ai technology does not require phonetic transcription. Our text-independent approach enables the development of speech understanding models that can learn to recognize a new language from a small amount of data, and allows end users to interact with devices in a language of their choice. Users do not need to conform to preset phrases and are free to choose the words they prefer.
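
To make the idea of training on (utterance, intent) pairs without transcripts more concrete, here is a minimal sketch. It is not Fluent.ai's neural network: it swaps in a simple nearest-centroid classifier over made-up feature vectors, and the function names, feature values, and intent labels are all assumptions chosen for illustration. The only point it shares with the description above is that no text appears anywhere between the input features and the predicted intent.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Features = Sequence[float]


def train_centroids(examples: List[Tuple[Features, str]]) -> Dict[str, List[float]]:
    """Average the feature vectors seen for each intent label."""
    grouped = defaultdict(list)
    for features, intent in examples:
        grouped[intent].append(features)
    return {
        intent: [sum(column) / len(vectors) for column in zip(*vectors)]
        for intent, vectors in grouped.items()
    }


def predict_intent(centroids: Dict[str, List[float]], features: Features) -> str:
    """Return the intent whose centroid is closest to the input features."""
    return min(centroids, key=lambda intent: math.dist(centroids[intent], features))


# Hypothetical training data: feature vectors paired with intents only.
# Real systems would derive features from audio; these numbers are invented.
EXAMPLES = [
    ([0.9, 0.1, 0.2], "lights_on"),
    ([0.8, 0.2, 0.1], "lights_on"),
    ([0.1, 0.9, 0.3], "lights_off"),
    ([0.2, 0.8, 0.2], "lights_off"),
]

if __name__ == "__main__":
    model = train_centroids(EXAMPLES)
    print(predict_intent(model, [0.7, 0.3, 0.2]))
```

Because the labels are intents rather than words, the same training loop would apply to utterances in any language, which is the property the text-independent approach relies on.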

Competitive Advantages

Lightweight and Fast

Highly Accurate

Supports Any Language

Allows for Multiple Concurrent Languages

Requires a Small Fraction of the Typical Training Data

Better Performance in Noisy Environments

Comparison

Leading Speech-to-Text Providers (A, B, C, D) vs. Fluent.ai Speech to Intent

Accuracy

  • A
    50%
  • B
    75%
  • C
    50%
  • D
    50%
  • Fluent.ai
    100%

Noise Robustness

  • A
    50%
  • B
    50%
  • C
    50%
  • D
    50%
  • Fluent.ai
    100%

Improvements with User Feedback

  • A
    N/A
  • B
    N/A
  • C
    N/A
  • D
    N/A
  • Fluent.ai
    100%

Offline Performance

  • A
    50%
  • B
    N/A
  • C
    50%
  • D
    N/A
  • Fluent.ai
    100%

Recognition Speed

  • A
    25%
  • B
    50%
  • C
    50%
  • D
    25%
  • Fluent.ai
    100%

Customizability

  • A
    N/A
  • B
    N/A
  • C
    N/A
  • D
    N/A
  • Fluent.ai
    100%

Searching the Internet

  • A
    75%
  • B
    50%
  • C
    25%
  • D
    50%
  • Fluent.ai
    25%

Size of Typical Training Data

  • A
    10,000+ hrs
  • B
    10,000+ hrs
  • C
    10,000+ hrs
  • D
    10,000+ hrs
  • Fluent.ai
    <10 hrs

Speed to Launch New Languages/Accents

  • A
    25%
  • B
    25%
  • C
    25%
  • D
    25%
  • Fluent.ai
    100%

Ability to Handle a Mix of Languages

  • A
    25%
  • B
    25%
  • C
    25%
  • D
    75%
  • Fluent.ai
    100%

Research

DONUT: CTC-based Query-by-Example Keyword Spotting

Authors:
Loren Lugosch, Samuel Myer, Vikrant Tomar
Conference:
NeurIPS 2018 Workshop

Tone Recognition Using Lifters and CTC

Authors:
Loren Lugosch, Vikrant Tomar
Conference:
Interspeech 2018

Efficient keyword spotting using time delay neural networks

Authors:
Samuel Myer, Vikrant Tomar
Conference:
Interspeech 2018

Enhance your devices with Fluent.ai's
offline, robust and multilingual voice AI engine