Automatic Speech Recognition:

A Shifted Role in Early Speech

Ari Septayuda
Arnold Nicholas
Fadhlan Putra
Harry Andiko Pratama
Erkki Svante Nyfors
Discussion The Challenges of Feedback

Providing meaningful and consistent feedback to children is a key
priority.

Analyses from as long as a decade ago found that ASR technology
cannot provide corrective analytic feedback and can only provide
evaluative feedback.
Corrective = analyzing and semantically interpreting degrees of
deviation between elicited speech and a given speech target.
Evaluative = indicating whether a given speech target was reached or not.
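The difference between the two feedback types can be illustrated with a minimal sketch. It assumes, hypothetically, that a phoneme-level transcription of the child's production is already available; obtaining one reliably is exactly what the analyses above say ASR struggles with. The function names and the phoneme symbols are illustrative assumptions, not part of any system discussed here.

```python
from difflib import SequenceMatcher


def evaluative_feedback(recognized_word: str, target_word: str) -> bool:
    """Evaluative: report only whether the speech target was reached."""
    return recognized_word.strip().lower() == target_word.strip().lower()


def corrective_feedback(produced, target):
    """Corrective (toy version): list where the produced phonemes deviate.

    Both arguments are lists of phoneme symbols (hypothetical input).
    Returns (operation, target_segment, produced_segment) tuples.
    """
    deviations = []
    matcher = SequenceMatcher(None, target, produced, autojunk=False)
    for tag, t_start, t_end, p_start, p_end in matcher.get_opcodes():
        if tag != "equal":
            deviations.append((tag, target[t_start:t_end], produced[p_start:p_end]))
    return deviations


# Example: the target word "rabbit" produced as "wabbit".
target = ["r", "ae", "b", "ih", "t"]
produced = ["w", "ae", "b", "ih", "t"]
print(evaluative_feedback("wabbit", "rabbit"))
print(corrective_feedback(produced, target))
```

The evaluative function can only say that the target was missed, while the corrective function also says where the production deviated (here, "r" replaced by "w"). Interpreting that deviation in a clinically meaningful way is a further step that this sketch does not attempt.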
For an ASR module operating in a speech intervention context, if it
is unable to produce a lexicographic candidate for a given speech
input, then the system must decide whether the failure is a speech
error (the speaker's production was poor) or a recognition error
(the system's performance was poor).
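A small sketch can make this ambiguity concrete. It assumes a hypothetical recognizer result consisting of an optional word hypothesis and a confidence score; the names AsrResult, Outcome, and evaluate_attempt, and the 0.5 threshold, are assumptions made for illustration only. The sketch does not resolve the ambiguity; it only keeps it explicit instead of silently blaming the speaker.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    TARGET_REACHED = "target reached"
    TARGET_MISSED = "target missed"
    # No usable candidate: the cause may be the speech or the recognizer.
    UNDETERMINED = "speech error or recognition error"


@dataclass
class AsrResult:
    hypothesis: Optional[str]  # None when no candidate was produced
    confidence: float = 0.0    # hypothetical score in [0, 1]


def evaluate_attempt(result: AsrResult, target: str,
                     min_confidence: float = 0.5) -> Outcome:
    """Return evaluative feedback, or UNDETERMINED when no verdict is safe."""
    if result.hypothesis is None or result.confidence < min_confidence:
        return Outcome.UNDETERMINED
    if result.hypothesis.strip().lower() == target.strip().lower():
        return Outcome.TARGET_REACHED
    return Outcome.TARGET_MISSED


print(evaluate_attempt(AsrResult("rabbit", 0.9), "rabbit"))
print(evaluate_attempt(AsrResult(None), "rabbit"))
```

A system built this way could, for example, re-prompt the child on an UNDETERMINED outcome rather than report a failed attempt.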

Researchers have determined that systems that rely on abstract
visualizations as feedback do not seem to work well for children.
Neri et al. identified a major problem with showing the user
comparable waveforms, a popular form of feedback (e.g., Speech
Viewer II). Although showing target and input
waveforms in alignment can be motivating for the user (i.e., to try
to emulate the target waveform by modifying their
pronunciation), it does not necessarily lead to behavior
modification (i.e., correction of articulation).
Discussion Shifting from Analysis to
Elicitation Motivation

A key observation here is that incorporating ASR can make the
computer system responsive to speech even if it does not provide
detailed feedback.
In the context of speech intervention, even rudimentary feedback
can be of value, since it can motivate children to try multiple
repetitions of words and phrases.
This approach has been used in a remarkable study by Mitra et al.,
in the context of accent reduction, which found that even
rudimentary feedback was helpful.

Researchers in this domain may quickly conclude that the need for
high-quality corrective analytic feedback clearly motivates further
work on automated speech analysis, and such work is ongoing. One
example is acoustic training, which entails recording
representative speech samples from a user to create or augment an
acoustic database.

One point that needs mention is that the findings discussed here
are based on the assumption that the lexical unit is short and that
the language has a relatively low ratio of morphemes to words (e.g.,
as in English). We expect the results to generalize to other
moderately analytic languages, but they may not generalize more
broadly, for example to synthetic languages (e.g., Greek).