
Text Annotation Guidelines for Hindi ASR



I. Task & UI

Cut a section of clear human speech from the audio and transcribe the audio into text.

Explanation:

• Default cut: the audio segment pre-selected by the system (gray area)

• Current cut: the audio that you cut (blue area)

• Audio classes:

○ Speech - clear human voice


○ Discard - audio does not meet ASR speech requirements

• Text box: where text is entered

• Video: helps you understand what the speaker means

II. Operation introduction


• Step 1. Listen to the intercepted audio

• Step 2. Select audio category

• Step 3-1. If you choose the discard class, submit the task directly. No transcription is needed.

• Step 3-2. If you choose the speech class, decide whether the audio needs to be cut, then transcribe it.

• Keyboard shortcut:

○ Space bar - submit


○ 1 - continue to play where you left off
○ 2 - pause
○ 3 - play from the beginning of the audio
○ 5 - play from the beginning of the default cut (gray area)
○ a - play from the beginning of the current cut (blue area)
○ s - start cut
○ e - end cut
○ ctrl+q(win) /fn+q(mac) - discard
○ shift+alt(win) /shift+option(mac) - switch to text box

III. Annotation Guidelines


1. Audio classes
There are two audio classes, 【speech】 and 【discard】, defined as follows:

a. speech:
i. The audio contains a part that is in Hindi and clear, which you can select.
ii. Transcription is required only when you choose speech.

b. discard:
i. The entire audio is in a non-Hindi language.

ii. The entire audio is unclear or inaudible speech.

iii. The entire audio is non-human sound, which includes melodies, singing, animal sounds
and nature sounds.

iv. When you choose discard, you do not need to transcribe. Just submit and move on to the next
sample.
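
The class definitions above reduce to a small decision rule. The following is a minimal sketch, assuming the annotator has already judged the three whole-audio conditions by ear. The function name and the boolean flags are hypothetical and are not part of the annotation tool.

```python
def choose_audio_class(entirely_non_hindi: bool,
                       entirely_unclear: bool,
                       entirely_non_human: bool) -> str:
    """Return "discard" if any whole-audio discard condition holds, else "speech".

    Each flag is a human judgement about the ENTIRE audio (hypothetical inputs).
    """
    if entirely_non_hindi or entirely_unclear or entirely_non_human:
        return "discard"   # no transcription needed, submit directly
    return "speech"        # a clear Hindi part exists: cut it and transcribe


print(choose_audio_class(False, False, False))
print(choose_audio_class(False, False, True))
```

Audio that is only partly unusable is still classed as speech; the unusable part is handled by cutting, as described in the next section.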

2. Cut speech
The system provides a default cut (gray area). You need to cut the speech part (blue area) inside the
default cut (gray area), as follows:
Cut operation: To select a clip, move the cursor to where you want the clip to start and click; a vertical
line appears at that position. Then click the [start cut] button or press the shortcut key [S]. Next, move
the cursor to where you want the clip to end and click; a second vertical line appears. Then click the
[end cut] button or press the shortcut key [E]. A blue area appears on the track; this is the clip you
selected. Click the [play cut] button or press the shortcut key [A] to check that the selected range is the
clip you want.
All of these operations must stay within the [default cut] range.
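
The range constraint above can be expressed as a simple check. This is only an illustrative sketch: the function name and the use of seconds as the time unit are assumptions, and the real tool may enforce the constraint differently.

```python
def is_valid_cut(cut_start: float, cut_end: float,
                 default_start: float, default_end: float) -> bool:
    """True if the current cut (blue) is non-empty and lies inside the default cut (gray).

    All times are in seconds from the start of the audio (an assumed unit).
    """
    return default_start <= cut_start < cut_end <= default_end


# A cut from 1.0 s to 2.5 s inside a default cut of 0.5 s to 3.0 s is valid.
print(is_valid_cut(1.0, 2.5, 0.5, 3.0))
# A cut that starts before the default cut is not.
print(is_valid_cut(0.2, 2.5, 0.5, 3.0))
```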

a. An unclear part is one where you cannot tell what the words are. Cut such parts
out.
b. Do not worry about the completeness of the sentence when cutting the audio.
c. Do not select overlapping speech (2 or more speakers talking simultaneously). There
are three ways to handle it, depending on the situation.
i. Discard: If 2 or more speakers talk about different things simultaneously throughout the entire
audio, discard the whole audio.

ii. Cut: If 2 or more speakers talk about different things simultaneously in only part of the audio,
cut that part out and keep the remaining clear part to transcribe.

iii. Keep and transcribe (do not cut or discard):

• If 2 or more speakers say the same words simultaneously and the words are clear,
include this part in the cut and transcribe it.

• If 2 or more speakers do not talk at the same time, treat the audio as normal
speech and transcribe it.

• If there is one main voice in a group conversation, the other voices are low or fuzzy, and the
main speaker's articulation is not affected by them, transcribe the main speaker and
treat the others as background sound or noise.
d. Do not select music, melodies, singing, animal sounds or nature sounds. There are 3
ways to handle them, depending on the situation.
○ If the entire audio is non-human sound, such as music, melodies, singing, animal sounds or
nature sounds, discard the audio.
○ If the background sound is a song with lyrics, cut that part out and keep the clear human speech,
or discard the entire audio if it is hard to cut. ***BUT if the background sound does not
affect the clarity of the speaker's speech, transcribe the speaker's speech and ignore the background
sound.
○ If the background sound is a melody without lyrics, keep it and transcribe the entire audio.

e. Do not keep any part that is so noisy or unclear that you cannot hear
what the speaker said. There are 3 ways to handle it, depending on the situation.
i. If noise fills the entire audio, discard it.

ii. If noise affects the content in only part of the audio, cut that part out.

iii. If the noise does not affect the main speaker's speech and you can hear the speech
clearly, keep it and transcribe. This matches the rules above for background voices and background sound.

f. The selected speech may start with (and end with) at most 2 modal words. If there
are many modal words at the beginning or the end, cut that part down so that only a
short portion remains.
▪ E.g. If the speech begins with a stretch of laughter (around 10 "ha"), keep only
a fraction of it in the cut (around 2, i.e. "ha ha").

▪ **More transcription rules for modal words are given in 3.f.

g. When the audio is a conversation with pauses or noise in the middle, you
do not need to cut the pause/noise part out; you can keep it in the audio. When you
transcribe, skip the pause/noise part.
▪ Example: "speech1 + pause/noise + speech2". Transcription: speech1 and speech2.
(This does not apply to overlapping speech.)
Note: If the noise affects the content, cut it out, keep any section of Hindi audio, and transcribe the
Hindi part. If the noise does not affect the content, ignore it and transcribe the entire audio.
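
Rule f above (at most 2 modal words at each edge of the cut) can be sketched as a boundary calculation. This sketch assumes word-level timestamps and a per-word "is modal" flag are available. Both are hypothetical inputs, since in practice the annotator makes this judgement by ear.

```python
from typing import List, Tuple

# (text, start_sec, end_sec, is_modal) -- a hypothetical word-level annotation
Token = Tuple[str, float, float, bool]


def trim_edge_modals(tokens: List[Token], max_edge: int = 2) -> Tuple[float, float]:
    """Return (cut_start, cut_end) keeping at most `max_edge` modal words at each edge."""
    if not tokens:
        raise ValueError("no tokens to cut")
    n = len(tokens)

    # Count the run of modal words at the beginning.
    lead = 0
    while lead < n and tokens[lead][3]:
        lead += 1

    # Count the run of modal words at the end (not overlapping the leading run).
    trail = 0
    while trail < n - lead and tokens[n - 1 - trail][3]:
        trail += 1

    first = max(0, lead - max_edge)          # drop surplus leading modal words
    last = n - 1 - max(0, trail - max_edge)  # drop surplus trailing modal words
    return tokens[first][1], tokens[last][2]


tokens = [
    ("ha", 0.0, 0.2, True), ("ha", 0.2, 0.4, True),
    ("ha", 0.4, 0.6, True), ("ha", 0.6, 0.8, True),
    ("मैं", 0.8, 1.0, False), ("आया", 1.0, 1.3, False),
]
print(trim_edge_modals(tokens))  # (0.4, 1.3): only the last two "ha" are kept
```

Modal words in the middle of the speech are left untouched, because the rule only limits the edges of the cut.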

3. Text transcription
a. Separate words with spaces.
b. Punctuation is not needed.
c. Numbers spoken in Hindi should be transcribed as Hindi words.
▪ e.g. 2 -> दो. English numbers are also frequently used in daily life; transcribe them
as Arabic numerals or in a form that is widely accepted in Hindi, e.g. two -> 2 or टू. Make sure
that the word you transcribe matches the word the speaker actually said and
pronounced.

d. Half-pronounced words: these mostly occur at the start or end of the audio, and
sometimes in the middle.
○ If the half-pronounced word at the beginning or the end of the audio is not a separate word, cut it
out and do not transcribe it. Eg1. The complete sentence is "I wanted you to be the American",
but what you hear is "I wanted you to be the Ame(əˈme)". Only "əˈme" is pronounced and it is not a
word, so cut it out and transcribe the rest. The correct transcription is "I wanted you to be the".
○ If the half-pronounced word at the beginning or the end of the audio is itself a separate word, transcribe
the entire audio. Eg2. The complete sentence is "I want to go to the supermarket", but what you hear
is "I want to go to the super(ˈsuːpə)". Only "ˈsuːpə" is pronounced and it is a word, so transcribe the
entire audio without considering the meaning. The correct transcription is "I want to go to the super".
○ If the half-pronounced word is in the middle, whether or not it is a separate word, do not cut it out.
Handle it in one of the following two ways:

▪ If the half-pronounced word is not a separate word, do not write it down; transcribe the rest. Eg3.
What you hear is "do you stst still love me". The two "st" are half-pronounced attempts at the word
"still", so the correct transcription is "do you still love me".

▪ If the half-pronounced word is caused by stuttering and is itself a separate word, write it
down and transcribe the entire audio. Eg4. What you hear is "The whole super super
supermarket was ruined in a great fire." (Maybe the speaker stuttered.) The two
"super" are half-pronounced attempts at the word "supermarket", and "super" is a word, so write "super"
twice. The correct transcription is "The whole super super supermarket was ruined in a great fire."

e. Repeated words and sentences must be transcribed exactly as many times as they
are repeated.
f. Modal words need to be transcribed, e.g. "ha ha", "hi", "yeap".
○ Transcribe modal words only when they are clear, meaning you can clearly count
them.

▪ E.g. Do not transcribe a period of laughter in which the individual modal words cannot be counted;
when the speaker says "ha ha" and the count is clear, transcribe it. For
repeated modal words, write down the same number of modal words as in the audio, e.g. for 3 "ha" in
the audio, write "ha ha ha" in the text.

g. The final cut audio must contain at least two words (≥2).


h. English in the audio
○ If there is a complete English sentence at the beginning (or the end), cut the English sentence out.

▪ For example: If you hear "Main kahna chahta hoon ki I love you, Jenny.", cut out
"I love you, Jenny." and transcribe only the Hindi part: मैं कहना चाहता हूँ कि.
○ If there are English words in the middle of a Hindi sentence, transcribe the English words
in Hindi, following the Hindi spelling rules for loanwords. You do not need to cut
the audio.

▪ For example: If you hear "main fon kar raha hoon" ("I am calling" in Hindi), "fon"
is the English word "phone", so transcribe it in Hindi as फोन. The sentence should be
"मैं फोन कर रहा हूँ".

i. When a word is a homophone and you cannot decide which spelling is correct,
there are two methods:
○ Listen to the rest of the default cut to confirm the whole sentence, then write the correct
word according to the context. (Attention: do not use any part beyond the default cut as a reference.)
○ Eg1. The current cut is "The hole" (or another word that sounds like /həʊl/ that you cannot confirm),
but from the rest of the default cut you know the sentence is "The whole town disagreed with the mayor."
You can then be sure that the word is "whole" rather than "hole", so the right
transcription is "the whole".
○ If several homophones fit the meaning of the default-cut sentence
(do not consider the meaning of the entire audio), you can write any of them.
○ Eg2. The default cut is "where is my deer/dear." Both words fit the meaning of the sentence,
so you can write either one.

j. When the speaker uses a shortened or colloquial form of a word, transcribe the
form that the speaker actually says.
k. Dialect (words from other local languages)
If you hear words from other local languages, such as Punjabi, Gujarati, Marathi etc., that are
pronounced slightly differently from Hindi, transcribe them with Hindi
spelling when you can understand the specific words. (This happens quite often, as locals
often speak Hindi mixed with words from their own mother tongue.)

▪ e.g. When you hear /dərvaja/ ("door" in Gujarati) and you know it means the door,
transcribe it in standard Hindi: दरवाज़ा (/dərvaza/).
However, when you cannot understand these dialect words or cannot work out how to write
them in Hindi spelling, choose discard and skip the transcription.
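
Rule 3.c above (numbers) depends on the language the speaker used for the number. The sketch below illustrates that choice for 0 to 10 only. The function name and the `spoken_in` labels are hypothetical, the Hindi spellings follow one common convention, and the transliterated option for English numbers (e.g. टू) is left out.

```python
# Hindi number words for 0-10; larger numbers are outside this sketch.
HINDI_NUMBER_WORDS = {
    0: "शून्य", 1: "एक", 2: "दो", 3: "तीन", 4: "चार", 5: "पाँच",
    6: "छह", 7: "सात", 8: "आठ", 9: "नौ", 10: "दस",
}


def transcribe_number(value: int, spoken_in: str) -> str:
    """Write a number the way the speaker said it.

    spoken_in is "hindi" or "english" (labels assumed for this sketch).
    """
    if spoken_in == "hindi":
        return HINDI_NUMBER_WORDS[value]   # spoken Hindi number -> Hindi word
    if spoken_in == "english":
        return str(value)                  # spoken English number -> Arabic numeral
    raise ValueError(f"unknown language label: {spoken_in!r}")


print(transcribe_number(2, "hindi"))    # दो
print(transcribe_number(2, "english"))  # 2
```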

Note
1. Please double check and make sure that the text aligns with the audio before moving on to the next
section.

2. Transcribe what you hear, including ungrammaticalities.

3. Transcriptions must be 100% accurate to the cut speech part.


4. All symbols and numbers in the audio must be transcribed to Hindi words accordingly.

Modal words:
• Modals are to be transcribed as they sound, like ah, hmm, oh, umm

Summary:

• Transcribe only Hindi audio.

• Singing is discarded.

• Clips in other languages, such as Punjabi, Marathi, Gujarati, English or Russian, are also discarded.

• No punctuation, such as full stops (.), commas (,), question marks (?) or exclamation marks (!).

• No speaker identification. Just write the transcription in one go.

• Cutting is one of the crucial parts and needs to be perfect.

• Do not place a cut boundary inside a word. Either include the word completely in the cut or exclude it
completely.
• Modal sounds such as ah, umm and hmm are to be transcribed.

• A transcription must have at least 2 words. A 1-word transcription does not count.

• Hindi + non-Hindi clips: skip those if they are confusing. If the Hindi part has 2 or more words,
cut it and transcribe.

• Difference between deferment and discard:

○ Deferment: You skip the clip and it does not count in your results. This means you are skipping a clip
you do not want to work on.
○ Discard: You are telling the system that the audio is not to be transcribed because
it is, for example, singing or non-Hindi. This affects your results if done wrong.
○ QC will check whether workers discarded correctly.

• Follow the rules of the Devanagari script when transcribing. Anything that does not match those rules will
be considered a mistake.
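
Several of the summary rules can be checked mechanically before submitting. The following is a rough sketch, not the QC procedure itself: it checks the two-word minimum, the absence of punctuation, and that every character is Devanagari. The function name is hypothetical, and allowing ASCII digits (for English numbers written as numerals) and zero-width joiners is an assumption of this sketch.

```python
import unicodedata
from typing import List

ALLOWED_EXTRA = set("0123456789") | {"\u200c", "\u200d"}  # digits, ZWNJ, ZWJ (assumed OK)


def check_transcript(text: str) -> List[str]:
    """Return a list of rule violations found in a transcription (empty if none)."""
    problems = []
    if len(text.split()) < 2:
        problems.append("fewer than 2 words")
    for ch in text:
        if ch.isspace():
            continue
        if unicodedata.category(ch).startswith("P"):
            # Any Unicode punctuation, including the Devanagari danda.
            problems.append(f"punctuation: {ch!r}")
        elif not ("\u0900" <= ch <= "\u097f" or ch in ALLOWED_EXTRA):
            # Outside the Devanagari Unicode block (U+0900 to U+097F).
            problems.append(f"non-Devanagari character: {ch!r}")
    return problems


print(check_transcript("मैं फोन कर रहा हूँ"))  # []
print(check_transcript("फोन"))               # ['fewer than 2 words']
```

A check like this only catches script-level mistakes. Whether the text matches the audio still has to be verified by listening.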
