(Text-to-Speech)
(Synthesis units)
12224 2690
1.
1.1.
waveform-concatenation
synthesis units
(phoneme)(di-phone)(syllable)
TTS
(Text analysis)
Part-of-Speech, POS
(Prosody prediction) Duration
Energy
(Pitch)
()TTS
()
fading-in, fading-out
rule-based [10] bigram
CART
(Text)
(prosody modification)
(Speech )
&&
1.2.
[1]
TTS(Text-to-Speech)
(treebank) probabilistic context-free grammar (PCFG)
PCFG [5]
[15]
PCFG PCFG
Bottom-up Cocke-Younger-Kasami (CYK) [1][7]
CYK
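The bottom-up CYK search over a PCFG mentioned in this section can be illustrated with a minimal sketch. It assumes a grammar in Chomsky normal form and takes a POS-tag sequence as input. The toy rules, their probabilities, the nonterminal `V`, and the function name `cyk_parse` are hypothetical and serve only as an illustration; they are not the grammar or parser used in the paper.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (illustration only).
# LEXICAL maps a POS tag to the nonterminals that can produce it.
LEXICAL = {"Nab": [("NP", 0.6)], "VC1": [("V", 1.0)]}
# BINARY maps a pair of child nonterminals to their possible parents.
BINARY = {("V", "NP"): [("VP", 1.0)], ("NP", "VP"): [("S", 1.0)]}


def cyk_parse(tags, lexical, binary, start="S"):
    """Return (probability, tree) of the most probable parse, or None."""
    n = len(tags)
    # best[(i, j)][A] = (probability, backpointer) for the span tags[i:j]
    best = defaultdict(dict)
    for i, tag in enumerate(tags):
        for lhs, p in lexical.get(tag, []):
            best[(i, i + 1)][lhs] = (p, tag)
    # Fill longer spans from shorter ones (bottom-up).
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (b, c), parents in binary.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        for lhs, p in parents:
                            prob = p * best[(i, k)][b][0] * best[(k, j)][c][0]
                            if prob > best[(i, j)].get(lhs, (0.0,))[0]:
                                best[(i, j)][lhs] = (prob, (k, b, c))
    if start not in best[(0, n)]:
        return None

    def build(i, j, sym):
        # Follow backpointers to rebuild the tree as nested tuples.
        _, back = best[(i, j)][sym]
        if isinstance(back, str):
            return (sym, back)
        k, b, c = back
        return (sym, build(i, k, b), build(k, j, c))

    return best[(0, n)][start][0], build(0, n, start)


result = cyk_parse(["Nab", "VC1", "Nab"], LEXICAL, BINARY)
```

Each chart cell keeps only the most probable derivation per nonterminal, so the search is cubic in the sentence length rather than enumerating every tree.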
1.3.
TTS
(Sentence, S) (Prosodic phrase, PP) (Prosodic word, PW) (word)
4
(word)
Prosodic Word PW
Prosodic Phrase PP
Sentence
-
minor break
major break)
1.4.
[6] (word-based)
word-based
corpus-based
word-based corpus-based
12224 2690
409 409*5
1300
2.
CART
2.1.
2.1
1,923 6,250
51,525
EX:
[Figure: example decision tree with question nodes (?) and Yes/No branches]
CART
A B
B
(no break)(minor break)
(major break) CART CART
CART
2.3.
[Figure: Input sentence → POS Tagger → POS sequence → Parser → Syntax tree (Text analysis) → Prosodic hierarchy]
[13]
(prosodic hierarchy)
(Text analysis)
[Figure: example syntax tree. Node labels: S, NP, VP, Nac, DE, Ndabe, VH, Caa, VC31, Nab, Dc, VC1, Baa, Vj3, Nad, Neqa]
2.3.1.
(prosodic
word)(prosodic phrase)(
)()
()
()
()
1-3
4-8
9-12
13-18
19-22
23-30
31-39
40-41
>42
PW
PP
12
13
14
15
16
16
2.3.2.
(bottom-up)
[Figure: example syntax tree. Node labels: S, NP, VP, VH, Caa, VC1, Nab, DE, Daa, Vj3]
NP() VH()
VC1() VP()
Daa()Vj3()NP()
Daa() Vj3()
NP
VP
VP
VP
-
()()
9
15
16
[16]
- (Bottom-up)
3.
rule-based
[Figure: rule-based unit-selection flowchart. Input: (Phone Sequence). Decision nodes with Yes/No branches lead to Rule 1, Rule 2 (Was the syllable in a word which was recorded?), and Rule 3.]
(waveform concatenation)
( )
[11]
4.
TTS
[13]
overlapping
(ms)
Minor break
50
Major break
250
400
625
625
625
500
250
300
Prosodic Word
Overlapping
[8]
0.05
0.1
0.15
0.2
fading-out
fading-in
fading-out n
x_1, x_2, x_3, ..., x_n (1) (2)
fading-in fading-out1
fading-out2fading-in3
fading-out
\[ x_i(\text{fading\_out}) = x_i \cdot \frac{n - i + 1}{n + 1}, \quad i = 1 \sim n \qquad (1) \]
\[ x_i(\text{fading\_in}) = x_i \cdot \frac{i}{n + 1}, \quad i = 1 \sim n \qquad (2) \]
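A minimal sketch of the overlapping step follows. It assumes that equations (1) and (2) use the linear weights (n - i + 1)/(n + 1) for fading-out and i/(n + 1) for fading-in, for i = 1..n. The function names and the plain-list sample representation are illustrative choices, not the paper's implementation.

```python
def fade_out(samples):
    """Scale n samples by (n - i + 1) / (n + 1) for i = 1..n (assumed form of eq. 1)."""
    n = len(samples)
    return [x * (n - i + 1) / (n + 1) for i, x in enumerate(samples, start=1)]


def fade_in(samples):
    """Scale n samples by i / (n + 1) for i = 1..n (assumed form of eq. 2)."""
    n = len(samples)
    return [x * i / (n + 1) for i, x in enumerate(samples, start=1)]


def overlap_add(left, right, n):
    """Join two waveforms, overlapping the last n samples of `left`
    with the first n samples of `right`."""
    if n == 0:
        return list(left) + list(right)
    if n < 0 or n > min(len(left), len(right)):
        raise ValueError("overlap length must fit inside both waveforms")
    tail = fade_out(left[-n:])
    head = fade_in(right[:n])
    mixed = [a + b for a, b in zip(tail, head)]
    return list(left[:-n]) + mixed + list(right[n:])


joined = overlap_add([1.0] * 5, [1.0] * 5, 3)
```

With these weights the two gains sum to 1 at every overlapped sample, so a constant signal keeps its amplitude across the joint.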
5.
5.1.
Version 2.1
54,902 %
51939
inside test
[15]
(tree structure) (Label) (Bracket)
PARSEVAL [4]
[15]
SP(Structure Precision)
SP=
LP: Labeled Precision
LR: Labeled Recall
LF: Labeled F-measure
\[ LF = \frac{LP \times LR \times 2}{LP + LR} \]
BP: Bracketed Precision
BR: Bracketed Recall
BF: Bracketed F-measure
\[ BF = \frac{BP \times BR \times 2}{BP + BR} \]
(%)   SP      LP      LR      LF      BP      BR      BF
      38.78   61.96   64.31   63.11   70.04   72.80   71.39
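As a quick consistency check, the F-measures can be recomputed from the reported precision and recall figures. The helper name `f_measure` is an illustrative choice.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall: 2 * P * R / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values in percent.
lf = f_measure(61.96, 64.31)  # labeled precision and recall
bf = f_measure(70.04, 72.80)  # bracketed precision and recall
print(round(lf, 2), round(bf, 2))
```

Both results round to the reported LF and BF values.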
70.04%
72.80%(parse tree)
5.2.
confusion matrix
                     Predicted labels
True labels      B0      B1      B2
B0               C00     C01     C02
B1               C10     C11     C12
B2               C20     C21     C22
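B_i (i = 0, 1, 2) denotes the break type: B_0 (no break), B_1 (minor break), B_2 (major break). C_ii (i = 0, 1, 2) is the number of B_i breaks predicted correctly, and C_ij (i, j = 0, 1, 2; i ≠ j) is the number of B_i breaks predicted as B_j.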
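(Recall)
\[ Rec_i = \frac{C_{ii}}{\sum_{j=0}^{2} C_{ij}}, \quad i = 0, 1, 2 \]
B0 (no break):
\[ Rec_0 = \frac{C_{00}}{C_{00} + C_{01} + C_{02}} \]
(Precision)
\[ Pre_i = \frac{C_{ii}}{\sum_{j=0}^{2} C_{ji}}, \quad i = 0, 1, 2 \]
B0 (no break):
\[ Pre_0 = \frac{C_{00}}{C_{00} + C_{10} + C_{20}} \]
(Accuracy)
\[ Acc = \frac{\sum_{i=0}^{2} C_{ii}}{\sum_{i=0}^{2} \sum_{j=0}^{2} C_{ij}} = \frac{C_{00} + C_{11} + C_{22}}{C_{00} + C_{01} + C_{02} + C_{10} + C_{11} + C_{12} + C_{20} + C_{21} + C_{22}} \]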
CART
CART
Acc1 Acc2 B1 B2
- CART
                     Predicted labels
True labels      B0        B1        B2
B0               30,434    3,198     126
B1               5,758     6,810     372
B2               635       1,381     514

Acc1 = 0.767   Acc2 = 0.791
Pre0 = 0.826   Pre1 = 0.598   Pre2 = 0.508
Rec0 = 0.902   Rec1 = 0.526   Rec2 = 0.203
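The recall, precision, and accuracy definitions can be checked against the CART counts. The sketch below treats rows as true labels and columns as predicted labels; the function name `break_metrics` is an illustrative choice. Only the first accuracy figure (Acc1) is recomputed here.

```python
def break_metrics(matrix):
    """Return (recall, precision, accuracy) for a square confusion matrix
    whose rows are true labels and whose columns are predicted labels."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Recall: diagonal count divided by its row sum (all true B_i).
    recall = [matrix[i][i] / sum(matrix[i]) for i in range(size)]
    # Precision: diagonal count divided by its column sum (all predicted B_i).
    precision = [
        matrix[i][i] / sum(matrix[j][i] for j in range(size))
        for i in range(size)
    ]
    accuracy = sum(matrix[i][i] for i in range(size)) / total
    return recall, precision, accuracy


# CART confusion matrix: B0 (no break), B1 (minor break), B2 (major break).
cart = [
    [30434, 3198, 126],
    [5758, 6810, 372],
    [635, 1381, 514],
]
rec, pre, acc = break_metrics(cart)
print([round(r, 3) for r in rec], [round(p, 3) for p in pre], round(acc, 3))
```

The recomputed values agree with the reported Rec, Pre, and Acc1 figures to three decimals.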
- Bottom-Up
                     Predicted labels
True labels      B0        B1        B2
B0               156090    7384      4502
B1               54390     5471      5062
B2               7854      1828      2911

Acc1 = 0.669   Acc2 = 0.698
Pre0 = 0.715   Pre1 = 0.372   Pre2 = 0.233
Rec0 = 0.929   Rec1 = 0.084   Rec2 = 0.231
5.3.
(naturalness) preference testing
(intelligibility) MOS (Mean Opinion Score)
5 excellent
4 good
3 fair
2 poor
1 unsatisfactory
8 6 2 MOS
paragraphs 20 15 25
Prosodic Word
           ( MOS)              ( MOS)
M01        4.05     0.497               0.474
M02        3.15     0.963      3.15     0.792
M03        3.3      0.714      3.3      0.640
M04        4.385    0.504      4.67     0.181
M05        3.55     0.668      3.65     0.852
M06        3.9      0.538               0.632
M07        4.2      0.748               0.707
M08        2.95     0.804      2.95     0.804
Average    3.68                3.715
M01        40    60
M02        50    50
M03        55    45
M04        50    50
M05        70    30
M06        70    30
M07        50    50
M08        55    45
Average    55    45
MOS
CART
CART MOS [3]
5
4.5
4
3
2
1
MOS
MOS CART
CART
CART
MOS
CART
        3.45   3.65   3.55   3.35   3.5    3.9    4.2    4.1    3.71
        3.48   3.9    3.45   3.95   3.45   4.18   3.88   4.25   3.82
CART    4.66   4.66   4.66   4.66   3.66   4.42   4.33   4.83   4.52
6.
97.2%
http://140.120.15.239/onlineTTS/cgitest.html
() or ()
[1] Aho, A. V. and Ullman, J. D., "The Theory of Parsing, Translation, and Compiling", Vol. 1, Prentice-Hall, Englewood Cliffs, NJ, 1972.
[2] Breiman, L., Friedman, J. H., Olshen, R. A., et al., "Classification and Regression Trees", Wadsworth, Inc., 1984.
[3] Bao, H., Wang, A., Lu, S., "A Study of Evaluation Method for Synthetic Mandarin Speech", Proceedings of ISCSLP 2002, pp. 383-386, Taipei, Taiwan.
[4] Charniak, E., "Treebank Grammars",
[6] Chu, M., Peng, H., Yang, H. Y. and Chang, E., "Selecting Non-Uniform Units from A Very Large Corpus for Concatenative Speech Synthesizer", Proceedings of ICASSP 2001, IEEE, Volume 2, pp. 785-788, Salt Lake City.
[7] Ney, H., "Dynamic Programming Parsing for Context-Free Recognition", IEEE Transactions on Signal Processing, 39(2), pp. 336-340, 1991.
[8] Hwang, S. H. and Yei, C. Y., "The Synthesis Unit Generation Algorithm for Mandarin TTS", Proceedings of ICASSP 2002, IEEE, Volume 1, pp. 457-460, Orlando, Florida.
[9], "",
, 1998
[10], "", , 2001
[11], "", ,
2005
[12], "", , 2005
[13], "", , 2004
Caa
Cab
Cba
Cbb
Da
DE
DE
, , ,
Dfa
Dfb
Di
Dk
FW
Na
Nb
Nc
Ncd
Ncd
Nd
Nep
Ne
Neqa
Ne
Neqb
Ne
Nes
Ne
Neu
Ne
Nf
Ng
Ng
Nh
SHI
VA
VAC
VB
VC
VCL
VD
VE
VF
VG
VH
VHC
VI
VJ
VK
VL
V_2