FFTNET: A REAL-TIME SPEAKER-DEPENDENT NEURAL VOCODER

Zeyu Jin, Adam Finkelstein, Gautham J. Mysore and Jingwan Lu

This paper will appear at ICASSP 2018.
This page contains the sentences used in the MOS test described in Section 3.
Click in the grid of buttons below to play the audio.
Button colors range from red (bad) to green (good) and encode the scores, which are also shown as text in the button labels.

female2 (SLT)

[Audio grid: columns MLSA, WN, FFT, WN+, FFT+, REAL; one row per sentence, 1 to 50]

male1 (BDL)

[Audio grid: columns MLSA, WN, FFT, WN+, FFT+, REAL; one row per sentence, 1 to 50]

male2 (RMS)

[Audio grid: columns MLSA, WN, FFT, WN+, FFT+, REAL; one row per sentence, 1 to 50]

female1 (CLB)

[Audio grid: columns MLSA, WN, FFT, WN+, FFT+, REAL; one row per sentence, 1 to 50]