This section contains the sentences used in the MOS and Identification Tests.
Click in the grid of buttons below to play the audio.
Colors (ranging from red=bad to green=good) encode the scores,
which are also shown as text in the button labels.
To switch between the results of the two tests (MOS and ID), use the toggle button on the page.
Synth | We use the source TTS voice to synthesize the word and insert it in context. |
CUTE | Based on our framework, we use CUTE (not VoCo) for the voice conversion. |
Auto | The VoCo method using pre-defined α and β values in range selection. |
Choose | We manually choose one synthesis from several alternatives (up to 16), if it improves on Auto above. |
Edit | We use the editing interface to further refine the synthesis, if it improves on Auto/Choose. |
Real | The actual human recording, without modification. |
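The red-to-green score encoding described above can be sketched as a linear interpolation between two colors. This is only an illustration, not the page's actual code: the function name `score_to_color` and the default 1-to-5 score range are assumptions (the range is the usual MOS scale; the ID test may use a different one).

```python
def score_to_color(score, lo=1.0, hi=5.0):
    """Map a score to a hex color between red (lo) and green (hi).

    Hypothetical helper: the range [lo, hi] is an assumption,
    not something the page specifies.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Clamp the score to the range, then normalize to [0, 1].
    t = (min(max(score, lo), hi) - lo) / (hi - lo)
    red = round(255 * (1.0 - t))   # full red at the low end
    green = round(255 * t)         # full green at the high end
    return f"#{red:02x}{green:02x}00"


def button_label(score):
    """Text shown on the button next to the color, to one decimal place."""
    return f"{score:.1f}"
```

Scores outside the assumed range are clamped, so an out-of-range value still yields pure red or pure green.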
Male 1 (DBL) | Male 2 (RMS) | Female 1 (CLB) | Female 2 (SLT) |
This section addresses the question of whether audio experts could do as well
as VoCo using conventional audio editing software.
It contains just four sentences, and results are compared by an MOS test.
In each sentence, a word is replaced with a different word by a generic TTS,
by two experts using audio editing software, and by VoCo.
word | Synth | Expert1 | Expert2 | VoCo |
mentioned | | | | |
director | | | | |
benefit | | | | |
television | | | | |
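An MOS comparison of this kind comes down to averaging listener ratings per condition. The following is a minimal sketch under stated assumptions: the function name `mean_opinion_scores`, the condition labels, and the ratings are hypothetical placeholders for illustration, not data or results from these tests.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Return {condition: MOS}, where MOS is the mean of that condition's ratings.

    `ratings` maps a condition name to a list of listener scores.
    Conditions with no ratings are skipped rather than reported.
    """
    return {cond: mean(scores) for cond, scores in ratings.items() if scores}


# Placeholder ratings only (made up for illustration, NOT study results).
example_ratings = {
    "condition_a": [4, 5, 3, 4],
    "condition_b": [2, 3, 2, 3],
    "condition_c": [],  # no ratings collected, so it is omitted
}

mos = mean_opinion_scores(example_ratings)
best = max(mos, key=mos.get)  # condition with the highest MOS
```

A fuller analysis would typically also report confidence intervals, since small rating counts make the mean alone unreliable.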