Google Duplex was convincing not only because of its voice, but also because of its emphasis and manner of speaking. It was the imperfection that gave rise to the impression of perfection. What happens if you take a human-like voice and design the emphasis so that it does not match what the machine is saying? The result is an effect that can be associated with the Uncanny Valley. In this example, produced using SSML, Allison tells you something very sad in a euphoric voice. It sounds a little creepy. Then she tells you something positive with an expression of regret. That sounds pretty weird. Of course, one can imagine a context in which this makes sense. One could assume that the robot would like to be a human being, and that it is therefore envious of humans and full of sadness about its own existence. Then you would have a jealous, sad robot that does not say what it means. Research is already underway in this area. In 2014, Jan Romportl published the paper "Speech Synthesis and Uncanny Valley". More research is needed to better understand the effect of synthetic voices, and deliberately making the emphasis inappropriate is one interesting method for it.
Fig.: The emphasis is inappropriate
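The exact markup behind the example is not shown here, so the following is only a rough sketch of how such a mismatch could be written. It assumes the generic `<prosody>` element from the W3C SSML specification instead of any voice-specific expressive tags. The preset names, the pitch and rate values, and the two sentences are made up for illustration. Some engines also expect `version` and `xmlns` attributes on `<speak>`, which are left out here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical delivery presets; the values are illustrative assumptions,
# not the settings used for the original example.
PROSODY = {
    "euphoric": {"pitch": "+20%", "rate": "fast"},
    "regretful": {"pitch": "-15%", "rate": "slow"},
}


def mismatched_ssml(lines):
    """Build an SSML string from (text, delivery) pairs."""
    parts = ["<speak>"]
    for text, delivery in lines:
        attrs = PROSODY[delivery]
        parts.append(
            '<s><prosody pitch="{pitch}" rate="{rate}">{text}</prosody></s>'.format(
                text=escape(text), **attrs
            )
        )
    parts.append("</speak>")
    return "".join(parts)


# Sad content with a euphoric delivery, positive content with a regretful one.
ssml = mismatched_ssml([
    ("I am so sorry, your flight has been cancelled.", "euphoric"),
    ("Congratulations, you passed the exam.", "regretful"),
])

# Parsing raises ParseError if the generated markup is not well-formed XML.
ET.fromstring(ssml)
print(ssml)
```

The resulting string can be passed to any text-to-speech service that accepts SSML. How strongly the mismatch comes across depends on the voice and on how the engine interprets the prosody attributes.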