CandyVoice develops innovative, real-time software solutions for digital voice processing applications, based on its patented MUVOC® technology.

MUVOC® is a unique suite of ready-to-use, built-in modules that are independent but work in synergy:

  • MUVOC® Voice Conversion
  • MUVOC® Voice Transformation
  • MUVOC® Noise Filter
  • MUVOC® Lip-Sync
  • MUVOC® Voice Print
  • MUVOC® Speaker Recognition

MUVOC® handles (monitors, analyses and computes) all the essential sound parameters, which can then be exploited by the downstream modules (voice conversion, voice transformation, noise filtering, lip-sync, and learning/recognition), possibly in distributed systems.

This flexible configuration simplifies development and maintenance for the customer, allows fast operation even on simple processors (ARM, DSP, Intel, etc.), and makes MUVOC® compatible with most embedded chipsets and operating systems.

MUVOC® operates at sample rates from 8 000 Hz to 96 000 Hz, with 16-bit or 24-bit samples.
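MUVOC®'s own audio interface is not documented here, so the following is only a generic Python sketch of what accepting this input range involves: checking a stream's format against the stated limits, and decoding raw little-endian PCM (the layout used in WAV files, assumed here for mono data) into floating-point samples. The function names are hypothetical.

```python
# Hypothetical helpers; limits taken from the stated sample-rate and bit-depth range.
SUPPORTED_RATE_RANGE = (8_000, 96_000)  # Hz
SUPPORTED_BIT_DEPTHS = (16, 24)


def check_format(sample_rate: int, bit_depth: int) -> None:
    """Raise ValueError if the stream format is outside the stated range."""
    lo, hi = SUPPORTED_RATE_RANGE
    if not lo <= sample_rate <= hi:
        raise ValueError(f"sample rate {sample_rate} Hz outside {lo}-{hi} Hz")
    if bit_depth not in SUPPORTED_BIT_DEPTHS:
        raise ValueError(f"unsupported bit depth: {bit_depth}")


def pcm_to_float(data: bytes, bit_depth: int) -> list[float]:
    """Decode little-endian signed mono PCM (16- or 24-bit) to floats in [-1, 1)."""
    width = bit_depth // 8  # 2 bytes for 16-bit, 3 bytes for packed 24-bit
    if len(data) % width:
        raise ValueError("byte count is not a multiple of the sample width")
    scale = float(1 << (bit_depth - 1))  # 32768 or 8388608
    return [
        int.from_bytes(data[i:i + width], "little", signed=True) / scale
        for i in range(0, len(data), width)
    ]
```

At the top of the range, one channel of 96 000 Hz, 24-bit audio amounts to 96 000 × 3 = 288 000 bytes per second.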

MUVOC® Voice Conversion

Voice conversion consists of reproducing one person's voice (the target voice) from the voice of another person (the source voice) or from a Text-to-Speech (TTS) voice.

How does it work?

Professionals / Partners:

Voice conversion first requires learning a mathematical voice model of both the source voice and the target voice from their respective recordings (made in the studio or taken from archives). These voice models are then processed to generate a conversion model from the source voice to the target voice. Thus, one person (the source voice) can speak with the voice of another person (the target voice) in real time.
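The exact models MUVOC® learns are not described here. As a hedged illustration of the "learn both models, then combine them" idea, the Python sketch below swaps in a classic and much simpler textbook technique: the log-F0 mean-variance transformation often used as a pitch-conversion baseline in voice conversion research. Each "voice model" is just the mean and standard deviation of log pitch over voiced frames, and the conversion model maps a source frame's pitch onto the target's statistics. Real systems also convert the spectral envelope (timbre), which this sketch ignores; the class and function names are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class PitchModel:
    """Toy 'voice model' (hypothetical): mean and std of log-F0 over voiced frames."""
    mean: float
    std: float


def learn_pitch_model(f0_hz: list[float]) -> PitchModel:
    # Unvoiced frames are conventionally marked with F0 = 0; ignore them.
    logs = [math.log(f) for f in f0_hz if f > 0]
    return PitchModel(statistics.fmean(logs), statistics.pstdev(logs))


def make_converter(src: PitchModel, tgt: PitchModel):
    """Combine the two voice models into a source-to-target conversion function."""
    ratio = tgt.std / src.std  # assumes the source pitch actually varies (std > 0)

    def convert(f0: float) -> float:
        if f0 <= 0:
            return 0.0  # keep unvoiced frames unvoiced
        # Standardise against the source statistics, re-scale to the target's.
        return math.exp(tgt.mean + ratio * (math.log(f0) - src.mean))

    return convert
```

For example, `make_converter(learn_pitch_model(src_f0), learn_pitch_model(tgt_f0))` returns a function applied frame by frame; a closed-form per-frame mapping like this is cheap to evaluate in real time.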

The voice of the performer chosen by the customer, which the voice imitation is to reproduce, is recorded in a studio.

These recordings are pre-processed semi-automatically to create the vocal model. At this stage, the customer can already hear a fairly representative rendering of the performer's voice.

A substantial phase of supervised processing of the vocal model then brings the voice reproduction to optimal quality.

When the voice to be reproduced comes from a video or audio archive, CandyVoice can create a quality vocal model from that material, provided the recordings are of high quality, homogeneous (recorded in the same period and in the same acoustic environment), and free of noise.

Individual users:

Users can customize a Text-to-Speech voice and create their own vocal model quickly, easily, and for free!

Creating the vocal model requires first recording the user’s voice reading 160 phonetically balanced sentences, at home on a smartphone or a PC, using dedicated CandyVoice applications (available soon).

1. TTS voice customization:

TTS voices available on the market are very expensive and time-consuming to create, and few voices are offered for each language. With the MUVOC® Voice Conversion technique, any TTS can be customized quickly and inexpensively with new voices, for professional or general-public use.

Scope of application:
  • Health

    With TTS voice customization, we can reproduce the voice of people who are likely to lose the use of it, because of a throat disease (upper aerodigestive tract cancers, ALS, etc.) or the consequences of upcoming surgery, and give it back to them.

    This ‘reconstruction’ requires recording the user’s voice beforehand (in a studio, or at home via our smartphone app or a PC) in order to create their personal vocal model and conversion model.

    The user’s voice can then be integrated into any Augmentative and Alternative Communication (AAC) device or other communicating object.

    For people who could not anticipate the loss of their voice and did not record it beforehand, we can create a custom voice of their choice that best fits their personality, or ‘collect’ and process a voice from a family member. As in the first case, the chosen voice can then be integrated into any communicating device.

  • Robotics and communicating objects

    The customized TTS makes it possible to choose from a multitude of different voices, according to the use case or the user’s preferences. For example, a domestic robot with a voice that is familiar to and recognizable by its user would be more easily accepted in the user’s environment.

  • Safeguarding the voices of loved ones

    A child's voice changes rapidly and is lost for good at puberty. The voice model is a snapshot of the voice at a given moment, which can be replayed or reused at any time.

    It is also a valuable backup in case the person loses the use of their voice.

  • Messaging and social networks

    The customized TTS lets users listen to, rather than read, all written messages (email, SMS, social network posts, etc.) in the sender’s voice. We can thus devote more time and attention to our activities without keeping our eyes riveted on our screens.

2. Real-Time Voice Conversion

Visual design in video games, especially with virtual reality headsets, has reached a very high level. MUVOC® Real-Time Voice Conversion technology can enhance this experience by adding the sound dimension: the player’s voice can imitate the characters’ voices in real time, which increases immersion in the game and enriches the player's sensory experience.

3. Visemes

Viseme detection (identifying the mouth shapes that correspond to the sounds being pronounced) is an option of voice conversion that makes it possible to animate the mouth of a character or a humanoid robot.

[Figure: Creating voice and imitation models for individuals]

[Figure: Creating voice and imitation models for companies]

MUVOC® Voice Transformation

MUVOC® Voice Transformer is part of CandyVoice’s digital voice processing toolbox, and combines MUVOC® Core (Analysis and Synthesis) with a voice parameter modifier.

MUVOC® Voice Transformer transforms one voice into another very realistically by adjusting several vocal parameters simultaneously: timbre, gender (male, female), age (child, teen, adult, elderly), expressiveness, and many other amusing and original changes (ogre, helium, etc.).

Thus, from a single voice, we can create a multitude of different voices: for example, a man's voice can be transformed into a woman's, a boy's, a girl's, or an elderly man's or woman's voice.
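How MUVOC® Voice Transformer represents its parameters is not public. Perceived gender and age depend largely on two cues, the pitch (F0) and the positions of the formants (resonances shaped by vocal-tract length), so the hedged Python sketch below shows how those two could be scaled independently on one analysed frame. `TransformParams` and `transform_frame` are hypothetical, and the values in the usage example are illustrative rather than MUVOC® presets.

```python
from dataclasses import dataclass


@dataclass
class TransformParams:
    """Hypothetical parameter set, not a MUVOC® data structure."""
    pitch_semitones: float  # shift applied to the fundamental frequency (F0)
    formant_ratio: float    # scaling applied to formant frequencies (vocal-tract cue)


def semitones_to_ratio(semitones: float) -> float:
    # Equal-tempered semitones: +12 semitones doubles the frequency.
    return 2.0 ** (semitones / 12.0)


def transform_frame(f0_hz: float, formants_hz: list[float], p: TransformParams):
    """Scale pitch and formants of one analysed frame; unvoiced frames (F0 = 0) stay unvoiced."""
    new_f0 = f0_hz * semitones_to_ratio(p.pitch_semitones) if f0_hz > 0 else 0.0
    new_formants = [f * p.formant_ratio for f in formants_hz]
    return new_f0, new_formants
```

For instance, `transform_frame(120.0, [500.0, 1500.0], TransformParams(pitch_semitones=12, formant_ratio=1.15))` doubles the pitch and raises the formants by 15%, a direction typically associated with a higher, younger or more feminine-sounding voice; a synthesis stage would then render the modified parameters as audio.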

Besides these realistic effects, MUVOC® Voice Transformer can apply ‘artificial’ (non-realistic) effects on the voice, allowing the user to create a wide range of fantastic and original voices!

Other settings adjust the playback speed and enable fast-forward and rewind functions.

Scope of application:
  • Video games: creation of vocal avatars (virtual reality)
  • Toys: voice changers
  • Karaoke: vocal effects, voice transformation, tuning, etc.
  • Professional mixing
  • Applications or telephone services
  • Marketing and telecommunication services, etc.

MUVOC® Noise Filter

MUVOC® Noise Filter removes or drastically attenuates ambient noise, even in very noisy environments, without altering the natural character of the original voice, and it also enhances voice clarity and intelligibility! MUVOC® Noise Filter combines MUVOC® Core Analysis with an adaptive filter.
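MUVOC®'s adaptive filter is proprietary, so the sketch below swaps in a well-known single-channel textbook method, power spectral subtraction with a recursively updated noise estimate, purely to illustrate how one microphone can suffice: the noise is learned over time from pauses in speech rather than from a second microphone. It assumes the caller has already split the signal into frames and computed per-bin power spectra (for example with an FFT); the function name, the crude energy-based speech detector, and all parameter defaults are assumptions.

```python
import math


def noise_suppression_gains(frames, alpha=0.95, floor=0.1, speech_factor=2.0):
    """Per-bin amplitude gains for a single-channel spectral-subtraction sketch.

    frames: sequence of power spectra, each a list of per-bin powers |Y|^2.
    The noise estimate starts from the first frame (assumed noise-only) and is
    updated recursively whenever a frame looks like noise rather than speech.
    """
    noise = list(frames[0])
    all_gains = []
    for power in frames:
        # Crude voice-activity test: a frame well above the noise energy is speech.
        if sum(power) < speech_factor * sum(noise):
            noise = [alpha * n + (1 - alpha) * p for n, p in zip(noise, power)]
        gains = []
        for p, n in zip(power, noise):
            # Subtract estimated noise power, convert to an amplitude gain.
            g = math.sqrt(max(0.0, 1.0 - n / p)) if p > 0 else 0.0
            gains.append(max(floor, g))  # spectral floor
        all_gains.append(gains)
    return all_gains
```

Each gain is multiplied with the matching bin's magnitude before resynthesis. The `floor` parameter keeps a little residual noise instead of zeroing bins, which limits the "musical noise" artefacts typical of plain spectral subtraction.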

MUVOC® Noise Filter has been used for a decade in digital radios for sports referees worldwide.

MUVOC® Noise Filter needs only one channel (a single microphone), which makes it well suited and highly effective for improving the quality of calls made with earphones on mobile phones. Mobile phones are currently equipped with very efficient noise filters, but these are disabled when earphones are used: their efficiency relies on two microphones, whereas earphones have only one.

For example, users who are listening to music on earphones and answer a call do not suspect that the noise filter is then disabled, which can be very unpleasant for the person they are talking to. MUVOC® Noise Filter can effectively solve this problem and improve listening conditions for calls made from noisy environments.

MUVOC® Noise Filter can also be used to improve the performance of speech recognition software, which is very sensitive to surrounding noise.

Any system that uses voice recognition (for example, robots or communicating objects) can be disturbed by external or internal noise. Depending on the type of noise, MUVOC® Noise Filter can gain about 10 dB on the speech recognition performance curve.
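Because decibels are logarithmic, a 10 dB figure is larger than it may sound. One reasonable reading of this claim (an interpretation, not a statement from CandyVoice) is that the recognition-rate-versus-SNR curve shifts by about 10 dB, so the recognizer reaches the same accuracy at a signal-to-noise ratio 10 dB lower. With SNR defined in the usual way and speech power held fixed, that corresponds to tolerating ten times more noise power, or about 3.16 times more noise amplitude:

```latex
\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\frac{P_{\mathrm{speech}}}{P_{\mathrm{noise}}},
\qquad
\Delta\mathrm{SNR} = 10\ \mathrm{dB}
\;\Longleftrightarrow\;
\frac{P'_{\mathrm{noise}}}{P_{\mathrm{noise}}} = 10^{10/10} = 10,
\qquad
\frac{A'_{\mathrm{noise}}}{A_{\mathrm{noise}}} = 10^{10/20} \approx 3.16
```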

Scope of application:
  • Mobile Telephony
  • Communication between referees (football, rugby...)
  • Security
  • Event organization
  • Concerts
  • Voice recognition (STT)
  • Construction industry
  • Hearing aids

MUVOC® Voice Print

MUVOC® Voice Print combines MUVOC® Core and the word recognition module.

This module recognizes previously recorded words and sentences in order to trigger commands, and it also identifies the speaker.

A single recording of a command is sufficient to detect that command and identify the speaker.
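The text does not say how MUVOC® Voice Print matches a spoken command against a single stored recording. A classic technique for exactly this one-template setting is dynamic time warping (DTW), which aligns two utterances spoken at different speeds before comparing them; the hedged Python sketch below uses it as a stand-in. It assumes each utterance has already been turned into a sequence of fixed-length feature vectors (for example, cepstral coefficients per frame); the function names, command labels and threshold are hypothetical.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of equal-length feature vectors,
    normalised by n + m so utterances of different lengths are comparable."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # Best of: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def recognize(utterance, templates, threshold):
    """Return the label of the closest enrolled template, or None if none is close enough."""
    best_label, best_dist = None, math.inf
    for label, template in templates.items():
        dist = dtw_distance(utterance, template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None
```

Because each enrolled template comes from one specific speaker, a close match is also evidence about who spoke, which is one way a single recording could serve both command detection and speaker identification; rejecting matches above `threshold` keeps unrelated speech from triggering commands.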

Because MUVOC® remains effective even in very noisy environments, it increases reliability and word recognition rates.

Scope of application:
  • Speaker Recognition
  • Voice activation and control of robots and communicating objects
  • Vocal passwords

MUVOC® Lip-Sync

MUVOC® Lip-Sync combines MUVOC® Core and the viseme recognition module. It uses the speech model learned for MUVOC® Voice Conversion.
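MUVOC® Lip-Sync's viseme set is not documented here. As a hedged sketch of the general idea, the Python below maps a timed phoneme sequence (assumed to come from the speech model) onto visemes, i.e. groups of sounds that look the same on the lips, and merges consecutive identical visemes into one mouth-animation segment. The grouping is a deliberately small, illustrative one using ARPAbet-style symbols, not a standard or MUVOC® viseme inventory; unlisted phonemes fall back to a neutral mouth shape.

```python
# Illustrative, simplified phoneme-to-viseme groups (hypothetical, ARPAbet-style symbols).
VISEME_GROUPS = {
    "closed": {"P", "B", "M"},     # lips pressed together
    "lip_teeth": {"F", "V"},       # lower lip against upper teeth
    "rounded": {"UW", "OW", "W"},  # rounded lips
    "open": {"AA", "AE", "AH"},    # jaw open
}
PHONEME_TO_VISEME = {ph: v for v, group in VISEME_GROUPS.items() for ph in group}


def to_visemes(phonemes):
    """phonemes: list of (symbol, start_s, end_s).
    Returns a list of (viseme, start_s, end_s) with adjacent duplicates merged."""
    track = []
    for symbol, start, end in phonemes:
        viseme = PHONEME_TO_VISEME.get(symbol, "neutral")
        if track and track[-1][0] == viseme and track[-1][2] == start:
            track[-1] = (viseme, track[-1][1], end)  # extend the previous segment
        else:
            track.append((viseme, start, end))
    return track
```

Here `to_visemes([("M", 0.0, 0.1), ("B", 0.1, 0.2), ("AA", 0.2, 0.4)])` yields two segments, a closed mouth from 0.0 s to 0.2 s and an open mouth from 0.2 s to 0.4 s, which an animation engine could turn into keyframes.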

MUVOC® Speaker Recognition

MUVOC® Speaker Recognition combines MUVOC® Core and the speaker recognition module. Speaker recognition identifies a person's voice from their vocal model and can be used to secure actions performed by artificial intelligence. The same vocal model is used to create the voice conversion model. Recognition runs in real time, taking 1 to 2 seconds on any voice content. Thanks to this speed, a robot can follow a conversation while identifying the speakers.
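MUVOC®'s vocal model format is not public. Assuming, for illustration only, that each enrolled voice model and each incoming 1 to 2 second sample can be reduced to a fixed-length numeric vector, identification can be sketched as a nearest-match search using cosine similarity and an acceptance threshold. The function names and the 0.8 threshold are hypothetical; a real system would calibrate the threshold on data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify_speaker(probe, enrolled, threshold=0.8):
    """Return (name, score) of the best-matching enrolled speaker,
    or (None, score) if even the best match falls below the threshold."""
    best_name, best_score = None, -1.0
    for name, model in enrolled.items():
        score = cosine_similarity(probe, model)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else (None, best_score)
```

To secure an action, a robot or assistant could run it only when `identify_speaker` returns a name that is authorized for that action, and refuse when it returns `None`.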

The conversion model can be secured (reserved for the exclusive use of the owner of the source voice) using the same speech model.

Scope of application:
  • Speaker Recognition
  • Secure voice transactions