
Saturday, March 10, 2018

Sinsy (English) Tutorial!

Hey! No, I didn't abandon my tutorials. Just been busy is all, but I'm restless after a long day.

Anyway, today I'm gonna discuss Sinsy!

I feel many people struggle to follow the official tutorials and to find working resources. I do things a bit differently than the official tutorials because most of the recommended resources don't work very well for me. Now, if you're an active part of the Singing Synthesizer Community, you probably have the tools needed already! Here's what you'll need, pre-tutorial:

- UTAU (We are going to export as a MIDI)
- (optional) the UTAU Plugin "ImportVSQX" installed; knowledge of how to use the "Import" function to import .vsq or MIDI files.
- CeVIO Creative Studio.
(Note: with CeVIO, I don't know if you need a vocal to use the Song Editor because I have vocals. If you do need a vocal, HAL-O-ROID is free to download.)
- The website sinsy.jp and the official phoneme reference PDF.
- Patience, and an open mind.

Heading to Sinsy, you're immediately greeted with a simplistic and text-heavy site. You may be intimidated, but try to relax. It's not difficult to figure out.

You can change the language of the website to English in the top right-hand corner, but just so you know, doing so will immediately default you to English vocals. There used to be a bug on the site where changing the language of the vocals also changed the website's language. It seems to be gone, but be wary.

So now that the website is in English, you can get a good look at what the Parameters are. Most are self-explanatory, but Pitch Shift might be something some people don't know. Pitch Shift just moves the whole song by semitones (the website calls them "halftones"). If you're unfamiliar with them (before you scoff, some people use Singing Synthesizers out of technological curiosity and don't know jack about music), just know that they are portions of an octave. Entering "1" would be like selecting all notes in UTAU and moving them up exactly one space. Entering "12" moves everything up one octave (12 spaces). Entering "24" (the maximum Sinsy allows) moves everything up two octaves (24 spaces).
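
If you like seeing the math, a shift of n semitones just multiplies each note's frequency by the twelfth root of 2, n times over. Here's a tiny Python sketch of that relationship (the A4 = 440 Hz example is mine, not anything Sinsy-specific):

# Rough sketch of what a semitone (halftone) shift does to pitch:
# shifting by n semitones multiplies the frequency by 2**(n/12).
def shift_frequency(freq_hz, semitones):
    return freq_hz * 2 ** (semitones / 12)

a4 = 440.0                      # example pitch: A4
print(shift_frequency(a4, 1))   # ~466.16 Hz, like moving a note up one space in UTAU
print(shift_frequency(a4, 12))  # 880.0 Hz, one octave up
print(shift_frequency(a4, 24))  # 1760.0 Hz, two octaves up, Sinsy's maximum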

We've acknowledged the website and Parameters. Hooray! For the sake of an example, I'm going to demonstrate via .vsqx import into UTAU, since I feel most people in the Singing Synthesis Community would like to make covers with Sinsy, and gradually become comfortable enough to write an original, but lack the tools.

I'm going to cover an English Song, which means I need a .vsqx that has English Lyrics so I can edit more easily. I've been on a GHOST roll lately, so I'm going to choose a .vsqx of their song called "Happy Days" ft. MAIKA. The .vsqx was made by Grace Herring.
(Note: The song may potentially have upsetting material or ideologies, which is why I didn't link it.)

Though I adore Matsuo, I'm going to use Xiang-Ling. She is terribly scorned for her English mispronunciation, though Matsuo is as well. Now, you may be saying, "Talc, why would you use her if she mispronounces words?" Well, I'm going to show you how to override her mispronunciation. The source of mispronunciation comes from sounds assigned to isolated vowels, homonyms, and homographs. English is complex, and it can be hard to tell which way the word is supposed to be said. Before I get to the cover, I want to introduce the problem we'll run into and how to solve it.

For example: I know that by default, Xiang-Ling will pronounce "wind" as "why-nd" as in, to wrap an object around something or itself, instead of "wih-nd", as in, the movement of air. So what can I do to change this? Phoneme Input.

Now's when I'm going to ask you to open that reference sheet PDF I linked earlier. You may notice that it covers Japanese, English, and Mandarin Phonemes, but for today I am going to focus on English. The first page you see has some very helpful information regarding pronunciation and gives you a visual gist of what we're going to be doing.



The example is off pronunciation-wise, but they do make an important observation: "an answer" is said uniquely. They note that the asterisk only works after a vowel and will cause stress (read: enunciation) of the asterisked vowel.

Without the asterisk, "an answer" would likely be pronounced "ananswer", but that's not how "an answer" is said. English speakers usually stress the "an" of "answer" so that the two words don't blend together into a mutter. From a distance or in a low voice, "ananswer" could sound like "announcer", which might cause confusion. To my knowledge, the asterisk is there to help with lexical stress, since the CMU dictionary can display stress as well.

Let's have a look at the English phoneme chart on Page 4. I think the consonants are easy to figure out, but some vowels aren't as apparent. I'll give sample words.

aa = box
ae = apple
ax = about (it's the schwa.)
ah = ability
ao = caught
aw = flower
ay = fly
eh = ever
er = urge
ey = ape
ih = igloo
iy = tree
ow = ocean
oy = coy
uh = look
uw = food

EDIT: i looked up CMU and you can combine sounds like "ah" and "r" [ah, r] to get "ar". you can also get "ts" by going [t, s], etc. if you feel like you'll have difficulty, use the CMU dictionary to look up a word you can't figure out how to write phonetically.
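
if you have Python handy, the CMU dictionary also ships with NLTK, so you can script the lookup instead of browsing. this is only a sketch: the conversion to Sinsy-style phonemes (lowercase, stress digits stripped) is my own assumption based on the chart above, and CMU writes the schwa as "ah" rather than "ax", so double check the output against the PDF.

# Sketch: look up CMU pronunciations and roughly convert them to
# lowercase Sinsy-style phonemes. The mapping is an assumption; verify
# anything odd against the official phoneme reference PDF.
import nltk
nltk.download("cmudict")            # one-time download of the dictionary data
from nltk.corpus import cmudict

pronunciations = cmudict.dict()     # word -> list of ARPAbet pronunciations

def to_sinsy(word):
    options = []
    for pron in pronunciations.get(word.lower(), []):
        options.append([p.rstrip("012").lower() for p in pron])
    return options

print(to_sinsy("wind"))    # lists both readings, "wih-nd" and "why-nd"
print(to_sinsy("chance"))  # something close to [ch, ae, n, s]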

so for example, some words would be written out as lyrics on the notes like this:
chance[ch, ae, n, s]
swirl[s, w, er, l]
coil[k, oy, l]

you could also play around with the | function and try:
thing[th, iy|ng]
pack[p, ae|k]
flirt[f, l, er|t]
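
if you end up doing this for a lot of words, a tiny helper that builds the bracketed lyric strings can save some typing. just a sketch using the same syntax as above:

# Sketch: format a word plus its phonemes into Sinsy's bracketed lyric syntax.
def phoneme_lyric(word, phonemes):
    return f"{word}[{', '.join(phonemes)}]"

print(phoneme_lyric("chance", ["ch", "ae", "n", "s"]))  # chance[ch, ae, n, s]
print(phoneme_lyric("wind", ["w", "ih", "n", "d"]))     # forces the "movement of air" reading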

Now that we have our phonemes and the lyrics to the song, let's import the .vsqx. What I like to do is immediately export it as a MIDI in UTAU. I don't know off the top of my head what Xiang-Ling is going to mispronounce other than isolated vowels, so making edits now would be like taking shots in the dark. Then, I open CeVIO and import the MIDI. Once it is inside CeVIO, don't mess with it in the Song Editor. If CeVIO recognizes a lyric as invalid (basically any English word), it renders it as a rest inside the MusicXML.

Export the MIDI from CeVIO as a MusicXML, and once you have that, you can head right back to Sinsy, choose the file, then hit "Send". It will take a few moments depending on song length (5-7 minutes is the maximum). If there's anything wrong with the file you uploaded, it will tell you what happened.
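
(If you'd rather skip the CeVIO step entirely, you can also script the MIDI-to-MusicXML conversion. This isn't the method I use above, just a possible alternative sketch with the music21 Python library; the file names are placeholders, and I'm assuming your MIDI already carries the lyrics, otherwise you'll still need to add them before Sinsy can sing anything.)

# Alternative sketch: convert the UTAU-exported MIDI to MusicXML with music21.
# File names are placeholders; lyrics must already be present in the MIDI.
from music21 import converter

score = converter.parse("cover.mid")          # read the MIDI exported from UTAU
# If the lyrics didn't survive the export, you could attach them per note, e.g.:
# for note, word in zip(score.flatten().notes, words):
#     note.lyric = word
score.write("musicxml", fp="cover.musicxml")  # the file you upload to Sinsy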

Once it's ready, you will see a little audio player appear like this:


With this, you'll be able to listen to the voice (or download and listen) to identify which words were said wrong.

Your edits in UTAU may look a bit bulky, like this:

But that's nothing irregular. I've used Phoneme Input to try to get a more solid/clear sample even on correctly pronounced words, and sometimes it works.

Also, one last thing: Sinsy does support dashes! If you have a sequence like:

[to] [mo] [row] [-]

[row] will be extended just fine, but sometimes Sinsy experiences a little difficulty extending suffixes that end in consonants. Only sometimes, it's usually pretty good.

The patience comes in identifying what is said wrong throughout the whole song and editing it until it's correct, but oftentimes other Singing Synthesizers mispronounce words too. As it is right now, many Singing Synthesizers are very manual. That's just how it is.

Now that you have an idea of what to do, go for it! :)

Here's a tidbit of what I worked on. I think Sinsy is worth your time! Please give it a go.

Take note that both vocals have a slight accent; some ih/iy's may sound similar, and consonants may be slight or harsh. Using phoneme input can sometimes fix this.

Tuesday, January 30, 2018

AquesTone (1+2) and AquesTalk10 Tutorial

Take 1 is a-go! Information here may be subject to change.

AquesTone


NOTE: AquesTone only sings in Japanese.   

The first thing you need to do is download AquesTone. Visiting the link, scroll down to the yellow-orange header--"Download"--and click "aquestone_0752.zip". This is the most recent version of AquesTone (not AquesTone2) and it is in VSTi format, meaning you need a DAW like FL Studio or REAPER (I haven't tested whether it works in other DAWs, but feel free to comment if it works in Audacity or Adobe) to host the virtual instrument.

I use REAPER out of personal preference. Extract the file using your file extractor of choice. Opening up the folder, you'll see AquesTone.dll and AqToneLicense.txt, and you can read the License if you want. You install it like you would any other VSTi: take the .dll file (in this case, AquesTone.dll) and put it in your DAW's plugin folder. You can also usually have a DAW browse to a file in a different location, but I imagine that gets irritating with time; it's easier to have it in the plugin folder.

(For REAPER, when you find the plugins folder, you actually need to place it in the "FX" folder that's inside the Plugins folder. Otherwise it won't be detected.)

So now that AquesTone is installed, you should be ready to use it! Open up your DAW, and you may see in the loading screen that it is briefly analyzing AquesTone.

Go ahead and place a VSTi on a track. In FL Studio, the easiest way to do this is to right click "Sampler" in the Channel rack and click "replace", which will let you browse your plugins, where you select AquesTone. In REAPER, click the "Insert" menu at the top and a drop-down menu will appear. The second option, "Virtual instrument on new track", is the option you're aiming for, and you can select AquesTone as your virtual instrument. (In REAPER, you can also add VSTis onto a track like you can Effects.)

Now that AquesTone is on a track, you're ready to rumble, sort of. THIS is the part that's up to you. AquesTone is ONLY compatible with MIDIs, so if you're using a MIDI or making one, you'll need to do that first. Before anything else, though, you'll need the lyrics.

Open Notepad and type out the romaji Japanese lyrics without spaces. AquesTone will read a space as "the end" and restart with the beginning lyrics instead of the lyrics to the next verse.

a. ku ro i ya gi ga tsu bu ya i ta ✗
b. kuroiyagigatsubuyaita ✅

As per the incorrect example, AquesTone will sing "ku" the whole MIDI if there is a space.

Save it in ANSI as a .txt file, not Unicode, or else AquesTone will not read it. Now, the UI may have opened up automatically when you loaded it onto the track. If you closed out of it, click on it again and the UI will open back up. This is where the magic happens.
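
If you'd rather not fight with Notepad's encoding options, here's a small Python sketch that writes the lyrics the way AquesTone wants them. The syllables are just the example line from above; since romaji is plain ASCII, writing it without a Unicode BOM is effectively the same as Notepad's "ANSI".

# Sketch: build an AquesTone lyric file - one run of romaji, no spaces,
# saved as a plain ANSI/ASCII .txt (no Unicode BOM).
syllables = ["ku", "ro", "i", "ya", "gi", "ga", "tsu", "bu", "ya", "i", "ta"]

lyrics = "".join(syllables)   # "kuroiyagigatsubuyaita" - a space would cut it off early

with open("lyrics.txt", "w", encoding="ascii", newline="") as f:
    f.write(lyrics)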

Go ahead and import your MIDI (under File) if you're in FL Studio. If you're in REAPER, you can just drag it onto the first track from the File Explorer and read ahead. In FL Studio, importing may create a new track, and you may have to put AquesTone on this new track (right click on the Channel rack, replace, select AquesTone) and delete the first one. If it didn't make a new track, select the paintbrush tool in the Mixer after accepting the MIDI import options and left click on the first track in the Mixer. That is also the next step if you had to create a new track--go to the Mixer and use the paintbrush tool on the first track. Your MIDI will now appear, and double clicking on it will take you to your DAW's MIDI editor. If you're familiar with other vocal synthesizers that have a host program, you will probably be overcome with relief, as MIDI editor UIs are the origin of most vocal synthesizer UIs.

But if you click play, no sound will come out. Why? Because there are no lyrics. So in AquesTone's UI, go ahead and select "Open" and browse for your lyrics text file. It may ask you for a .koe file, but you can just open the .txt file regardless. In the UI, next to the lyrics' path displayed in blue at the top, you'll see "Female_F1", who is your current singer. Underneath is "Auto_F1", the same voice but with automatic vibrato, "Male_HK", a husky male voice, and "Auto_HK", the same male voice but with automatic vibrato.

Now that everything is loaded, we're ready to hit play. So do it. Sound? Hooray! If not, check that you followed the steps correctly.

If you have sound, we're moving on.

Now that you have sound, you'll notice that as soon as you hit stop and play again, AquesTone starts from the last note it was stopped on, and will immediately sing the words on the wrong notes. You can only do one thing about it: Reload the .txt and play from the beginning. If your MIDI is single channel, you can just hit "UP" or "DOWN" and AquesTone will automatically reload your .txt and you can play from the beginning. Most VSTi vocals have this problem, so don't fret too much.

If you want to play around with the vocal's parameters, I encourage you to try it.

"Husky" gives a softer tone,
"Resonance" gives a muffled sound, making transitions a bit less harsh,
"Volume" is self-explanatory,
"Release" adds an exaggerated end breath,
"Portamento Timing" adds a fall in pitch to the end and start of every note, giving it a bit more emotional impact.
"Vibrato Freq" is how fast the vibrato will be, but it only works on the "Auto" vocals.
"Pitchbend Level" oddly reverts right away for me, but I imagine it would cause the vocal to exaggerate Portamento.

When you're satisfied, go ahead and render your vocals if you'd like. Just be sure to reload the .txt and start from the beginning. If you prefer not to sacrifice quality and save rendering for the final product when mixed, that's up to you!

Here's the example I worked on in the AquesTone Tutorial.
raw vocals - Lynne ft. Auto_F1
with FX - Lynne ft. Auto_F1

When mixed, the harshness of the transitions is usually softened by the music.
Like in this song.

AquesTone2

NOTE: AquesTone2 only sings in Japanese. Its expiration date is May 2018; after that it will not work AT ALL unless they extend the expiration deadline (again) and distribute an upgrade. I don't know why they're doing that, to be honest.

AquesTone2 (you need to scroll to the bottom of the readme to be able to click download) works in a very similar fashion to AquesTone. Download and installation are identical to AquesTone because it is also a VSTi. AquesTone2 has MANY more parameters, but product usage is nearly the same. However, there is one quirk--the syllable spellings are a bit weird in this one.

Shi is si, chi is ci, cha is cya, tsu is tu, etc. The UI has a list of syllables (Syll List) in case you're only familiar with how standard romaji looks. Bearing that in mind, carefully read the syllable list when crafting a .txt for AquesTone2 to read.
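
To give you the idea, here's a tiny sketch of how you might remap Hepburn-style spellings into AquesTone2's spellings before writing the .txt. The dictionary only contains the examples I just listed, so treat it as a starting point and extend it yourself from the Syll List.

# Sketch: remap Hepburn romaji syllables to AquesTone2 spellings.
# Only the examples mentioned above are included; check the Syll List
# in the UI before trusting any syllable that isn't in this dict.
A2_SPELLINGS = {
    "shi": "si",
    "chi": "ci",
    "cha": "cya",
    "tsu": "tu",
}

def to_aquestone2(syllables):
    return [A2_SPELLINGS.get(s, s) for s in syllables]

print("".join(to_aquestone2(["shi", "ro", "i", "tsu", "ki"])))  # siroituki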

AquesTone2 features a vocal named "Lina", presumably female. If you're familiar with UTAU, this vocal features samples from the same voice provider of Namine Ritsu, but unfortunately isn't as powerful in singing. In fact, the voice is quite weak using default settings and relies heavily on the "Detune" parameter to get its unique sound and smoother transitions.



"Speed" increases how fast Lina will say things, useful when matching tempo.
"Vibrato rate" is the same as before, so is "Portamento Time"
"Velocity" "Sustain" and "Master" are all branched together because they control voice sound. Velocity controls wave rate, Sustain in this case is like Husky, and Master is Hardness.
"Gender" adds masculinity and femininity in the form of raw formant shifts.
"Resonance" is the same as before.
"Detune" is new, and it adds an interesting effect--the vocal splits into two sounds at different pitches, one more off key and dissonant. This is a default effect that can be removed.
"Consonant" "Sustain" and "Master" are branched under "Volume", but these controls really refer to consonant sound. "Consonant" is how prominent the consonant will be, Sustain is how soft or hard, and "Master" is how loud it will be.

Otherwise, functionality is the same. Get your MIDI and .txt file lyrics ready!

Here are some quick demonstrations of Lina without Detune and with maximum Master.
raw vocals - Yoiko no Kusuri ft. Lina
with FX - Yoiko no Kusuri ft. Lina

If you want to see some more of Lina, I have an old cover posted on my YouTube that I kind of want to redo, haha. I'm much more proud of the one I did with Female_F1.

AquesTalk10


NOTE: Only speaks in Japanese, only accepts Hiragana input.


Download it here. Once you've got the files extracted, open up the "samples" folder and then the "AqTk10App" folder. Run the application inside. It will open and you'll be able to use AquesTalk10! It's a TTS Engine that's actually very smooth in pronunciation (it even handles slang!), albeit a bit un-human sounding. You can also adjust the Parameters at the bottom to create a new voice beyond the 3 default voices and 8 presets. You can save speech you create in .wav format and use it.

I'm no master of Japanese sentences, and this is more fun to experiment with on your own, so I won't give you any demos. :P

Monday, January 29, 2018

Tutorial Announcement!

I'm going to be posting tutorials to certain Singing Synthesizers. Granted, I still have to work out how some function. I haven't had the opportunity to use all of them in-depth.

These Tutorials are the only ones I'm doing At The Moment. I'll be sure to do more after I finish them. I'm also not doing these in any specific order, and some may be more image/GIF heavy, so creating the tutorial may take a while! I know for sure the VOCALOID tutorial isn't coming soon.


Tutorials to look forward to:

VOCALOID4 English Usage Tutorial and Tuning Tips


CeVIO Creative Studio: Speech Usage Tutorial
CeVIO Creative Studio: Singing Usage Tutorial

Alter/Ego VST ver. Usage Methodology Tutorial

AquesTone/Talk Tutorial is Done!

Sinsy Use Methodology Tutorial (This is more of "How I do it", not the official way).


I'm not really doing UTAU Tutorials because there's an endless amount of tutorials out there and most people will explain something to you if you ask. Information constantly changes and depends upon resampler, input method (cv, vcv, cvvc, vccv), appends/alternate pitches, and the language of the voice being used. I couldn't possibly cover all those topics without being extremely specific, so I'll leave you to browse the many tutorials out there, since UTAU is a more popular synthesizer and easier to get resources on. I'll be happy to answer questions if you ABSOLUTELY cannot find an answer though!

Wednesday, January 17, 2018

(2018 Update!) Singing Synthesizers other than VOCALOID and UTAU.

hey, been a long time! that last list was outdated and i was mostly grasping at straws with some obscure singing synthesizers without stable releases. so today i'm here to give updated information on synths that are currently publicly accessible. here is a quick list of singing synths that i'm going to tell you how to get. (some do cost money.)

an asterisk indicates that the synthesis language is one unsupported by VOCALOID, but may have been attempted in UTAU. otherwise, it's exclusive to the singing synthesizer(s) listed.

Japanese Singing: CeVIO Creative Studio, Alter/Ego, Sinsy, RenoidPlayer, Wonder Horn, AquesTone, Virtual Singer.
English Singing: Alter/Ego, Sinsy, CANTOR, ChipSpeech, Virtual Singer.
Chinese Singing: NIAOniao, MUTA, Sharpkey, Sinsy, Virtual Singer.
Korean Singing: VOCALINA Studio.
*French Singing: Alter/Ego, Virtual Singer.
*German Singing: CANTOR, Virtual Singer.
Spanish Singing: Virtual Singer.
*Italian Singing: Virtual Singer.
*Latin Singing: Virtual Singer.
*Finnish Singing: Virtual Singer.
*Occitan Singing: Virtual Singer.


CeVIO Creative Studio

CeVIO Creative Studio is a Japanese Singing and Speech Synthesis Workshop. "CeVIO" is actually pronounced "Cheh-vee-oh", not "Seh-vee-oh". You can visit the home page here; the Studio is free to download. Most vocals cost money, approximately $40~$70 per voice. There are links to the shop on the advertisements to the right. I recommend you buy vocals on the Vector PC Shop due to the hassle Rakuten tends to give foreign credit cards. Don't be intimidated by the Japanese, it's quite easy to navigate. What's unique about this Singing Synthesizer is that it is also capable of Speech Synthesis, and the initially introduced characters featured at least 3 different talking voices each. The face of this Singing Synthesizer is Satou Sasara, a brown-haired girl you'll be seeing a lot of on the home page. Some vocals are exclusive to singing, and some are exclusive to talking. Satou Sasara can do both, but she does not come free, and both capabilities must be purchased. Other vocals introduced with the program's release were Suzuki Tsudumi, a close friend of Sasara's, and a young man named Takahashi who was introduced as a senior to Sasara, whom she felt was more of an older brother figure than a friend. Tsudumi and Takahashi are only capable of talking.

Additional Vocals include a group of 6 digital vocalists called the Color Voice Series, which you can read about here and purchase here. They are only capable of singing, but are noted for unique sounds. Later, a powerful vocal called ONE was released for CeVIO; she is considered to be the highest quality CeVIO vocal, rivaling popular VOCALOID and UTAU alike. ONE is produced by 1st Place, the company that also produced IA, a popular VOCALOID. ONE can sing and speak, like Sasara. IA additionally has a talking voice exclusively for CeVIO. You can buy ONE at this link.

A project featuring the restoration of the deceased singer, Haruo Minami, was released to the public under the character name HAL-O-ROID. He is the only CeVIO voice being given out for no charge, and is only capable of singing.

This program has a much more manual touch to it than most other singing synthesizers. The fine-detail customization of pitch, phoneme length, phoneme input, multiple character entry, and many other features makes it different from VOCALOID and UTAU, where editing these features can be much more time-consuming without experience.

EDIT 1/25/2018: You need a Japanese locale to use the Speech Engine so that, when speaking, the engine can accurately track real-time movement.


Alter/Ego

Alter/Ego is a free VST Singing Synthesis Plug-in/Standalone Singing Synthesis Program (and other formats) by Plogue. You do need a DAW, such as FL Studio, to use the VST version. Plogue is much more famous for making ChipSpeech (I'll get to that later). Alter/Ego itself is a refreshing program that takes a bit of parameter configuration to make vocals sound fluent, but successful end results can be very rewarding. Alter/Ego boasts a wide variety of vocal types and language capabilities. More vocals tend to be bilingual with Japanese, but the majority do sing in English. A vocal that stands out is ALYS, by Voxwave, one of the first notable Singing Synthesizer voices able to sing in French and Japanese. A few other vocals at the moment include Bones, a male bilingual (ENG/JP) singer; Marie Ork, a vocally flexible and bilingual (ENG/"JP") singer who can perform heavy metal screams in addition to regular singing; and LEORA, a bilingual (FR/ENG) female vocal under development (by Voxwave) who features the first ever "Power" voice libraries for the languages she sings, plus a "Crossfade" feature which allows smooth transitions between the voice libraries when singing. Currently, only her French voice libraries are released, and they must be paid for, like ALYS.

Tuning and other features can be edited within the host program and in the VST, but this singing synthesizer is also manual in terms of editing and requires some getting used to, as it is very different from most Singing Synthesizers.

Daisy, the previous bilingual (ENG/JP) default voice, was originally intended to be part of ChipSpeech, as she was introduced as Dandy 704's love interest. His voice was sampled from the first computerized voice to sing "Daisy Bell". Daisy's name was a tribute to the song title and the woman mentioned in the song. She was introduced as a time-space anomaly, her Alter/Ego art showing her exiting a time machine in pursuit of Dandy 704. Eventually, Daisy stopped being distributed officially; the reason given is that she was a time-space anomaly who vanished back into her time.

Additionally, I had stated that NATA, an alto bilingual (ENG/"JP") female vocal, was no longer being distributed, which is incorrect. I was misinformed as to how she would be obtained. She is being sold here for roughly $58 plus tax (46.99 euros + VAT). NATA unfortunately doesn't have very many sample usages. She is an unofficial vocal made by Vocallective. In this case, Plogue will not help customers who buy NATA, since she is not supported software.

Another unofficial vocal named Vera is under development by Vocallective also. She only has one JP demo. She is also unsupported software and has not yet been released to the public.

EDIT 1/25/2018: If you use the standalone program or formats other than VST, you will need a MIDI keyboard.


Sinsy

Sinsy('s Web Demo) is a free browser-based singing synthesizer focused primarily on sounding human without much editing. It supports 3 languages, but only select vocals have the capability to sing in another language. Xiang-Ling, one of the first vocals produced, is the sole multilingual vocal (JP/CHI/ENG). To use Sinsy, you must upload a (vocal/lyrical) MusicXML to the website, choose a voice, then hit the button to the right. That will generate a WAV file you can download. An easy way to make a MusicXML is to take a MIDI and export it as one out of Cadencii, MuseScore, CeVIO, or any way you prefer. Sinsy will not respond to Control Parameters from VOCALOID or CeVIO. Sinsy's gimmick is to sound as human as possible with as little editing as possible; this is the goal of HMM synthesis. The best way to tune is to manually add dynamics and so on in MuseScore, Finale NotePad, or a similar program. Sinsy's latest update supports this feature, and also Phonetic Input. If you run Linux and are keen on C++, you can actually make your own Sinsy voice using the source code and the official tutorials.


RenoidPlayer

An unrelated free browser-based singing synthesizer is RenoidPlayer, which you can read about here. It is exclusively Japanese, and not difficult to utilize. Drop a .ust/.vsq/.vsqx/.ccs file into the Piano roll and it will appear. It also does not react to Control Parameters from other synthesizers, and dynamics don't appear to work either. There is a tutorial on how to make your own voice library for it, but you will need Renoise, a DAW (specifically the 3.0 version in the archive). The reason is that the DAW makes a special Soundfont file that is associated with the synthesizer. The tutorial eventually becomes difficult to comprehend once the video tutorials end, so don't try it just for fun. The downside to this synth is the external editing, and occasionally the overlap meant to help transition between notes is too great and the voice becomes unintelligible.


Wonder Horn Studio

A Japanese singing synthesizer created by NTT-AT in 2004. The program used a synthesis method similar to Sinsy's, and the vocals could be quite realistic for the time period, but were known to occasionally clip. The program seems to have been discontinued as of 3/31/2017. I managed to save one demo featuring two vocals out of many. If you're curious, some usages are still deep within the reserves of NicoNicoDouga; search "ワンダーホルン". The website it was hosted on was utabara.com, but the website seems to have been deleted with the discontinuation of the product.

The discontinuation seems to have been due to the expense of the program and customer reluctance to buy it. On NicoNicoDouga, it's said that its popularity has remained low, with the tag containing fewer than 100,000 plays. It was said to have featured adult voices and one child. There were options to make a choir, and various pitch and vibrato editing additions, seemingly like an early CeVIO. The downside was, allegedly, that editing a MIDI had to be done outside of the synthesizer. Editing note lengths was not possible within the Synthesizer if a MIDI was being imported.

EDIT 5/23/2018: After some digging I found that it may still be available within the Japanese software MUSIC PRO V5 as a built-in plugin under the name "Sound Jauman2". NTT seems to have cut all ties with the name "Wonder Horn". 


AquesTone

AquesTone is a free VSTi singing synthesis plug-in that sings exclusively in Japanese, created by A-QUEST. It features a male and a female voice. It's possibly one of the more famous singing synthesizers among VOCALOID fans, due to its partner product, AquesTalk, a speech synthesizer typically used in memes, Let's Player voice-overs, etc. AquesTone itself is very simple and easy to use, focusing primarily on loading a text file of lyrics onto a blank MIDI through the UI. Tuning is limited to the few parameters in the UI, though depending on the host program, it may also be tunable in the host. Because it is a VSTi, you will need a DAW, such as FL Studio. AquesTone also has an upgrade, AquesTone2, that features a different vocal and more parameters. AquesTone2 expires May 2018, so please hurry if you're interested!


CANTOR

You can learn about its history here. VOCALOID's initial rival, this program utilizes additive morphing synthesis sounds that emulate human speech in order to generate singing, rather than using human-recorded samples. It was developed by VirSyn, but has remained inactive since its 2.10 update. Its languages are English and German, and there are 50 vocals in the Full Edition (some are variations of a single vocal), but the user can create a new voice simply by playing with settings. It has a simple layout and is user-friendly, but just as VOCALOID needs a lot of editing, so does CANTOR. Consonant timing was known to be one of the less friendly elements.

If you are determined, you can buy CANTOR 2.10 for about $370~$400 (299 Euro) depending on shipping, here.

EDIT 4/25/2018: In order to use CANTOR 2.10's demo AND full version you need to buy an "eLicenser" which contains copy protection software. An eLicenser is a physical USB Drive that must be plugged in to the USB slot. CANTOR will refuse to run if the eLicenser is not plugged in. 


Virtual Singer

A product of Myriad, maker of MelodyAssistant and HarmonyAssistant, Virtual Singer comes with multilingual capabilities. It does not use the Piano roll format like others; instead it uses a Sheet Music format. Vocals must be fine-tuned with musical parameters. It is known for relatively impressive results, and even in 2018, despite the program being aged, it sees use from its Community. It isn't very expensive (about $40 will be spent acquiring a host program and Virtual Singer), and you can make your own voice for it as well. Bundled with the purchase is the official tutorial to make a Latin voice.


ChipSpeech

ChipSpeech is Plogue's other vocal synthesizer, though it wasn't directly made for singing, so calling it a singing synthesizer isn't exactly right. It comes in VST format and other formats, and is also a standalone program. You will need a DAW to use the VST edition. The vocals in this vocal synthesizer are often restored vocals from early recordings of human voices and discontinued text-to-speech engines from way before VOCALOID existed. ChipSpeech is a more recent creation, sampling those older vocals. ChipSpeech can be used to create speech, singing, and ambiguous sound effects. Singing requires a MIDI, and the functionality is similar to Alter/Ego--very manual editing. ChipSpeech does cost money, and currently (1/17/2018), its 11 different vocals are $95 plus $5.46 tax. The vocal many people are impressed with, often seen as the icon of ChipSpeech, is Lady Parsec.

EDIT 1/25/2018: If you use the standalone program or formats other than VST, you will need a MIDI keyboard.


NIAONiao

NIAONiao is a free Chinese Mandarin Singing Synthesizer much like UTAU in that a user can create their own vocal for it. Initially, it wasn't considered to be impressive, but after several updates it became quite popular in the Chinese Community. It can be downloaded here, and comes with a default voice named Yu Niaoniao. Many people have created vocals for this program, and it is known to be language-flexible in terms of limited English, Japanese, Cantonese, and Korean possibilities. You can look at downloadable voices here. The Wiki also has some tutorials on using the program to create your own character vocal, and the materials needed are here.


MUTA

MUTA is a free Chinese Mandarin Singing Synthesizer much like CeVIO Creative Studio in terms of layout, though its fine amplitude timing is known to cause the program to throw an error in playback. There also appears to be a future plan to include talking, as there is an unusable speech option. MUTA accepts Pinyin input (if you're unaware, Pinyin is to Chinese what Romaji is to Japanese), so for those who don't know Mandarin, a simple reference point for the Mandarin alphabet is all they need. MUTA features the vocals of a character named Yan Xi, a charismatic and young-sounding female vocal you can no longer read about because the home page has been deleted. You can look her up, though; her official demo is still on YouTube. You can download MUTA here.

EDIT 1/25/2018: The speech feature is usable to some, the system requirements are unknown but suspected to be a Chinese or Japanese locale. The author planned to have others be able to import their voices into the program as well, but MUTA appears to no longer be under development.


Sharpkey Studio

Sharpkey is a lesser-known but higher-quality free Chinese Mandarin Singing Synthesizer. It has many desirable traits from several other Singing Synthesizers and is claimed to be very user-friendly to those experienced with VOCALOID, UTAU, and CeVIO. Reactions were very positive in the West, but since not many know Mandarin, it is often not used. The program features a mature and powerful-sounding female vocal named Huan Xiao Yi. You can download Sharpkey Studio and Huan Xiao Yi here.

Using the program is much like using other synthesizers, to assure user comfort. It has a variety of Parameters like VOCALOID and UTAU, but some relate more to Mandarin, such as Tone and Cross Tone.

EDIT 1/25/2018: The homepage is here, which you can check for updates! Sharpkey also has another vocal that I missed, her name is Kiana. The author eventually plans to have others be able to import their voices into the program as well.

EDIT 5/23/2018: The producers have begun to organize popular Mandarin UTAU/NIAOniao voices to officially import into the software. Such characters include Yong Qi and a few others.


VOCALINA Studio

VOCALINA Studio is a free Korean Singing Synthesizer and DAW. It was originally released about the same time as the first Korean VOCALOID, SeeU, and there was a brief rivalry between the products. The VOCALINA character that was introduced was a mature female vocal named Vora (Choi Bora). Vora had her own webcomic about being a pop star in disguise, and has remained a popular character in the Korean Community. As the program developed, a second vocal was added, Khylin. Khylin was much higher quality than Vora, and due to new engine configurations, Vora could not be included in the newest version of VOCALINA Studio (2.3.0 and onwards), so Khylin was able to rise in popularity. Khylin normally costs money and is bought through a paid subscription lasting anywhere from 30 days to 1 year; once that time period runs out, you have to renew your subscription. She cannot be purchased outside of Korea due to needing a Korean Banking Account/Card.

In order to use VOCALINA Studio, you have to register for an account on Vocalina, then download the Studio, which is estimated to be about 2 GB, so downloading may take a while. Once installed, the Studio will ask you to sign in, so you do need an internet connection. Additionally, a Korean locale is needed, so a program like Locale Emulator is a good choice to install. The Studio itself is very manual, vocal editing-wise, and relies on direct text input as opposed to phonetic input. The engine is smart enough to make pronunciation smooth when possible, such as "seong-eol" being written as such but being sung as "seon-geol". Some features do glitch and vowel transitions are seldom smooth, so learning the tricks and quirks of the program can be slippery, but producing good results is very rewarding.

EDIT: VOCALINA will be terminated in October 2018.


thanks for tuning in! if you'd like, please answer the poll to the right.

Honorable Mentions -
  • Macne Nana and Macne Petit were upgraded to the VOCALOID4 and VOCALOID Neo software, effectively crushing the MACLOID series as a Mac-exclusive VOCALOID spin-off genre. The fanbase hopes the other members of the MACLOID series also get upgraded, since they can no longer be purchased. Macne Nana and Petit were given English singing capabilities; prior to that, they were Japanese exclusive.
  • SaltCase is a simple Japanese indie Singing Synthesizer developed by sota for Mac OS X, though development ceased in 2012. In 2013, sota released an April Fools joke that suggested an upgrade, showcasing a vocal that only sang the notes "Nwi" and "Gyo" in a deep, distorted voice. The homepage can be visited here. The mascot character doesn't appear to have a name, but additional voices can be imported. It is an inferior singing synthesizer to the Mac-exclusive UTAU-Synth.
  • Unity-chan was a Japanese Singing Synthesizer ("Vocaloid") developed for Unity3D. She was later upgraded to VOCALOID4.
  • Voctro Labs has several projects involving Singing Synthesizers. One project is their own new singing synthesizer testing a new synthesis method in Spanish and English; another, called Revivos, is being used to sample deceased singers; and a current project called Voiceful is a Speech and Singing Synthesizer that can only be used under a private license, but a demo is available.
  • Image-line recently created a "Vocal Resynthesis" tool called Harmor that allows the instrumentation of a human voice in FL Studio. It doesn't appear to have limits on language. Here's a sample, if you don't want to watch the tutorial.
  • Realivox has an English Singing Synthesizer named BLUE.
  • Kanru Hua, developer of Moresampler, SHIRO, etc, is creating a multilingual Singing Synthesizer currently called "Synthesizer V".
  • Emvoice is developing a vocal synthesizer called SOHO that can speak and sing. Development demonstrations are available here.
  • NTT-AT and HOYA have teamed up to make a Speech and Singing Synthesizer called VoiceText. So far only one female vocal named Hikari is able to sing, but there is an additional cast of characters (scroll down) who can speak and show multiple different emotions when speaking. The focus of this project is to develop showing different emotions in speech, and to find as many uses for speech synthesis as possible. They seem to have a story with their cast of characters. Additionally, there is a business version of VoiceText for those who need speech synthesis, and voices can be created under private licenses.
more to be added at a later date, if possible! thanks again, and i hope you decide to use some of them!