The most suitable pitch shifting algorithm for harmonizer

Discussion about the DSP Dimension's articles, tutorials and code.

Moderator: neuronaut

Post by tereci » 21.11.2010 20:21

Hello, I am working on my school thesis, whose objective is to implement a real-time harmonizer-like effect for singers. So first I need to find the most suitable algorithm for real-time pitch shifting. I went through some well-known algorithms described in the book DAFX - Digital Audio Effects. I have no background in engineering (only some mathematics and informatics), so it took me some time to understand things.
So far I am considering PSOLA or the phase vocoder. Can you think of another approach that could be even better (better sound quality, or similar quality but faster)? I went through the tutorials here, so this may seem like a redundant question, but they were written some time ago and things may have changed a bit. I have heard that some applications use neural networks or wavelets, but I have no idea how applicable these are to real-time processing.

The most important factor for me is speed, since the algorithm will be used to produce multiple pitch shifts at once in real time. Second comes the quality of the resulting sound and the usable range of pitch shift. I understand that I will also need some formant correction, since the effect is meant primarily for the human voice.

1) I would be grateful for any recommendations regarding the choice of pitch shifting algorithm, ideally with a brief explanation of why you recommend it, even if it is only a preference between PSOLA and the phase vocoder.
2) I am also looking for the best way to find pitch marks for the PSOLA algorithm. If you know of any algorithm, please let me know.
Thank you in advance,
Tereza.
tereci
 
Posts: 5
Joined: 24.10.2010 10:50

Re: The most suitable pitch shifting algorithm for harmonizer

Post by neuronaut » 22.11.2010 17:33

The easiest implementation would be to use fixed buffers and just change the pitch of these buffers by resampling and crossfade looping them. If you're looking for a "special effect", quality is not really a concern, unless you feel more adventurous. But if you do, and if the algorithm is intended to be used on voice, PSOLA is the way to go, as it offers the best quality vs. speed tradeoff.
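
As an illustration of what resampling and crossfade looping a fixed buffer could look like, here is a minimal C++ sketch. It is written under assumptions and is not code from this thread: the function names `makeCrossfadedLoop` and `resampleLoop`, the linear crossfade, and the linear interpolation are all choices made for this example. The buffer is first turned into a seamless loop, then read at `ratio` times normal speed for a fixed number of output samples, so the pitch changes while the duration stays the same.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a seamless loop from one fixed buffer. The loop covers block[fade .. end);
// its last `fade` samples are blended into block[0 .. fade), i.e. the samples
// that lead up to the loop start, so the wrap-around has no jump.
std::vector<float> makeCrossfadedLoop(const std::vector<float>& block, std::size_t fade)
{
    assert(fade > 0 && block.size() > 2 * fade);
    const std::size_t loopLen = block.size() - fade;
    const std::size_t fadeStart = loopLen - fade;
    std::vector<float> loop(loopLen);
    for (std::size_t q = 0; q < loopLen; ++q) {
        float s = block[fade + q];
        if (q >= fadeStart) {
            // Linear crossfade weight, rising from 0 towards 1.
            const float w = static_cast<float>(q - fadeStart) / static_cast<float>(fade);
            s = (1.0f - w) * s + w * block[q - fadeStart];
        }
        loop[q] = s;
    }
    return loop;
}

// Read the loop at `ratio` times normal speed with linear interpolation and
// produce exactly `numOut` samples: pitch changes, duration does not.
std::vector<float> resampleLoop(const std::vector<float>& loop, double ratio, std::size_t numOut)
{
    assert(ratio > 0.0 && !loop.empty());
    const double len = static_cast<double>(loop.size());
    std::vector<float> out(numOut);
    double pos = 0.0;
    for (std::size_t i = 0; i < numOut; ++i) {
        const std::size_t i0 = static_cast<std::size_t>(pos);
        const std::size_t i1 = (i0 + 1) % loop.size();
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        out[i] = (1.0f - frac) * loop[i0] + frac * loop[i1];
        pos = std::fmod(pos + ratio, len);  // wrap around; the crossfade hides the seam
    }
    return out;
}
```

With `ratio` = 2.0 the loop is read twice as fast (one octave up) and wraps when it runs out of data; with `ratio` = 0.5 only half of the loop is consumed. The seam recurs once per loop, which is probably one reason this approach suits special effects better than clean vocal harmonies.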

Best wishes
Stephan
Free DSP tutorials by Stephan M. Bernsee at http://www.dspdimension.com
"There are 10 types of people in this world: those who understand binary, those who don't"
--Unknown
neuronaut
 
Posts: 1331
Joined: 17.11.2005 09:15
Location: Mainz, Germany

Re: The most suitable pitch shifting algorithm for harmonizer

Post by tereci » 22.11.2010 21:07

Thank you for your reply.
Though I am not sure what resampling and crossfade looping mean. If I resample the buffer and play it back at the original sample rate, I will change the duration as well, won't I? But maybe that is what crossfade looping solves? From what I found, crossfade looping means that "some portion of the data at the beginning of a loop is mixed with some portion of the data at the end of the same loop, so as to produce a smoother transition between the end and the beginning when the loop plays", and I don't see how this would help me, so I think I don't understand exactly what you meant.

Anyway, yes, I will have to be adventurous :) I cannot implement just a basic algorithm as a master's thesis; they would kick me out of school :) I played with the PSOLA implementation from DAFX in MATLAB and the results were really, really bad. I can upload them somewhere if you want to know what I mean by bad. But again, maybe I just need a better pitch mark detection algorithm.
Maybe I should be more exact. The result should be a VST plugin written in C++. It should let the user sing into a microphone, generate pitch-shifted copies of their voice in real time, and add them to the original (something like a real-time chorus). The pitch shift amounts should be driven by MIDI (exactly how isn't specified yet, and it could be pretty simple). So the key of the song can be set, and the pitch shifter will then produce shifts that match the key of the song.
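
To make the idea of key-matched shifts concrete, here is a small C++ sketch of one way to turn a sung note and a key into a pitch-shift ratio. The thread does not specify how MIDI should drive the shifts, so everything here is an assumption: a major key, equal temperament, MIDI note numbers for the sung pitch, and the hypothetical helper names `semitonesToRatio` and `diatonicShift`.

```cpp
#include <cassert>
#include <cmath>

// Equal-tempered frequency ratio for a shift of `semitones`.
double semitonesToRatio(double semitones)
{
    return std::pow(2.0, semitones / 12.0);
}

// Semitone shift that moves `midiNote` up by `degrees` scale steps within the
// major key rooted at pitch class `keyRoot` (0 = C ... 11 = B).
// Simplifying assumption: a note outside the scale is treated as if it were
// the nearest scale degree below it.
int diatonicShift(int midiNote, int keyRoot, int degrees)
{
    static const int major[7] = {0, 2, 4, 5, 7, 9, 11};
    assert(degrees >= 0);
    const int pc = ((midiNote - keyRoot) % 12 + 12) % 12;  // pitch class relative to the key
    int idx = 0;
    for (int i = 0; i < 7; ++i) {
        if (major[i] <= pc) idx = i;  // last scale degree not above the note
    }
    const int target = idx + degrees;
    const int targetSemitones = major[target % 7] + 12 * (target / 7);
    return targetSemitones - major[idx];
}
```

For example, in C major a third above C is four semitones (`diatonicShift(60, 0, 2)`), while a third above D is only three (`diatonicShift(62, 0, 2)`); `semitonesToRatio` then gives the factor to hand to the pitch shifter.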
Now, ideally I need the pitch shifting to be really quick, so that I can produce multiple pitch-shifted copies and run some small additional logic in real time. But I also need really good quality: there shouldn't be any noticeable artifacts that would distract the listener. Plus I need quite a big range of pitch shift; from what I have read, these basic algorithms can shift the sound by only up to 25% without artifacts. Anyway, I would rather end up with a really high-quality pitch shifter that cannot run in real time than with something that runs in real time with no problems but is unusable because of miserable sound quality.
Originally this effect was supposed to work for instruments and polyphonic music as well, but I persuaded my teacher that making it that universal would be impossible without a big loss in quality.

I will have three months to perfect the algorithm, but now I have to choose which way to go. I tried the phase vocoder and PSOLA, both in their basic implementations, and both gave bad results. Of course I don't expect you to give me an exact recipe, but I am stuck and I don't know which way to go, because I am afraid that at the end of the three months I will realize that I chose badly. If I dive into things like wavelets or neural networks, it will take me a lot of time before I understand them well enough to say whether they are usable, and there will be no time left to try something else. So I just wanted to know if you could tell me which way looks most promising, or at least which way looks like an absolute waste of time.

It should be something like http://www.youtube.com/watch?v=xGSs4W_5pPE&feature=related at 0:25 .

By the way, I found a very nice page comparing pitch shifting/time stretching algorithms: http://diuf.unifr.ch/pai/people/juiller ... eview.html. Those modified phase vocoders sound nice, even though the test signal is of a different kind (multiple instruments) and there is no formant preservation (which I will need). And WSOLA seems to be a lot better than PSOLA (at least than my results), though I don't know how these algorithms would perform on voice alone with some formant preservation applied.

With this additional information, is PSOLA still the best way to go?

P.S. Sorry for the long essay:)

Re: The most suitable pitch shifting algorithm for harmonizer

Post by neuronaut » 23.11.2010 09:20

Yes. If you don't plan on releasing your app as a commercial product, you should look at an algorithm by Keith Lent, published in CMJ: Lent, K., "An Efficient Method for Pitch Shifting Digitally Sampled Sounds," Computer Music Journal, vol. 13, no. 4, Winter 1989, pp. 65-72.

This is the one that most popular harmonizers use (from the sound of it, the one in the YouTube video as well), but it's patented by IVL, so you might need to obtain a license from them in order to use it commercially.
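
For orientation, here is a rough C++ sketch of the pitch-synchronous overlap-add idea behind this family of methods. It is not Lent's published algorithm: it assumes a constant, already known pitch period with pitch marks at exact multiples of it, whereas a real implementation needs a pitch tracker and has to follow a varying period. The name `psolaShift` and its parameters are hypothetical. Each grain is a Hann-windowed slice two periods long taken around an input pitch mark, and the grains are re-spaced at `period / ratio` in the output, which changes the pitch while the grain shape, and with it the formant structure, is largely kept.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified pitch-synchronous overlap-add. Assumes (hypothetically) a constant
// pitch period in samples, with input pitch marks at 0, period, 2*period, ...
std::vector<float> psolaShift(const std::vector<float>& in, double period, double ratio)
{
    assert(period >= 1.0 && ratio > 0.0);
    const double kPi = 3.14159265358979323846;
    const long n = static_cast<long>(in.size());
    const long half = std::lround(period);    // Hann window spans two periods
    const double outPeriod = period / ratio;  // new spacing between grains
    std::vector<float> out(in.size(), 0.0f);

    for (double tOut = 0.0; tOut < static_cast<double>(n); tOut += outPeriod) {
        // Take the grain from the input pitch mark closest to this output time.
        const long markIn = std::lround(std::round(tOut / period) * period);
        const long markOut = std::lround(tOut);
        for (long k = -half; k <= half; ++k) {
            const long src = markIn + k;
            const long dst = markOut + k;
            if (src < 0 || src >= n || dst < 0 || dst >= n) continue;
            // Hann weight: 1 at the pitch mark, 0 one period away on either side.
            const double w = 0.5 * (1.0 + std::cos(kPi * static_cast<double>(k)
                                                   / static_cast<double>(half)));
            out[static_cast<std::size_t>(dst)] +=
                static_cast<float>(w * in[static_cast<std::size_t>(src)]);
        }
    }
    return out;
}
```

With `ratio` = 1.0 and an integer period, the overlapping Hann windows sum to one, so the interior of the signal is reproduced unchanged. For other ratios the grain density changes, so the output level scales roughly with `ratio`; this sketch does not compensate for that.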

Pitch shifting based on wavelets or ANNs is certainly cool as a research project, but not within a three-month time frame.

HTH
Stephan

Re: The most suitable pitch shifting algorithm for harmonizer

Post by tereci » 24.01.2011 01:29

Hello,
thank you for your advice. I tried combining the last two steps of Lent's algorithm with the YIN pitch tracker, and the results sound really nice, especially with some basic noise filtering. But now I am stuck: it is too slow to run in real time. I need to be able to detect frequencies from approximately 80 Hz to 1100 Hz, which should be the approximate vocal range. YIN is based on autocorrelation, so I need enough samples for the lowest frequency I want to detect to complete at least two cycles. That means I need fs/80 * 2 samples, if I understand things correctly. That is about 25 ms of fixed latency just for buffering the data (I am implementing it as a VST plugin, so I need to use some kind of delay line). From what I've read, a plugin has to have latency under 20 ms to be usable in real time. The same applies to Lent's algorithm itself, because it uses Hann windows two periods long. I may be getting it totally wrong, or there may be some easy way to work around this, since it is used by popular harmonizers. I would be grateful for any feedback on this.
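
The buffering arithmetic above can be checked with a short C++ sketch. Two periods at 80 Hz last 2/80 s = 25 ms regardless of the sample rate; at an assumed sample rate of 44100 Hz that is 1102.5 samples, rounded up to 1103. The sketch also includes a compact YIN-style period estimate based on the cumulative mean normalized difference function, to show where the two-period requirement comes from. The names `windowSamples`, `windowLatencyMs` and `estimatePeriod` and the threshold of 0.1 are assumptions for this example, and the full YIN method has further refinement steps that are left out here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Samples needed to hold `cycles` full periods of the lowest pitch `fMin`.
std::size_t windowSamples(double sampleRate, double fMin, double cycles = 2.0)
{
    assert(sampleRate > 0.0 && fMin > 0.0);
    return static_cast<std::size_t>(std::ceil(cycles * sampleRate / fMin));
}

// Buffering delay in milliseconds implied by that window.
double windowLatencyMs(double fMin, double cycles = 2.0)
{
    return 1000.0 * cycles / fMin;
}

// YIN-style period estimate in samples; returns 0 if no lag in
// [tauMin, tauMax] falls below `threshold`. Needs at least 2 * tauMax samples.
std::size_t estimatePeriod(const std::vector<float>& x, std::size_t tauMin,
                           std::size_t tauMax, double threshold = 0.1)
{
    assert(tauMin >= 1 && tauMin <= tauMax && x.size() >= 2 * tauMax);
    const std::size_t w = x.size() - tauMax;  // integration window length
    std::vector<double> cmnd(tauMax + 1, 1.0);
    double runningSum = 0.0;
    for (std::size_t tau = 1; tau <= tauMax; ++tau) {
        double d = 0.0;  // squared difference between the signal and its shifted copy
        for (std::size_t j = 0; j < w; ++j) {
            const double diff = static_cast<double>(x[j]) - static_cast<double>(x[j + tau]);
            d += diff * diff;
        }
        runningSum += d;
        // Normalize by the mean of the difference function over lags 1..tau.
        cmnd[tau] = runningSum > 0.0 ? d * static_cast<double>(tau) / runningSum : 1.0;
    }
    for (std::size_t tau = tauMin; tau <= tauMax; ++tau) {
        if (cmnd[tau] < threshold) {
            // Walk down to the local minimum after the first dip below the threshold.
            while (tau + 1 <= tauMax && cmnd[tau + 1] < cmnd[tau]) ++tau;
            return tau;
        }
    }
    return 0;
}
```

For the 80 Hz to 1100 Hz range at 44100 Hz, the lag search would run from about 40 to 552 samples, and `windowLatencyMs(80.0)` gives the 25 ms mentioned above. Raising the lowest detectable pitch shortens the window proportionally, so the latency budget and the lowest supported note trade off directly.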
Thank you,
Tereza