Thank you for your reply.
Though I am not sure what resampling and crossfade looping mean. If I resample it and play it back at the original sampling frequency, then I will change the duration as well, won't I? But maybe that's what crossfade looping solves? From what I found, crossfade looping should mean that "some portion of the data at the beginning of a loop is mixed with some portion of the data at the end of the same loop, so as to produce a smoother transition between the end and the beginning when the loop plays", and I don't see how this would help me, so I think I don't understand exactly what you meant.
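To make the two terms concrete, here is a rough C++ sketch of how I currently understand them. The function names `resample` and `crossfadeLoop` are made up for illustration, the resampler uses plain linear interpolation, and the crossfade is a simple linear one; none of this is taken from a particular library or from your suggestion.

```cpp
#include <cstddef>
#include <vector>

// Reads the input `ratio` times faster than normal using linear interpolation.
// Played back at the original sample rate, the pitch is multiplied by `ratio`
// and the duration is divided by it (the output has about in.size() / ratio samples).
std::vector<float> resample(const std::vector<float>& in, double ratio) {
    std::vector<float> out;
    if (in.size() < 2 || ratio <= 0.0) return out;
    const double last = static_cast<double>(in.size() - 1);
    for (std::size_t k = 0;; ++k) {
        const double pos = static_cast<double>(k) * ratio;
        if (pos >= last) break;
        const std::size_t i = static_cast<std::size_t>(pos);
        const double frac = pos - static_cast<double>(i);
        out.push_back(static_cast<float>((1.0 - frac) * in[i] + frac * in[i + 1]));
    }
    return out;
}

// Crossfade looping: the last `fade` samples of the loop are mixed into the
// first `fade` samples, and the loop is shortened by `fade`. When the result
// wraps around, it continues from where the original tail started fading out.
std::vector<float> crossfadeLoop(const std::vector<float>& loop, std::size_t fade) {
    const std::size_t n = loop.size();
    if (fade == 0 || 2 * fade > n) return loop;  // nothing sensible to blend
    std::vector<float> out(loop.begin(),
                           loop.begin() + static_cast<std::ptrdiff_t>(n - fade));
    for (std::size_t i = 0; i < fade; ++i) {
        const float w = static_cast<float>(i) / static_cast<float>(fade);  // 0 -> 1
        out[i] = w * loop[i] + (1.0f - w) * loop[n - fade + i];
    }
    return out;
}
```

With `ratio = 2.0` a 1000-sample buffer comes out about 500 samples long, which is exactly the duration change I am worried about.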
Anyway, yes, I will have to be adventurous :) I cannot implement just a basic algorithm as a master's thesis; they would kick me out of school :) I played with the PSOLA implementation from DAFX in MATLAB and the results were really, really bad. I can upload them somewhere if you want to hear what I mean by bad. But again, maybe I just need a better pitch mark detection algorithm.
Maybe I should be more exact. The result should be a VST plugin written in C++. It should let the user sing into a microphone, generate pitch-shifted copies of their voice in real time, and add them to the original (something like a real-time chorus). The amount of pitch shifting should be driven by MIDI (exactly how isn't specified yet, and it could be pretty simple). So the key of the song can be set, and the "pitch shifter" will then produce shifts that match the key of the song.
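As a rough idea of the "small additional logic", here is a sketch of how a shift could be derived from the key. Everything in it is an assumption on my side, since the MIDI control isn't specified yet: the key is major, the sung pitch is already known as a MIDI note number, a note outside the scale is treated as the nearest scale degree below it, and the names `kMajorScale`, `harmonyShiftSemitones` and `semitonesToRatio` are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Semitone offsets of the major scale from its root (assumption: major keys only).
constexpr std::array<int, 7> kMajorScale = {0, 2, 4, 5, 7, 9, 11};

// Converts a shift in semitones to a frequency ratio (equal temperament).
double semitonesToRatio(double semitones) {
    return std::pow(2.0, semitones / 12.0);
}

// Returns how many semitones `midiNote` must be shifted up to land `degrees`
// scale steps higher within the major key whose root pitch class is `keyRoot`
// (0 = C, 1 = C#, ...). Assumes degrees >= 0.
int harmonyShiftSemitones(int midiNote, int keyRoot, int degrees) {
    const int rel = ((midiNote - keyRoot) % 12 + 12) % 12;  // pitch class relative to the root
    int d = 0;                                              // scale degree at or below the note
    for (int i = 0; i < 7; ++i) {
        if (kMajorScale[static_cast<std::size_t>(i)] <= rel) d = i;
    }
    const int target = d + degrees;
    const int targetOffset =
        kMajorScale[static_cast<std::size_t>(target % 7)] + 12 * (target / 7);
    return targetOffset - rel;
}
```

For example, in C major a sung C (MIDI 60) with `degrees = 2` needs +4 semitones (a major third, up to E), while a sung D (MIDI 62) needs only +3 (a minor third, up to F). The pitch shifter would then receive `semitonesToRatio(...)` of that value.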
Now ideally I need the pitch shifting to be really quick, so that I can produce multiple pitch-shifted copies and run some small additional logic in real time. But I also need it to be really good quality: there shouldn't be any noticeable artifacts that would distract the listener. Plus I need quite a big range of pitch shift; from what I read, these basic algorithms can shift the sound only up to 25% without artifacts. Anyway, I would rather end up with a really high-quality pitch shifter that cannot run in real time than with something that runs in real time with no problems but is unusable due to miserable sound quality.
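To put that 25% figure into musical terms (assuming it means a frequency ratio of 1.25, which is my reading of it), the corresponding number of equal-tempered semitones is:

```latex
n = 12 \log_2(1.25) \approx 3.86 \ \text{semitones},
\qquad
\text{major third: } 2^{4/12} \approx 1.26,
\qquad
\text{perfect fifth: } 2^{7/12} \approx 1.50
```

So under that reading even a major third sits right at the limit, and a fifth is far beyond it.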
Originally this effect was supposed to work for instruments and polyphonic music as well, but I persuaded my teacher that making it that universal would be impossible without a big loss in quality.
I will have three months to perfect the algorithm, but now I have to choose which way to go. I tried the phase vocoder and PSOLA, both in their basic implementations, and both gave bad results. Of course I don't expect you to give me an exact recipe, but I am just stuck and I don't know which way to go, because I am afraid that at the end of the three months I will realize that I chose badly. If I dive into things like wavelets or neural networks, it will take me a lot of time before I understand them well enough to say whether they are usable or not, and there will be no time left to try something else. So I just wanted to know if you could tell me which way looks most promising, or at least which way seems like an absolute waste of time...
It should be something like http://www.youtube.com/watch?v=xGSs4W_5pPE&feature=related at 0:25.
Btw, I found a very nice page with a comparison of pitch shifting/time stretching algorithms:
http://diuf.unifr.ch/pai/people/juiller ... eview.html. Those modified phase vocoders sound nice, even though it is a different kind of signal (multiple instruments) and there is no formant preservation (which I will need). And WSOLA seems to be a lot better than PSOLA (at least compared to my results), though I don't know how these algorithms would perform on voice only with some formant preservation applied.
With this additional information, is PSOLA still the best way to go?
P.S. Sorry for the long essay:)