2.0. Pitch Shifting Techniques
2.1. Zero-Padding
Zero-padding is a very simple pitch shifting method: it adds or removes samples at zero crossings to change the pitch. However, unless intensive interpolation is used to keep the speech signal's slope from changing abruptly, it produces artifacts that swamp the signal. Even without this problem, considerable artifacts remain because the zero crossings are not necessarily evenly spaced. Overall, the quality is too poor to make up for the method's simplicity.
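To make this concrete, here is a toy sketch (in Python, an assumed choice since the post contains no code) that edits a signal only at its zero crossings. The names `zero_crossings` and `zero_crossing_shift` are hypothetical. Unlike a practical implementation, this version also changes the signal's duration and performs no interpolation, which is exactly what produces the abrupt slope changes described above.

```python
import math


def zero_crossings(x):
    """Indices i where the signal changes sign between x[i - 1] and x[i]."""
    return [i for i in range(1, len(x)) if (x[i - 1] < 0) != (x[i] < 0)]


def zero_crossing_shift(x, k):
    """Crude pitch shift by editing the signal only at its zero crossings.

    k > 0: insert k zero samples at every crossing (longer cycles, lower pitch).
    k < 0: delete -k samples starting at every crossing (shorter cycles, higher pitch).
    No interpolation is done, so the slope changes abruptly at every edit point.
    """
    crossings = set(zero_crossings(x))
    out = []
    skip = 0
    for i, v in enumerate(x):
        if i in crossings:
            if k > 0:
                out.extend([0.0] * k)
            else:
                skip = -k
        if skip > 0:
            skip -= 1  # drop this sample
            continue
        out.append(v)
    return out


# Example: a 100 Hz sine sampled at 8 kHz (80 samples per period).
signal = [math.sin(2 * math.pi * n / 80) for n in range(800)]
lower = zero_crossing_shift(signal, 8)    # half cycles grow from about 40 to 48 samples
higher = zero_crossing_shift(signal, -4)  # half cycles shrink from about 40 to 36 samples
```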
2.2. PSOLA (Pitch Synchronous Overlap/Add)
PSOLA (Pitch Synchronous OverLap/Add) uses a pitch detection algorithm to extract single pitch periods and then overlap-adds them at a new spacing: placing the periods closer together raises the pitch and spreading them apart lowers it, with periods duplicated or dropped so that the duration stays the same. It changes the pitch without altering the formant frequencies, which is consistent with the characteristics of the human voice, and it is computationally efficient [1]. However, several of its assumptions, such as setting the phase equal to zero, are not completely correct and lead to artifacts. Spectral discontinuities at the segment boundaries also cause artifacts. PSOLA can only be as accurate as the pitch detection algorithm it uses, and unvoiced regions can be tricky to handle. Yet another problem is that voiced fricatives acquire a buzz when pitch shifted [2]. Although PSOLA produces better-sounding pitch-shifted speech than the zero-padding method (its intelligibility is good), the artifacts make the sound unnatural and unpleasant for the listener.
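A minimal TD-PSOLA sketch follows, assuming the pitch period is already known and constant, so the pitch detection step (and its failure modes) is left out. Grains two periods long are cut around each analysis pitch mark with a Hann window and overlap-added at a new spacing of `period / ratio`. The name `psola_shift` is hypothetical, and no gain normalisation is applied.

```python
import math


def psola_shift(x, period, ratio):
    """TD-PSOLA sketch for a signal whose pitch period (in samples) is known and constant."""
    n_x = len(x)
    grain = 2 * period
    # Periodic Hann window spanning two pitch periods, centred on a pitch mark.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * j / grain) for j in range(grain)]
    # Analysis pitch marks, one per period (a real system gets these from pitch detection).
    marks = list(range(period, n_x - period, period))
    out = [0.0] * n_x
    hop = period / ratio  # synthesis spacing: closer grains give a higher pitch
    t = float(period)
    while t < n_x - period:
        # Reuse the nearest analysis grain: periods get duplicated when raising
        # the pitch and skipped when lowering it, so the duration is preserved.
        k = min(max(round(t / period), 1), len(marks))
        m = marks[k - 1]
        c = round(t)
        for j in range(grain):
            dst = c - period + j
            if dst < n_x:
                out[dst] += win[j] * x[m - period + j]
        t += hop
    return out


# An impulse train with an 80-sample period, a crude stand-in for glottal pulses.
pulses = [1.0 if n % 80 == 0 else 0.0 for n in range(1600)]
up = psola_shift(pulses, 80, 2.0)    # pulses now 40 samples apart (one octave up)
down = psola_shift(pulses, 80, 0.5)  # pulses now 160 samples apart (one octave down)
```

With real speech the pitch marks would be irregular and come from a pitch detector, which is where the accuracy limits mentioned above enter.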
2.3. Physical Modeling
Physical modeling tries to completely separate the vocal tract information from the glottal impulses using inverse filtering techniques. The drawbacks of this method are its computational complexity and the artifacts that arise because the inverse filter cannot model the speech exactly, so the vocal tract information is never completely separated from the impulses [1]. The artifacts are less noticeable than with methods such as PSOLA, but the computational complexity is much greater.
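The text does not say which inverse filtering technique is meant; linear prediction (LPC) is one common choice, so the sketch below assumes it. It estimates an all-pole vocal tract model with the Levinson-Durbin recursion, inverse filters the signal to obtain the excitation residual, and resynthesises. A pitch shifter would modify only the residual between those two steps; that part is omitted. All names are hypothetical, and a real system would analyse short windowed frames rather than one whole signal.

```python
import random


def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n - k] for lags k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor A(z) = 1 + a1 z^-1 + ... + ap z^-p from the autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """FIR A(z): removes the vocal tract resonances, leaving the excitation residual."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0) for n in range(len(x))]


def synthesis_filter(e, a):
    """All-pole 1/A(z): re-applies the vocal tract resonances to an excitation."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# Toy "vocal tract": a 2nd-order resonator driven by seeded white noise.
rng = random.Random(0)
true_a = [1.0, -1.3, 0.8]
excitation = [rng.gauss(0.0, 1.0) for _ in range(4000)]
speech = synthesis_filter(excitation, true_a)

est_a = levinson_durbin(autocorrelation(speech, 2), 2)
residual = inverse_filter(speech, est_a)  # a pitch change would be applied to this
rebuilt = synthesis_filter(residual, est_a)
```

The artifacts described above correspond to the case where `est_a` cannot capture the real vocal tract, so some resonance remains in the residual and gets shifted along with the pitch.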
2.4. Frequency Domain Methods
Frequency domain methods (including phase vocoders) must first transform the speech signal into the frequency domain, adjust the pitch there, and in most cases transform the result back to the time domain. These methods suffer from spectral leakage, which causes the formant frequencies to shift [1, 3]. Frequency domain methods allow finer pitch shifts at the cost of increased complexity and shifted formants.
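The sketch below is a deliberately naive single-frame example, not a phase vocoder: it takes a DFT, moves the content of each bin k to bin round(k * ratio), and inverts. Because the whole spectrum is rescaled, the spectral envelope (and therefore the formants) moves along with the pitch. The function names are hypothetical; a practical system would use an FFT library and overlapping frames with phase correction, and the plain O(N²) DFT here only keeps the example dependency-free.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def shift_frame_spectrum(frame, ratio):
    """Naive pitch shift of one frame: move the content of bin k to bin round(k * ratio).

    Every component moves, including the spectral envelope, so the formants
    shift by the same ratio as the pitch.
    """
    n = len(frame)
    X = dft(frame)
    Y = [0j] * n
    Y[0] = X[0]  # keep the DC term
    half = n // 2
    for k in range(1, half):
        k2 = round(k * ratio)
        if 0 < k2 < half:
            Y[k2] += X[k]
            Y[n - k2] += X[n - k]  # mirrored negative-frequency bin keeps the output real
    return idft(Y)


# A 64-sample frame holding exactly 4 cycles of a cosine, shifted up an octave.
frame = [math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
shifted = shift_frame_spectrum(frame, 2.0)  # now 8 cycles per frame
```

A frequency that does not fall exactly on a bin spreads over several bins (leakage), and rounding those bins to new positions is one source of the inaccuracy these methods suffer from.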
2.5. Delay Based
Delay-based methods crossfade between two channels with different, time-varying delays and gains to produce a smoothly transitioning pitch-shifted signal. They produce only small artifacts as long as large pitch shifts are avoided. They are also fast enough to operate in real time; however, they do shift and smear the formant frequencies [4]. Delay-based methods are less computationally complex than frequency domain or physical modeling methods, and their results have fewer artifacts than zero-padding or PSOLA, so this method was chosen for this project. Details of how the delay-based method works are described in Section 3.
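A minimal sketch of a two-tap delay-line pitch shifter, assuming triangular crossfade gains, taps half a window apart and linear interpolation between samples; it is not necessarily the design described in Section 3, and `delay_pitch_shift` and its `window` default are hypothetical. Each tap's delay changes by `ratio - 1` samples per output sample, so the tap reads the input `ratio` times as fast as real time. When a tap's delay runs off the end of the window it jumps back, and its gain is zero at exactly that moment, so the other tap covers the jump.

```python
import math


def _read(x, pos):
    """Linearly interpolate x at fractional index pos (zeros outside the signal)."""
    i = math.floor(pos)
    frac = pos - i
    a = x[i] if 0 <= i < len(x) else 0.0
    b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
    return a + frac * (b - a)


def delay_pitch_shift(x, ratio, window=800):
    """Two-tap delay-line pitch shifter with triangular crossfades.

    ratio:  pitch ratio (2.0 = one octave up, 0.5 = one octave down)
    window: maximum delay in samples (hypothetical default)
    """
    out = []
    phase = 0.0                    # tap A position within the window, in [0, 1)
    step = (ratio - 1.0) / window  # delay shrinks by (ratio - 1) samples per sample
    for n in range(len(x)):
        y = 0.0
        for p in (phase, (phase + 0.5) % 1.0):  # tap B sits half a window from tap A
            delay = p * window
            gain = 1.0 - abs(2.0 * p - 1.0)     # 0 where the delay jumps, 1 mid-window
            y += gain * _read(x, n - delay)
        out.append(y)
        phase = (phase - step) % 1.0
    return out


# A 100 Hz sine at 8 kHz shifted up an octave.
tone = [math.sin(2 * math.pi * n / 80) for n in range(8000)]
octave_up = delay_pitch_shift(tone, 2.0)
```

The two gains always sum to one, so a steady input level is preserved. Processing is offline on a list here for clarity; a real-time implementation would keep the same logic but read from a circular buffer slightly longer than `window` samples.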
Would it be possible to run both octaves at the same time, and perhaps have a separate volume adjustment for each (like the EHX Micro POG)?