LOSS CONCEALMENT FOR LAPPED TRANSFORM CODECS- 123docz.net

Linear transforms are widely used in signal compression. They have the primary objective of concentrating the signal energy on a few coefficients, thus preparing

the data for the subsequent quantization and entropy coding. Block transforms (e.g., the Discrete Cosine Transform, DCT) are convenient in that they make each block of data independent, constraining the effect of any error (either by quantization or by loss) to that single block of data. Nevertheless, by not exploiting correlation between adjacent samples in different blocks, they may often pro- duce a structured noise (blocking artifacts), which is readily identifiable in the decoded signal as a buzzing sound. Overlapped transform coders occupy an important niche between block codes and fully predictive coders. They still limit the data to a certain block of samples, but their basis functions do not have disconti- nuities at block boundaries. Instead, basis functions spread over to (i.e., overlap) neighboring data blocks. This significantly reduces blocking artifacts, while pre- serving or even improving the compression qualities of the transform. For these reasons, overlapped transforms are used in numerous audio and speech codecs (e.g., MP3, Windows Media Audio [WMA], and ITU-G722.1).

A loss concealment technique based on exploiting the partial information available about certain samples has been recently introduced [16]. The technique can be used with essentially any linear transform where some of the coefficients are missing. Important cases include missing “frames” of overlapped transform (e.g., Modulated Lapped Transform, MLT) coefficients, or wavelet coefficients, or even single or multiple missing transform coefficients within a block of a block transform (e.g., DCT). However, since we are mostly interested in concealment of missing blocks in real-time speech and audio communication over packet net- works, we will focus our discussion on the case of overlapped transforms.

When using an overlapped transform based codec, if a frame or block of coefficients is lost, partial information is available about the missing segment. While this information is not of enough quality to be used directly, it provides important clues about the missing segment. In this section we discuss ways in which to exploit this partial information to maximize the quality of the recovered signal. In particular, we apply some of the techniques to single-frame loss concealment on the ITU-G722.1 codec [5].

In order to better understand the scenario, let us take a look at how an overlapped transform is used for coding purposes. Figure 3.3a shows a one- dimensional signal. In this example, the signal is split into overlapping blocks of 2N samples, as shown in Figure 3.3b. Then, at each block,N transform coefficients are obtained by multiply/accumulate operations with theN basis functions constituting the transform. Figure 3.4 shows the first few basis functions of a typical transform. On the decoder side, the basis functions are scaled by the transform coefficients and added. Subsequent frames of the signal are then overlapped and added. Figure 3.3c shows the contribution of each overlapping block, before addition. Note that the recovered segments have the same length but are not iden- tical to the original segments: the original signal is recovered only after adding the overlapping parts. Now, suppose the information about one of the blocks was

0 100 200 300 400 500 600 (e)

(d) (c) (b) (a)

FIGURE 3.3: A sample speech signal. (a) Original signal. (b) Sig- nal split into overlapping segments and windowed. (c) Corresponding segments after decoding. (d) Overlapped/added signal with one missing block. (e) Error concealment using simple block repetition.

0 N 2N

(e) (d) (c) (b) (a)

FIGURE 3.4: A few basis functions of the MLT transform. From top to bottom: 1st, 2nd, 3rd, 10th, and 50th basis functions.

lost. A total of 2N samples—spawning the lost block—cannot be reconstructed correctly. If we replace the lost coefficients with zeros, we would have the reconstructed signal indicated in Figure 3.3d. Note that in this example, although only Ncoefficients are missing, a total of 2Nsamples do not reconstruct correctly, due to the overlapping nature of the transform. Nevertheless, overlapped transforms like the MLT are critically sampled. This means that some partial information is available about the 2N incomplete samples. More specifically, a total ofN linear equations are available regarding these 2N samples. We will now examine how this can be used to improve the loss concealment.

3.4.1 Speech Codecs

The ITU standard recommends that lost blocks be replaced with the previous block. While this technique is reasonable for low loss rates, artifacts are still present and become significant at loss rates that are common in the Internet.

In particular, replication of coefficients does not take into account the alignment of pitch periods between past and lost frames. (See examples of speech codecs, G.722.1.)

In Section 3.2.1 we presented one of the main principles behind loss concealment for speech: pitch replication. As we will see, the algorithm presented in [16]

can be seen as an elaborate pitch replication system. It uses the partial information available to synthesize a signal that has similar spectral characteristics and aligns well with the surrounding blocks.

The MLT transform can be decomposed into a windowing operation, followed by a folding and a DCT. Each block of coefficients can thus be written in matrix form as

m=dct (F J x), (3.1)

wherem is the N ×1 vector of the resulting transform coefficients, F is the N×2Nfold-over matrix

F =

⎡

⎢⎢

⎢⎣

0 0 1 1 0 0 0 0 ã ã ã 0 0 ã ã ã 0 0

... . .. ... ... ... ... ... ... ...

0 1 0 0 1 0 0 0 ã ã ã 0 0 ã ã ã 0 0

1 0 ã ã ã 0 0 ã ã ã 0 1 0 0 ã ã ã 0 0 ã ã ã 0 0

0 0 ã ã ã 0 0 ã ã ã 0 0 1 0 ã ã ã 0 0 ã ã ã 0 −1

0 0 ã ã ã 0 0 ã ã ã 0 0 0 1 ã ã ã 0 0 ã ã ã −1 0

... ... ... ... ... ... ... . .. ... ...

0 0 ã ã ã 0 0 ã ã ã 0 0 0 0 1 −1 0 0

⎤

⎥⎥

⎥⎦ (3.2)

andJ is a scaling matrix, that is, anN×Ndiagonal matrix with the windowing coefficients

Jij= sin

π/2N (i+0.5)

, ifi=j,

0, otherwise. (3.3)

Furthermore, we will often need to refer to the signal before the DCT is applied.

Let’s call thatz. So, we write

z=F J x. (3.4)

Note that in (3.2) the nonzero elements of the folding matrix form two nonover- lapping subblocks. In other words, we can decompose F in four submatrices, where two of them are zero matrices:

F =

F1 0 0 F2

. (3.5)

Similarly, we write J=

J1 0 0 J2

, x=

x1 x2

, z= z1

, and y= y1

. (3.6)

Looking at the block diagonal structure ofF andJ, we can easily see that only the first half of the samples ofx is used in computing the first half of the folded vectorF J x(and similarly for the second half). That is, we can write

z1=F1J1x1. (3.7)

Therefore, if the next block of coefficients (which would also be using the second half of samples ofx) is lost, we can use this partial knowledge about the samples to try to estimatex2.

More specifically, suppose an isolated block is lost (i.e., both the preceding and the subsequent blocks to the missing block of coefficients are correctly received).

The missing (incomplete) set of samples is 2N long. By computing the inverse DCT of the received data (but before applying the unfolding matrix), we have access toy. We can therefore write the following equation, applying to the first incomplete N samples:

z2=F2J2x2. (3.8)

Note that thex1 andx2 in (3.7) and (3.8) refer to different blocks. To avoid confusion, we will now add a time index to our notation. Namely to represent blocks at different time instants we will add a superscript index, indicating the block ordering. For example,xnwill mean the vectorxand time instantn.

Assume the block at timenis missing, but both the previous and the subsequent blocks are correctly received. So, since blocknis missing, but we have received blocksn−1 andn+1, we can write

z2n−1 z1n+1

F2 0 0 F1

J2 0 0 J1

x1n x2n

. (3.9)

Note that the matrices containingF1,F2,J1, andJ2are now rotated in relation to the originalFandJmatrices. For simplicity, let’s refer to these modified (block rotated) matrices in the aforementioned equation asGandH. We therefore write

zn2−1 zn2+1

=GH xn. (3.10)

Note that this is an underdetermined system of equations. We knowzn2−1and z2n+1, and we are trying to estimate the 2Nsamples ofxn. This underdetermined system could be solved for the minimum energy vector xn using the Moore–

Penrose generalized inverse of GH. This would provide the minimum energy signal segmentx that satisfies the received (partial) information. Nevertheless, simulations show that this is not a good choice forx, as the nature of the matrix Jtends to concentrate the energy in the higher gain samples. A better choice is to find the solution minimizing the energy of the windowed signalH x. This solution does distribute more evenly the energy across the samples ofx. Nevertheless, it still does not use the information about the neighboring frames. Before proceeding to describe the best mode, let us introduce a small change in interpretation. Let us introduce an identity matrixI in (3.10), which becomes

zn2−1 zn2+1

=GH I xn. (3.11)

We now interpretInot as a simple identity matrix, but as a matrix whose columns form a basis for the space of x. In this context, the basis I consists simply of impulses at each sample location. Using the generalized inverse ofGHwould be minimizing the energy of the basis representation over these impulses. That takes into account the partial information about the missing samples, but it does not take into account all the prior information we have about the missing segment:

the properly received signal segments just before (and possibly after) the missing segment. To fully exploit that information, we will reshape the aforementioned equation by introducing two small modifications. The first modification improves the signal continuity across frames by removing the no-excitation response. The

second biases the reconstructed signal toward having the same spectrum and pitch as the neighboring segments.

To account for the signal continuity, we estimate the LPC filter corresponding to the previous block and compute the no-excitation response of the LPC filter into the missing segment,x. We then modify (3.11) to account forˇ xˇand write

zn2−1 zn2+1

−GHxˇ=GH Ixˆn, (3.12) wherexˆ=x− ˇx.

To account for the spectral continuity, we invoke our interpretation ofI as a basis for the vectorx (nowxˆ) to claim we should not be minimizing the energy of x. Instead, we should be minimizing the energy of the representation of x under a basis whose functions have a spectrum corresponding to the desired LPC spectrum. To that end, we apply the LPC filter to the identity matrix, to obtain a new basisL, where each column ofLcorresponds to a time-shifted version of the impulse response of the LPC filter.

Finally, we compute an estimate of the periodicity and pitch period for the segment and apply that to the basis functions as well. Each column ofLis now a series of “colored” pulses, each apart by the pitch period, each with the impulse response of the LPC filter, and each with decreasing amplitude, based on the esti- mated periodicity index. For simplicity, we still call this final basis matrixL. The representation on this new basis is notxany more, so let’s call itr. We now have

zn2−1 zn2+1

−GHxˇ=GH Lrn, (3.13) which is then solved by the pseudo inverse ofGH L, that is,

rn=(GH L)†

zn2−1 zn2+1

−GHxˇ

, (3.14)

where † denotes the pseudo inverse. Note that this is the solution that minimizes the LPC residual ofx, as we wanted. The final solution forxis obtained by simply computing

xn=Lrn+ ˇx. (3.15)

Figure 3.5 shows a sample of the results obtained by the concealment algorithm. The first signal is the original, the second is the signal reconstructed using the proposed technique, and the third is the results of concealment by a pitch replication method. In both cases every third packet is lost.

FIGURE 3.5: Sample results. (a) Original signal. (b) Concealed using the partial information method, after losing every third frame. (c) Con- cealed using the pitch replication method.

In this section, we presented an error concealment technique that exploits the partial information available for the missing segment of a signal encoded by an overlapped transform. The discussion was centered around a speech codec, simply because speech is of foremost importance for real-time communication. Nev- ertheless, the same principle can be applied to other overlapped transform codecs.

In particular, the same ideas apply to error concealment in music, as long as we remove the conditions relating to pitch and introduce a higher order model to account for the harmonic nature of music.

LOSS CONCEALMENT FOR LAPPED TRANSFORM CODECS

LOSS CONCEALMENT FOR WAVEFORM SPEECH CODECS

FORWARD ERROR CORRECTION TECHNIQUES FOR SPEECH