5.3 MPEG-4 FINE GRAIN SCALABLE (FGS) CODING AND ITS VARIANTS


5.3.1 SNR FGS Structure in MPEG-4

The previously discussed conventional scalable coding schemes cannot efficiently address the problem of easy and efficient adaptation to time-varying network conditions or device characteristics. The reason is that they provide only coarse-granularity rate adaptation, and their coding efficiency often decreases due to the overhead associated with an increased number of layers.

To address this problem, FGS coding has been standardized in MPEG-4, as it is able to provide fine-grain scalability to easily adapt to various time-varying network and device resource (e.g., power) constraints [6,44]. Moreover, FGS enables a streaming server to perform minimal real-time processing and rate control when outputting a very large number of simultaneous unicast (on-demand) streams, as the resulting bit stream can easily be truncated to fulfill various (network) rate requirements. FGS also adapts easily to unpredictable bandwidth variations due to heterogeneous access technologies (Internet, wireless cellular, or wireless LANs) or to dynamic changes in network conditions (e.g., congestion events). Moreover, FGS enables low-complexity decoding with low memory requirements, allowing common receivers (e.g., set-top boxes and digital televisions), in addition to powerful computers, to stream and decode any desired video content. Hence, receiver-driven streaming solutions need only select the portions of the FGS bit stream that fulfill their constraints [40,45].

In MPEG-4 FGS, a video sequence is represented by two layers of bit streams with identical spatial resolution, which are referred to as the base layer bit stream and the fine granular enhancement layer bit stream, as illustrated in Figure 5.6.

FIGURE 5.6: MPEG-4 FGS encoder.

FIGURE 5.7: The structure of bit planes of Y, U, and V components.

The base layer bit stream is coded with nonscalable coding techniques, whereas the enhancement layer bit stream is generated by coding the difference between the original DCT coefficients and the reconstructed base layer coefficients using a bit-plane coding technique [1,6,7]. The residual signal is represented with bit planes in the DCT domain; the number of bit planes is not fixed, but is determined by the number of bits needed to represent the maximum residual magnitude in binary format. Before a DCT residual picture is coded at the enhancement layer, the maximum number of bit planes of each color component (Y, U, and V) is first found. In general, the three color components may have different numbers of bit planes; Figure 5.7 gives an example with 5 bit planes in the Y component and 4 bit planes in the U and V components. These three values are coded in the picture header of the enhancement layer stream and transmitted to the decoder.

All components are aligned at the least significant bit (LSB) plane. The FGS encoder and decoder process bit planes from the most significant bit (MSB) plane to the LSB plane. Because the Y, U, and V components may have different maximum numbers of bit planes, the first few MSB planes may contain only one or two components; in the example of Figure 5.7, only the Y component is present in the MSB plane. In this case, the bits for the coded block pattern (CBP) of each macroblock can be reduced significantly. Every macroblock in a bit plane is coded in row-scan order. A simple sketch of the underlying bit-plane arithmetic follows.
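As an illustration, consider the following minimal Python sketch (the helper names are hypothetical; sign coding, CBP coding, and the normative bit stream syntax are omitted). It shows how the number of bit planes for a residual block can be determined and how individual bit planes can be extracted, MSB plane first:

    import numpy as np

    def num_bit_planes(residual: np.ndarray) -> int:
        """Bit planes needed for the largest |DCT residual| magnitude."""
        max_mag = int(np.abs(residual).max())
        return max_mag.bit_length()  # e.g., 21 -> 10101 binary -> 5 planes

    def bit_plane(residual: np.ndarray, plane: int) -> np.ndarray:
        """Extract one magnitude bit plane (plane 0 = LSB plane)."""
        return (np.abs(residual).astype(np.int64) >> plane) & 1

    # An 8x8 DCT residual block whose maximum magnitude is 21.
    block = np.zeros((8, 8), dtype=np.int64)
    block[0, 0], block[0, 1] = 21, -6
    n = num_bit_planes(block)            # n == 5, coded in the picture header
    for p in reversed(range(n)):         # process the MSB plane first
        mask = bit_plane(block, p)       # binary array coded for this plane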

Since the enhancement layer bit stream can be truncated arbitrarily in any frame (see Figure 5.8), MPEG-4 FGS provides the capability of easily adapting to channel bandwidth variations, as the truncation sketch below illustrates.
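A minimal sketch of server-side truncation, assuming each frame's enhancement layer is stored as an independent byte string (the variable names are illustrative):

    def truncate_enhancement(enh_frames, bits_per_frame):
        """Cut each frame's FGS enhancement stream to the available budget.

        enh_frames: list of per-frame enhancement-layer byte strings.
        bits_per_frame: bit budget left for each frame's enhancement layer
        after the base layer has been served; may vary frame to frame.
        """
        return [frame[: budget // 8]     # keep only the leading bytes
                for frame, budget in zip(enh_frames, bits_per_frame)]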

5.3.2 MPEG-4 Hybrid Temporal–SNR Scalability with an All-FGS Structure

As mentioned earlier, temporal scalability is an important tool for enhancing the motion smoothness of compressed video. Typically, a base layer stream coded at a frame rate fBL is enhanced by another layer consisting of video frames that do not coincide (temporally) with the base layer frames. Therefore, if the enhancement layer has a frame rate fEL, the total frame rate of the combined base and enhancement layer streams is fBL + fEL (e.g., a 15-frame-per-second base layer complemented by a 15-frame-per-second enhancement layer yields 30-frame-per-second video).

FIGURE 5.8: An MPEG-4 FGS two-layer bit stream.

In the SNR FGS scalability structure described in the previous section, the frame rate of the transmitted video is locked to the frame rate of the base layer regardless of the available bandwidth and corresponding transmission bit rate. Since one of the design objectives of FGS is to cover a relatively wide range of bandwidth variation over IP networks (e.g., 100 kbps to 1 Mbps), it is quite desirable that the SNR enhancement tool of FGS be complemented with a temporal scalability tool. It is also desirable to develop a framework that provides the flexibility of choosing between temporal scalability (better motion smoothness) and SNR scalability (higher quality) at transmission time. This can be used, for example, in response to user preferences and/or real-time bandwidth variations at transmission time [44]. For typical streaming applications, neither of these elements is known when the content is encoded.
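For illustration only, a toy transmission-time policy (entirely hypothetical; neither MPEG-4 nor [44] prescribes a particular allocation rule) might split the available enhancement budget between an FGST frame and SNR bit planes as follows:

    def plan_enhancement(budget_bits, fgst_cost_bits, prefer_motion):
        """Toy policy: spend the enhancement budget on an FGST frame
        (motion smoothness), on SNR bit planes (quality), or on both."""
        if prefer_motion and budget_bits >= fgst_cost_bits:
            # Send the FGST frame, then spend the rest on SNR bit planes.
            return {"fgst_bits": fgst_cost_bits,
                    "snr_bits": budget_bits - fgst_cost_bits}
        # Otherwise devote the whole budget to SNR bit planes.
        return {"fgst_bits": 0, "snr_bits": budget_bits}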

Consequently, the MPEG-4 framework supporting hybrid temporal–SNR scalability, built on the SNR FGS structure, is described in detail in [44]. This framework provides a new level of abstraction between the encoding and transmission processes by supporting both SNR and temporal scalability through a single enhancement layer. Figure 5.9 shows the hybrid scalability structure. In addition to the standard SNR FGS frames, this hybrid structure includes motion-compensated residual frames at the enhancement layer. These motion-compensated frames are referred to as FGS Temporal (FGST) pictures [44].

As shown in Figure 5.9, each FGST picture is predicted from base layer frames that do not coincide temporally with that FGST picture; this yields the desired temporal scalability feature. Moreover, the FGST residual signal is coded using the same fine granular video coding method employed for compressing the standard SNR FGS frames.

FIGURE 5.9: FGS hybrid temporal–SNR scalability structure with (a) bidirectional and (b) forward prediction FGST pictures and (c) examples of SNR-only (top), temporal-only (middle), or both temporal and SNR (bottom) scalability.

FIGURE 5.10: Multilayer FGS–temporal scalability structure.

Each FGST picture includes two types of information: (a) motion vectors (MVs) that are computed in reference to temporally adjacent base layer frames and (b) coded data representing the bit-plane DCT signal of the motion-compensated FGST residual. The MVs can be computed using standard macroblock-based matching motion-estimation methods. Therefore, the motion-estimation and compensation functional blocks of the base layer can be reused by the enhancement layer codec.

The FGST picture data is coded and transmitted using a data-partitioning strategy to provide added error resilience. Under this strategy, after the FGST frame header, all motion vectors are clustered and transmitted before the residual signal bit planes. The MV data can thus be transmitted in designated packets with greater protection, as the sketch below illustrates. More details on hybrid SNR–temporal FGS can be found in [44].
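A minimal sketch of this data-partitioned ordering, with illustrative container types (the actual MPEG-4 bit stream syntax is more involved):

    def serialize_fgst_picture(header, motion_vectors, residual_planes):
        """Emit an FGST picture in data-partitioned order: frame header,
        then all motion vectors clustered together, then the residual bit
        planes (MSB plane first). The MV partition can then be mapped to
        better-protected packets than the residual partition."""
        mv_partition = b"".join(motion_vectors)         # list of bytes
        residual_partition = b"".join(residual_planes)  # list of bytes
        return header + mv_partition + residual_partition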

Finally, these scalabilities can be further combined in a multilayer manner, and an example of this is shown in Figure 5.10.

5.3.3 Nonstandard FGS Variants

To improve the coding efficiency of FGS, various temporal prediction structures have been proposed. For example, in [8], an additional motion compensation loop is introduced into the enhancement layer using the reconstructed high-quality reference. Furthermore, an improved method is proposed in [9], where an estimation-theoretic framework is presented to optimally form the prediction by considering both the reconstructed high-quality reference and the base layer information. This optimization translates into consistent performance gains in compression efficiency at the enhancement layer. Nonetheless, the main disadvantage of such schemes is their high complexity, due to the multiple motion estimation loops required for enhancement layer coding.

However, an FGS scheme can also benefit from temporal dependency at the FGS enhancement layer using a single prediction loop. Motion-Compensated FGS (MC-FGS) was first proposed to address this problem in [10]. A high-quality reference, generated from the enhancement layer, can be utilized in the motion compensation loop to obtain better prediction. However, with such a closed-loop structure at the enhancement layer, drift errors can arise when delivery of the enhancement layer cannot be guaranteed at the decoder side due to network bandwidth fluctuations. Several methods for reducing drift in the MC-FGS structure are discussed in [10].

To introduce temporal prediction into FGS enhancement layer coding without severe drift errors, several alternative techniques have been proposed. Progressive FGS (PFGS), proposed in [12,13], explores a separate motion compensation loop for the FGS enhancement layer to improve compression performance and also provides a means to eliminate drift. There are two key ideas in PFGS coding. One is to use predictions from the enhancement reference layers as much as possible (for coding efficiency) instead of always predicting from the base layer as in MPEG-4 FGS. The other is to keep a prediction path from the base layer to the highest quality layer across several frames, for error recovery and channel adaptation. Such a prediction path enables lost or erroneous higher quality enhancement layers to be automatically reconstructed from lower layers gradually over a few frames. Thus, PFGS trades off some coding efficiency for drift error reduction.

In [14], a robust FGS (RFGS) technique was presented that incorporates the ideas of leaky [10,15] and partial prediction to deal with drift. In RFGS, the high-quality reference used in the enhancement layer compensation loop is constructed by combining the reconstructed base layer image with part of the enhancement layer. A frame-based fading mechanism is introduced to cope with the mismatch error: at each frame, a uniform leak factor between 0 and 1 is applied to the enhancement layer signal before it is added to the base layer image, which attenuates error propagation. Moreover, an adaptive leaky prediction based on RFGS is proposed in [16], where the leak factor is determined for each bit plane of the enhancement layer according to its significance and location, further improving coding performance.
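A minimal sketch of the leaky prediction idea, assuming a per-frame leak factor alpha in [0, 1] (the variable names are illustrative, not taken from [14]):

    import numpy as np

    def leaky_reference(base_recon, enh_recon, alpha):
        """High-quality reference for the enhancement compensation loop.

        alpha = 0 falls back to the drift-free base layer reference;
        alpha = 1 uses the full enhancement reconstruction (best coding
        efficiency, worst drift). Intermediate values attenuate any
        enhancement-layer mismatch by a factor of alpha per frame."""
        return base_recon + alpha * (enh_recon - base_recon)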

Furthermore, several techniques have been proposed to achieve a more flexible trade-off between drift errors and coding efficiency at the macroblock level rather than at the frame level. The macroblock-based PFGS (MPFGS) is one such scheme [17,18]. In MPFGS, three INTER modes, HPHR, LPLR, and HPLR, are proposed for encoding the enhancement layer macroblocks (see Figure 5.11).

FIGURE 5.11: INTER modes for the enhancement macroblocks in MPFGS.

The HPHR mode is used to obtain high coding efficiency by using a higher quality reference, while the HPLR mode is imposed to attenuate drift by introducing the mismatch error into the encoding process. Assuming that the base layer is always available at the decoder, the LPLR and HPLR modes help reset the drift errors potentially caused by the HPHR mode. A decision-making mechanism is presented in MPFGS to choose the optimal prediction mode for each enhancement layer macroblock by considering the error propagation effects and taking advantage of the HPLR mode, achieving a flexible trade-off between high coding efficiency and low drift. Another macroblock-based approach, called enhanced mode-adaptive FGS (EMFGS), is presented in [19,20]. EMFGS uses three predictors: the reconstructed base layer macroblock, the reconstructed enhancement layer macroblock, and the average of the two.

A uniform fading factor of 0.5 is used to form the third predictor. Also, a mode-selection algorithm is provided to decide the encoding mode of each enhancement layer macroblock. The three candidate predictors are sketched below.
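A minimal sketch of the three EMFGS predictor candidates (the names are illustrative; the actual mode-selection criterion in [19,20] is more elaborate):

    import numpy as np

    def emfgs_predictors(base_mb, enh_mb):
        """Candidate predictors for an enhancement layer macroblock:
        reconstructed base layer macroblock, reconstructed enhancement
        layer macroblock, and their average (fading factor 0.5)."""
        return {
            "base": base_mb,                  # drift-free, lower quality
            "enh": enh_mb,                    # best quality, drift-prone
            "avg": 0.5 * (base_mb + enh_mb),  # compromise predictor
        }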

Another network-aware solution, presented in [45] to alleviate FGS coding inefficiencies based on the available network conditions, is video transcaling (TS), which can be viewed as a generalization of (nonscalable) transcoding.

With TS, a scalable video stream that covers a given bandwidth range is mapped into one or more scalable video streams covering different bandwidth ranges. The TS framework exploits the fact that the level of heterogeneity changes at different points of the video distribution tree over wireless and mobile Internet networks.

This provides the opportunity to improve the video quality by performing the appropriate TS process. An Internet/wireless network gateway represents a good candidate for performing TS, thus improving the performance of FGS-based compression schemes.
