We are now ready to define more precisely what we consider as bandwidth adaptation mechanisms: these are techniques that enable the rate of a media stream to be modified during a playback session (i.e., while a user is connected and receiving content for playback) in order to accommodate changes in the network (e.g., changes in available bandwidth, congestion, and packet losses).
In order to provide a rough classification of bandwidth adaptation architectures, note that defining a specific mechanism requires choosing:
• Adaptation points, that is, the locations in the network where the bit stream is adapted to match specific bandwidth requirements. For example, adaptation could take place at the sender, at a proxy, or even at the client application.
• Decision agents, that is, the component within the system where decisions about transmission rate changes are made. This decision could be made at the sender, a proxy, or the client, based on whatever information is available at that point in the network.
• Coding techniques, that is, the source coding techniques designed to facilitate bandwidth adaptation. Note that not every technique is appropriate for every combination of adaptation point and decision agent. These techniques are discussed in Section 4.4.
It is important to note that, in general, bandwidth adaptation decisions need not be made at the same point in the network where the adaptation itself takes place.
A concrete example of this situation is client-driven techniques, where each client evaluates the status and parameters of its own transmission link and requests changes to the streaming parameters from the sender; in this case bandwidth adaptation decisions are made by clients and put in place by the sender.
In general, the choice of adaptation point and decision agent for a particular system depends on what information is available to each component of the system (client, sender, or proxy if there is any), on available computational resources, and on the characteristics of the bit stream.
4.3.1 Trade-Offs
Before discussing specific architectures in detail it is useful to understand how operating at client, server, or proxy leads to different trade-offs.
First, note that adaptation decisions should be based on available information about (i) the state of the network (e.g., bandwidth availability) and (ii) the relative importance of information encoded in the media stream (e.g., how much degradation will result from dropping one of the layers in a scalable representation, or in general the rate–distortion characteristics of different parts of the stream).
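As a rough illustration, the following sketch shows how a decision agent might combine these two kinds of information: given an estimate of the available bandwidth and per-layer rate and importance figures for a scalable stream, it selects which layers to transmit. All names and numbers are hypothetical and not drawn from any specific system.

```python
# Sketch: choose which layers of a scalable stream to transmit, given
# (i) an estimate of available bandwidth and (ii) per-layer rate and
# importance (priority). Names and numbers are illustrative only.

def select_layers(layers, available_kbps):
    """Keep layers in priority order while the cumulative rate fits
    the estimated available bandwidth."""
    selected, used = [], 0.0
    for layer in sorted(layers, key=lambda l: l["priority"]):
        if used + layer["rate_kbps"] <= available_kbps:
            selected.append(layer["name"])
            used += layer["rate_kbps"]
        else:
            break  # an enhancement layer is only useful if all lower layers are kept
    return selected, used

layers = [
    {"name": "base",  "priority": 0, "rate_kbps": 256},
    {"name": "enh-1", "priority": 1, "rate_kbps": 256},
    {"name": "enh-2", "priority": 2, "rate_kbps": 512},
]
print(select_layers(layers, available_kbps=600))  # -> (['base', 'enh-1'], 512.0)
```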
Figure 4.2 illustrates that source-related information is likely to be known more accurately at the sender (which can analyze media as it encodes it, or extract relevant information from an existing stream) than at the client (which must rely on information provided to it by the server).
FIGURE 4.2: Trade-off between the accuracy of source information and channel information available at various network locations.
Similarly, more efficient adaptation decisions are possible when information about the state of the network is timely and accurate. Ideally, this information should reflect the channel behavior observed by the client; the client itself therefore has access to the most accurate and timely channel information, since it directly observes packet arrival events. Figure 4.2 also illustrates that, since the most accurate information is not available at a single location, some algorithms will entail information exchange between client and sender. Examples include schemes where the client sends packet status feedback to the sender, or where the sender provides the client with information about the source, such as an “RD preamble” [40].
Second, two major factors affect the performance of a bandwidth adaptation algorithm for a single client, namely (i) the granularity with which bandwidth can be adapted and (ii) the speed with which changes can be made to react to variations in network behavior.
Figure 4.3 illustrates that when the actual adaptation (i.e., the change in the rate at which data is sent to the client) is performed at the server, finer granularity can be achieved. However, when adaptation takes place at the server, the reaction time may be longer, because packets resulting from the adaptation take longer to arrive at the client.
Third, it is often important to consider system-level trade-offs: not only how a particular client’s quality is affected by bandwidth adaptation, but also how adaptation affects overall network performance. Figure 4.4 illustrates how system scalability and overall network utilization are affected by choices made in the bandwidth adaptation mechanism. If decisions on how to change the bandwidth, and even the adaptation itself, are performed close to the client, the system will be easier to scale, since more of the computation cost will be borne by the clients. However, if bandwidth adaptation is performed close to the clients, this will be to the detriment of overall network utilization, since data rate reductions will only reduce utilization close to the client.
FIGURE 4.3: Comparison of bandwidth adaptation flexibility and reaction time to serve a single client.
FIGURE 4.4: Comparison of service scalability and overall network utilization when serving multiple clients.
4.3.2 Where Should the Adaptation Points Be?
As introduced earlier, an adaptation point is the system component where the bandwidth of the stream is physically changed. Each possible choice of location for the adaptation points has different advantages in terms of various performance metrics of interest.
4.3.2.1 Sender
The sender has the most flexibility in terms of compression format, since it can adjust the coding parameters in real time (when live encoding for individual users is performed), can switch between several simultaneously produced streams (simulcast), etc. Moreover, the sender is typically the least constrained in terms of storage and processing. Generally, then, adaptation at the sender provides the most flexibility from a source coding perspective. In practice, this means that when the sender performs bandwidth adaptation, finer grain adaptation is possible, as shown in Figure 4.3, with the least penalty in terms of quality at the receiver.
There are several drawbacks to server-driven bandwidth adaptation. The sender is furthest away from the client; thus, when congestion occurs in the network, there may be a delay before the bandwidth adaptation can take effect (see Figure 4.3).
Moreover, depending on where network information is being captured, this information may be unreliable. If bandwidth changes are requested by the client (see Section 4.3.3), and are thus based on more reliable information about the state of the network, letting the adaptation happen at the server means that the delay in reacting can be significant, which can reduce the effectiveness of bandwidth adaptation. If the sender itself is estimating the network state, it will be able to adapt faster, but may not have sufficiently accurate information about the network to be effective.
Adaptation at the server also presents problems in terms of scalability in cases where data is being broadcast to multiple clients. First, each server may be limited in the number of clients it can provide content to simultaneously, in particular if compression or bandwidth adaptation is computationally expensive. Second, the server may have to create separate versions of the same content for clients with different Internet access bandwidths, for example, one for 56K modem connections, another for DSL, etc. This will in turn create a heavy traffic load in the local network around the server, which may also have a negative impact on other content being served.
Physical adaptation is closely related to the coding techniques applied in a particular application. Since the sender can access the source more flexibly, a number of adaptation techniques have been proposed. Such techniques include source rate control (i.e., adjusting coding parameters during the encoding process [29,76]), rate–distortion optimized packet scheduling [16,48], and switching between different bit streams or layers [19,67].
4.3.2.2 Client
Bandwidth adaptation at the client essentially means that the client does not decode all the content it receives. This is beneficial only in terms of lowering the complexity of decoding, or of avoiding the decoding of lower priority data that is likely to be corrupted. This type of adaptation in general requires a coding format that supports complexity scalability, where the reconstructed quality is related to the complexity of the decoder used. For example, van der Schaar and de With [70] proposed to reduce the memory costs of an MPEG-2 decoder by re-compressing the I- and P-reference pictures prior to motion-compensated reconstruction. Transform coding and motion estimation algorithms with complexity scalability have also been studied [35,36,55]. In addition to complexity-scalable modifications of existing decoders, recent research has also attempted to model decoding complexity based on the compressed source characteristics and the decoding platform capabilities [69]. Clearly, such a system has no impact on the traffic being carried by the network, and thus does not contribute to reducing congestion.
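As a rough illustration of this idea (the packet fields and the complexity budget below are purely hypothetical), client-side adaptation amounts to filtering what is handed to the decoder; the traffic on the network itself is unaffected:

```python
# Sketch: client-side adaptation by decoding only the layers the device can
# afford. The packet structure and the cpu_budget heuristic are illustrative.

def packets_to_decode(packets, cpu_budget_layers):
    """Drop enhancement layers above the device's complexity budget."""
    return [p for p in packets if p["layer"] <= cpu_budget_layers]

received = [{"seq": i, "layer": i % 3} for i in range(9)]
print(len(packets_to_decode(received, cpu_budget_layers=1)))  # -> 6 of 9 packets decoded
```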
4.3.2.3 Proxy
Proxies are a good compromise between server and client adaptation. A proxy is responsible for a smaller number of clients than a server, which improves scalability and traffic balancing, and it is also closer to the clients, so it can respond faster to changes that affect them. Most often, the source information at the proxy is stored as a pre-encoded stream received from the original media server, and thus transcoding is widely employed for adaptation at this point. For example, Shen et al. [59] have proposed a transcoding-enabled proxy caching system to provide video quality appropriate to different network environments.
4.3.3 Sender-, Client-, and Proxy-Driven Adaptation
Note that there are many situations where the changes in source coding rate are implemented at one point in the network, based on decisions made somewhere else. A particular case of interest is the one in which the client makes decisions about the data to be transmitted and submits these decisions to the server.
4.3.3.1 Client-Driven Decisions
When bandwidth adaptation decisions are made by the client, they can be based on the most accurate information about the status of received and decoded data, in particular fine grain information such as the arrival or loss of individual packets. The client-driven approach can also help reduce the processing complexity at the server side, thus allowing the server to support more clients simultaneously.
Examples of this method include the Adaptive Stream Management (ASM) process of the SureStream technology used in RealSystem 8 [19]. Two major
components involved in this process are a compressed media file, which contains multiple independently encoded streams of a given source, and an ASM rule book, which describes various forms of channel adaptation that involve selecting combinations of encoded streams as a function of the channel status (including bandwidth, packet loss, and loss effect on the reconstructed signal). The ASM rule book is sent to the client at the beginning of a session. During transmission, the client monitors the rate and loss statistics of arriving packets, and then instructs the server to subscribe to a rule, or combination of rules, to match the current channel behavior. Another example is that of receiver-driven adaptation in the context of multicast delivery [15,46].
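The following sketch captures the spirit of such rule-based, client-driven adaptation; it is not the actual ASM rule-book format, and all rules, thresholds, and field names are invented for illustration.

```python
# Sketch of client-driven adaptation in the spirit of an ASM-style rule book.
# The client measures arrival rate and loss, picks the best-matching rule,
# and asks the server to switch. Rules and thresholds are made up.

RULE_BOOK = [
    {"rule_id": 0, "min_kbps": 450, "max_loss": 0.02},  # high-rate stream
    {"rule_id": 1, "min_kbps": 200, "max_loss": 0.05},  # medium-rate stream
    {"rule_id": 2, "min_kbps": 0,   "max_loss": 1.00},  # lowest-rate fallback
]

def choose_rule(measured_kbps, measured_loss):
    for rule in RULE_BOOK:  # rules ordered from most to least demanding
        if measured_kbps >= rule["min_kbps"] and measured_loss <= rule["max_loss"]:
            return rule["rule_id"]
    return RULE_BOOK[-1]["rule_id"]

# During playback the client would periodically do something like:
rule_id = choose_rule(measured_kbps=320, measured_loss=0.03)
# ...and then send a "subscribe to rule rule_id" request to the server.
print(rule_id)  # -> 1
```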
A drawback is that, while the client makes the decisions, these need to be implemented at either a server or a proxy, as in the example given earlier; this is because bandwidth adaptation at the client can only help in reducing the complexity of decoding. Thus there will be some latency before the changes in bandwidth can be implemented. Another potential drawback is that some clients, such as low-power hand-held devices, may not have sufficient computation power to implement a complex decision process.
4.3.3.2 Proxy-Driven Decisions
In this type of system, proxies can estimate the state of the network (or get this information from the client) and then decide on appropriate changes to the bandwidth to be used by the media stream. For example, a proxy can select certain packets to be forwarded to the client, change transcoding parameters, or send instructions to the server so that the server can modify the information it transmits.
Chakareski et al. [10] have proposed a rate–distortion optimized framework in the scenario of proxy-driven streaming. At any given time, the proxy determines which packets should be requested from the media server and which should be retransmitted directly from the proxy to the client in order to meet rate constraints on the last hop while minimizing the average end-to-end distortion. Approaches that have investigated the role of proxies in terms of both streaming and caching also include [50]. The proxy, usually located at the edge of a backbone network, coordinates the communication between the source server and the client, and can potentially achieve better bandwidth usage than a client- or server-driven system.
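As a much-simplified sketch of this idea (the framework in [10] solves a richer optimization; the packet fields and budget below are illustrative), a proxy might greedily forward the packets that yield the largest distortion reduction per bit until the last-hop rate budget is exhausted:

```python
# Much-simplified sketch of proxy-driven packet selection: forward the packets
# with the largest distortion reduction per bit until the last-hop budget is
# used up. Packet fields and numbers are illustrative only.

def proxy_select(packets, budget_bytes):
    ranked = sorted(packets, key=lambda p: p["dist_reduction"] / p["size"],
                    reverse=True)
    chosen, used = [], 0
    for p in ranked:
        if used + p["size"] <= budget_bytes:
            chosen.append(p["id"])
            used += p["size"]
    return chosen

packets = [
    {"id": "I-frame", "size": 1200, "dist_reduction": 90.0},
    {"id": "P-frame", "size": 600,  "dist_reduction": 30.0},
    {"id": "B-frame", "size": 400,  "dist_reduction": 10.0},
]
print(proxy_select(packets, budget_bytes=1500))  # -> ['I-frame']
```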
4.3.3.3 Server-Driven Decisions
Finally, in this scenario estimates of network state are provided to the server, which decides on data to be sent to each client. Feedback is often required for this approach. The server-based approach has the most information about the source (e.g., about possible rate–distortion operating points) and thus can work with a more flexible and efficient adaptation algorithm in terms of source coding. In
addition, the server can regulate connections with different clients as a whole to improve overall bandwidth utilization. The main disadvantage of this approach is that the server may not have reliable or timely information about the state of the network near the client.
As an example, the work of Hsu et al. [29] performs source rate control by assigning quantizers to each of the video blocks under rate constraints at the encoder, where the available channel rate is estimated by combining the channel information provided by the feedback channel with an a priori channel model.
Related work [30] shows that source rate control algorithms can also be applied for various types of network-related rate constraints. Intelligent transport mechanisms, such as optimal packet scheduling for a scalable multimedia representation [16,48], can also be performed at the server.
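A crude sketch of server-driven rate control along these lines is given below; it is not the algorithm of [29], and the smoothing constant, safety margin, and feedback format are assumptions made purely for illustration. The server smooths client-reported throughput with an exponentially weighted moving average and maps the estimate to a target encoder bit rate.

```python
# Crude sketch of server-driven rate control (not the algorithm of [29]):
# smooth client-reported throughput with an EWMA and derive a target encoder
# rate. All constants are illustrative.

class ServerRateController:
    def __init__(self, alpha=0.25, safety_margin=0.85):
        self.alpha = alpha                  # EWMA weight given to new feedback
        self.safety_margin = safety_margin  # encode below the estimate for headroom
        self.est_kbps = None

    def on_feedback(self, reported_kbps):
        """Update the channel-rate estimate from a client feedback report."""
        if self.est_kbps is None:
            self.est_kbps = reported_kbps
        else:
            self.est_kbps = (1 - self.alpha) * self.est_kbps \
                            + self.alpha * reported_kbps
        return self.target_encoder_rate()

    def target_encoder_rate(self):
        return self.safety_margin * self.est_kbps

ctrl = ServerRateController()
for report in (800, 750, 500, 520):   # throughput (kbps) reported by the client
    print(round(ctrl.on_feedback(report)))
```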
4.3.4 Criteria and Constraints
This section provides an overview of the different criteria that can be applied to select a bandwidth adaptation mechanism for a given application. We emphasize that this is, by necessity, a qualitative discussion. Many of the techniques mentioned in this chapter have only been proposed in a research context and have not been fully tested in a more realistic network environment. Moreover, a quantitative comparison of the various methods is likely to be very complex, as should be clear given the number of criteria to be considered in general.
4.3.4.1 Media Quality
Clearly, the ultimate criterion to evaluate the performance of a bandwidth adaptation mechanism should be the resulting subjective media quality at the receiver in the presence of typical bandwidth variations. Some progress has been made in devising objective metrics that can capture the perceptual quality of media under various compression strategies [34,37,68]. These objective metrics are most advanced for the analysis of audio sources, somewhat less so for video applications.
Approaches that can meaningfully compare different methods in the presence of variations in network behavior (e.g., bandwidth fluctuations, packet losses) are not as readily available.
Service interruptions, such as those that might occur if no bandwidth adaptation is used, are obviously undesirable, and so one could, for example, compare different techniques in terms of their outage probability (the probability that perceptual quality over a given period of time drops below acceptable levels). A comparison would still be challenging: for example, an end user may deem two configurations with different, but nonnegligible, outage probabilities to be equally unacceptable.
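For instance, given a trace of per-interval quality values and an acceptability threshold (both chosen arbitrarily here), the outage probability can be estimated as a simple empirical fraction:

```python
# Illustrative computation of outage probability: the fraction of time
# intervals in which quality drops below an acceptability threshold.

def outage_probability(quality_trace, threshold):
    below = sum(1 for q in quality_trace if q < threshold)
    return below / len(quality_trace)

trace = [38.2, 37.9, 31.0, 29.5, 36.4, 28.8, 37.1]  # e.g., per-second PSNR in dB
print(outage_probability(trace, threshold=30.0))     # -> 2/7, roughly 0.286
```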
Quality evaluation is also more complicated once a bandwidth adaptation mechanism is put into place because these mechanisms are dynamic in nature.
Thus, they operate only when the bandwidth falls below certain levels and lead to changes in the media quality (e.g., in the context of video, variations in frame rate, frame resolution, frame quality). In this situation, it is unclear whether users will base their quality assessment on the perceived “average” quality, the worst case quality level, the duration of the worst quality, etc.
Many currently deployed practical media streaming systems simply select one of multiple streams, that is, the one whose bandwidth best matches the bandwidth available to the end user; in many cases no adaptation is possible within a stream. Thus system designers only have a limited amount of real-life experience with bandwidth adaptation mechanisms. It also follows that the impact of various such mechanisms on perceptual media quality is not as well understood.
In summary, while progress has been made toward understanding subjective quality metrics for various types of media, challenges remain in addressing situations where quality adaptations (not to mention information losses) take place.
For this reason, and also to facilitate bandwidth adaptation mechanisms, objective quality metrics, such as peak signal-to-noise ratio (PSNR), are often used. For example, authors have proposed optimizing average PSNR (e.g., [29]) or minimizing the loss in PSNR introduced by bandwidth adaptation, with respect to the PSNR achieved when the media stream is transmitted at a given target bit rate (e.g., [16]).
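For reference, PSNR is 10 log10(MAX^2/MSE), where MAX is the peak sample value and MSE the mean squared error between original and reconstruction. A minimal computation, assuming 8-bit samples and frames stored as NumPy arrays, is sketched below:

```python
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two frames of the same shape
    (8-bit sample range assumed by default)."""
    mse = np.mean((original.astype(np.float64)
                   - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_value ** 2 / mse)
```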
4.3.4.2 End-to-End Delay, Reaction Time, and Latency
As discussed earlier, a longer end-to-end delay facilitates preserving a consistent quality level in the face of bandwidth fluctuations. Roughly speaking, a longer end-to-end delay leads to more multimedia units (e.g., video frames) being stored in the decoder buffer so that the application can absorb short-term variations in bandwidth.
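A toy simulation illustrates this effect; all rates, the dip pattern, and the pre-roll values below are arbitrary, and real clients use considerably more sophisticated buffer models:

```python
# Toy simulation: a larger startup (pre-roll) buffer lets the client ride out
# a short-term bandwidth dip without stalling. All numbers are illustrative.

def count_stalls(arrival_kbps, playout_kbps, preroll_s):
    buffered_s = preroll_s              # seconds of media already buffered
    stalls = 0
    for rate in arrival_kbps:           # one entry per second of wall-clock time
        buffered_s += rate / playout_kbps   # media received during this second
        if buffered_s >= 1.0:
            buffered_s -= 1.0           # one second of media is played out
        else:
            stalls += 1                 # buffer underflow: playback stalls
    return stalls

bandwidth = [600, 600, 100, 100, 600, 600, 600]  # kbps, with a mid-session dip
print(count_stalls(bandwidth, playout_kbps=500, preroll_s=0.5))  # -> 1 (small buffer)
print(count_stalls(bandwidth, playout_kbps=500, preroll_s=2.0))  # -> 0 (larger buffer)
```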
When the end-to-end delay is not long, the reaction time of the adaptation system to changes in bandwidth becomes important. The system has to detect relevant variations in network behavior and then trigger the necessary changes in the media stream so as to best match bandwidth availability. Ideally, this should happen fast enough that the end user does not suffer the negative consequences of a mismatch between network availability and stream requirements.
Note that this leads to interesting design trade-offs in the context of the adaptation architectures discussed earlier. For example, a faster reaction may be possible if the sender makes adaptation decisions, but these decisions may be based on less accurate knowledge of the network status observed at the client.
Long end-to-end delay is a practical solution only for one-way transmission applications. For two-way communications, a long delay will limit the interactivity. Even in the case of one-way communications, excessive end-to-end delays