Direct Mode Coding For Bi-Predictive Pictures in the H.264 Standard
The new H.264 (MPEG-4 AVC) video coding standard can achieve considerably higher coding efficiency
than previous standards. This gain comes mainly from variable block sizes for
motion compensation, multiple reference frames, and intra prediction, but also from better exploitation of the
spatiotemporal correlation that may exist between adjacent macroblocks, through the SKIP mode in Predictive (P)
slices and the two DIRECT modes in Bi-predictive (B) slices. These modes, when signaled, can in effect represent
the motion of a macroblock or block without transmitting any of the additional motion information required by other
Inter macroblock types. This property also makes these modes highly compressible, especially when
Run Length Coding strategies are applied. Although SKIP mode uses the spatial correlation of motion vectors from
adjacent macroblocks to predict its motion parameters, until recently DIRECT mode considered only
the temporal correlation between adjacent pictures. In this paper we introduce alternative methods for generating the
motion information for the DIRECT mode using spatial or combined spatiotemporal correlation. Temporal
correlation requires that the motion and timestamp information from previous pictures be available in both
the encoder and the decoder; we show that our spatial-only method can reduce or eliminate this requirement while
achieving similar performance. The combined methods, on the other hand, by jointly exploiting
spatial and temporal correlation either at the macroblock or slice/picture level, can achieve even higher coding
efficiency. Finally, we present improvements to the existing Rate Distortion Optimization for B slices within the H.264
codec, which can yield bitrate reductions of up to 16% or, equivalently, PSNR gains of more
than 0.7 dB.