Video processing involves many codecs and acronyms. Read on to find out what they all mean!
4:2:0, 4:2:2, 4:4:4
The notations 4:4:4, 4:2:2 and 4:2:0 refer to the most common chroma subsampling formats used in the distribution of digital images and video. Most processing and distribution of video is done in the YCbCr domain. The original RGB values of a pixel are transformed to separate the brightness of the pixel (its luminance, or Y, value) and its chrominance (Cb and Cr) into independent parameters. The human eye is more sensitive to changes in brightness than to changes in color. This fact is exploited by subsampling the chrominance components to reduce the number of bits needed to represent a pixel.
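The RGB-to-YCbCr transform is a fixed linear combination of the color components. As a rough sketch, using the BT.601 full-range coefficients (one of several matrices used in practice; broadcast video often uses limited-range BT.709 instead):

```python
# Sketch of an RGB -> YCbCr conversion (BT.601, full range).
# Cb and Cr are offset by 128 so that "no chroma" sits mid-range.
def rgb_to_ycbcr(r, g, b):
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# A neutral gray pixel carries no chroma: Cb = Cr = 128.
print(tuple(round(v) for v in rgb_to_ycbcr(128, 128, 128)))  # -> (128, 128, 128)
```

Because Cb and Cr carry only the color difference, they are the natural targets for the subsampling described next.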
A horizontal row of four pixels has 4 sets of Y, Cr and Cb values (one for each pixel), and is represented by the notation 4:4:4. If the chrominance values are then subsampled by two (the Cr and Cb values of each pair of horizontal pixels are replaced by the average of the two Cr and Cb values of the two pixels), this results in a 4:2:2 format. All pixels will still have a Y component, but now every two horizontal pixels share a single Cr and Cb component. If the process is further continued for a block of 2×2 pixels, such that the four pixels share one Cr and one Cb component, the format is denoted by 4:2:0.
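The 4:2:0 averaging step described above can be sketched for a single chroma plane; real codecs may use different filters and chroma sample siting, so this is only the simplest version of the idea:

```python
# Toy 4:2:0 subsampling: replace each 2x2 block of a chroma plane
# (Cb or Cr) with the average of its four samples, quartering the
# number of chroma samples while leaving the Y plane untouched.
def subsample_420(plane):
    h, w = len(plane), len(plane[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            block = (plane[y][x] + plane[y][x + 1] +
                     plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(block / 4)
        out.append(row)
    return out

cb = [[100, 104],
      [ 96, 100]]
print(subsample_420(cb))  # -> [[100.0]]
```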
Each pixel in a digital image is composed of three color components: Red, Green and Blue. In 8-bit images, each of those color components is represented by an 8-bit binary number, giving it a range of values between 0 and 255. This is the most common bit-depth for content that is distributed to consumers, as it achieves a good balance between video quality, bandwidth consumption and use of CPU and memory resources. For HDR and WCG content, introduced in the mid-2010s, 8 bits per color component is no longer adequate to represent the much wider range of brightness levels and colors enabled by these technologies. Hence, 10 bits per component are used, enabling a range of values from 0 to 1023. Note that professional uses (such as archiving and editing in studios) are typically done at greater bit-depths such as 12 or even 16 bits per color component.
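The arithmetic behind these ranges is simply the number of values an n-bit integer can hold:

```python
# An n-bit component can represent 2**n levels, numbered 0 .. 2**n - 1.
for bits in (8, 10, 12, 16):
    print(f"{bits}-bit: 0..{2**bits - 1} ({2**bits} levels)")
```

So the step from 8 to 10 bits quadruples the number of levels per component, from 256 to 1,024.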
AV1
AV1 is the name of the first video coding standard developed by the Alliance for Open Media (AOM). The AOM was formed in 2015 with the goal of creating a video coding standard that was royalty-free, while also providing video compression as good as that of the best available standards from the ITU and MPEG. AV1, the result of that endeavor, was published in 2018.
AV1 achieves high compression efficiency (i.e. very low bitrates while maintaining high video quality), but does so at the expense of high encoding complexity. For this reason, AV1 commercial deployments have so far been limited to VOD applications, with no major live applications or services.
AV2
AV2 is the name of the second video coding standard by the AOM. It is currently under development and will probably be finalized in 2024 or 2025. It is expected to provide a substantial (40-50%) compression improvement over AV1, while remaining royalty-free.
Content Aware Encoding
The term Content Aware Encoding (CAE) refers to situations where the encoder dynamically adjusts its encoding parameters based on the actual video content – as opposed to statically fixing them to a one-size-fits-all setting. A high-action sports game, or an action-packed movie with car chases and explosions, will require more bits to encode than a talk show or a presidential debate, because there is much more change from one video frame to the next. Even within the same program, some scenes are more complicated to encode than others. CAE allows encoders to optimize bandwidth usage – both in traditional TV broadcasts and in OTT streaming. By adapting to the content, the minimum number of bits needed to achieve a certain video quality threshold is used in each scene. This also results in a more uniform video quality throughout the program, across channels, and across different VOD assets.
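The core idea can be sketched as a per-scene bitrate allocation driven by a content complexity score. The complexity values and the bits-per-complexity factor below are made up for illustration; real CAE systems derive complexity from analysis passes or quality metrics:

```python
# Hypothetical CAE-style allocation: each scene gets bitrate proportional
# to its measured complexity, instead of one fixed bitrate for everything.
def allocate_bitrates(scene_complexities, kbps_per_unit):
    """Return a per-scene bitrate (kbps); inputs are illustrative."""
    return [round(c * kbps_per_unit) for c in scene_complexities]

# e.g. a static talk-show scene (0.2) vs. a car chase (0.9).
scenes = [0.2, 0.9, 0.4]
print(allocate_bitrates(scenes, kbps_per_unit=5000))  # -> [1000, 4500, 2000]
```

A fixed-bitrate encoder would have to spend 4,500 kbps on every scene to keep the car chase acceptable; adapting per scene spends those bits only where they are needed.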
Essential Video Coding (EVC)
The Essential Video Coding (EVC) standard, also known as MPEG-5 Part 1, was developed by MPEG and published in 2020. The development of EVC was prompted by the AOM's development of the royalty-free AV1 codec. EVC comes in two profiles (collections of encoding tools). The Baseline profile is royalty-free, and its compression efficiency lies somewhere between those of AVC and HEVC. The Main profile carries minimal royalties on certain tools (each of which can be turned on or off independently), and its compression efficiency lies somewhere between those of HEVC and VVC. The EVC standard could be attractive in certain applications due to the balanced trade-offs it achieves among coding efficiency, complexity, and royalty costs. To date, there are no commercial applications using EVC.
Film Grain Synthesis (FGS)
For more than 150 years, images (and later video) were captured on film. Due to the nature of that medium, photographs and movies exhibit visual artifacts that appear like ‘grains’ to humans. The nature of these film grains varies depending on the chemical composition of the type of film used.
Because of its random nature, film grain is very hard to compress using modern video coding standards. Film grain is therefore often filtered out prior to encoding. Filtering out film grain, however, does not sit well with content creators who – after more than a century of experience with various types of film – consider film grain to be part of their artistic intent. To satisfy both requirements, film grain is first analyzed, then filtered out prior to encoding. Metadata that describes the analyzed film grain is sent along with the compressed bitstream to the decoder. The decoder will then synthesize the film grain using the metadata and superimpose it on top of the decoded (and filtered) image. This process is known as Film Grain Synthesis (FGS).
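The analyze/filter/synthesize pipeline can be sketched on a single row of luma samples. This toy uses a moving-average denoiser and Gaussian noise as the grain model; real FGS tools (such as AV1's) use parameterized autoregressive models and per-intensity scaling, not this simplification:

```python
import random

def denoise(samples, radius=1):
    """Encoder side: smooth a 1-D row of luma samples (crude grain filter)."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def grain_strength(samples, smoothed):
    """Encoder side: measure the grain that was removed (the metadata)."""
    residual = [a - b for a, b in zip(samples, smoothed)]
    return (sum(r * r for r in residual) / len(residual)) ** 0.5

def synthesize_grain(smoothed, strength, seed=0):
    """Decoder side: superimpose statistically similar grain from metadata."""
    rng = random.Random(seed)
    return [s + rng.gauss(0, strength) for s in smoothed]

row = [120, 131, 118, 129, 122, 133]        # a 'grainy' luma row
clean = denoise(row)                        # what actually gets encoded
strength = grain_strength(row, clean)       # sent alongside the bitstream
reconstructed = synthesize_grain(clean, strength)  # decoder output
```

The synthesized grain is not the original grain sample-for-sample, which is exactly the point: only its statistics need to match, so the metadata is tiny compared to coding the noise itself.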
Advanced Video Coding (AVC)
Advanced Video Coding (AVC), also known as H.264, is a video coding standard that was jointly developed by the ITU and MPEG. The first version, targeted at consumer use, was published in 2003; enhancements targeted at professional applications were published in 2005. AVC has been one of the most commercially successful video coding standards. It is the dominant format used in streaming applications over the Internet (so-called OTT), and also the dominant format used in HDTV broadcasts – over-the-air (OTA), satellite or cable TV. Several newer video coding standards offer significantly better compression efficiency than AVC, albeit at the expense of higher encoder complexity. With the advances in chip technology, these newer standards are expected to gradually replace AVC in most use cases.
High Efficiency Video Coding (HEVC)
High Efficiency Video Coding (HEVC), also known as H.265, is a video coding standard that was jointly developed by the ITU and MPEG as a successor to AVC/H.264. HEVC was published in 2013 and achieves almost two times better compression than AVC at the same video quality for high-definition content (HD or higher). It is currently the dominant format used for 4K content, which would otherwise be impractical to distribute to consumers. It is also the video coding format selected for the recent ATSC 3.0 standard.
The adoption of HEVC was hampered by the confusion surrounding the Intellectual Property rights pertaining to the standard. Three separate patent pools were formed by the various IP rights owners, each with its own distinct licensing terms. This situation was the main reason for the formation in 2015 of the Alliance for Open Media (AOM), which developed a competing, royalty-free standard, AV1, published in 2018.
Versatile Video Coding (VVC)
Versatile Video Coding (VVC), also known as H.266, is the latest and most advanced video coding standard, jointly developed by the ITU and MPEG. It was published in 2020 and achieves almost two times better compression than HEVC at the same video quality for high-resolution content. As with prior codecs, the improvement in compression efficiency comes at the cost of more complex encoding (~10x that of HEVC) and decoding (~2-3x that of HEVC). There are currently no commercial services based on VVC, due in part to the lack of silicon-based implementations and the effort required to develop accelerated encoding algorithms. As hardware implementations become available, VVC is expected to become the format of choice for 8K, 360° video and AR/VR content.
JPEG 2000
JPEG 2000, known colloquially as J2K, is a still image coding standard developed by the Joint Photographic Experts Group (JPEG) at the turn of the century – hence its name. It was meant as a successor to the original JPEG format developed in the late 1980s, and is based on wavelet-transform techniques (as opposed to the block-based DCT transform techniques of JPEG). JPEG 2000 never managed to displace JPEG, which still dominates websites and consumer photography, but it has gained traction in professional applications, particularly in video production head-ends. It provides a significant reduction in bandwidth compared to uncompressed video, yet each frame can be independently edited (or inserted into a stream), and the encoding latency is significantly lower than that of video compression standards such as AVC or HEVC.
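The simplest illustration of the wavelet idea is one level of the 1-D Haar transform. JPEG 2000 itself uses the 5/3 and 9/7 wavelets applied recursively in two dimensions, so this is only a minimal sketch of the principle:

```python
# One level of the 1-D Haar wavelet transform: split the signal into
# low-pass averages (a coarse version) and high-pass differences (detail).
def haar_1d(samples):
    pairs = list(zip(samples[0::2], samples[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details  = [(a - b) / 2 for a, b in pairs]
    return averages, details

# Smooth signals concentrate their energy in the averages; the near-zero
# details are what compress well after quantization.
print(haar_1d([10, 12, 14, 16]))  # -> ([11.0, 15.0], [-1.0, -1.0])
```

Unlike the DCT, which works on fixed 8×8 blocks, the wavelet decomposition applies across the whole image, which is why JPEG 2000 avoids JPEG's blocking artifacts at low bitrates.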
Low Complexity Enhancement Video Coding (LC-EVC)
The Low Complexity Enhancement Video Coding (LC-EVC) standard, also known as MPEG-5 Part 2, was developed by MPEG and published in 2020. It is based on the Perseus proprietary codec developed by V-Nova. It is a scalable codec consisting of a base layer – encoded using an existing standard such as AVC or HEVC – and an enhancement layer, which is added to the base layer to produce a higher-resolution image (e.g. HD → 4K). This is useful in applications that need to serve both legacy devices, which use older codecs or handle lower resolutions, and newer devices that support the enhancement layer and higher resolutions. LC-EVC was selected as a supported format in the recently announced TV 3.0 standard for broadcast TV in Brazil.
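The base-plus-enhancement idea can be sketched in one dimension. In the real standard the base layer is a full AVC/HEVC encode and the residuals are transformed and entropy-coded; here both are kept raw to show only the layering:

```python
# Toy two-layer scheme: a half-resolution base layer plus a residual
# "enhancement layer" that restores the full-resolution signal.
def downsample(row):
    """Halve resolution by averaging sample pairs (the base layer input)."""
    return [(a + b) / 2 for a, b in zip(row[0::2], row[1::2])]

def upsample(row):
    """Nearest-neighbour upscale back to full resolution."""
    out = []
    for v in row:
        out += [v, v]
    return out

original  = [10, 12, 20, 22]
base      = downsample(original)      # encoded with AVC/HEVC in practice
predicted = upsample(base)            # decoder-side upscale of the base
residual  = [o - p for o, p in zip(original, predicted)]   # enhancement layer
reconstructed = [p + r for p, r in zip(predicted, residual)]
print(reconstructed == original)  # -> True
```

A legacy device decodes only the base layer at half resolution; a newer device applies the residual on top and recovers the full-resolution picture.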
MPEG-2
MPEG-2 is an ‘umbrella standard’ developed by the Moving Picture Experts Group (MPEG) in the mid-1990s. It encompasses several sub-parts, such as the MPEG-2 Transport Stream for carriage of audio, video and data (MPEG-2 Systems), a standard for encoding video (MPEG-2 Video), and the Advanced Audio Coding (AAC) standard for encoding audio. The primary objective of the standard was to provide digital encoding tools for the first digital TV standards, such as ATSC 1.0 for over-the-air broadcast (1996), and it was rapidly adopted by cable TV and satellite TV operators around the world. The core MPEG-2 technologies were also adopted as the basis for storing content on DVDs. MPEG-2 has been a huge commercial success. While it has since been overtaken by newer standards such as AVC and HEVC, there are still hundreds of MPEG-2 SD channels around the world servicing millions of legacy devices, and many operators have had to postpone plans to phase out MPEG-2 broadcasts due to the reluctance of consumers to upgrade those devices. The trend away from traditional Pay TV services and toward OTT streaming may finally bring the curtain down on the use of MPEG-2.