natively, without platform-specific plugins — the Ogg container format, Ogg video (called
“Theora”), and Ogg audio (called “Vorbis”). On the desktop, Ogg is supported out-of-the-
box by all major Linux distributions, and you can use it on Mac and Windows by
installing QuickTime components or
DirectShow filters, respectively. It is also
playable with the excellent
VLC on all platforms.
WebM is a new container format. It is technically similar to another format, called
Matroska. WebM was announced in May, 2010. It is designed to be used exclusively with
the VP8 video codec and Vorbis audio codec. (More on these in a minute.) It is supported
natively, without platform-specific plugins, in the latest versions of Chromium, Google
Chrome, Mozilla Firefox, and Opera. Adobe has also announced that a future version of
Flash will support WebM video.
Audio Video Interleave, usually with an .avi
extension. The AVI container format was
invented by Microsoft in a simpler time, when the fact that computers could play video
at all was considered pretty amazing. It does not officially support features of more
recent container formats like embedded metadata. It does not even officially support
most of the modern video and audio codecs in use today. Over time, companies have
tried to extend it in generally incompatible ways to support this or that, and it is still
the default container format for popular encoders such as MEncoder.
When you talk about “watching a video,” you’re probably talking about a combination of one
video stream and one audio stream. But you don’t have two different files; you just have “the
video.” Maybe it’s an AVI file, or an MP4 file. These are
just container formats, like a ZIP file
that contains multiple kinds of files within it. The container format defines how to store the
video and audio streams in a single file.
When you “watch a video,” your video player is doing at least three things at once:
1. Interpreting the container format to find out which video and audio tracks are available,
and how they are stored within the file so that it can find the data it needs to decode
2. Decoding the video stream and displaying a series of images on the screen
3. Decoding the audio stream and sending the sound to your speakers
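To make step 1 concrete, here is a minimal, illustrative Python sketch of what "interpreting the container" means. An MP4-style container is a sequence of "boxes," each starting with a 4-byte big-endian length and a 4-byte type tag; the sample bytes below are fabricated for demonstration, and a real parser would handle nested boxes and many edge cases:

```python
import struct

def list_mp4_boxes(data: bytes):
    """Walk the top-level boxes of an MP4/QuickTime-style container.

    Each box begins with a 4-byte big-endian length (which includes
    the 8-byte header itself) followed by a 4-byte ASCII type tag.
    """
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, = struct.unpack_from(">I", data, offset)
        box_type = data[offset + 4:offset + 8].decode("ascii")
        boxes.append((box_type, size))
        if size < 8:          # sizes 0 and 1 have special meanings; stop here
            break
        offset += size
    return boxes

# A fabricated two-box file: an 'ftyp' box and an empty 'mdat' box.
sample = (
    struct.pack(">I", 16) + b"ftyp" + b"isom" + struct.pack(">I", 512)
    + struct.pack(">I", 8) + b"mdat"
)
print(list_mp4_boxes(sample))   # [('ftyp', 16), ('mdat', 8)]
```

A real player walks these boxes to locate the metadata and the media data before it can decode a single frame.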
A video codec is an algorithm by which a video stream is encoded, i.e. it specifies how to do #2
above. (The word “codec” is a
portmanteau, a combination of the words “coder” and
“decoder.”) Your video player decodes the video stream according to the video codec, then
displays a series of images, or “frames,” on the screen. Most modern video codecs use all sorts
of tricks to minimize the amount of information required to display one frame after the next.
For example, instead of storing each individual frame (like a screenshot), they will only store
the differences between frames. Most videos don’t actually change all that much from one
frame to the next, so this allows for high compression rates, which results in smaller file sizes.
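To illustrate the "store only the differences" trick, here is a toy Python sketch that treats one-dimensional lists of pixel values as stand-in "frames." Real codecs work on two-dimensional blocks with motion compensation, so this shows only the core idea:

```python
def delta_encode(frames):
    """Store the first frame whole (a 'keyframe'), then only the
    per-pixel differences for each subsequent frame."""
    encoded = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append([c - p for p, c in zip(prev, cur)])
    return encoded

def delta_decode(encoded):
    """Rebuild full frames by accumulating the stored differences."""
    frames = [list(encoded[0])]
    for diff in encoded[1:]:
        frames.append([p + d for p, d in zip(frames[-1], diff)])
    return frames

# Three nearly identical 'frames' of pixel values:
frames = [[10, 10, 10], [10, 11, 10], [10, 11, 11]]
deltas = delta_encode(frames)
print(deltas)                    # [[10, 10, 10], [0, 1, 0], [0, 0, 1]]
assert delta_decode(deltas) == frames
```

Because most of the stored differences are zero, they compress far better than the full frames would.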
There are lossy and lossless video codecs. Lossless video is much too big to be useful on the
web, so I’ll concentrate on lossy codecs. A lossy video codec means that information is being
irretrievably lost during encoding. Like copying an audio cassette tape, you’re losing
information about the source video, and degrading the quality, every time you encode. Instead
of the “hiss” of an audio cassette, a re-re-re-encoded video may look blocky, especially during
scenes with a lot of motion. (Actually, this can happen even if you encode straight from the
original source, if you choose a poor video codec or pass it the wrong set of parameters.) On
the bright side, lossy video codecs can offer amazing compression rates by smoothing over
blockiness during playback, to make the loss less noticeable to the human eye.
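The generational loss described above can be mimicked with a crude quantizer in Python. This is not how any real codec works; it only demonstrates that each lossy pass throws away information that no later pass can recover:

```python
def lossy_encode(samples, step=8):
    """Quantize each sample to the nearest multiple of `step`,
    irretrievably discarding fine detail -- a crude stand-in for
    one generation of lossy encoding."""
    return [step * round(s / step) for s in samples]

def error(a, b):
    """Total absolute difference between two sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

original = [3, 14, 15, 92, 65, 35]
gen1 = lossy_encode(original, step=8)    # first encode
gen2 = lossy_encode(gen1, step=10)       # re-encode with different settings

print(gen1, error(original, gen1))   # [0, 16, 16, 96, 64, 32] 14
print(gen2, error(original, gen2))   # [0, 20, 20, 100, 60, 30] 32
```

The error relative to the original grows with every pass, which is why you should always encode from the original source rather than from an already-encoded copy.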
There are tons of video codecs. The three most relevant codecs are H.264, Theora, and VP8.
H.264 is also known as “MPEG-4 part 10,” a.k.a. “MPEG-4 AVC,” a.k.a. “MPEG-4 Advanced
Video Coding.” H.264 was developed by
the MPEG group and standardized in 2003. It aims
to provide a single codec for low-bandwidth, low-CPU devices (cell phones); high-bandwidth,
high-CPU devices (modern desktop computers); and everything in between. To accomplish
this, the H.264 standard is split into “
profiles,” which each define a set of optional features
that trade complexity for file size. Higher profiles use more optional features, offer better
visual quality at smaller file sizes, take longer to encode, and require more CPU power to
decode in real-time.
To give you a rough idea of the range of profiles,
Apple’s iPhone supports Baseline profile,
the AppleTV set-top box supports Baseline and Main profiles, and
Adobe Flash on a desktop
PC supports Baseline, Main, and High profiles. YouTube now uses H.264 to encode high-
definition videos, playable through Adobe Flash; YouTube also provides H.264-encoded video
to mobile devices, including Apple’s iPhone and phones running Google’s Android
operating system. Also, H.264 is one of the video codecs mandated by the Blu-Ray
specification; Blu-Ray discs that use it generally use the High profile.
Most non-PC devices that play H.264 video (including iPhones and standalone Blu-Ray players)
actually do the decoding on a dedicated chip, since their main CPUs are nowhere near
powerful enough to decode the video in real-time. These days, even low-end desktop graphics
cards support decoding H.264 in hardware. There are
competing H.264 encoders, including the
x264 library. The H.264 standard is patent-encumbered; licensing is brokered
by the MPEG LA group. H.264 video can be embedded in most popular container
formats, including MP4 (used primarily by
Apple’s iTunes Store) and MKV (used primarily by
non-commercial video enthusiasts).
Theora evolved from the
VP3 codec and has subsequently been developed by the Xiph.org
Foundation. Theora is a royalty-free codec and is not encumbered by any known patents
other than the original VP3 patents, which have been licensed royalty-free. Although the
standard has been “frozen” since 2004, the Theora project (which includes an open source
reference encoder and decoder)
only released version 1.0 in November 2008 and
version 1.1 in September 2009.
Theora video can be embedded in any container format, although it is most often seen in an
Ogg container. All major Linux distributions support Theora out-of-the-box, and Mozilla
Firefox (3.5 and later) includes native support for Theora video in an Ogg container. And by “native”, I
mean “available on all platforms without platform-specific plugins.” You can also play Theora
on Windows or
on Mac OS X after installing Xiph.org’s open source decoder software.
VP8 is another video codec from On2, the same company that originally developed VP3 (later
Theora). Technically, it produces output on par with H.264 High Profile, while maintaining a
low decoding complexity on par with H.264 Baseline.
In 2010, Google acquired On2 and published the video codec specification and a sample
encoder and decoder as open source. As part of this, Google also “opened” all the patents that
On2 had filed on VP8, by licensing them royalty-free. (This is the best you can hope for with
patents. You can’t actually “release” them or nullify them once they’ve been issued. To make
them open source–friendly, you license them royalty-free, and then anyone can use the
technologies the patents cover without paying anything or negotiating patent licenses.) As of
May 19, 2010, VP8 is a royalty-free, modern codec and is not encumbered by any known
patents, other than the patents that On2 (now Google) has already licensed royalty-free.
Unless you’re going to stick to films made before
1927 or so, you’re going to want an audio
track in your video. Like
video codecs, audio codecs are algorithms by which an audio stream
is encoded. Like video codecs, there are lossy and lossless audio codecs. And like lossless video,
lossless audio is really too big to put on the web. So I’ll concentrate on lossy audio codecs.
Actually, it’s even narrower than that, because there are different categories of lossy audio
codecs. Audio is used in places where video is not (telephony, for example), and there is an
entire category of
audio codecs optimized for encoding speech. You wouldn’t rip a music CD
with these codecs, because the result would sound like a 4-year-old singing into a
speakerphone. But you would use them in an
Asterisk PBX, because bandwidth is precious,
and these codecs can compress human speech into a fraction of the size of general-purpose
codecs. However, due to lack of support in both native browsers and third-party plugins,
speech-optimized audio codecs never really took off on the web. So I’ll concentrate on general
purpose lossy audio codecs.
As I mentioned earlier, when you “watch a video,” your computer is doing at least three
things at once:
1. Interpreting the container format
2. Decoding the video stream
3. Decoding the audio stream and sending the sound to your speakers
The audio codec specifies how to do #3 — decoding the audio stream and turning it into digital
waveforms that your speakers then turn into sound. As with video codecs, there are all sorts
of tricks to minimize the amount of information stored in the audio stream. And since we’re
talking about lossy audio codecs, information is being lost during the recording → encoding →
decoding → listening lifecycle. Different audio codecs throw away different things, but they
all have the same purpose: to trick your ears into not noticing the parts that are missing.
One concept that audio has that video does not is channels. We’re sending sound to your
speakers, right? Well, how many speakers do you have? If you’re sitting at your computer, you
may only have two: one on the left and one on the right. My desktop has three: left, right,
and one more on the floor. So-called “
surround sound” systems can have six or more speakers,
strategically placed around the room. Each speaker is fed a particular channel of the original
recording. The theory is that you can sit in the middle of the six speakers, literally
surrounded by six separate channels of sound, and your brain synthesizes them and feels like
you’re in the middle of the action. Does it work? A multi-billion-dollar industry seems to think so.
Most general-purpose audio codecs can handle two channels of sound. During recording, the
sound is split into left and right channels; during encoding, both channels are stored in the
same audio stream; during decoding, both channels are decoded and each is sent to the
appropriate speaker. Some audio codecs can handle more than two channels, and they keep
track of which channel is which so your player can send the right sound to the right speaker.
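The usual on-disk layout for two-channel audio is interleaved samples (left, right, left, right, …). This toy Python sketch shows the round trip; actual codecs compress the channels jointly, but the channel bookkeeping idea is the same:

```python
def interleave(left, right):
    """Pack separate left/right channels into one L,R,L,R,... stream,
    the way most two-channel audio is laid out in a file."""
    out = []
    for l, r in zip(left, right):
        out.extend((l, r))
    return out

def deinterleave(stream):
    """Split an interleaved stream back into (left, right) channels."""
    return stream[0::2], stream[1::2]

left  = [1, 2, 3]
right = [9, 8, 7]
stream = interleave(left, right)
print(stream)                     # [1, 9, 2, 8, 3, 7]
assert deinterleave(stream) == (left, right)
```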
There are lots of audio codecs. Did I say there were lots of video codecs? Forget that. There
are gobs and gobs of audio codecs, but on the web, there are really only three you need to
know about: MP3, AAC, and Vorbis.
MPEG-1 AUDIO LAYER 3
MPEG-1 Audio Layer 3 is colloquially known as “MP3.” If you haven’t heard of MP3s, I don’t
know what to do with you.
Walmart sells portable music players and calls them “MP3
players.” Walmart. Anyway…
MP3s can contain up to 2 channels of sound. They can be encoded at different bitrates: 64
kbps, 128 kbps, 192 kbps, and a variety of others from 32 to 320. Higher bitrates mean larger
file sizes and better quality audio, although the ratio of audio quality to bitrate is not linear.
(128 kbps sounds more than twice as good as 64 kbps, but 256 kbps doesn’t sound twice as
good as 128 kbps.) Furthermore, the MP3 format allows for variable bitrate encoding, which
means that some parts of the encoded stream are compressed more than others. For example,
silence between notes can be encoded at a low bitrate, then the bitrate can spike up a moment
later when multiple instruments start playing a complex chord. MP3s can also be encoded
with a constant bitrate, which, unsurprisingly, is called constant bitrate encoding.
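The relationship between bitrate and file size is simple arithmetic: a bitrate is bits per second, so size is bitrate times duration, divided by eight bits per byte. A quick Python sketch for a hypothetical four-minute song:

```python
def audio_size_mb(bitrate_kbps, seconds):
    """Approximate encoded size in megabytes: bitrate is bits per
    second, so size = bitrate * duration / 8 bits per byte."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

# A 4-minute (240-second) song at common MP3 bitrates:
for kbps in (64, 128, 192, 320):
    print(kbps, "kbps ->", round(audio_size_mb(kbps, 240), 1), "MB")
```

Doubling the bitrate doubles the file size, but, as noted above, it does not double the perceived quality.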
The MP3 standard doesn’t define exactly how to encode MP3s (although it does define exactly
how to decode them); different encoders use different psychoacoustic models that produce
wildly different results, but are all decodable by the same players. The open source
LAME project is the best free encoder, and arguably the best encoder period for all but the lowest bitrates.
The MP3 format (standardized in 1991) is patent-encumbered, which explains why Linux can’t
play MP3 files out of the box. Pretty much every portable music player supports standalone
MP3 files, and MP3 audio streams can be embedded in any
video container. Adobe Flash can
play both standalone MP3 files and MP3 audio streams within an MP4 video container.
ADVANCED AUDIO CODING
Advanced Audio Coding is affectionately known as “AAC.” Standardized in 1997, it lurched
into prominence when Apple chose it as their default format for the iTunes Store. Originally,
all AAC files “bought” from the iTunes Store were encrypted with Apple’s proprietary DRM
scheme, FairPlay. Selected songs in the iTunes Store are now available as unprotected
AAC files, which Apple calls “iTunes Plus” because it sounds so much better than calling
everything else “iTunes Minus.” The AAC format is patent-encumbered;
licensing rates are published by Via Licensing, the patent pool’s administrator.
AAC was designed to provide better sound quality than MP3 at the same bitrate, and it can
encode audio at any bitrate. (MP3 is limited to a fixed number of bitrates, with an upper
bound of 320 kbps.) AAC can encode up to 48 channels of sound, although in practice no one
does that. The AAC format also differs from MP3 in defining multiple profiles, in much the
same way as
H.264, and for the same reasons. The “low-complexity” profile is designed to be
playable in real-time on devices with limited CPU power, while higher profiles offer better
sound quality at the same bitrate at the expense of slower encoding and decoding.
All current Apple products, including iPods, AppleTV, and QuickTime, support certain profiles
of AAC in standalone audio files and in audio streams in an MP4 video container. Adobe
Flash supports all profiles of AAC in MP4, as do the open source MPlayer and VLC video
players. For encoding, the FAAC library is the open source option; support for it is a compile-
time option in mencoder and ffmpeg.
Vorbis is often called “Ogg Vorbis,” although this is technically incorrect. (“Ogg” is just a
container format, and Vorbis audio streams can be embedded in other containers.) Vorbis is
not encumbered by any known patents and is therefore supported out-of-the-box by all major
Linux distributions and by portable devices running the open source Rockbox firmware.
Mozilla Firefox 3.5 supports Vorbis audio files in an Ogg container, or Ogg videos with a
Vorbis audio track.
Android mobile phones can also play standalone Vorbis audio files. Vorbis
audio streams are usually embedded in an Ogg or WebM container, but they can also be
embedded in an MP4 or
MKV container (or, with some hacking,
in AVI). Vorbis supports an
arbitrary number of sound channels.
There are open source Vorbis encoders and decoders, including
aoTuV (encoder) and
libvorbis (decoder). There are also
QuickTime components for
Mac OS X and
DirectShow filters for Windows.
WHAT WORKS ON THE WEB
If your eyes haven’t glazed over yet, you’re doing better than most. As you can tell, video
(and audio) is a complicated subject — and this was the abridged version! I’m sure you’re
wondering how all of this relates to HTML5. Well, HTML5 includes a <video> element for
embedding video into a web page. There are no restrictions on the video codec, audio codec,
or container format you can use for your video. One
<video> element can link to multiple
video files, and the browser will choose the first video file it can actually play. It is up to you
to know which browsers support which containers and codecs.
As of this writing, this is the landscape of HTML5 video:
Mozilla Firefox (3.5 and later) supports Theora video and Vorbis audio in an Ogg
container. Firefox 4 also supports WebM.
Opera (10.5 and later) supports Theora video and Vorbis audio in an Ogg container.
Opera 10.60 also supports WebM.
Google Chrome (3.0 and later) supports Theora video and Vorbis audio in an Ogg
container. Google Chrome 6.0 also supports WebM.
Safari on Macs and Windows PCs (3.0 and later) will support anything that QuickTime
supports. In theory, you could require your users to install third-party QuickTime
plugins. In practice, few users are going to do that. So you’re left with the formats that
QuickTime supports “out of the box.” This is a long list, but it does not include WebM,
Theora, Vorbis, or the Ogg container. However, QuickTime does ship with support for
H.264 video (main profile) and AAC audio in an MP4 container.
Mobile phones like Apple’s iPhone and Google Android phones support H.264 video
(baseline profile) and AAC audio (“low complexity” profile) in an MP4 container.
Adobe Flash (9.0.60.184 and later) supports H.264 video (all profiles) and AAC audio (all
profiles) in an MP4 container.
Internet Explorer 9 supports all profiles of H.264 video and either AAC or MP3 audio in
an MP4 container. It will also play WebM video if you install a third-party codec, which
is not installed by default on any version of Windows. IE9 does not support other third-
party codecs (unlike Safari, which will play anything QuickTime can play).
Internet Explorer 8 has no HTML5 video support at all, but virtually all Internet Explorer
users will have the Adobe Flash plugin. Later in this chapter, I’ll show you how you can
use HTML5 video but gracefully fall back to Flash.
That might be easier to digest in table form.
VIDEO CODEC SUPPORT IN SHIPPING BROWSERS
CODECS/CONTAINER     IE     FIREFOX   SAFARI   CHROME   OPERA    IPHONE   ANDROID
Theora+Vorbis+Ogg    ·      3.5+      †        3.0+     10.5+    ·        ·
H.264+AAC+MP4        9.0+   ·         3.0+     ✓ ‡      ·        ✓        ✓
WebM (VP8+Vorbis)    ·      4.0+      †        6.0+     10.60+   ·        ·
† Safari will play anything that QuickTime can play. QuickTime comes pre-installed with H.264/AAC/MP4 support. There are
installable third-party plugins that add support for Theora and WebM, but each user needs to install these plugins before Safari
will recognize those video formats.
‡ Google Chrome has announced that it will drop support for H.264 in a future release.
A year from now, the landscape will look significantly different as WebM is implemented in
multiple browsers, those browsers ship non-experimental WebM-enabled versions, and users
upgrade to those new versions.
VIDEO CODEC SUPPORT IN UPCOMING BROWSERS
CODECS/CONTAINER     IE      FIREFOX   SAFARI   CHROME   OPERA    IPHONE   ANDROID
Theora+Vorbis+Ogg    ·       3.5+      †        3.0+     10.5+    ·        ·
H.264+AAC+MP4        9.0+    ·         3.0+     ·        ·        ✓        ✓
WebM (VP8+Vorbis)    9.0+*   4.0+      †        6.0+     10.60+   ·        2.3+‡
* Internet Explorer 9 will only support WebM “
when the user has installed a VP8 codec,” which implies that Microsoft will not be
shipping the codec themselves.
† Safari will play anything that QuickTime can play, but QuickTime only comes with H.264/AAC/MP4 support pre-installed.
‡ Although Android 2.3 supports WebM, there are no hardware decoders yet, so battery life is a concern.
And now for the knockout punch:
PROFESSOR MARKUP SAYS
There is no single combination of containers and codecs that
works in all HTML5 browsers.
This is not likely to change in the near future.
To make your video watchable across all of these devices and
platforms, you’re going to need to encode your video more than once.
For maximum compatibility, here’s what your video workflow will look like:
1. Make one version that uses WebM (VP8 + Vorbis).
2. Make another version that uses H.264 baseline video and AAC “low complexity” audio in
an MP4 container.
3. Make another version that uses Theora video and Vorbis audio in an Ogg container.
4. Link to all three video files from a single <video> element, and fall back to a Flash-based
video player.
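Once you have the three encodings, the markup side is a single <video> element containing one <source> per file; the browser plays the first format it supports. The snippet below builds that markup in Python purely for illustration; the filenames and dimensions are hypothetical:

```python
def video_element(sources, width=320, height=240):
    """Build an HTML5 <video> element listing each encoding as a
    <source> child; the browser plays the first one it can decode."""
    lines = ['<video width="%d" height="%d" controls>' % (width, height)]
    for filename, mime in sources:
        lines.append('  <source src="%s" type="%s">' % (filename, mime))
    lines.append('</video>')
    return "\n".join(lines)

# Hypothetical filenames for the three encodings described above:
markup = video_element([
    ("pet-video.mp4",  "video/mp4"),   # H.264 Baseline + AAC low-complexity
    ("pet-video.webm", "video/webm"),  # VP8 + Vorbis
    ("pet-video.ogv",  "video/ogg"),   # Theora + Vorbis
])
print(markup)
```

Browsers that understand none of the listed formats fall through to whatever content follows the last <source>, which is where a Flash-based fallback player would go.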
LICENSING ISSUES WITH H.264
Before we continue, I need to point out that there is a cost to encoding your videos multiple
times. Well, there’s the obvious cost, that you have to encode your videos more than once,
and that takes more computers and more time than just doing it once. But there’s another real cost associated with
H.264 video: licensing costs.
Remember when I first explained
H.264 video, and I mentioned offhand that the video codec
was patent-encumbered and licensing was brokered by the MPEG LA consortium? That turns
out to be kind of important. To understand why it’s important, I direct you to the following analysis:
MPEG LA splits the H.264 license portfolio into two sublicenses: one for
manufacturers of encoders or decoders and the other for distributors of content. …
The sublicense on the distribution side gets further split out to four key
subcategories, two of which (subscription and title-by-title purchase or paid use) are
tied to whether the end user pays directly for video services, and two of which
(“free” television and internet broadcast) are tied to remuneration from sources
other than the end viewer. …
The licensing fee for “free” television is based on one of two royalty options. The
first is a one-time payment of $2,500 per AVC transmission encoder, which covers
one AVC encoder “used by or on behalf of a Licensee in transmitting AVC video to
the End User,” who will decode and view it. If you’re wondering whether this is a
double charge, the answer is yes: A license fee has already been charged to the
encoder manufacturer, and the broadcaster will in turn pay one of the two royalty options.
The second licensing fee is an annual broadcast fee. … [T]he annual broadcast fee is
broken down by viewership sizes:
$2,500 per calendar year per broadcast market of 100,000–499,999 television households
$5,000 per calendar year per broadcast market of 500,000–999,999 television households
$10,000 per calendar year per broadcast market of 1,000,000 or more television households
… With all the issues around “free” television, why should someone involved in
nonbroadcast delivery care? As I mentioned before, the participation fees apply to
any delivery of content. After defining that “free” television meant more than just
[over-the-air], MPEG LA went on to define participation fees for internet
broadcasting as “AVC video that is delivered via the Worldwide Internet to an end
user for which the end user does not pay remuneration for the right to receive or
view.” In other words, any public broadcast, whether it is [over-the-air], cable,
satellite, or the internet, is subject to participation fees. …
The fees are potentially somewhat steeper for internet broadcasts, perhaps assuming
that internet delivery will grow much faster than OTA or “free” television via cable
or satellite. Adding the “free television” broadcast-market fee together with an
additional fee, MPEG LA grants a reprieve of sorts during the first license term,
which ends on Dec. 31, 2010, and notes that “after the first term the royalty shall
be no more than the economic equivalent of royalties payable during the same time
for free television.”
That last part — about the fee structure for internet broadcasts — has already been amended.
The MPEG-LA recently
announced that internet streaming would not be charged. That does
not mean that H.264 is royalty-free for all users. In particular, encoders (like the one that
processes video uploaded to YouTube) and decoders (like the one included in Microsoft
Internet Explorer 9) are still subject to licensing fees. See
“Free as in smokescreen” for more details.
ENCODING VIDEO WITH MIRO VIDEO CONVERTER
There are many tools for encoding video, and there are many video encoding options that
affect video quality. If you do not wish to take the time to understand anything about video
encoding, this section is for you.
Miro Video Converter is an open source, GPL-licensed program for encoding video in multiple
formats. Download it for Mac OS X or Windows. It supports all the output formats mentioned
in this chapter. It offers no options beyond choosing a video file and choosing an output
format. It can take virtually any video file as input, including DV video produced by
consumer-level camcorders. It produces reasonable quality output from most videos. Due to its
lack of options, if you are unhappy with the output, you have no recourse but to try another program.
To start, just launch the Miro Video Converter application.
Miro Video Converter main screen
Click “Choose file” and select the source video you want to encode.
The “Pick a Device or Video Format” dropdown menu lists a variety of devices and formats.
For the purposes of this chapter, we are only interested in three of them.
1. WebM (vp8) is WebM video (VP8 video and Vorbis audio in a WebM container).
2. Theora is Theora video and Vorbis audio in an Ogg container.
3. iPhone is H.264 Baseline Profile video and AAC low-complexity audio in an MP4 container.
Select “WebM” first.
Choosing WebM output
Click the “Convert” button and Miro Video Converter will immediately start encoding your
video. The output file will be named after the source video, with a .webm extension, and will be saved in the same
directory as the source video.
You’ll be staring at this screen for a long time
Once the encoding is complete, you’ll be dumped back to the main screen. This time, select
“Theora” from the Devices and Formats list.
Time for Theora
That’s it; press the “Convert” button again to encode your Theora video. The video will be
given an .ogv extension and will be saved in the same directory as the source video.
Time for a cup of coffee
Finally, encode your iPhone-compatible H.264 video by selecting “iPhone” from the Devices
and Formats list.
iPhone, not iPhone 4