RETURN TO CONTENTS PAGE
These algorithms will speed the work of post-production,
reduce the mundane components and allow
skilled editors to concentrate on polishing and
improving the edit. In our company we call this
“Assisted Editing.” The computer is assisting in the
editing process in ways it hasn’t done previously.
The first market Non-Linear Editor arrived around 20 years ago with Avid’s introduction of Media Composer. Anyone who saw that early Media Composer, with its few bits of gray “color” on a tiny 120-pixel-high screen, would have to have wondered what
the fuss was all about. Non-Linear certainly didn’t
seem to be set to replace the existing paradigm of
the day. Now, 20 years later, we routinely edit HD
on laptops with portable storage with amazing
software. I think of our First Cuts software as being that early indicator of the
potential: as Media Composer 1 was to Non-Linear
Editing, First Cuts is to Assisted Editing.
Imagine dailies coming from a set where takes were being marked “good” in the field.
Alternate takes could be carried in Multiclips,
much the way they are now with ScriptSync in
Media Composer 4. If we could combine the
transcription capability of Adobe’s CS4 apps (with
improved accuracy in the future) with Avid’s ScriptSync, preparing synchronized scripted material could take minutes and be done without human action.
With appropriately tagged material you can
already explore long-form documentary material
as fully edited stories (with B-roll and lower thirds)
available in the source. Right now much of that
metadata has to be entered manually, but imagine
a few years down the track when there’s
automatic transcription; keyword generation;
image recognition (at least of people); location
tracking via GPS, and it’s entirely reasonable to expect software to build rough edited sequences. (First Cuts produces
technically competent stories but even I would
admit that they lack the soul and heart that a
true editor brings.)
In talking about how they manage metadata
Adobe refers to “adding intelligence to media,”
but they really mean “maintaining metadata”
because data is not intelligence. Metadata is an
input to intelligent algorithms that manipulate the
data in useful ways. It is the generation of those
algorithms that will see the greatest development
in the near future as most routine editing – and
pre-editing preparation – becomes automated
with new software tools.
In the rest of this article I want to explore how
metadata is tracked in the tools we use and to
explore the six distinct types of metadata I’ve identified.
STORING AND MAINTAINING METADATA
AAF and MXF are about metadata as much as media.
One of the things that distinguished Avid’s Open
Media Framework – OMF – from AVI and
QuickTime was the way it carried metadata along
with the media. Building on that experience,
they started a process within SMPTE for a more
advanced format. The result was AAF – the
Advanced Authoring Format.
AAF is designed to comprehensively track all the metadata associated with a production. The media itself (whether file-based or captured from videotape) is called “essence” to distinguish it from the metadata, and the metadata remains completely independent of the essence as long as the two can be related. All changes to, and use of, the essence are tracked.
Despite wide support for the new standard within
the organizing body and its participating
members, AAF has not become the ubiquitous
exchange format it was designed to be.
However a spin-off – subset actually – of the AAF standard has fared much better: MXF, the Material eXchange Format. MXF is required to remain a strict subset of AAF
metadata, technically known as the “Zero
Divergence Directive” – but that’s not important
right now. (While it sounds like an episode of The
Big Bang Theory, all it really means is that
manufacturers can’t deviate from the standard,
which is a good thing.) What is important is that it
gives the industry a standard format for media and
metadata together. Panasonic’s P2 Media (DVCPRO HD and AVC-I) is wrapped in MXF, as is Sony’s XDCAM media.
MXF IS THE MEDIA-CENTRIC SUBSET OF AAF
RETURN TO CONTENTS PAGE
format to replace the less capable OMF format.
(Ironically, the OMF format has been much more
widely supported and popular since it’s been
officially “dead” than it was as an Avid-backed
interchange format.) So it’s not entirely surprising that the latest versions of Media Composer can work with MXF media directly from the
source via the Avid Media Architecture – AMA.
Avid and Premiere Pro support project interchange
via AAF, and Premiere Pro also works natively with
P2 and XDCAM media, giving Adobe a clear
path to managing metadata throughout its
workflow. Adobe’s XMP metadata is stored in what’s
known as a “sidecar file” because it resides in the
same folder as the media, with the same name
but a different extension (Remember my sealed
wallet with tape and log notes stored together?
This is the same idea in digital form.).
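The sidecar convention is easy to picture in code. Here is a minimal sketch, assuming an .xmp extension; the clip path and extension are hypothetical examples, not any particular application’s layout:

```python
from pathlib import Path

def find_sidecar(media_path, sidecar_ext=".xmp"):
    """Return the expected sidecar path for a media file.

    A sidecar shares the media file's folder and base name but
    carries a different extension: the digital equivalent of
    keeping the log notes in the same wallet as the tape.
    """
    return Path(media_path).with_suffix(sidecar_ext)

# Hypothetical clip path:
print(find_sidecar("/shoot/day1/clip0042.mov"))
```

Because the pairing is purely by name and location, any application that honors the convention can find the metadata without a database lookup.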
Sadly, the nice thing about standards is that there are so many to choose from: as well as AAF, MXF and XMP, we have QuickTime Metadata.
Apple added support for QuickTime Metadata in
my all-time favorite Final Cut Pro release: 5.1.2!
Yes, this dot dot release added an extraordinary
number of major new features for such an
undersold release: QT Metadata, Apple Event
support (used, for example, in PluralEyes) and
more. Core to this article is the introduction of
QuickTime Metadata, which allows metadata to be written into the QuickTime file itself.
Apple now retain all the source metadata from tapeless formats when imported via Log and Transfer. This source metadata travels with the media.
The only thing you can’t do with QuickTime
metadata is to view it inside Final Cut Pro! (That is,
without a tool like our miniME.) One has to
presume that Apple have future plans to expose
this metadata inside Final Cut Pro, but there have
been no announced plans.
EACH FORMAT HAS ITS OWN QUICKTIME METADATA,
DISPLAYED HERE IN MINIME
There is an amazing amount of metadata stored in the QuickTime files created
from P2 Media (DVCPRO HD, AVC-I), and much the
same from their AVCCAM media on SD card.
From the RED camera we are richly blessed with metadata, although no GPS location data as yet. Panasonic have “slots” for Latitude and
Longitude in DVCPRO HD, AVC-I and AVCCAM
formats but only one high end camera records that
data – at this point.
You can explore this metadata in Log and Transfer
by right-clicking on the column headers and
selecting “Show All Columns.” After Log and
Transfer you can export XML from Final Cut Pro and use miniME to export the
QuickTime metadata to an Excel spreadsheet
(free in the demo version).
FINAL CUT PRO IMPORTS AND STORES METADATA FROM CAMERAS
VIA LOG AND TRANSFER
So, all major media wrappers in professional video
use carry metadata, but what types of metadata
are there and how might they be used?
THE SIX TYPES OF METADATA
In May 2009 I experienced an iPhoto disaster. One
day I opened my iPhoto library and it was empty.
Gone. None of the repair routines worked at all.
The good news is that all my images were intact.
The bad news was that all the metadata not stored
in digital images was gone. I had been scanning in
thousands of slides, negatives and prints from my
– and my family’s – archive. All date, name, event,
place and comment notation was gone.
I had all my data but without the metadata it was
almost useless. Some data came from the source
when the images were digital: for those I had date
information, from which event information could be inferred.
What I learnt from the incident was yet another
lesson on the value of metadata (and backing up!).
It also set me thinking about the types of
metadata I was entering. I guess 90 hours of
sorting thousands of images into some cohesive
form gave me some focus.
I’d already been exposed to the concept of
“Implicit and Explicit” metadata: these are common
terms when discussing metadata on the Internet.
Explicit metadata in that context is derived from
an action by the user that creates an immediately
obvious piece of metadata.
If you:
Rate a video on YouTube (you generate rating metadata on the site);
Rate a song in your music player library (you generate a metadata rating in your library);
Add a vote for a site on Digg (vote count metadata); or
Enter log notes for a clip (Logging metadata),
then you’re generating Explicit metadata.
Implicit metadata is derived when you do
something that doesn’t seem like it’s generating
metadata, such as:
Watching a video on YouTube (view count metadata);
Buying a product on Amazon (sales and recommendation metadata);
Skipping past a song on a music player because
you don’t want to hear it (“like” metadata); or
Using a Clip in a Sequence (Clip usage metadata).
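Implicit metadata collection is simple to sketch: the system quietly observes normal behavior and counts. A toy example of the clip-usage case (the class and method names here are my own, not from any NLE):

```python
from collections import Counter

class UsageTracker:
    """Accumulates implicit 'clip usage' metadata as editing happens."""

    def __init__(self):
        self.use_counts = Counter()

    def clip_used_in_sequence(self, clip_id):
        # The editor never 'enters' this metadata; it is a side
        # effect of normal editing, which is what makes it implicit.
        self.use_counts[clip_id] += 1

    def most_used(self, n=1):
        """Return the n most-used clips as (clip_id, count) pairs."""
        return self.use_counts.most_common(n)
```

The editor does no extra work at all, yet over a project the tracker builds up a picture of which material the editor actually values.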
As I thought about how I needed to categorize my
iPhoto library and thinking about metadata at the
same time, I realized that, while implicit and
explicit were useful when examining how
metadata might be acquired, it really didn’t help
describe what type of metadata we have available.
Despite that, implicit metadata “indeed kicks
explicit’s *ss.” Explicit metadata takes work.
Implicit metadata requires observation and
analysis: stuff computers are good at that bores
humans interested in emotion and story.
One of the greatest things about digital (a.k.a. tapeless) acquisition is that we get
metadata from the cameras – right from the
source. Just how much – and exactly what
metadata – depends on the camera manufacturer.
There are more types of metadata than just what
the camera provides and what an editor or
assistant enters. Instead I think we’re better served
dividing metadata into six distinct types: Source,
Added, Derived, Inferred, Analytical and Transform.
Source Metadata is added at the outset by the camera or capture software. It is
usually immutable – you can’t change it (and you
shouldn’t). When you use DV Start/Stop detection,
or its equivalent on other platforms, you’re using
Source metadata: the software is sensing breaks in the time-of-day Timecode
track – that’s generally hidden – and adding
Markers when there is a discontinuity in that hidden track.
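That discontinuity-sensing idea can be sketched in a few lines. This is an illustrative guess at the logic, not Avid’s or Apple’s actual implementation, and timecodes are simplified to plain frame counts:

```python
def find_discontinuities(timecodes, tolerance=1):
    """Return frame indices where the hidden time-of-day timecode jumps.

    `timecodes` is a list of per-frame timecode values expressed as
    frame counts. During continuous recording, consecutive frames
    differ by exactly 1; a larger jump means the camera was stopped
    and restarted, which is where an NLE would drop a marker.
    """
    markers = []
    for i in range(1, len(timecodes)):
        if timecodes[i] - timecodes[i - 1] > tolerance:
            markers.append(i)
    return markers
```

All the information needed is already in the Source metadata; the software just has to look for the gaps.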
Other examples of Source metadata:
Timecode and timebase;
GPS data (latitude and longitude);
Focal length, aperture, exposure; and
White balance setting.
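Because Source metadata is effectively immutable, it maps naturally onto a read-only record. A sketch using a frozen dataclass (the field names are my own invention, not any camera’s schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)  # frozen mirrors source metadata's immutability
class SourceMetadata:
    """A read-only record of what the camera knew at capture time."""
    timecode: str
    timebase: float
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    focal_length_mm: Optional[float] = None
    aperture: Optional[str] = None
    white_balance_k: Optional[int] = None
```

Any attempt to overwrite a field raises an error, which is exactly the behavior you want: downstream tools may read and build on these values but never alter them.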
INFORMATION ABOUT THE FORMAT IS SOURCE METADATA
By itself this metadata isn’t terribly useful. It’s what
is done with it that’s important. Consider it a raw
data input that can be then used to generate
Inferred or Derived metadata. The raw data can
also be an input for a smart algorithm to automate
synchronizing of dual system audio and video,
such as Avid’s AutoSync and our Sync-N-Link for Final Cut Pro.
Because it’s important, source metadata needs
to be preserved throughout the process. Apple
achieve this by writing the source metadata into the QuickTime files created
by Log and Transfer.
Source metadata is not exclusively electronic. A processing date printed on the back of a photograph
is a form of source metadata that can’t be altered.
Added Metadata is information that we just can’t
get from the camera or capture software and has
to come from a human. It can be added by a
person on-set (e.g. Adobe OnLocation) or during
the logging process. Examples:
Keywords or tags;
Comments or Log Notes;
Apply a label;
Enter an auxiliary timecode or copy it from another source; or
Manually transcribe speech (not done automatically – yet).
LOG NOTES ENTERED IN THE NLE ARE ADDED METADATA
One of the most exciting things I see in the future
use of metadata in post production is that the
amount of Added Metadata will be reduced and
replaced with Derived and Inferred Metadata.
Instead of a human needing to transcribe the content, speech recognition software
will create a transcription as Derived metadata
(Speech transcription technology will improve to
the point where it’s possible to rely on it: it’s just
not there yet.)
Facial recognition technology is also in its infancy.
When fully matured we will be able to identify
people or characters once and have them
recognized and tagged across the entirety of the
project. Like speech recognition, it’s not quite there yet.
Over time the amount of Added Metadata that’s
required will be reduced, reducing the need for
manual logging of clips.
Even the application of labels could be automated
based on some Source parameter. A clip marked as good in the field could automatically have a
label applied to quickly identify it as a “good” take.
Neither Source nor Added Metadata are
particularly new: they are the current “state of the
art.” From here I start to focus more on the
potential future use of metadata beyond where
we are now.
In attempting to reconstruct my iPhoto library
from a total loss of metadata last year, I found
another valuable source of Added metadata: the
notes people made on slides or on the back of
prints. Kodak Australia for long periods of time
printed the month and year of processing on the
back, which was also a form of added (but source-like)
metadata of the era.
In production terms, these are the equivalent of on-set script notes.
Derived Metadata is calculated using a non-human
external information source. It takes the Source
metadata (preferably) or Added metadata and
uses software algorithms or web APIs (Application
Programming Interface – a way of accessing a web
application from another application) to take the
basic facts and generate useful information.
Speech recognition software can produce a transcription;
A language algorithm can derive keywords from the transcription;
Locations can be derived from GPS data using
mapping data (e.g. Eiffel Tower, Paris, France)
or even identifying whether somewhere is in
a city or the country;
Facial recognition (with the limitation that, while
the same face may be recognized everywhere it
appears, the name will have to be provided at least
once as Added metadata entered by a person);
Recalculation of duration when video and audio
have different timebases;
OCR (Optical Character Recognition) of text
within a shot.
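The GPS-to-location derivation can be illustrated without calling a real web API: substitute a tiny in-memory landmark table for the mapping service. The landmark coordinates below are real, but the lookup itself is a stand-in for something like Google’s mapping API:

```python
import math

# A tiny stand-in for a real mapping database (illustrative entries).
LANDMARKS = {
    (48.8584, 2.2945): "Eiffel Tower, Paris, France",
    (51.5007, -0.1246): "Big Ben, London, UK",
}

def derive_location(lat, lon, max_km=1.0):
    """Derive a human-readable place name from raw GPS source metadata.

    Returns the nearest known landmark within max_km, or None.
    """
    best, best_km = None, max_km
    for (llat, llon), name in LANDMARKS.items():
        # Equirectangular approximation: accurate enough at city scale.
        x = math.radians(lon - llon) * math.cos(math.radians((lat + llat) / 2))
        y = math.radians(lat - llat)
        km = math.hypot(x, y) * 6371.0  # Earth radius in km
        if km < best_km:
            best, best_km = name, km
    return best
```

The raw latitude and longitude are Source metadata; the place name that comes back is Derived metadata, because an external information source did the work.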
TODAY GOOGLE’S MAPPING API CAN TAKE LATITUDE AND LONGITUDE
INFORMATION AND GIVE US A STREET ADDRESS AND IDENTIFY
THE LOCATION AS A CHURCH
One excellent use of Derived metadata is Singular
Software’s PluralEyes, which examines the audio
waveforms from multiple sources and derives
synchronization metadata from the waveform
data. PluralEyes uses this metadata to synchronize
multiple cameras or make multiclips from the
synchronized cameras, thanks to skillful
interpretation of source metadata: the waveforms
in this example.
In reconstructing my iPhoto library I upgraded to
iPhoto 09 predominantly on the promise of facial
recognition. Having already named family and
friends once – in comments – I was hoping to
reduce the workload the second time round. That’s
how I learnt that these types of technologically
derived metadata are still in the earliest
stages, but the potential is obvious.
While neither speech recognition nor facial
recognition technologies are quite “there yet”
for production purposes, GPS to location and OCR
technologies are well proven and ready to be
applied to production software. As the tools for
deriving metadata get better, we get more options
for building on the basic Source metadata and
making it more valuable.
Inferred Metadata is metadata that can be
assumed from other metadata without an external
information source. It may be used to help derive
what would otherwise have to be Added metadata.
By examining both the time of day of the shots and
their location, shots taken at the same location
during a similar time period can be grouped
to make an event (If this event is given a name,
the name is Added metadata, but with software
assisted editing it only needs to be added once
for the event, not to each clip.).
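The time-gap clustering just described needs no external data at all, which is what makes the result Inferred rather than Derived metadata. A minimal sketch, where the three-hour gap threshold is an arbitrary assumption of mine:

```python
def group_into_events(timestamps, max_gap_hours=3.0):
    """Infer events from capture times alone.

    `timestamps` are capture times in seconds. A gap larger than
    max_gap_hours between consecutive shots starts a new event.
    Returns a list of events, each a list of timestamps.
    """
    events, current = [], []
    for t in sorted(timestamps):
        if current and t - current[-1] > max_gap_hours * 3600:
            events.append(current)  # big gap: close the current event
            current = []
        current.append(t)
    if current:
        events.append(current)
    return events
```

A name for each inferred event would still be Added metadata, but it only needs entering once per event rather than once per clip.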
WITH THE CURRENT STATE-OF-THE-ART, IPHOTO NEEDS HELP
IDENTIFYING WHAT IS AN EVENT
If the time of day timecode for a series of shots is
within a relatively continuous period, but over
closely spaced locations, and then there is a big
gap until the next time of day timecode, it can be
assumed that those shots were made together at a
series of related events (and if they are named, this
would be Added metadata).
If the location is identified as a church and
it’s a weekend afternoon, we can infer that the
event is likely a wedding (See a more detailed
example of how inferred metadata would more
accurately identify a wedding a little further on in
this article.).
One way that Intelligent Assistance uses Inferred
metadata is in our Sync-N-Link software, which
batch merges dual system audio and video using
matching Timecode. However free run Timecode
(a.k.a. “Time of Day” Timecode) comes around
every day, making it harder to determine which
audio clip goes with the matching video for a
multi-day shoot. Sync-N-Link uses the Bin structure
and the relative proximity of video and audio clips
to infer which audio and video clips match.
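I can sketch the shape of that inference, though the real Sync-N-Link logic is surely more involved. Clips here are plain dictionaries with made-up fields, and the two-second matching window is my own assumption:

```python
def match_dual_system(video_clips, audio_clips):
    """Pair video and audio clips by free-run timecode, using the bin
    as a tiebreaker when time-of-day timecode repeats across shoot days.

    Each clip is a dict: {"name": ..., "bin": ..., "tc_start": seconds}.
    Returns a list of (video_name, audio_name) pairs.
    """
    pairs = []
    for v in video_clips:
        candidates = [a for a in audio_clips
                      if abs(a["tc_start"] - v["tc_start"]) < 2.0]
        if len(candidates) > 1:
            # Ambiguous timecode: infer the match from bin proximity,
            # since clips from the same shoot day share a bin.
            same_bin = [a for a in candidates if a["bin"] == v["bin"]]
            candidates = same_bin or candidates
        if candidates:
            pairs.append((v["name"], candidates[0]["name"]))
    return pairs
```

The timecode alone is Source metadata; choosing between identical timecodes by bin structure is the inferred part.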
Inferred metadata requires some computation
“smarts” but essentially applies high quality “rules
of thumb” to the construction of meaning from
source or derived metadata.
BRINGING IT TOGETHER
With just these four types of metadata, imagine this interview-based workflow:
All interviews are transcribed by speech recognition software.
The transcriptions are analyzed and divided into
logical “takes” – an answer to a question or one
of the paragraphs of the answer.
From the transcription we can infer (see next
section) that the person (voice) with the least
amount of transcription is asking the questions
and the voice with the greater amount of
transcription is answering.
Using facial recognition software the person
is matched to all other interviews or shots with
that person. All that is required from a human is
the one-off input of the person’s name, which
will provide us with the information we need
to automatically generate lower thirds.
Each source could be broken into subclips, dropping
the speaker with the least contribution to the transcript.
For each subclip now created Keyword Extraction
technology derives “story keywords” to summarize
the content into something more useful for an
editing algorithm (Keyword extraction technology
is already well proven in the knowledge
management and library industries.)
Locations are all generated from the GPS data
stored by the camera. There are several GPS-to-address services available.
Google’s location API also reports businesses or points of interest at a
particular location, so instead of street address, the GPS data can be used
to derive the business at a location. This then can be used to infer the
purpose of the video (If the inferred event was in a Church and it was a
weekend afternoon, it would be reasonable to infer that this would be a
wedding. Add GPS and time data that place the preceding event at a
residential address and a subsequent event at a reception venue or
hotel, and the inference of a wedding is complete.). This type of
derived and inferred knowledge would then be used to choose the most
appropriate editing algorithm for this type of edit.
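That chain of reasoning is essentially a small rule base. A toy version, with thresholds and venue labels that are my own assumptions rather than anything from a shipping product:

```python
def infer_event_type(venue_type, weekday, hour, recent_venues):
    """Toy rule-of-thumb engine for the wedding example.

    venue_type: e.g. "church"; weekday: 0=Mon .. 6=Sun; hour: 0-23;
    recent_venues: venue types of the surrounding events on the
    same day (derived from their GPS data).
    """
    if venue_type == "church" and weekday >= 5 and 11 <= hour <= 18:
        # Supporting evidence: a nearby event at a home beforehand
        # and at a reception venue afterwards.
        support = {"residence", "reception_venue"} & set(recent_venues)
        if len(support) == 2:
            return "wedding (high confidence)"
        return "wedding (possible)"
    return "unknown"
```

Each individual fact is weak evidence, but the combination of derived (venue) and source (time) metadata makes the inference solid enough to pick an editing algorithm.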
Location information would also be useful in identifying appropriate
B-roll for interview shots based on location proximity (Not all
B-roll is location based, but some is.).
All this data could be fed into an improved version of our First Cuts for
FCP algorithm to generate edits that have story arc, B-roll and lower thirds.
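One of the inferences in the workflow above, spotting the interviewer by transcription volume, is almost trivial to express:

```python
def infer_roles(transcripts):
    """Given {speaker_id: transcript_text}, infer who is interviewing.

    Rule of thumb: the voice with the least transcription is asking
    the questions; the voice with the most is answering them.
    """
    by_words = sorted(transcripts, key=lambda s: len(transcripts[s].split()))
    return {"interviewer": by_words[0], "interviewee": by_words[-1]}
```

No human labels anything here: the roles fall straight out of derived metadata (the transcripts themselves).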
In a second, scripted scenario: the shots are transcribed and matched with the script
(similar to Avid’s ScriptSync).
All takes of a section of script are built into multiclips, with the marked
“good” takes as the active clip in the multiclip (or, if there is more than one
“good” take, as the first angles in the multiclip). Takes not
marked “good” could be optionally included in the multiclip.
Based on the script and the marked good takes, the multiclips are
assembled into a sequence ready for the editor to refine:
taking the raw stone and polishing it into a sparkling gem.
These two scenarios are just a hint of what could be achieved with good
metadata – source, added, derived and inferred – and some smart
algorithms. Pre-edit postproduction will be revolutionized.
But these four are not the only types of metadata available to us.
When working out what to name this next type of
metadata I considered “Visually Obvious Invisible
Metadata,” but really! So this next type of
metadata I’ll call Analytical Metadata because it
requires intelligent analysis of the images or clips themselves.
Analytical metadata is encoded information in the
picture about the picture: probably mostly related
to people, places and context. The most obvious
example is a series of photos without any event
information, time or location metadata. By
analyzing who was wearing what clothes and
correlating between shots, the images related to
an event can be grouped together even without
an overall group shot. There may be only one shot
with explicit event information, but it can be
cross-correlated to the other pictures in the
group by clothing.
Similarly a painting, picture, decoration or
architectural element that appears in more than
one shot can be used to identify the location for all
the shots at that event. I’ve even used hair styles as
a general time-period indicator, but that’s not a
precise tool. Even the presence or absence of
someone in a picture can identify a time period: if
“that partner” is in the picture, it must be from a
particular span of years. These are judgments
humans make easily but that are very
challenging to automate with a computer.
Full, detailed visual recognition and the
interpretive skills that go with it are still in its
infancy. Scientists are working to teach computers
how to recognize objects, but I doubt these
functions will be fully automated in my lifetime.
There was one other “metadata breakthrough” that
came from my big “iPhoto disaster of 09.” It led to
the sixth type of metadata: Transform metadata.
Let me back up a minute and explain the
inspiration. In earlier versions (before iPhoto 09 I
think) corrections to images made in iPhoto
created an updated copy, which became the
source for the next round of corrections. There has
always been the option to revert to the original
image, but that would be starting over.
In iPhoto 09, if all corrections are done within
iPhoto it’s tracked as live metadata. Crop an image
and the original is always available to be
uncropped. Ditto rotation and all image correction.
Instead of starting over, you’re always working
from the source, plus Transform metadata.
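The pattern is an ordered list of operations kept beside an untouched original. A sketch of the idea, not iPhoto’s actual data model:

```python
class TransformedImage:
    """Non-destructive editing: the original is never touched; each
    correction is stored as Transform metadata and replayed on demand,
    so any step can be undone by simply dropping its record."""

    def __init__(self, original):
        self.original = original   # the immutable source essence
        self.transforms = []       # ordered Transform metadata

    def apply(self, name, **params):
        self.transforms.append({"op": name, **params})

    def undo(self):
        if self.transforms:
            self.transforms.pop()

    def render(self):
        # In a real app each op would be applied to the pixels here;
        # this sketch just reports the recipe.
        return (self.original, list(self.transforms))
```

Undoing a crop costs nothing because the crop was never baked into the pixels; it was only ever metadata.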
iPhoto stores the original version of each photo, as well as all the corrections as metadata.
This set me thinking. The image was being stored
as the original and then the display metadata is
applied. Then it struck me that this was exactly
how raw formats work (in both digital still
cameras and the digital cinema RED
camera). Raw images really need a Color Lookup
Table (CLUT) before they’re viewable at all. An
argument could be made to call this Presentation Metadata – information on
how to present the raw image. Greg (my partner)
argued strongly that it should be Aesthetic Metadata.
IT TAKES PROCESSING OF GAMMA AND MATRIX TO GET THE SPECTACULAR RESULTS WE’RE USED TO
FROM RED RAW. (THANKS TO GRAEME NATTRESS OF RED DIGITAL CINEMA FOR THE IMAGE.)
NATIVELY RAW FILES ARE AESTHETICALLY UNPLEASING.