aboutsummaryrefslogtreecommitdiff
path: root/docs/design/draft-metadata.txt
blob: aa8407581d4699b5c03bc124ca16e544eee5c9aa (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
Metadata
--------

This draft recaps the current metadata handling in GStreamer and proposes some
additions.


Supported Metadata standards
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The paragraphs below list supported native metadata standards sorted by type and
then in alphabetical order. Some standards have been extended to support
additional metadata. GStreamer already supports all of those to some extend. 
This is showns in the table below as either [--], [r-], [-w] or [rw] depending on
read/write support (08.Feb.2010).

Audio
- mp3
  ID3v2: [rw]
    http://www.id3.org/Developer_Information
  ID3v1: [rw] 
    http://www.id3.org/ID3v1
  XMP: [--] (inside ID3v2 PRIV tag of owner XMP)
    http://www.adobe.com/devnet/xmp/
- ogg/vorbis
  vorbiscomment: [rw] 
    http://www.xiph.org/vorbis/doc/v-comment.html
    http://wiki.xiph.org/VorbisComment
- wav
  LIST/INFO chunk: [rw]
    http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/RIFF.html#Info
    http://www.kk.iij4u.or.jp/~kondo/wave/mpidata.txt
  XMP: [--]
    http://www.adobe.com/devnet/xmp/

Video
- 3gp
  {moov,trak}.udta:  [rw]
     http://www.3gpp.org/ftp/Specs/html-info/26244.htm 
  ID3V2: [--]
     http://www.3gpp.org/ftp/Specs/html-info/26244.htm 
     http://www.mp4ra.org/specs.html#id3v2
- avi
  LIST/INFO chunk: [rw]
    http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/RIFF.html#Info
    http://www.kk.iij4u.or.jp/~kondo/wave/mpidata.txt
  XMP: [--] (inside "_PMX" chunk)
    http://www.adobe.com/devnet/xmp/
- asf
  ??: 
  XMP: [--]
    http://www.adobe.com/devnet/xmp/
- flv [--]
  XMP: (inside onXMPData script data tag)
    http://www.adobe.com/devnet/xmp/
- mkv
  tags: [rw]
    http://www.matroska.org/technical/specs/tagging/index.html
- mov
  XMP: [--] (inside moov.udta.XMP_ box)
    http://www.adobe.com/devnet/xmp/
- mp4
  {moov,trak}.udta: [rw]
    http://standards.iso.org/ittf/PubliclyAvailableStandards/c051533_ISO_IEC_14496-12_2008.zip
  moov.udta.meta.ilst: [rw]
    http://atomicparsley.sourceforge.net/
    http://atomicparsley.sourceforge.net/mpeg-4files.html
  ID3v2: [--]
    http://www.mp4ra.org/specs.html#id3v2
  XMP: [--] (inside UUID box)
    http://www.adobe.com/devnet/xmp/
- mxf
  ??

Images
- gif
  XMP: [--]
    http://www.adobe.com/devnet/xmp/
- jpg
  jif: [rw] (only comments)
  EXIF: [rw] (via metadata plugin)
    http://www.exif.org/specifications.html
  IPTC: [rw] (via metadata plugin)
    http://www.iptc.org/IPTC4XMP/
  XMP: [rw] (via metadata plugin)
    http://www.adobe.com/devnet/xmp/
- png
  XMP: [--]
    http://www.adobe.com/devnet/xmp/

further Links:
http://age.hobba.nl/audio/tag_frame_reference.html
http://wiki.creativecommons.org/Tracker_CC_Indexing


Current Metadata handling
~~~~~~~~~~~~~~~~~~~~~~~~~

When reading files, demuxers or parsers extract the metadata. It will be sent 
a GST_EVENT_TAG to downstream elements. When a sink element receives a tag
event, it will post a GST_MESSAGE_TAG message on the bus with the contents of
the tag event.

Elements receiving GST_EVENT_TAG events can mangle them, mux them into the 
buffers they send or just pass them through. Usually is muxers that will format
the tag data into the form required by the format they mux. Such elements would
also implement the GstTagSetter interface to receive tags from the application.

  +----------+
  | demux    |
 sink       src --> GstEvent(tag) over GstPad to downstream element
  +----------+

           method call over GstTagSetter interface from application
                                                          |
                                                          v
                                                    +----------+
                                                    | mux      |
GstEvent(tag) over GstPad from upstream element --> sink       src
                                                    +----------+

The data used in all those interfaces is GstTagList. It is based on a 
GstStructure which is like a hash table with differently typed entries. The key
is always a string/GQuark. Many keys are predefined in GStreamer core. More keys
are defined in gst-plugins-base/gst-libs/gst/tag/tag.h.
If elements and applications use predefined types, it is possible to transcode a
file from one format into another while preserving all known and mapped
metadata.


Issues
~~~~~~

Unknown/Unmapped metadata
^^^^^^^^^^^^^^^^^^^^^^^^^

Right now GStreamer can lose metadata when transcoding, remuxing content. This
can happend as we don't map all metadata fields to generic ones.

We should probably also add the whole metadata blob to the GstTagList. We would
need a GST_TAG_SYSTEM_xxx define (e.g. GST_TAG_SYSTEM_ID3V2) for each standard.
The content is not printable and should be treated as binary if not known. The
tag is not mergeable - call gst_tag_register() with GstTagMergeFunc=NULL. Also
the tag data is only useful for upstream elements, not for the application. 

A muxer would first scan a taglist for known system tags. Unknown tags are
ignored as already. It would first populate its own metadata store with the
entries from the system tag and the update the entries with the data in normal
tags.

Below is an initial list of tag systems:
ID3V1           - GST_TAG_SYSTEM_ID3V1
ID3V2           - GST_TAG_SYSTEM_ID3V2
RIFF_INFO       - GST_TAG_SYSTEM_RIFF_INFO
XMP             - GST_TAG_SYSTEM_XMP

We would basically need this for each container format.

See also https://bugzilla.gnome.org/show_bug.cgi?id=345352

Lost metadata
^^^^^^^^^^^^^

A case slighly different from the previous is that when an application sets a
GstTagList on a pipeline. Right elements consuming tags do not report which tags
have been consumed.  Especially when using elements that make metadata 
persistent, we have no means of knowing which of the tags made it into the 
target stream and which were not serialized. Ideally the application would like
to know which kind of metadata is accepted by a pipleine to reflect that in the
UI.

Although it is in practise so that elements implementing GstTagSetter are the
ones that serialize, this does not have to be so. Otherwise we could add a
means to that interface, where elements add the tags they have serialized. The
application could build one list from all the tag messages and then query all
the serialized tags from tag-setters. The delta tells what has not been
serialized.

A different approach would be to query the list of supported tags in advance.
This could be a query (GST_QUERY_TAG_SUPPORT). The query result could be a list
of elements and their tags. As a convenience we could flatten the list of tags
for the top-level element (if the query was sent to a bin) and add that.

Tags are per Element
^^^^^^^^^^^^^^^^^^^^

In many cases we want tags per stream. Even metadata standards like mp4/3gp
metadata supports that. Right now GST_MESSAGE_SRC(tags) is the element. We tried
changing that to the pad, but that broke applications.
Also we miss the symmetric functionality in GstTagSetter. This interface is
usually implemented by elements.

Open bugs
^^^^^^^^^

https://bugzilla.gnome.org/buglist.cgi?query_format=advanced;short_desc=tag;bug_status=UNCONFIRMED;bug_status=NEW;bug_status=ASSIGNED;bug_status=REOPENED;bug_status=NEEDINFO;short_desc_type=allwordssubstr;product=GStreamer

Add GST_TAG_MERGE_REMOVE
https://bugzilla.gnome.org/show_bug.cgi?id=560302