Encapsulation of FLAC in ISO Base Media File Format

2 Supporting Normative References
3 Design Rules of Encapsulation
    3.1 File Type Identification
    3.2 Overview of Track Structure
    3.3 Definition of a FLAC sample
        3.3.1 Sample entry format
        3.3.2 FLAC Specific Box
        3.3.4 Duration of a FLAC sample
            3.3.6.1 Random Access Point
    3.4 Basic Structure (informative)
    3.5 Example of Encapsulation (informative)

1 Scope

This document specifies the normative mapping for encapsulating
FLAC coded audio bitstreams in the ISO Base Media File Format and its
derivatives. Encapsulation of FLAC coded bitstreams in the
QuickTime file format is outside the scope of this specification.

2 Supporting Normative References

[1] ISO/IEC 14496-12:2012 Corrected version
    Information technology — Coding of audio-visual objects — Part
    12: ISO base media file format

[2] ISO/IEC 14496-12:2012/Amd.1:2013
    Information technology — Coding of audio-visual objects — Part
    12: ISO base media file format AMENDMENT 1: Various
    enhancements including support for large metadata

[3] FLAC format specification
    https://xiph.org/flac/format.html
    Definition of the FLAC Audio Codec stream format

[4] FLAC-in-Ogg mapping specification
    https://xiph.org/flac/ogg_mapping.html
    Ogg Encapsulation for the FLAC Audio Codec

[5] Matroska specification

3 Design Rules of Encapsulation

3.1 File Type Identification

This specification does not define any brand to declare files
which conform to this specification. Files which conform to
this specification shall list, in the compatible brands list
of the File Type Box, at least one brand whose requirements
do not contradict the requirements described in this clause.
The minimal support of the encapsulation of FLAC bitstreams
in the ISO Base Media File Format requires the 'isom' brand.
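
The brand requirement above can be checked with a few lines of code. The following is a minimal sketch, not part of this specification: it assumes the caller has already located the File Type Box and passes only its payload (the bytes after the size and type fields). The function name compatible_brands is hypothetical.

```python
import struct


def compatible_brands(ftyp_payload: bytes) -> list[str]:
    """Return the compatible brands listed in an 'ftyp' payload.

    Payload layout per ISO/IEC 14496-12: major_brand (4 bytes),
    minor_version (4 bytes), then zero or more 4-byte brands.
    """
    if len(ftyp_payload) < 8 or len(ftyp_payload) % 4 != 0:
        raise ValueError("malformed ftyp payload")
    rest = ftyp_payload[8:]
    return [rest[i:i + 4].decode("ascii") for i in range(0, len(rest), 4)]


# Hypothetical payload: major brand 'mp42', minor version 0,
# compatible brands 'mp42' and 'isom'.
payload = b"mp42" + struct.pack(">I", 0) + b"mp42" + b"isom"
has_isom = "isom" in compatible_brands(payload)
```

A reader would typically treat a missing 'isom' entry as a sign that the file does not claim the minimal support described here.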

3.2 Overview of Track Structure

FLAC coded audio shall be encapsulated into the ISO Base
Media File Format as media data within an audio track.

+ The handler_type field in the Handler Reference Box
  shall be set to 'soun'.

+ The Media Information Box shall contain the Sound Media
  Header Box.

+ The codingname of the sample entry is 'fLaC'.

  This specification does not define any encapsulation
  using MP4AudioSampleEntry with objectTypeIndication
  specified by the MPEG-4 Registration Authority
  (http://www.mp4ra.org/). See section 'Sample entry
  format' for the definition of the sample entry.

+ The 'dfLa' box is added to the sample entry to convey
  initialization information for the decoder.

  See section 'FLAC Specific Box' for the definition of
  this box.

+ A FLAC sample is exactly one FLAC frame as described
  in the format specification[3]. See section
  'Sample format' for details of the frame contents.

+ Every FLAC sample is a sync sample. No pre-roll or
  lapping is required. See section 'Random Access' for
  details.

3.3 Definition of a FLAC sample

3.3.1 Sample entry format

For any track containing one or more FLAC bitstreams, a
sample entry describing the corresponding FLAC bitstream
shall be present inside the Sample Table Box. This version
of the specification defines only one sample entry format,
named FLACSampleEntry, whose codingname is 'fLaC'. This
sample entry includes exactly one FLAC Specific Box,
defined in section 'FLAC Specific Box', as a mandatory box
and indicates that FLAC samples described by this sample
entry are stored in the sample format described in section
'Sample format'.

The syntax and semantics of the FLACSampleEntry are
as follows. The data fields of this box and native
FLAC[3] structures encoded within FLAC blocks are both
stored in big-endian format, though for purposes of the
ISO BMFF container, FLAC native metadata and data blocks
are treated as unstructured octet streams.

class FLACSampleEntry() extends AudioSampleEntry ('fLaC'){
    FLACSpecificBox();
}

The fields of the AudioSampleEntry portion shall be set as
follows:

+ channelcount:

  The channelcount field shall be set equal to the
  channel count specified by the FLAC bitstream's native
  METADATA_BLOCK_STREAMINFO header as described in [3].
  Note that the FLAC FRAME_HEADER structure that begins
  each FLAC sample redundantly encodes the channel count;
  the number of channels declared in each FRAME_HEADER
  MUST match the number of channels declared here and in
  the METADATA_BLOCK_STREAMINFO header.

+ samplesize:

  The samplesize field shall be set equal to the bits
  per sample specified by the FLAC bitstream's native
  METADATA_BLOCK_STREAMINFO header as described in [3].
  Note that the FLAC FRAME_HEADER structure that begins
  each FLAC sample redundantly encodes the number of
  bits per sample; the bits per sample declared in each
  FRAME_HEADER MUST match the samplesize declared here
  and the bits per sample field declared in the
  METADATA_BLOCK_STREAMINFO header.

+ samplerate:

  When possible, the samplerate field shall be set
  equal to the sample rate specified by the FLAC
  bitstream's native METADATA_BLOCK_STREAMINFO header
  as described in [3], left-shifted by 16 bits to
  create the appropriate 16.16 fixed-point
  representation.

  When the bitstream's native sample rate is greater
  than the maximum expressible value of 65535 Hz,
  the samplerate field shall hold the greatest
  expressible regular division of that rate. For example,
  the samplerate field shall hold 48000.0 for
  native sample rates of 96 and 192 kHz. In the
  case of unusual sample rates which do not have
  an expressible regular division, the maximum value
  of 65535.0 Hz should be used.

  High-rate FLAC bitstreams are common, and the native
  value from the METADATA_BLOCK_STREAMINFO header in
  the FLACSpecificBox MUST be read to determine the
  correct sample rate of the bitstream.

  Note that the FLAC FRAME_HEADER structure that begins
  each FLAC sample redundantly encodes the sample rate;
  the sample rate declared in each FRAME_HEADER MUST
  match the sample rate declared in the
  METADATA_BLOCK_STREAMINFO header, and here in the
  AudioSampleEntry portion of the FLACSampleEntry
  as much as is allowed by the encoding restrictions
  described above.

Finally, the FLACSpecificBox carries codec headers:

+ FLACSpecificBox:

  This box contains initialization information for the
  decoder as defined in section 'FLAC Specific Box'.
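
The field rules above can be sketched in code. This is an illustrative sketch only, with a hypothetical function name. It reads the 20-bit sample rate, the 3-bit (channels - 1) field and the 5-bit (bits per sample - 1) field that begin at byte 10 of the 34-byte STREAMINFO data, as laid out in [3]. It also ASSUMES that "regular division" means repeated halving, which is consistent with the 96 kHz and 192 kHz examples but is not stated by this specification.

```python
def sample_entry_fields(streaminfo: bytes) -> dict:
    """Derive AudioSampleEntry fields from 34 bytes of STREAMINFO data."""
    if len(streaminfo) != 34:
        raise ValueError("STREAMINFO data must be 34 bytes")
    # Bytes 10..17 pack: sample rate (20 bits), channels - 1 (3 bits),
    # bits per sample - 1 (5 bits), total samples (36 bits).
    packed = int.from_bytes(streaminfo[10:18], "big")
    native_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits = ((packed >> 36) & 0x1F) + 1

    rate = native_rate
    # ASSUMPTION: "regular division" is taken to mean repeated halving.
    while rate > 65535 and rate % 2 == 0:
        rate //= 2
    if rate > 65535:
        rate = 65535  # no expressible division: use the maximum value

    return {
        "channelcount": channels,
        "samplesize": bits,
        "samplerate": rate << 16,  # 16.16 fixed-point
    }
```

Because the samplerate field may be a reduced value, a decoder still needs the native rate from STREAMINFO, as the text above requires.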

3.3.2 FLAC Specific Box

Exactly one FLAC Specific Box shall be present in each
FLACSampleEntry. This specification defines version 0
of this box. If incompatible changes occur in future
versions of this specification, another version number
will be defined. The data fields of this box and native
FLAC[3] structures encoded within FLAC blocks are both
stored in big-endian format, though for purposes of the
ISO BMFF container, FLAC native metadata and data blocks
are treated as unstructured octet streams.

The syntax and semantics of the FLAC Specific Box are
as follows.

class FLACMetadataBlock {
    unsigned int(1)  LastMetadataBlockFlag;
    unsigned int(7)  BlockType;
    unsigned int(24) Length;
    unsigned int(8)  BlockData[Length];
}

aligned(8) class FLACSpecificBox
        extends FullBox('dfLa', version=0, 0){
    for (i=0; ; i++) { // to end of box
        FLACMetadataBlock();
    }
}

+ Version:

  The Version field shall be set to 0.

  Future versions of this specification may set this
  field to other values. A reader that does not support
  a given value shall not read the fields that follow
  it within the FLACSpecificBox.

+ Flags:

  The Flags field shall be set to 0.

After the FullBox header, the box contains a sequence of
FLAC[3] native-metadata block structures that fill the
remainder of the box.

Each FLACMetadataBlock structure consists of three fields
filling a total of four bytes that form a FLAC[3] native
METADATA_BLOCK_HEADER, followed by raw octet bytes that
comprise the FLAC[3] native METADATA_BLOCK_DATA.

+ LastMetadataBlockFlag:

  The LastMetadataBlockFlag field maps semantically to
  the FLAC[3] native METADATA_BLOCK_HEADER
  Last-metadata-block flag as defined in the FLAC[3]
  file specification.

  The LastMetadataBlockFlag is set to 1 if this
  FLACMetadataBlock is the last metadata block in the
  FLACSpecificBox. It is set to 0 otherwise.

+ BlockType:

  The BlockType field maps semantically to the FLAC[3]
  native METADATA_BLOCK_HEADER BLOCK_TYPE field as
  defined in the FLAC[3] file specification.

  The BlockType is set to a valid FLAC[3] BLOCK_TYPE
  value that identifies the type of this native metadata
  block. The BlockType of the first FLACMetadataBlock
  must be set to 0, signifying this is a FLAC[3] native
  METADATA_BLOCK_STREAMINFO block.

+ Length:

  The Length field maps semantically to the FLAC[3]
  native METADATA_BLOCK_HEADER Length field as
  defined in the FLAC[3] file specification.

  The Length field specifies the number of bytes of
  BlockData to follow.

+ BlockData:

  The BlockData field maps semantically to the FLAC[3]
  native METADATA_BLOCK_DATA as defined in the FLAC[3]
  file specification.

Taken together, the bytes of the FLACMetadataBlock form a
complete FLAC[3] native METADATA_BLOCK structure.

Note that a minimum of a single FLACMetadataBlock,
consisting of a FLAC[3] native METADATA_BLOCK_STREAMINFO
structure, is required. Should the FLACSpecificBox
contain more than a single FLACMetadataBlock structure,
the FLACMetadataBlock containing the FLAC[3] native
METADATA_BLOCK_STREAMINFO must occur first in the list.
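
The box layout described above can be read with a short parser. The following is a sketch under stated assumptions: the caller passes the 'dfLa' box body starting at the FullBox version byte (that is, after the box size and type), and the function name parse_dfla is hypothetical. Rejecting non-zero Flags and data after the last block is a strictness choice of this sketch, not something this specification mandates for readers.

```python
def parse_dfla(payload: bytes) -> list[tuple[int, bytes]]:
    """Split a 'dfLa' body into (BlockType, BlockData) pairs."""
    if len(payload) < 4:
        raise ValueError("truncated FullBox header")
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")
    if version != 0:
        # Unknown version: do not read the fields that follow.
        raise ValueError("unsupported FLACSpecificBox version")
    if flags != 0:
        raise ValueError("Flags shall be 0")

    blocks = []
    pos = 4
    last_seen = False
    while pos < len(payload):
        if last_seen:
            raise ValueError("data after the last metadata block")
        if pos + 4 > len(payload):
            raise ValueError("truncated metadata block header")
        last_seen = bool(payload[pos] & 0x80)   # LastMetadataBlockFlag
        block_type = payload[pos] & 0x7F        # BlockType
        length = int.from_bytes(payload[pos + 1:pos + 4], "big")
        pos += 4
        data = payload[pos:pos + length]        # BlockData
        if len(data) != length:
            raise ValueError("truncated metadata block data")
        pos += length
        blocks.append((block_type, data))

    if not blocks or blocks[0][0] != 0:
        raise ValueError("first block must be STREAMINFO (type 0)")
    if not last_seen:
        raise ValueError("LastMetadataBlockFlag never set")
    return blocks
```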

Other containers that package FLAC audio streams, such as
Ogg[4] and Matroska[5], wrap FLAC[3] native metadata without
modification, as this specification does. When
repackaging or remuxing FLAC[3] streams from another
format that contains FLAC[3] native metadata into an ISO
BMFF file, the complete FLAC[3] native metadata should be
preserved in the ISO BMFF stream as described above. A
writer may also parse this native metadata and include
contextually redundant ISO BMFF-native repackagings and/or
reparsings of it, so long as the native metadata is also
preserved.
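
As an illustration of preserving native metadata when remuxing, the sketch below builds a 'dfLa' box body from a native FLAC stream by copying its metadata blocks verbatim. It assumes the input starts with the native 'fLaC' stream marker followed by the metadata blocks, as described in [3]; the function name is hypothetical, and writing the enclosing box size and type is left to the caller.

```python
def dfla_payload_from_flac(stream: bytes) -> bytes:
    """Return a 'dfLa' body (version, flags, metadata blocks)."""
    if stream[:4] != b"fLaC":
        raise ValueError("not a native FLAC stream")
    pos = 4
    while True:
        if pos + 4 > len(stream):
            raise ValueError("truncated metadata block header")
        is_last = stream[pos] & 0x80
        length = int.from_bytes(stream[pos + 1:pos + 4], "big")
        pos += 4 + length
        if pos > len(stream):
            raise ValueError("truncated metadata block data")
        if is_last:
            break
    # Version 0 and Flags 0, then the native blocks, unmodified.
    # The native last-metadata-block flag already marks the final block.
    return bytes(4) + stream[4:pos]
```

Copying the blocks byte for byte keeps the STREAMINFO block first, since a native stream already orders it that way.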

3.3.3 Sample format

A FLAC sample is exactly one FLAC audio FRAME (as defined
in the FLAC[3] file specification) belonging to a FLAC
bitstream. The FLAC sample data begins with a complete
FLAC FRAME_HEADER, followed by one FLAC SUBFRAME per
channel and any necessary bit padding, and ends with the
usual FLAC FRAME_FOOTER.

Note that the FLAC native FRAME_HEADER structure that
begins each FLAC sample redundantly encodes channel count,
sample rate, and sample size. The values of these fields
must agree both with the values declared in the FLAC
METADATA_BLOCK_STREAMINFO structure and with those
declared in the FLACSampleEntry.
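
The agreement between FRAME_HEADER and the declared values can be spot-checked from the first four bytes of a sample. The sketch below is illustrative and deliberately partial: it checks the sync code, channel assignment and sample size code as laid out in [3], but it does not decode the sample rate or block size codes, and it rejects sample size codes 3 and 7 rather than interpreting them. The function name is hypothetical.

```python
def frame_header_matches(sample: bytes, channels: int,
                         bits_per_sample: int) -> bool:
    """Check channel count and sample size in a FLAC FRAME_HEADER."""
    if len(sample) < 4:
        return False
    # 14-bit sync code 0b11111111111110 followed by a reserved 0 bit;
    # the lowest bit of byte 1 is the blocking strategy and is ignored.
    if sample[0] != 0xFF or (sample[1] & 0xFE) != 0xF8:
        return False

    assignment = sample[3] >> 4
    if assignment <= 7:
        frame_channels = assignment + 1   # independent channels
    elif assignment <= 10:
        frame_channels = 2                # stereo decorrelation modes
    else:
        return False                      # reserved values

    size_code = (sample[3] >> 1) & 0x7
    sizes = {1: 8, 2: 12, 4: 16, 5: 20, 6: 24}
    if size_code == 0:
        frame_bits = bits_per_sample      # "take from STREAMINFO"
    elif size_code in sizes:
        frame_bits = sizes[size_code]
    else:
        return False                      # codes not handled here

    return frame_channels == channels and frame_bits == bits_per_sample
```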

3.3.4 Duration of a FLAC sample

The duration of any given FLAC sample is determined by
dividing the decoded block size of a FLAC frame, as
encoded in the FLAC FRAME's FRAME_HEADER structure, by the
value of the timescale field in the Media Header Box.
FLAC samples are permitted to have variable durations
within a given audio stream. FLAC does not use padding
values.
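
The duration rule can be written out as a short calculation. The sketch below ASSUMES the timescale equals the native sample rate, so that a block size in samples is also a duration in timescale units; this specification does not state that assumption explicitly. The function names and the block sizes in the example are hypothetical.

```python
from fractions import Fraction
from itertools import groupby


def sample_duration_seconds(block_size: int, timescale: int) -> Fraction:
    """Duration of one FLAC sample: block size divided by timescale."""
    if block_size <= 0 or timescale <= 0:
        raise ValueError("block size and timescale must be positive")
    return Fraction(block_size, timescale)


def stts_entries(block_sizes: list[int]) -> list[tuple[int, int]]:
    """Run-length encode durations as (sample_count, sample_delta).

    ASSUMPTION: timescale == sample rate, so each delta is the block size.
    """
    return [(len(list(run)), delta) for delta, run in groupby(block_sizes)]


# Three 4608-sample frames followed by a shorter final frame.
entries = stts_entries([4608, 4608, 4608, 1024])
```

Because durations may vary, the final frame of a stream commonly needs its own entry, as in the example.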

3.3.5 Sub-sample

Sub-samples are not defined for FLAC samples in this
specification.

3.3.6 Random Access

This subclause describes the nature of random access
to FLAC samples.

3.3.6.1 Random Access Point

All FLAC samples can be independently decoded,
i.e. every FLAC sample is a sync sample. The Sync
Sample Box shall not be present as long as there are
no samples other than FLAC samples in the same
track. The sample_is_non_sync_sample field for FLAC
samples shall be set to 0.
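
Since every FLAC sample is a sync sample, a seek can start decoding at whichever sample contains the target time, with no search for an earlier random access point. The sketch below illustrates this with run-length entries shaped like those of the Decoding Time to Sample Box, given as (sample_count, sample_delta) pairs. The function name and the example values are hypothetical.

```python
def sample_at_time(stts: list[tuple[int, int]],
                   target: int) -> tuple[int, int]:
    """Return (sample_index, sample_start_time) for a media time.

    Times are in timescale units; sample_index is zero-based.
    """
    if target < 0:
        raise ValueError("target time must be non-negative")
    index = 0
    start = 0
    for count, delta in stts:
        if count <= 0 or delta <= 0:
            raise ValueError("invalid stts entry")
        span = count * delta
        if target < start + span:
            offset = (target - start) // delta
            return index + offset, start + offset * delta
        index += count
        start += span
    raise ValueError("target time is beyond the end of the track")
```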

3.4 Basic Structure (informative)

This subclause shows a basic structure of the Movie Box as follows:

+----+----+----+----+----+----+----+----+------------------------------+
|moov|    |    |    |    |    |    |    | Movie Box                    |
+----+----+----+----+----+----+----+----+------------------------------+
|    |mvhd|    |    |    |    |    |    | Movie Header Box             |
+----+----+----+----+----+----+----+----+------------------------------+
|    |trak|    |    |    |    |    |    | Track Box                    |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |tkhd|    |    |    |    |    | Track Header Box             |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |edts|*   |    |    |    |    | Edit Box                     |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |elst|*   |    |    |    | Edit List Box                |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |mdia|    |    |    |    |    | Media Box                    |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |mdhd|    |    |    |    | Media Header Box             |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |hdlr|    |    |    |    | Handler Reference Box        |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |minf|    |    |    |    | Media Information Box        |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |smhd|    |    |    | Sound Media Header Box       |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |dinf|    |    |    | Data Information Box         |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |    |dref|    |    | Data Reference Box           |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |    |    |url |    | DataEntryUrlBox              |
+----+----+----+----+----+----+ or +----+------------------------------+
|    |    |    |    |    |    |urn |    | DataEntryUrnBox              |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |stbl|    |    |    | Sample Table                 |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |    |stsd|    |    | Sample Description Box       |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |    |    |fLaC|    | FLACSampleEntry              |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |    |    |    |dfLa| FLAC Specific Box            |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |    |stts|    |    | Decoding Time to Sample Box  |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |    |stsc|    |    | Sample To Chunk Box          |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |    |stsz|    |    | Sample Size Box              |
+----+----+----+----+----+ or +----+----+------------------------------+
|    |    |    |    |    |stz2|    |    | Compact Sample Size Box      |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |    |    |    |stco|    |    | Chunk Offset Box             |
+----+----+----+----+----+ or +----+----+------------------------------+
|    |    |    |    |    |co64|    |    | Chunk Large Offset Box       |
+----+----+----+----+----+----+----+----+------------------------------+
|    |mvex|*   |    |    |    |    |    | Movie Extends Box            |
+----+----+----+----+----+----+----+----+------------------------------+
|    |    |trex|*   |    |    |    |    | Track Extends Box            |
+----+----+----+----+----+----+----+----+------------------------------+

Figure 1 - Basic structure of Movie Box

It is strongly recommended that the order of boxes
follow the above structure. Boxes marked with an asterisk
(*) may or may not be present depending on context. For
most boxes listed above, the definition is as given
in ISO/IEC 14496-12 [1]. The additional boxes, and the
additional requirements, restrictions and recommendations
that apply to the other boxes, are described in this
specification.
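
The nesting shown in Figure 1 can be traversed with a generic box walker. The following is a minimal sketch, assuming the whole file is held in memory and that each box on the path is a plain container whose body starts immediately after its header. That assumption holds down to 'stbl' but not for 'stsd' or the sample entry, which carry additional fields before their children. The function names are hypothetical.

```python
import struct
from typing import Iterator, Optional


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[tuple[str, int, int]]:
    """Yield (type, body_start, body_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # 64-bit largesize follows the type field.
            if pos + 16 > end:
                raise ValueError("truncated largesize")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            size = end - pos  # box extends to the end of the enclosing data
        if size < header or pos + size > end:
            raise ValueError("bad box size")
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size


def find_path(data: bytes, path: list[str]) -> Optional[tuple[int, int]]:
    """Follow a list of box types; return the body range of the last one."""
    start, end = 0, len(data)
    for name in path:
        for box_type, body_start, body_end in iter_boxes(data, start, end):
            if box_type == name:
                start, end = body_start, body_end
                break
        else:
            return None
    return start, end
```

A call such as find_path(data, ["moov", "trak", "mdia", "minf", "stbl"]) would return the byte range in which the Sample Description Box is then looked up.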

3.5 Example of Encapsulation (informative)

[ftyp: File Type Box]
    major_brand = mp42 : MP4 version 2
    brand[0] = mp42 : MP4 version 2
    brand[1] = isom : ISO Base Media file format
[mvhd: Movie Header Box]
    creation_time = UTC 2014/12/12, 18:41:19
    modification_time = UTC 2014/12/12, 18:41:19
    duration = 33600 (00:00:00.700)
    reserved = 0x00000000
    reserved = 0x00000000
    transformation matrix
        | a, b, u |   | 1.000000, 0.000000, 0.000000 |
        | c, d, v | = | 0.000000, 1.000000, 0.000000 |
        | x, y, w |   | 0.000000, 0.000000, 1.000000 |
    pre_defined = 0x00000000
    pre_defined = 0x00000000
    pre_defined = 0x00000000
    pre_defined = 0x00000000
    pre_defined = 0x00000000
    pre_defined = 0x00000000
[iods: Object Descriptor Box]
    [tag = 0x10: MP4_IOD]
        expandableClassSize = 16
        ObjectDescriptorID = 1
        includeInlineProfileLevelFlag = 0
        ODProfileLevelIndication = 0xff
        sceneProfileLevelIndication = 0xff
        audioProfileLevelIndication = 0xfe
        visualProfileLevelIndication = 0xff
        graphicsProfileLevelIndication = 0xff
    [tag = 0x0e: ES_ID_Inc]
        expandableClassSize = 4
[tkhd: Track Header Box]
    creation_time = UTC 2014/12/12, 18:41:19
    modification_time = UTC 2014/12/12, 18:41:19
    reserved = 0x00000000
    duration = 33600 (00:00:00.700)
    reserved = 0x00000000
    reserved = 0x00000000
    transformation matrix
        | a, b, u |   | 1.000000, 0.000000, 0.000000 |
        | c, d, v | = | 0.000000, 1.000000, 0.000000 |
        | x, y, w |   | 0.000000, 0.000000, 1.000000 |
[mdhd: Media Header Box]
    creation_time = UTC 2014/12/12, 18:41:19
    modification_time = UTC 2014/12/12, 18:41:19
    duration = 34560 (00:00:00.720)
[hdlr: Handler Reference Box]
    pre_defined = 0x00000000
    reserved = 0x00000000
    reserved = 0x00000000
    reserved = 0x00000000
    name = Xiph Audio Handler
[minf: Media Information Box]
[smhd: Sound Media Header Box]
[dinf: Data Information Box]
[dref: Data Reference Box]
[url : Data Entry Url Box]
    location = in the same file
[stbl: Sample Table Box]
[stsd: Sample Description Box]
[fLaC: Audio Description]
    reserved = 0x000000000000
    data_reference_index = 1
    reserved = 0x00000000
    samplerate = 48000.000000
[dfLa: FLAC Specific Box]
    LastMetadataBlockFlag = 1
[stts: Decoding Time to Sample Box]
[stsc: Sample To Chunk Box]
    samples_per_chunk = 13
    sample_description_index = 1
    samples_per_chunk = 5
    sample_description_index = 1
[stsz: Sample Size Box]
    sample_size = 0 (variable)
[stco: Chunk Offset Box]
    chunk_offset[0] = 686
    chunk_offset[1] = 12985
[free: Free Space Box]
[mdat: Media Data Box]

This spec draws heavily from the Opus-in-ISOBMFF specification
work done by Yusuke Nakamura <muken.the.vfrmaniac |at| gmail.com>

Thank you to Tim Terriberry, David Evans, and Yusuke Nakamura
for valuable feedback. Thank you to Ralph Giles for editorial

Monty Montgomery <cmontgomery@mozilla.com>