DOCS/tech/general.txt

So, I'll describe how this stuff works.

The main modules:

1. stream.c: this is the input layer. It reads the input media (file, stdin,
   vcd, dvd, network etc). What it has to know: appropriate buffering by
   sector, seek and skip functions, reading by bytes or by blocks of any size.
   The stream_t (stream.h) structure describes the input stream (file/device).

   There is a stream cache layer (cache2.c), which is a wrapper for the stream
   API. It does fork(), then emulates the stream driver in the parent process
   and the stream user in the child process, while proxying between them using
   a big preallocated memory chunk as a FIFO buffer.
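To illustrate only the FIFO idea (this is not the cache2.c code, which also has to share the buffer between two processes), a single-process ring buffer could look roughly like this. The names fifo_t, fifo_write and fifo_read and the tiny buffer size are made up for this sketch.

```c
#include <stddef.h>

#define FIFO_SIZE 16  /* tiny on purpose; a real cache would be much bigger */

/* Hypothetical ring buffer; not the structure used by cache2.c. */
typedef struct {
    unsigned char buf[FIFO_SIZE];
    size_t rd;    /* total bytes consumed so far */
    size_t wr;    /* total bytes produced so far */
} fifo_t;

/* Producer side: store up to len bytes, return how many fit. */
static size_t fifo_write(fifo_t *f, const unsigned char *src, size_t len)
{
    size_t n = 0;
    while (n < len && f->wr - f->rd < FIFO_SIZE) {
        f->buf[f->wr % FIFO_SIZE] = src[n++];
        f->wr++;
    }
    return n;
}

/* Consumer side: fetch up to len bytes, return how many were available. */
static size_t fifo_read(fifo_t *f, unsigned char *dst, size_t len)
{
    size_t n = 0;
    while (n < len && f->rd < f->wr) {
        dst[n++] = f->buf[f->rd % FIFO_SIZE];
        f->rd++;
    }
    return n;
}
```

The writer stops when the buffer is full and the reader stops when it is empty, so neither side overwrites data the other still needs.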

2. demuxer.c: this does the demultiplexing (separating) of the input into
   audio, video and dvdsub channels, and reads them in buffered packets.
   demuxer.c is basically a framework, which is the same for all the
   input formats, and there are parsers for each of them (mpeg-es,
   mpeg-ps, avi, avi-ni, asf); these are in the demux_*.c files.
   The structure is the demuxer_t. There is only one demuxer.

2.a. demux_packet_t, that is DP.
   Contains one chunk (avi) or packet (asf,mpg). They are stored in memory
   in a linked list, because their sizes differ.

2.b. demuxer stream, that is DS.
   Struct: demux_stream_t
   Every channel (a/v/s) has one. This contains the packets for the stream
   (see 2.a). For now, there can be 3 for each demuxer:
   - audio (d_audio)
   - video (d_video)
   - DVD subtitle (d_dvdsub)
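A minimal sketch of the DP/DS relationship described in 2.a and 2.b: each stream keeps its own linked list of variable-sized packets. The types packet_t and pstream_t and their fields are simplified stand-ins invented for this example; the real demux_packet_t and demux_stream_t carry more information.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for demux_packet_t and demux_stream_t;
   the field names are assumptions made for this sketch. */
typedef struct packet {
    int len;                 /* payload size in bytes */
    unsigned char *data;     /* payload */
    struct packet *next;     /* next packet of the same stream */
} packet_t;

typedef struct {
    packet_t *first, *last;  /* oldest and newest queued packet */
    int packs;               /* number of queued packets */
} pstream_t;

/* Append a copy of buf to the end of the stream's packet list. */
static int ps_add_packet(pstream_t *ds, const void *buf, int len)
{
    if (len < 0) return -1;
    packet_t *dp = malloc(sizeof(*dp));
    if (!dp) return -1;
    dp->data = malloc(len > 0 ? (size_t)len : 1);
    if (!dp->data) { free(dp); return -1; }
    memcpy(dp->data, buf, (size_t)len);
    dp->len = len;
    dp->next = NULL;
    if (ds->last) ds->last->next = dp; else ds->first = dp;
    ds->last = dp;
    ds->packs++;
    return 0;
}

/* Detach and return the oldest packet (caller frees), or NULL if empty. */
static packet_t *ps_pop_packet(pstream_t *ds)
{
    packet_t *dp = ds->first;
    if (!dp) return NULL;
    ds->first = dp->next;
    if (!ds->first) ds->last = NULL;
    ds->packs--;
    return dp;
}
```

A list is used here because, as noted above, the packets differ in size, so a fixed-slot array would waste memory or truncate data.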

2.c. stream header. There are 2 types (for now): sh_audio_t and sh_video_t
   This contains every parameter essential for decoding, such as input/output
   buffers, chosen codec, fps, etc. There is one for every stream in
   the file: at least one for video, another if sound is present,
   and if there are more streams, one structure for each.
   These are filled in according to the header (avi/asf), or demux_mpg.c
   does it (mpg) when it finds a new stream. If a new stream is found,
   the ====> Found audio/video stream: <id>  message is displayed.

   The chosen stream header and its demuxer stream are connected together
   (ds->sh and sh->ds) to simplify the usage. So it's enough to pass the
   ds or the sh, depending on the function.

   For example: we have an asf file with 6 streams inside it, 1 audio and 5
   video. During the reading of the header, 6 sh structs are created, 1
   audio and 5 video. When it starts reading the packets, it chooses the
   streams of the first audio and video packets found, and sets the sh
   pointers of d_audio and d_video according to them. So later it reads
   only these streams. Of course the user can force choosing a specific
   stream with the -vid and -aid switches.
   A good example for this is the DVD, where the English stream is not
   always the first, so every VOB has a different language :)
   That's when we have to use for example the -aid 128 switch.

Now, how does this reading work?
   - demuxer.c/demux_read_data() is called; it is told how many bytes
     we would like to read, where to (memory address), and from which
     DS. The codecs call this.
   - this checks if the given DS's buffer contains something; if so, it
     reads from there as much as needed. If there isn't enough, it calls
     ds_fill_buffer(), which:
   - checks if the given DS has buffered packets (DPs); if so, it moves
     the oldest to the buffer, and reads on. If the list is empty, it
     calls demux_fill_buffer():
   - this calls the parser for the input format, which reads the file
     onward, and moves the packets found to their buffers.
     Well, if we'd like a video packet, but only a bunch of audio
     packets are available, then sooner or later the:
     DEMUXER: Too many (%d in %d bytes) audio packets in the buffer
     error shows up.
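The call chain above can be modelled in a few lines. This is a toy sketch, not the demuxer.c API: the "file" is an array of already tagged packets, the queues have a small fixed size, and every toy_* name is hypothetical.

```c
#include <string.h>

/* Toy model of the three-level read path. All names and the fixed-size
   queues are assumptions of this sketch, not the demuxer.c API. */
enum { AUDIO, VIDEO, NSTREAMS };
#define MAX_QUEUED 8

typedef struct { int stream; const char *data; } toy_packet_t;

typedef struct {
    const toy_packet_t *queue[MAX_QUEUED]; /* buffered packets (DPs) */
    int head, count;
    const char *buf;                       /* packet being consumed */
    int buf_pos, buf_len;
} toy_ds_t;

typedef struct {
    const toy_packet_t *input;  /* stands in for the file */
    int n_input, next;
    toy_ds_t ds[NSTREAMS];
} toy_demuxer_t;

/* Level 3: parse one packet from the input and queue it on its stream. */
static int toy_demux_fill_buffer(toy_demuxer_t *d)
{
    if (d->next >= d->n_input) return 0;       /* EOF */
    const toy_packet_t *p = &d->input[d->next];
    toy_ds_t *ds = &d->ds[p->stream];
    if (ds->count >= MAX_QUEUED) return 0;     /* the "Too many packets" case */
    d->next++;
    ds->queue[(ds->head + ds->count++) % MAX_QUEUED] = p;
    return 1;
}

/* Level 2: make the oldest queued packet the current buffer. */
static int toy_ds_fill_buffer(toy_demuxer_t *d, int stream)
{
    toy_ds_t *ds = &d->ds[stream];
    while (ds->count == 0)
        if (!toy_demux_fill_buffer(d)) return 0;
    const toy_packet_t *p = ds->queue[ds->head];
    ds->head = (ds->head + 1) % MAX_QUEUED;
    ds->count--;
    ds->buf = p->data;
    ds->buf_pos = 0;
    ds->buf_len = (int)strlen(p->data);
    return 1;
}

/* Level 1: copy up to len bytes of one stream into mem. */
static int toy_demux_read_data(toy_demuxer_t *d, int stream, char *mem, int len)
{
    toy_ds_t *ds = &d->ds[stream];
    int got = 0;
    while (got < len) {
        if (ds->buf_pos >= ds->buf_len && !toy_ds_fill_buffer(d, stream))
            break;                             /* EOF or queue overflow */
        int n = ds->buf_len - ds->buf_pos;
        if (n > len - got) n = len - got;
        memcpy(mem + got, ds->buf + ds->buf_pos, (size_t)n);
        ds->buf_pos += n;
        got += n;
    }
    return got;
}
```

Reading one stream queues the packets of the other stream as a side effect, which is why a queue can overflow when only one kind of packet is being consumed.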

2.d. video.c: this file handles the reading and assembling of the
     video frames. Each call to video_read_frame() should read and return a
     single video frame, and its duration in seconds (float).
     The implementation is split into 2 big parts: reading from mpeg-like
     streams and reading from one-frame-per-chunk files (avi, asf, mov).
     Then it calculates the duration, either from the fixed FPS value, or from
     the PTS difference between before and after reading the frame.
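The duration rule could be sketched like this. frame_duration() is a hypothetical helper, not a function from video.c, and treating fps <= 0 as "no fixed frame rate" is an assumption of the example.

```c
/* Hypothetical helper: duration of a frame in seconds.
   fps > 0 means a fixed frame rate; otherwise fall back to the
   PTS difference measured before and after reading the frame. */
static float frame_duration(float fps, float pts_before, float pts_after)
{
    if (fps > 0.0f)
        return 1.0f / fps;
    float d = pts_after - pts_before;
    return d > 0.0f ? d : 0.0f;  /* never report a negative duration */
}
```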

2.e. other utility functions: there is some useful code there, like the
    AVI muxer, or the mp3 header parser, but leave them for now.

So everything is ok 'till now. It can be found in the libmpdemux/ library.
It should compile outside of the mplayer tree, you just have to implement a few
simple functions, like mp_msg() to print messages, etc.
See libmpdemux/test.c for an example.

See also formats.txt, for a description of common media file formats and their
implementation details in libmpdemux.

Now, go on:

3. mplayer.c - ooh, he's the boss :)
    Its main purpose is connecting the other modules, and maintaining A/V
    sync.

    The given stream's actual position is in the 'timer' field of the
    corresponding stream header (sh_audio / sh_video).

    The structure of the playing loop:
    while(not EOF) {
        fill audio buffer (read & decode audio) + increase a_frame
        read & decode a single video frame + increase v_frame
        sleep  (wait until a_frame>=v_frame)
        display the frame
        apply A-V PTS correction to a_frame
        handle events (keys,lirc etc) -> pause,seek,...
    }

    When playing (a/v), it increases the variables by the duration of the
    played a/v.
    - with audio this is played bytes / sh_audio->o_bps
      Note: i_bps = number of compressed bytes for one second of audio
            o_bps = number of uncompressed bytes for one second of audio
                (this is = bps*samplerate*channels, bps being bytes per sample)
    - with video this is usually == 1.0/fps, but I have to note that
      fps doesn't really matter for video. For example asf doesn't have it,
      instead there is "duration" and it can change per frame.
      MPEG2 has "repeat_count" which delays the frame by 1-2.5 ...
      Maybe only AVI and MPEG1 have fixed fps.
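The bookkeeping of the two clocks can be written down as small helpers. This is only a sketch of the arithmetic described above; the function names are invented and do not exist in mplayer.c.

```c
/* Hypothetical helpers mirroring the a_frame / v_frame bookkeeping. */

/* Uncompressed bytes per second of audio. */
static int calc_o_bps(int bytes_per_sample, int samplerate, int channels)
{
    return bytes_per_sample * samplerate * channels;
}

/* Advance the audio clock by the duration of played_bytes. */
static double advance_a_frame(double a_frame, int played_bytes, int o_bps)
{
    return a_frame + (double)played_bytes / o_bps;
}

/* Advance the video clock by one frame; frame_time is 1.0/fps for
   fixed-rate files or the per-frame duration otherwise. */
static double advance_v_frame(double v_frame, double frame_time)
{
    return v_frame + frame_time;
}
```

For 16-bit stereo audio at 44100 Hz, calc_o_bps(2, 44100, 2) gives 176400, so playing 88200 bytes moves the audio clock by half a second.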

    So everything works right as long as the audio and video are in perfect
    sync: the audio plays on and gives the timing, and when the
    time of a frame has passed, the next frame is displayed.
    But what if these two aren't synchronized in the input file?
    PTS correction kicks in. The input demuxers read the PTS (presentation
    timestamp) of the packets, and with it we can see if the streams
    are synchronized. Then MPlayer can correct the a_frame, within
    a given maximal bound (see the -mc option). The sum of the
    corrections can be found in c_total.
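A bounded correction of this kind might be sketched as follows. apply_correction() is a hypothetical name, max_correction plays the role of the -mc limit, and both the sign convention and the way the wanted correction is derived from the PTS difference are assumptions left to the caller.

```c
/* Sketch of a bounded A-V correction step (not the mplayer.c code). */
static double apply_correction(double *a_frame, double *c_total,
                               double wanted, double max_correction)
{
    double x = wanted;
    if (x >  max_correction) x =  max_correction;
    if (x < -max_correction) x = -max_correction;
    *a_frame += x;   /* nudge the audio clock */
    *c_total += x;   /* keep the running sum of all corrections */
    return x;
}
```

Clamping each step keeps a single bad timestamp from causing a visible jump; a large offset is instead worked off over several frames.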

    Of course this is not everything, several things suck.
    For example the soundcard's delay, which has to be corrected by
    MPlayer! The audio delay is the sum of all these:
    - bytes read since the last timestamp:
      t1 = d_audio->pts_bytes/sh_audio->i_bps
    - if Win32/ACM then the bytes stored in the audio input buffer:
      t2 = a_in_buffer_len/sh_audio->i_bps
    - uncompressed bytes in the audio out buffer:
      t3 = a_buffer_len/sh_audio->o_bps
    - not yet played bytes stored in the soundcard's (or DMA's) buffer:
      t4 = get_audio_delay()/sh_audio->o_bps

    From this we can calculate what PTS we need for the just played
    audio, then after we compare this with the video's PTS, we have
    the difference!
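Taking the list literally, the four terms can be added up like this. The struct audio_state_t only gathers the inputs and is an assumption of this sketch; the text describes a plain sum, and the real code may combine the terms with different signs when it derives the audio PTS.

```c
/* Inputs for the four delay terms; a made-up container for this sketch. */
typedef struct {
    int pts_bytes;        /* compressed bytes read since last timestamp */
    int a_in_buffer_len;  /* compressed bytes in the input buffer */
    int a_buffer_len;     /* decoded bytes in the output buffer */
    int card_bytes;       /* decoded bytes not yet played by the card */
    int i_bps, o_bps;     /* compressed / uncompressed bytes per second */
} audio_state_t;

/* Sum of t1..t4 in seconds, exactly as listed in the text. */
static double audio_delay(const audio_state_t *s)
{
    double t1 = (double)s->pts_bytes       / s->i_bps;
    double t2 = (double)s->a_in_buffer_len / s->i_bps;
    double t3 = (double)s->a_buffer_len    / s->o_bps;
    double t4 = (double)s->card_bytes      / s->o_bps;
    return t1 + t2 + t3 + t4;
}
```

Note that the compressed terms are divided by i_bps and the decoded terms by o_bps, so all four end up in seconds before they are added.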

    Life didn't get simpler with AVI. There's the "official" timing
    method, the BPS-based one: the header contains how many compressed
    audio bytes or chunks belong to one second of frames.
    In the AVI stream header there are 2 important fields, the
    dwSampleSize, and the dwRate/dwScale pair:
    - If the dwSampleSize is 0, then it's a VBR stream, so its bitrate
      isn't constant. It means that 1 chunk stores 1 sample, and
      dwRate/dwScale gives the chunks/sec value.
    - If the dwSampleSize is >0, then it's constant bitrate, and the
      time can be measured this way:  time = (bytepos/dwSampleSize) /
      (dwRate/dwScale) (so the sample's number is divided by the
      samplerate). Now the audio can be handled as a stream, which can
      be cut to chunks, but can be one chunk also.
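Both cases fit into one small function. avi_audio_time() is a hypothetical helper written for this note, not a function from demux_avi.c; it takes the header fields as plain arguments.

```c
/* Timestamp of an audio position from the AVI stream header fields.
   For dwSampleSize == 0 (VBR) pass the chunk index in pos, otherwise
   pass the byte position. Returns seconds, or -1.0 on bad input. */
static double avi_audio_time(unsigned long pos, unsigned dwSampleSize,
                             unsigned dwRate, unsigned dwScale)
{
    if (dwRate == 0 || dwScale == 0) return -1.0;
    double per_sec = (double)dwRate / dwScale;  /* samples or chunks per second */
    double samples = dwSampleSize ? (double)pos / dwSampleSize : (double)pos;
    return samples / per_sec;
}
```

For example, with dwSampleSize=4, dwRate=44100 and dwScale=1, byte position 352800 is sample 88200, which is 2 seconds into the stream.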

    The other method can be used only for interleaved files: from
    the order of the chunks, a timestamp (PTS) value can be calculated.
    The PTS of the video chunks is simple: chunk number / fps
    The PTS of an audio chunk is the same as that of the previous video chunk.
    We have to pay attention to the so-called "audio preload", that is,
    there is a delay between the audio and video streams. This is
    usually 0.5-1.0 sec, but can be totally different.
    Until now the exact value had to be measured, but now demux_avi.c
    handles it: at the audio chunk after the first video, it calculates
    the A/V difference, and takes this as a measure for audio preload.

3.a. audio playback:
    Some words on audio playback:
    The playing itself isn't hard, but:
    1. knowing when to write into the buffer, without blocking
    2. knowing how much was played of what we wrote into it
    The first is needed for audio decoding, and to keep the buffer
    full (so the audio will never skip). And the second is needed for
    correct timing, because some soundcards delay even 3-7 seconds,
    which can't be ignored.
    To solve this, OSS gives several possibilities:
    - ioctl(SNDCTL_DSP_GETODELAY): tells how many unplayed bytes are in
      the soundcard's buffer -> perfect for timing, but not all drivers
      support it :(
    - ioctl(SNDCTL_DSP_GETOSPACE): tells how much we can write into the
      soundcard's buffer, without blocking. If the driver doesn't
      support GETODELAY, we can use this to know how much the delay is.
    - select(): should tell if we can write into the buffer without
      blocking. Unfortunately it doesn't say how much we could :((
      Also, it doesn't work, or works badly, with some drivers.
      Only used if none of the above works.
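The first two possibilities could be combined as in the sketch below. It assumes a Linux-style <sys/soundcard.h> and an already opened OSS device; oss_unplayed_bytes() and its buffer_size parameter are made up for the example, and the select() fallback is left out.

```c
#include <sys/ioctl.h>
#include <sys/soundcard.h>

/* Returns the number of not-yet-played bytes for an opened OSS device,
   or -1 if neither ioctl works. buffer_size is the total driver buffer
   size in bytes, needed only for the GETOSPACE fallback. */
static int oss_unplayed_bytes(int fd, int buffer_size)
{
    int delay = 0;
    audio_buf_info info;

    if (ioctl(fd, SNDCTL_DSP_GETODELAY, &delay) != -1)
        return delay;                     /* exact answer from the driver */

    if (ioctl(fd, SNDCTL_DSP_GETOSPACE, &info) != -1)
        return buffer_size - info.bytes;  /* used = total - free */

    return -1;
}
```

Dividing the result by o_bps gives the soundcard delay in seconds.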

4. Codecs. Consists of libmpcodecs/* and separate files or libs,
   for example liba52, libmpeg2, xa/*, alaw.c, opendivx/*, loader, mp3lib.

   mplayer.c doesn't call them directly, but through the dec_audio.c and
   dec_video.c files, so mplayer.c doesn't have to know anything about
   the codecs.

   libmpcodecs contains a wrapper for every codec. Some of them include the
   codec function implementation, some call functions from other files
   included with mplayer, some call optional external libraries.
   File naming convention in libmpcodecs:
   ad_*.c - audio decoder (called through dec_audio.c)
   vd_*.c - video decoder (called through dec_video.c)
   ve_*.c - video encoder (used by mencoder)
   vf_*.c - video filter  (see option -vf)

   On this topic, see also:
   libmpcodecs.txt - The structure of the codec-filter path, with explanation
   dr-methods.txt - Direct rendering, MPI buffer management for video codecs
   codecs.conf.txt - How to write/edit codec configuration file (codecs.conf)
   codec-devel.txt - Mike's hints about codec development - a bit OUTDATED
   hwac3.txt - about S/PDIF audio passthrough

5. libvo: this displays the frame.

   for details on this, read libvo.txt

6. libao2: this controls audio playing
6.a audio plugins

   for details on this, read libao2.txt