#!/usr/bin/perl
# -*- coding: ascii -*-
###########################################################################
# clive, command line video extraction utility.
# Copyright 2007, 2008, 2009 Toni Gundogdu.
#
# This file is part of clive.
#
# clive is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
# clive is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
# details.
#
# You should have received a copy of the GNU General Public License along
# with this program. If not, see <http://www.gnu.org/licenses/>.
###########################################################################
use warnings;
use strict;

binmode( STDOUT, ":utf8" );
binmode( STDERR, ":utf8" );

use clive::App;

clive::App->main;

__END__

=head1 NAME

clive - command line video extraction utility

=head1 SYNOPSIS

clive [options]... [URL]...

=head1 DESCRIPTION

clive is a command line video extraction utility for Youtube and other
similar video-sharing websites. It was written to work around the Adobe
Flash plugin requirement, as the technology is poorly supported on
Unix-like systems.

clive is not a universal video extraction utility; it supports only a
limited number of video websites. Each website typically exposes access to
the video content in a very different way, which means that clive has to
be customized for each website before it can download any videos from it.

=head1 OPTIONS

  -h, --help                        print help and exit
  -v, --version                     print version and exit
      --hosts                       print supported hosts and exit
      --upgrade-config              upgrade 2.0/2.1 config to 2.2+ format
  -l, --last                        recall last input
      --last-file=FILE              read/write FILE instead of default path
 Output Options:
      --emit-csv                    emit video details in CSV to stdout
      --debug                       print cURL debug messages
  -q, --quiet                       turn off all output
      --stderr                      redirect all output to stderr
 HTTP Options:
      --agent=STRING                identify as STRING to HTTP servers
      --connect-timeout=SECS        max time allowed for connecting
      --connect-timeout-socks=SECS  same as above, works around SOCKS bug
      --proxy=ADDR                  use ADDR as HTTP proxy
      --no-proxy                    disable all use of HTTP proxy
      --cookie-jar=FILE             enable cookies, write them to FILE
 Cache Options:
      --cache-file=FILE             read/write FILE instead of default path
  -r, --cache-read                  enable reading from cache
  -d, --cache-dump                  dump cache records to stdout
      --cache-dump-format=STRING    print format for dumping cache records
      --cache-grep=PATTERN          grep cache records for PATTERN
  -i, --cache-ignore-case           ignore case while matching records
  -D, --cache-remove-record         remove matched records from cache
      --cache-clear                 truncate cache records
      --no-cache                    disable all cache use (read and write)
 Download Options:
  -f, --format=FORMAT               extract FORMAT of video
  -O, --output-file=FILE            write video to FILE
  -c, --continue                    continue partially downloaded file
  -n, --no-extract                  do not extract any videos
      --save-dir=DIR                save video files to DIR
      --cclass=CLASS                use character CLASS to filter titles
  -C, --no-cclass                   do not apply character class
      --filename-format=STRING      use STRING to format output filename
      --exec=CMD                    command to run when transfer finishes
  -e, --exec-run                    invoke command defined with --exec
      --stream=PERCENT              run stream command (below) at PERCENT
      --stream-exec=CMD             stream command to run
      --limit-rate=AMOUNT           limit transfer rate to AMOUNT (KB/s)
      --stop-after=SIZE|PERCENT     stop file transfer after SIZE or PERCENT

=head1 OPTION SYNTAX

You may freely mix different option styles and specify options after
the command line arguments, e.g.:

  % clive -c URL --format=best

You may also group single-letter options that do not require arguments;
an option that takes an argument can close the group, as B<-f> does here:

  % clive -cnrf best URL

Note that the dashed long options have aliases: the dashes may be replaced
with underscores or dropped altogether. Each of the following lines uses
three equivalent spellings of the same option:

  % clive --no-extract --no_extract --noextract
  % clive --cache-read --cache_read --cacheread

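
The option styles above can be reproduced with Perl's core Getopt::Long
module. The following is a minimal sketch under that assumption, not
clive's actual option table: the C<bundling> setting splits C<-cnrf best>
into four single-letter options, Getopt::Long's default C<permute>
behaviour accepts options after the URLs, and each C<|>-separated list maps
the dashed, underscored and undashed spellings onto one hash key. The
helper C<parse_args> and the reduced option list are hypothetical.

```perl

    use strict;
    use warnings;
    use Getopt::Long qw(GetOptionsFromArray);

    # Allow "-cnrf best": single-letter options may be grouped, and the
    # last one in the group may take its value from the next argument.
    Getopt::Long::Configure(qw(bundling));

    # Hypothetical helper: parse @args into a hash. The first name in each
    # alias list is the hash key; the other names are aliases for it.
    sub parse_args {
        my @args = @_;
        my %opt;
        GetOptionsFromArray(
            \@args, \%opt,
            'continue|c',
            'no-extract|no_extract|noextract|n',
            'cache-read|cache_read|cacheread|r',
            'format|f=s',
        ) or die "invalid options\n";
        return ( \%opt, \@args );    # @args now holds only the URLs
    }

    my ( $opt, $urls ) = parse_args(qw(-cnrf best URL));
    print "format=$opt->{format} urls=@$urls\n";   # prints "format=best urls=URL"

```

Note that Getopt::Long switches from C<permute> to C<require_order> by
default when the POSIXLY_CORRECT environment variable is set, in which case
options after the first URL would no longer be recognized.
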
=head1 OPTION DESCRIPTIONS

=over 4

=item B<-h, --help>

Print help and exit.

=item B<-v, --version>

Print version and exit.

=item B<--hosts>

Print supported hosts with available formats and exit.

=item B<--upgrade-config>

Upgrade a clive 2.0/2.1 config file to the current 2.2+ format and exit.

=item B<-l, --last>

Re-feed the video page links from the previous run.

=item B<--last-file>=I<path>

Read and write the last-input file at I<path> instead of the default path.
See also L</FILES>.

=back

B<Output Options>

=over 4

=item B<--emit-csv>

Print (emit) video details in CSV format to standard output.
Implies B<--no-extract>.

=item B<--debug>

Print cURL debug (verbose) messages to standard error.

=item B<-q, --quiet>

Turn off all output to standard output and standard error.

=item B<--stderr>

Direct all output to standard error.

=back

B<HTTP Options>

=over 4

=item B<--agent>=I<string>

Identify clive as I<string> to HTTP servers. Defaults to "Mozilla/5.0".

=item B<--connect-timeout>=I<seconds>

Maximum time, in I<seconds>, that the connection is allowed to take.
Defaults to 30.

=item B<--connect-timeout-socks>=I<seconds>

Same as above, but tries to work around the SOCKS proxy bug in cURL.
Defaults to 30.

=item B<--proxy>=I<address>

Use I<address> as the HTTP proxy. Example: "http://foo:1234".

=item B<--no-proxy>

Disable all use of HTTP proxy, even if the http_proxy environment variable
is set.

=item B<--cookie-jar>=I<file>

Enable cookies, which are rejected by default, and have libcurl write them
to I<file>. Specify "-" to have the cookies written to standard output
instead.

=back

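
libcurl writes the B<--cookie-jar> file in the Netscape cookie file format:
one cookie per line with seven tab-separated fields (domain,
include-subdomains flag, path, secure flag, expiry as a Unix timestamp,
name, value), "#" comment lines, and a "#HttpOnly_" prefix on the domain of
HttpOnly cookies. The sketch below reads such a file for inspection, for
example to check which cookies a site has set. The helper
C<read_cookie_jar> is hypothetical and not part of clive.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: parse a Netscape-format cookie jar from an open
    # filehandle and return one hash ref per cookie.
    sub read_cookie_jar {
        my ($fh) = @_;
        my @cookies;
        while ( my $line = <$fh> ) {
            chomp $line;
            # "#HttpOnly_" marks an HttpOnly cookie; strip the marker first.
            my $http_only = $line =~ s/^#HttpOnly_//;
            next if $line =~ /^#/ || $line !~ /\S/;    # comments, blank lines
            my @f = split /\t/, $line, 7;    # limit keeps an empty value
            next unless @f == 7;
            push @cookies, {
                domain     => $f[0],
                subdomains => $f[1] eq 'TRUE',
                path       => $f[2],
                secure     => $f[3] eq 'TRUE',
                expires    => $f[4],         # Unix time; 0 for session cookies
                name       => $f[5],
                value      => $f[6],
                http_only  => $http_only ? 1 : 0,
            };
        }
        return @cookies;
    }

    # Demonstration with an in-memory jar; a real jar would be opened with
    # open(my $fh, '<', $path) instead.
    my $text = "# Netscape HTTP Cookie File\n"
      . join( "\t", '.example.com', 'TRUE', '/', 'FALSE', '0', 'id', 'abc' )
      . "\n";
    open my $fh, '<', \$text or die "cannot open in-memory jar: $!";
    print "$_->{domain} $_->{name}=$_->{value}\n" for read_cookie_jar($fh);

```
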
B<Cache Options>

=over 4

=item B<--cache-file>=I<path>

Use I<path> for the cache file instead of the default path. See L</FILES>.

=item B<-r, --cache-read>

Read video details from the cache record if one exists. This allows clive
to skip fetching and parsing the video page again. See the L</CACHE>
section for more on this.

=item B<-d, --cache-dump>

Dump cache records to standard output.

=item B<--cache-dump-format>=I<format-string>

Format string for the output of B<--cache-dump>. Defaults to
"%n: %t [%f, %mMB]". Example:

  % clive --cache-dump --cache-dump-format="%d: %t"

Supported format specifiers:

  %t .. video page title
  %i .. video id
  %h .. video host
  %l .. video file length (bytes)
  %m .. video file length (MB)
  %d .. date (last update)
  %T .. time (last update)
  %s .. time stamp (same as "%d %T")
  %f .. video file format
  %n .. index

=item B<--cache-grep>=I<pattern>

Grep stored cache records for I<pattern>. See also
L</EXAMPLES - ADVANCED USE>.

=item B<-i, --cache-ignore-case>

Ignore case differences while matching records.

=item B<-D, --cache-remove-record>

Remove matched records from the cache.

=item B<--cache-clear>

Truncate the cache, removing all records.

=item B<--no-cache>

Disable all cache use (read and write).

=back

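
The specifier expansion described under B<--cache-dump-format> amounts to a
table lookup per "%" specifier. This is a minimal sketch, assuming a
hypothetical record layout (the hash keys below are not clive's) and
assuming that "%m" is the byte length divided by 1024*1024 and rounded to
two decimals; clive's own record format and rounding may differ. The
helper C<format_record> is hypothetical.

```perl

    use strict;
    use warnings;

    # Hypothetical cache record; clive's real record layout may differ.
    my %record = (
        title  => 'Example video',
        id     => '3HD220e0bx4',
        host   => 'youtube.com',
        length => 5_242_880,    # bytes
        date   => '2009-01-31',
        time   => '12:00:00',
        format => 'flv',
    );

    # Expand the --cache-dump-format specifiers for one record.
    # Assumption: %m is the length in MiB with two decimals.
    sub format_record {
        my ( $fmt, $rec, $index ) = @_;
        my %spec = (
            t => $rec->{title},
            i => $rec->{id},
            h => $rec->{host},
            l => $rec->{length},
            m => sprintf( '%.2f', $rec->{length} / ( 1024 * 1024 ) ),
            d => $rec->{date},
            T => $rec->{time},
            s => "$rec->{date} $rec->{time}",
            f => $rec->{format},
            n => $index,
        );
        # Replace each known specifier; unknown ones such as %x are kept.
        $fmt =~ s/%([tihlmdTsfn])/$spec{$1}/g;
        return $fmt;
    }

    # prints "1: Example video [flv, 5.00MB]" (the default format)
    print format_record( '%n: %t [%f, %mMB]', \%record, 1 ), "\n";

```
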
B<Download Options>

=over 4

=item B<-f, --format>=I<format>

Download the I<format> of the video. If I<format> is set to I<best>, clive
will attempt to download the best quality of the video.

Note that I<format> is strictly host specific. See the L</FORMATS>
section for more on this.

=item B<-n, --no-extract>

Do not extract the video. In other words, simulate only up to the point
where clive verifies the video link after fetching and parsing the video
page.

=item B<-O, --output-file>=I<file>

Write the video to I<file>, overwriting any existing file.

Do not use this option when you are downloading more than one video in
one go.

See also the note below.

=item B<-c, --continue>

Continue a partially downloaded video file.

Note that, by default, clive appends a numeric suffix to the filename if
the file already exists, unless:

  * the file has already been completely retrieved, or
  * -c or -O is used

=item B<--save-dir>=I<dir>

Save extracted videos to I<dir>. clive defaults to the current working
directory.

=item B<--cclass>=I<class>

Use character I<class> to filter video page titles. Defaults to "\w".
This is a Perl regular expression character class, for example
"[A-Za-z0-9]".

=item B<-C, --no-cclass>

Disable the use of B<--cclass>. Causes clive to use the video page title
as is for the output filename.

=item B<--filename-format>=I<format-string>

Use I<format-string> to format output video filenames. Defaults to "%t.%s".

Supported format specifiers:

  %t .. video page title (after applying character-class filter)
  %s .. video file suffix (e.g. "flv")
  %i .. video id
  %h .. video host

=item B<--exec>=I<command>;

Define the I<command> to run when the video file transfer completes.
Note that B<--exec-run> must be used to actually cause clive to invoke
the defined I<command>.

Optional arguments may be passed to the command. The expression must be
terminated by a semicolon (";"). Every occurrence of "%i" in the
I<command> is replaced by the pathname of the extracted video file.

=item B<--exec>=I<command>+

Same as above, but "%i" is replaced with as many pathnames as possible
for each invocation of I<command>.

=item B<-e, --exec-run>

Cause clive to invoke the command defined with B<--exec> when the
transfer finishes.

=item B<--stream>=I<percent>

Execute the command defined with B<--stream-exec> when the file transfer
reaches I<percent>.

=item B<--stream-exec>=I<command>

Execute I<command> (in a forked child process) while transferring the
video file. This "simulates" streaming the media, but does so without
checking for buffer underruns, so make sure you set B<--stream>=I<percent>
high enough and that you have a fast internet connection.

clive will not attempt to re-execute the command if it terminates before
the file transfer finishes.

clive waits for the child process to terminate before it moves on to
extract the next file, or exits if there are none left.

Note that some video file formats (namely Google mp4) are known B<not> to
work with this feature.

=item B<--limit-rate>=I<amount>

Limit the transfer rate to I<amount> KB/s.

=item B<--stop-after>=I<size|percent>

Stop the file transfer after I<size> or I<percent>. The value must be
terminated by either '%' or 'M'.

=back

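
B<--cclass> and B<--filename-format> work together: the character class
filters the title, and the filtered title then replaces "%t". The sketch
below assumes that the filter keeps only the title characters matching the
class, which is one plausible reading of the description above rather than
clive's confirmed behaviour. The helpers C<filter_title> and
C<output_filename> are hypothetical; an undefined class stands in for
B<--no-cclass>.

```perl

    use strict;
    use warnings;

    # Assumption: keep only the title characters that match the class.
    sub filter_title {
        my ( $title, $cclass ) = @_;
        return $title unless defined $cclass;    # --no-cclass: title as is
        return join '', $title =~ /($cclass)/g;
    }

    # Expand the --filename-format specifiers %t, %s, %i and %h.
    sub output_filename {
        my (%arg) = @_;
        my %spec = (
            t => filter_title( $arg{title}, $arg{cclass} ),
            s => $arg{suffix},
            i => $arg{id},
            h => $arg{host},
        );
        ( my $name = $arg{format} ) =~ s/%([tsih])/$spec{$1}/g;
        return $name;
    }

    # prints "FunnyFootballClips.flv" with the defaults "%t.%s" and "\w"
    print output_filename(
        format => '%t.%s',
        cclass => '\w',
        title  => 'Funny Football Clips!',
        suffix => 'flv',
        id     => '3HD220e0bx4',
        host   => 'youtube',
    ), "\n";

```
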
=head1 EXAMPLES - BASIC USE

=over 4

=item clive "http://youtube.com/watch?v=3HD220e0bx4"

Extracts the video (flv) from the above video page link. You can then
play the flv video file in a media player.

=item cat E<gt> url.lst

Creates a list file with one video page link per line, for example:

  http://en.sevenload.com/videos/IUL3gda-Funny-Football-Clips
  http://youtube.com/watch?v=3HD220e0bx4
  http://break.com/index/beach-tackle-whip-lash.html
  http://www.liveleak.com/view?i=704_1228511265

=item cat url.lst | clive

You can feed clive multiple video page links like this, or as command
line arguments.

=item clive URL1 URL2 URL3 URL4

Passes the links as command line arguments instead. When you feed links
through a pipe, separate each link with a newline.

=item xclip -o | clive

There are many X clipboard utilities. The above example uses C<xclip(1)>
and a pipe to paste (feed) the clipboard contents to clive.

=item clive -l

Recall the video page links from the last run, regardless of how they
were fed to clive.

=back

=head1 EXAMPLES - ADVANCED USE

=over 4

=item clive -f best "http://youtube.com/watch?v=3HD220e0bx4"

Extract the best format of the video.

=item clive -r -f best "http://youtube.com/watch?v=3HD220e0bx4"

Same as above, but read the cache record instead of fetching and parsing
the video page again.

=item clive --cache-dump

Dump all cache records to standard output. You can use
B<--cache-dump-format> to format the output.

=item clive -ig 3hd2

Grep for the pattern "3hd2" in cache records. If the pattern matches,
clive goes on to extract the matched videos. Note the use of "-i"
(B<--cache-ignore-case>).

=item clive -ig 3hd2 -D

Same as above, but removes the matched records from the cache instead of
extracting the videos.

=item clive --exec="ffmpeg -i %i %i.mp3;" -e URL

Extract the video and use C<ffmpeg(1)> to convert its audio track to an
mp3 file.

=item clive --stream-exec="mplayer -really-quiet %i" --stream=25 URL

Start playing the video being extracted (with mplayer) when the transfer
reaches 25%.

=back

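
The ";" and "+" forms of B<--exec> differ in how many files each
invocation receives, much like the terminators of C<find -exec>. This
sketch builds the argument lists for both forms under stated
simplifications: words are split on whitespace without shell quoting, and
the "+" form puts every path into a single invocation instead of
respecting system argument-length limits. C<exec_commands> is a
hypothetical helper, not clive's implementation.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: turn an --exec expression and the downloaded
    # files into the command lines to run, each as an argv array ref.
    sub exec_commands {
        my ( $expr, @files ) = @_;
        my ( $cmd, $mode ) = $expr =~ /^(.*?)\s*([;+])\s*$/
          or die "--exec expression must end with ';' or '+'\n";
        my @words = split ' ', $cmd;
        if ( $mode eq ';' ) {
            # One invocation per file; every "%i" becomes that file's path.
            return map {
                my $file = $_;
                [ map { ( my $w = $_ ) =~ s/%i/$file/g; $w } @words ];
            } @files;
        }
        # '+': one invocation in which a bare "%i" word expands to all paths.
        return [ map { $_ eq '%i' ? @files : $_ } @words ];
    }

    # prints "ffmpeg -i a.flv a.flv.mp3", then the same line for b.flv
    for my $argv ( exec_commands( 'ffmpeg -i %i %i.mp3;', 'a.flv', 'b.flv' ) ) {
        print "@$argv\n";    # system(@$argv) would actually run it
    }

```
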
=head1 FORMATS

The flv format, typically ~320x240 resolution video on all supported
websites, is downloaded by default if the B<--format> option is not given.

Some of the supported websites offer additional formats, which are listed
below with any known details. Note that the B<--hosts> option lists the
available formats as well as the supported hosts.

=over 4

=item B<youtube.com>

=item B<last.fm>

Format: (flv|fmt17|fmt18|fmt22|fmt35)

flv (fmt34) and fmt18 (mp4) are usually available; other formats may be.
At the time of writing, the following formats are recognized by both
clive and Youtube:

  hd  .. fmt22 .. mp4 (1280x720)
  hq  .. fmt35 .. flv (640x380)
  mp4 .. fmt18 .. mp4 (480x360)
  flv .. fmt34 .. flv (320x180)
  3gp .. fmt17 .. 3gp (176x144)

Note that you can use either format ID, e.g. fmt22 or hd.

Some last.fm videos are actually hosted by Youtube. clive can be used to
download such videos.

=item B<video.google.com>

Format: (flv|mp4)

The mp4 format is available for a limited number of videos.

=item B<dailymotion.com>

Format: (flv|spak-mini|vp6-hq|vp6-hd|vp6|h264)

The HD and HQ videos may not always be available.

  vp6-hd    .. on2  (1280x720)
  vp6-hq    .. on2  (848x480)
  h264      .. h264 (512x384)
  vp6       .. on2  (320x240)
  flv       .. flv  (320x240)
  spak-mini .. flv  (80x60)

=item B<spiegel.de>

Format: (flv|vp6_(64|576|928)|h264_1400)

  h264_1400 .. mp4 (996x560)
  vp6_928   .. flv (996x560)
  vp6_576   .. flv (560x315)
  flv       .. flv (180x100)
  vp6_64    .. flv (180x100)

Format: (3gp|small|iphone|podcast)

The data that clive parses indicates that these formats should be
available, although we have yet to find a video that offers them.
If you find one, please let us know.

  3gp     .. 3gp (?)
  small   .. 3gp (?)
  iphone  .. mp4 (?)
  podcast .. mp4 (?)

=item B<golem.de>

Format: (flv|high|ipod)

=item B<vimeo.com>

Format: (flv|hd)

HD should be available at least for the vimeo.com/hd channel videos.
Note that "flv" only means the "default flv": some of the hosted
"default" videos are actually "mp4", not "flv".

For further reading:
http://vimeo.com/help/hd

=item B<evisor.tv>

=item B<liveleak.com>

=item B<tv.cctv.com>

=item B<sevenload.com>

=item B<break.com>

=item B<redtube.com>

Format: flv

=back

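
Because either the alias or the format ID is accepted for Youtube,
resolving a B<--format> value amounts to a lookup in the Youtube table
above. This is a minimal sketch; C<youtube_format_id> is hypothetical, and
the special value I<best> is not handled.

```perl

    use strict;
    use warnings;

    # Youtube aliases from the table above: alias => format ID.
    my %youtube_alias = (
        hd    => 'fmt22',
        hq    => 'fmt35',
        mp4   => 'fmt18',
        flv   => 'fmt34',
        '3gp' => 'fmt17',
    );

    # Hypothetical helper: map a --format value to a Youtube format ID,
    # accepting either the alias ("hd") or the ID itself ("fmt22").
    sub youtube_format_id {
        my ($format) = @_;
        return $youtube_alias{$format} if exists $youtube_alias{$format};
        my %known_id = reverse %youtube_alias;
        return $format if exists $known_id{$format};
        die "unsupported format: $format\n";
    }

    print youtube_format_id('hd'), "\n";    # prints "fmt22"

```
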
=head1 FILES

If the HOME environment variable is undefined for some reason, clive uses
the current working directory instead.

=over 4

=item $HOME/.cliverc, $HOME/.clive/config, $HOME/.config/clive/config

User configuration file. For example:

  % cat >> ~/.cliverc
  -f best
  --proxy=http://foo:1234

=item $HOME/.cache/clive/last

File containing the last user input (video page links).

You can use B<--last-file> to override the path, e.g.:

  --last-file=/path/to/last/file

You can also define this option in the config file.

See also the CLIVE_CACHE notes below.

=item $HOME/.cache/clive/cache

BerkeleyDB-based cache file containing the records of fetched and parsed
video pages.

You can use B<--cache-file> to override the path, e.g.:

  --cache-file=/path/to/cache/file

You can also define this option in the config file.

See also the CLIVE_CACHE notes below.

=item Notes: CLIVE_CACHE

clive defaults to using $HOME/.cache/clive/ for the "last" and "cache"
files described above.

The default path can be overridden with the CLIVE_CACHE environment
variable. Note that clive will attempt to create the specified path
recursively.

Examples (in csh terms):

  setenv CLIVE_CACHE /home/user/cachedata
  clive   # reads/writes /home/user/cachedata/(last|cache)

  unsetenv CLIVE_CACHE
  clive   # reads/writes $HOME/.cache/clive/(last|cache)

  clive --last-file=mylast --cache-file=cachedata/mycache
          # reads/writes the "mylast" and "cachedata/mycache" files

=back

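
The lookup order described above can be summarized in a few lines of Perl.
This is a sketch under assumptions: CLIVE_CACHE takes precedence, an
undefined HOME falls back to the current working directory with the usual
.cache/clive suffix, and B<--last-file>/B<--cache-file> values are used as
given. C<cache_dir>, C<cache_paths> and the option keys are hypothetical,
and the sketch does not create the directory as clive does (File::Path's
C<make_path> could be used for that).

```perl

    use strict;
    use warnings;
    use Cwd qw(getcwd);
    use File::Spec;

    # Assumed rules: CLIVE_CACHE wins; otherwise $HOME/.cache/clive, with
    # the current working directory standing in for an undefined HOME.
    sub cache_dir {
        my %env = @_;
        return $env{CLIVE_CACHE} if defined $env{CLIVE_CACHE};
        my $home = defined $env{HOME} ? $env{HOME} : getcwd();
        return File::Spec->catdir( $home, '.cache', 'clive' );
    }

    # --last-file / --cache-file, when given, replace the default files.
    sub cache_paths {
        my ( $env, %opt ) = @_;
        my $dir = cache_dir(%$env);
        return (
            last => defined $opt{last_file}
            ? $opt{last_file}
            : File::Spec->catfile( $dir, 'last' ),
            cache => defined $opt{cache_file}
            ? $opt{cache_file}
            : File::Spec->catfile( $dir, 'cache' ),
        );
    }

    my %paths = cache_paths( \%ENV );
    print "last:  $paths{last}\ncache: $paths{cache}\n";

```
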
=head1 CACHE

The purpose of the cache is to allow clive to skip fetching and parsing
the video page again. The cache does not contain any actual video data,
so one should not expect to recover a deleted video file from it. Only
some of the parsed details are stored as records in the cache.

By now it should be well known that the cache fails with some of the
supported hosts. For example, Youtube video links expire after some time,
which causes re-extraction to fail if the cached video link is reused
later.

This was the main reason why reading from the cache was disabled by
default in 2.2.0. Previously, many users reported the reuse of expired
video links as a bug, even though the manual page documented that most of
the HTTP 403/404 errors were actually caused by expired video links.

It is, of course, still possible to read from the cache: enable it with
the B<--cache-read> option. This causes clive to look up a saved cache
record and, if one is found, reuse the stored video details instead of
fetching the video page.

The cache can be disabled with the B<--no-cache> option, which disables
both reading and writing. Note that if the BerkeleyDB Perl module is not
installed, clive does not use the cache at all.

See also the B<--cache-grep> option.

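
The rules in this section reduce to a small decision table. The sketch
below encodes it, assuming that writing is enabled whenever the cache is
usable (as implied by B<--no-cache> disabling "both read and write"), and
uses the common C<eval { require ... }> idiom to test whether the
BerkeleyDB module is installed. C<cache_mode>, C<have_module> and the
option keys are hypothetical.

```perl

    use strict;
    use warnings;

    # True if an optional module such as BerkeleyDB can be loaded.
    sub have_module {
        my ($module) = @_;
        ( my $file = "$module.pm" ) =~ s{::}{/}g;
        return eval { require $file; 1 } ? 1 : 0;
    }

    # Assumed rules: writing is on by default, reading needs --cache-read,
    # and --no-cache or a missing BerkeleyDB module turns both off.
    sub cache_mode {
        my (%opt) = @_;
        my $have_db =
          exists $opt{have_db} ? $opt{have_db} : have_module('BerkeleyDB');
        return { read => 0, write => 0 } if $opt{no_cache} || !$have_db;
        return { read => $opt{cache_read} ? 1 : 0, write => 1 };
    }

    # The result depends on whether BerkeleyDB is installed here.
    my $mode = cache_mode( cache_read => 1 );
    printf "read=%d write=%d\n", $mode->{read}, $mode->{write};

```
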
=head1 UNICODE

Q: Why am I seeing mangled video filenames?

A: Make sure you have set an appropriate locale. For example (in
csh/urxvt terms):

  % setenv LANG en_US.UTF-8
  % urxvt &

You can list the supported locales on a typical Unix-like system with:

  % locale -a

=head1 DEBUGGING

Some tips that we have found useful:

  % clive --debug URL

Causes I<libcurl> to dump log messages, including HTTP headers, to
standard error.

  % clive -n URL

Causes clive to fetch and parse the video page and verify the video link
as usual, but exit without downloading the video file.

=head1 BUGS

Sure to be some.

Please report them to the issue tracker at:
<http://code.google.com/p/clive/issues/>

=head1 EXIT STATUS

clive exits 0 on success and E<gt>0 otherwise. The exit codes are:

  CLIVE_OK          = 0
  CLIVE_NOTHINGTODO = 1   # file already retrieved
  CLIVE_NOSUPPORT   = 2   # host not supported
  CLIVE_READ        = 3   # file open/read error
  CLIVE_GREP        = 4   # grep: nothing matched in cache
  CLIVE_OPTARG      = 5   # invalid option argument
  CLIVE_SYSTEM      = 6   # system call failed (e.g. fork)
  CLIVE_REGEXP      = 7   # regexp pattern matching failed
  CLIVE_FORMAT      = 8   # requested format unavailable
  CLIVE_NET         = 9   # network error
  CLIVE_STOP        = 10  # --stop-after

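
A script that wraps clive can translate its exit status back into these
names. This sketch decodes the wait status that Perl's C<system> leaves in
C<$?>, where the exit code is stored in the high byte. C<clive_status_name>
is a hypothetical helper; the table is copied from above.

```perl

    use strict;
    use warnings;

    # Exit code names from the table above, indexed by value.
    my @clive_status = qw(
      CLIVE_OK CLIVE_NOTHINGTODO CLIVE_NOSUPPORT CLIVE_READ CLIVE_GREP
      CLIVE_OPTARG CLIVE_SYSTEM CLIVE_REGEXP CLIVE_FORMAT CLIVE_NET
      CLIVE_STOP
    );

    # Decode a wait status (as left in $? by system) into a readable name.
    sub clive_status_name {
        my ($wait_status) = @_;
        return 'failed to run' if $wait_status == -1;
        return 'killed by signal ' . ( $wait_status & 127 )
          if $wait_status & 127;
        my $code = $wait_status >> 8;
        return defined $clive_status[$code]
          ? $clive_status[$code]
          : "unknown ($code)";
    }

    # Usage in a wrapper script:
    #   system('clive', '-n', $url);
    #   print clive_status_name($?), "\n";

```
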
=head1 OTHER

Project page:
<http://clive.googlecode.com/>

Front-end (GUI):
<http://abby.googlecode.com/>

The development repository can be cloned with:

  % git clone git://repo.or.cz/clive.git

=head1 HISTORY

=over 4

=item Originally written in Python

=item Later rewritten in Perl for 2.0.0

=back

=head1 AUTHOR

Toni Gundogdu <legatvs@gmail.com>

Thanks to all those who have contributed to the project
by sending patches, reporting bugs and writing feedback.
You know who you are.

=cut