#!/usr/bin/perl
# -*- coding: ascii -*-
###########################################################################
# clive, command line video extraction utility.
# Copyright 2007, 2008, 2009 Toni Gundogdu.
#
# This file is part of clive.
#
# clive is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
# clive is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
# details.
#
# You should have received a copy of the GNU General Public License along
# with this program. If not, see <http://www.gnu.org/licenses/>.
###########################################################################
use warnings;
use strict;

binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");

use clive::App;

clive::App->main;

__END__

=head1 NAME

clive - command line video extraction utility

=head1 SYNOPSIS

clive [options]... [URL]...

=head1 DESCRIPTION

clive is a command line video extraction utility for Youtube and other
video-sharing websites. It was written to work around the Adobe Flash
plugin requirement.

clive is not a universal video extraction utility. It only works with
the video hosts that it supports.

=head1 OPTIONS

    -h, --help                      print help and exit
    -v, --version                   print version and exit
        --hosts                     print supported hosts and exit
    -l, --last                      recall last input
        --last-file=PATH            read/write PATH instead of default path
  Output Options:
        --emit-csv                  emit video details in CSV to stdout
        --debug                     print cURL debug messages
    -q, --quiet                     turn off all output
        --stderr                    redirect all output to stderr
  HTTP Options:
        --agent=STRING              identify as STRING to HTTP server
        --connect-timeout=SECS      max time allowed for connection
        --connect-timeout-socks=S   same as above, tries to work around SOCKS bug
        --proxy=ADDR                use ADDR for HTTP proxy
        --no-proxy                  disable all use of HTTP proxy
  Cache Options:
        --cache-file=PATH           read/write PATH instead of default path
    -r, --cache-read                enable reading from cache
    -d, --cache-dump                dump cache records to stdout
        --cache-dump-format=STRING  print format for dumping cache records
        --cache-grep=PATTERN        grep cache records for PATTERN
    -i, --cache-ignore-case         ignore case while matching records
    -D, --cache-remove-record       remove matched records from cache
        --cache-clear               truncate cache records
        --no-cache                  disable all cache use (read and write)
  Download Options:
    -f, --format=FORMAT             extract FORMAT of video
    -O, --output-file=FILE          write video to FILE
    -c, --continue                  continue partially downloaded file
    -n, --no-extract                do not extract any videos
        --save-dir=DIR              save video files to DIR
        --cclass=CLASS              use character CLASS to filter titles
    -C, --no-cclass                 do not apply character class
        --filename-format=STRING    use STRING to format output filename
        --exec=CMD                  command to run when transfer finishes
    -e, --exec-run                  invoke command defined with --exec
        --stream=PERCENT            run stream command (below) at PERCENT
        --stream-exec=CMD           stream command to run
        --limit-rate=AMOUNT         limit transfer rate to AMOUNT (KB/s)
        --stop-after=SIZE|PERCENT   stop file transfer after SIZE or PERCENT
    -R, --raw                       process video page HTML as-is

=head1 OPTION SYNTAX

You may freely mix different option styles and specify options after
the command line arguments, e.g.:

    % clive -c URL --format=best

You may also group several single-letter options together. An option
that takes an argument, such as B<-f>, must come last in the group:

    % clive -cnrf best URL

Note that the "dashed" options have aliases available. These are not
documented in the L</OPTIONS> section. For example:

    % clive --no-extract --no_extract --noextract
    % clive --cache-read --cache_read --cacheread

=head1 OPTION DESCRIPTIONS

=over 4

=item B<-h, --help>

Print help and exit.

=item B<-v, --version>

Print version and exit.

=item B<--hosts>

Print supported hosts and exit.

=item B<-l, --last>

Recall (or reuse) the last input.

=item B<--last-file>=I<path>

Use I<path> instead of the default path. See L</FILES>.

=back

B<Output Options>

=over 4

=item B<--emit-csv>

Print (emit) video details in CSV format to standard output.
Implies --no-extract.

=item B<--debug>

Print cURL debug (or verbose) messages to standard error.

=item B<-q, --quiet>

Turn off all output to standard output and error.

=item B<--stderr>

Direct all output to standard error.

=back

B<HTTP Options>

=over 4

=item B<--agent>=I<string>

Identify clive as I<string> to HTTP servers. Defaults to "Mozilla/5.0".

=item B<--connect-timeout>=I<seconds>

Maximum time in I<seconds> allowed for the connection to take. Defaults to 30.

=item B<--connect-timeout-socks>=I<seconds>

Same as above but tries to work around the SOCKS proxy bug in cURL.
Defaults to 30.

=item B<--proxy>=I<address>

Use I<address> for HTTP proxy. Example: "http://foo:1234".

=item B<--no-proxy>

Disable all use of HTTP proxy, even if the http_proxy environment variable is set.

=back

B<Cache Options>

=over 4

=item B<--cache-file>=I<path>

Use I<path> instead of the default path. See L</FILES>.

=item B<-r, --cache-read>

Read video details from the cache record if it exists. This allows clive
to skip fetching the video page again. See the L</CACHE> section for more
on this.

=item B<-d, --cache-dump>

Dump cache records to standard output.

=item B<--cache-dump-format>=I<format-string>

Format string used for the output of B<--cache-dump>. Defaults to "%n: %t [%f, %mMB]".

Example:

    % clive --cache-dump --cache-dump-format="%d: %t"

Supported format specifiers:

    %t .. video page title
    %i .. video id
    %h .. video host
    %l .. video file length (bytes)
    %m .. video file length (MB)
    %d .. date (last update)
    %T .. time (last update)
    %s .. time stamp (same as "%d %T")
    %f .. video file format
    %n .. index

=item B<--cache-grep>=I<pattern>

Grep stored cache records for I<pattern>. See also L</EXAMPLES - ADVANCED USE>.

=item B<-i, --cache-ignore-case>

Ignore case differences while matching records.

=item B<-D, --cache-remove-record>

Remove matched records from cache.

=item B<--cache-clear>

Truncate cache records.

=item B<--no-cache>

Disable all (read and write) cache use.

=back
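
The B<--cache-dump-format> specifiers above amount to a simple substitution
over the fields of one cache record. The following Perl sketch illustrates
this; it is not clive's implementation. The record layout (a hash with
C<title>, C<id>, C<host>, C<length>, C<stamp> and C<format> keys), the
"YYYY-MM-DD HH:MM:SS" time stamp and the one-decimal MB rounding are
assumptions, and C<expand_dump_format> is a hypothetical name.

```perl

    use strict;
    use warnings;

    # Hypothetical sketch: expand --cache-dump-format specifiers for one
    # cache record. The record keys and stamp layout are assumptions.
    sub expand_dump_format {
        my ($fmt, $rec, $index) = @_;
        # Assumed stamp layout: "YYYY-MM-DD HH:MM:SS" (%s = "%d %T").
        my ($date, $time) = split / /, $rec->{stamp}, 2;
        my %spec = (
            t => $rec->{title},
            i => $rec->{id},
            h => $rec->{host},
            l => $rec->{length},
            # Assumes 1 MB = 1048576 bytes, shown with one decimal.
            m => sprintf('%.1f', $rec->{length} / 1048576),
            d => $date,
            T => $time,
            s => $rec->{stamp},
            f => $rec->{format},
            n => $index,
        );
        # Replace each known %X; unknown specifiers are left as they are.
        $fmt =~ s/%([tihlmdTsfn])/$spec{$1}/g;
        return $fmt;
    }

```

With a hypothetical 5242880-byte record, the default format
"%n: %t [%f, %mMB]" would give a line such as "1: Funny clip [flv, 5.0MB]".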

B<Download Options>

=over 4

=item B<-f, --format>=I<format>

Extract I<format> of the video. If I<format> is set to I<best>, clive
will attempt to extract the best video quality.

Note that the I<format> is strictly host specific. See the L</FORMATS>
section for more on this.

=item B<-n, --no-extract>

Do not extract the video.

=item B<-O, --output-file>=I<file>

Write video to I<file>. Overwrites an already existing file.

Do not use this option when you are downloading more than
one video in one go.

See also the note below.

=item B<-c, --continue>

Continue partially downloaded video file.

Note that, by default, clive appends a numeric suffix to
the filename if the file already exists, unless one of the
following conditions is met:

    * file is already completely retrieved
    * -c or -O is used

=item B<--save-dir>=I<dir>

Save extracted videos to I<dir>.

=item B<--cclass>=I<class>

Use character I<class> to filter video page titles. Defaults to "\w".
Refer to the Perl regular expressions documentation (character classes)
for more on this.

=item B<-C, --no-cclass>

Do not apply the character class filter to video page titles.

=item B<-R, --raw>

Process video page HTML as it is. Technically, this means that
clive will not try to decode the HTML using Encode::decode_utf8.
This may be useful with some of the supported hosts, such as
Cctv, which uses GBK as its HTML encoding.

=item B<--filename-format>=I<format-string>

Use I<format-string> to format output video filenames. Defaults to "%t.%s".

Supported format specifiers:

    %t .. video page title (after applying the character class filter)
    %s .. video file suffix (e.g. "flv")
    %i .. video id
    %h .. video host

=item B<--exec>=I<command>;

Defines the I<command> to run when video file transfer completes.
Note that B<--exec-run> must be used to actually cause clive
to invoke the defined I<command>.

Optional arguments may be passed to the command. The expression must be
terminated by a semicolon (";"). If the specifier "%i" appears anywhere
in the I<command>, it is replaced by the pathname of the extracted
video file.

=item B<--exec>=I<command>+

Same as above but "%i" is replaced with as many path names as
possible for the invocation of I<command>.

=item B<-e, --exec-run>

Causes clive to invoke the command defined with B<--exec> when
transfer finishes.

=item B<--stream>=I<percent>

Execute the command defined with B<--stream-exec> when file transfer
reaches I<percent>.

=item B<--stream-exec>=I<command>

Execute (fork a child process) I<command> while transferring
video file. This "simulates" streaming the media but does so
without checking for buffer underruns, so make sure you set
--stream=I<percent> high enough and that you have a fast
internet connection.

clive will not attempt to re-execute the command if it
terminates before the file transfer finishes.

clive waits for the child process to terminate before
it moves on to extract the next file, or exits if there
are none left.

Note that some video file formats (namely Google mp4) are
known B<not> to work with this feature.

=item B<--limit-rate>=I<amount>

Limit transfer rate to I<amount> KB/s.

=item B<--stop-after>=I<size|percent>

Stop file transfer after I<size> or I<percent>. The value must
be terminated by either '%' or 'M'.

=back
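
A minimal Perl sketch of how the B<--cclass>, B<--filename-format> and
B<--stop-after> values described above could be applied. It is an
illustration, not clive's code: the function names are hypothetical, the
title is assumed to be an already decoded character string (so that "\w"
also matches non-ASCII letters such as Chinese characters), and 'M' is
assumed to mean 1048576 bytes.

```perl

    use strict;
    use warnings;

    # Hypothetical sketch of three download options; not clive's code.

    # --cclass: keep only the characters of $title that match $class.
    # $title is assumed to be a decoded character string, so "\w" also
    # matches non-ASCII letters.
    sub apply_cclass {
        my ($title, $class) = @_;
        my $re = qr/$class/;
        return join '', $title =~ /$re/g;    # concatenate all matches
    }

    # --filename-format: expand %t, %s, %i and %h.
    sub format_filename {
        my ($fmt, %v) = @_;    # keys: title, suffix, id, host
        my %spec = (t => $v{title}, s => $v{suffix},
                    i => $v{id},    h => $v{host});
        $fmt =~ s/%([tsih])/$spec{$1}/g;
        return $fmt;
    }

    # --stop-after: "N%" is N percent of $total bytes; "NM" is N MB,
    # assuming 1 MB = 1048576 bytes. Returns nothing (undef in scalar
    # context) if the value is not terminated by '%' or 'M'.
    sub stop_after_bytes {
        my ($value, $total) = @_;
        return int($total * $1 / 100) if $value =~ /^(\d+(?:\.\d+)?)%$/;
        return int($1 * 1048576)      if $value =~ /^(\d+(?:\.\d+)?)M$/;
        return;
    }

```

With the default class "\w", a title such as "Funny Football: Clips!"
would become "FunnyFootballClips", and the default "%t.%s" would then
produce "FunnyFootballClips.flv".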

=head1 EXAMPLES - BASIC USE

=over 4

=item clive "http://youtube.com/watch?v=3HD220e0bx4"

Extracts video (flv) from the above video page link. You
can then play the flv video file in a media player.

=item cat E<gt> url.lst

    http://en.sevenload.com/videos/IUL3gda-Funny-Football-Clips
    http://youtube.com/watch?v=3HD220e0bx4
    http://break.com/index/beach-tackle-whip-lash.html
    http://www.liveleak.com/view?i=704_1228511265

=item cat url.lst | clive

You can feed clive multiple video page links like this
or via the command line:

=item clive URL URL2 URL3 URL4

When piping links to clive, separate each link
with a newline.

=item xclip -o | clive

Many X clipboard utilities exist. This example uses C<xclip(1)>
and a pipe to paste the contents of the clipboard to clive.

=item clive -l

Recall last video page link input.

=back

=head1 EXAMPLES - ADVANCED USE

=over 4

=item clive -f best "http://youtube.com/watch?v=3HD220e0bx4"

Extract the best format of the video.

=item clive -r -f best "http://youtube.com/watch?v=3HD220e0bx4"

Causes clive to skip re-fetching and parsing the video page.
Instead, it reuses the record saved in the cache the previous
time.

=item clive --cache-dump

Dump cache records to stdout. You can use --cache-dump-format
to format the output.

=item clive -ig 3hd2

Grep cache records for the pattern "3hd2". If the pattern
matches, clive continues to extract the matched videos.
Note the use of "-i" (--cache-ignore-case).

=item clive -ig 3hd2 -D

Same as above but removes the matched records from the cache
instead of extracting the videos.

=item clive --exec="ffmpeg -i %i %i.mp3;" -e URL

Extract video and use C<ffmpeg(1)> to extract its audio
into an mp3 file.

=item clive --stream-exec="mplayer -really-quiet %i" --stream=25 URL

Start playing the video being extracted when the transfer
reaches 25% complete.

=back
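
The ";" and "+" terminators of B<--exec> appear to mirror those of
C<find -exec>. A minimal Perl sketch of the expansion follows;
C<exec_commands> and C<shell_quote> are hypothetical helpers that only
build the command lines without running them, and quoting each path for a
POSIX shell is an assumption rather than documented clive behavior.

```perl

    use strict;
    use warnings;

    # Hypothetical sketch: build the command lines that an --exec value
    # would produce. The single-quoting of paths is an assumption.

    # Quote a path for a POSIX shell: wrap in '...', escape embedded '.
    sub shell_quote {
        my ($s) = @_;
        $s =~ s/'/'\\''/g;
        return "'" . $s . "'";
    }

    # "cmd ... %i ...;" -> one command per file.
    # "cmd ... %i ...+" -> one command with all paths (this sketch
    # ignores the system's argument-length limit).
    sub exec_commands {
        my ($spec, @paths) = @_;
        my ($cmd, $term) = $spec =~ /^(.*)([;+])\s*$/s
            or die "--exec value must end in ';' or '+'\n";
        if ($term eq ';') {
            my @cmds;
            for my $path (@paths) {
                (my $c = $cmd) =~ s/%i/shell_quote($path)/ge;
                push @cmds, $c;
            }
            return @cmds;
        }
        my $all = join ' ', map { shell_quote($_) } @paths;
        (my $c = $cmd) =~ s/%i/$all/g;
        return ($c);
    }

```

For the ffmpeg example above and a single file F<a.flv>, the sketch yields
C<ffmpeg -i 'a.flv' 'a.flv'.mp3>, which a POSIX shell runs as
C<ffmpeg -i a.flv a.flv.mp3>.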

=head1 FORMATS

clive extracts flv (typically 320x240) by default from all
supported websites. Some of them also support other formats.

=over 4

=item B<youtube.com>

=item B<last.fm>

Format: (flv|fmt17|fmt18|fmt22|fmt35)

flv (fmt34) and fmt18 (mp4) are usually available. Others may be
available. At the time of writing, the following formats
are recognized by both clive and Youtube:

    fmt22 .. mp4 (1280x720) (HD)
    fmt35 .. flv (640x380) (HQ)
    fmt18 .. mp4 (480x360)
    flv   .. fmt34 (320x180)
    fmt17 .. 3gp (176x144)

Note that you can use alternative format IDs for Youtube:

    fmt22 .. hd
    fmt35 .. hq
    fmt18 .. mp4
    fmt17 .. 3gp

For example:

    % clive -f hd URL

This is equivalent to:

    % clive -f fmt22 URL

Some of the videos available at last.fm are actually Youtube
videos. clive can handle such video links.

=item B<video.google.com>

Format: (flv|mp4)

mp4 format is available for a limited number of videos.

=item B<dailymotion.com>

Format: (flv|spak-mini|vp6-hq|vp6-hd|vp6|h264)

The HD and HQ videos may not always be available.

    ON2-1280x720 (vp6-hd)
    ON2-848x480  (vp6-hq)
    H264-512x384 (h264)
    ON2-320x240  (vp6)
    FLV-320x240  (flv/spark)
    FLV-80x60    (spak-mini)

=item B<vimeo.com>

Format: (flv|hd)

HD should be available at least for the vimeo.com/hd channel videos.
Note that "flv" only means "default" here, as some of the hosted
videos are encoded (by default) in other video formats, such as
"mp4", rather than "flv".

For further reading:

    http://vimeo.com/help/hd

=item B<evisor.tv>

=item B<liveleak.com>

=item B<tv.cctv.com>

=item B<sevenload.com>

=item B<break.com>

=item B<redtube.com>

Format: flv

=back
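
The alternative Youtube format IDs listed above are a plain lookup table.
A minimal Perl sketch, with C<youtube_format_id> as a hypothetical name;
passing any non-alias value (such as C<best>, C<flv> or an explicit
C<fmtNN>) through unchanged is an assumption.

```perl

    use strict;
    use warnings;

    # Hypothetical sketch: resolve the Youtube alias IDs listed above.
    my %YOUTUBE_ALIAS = (
        hd    => 'fmt22',
        hq    => 'fmt35',
        mp4   => 'fmt18',
        '3gp' => 'fmt17',
    );

    # Anything that is not an alias is assumed to pass through unchanged.
    sub youtube_format_id {
        my ($format) = @_;
        return exists $YOUTUBE_ALIAS{$format} ? $YOUTUBE_ALIAS{$format} : $format;
    }

```

With this table, B<-f hd> and B<-f fmt22> resolve to the same ID, matching
the equivalence stated above.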

=head1 FILES

If the HOME environment variable is undefined for some
reason, clive uses the current working directory
instead.

=over 4

=item $HOME/.cliverc, $HOME/.clive/config, $HOME/.config/clive/config

User configuration file. For example:

    % cat >> ~/.cliverc
    -f best
    --proxy=http://foo:1234

=item $HOME/.cache/clive/last

File containing the last user input (video page links).

You can use --last-file to override the path, e.g.:

    --last-file=/path/to/last/file

You can also define this option in the config file.

See also the CLIVE_CACHE notes below.

=item $HOME/.cache/clive/cache

BerkeleyDB based cache file containing the records of fetched
and parsed video pages.

You can use --cache-file to override the path, e.g.:

    --cache-file=/path/to/cache/file

You can also define this option in the config file.

See also the CLIVE_CACHE notes below.

=item Notes: CLIVE_CACHE

By default, clive uses $HOME/.cache/clive/ for the "last" and
"cache" files described above.

The default path can be overridden with the CLIVE_CACHE
environment variable. Note that clive will attempt to create
the specified path recursively.

Examples:

    setenv CLIVE_CACHE /home/user/cachedata   (in csh terms)
    clive   # will read/write /home/user/cachedata/(last|cache)

    unsetenv CLIVE_CACHE
    clive   # read/write $HOME/.cache/clive/(last|cache)

    clive --last-file=mylast --cache-file=cachedata/mycache
    # read/write "mylast" file, read/write cachedata/mycache file

=back
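
The lookup order described above can be sketched in Perl as follows. This
is an illustration, not clive's code: C<clive_paths> and the C<last_file> /
C<cache_file> option keys are hypothetical, and it assumes the current
working directory simply takes the place of an unset $HOME.

```perl

    use strict;
    use warnings;
    use File::Spec;

    # Hypothetical sketch of the lookup order described above:
    # --last-file/--cache-file, then $CLIVE_CACHE, then $HOME/.cache/clive.
    # Assumes the current directory takes the place of an unset $HOME.
    sub clive_paths {
        my (%arg) = @_;    # env => {...}, cwd => $dir, opts => {...}
        my $env  = $arg{env};
        my $opts = $arg{opts} || {};
        my $dir  = defined $env->{CLIVE_CACHE}
            ? $env->{CLIVE_CACHE}
            : File::Spec->catdir($env->{HOME} // $arg{cwd}, '.cache', 'clive');
        return (
            dir   => $dir,
            last  => $opts->{last_file}  // File::Spec->catfile($dir, 'last'),
            cache => $opts->{cache_file} // File::Spec->catfile($dir, 'cache'),
        );
    }

```

A real program would then create the directory recursively, for example
with C<make_path> from L<File::Path>, as the notes above describe.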

=head1 CACHE

The purpose of the cache is to allow clive to skip fetching
and parsing the video page again. It does not contain any
actual video data, so one should not expect to recover a deleted
video file from the cache. Only some of the parsed details
are stored as records in the cache.

By now, it should be well known that the cache fails
with some of the supported hosts. For example, Youtube video links
expire after some time, which causes the re-extraction to fail if
the cached video link is used again later.

This was the main reason why reading from the cache was disabled
by default in 2.2.0. Previously, many users reported the reuse of
expired video links as a bug, even though the manual page
documented that most of the HTTP 403/404 errors were actually
caused by expired video links.

It is, of course, still possible to read from the cache. You can
enable this with the --cache-read option. This causes clive to
look up a saved cache record and, if one is found, reuse the
stored video details instead of fetching the video page.

The use of the cache can be disabled with the --no-cache option.
This disables both reading and writing. Note that if the BerkeleyDB Perl
module is not installed, clive will not use the cache.

See also the --cache-grep option.
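
The read and write rules above can be summarized in a short Perl sketch.
It is not clive's code: a plain hash stands in for the BerkeleyDB file
(C<undef> when the module is missing), records are assumed to be keyed by
video page URL, and C<video_details> is a hypothetical name.

```perl

    use strict;
    use warnings;

    # Hypothetical sketch of the cache policy described above.
    # $cache: hash ref standing in for the BerkeleyDB file (undef when
    #         the module is missing); records assumed keyed by URL.
    # $fetch: code ref that fetches and parses the video page.
    sub video_details {
        my ($url, $cache, $fetch, %opt) = @_;
        my $use_cache = defined $cache && !$opt{no_cache};
        if ($use_cache && $opt{cache_read} && exists $cache->{$url}) {
            # Reused as-is: a cached video link may have expired.
            return ($cache->{$url}, 'cache');
        }
        my $details = $fetch->($url);
        $cache->{$url} = $details if $use_cache;    # written even without -r
        return ($details, 'fetched');
    }

```

Without B<--cache-read> the page is always fetched and the record
refreshed; with it, a stored record wins, which is where expired Youtube
links can resurface.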

=head1 BUGS

Sure to be some.

Please report them to the issue tracker at:
<http://code.google.com/p/clive/issues/>

=head1 EXIT STATUS

clive exits 0 on success, and E<gt>0 if an error occurs.

    CLIVE_OK          = 0
    CLIVE_NOTHINGTODO = 1   # file already retrieved
    CLIVE_NOSUPPORT   = 2   # host not supported
    CLIVE_READ        = 3   # file open/read error
    CLIVE_GREP        = 4   # grep: nothing matched in cache
    CLIVE_OPTARG      = 5   # invalid option argument
    CLIVE_SYSTEM      = 6   # system call failed (e.g. fork)
    CLIVE_REGEXP      = 7   # regexp pattern matching failed
    CLIVE_FORMAT      = 8   # requested format unavailable
    CLIVE_NET         = 9   # network error
    CLIVE_STOP        = 10  # --stop-after
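
When clive is run from another Perl script, these codes can be recovered
from C<$?> after C<system>. A minimal sketch; C<describe_clive_status> is a
hypothetical helper, and only the code values are taken from the list
above.

```perl

    use strict;
    use warnings;

    # Names indexed by exit code, copied from the list above.
    my @CLIVE_EXIT = qw(
        CLIVE_OK     CLIVE_NOTHINGTODO CLIVE_NOSUPPORT CLIVE_READ
        CLIVE_GREP   CLIVE_OPTARG      CLIVE_SYSTEM    CLIVE_REGEXP
        CLIVE_FORMAT CLIVE_NET         CLIVE_STOP
    );

    # Decode the raw wait status ($?) left by system('clive', ...).
    sub describe_clive_status {
        my ($status) = @_;
        return 'not started' if $status == -1;
        return 'killed by signal ' . ($status & 127) if $status & 127;
        my $code = $status >> 8;
        return $CLIVE_EXIT[$code] // "unknown exit code $code";
    }

```

For example, C<system('clive', $url)> followed by
C<describe_clive_status($?)> would report C<CLIVE_NOSUPPORT> for a link to
an unsupported host.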

=head1 OTHER

Project page:
<http://code.google.com/p/clive/>

Front-end (GUI):
<http://code.google.com/p/abby/>

The development repository can be cloned with:

    % git clone git://repo.or.cz/clive.git

=head1 HISTORY

clive was originally written in Python but later (2.0.0)
rewritten in Perl. The project started as a workaround
for Youtube's Adobe Flash plugin requirement, which was
poorly supported on Unix-like systems.

=head1 AUTHOR

Toni Gundogdu <legatvs@gmail.com>

Thanks to all those who have contributed to the project over the
years by sending patches, reporting bugs and writing feedback.
You know who you are.

=cut