Wget is a utility designed for retrieving binary documents across the Web, through the use of HTTP (Hyper Text Transfer Protocol) and FTP (File Transfer Protocol), and saving them to disk. Wget is non-interactive, which means it can work in the background while the user is not logged in, unlike most web browsers (thus you may start the program and log off, letting it do its work). By analyzing server responses, it distinguishes between correctly and incorrectly retrieved documents, and retries retrieving them as many times as necessary, or until a user-specified limit is reached. The REST command is used in FTP on hosts that support it. Proxy servers are supported to speed up the retrieval and lighten network load. Wget supports the use of the initialization file .wgetrc.
Wget supports a full-featured recursion mechanism, through which you can retrieve large parts of the web, creating local copies of remote directory hierarchies. Of course, the maximum level of recursion and other parameters can be specified. Infinite recursion loops are always avoided by hashing the retrieved data. All of this works for both HTTP and FTP.
The retrieval is conveniently traced by printing dots, each dot representing one kilobyte of data received. Built-in features offer mechanisms to tune which links you wish to follow (cf. -L, -D and -H).
Wget consults the following environment variables: http_proxy, ftp_proxy, no_proxy, WGETRC, HOME.
Wget supports the use of the initialization file .wgetrc. First a system-wide init file will be looked for (/usr/local/lib/wgetrc by default) and loaded. Then the user's file will be searched for in two places: in the environment variable WGETRC (which is presumed to hold the full pathname) and in $HOME/.wgetrc. Note that the settings in the user's startup file may override the system settings, which includes the quota settings (he he).
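The lookup order described above can be sketched in shell. This is only an illustration, not Wget's own code: the helper names system_wgetrc and user_wgetrc are hypothetical, and preferring WGETRC over $HOME/.wgetrc when both exist is an assumption (the text only lists the two places).

```sh
#!/bin/sh
# Sketch of the init-file lookup order described above (not Wget's own code).

# System-wide file; the path is the default quoted in the text.
system_wgetrc() {
    printf '%s\n' "/usr/local/lib/wgetrc"
}

# User file: ASSUMED to be $WGETRC when set and non-empty, else $HOME/.wgetrc.
user_wgetrc() {
    if [ -n "${WGETRC:-}" ]; then
        printf '%s\n' "$WGETRC"
    else
        printf '%s\n' "$HOME/.wgetrc"
    fi
}

# Files in load order; settings from the later file may override the earlier one.
printf 'load order: %s then %s\n' "$(system_wgetrc)" "$(user_wgetrc)"
```

Running the script with WGETRC unset prints the system-wide path followed by the .wgetrc in your home directory.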
The syntax of each line of the startup file is simple:
variable = value
Valid values are different for different variables. The complete set of commands is listed below, with the word after the equals sign denoting the value the command takes. It is on/off for on or off (which can also be 1 or 0), string for any string, or N for a positive integer. For example, you may specify "use_proxy = off" to disable the use of proxy servers by default. You may use inf for an infinite value (the role of 0 on the command line), where appropriate. The commands are case-insensitive and underscore-insensitive, thus DIr_Prefix is the same as dirprefix. Empty lines, lines consisting of spaces, and lines beginning with '#' are skipped.
Most of the commands have their equivalent command-line option, except some more obscure or rarely used ones. A sample init file is provided in the distribution, named sample.wgetrc.
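The line syntax above can be illustrated with a small shell sketch. It is not Wget's parser: normalize_key and parse_wgetrc are hypothetical helpers that only mimic the rules stated in the text (case and underscores ignored, blank and '#' lines skipped). As a simplification, the sketch trims leading spaces before testing for '#'.

```sh
#!/bin/sh
# Sketch: read "variable = value" lines the way the text describes.
# Not Wget's actual parser; both function names are hypothetical.

normalize_key() {
    # Drop underscores, then lowercase: DIr_Prefix -> dirprefix.
    printf '%s' "$1" | tr -d '_' | tr '[:upper:]' '[:lower:]'
}

# Print "key=value" for each meaningful line of an init file read on stdin.
parse_wgetrc() {
    while IFS= read -r line || [ -n "$line" ]; do
        # Trim leading and trailing whitespace.
        line=$(printf '%s\n' "$line" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        case "$line" in
            ''|'#'*) continue ;;   # empty, blank, or comment line
            *=*) ;;                # looks like "variable = value"
            *) continue ;;         # anything else is ignored in this sketch
        esac
        key=$(printf '%s\n' "${line%%=*}" | sed 's/[[:space:]]*$//')
        value=$(printf '%s\n' "${line#*=}" | sed 's/^[[:space:]]*//')
        printf '%s=%s\n' "$(normalize_key "$key")" "$value"
    done
}

# Example input using commands named in the text.
parse_wgetrc <<'EOF'
# sample settings
use_proxy = off
DIr_Prefix = /tmp/downloads

quota = 5m
EOF
```

The example prints useproxy=off, dirprefix=/tmp/downloads and quota=5m, one per line; the /tmp/downloads value is made up for the demonstration.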
accept/reject = string: Same as -A/-R.
add_hostdir = on/off: Enable/disable host-prefixed file names. -nH disables it.
always_rest = on/off: Enable/disable continuation of the retrieval, the same as -c.
base = string: Set base for relative URLs, the same as -B.
convert_links = on/off: Convert non-relative links locally. The same as -k.
debug = on/off: Debug mode, same as -d.
dir_mode = N: Set permission modes of created subdirectories (default is 755).
dir_prefix = string: Top of directory tree, the same as -P.
dirstruct = on/off: Turn dirstruct on or off, the same as -x or -nd, respectively.
domains = string: Same as -D.
follow_ftp = on/off: Follow FTP links from HTML documents, the same as -f.
force_html = on/off: If set to on, force the input filename to be regarded as an HTML document, the same as -F.
ftp_proxy = string: Use the string as FTP proxy, instead of the one specified in the environment.
glob = on/off: Turn globbing on/off, the same as -g.
header = string: Define an additional header, like --header.
http_passwd = string: Set HTTP password.
http_proxy = string: Use the string as HTTP proxy, instead of the one specified in the environment.
http_user = string: Set HTTP user.
input = string: Read the URLs from filename, like -i.
kill_longer = on/off: Consider data longer than specified in the content-length header as invalid (and retry getting it). The default behaviour is to save as much data as there is, provided there is more than or equal to the value in content-length.
logfile = string: Set logfile, the same as -o.
login = string: Your user name on the remote machine, for FTP. Defaults to "anonymous".
mirror = on/off: Turn mirroring on/off. The same as -m.
noclobber = on/off: Same as -nc.
no_parent = on/off: Same as --no-parent.
no_proxy = string: Use the string as the comma-separated list of domains to avoid in proxy loading, instead of the one specified in the environment.
num_tries = N: Set number of retries per URL, the same as -t.
output_document = string: Set the output filename, the same as -O.
passwd = string: Your password on the remote machine, for FTP. Defaults to firstname.lastname@example.org.
quiet = on/off: Quiet mode, the same as -q.
quota = quota: Specify the download quota, which is useful to put in /usr/local/lib/wgetrc. When a download quota is specified, wget will stop retrieving after the download sum has become greater than the quota. The quota can be specified in bytes (default), kbytes ('k' appended) or mbytes ('m' appended). Thus "quota = 5m" will set the quota to 5 mbytes. Note that the user's startup file overrides system settings.
reclevel = N: Recursion level, the same as -l.
recursive = on/off: Recursive on/off, the same as -r.
relative_only = on/off: Follow only relative links (the same as -L). Refer to section FOLLOWING LINKS for a more detailed description.
robots = on/off: Use (or not) the robots.txt file.
server_response = on/off: Choose whether or not to print the HTTP and FTP server responses, the same as -S.
simple_host_check = on/off: Same as -nh.
span_hosts = on/off: Same as -H.
timeout = N: Set timeout value, the same as -T.
timestamping = on/off: Turn timestamping on/off. The same as -N.
use_proxy = on/off: Turn proxy support on/off. The same as -Y.
verbose = on/off: Turn verbose on/off, the same as -v/-nv.
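The quota values accepted by the quota command in the list above ("5m", a 'k' suffix, or plain bytes) can be converted with a few lines of shell. This is a sketch, not Wget's code: quota_to_bytes and quota_exceeded are hypothetical names, and treating 1 kbyte as 1024 bytes is an assumption the text does not confirm.

```sh
#!/bin/sh
# Sketch: convert a quota string such as "5m" into bytes.
# ASSUMES 1 kbyte = 1024 bytes and 1 mbyte = 1024 kbytes; the text does not say.

quota_to_bytes() {
    case "$1" in
        *[kK]) echo $(( ${1%?} * 1024 )) ;;
        *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Succeeds (exit status 0) once the downloaded byte count is greater than the quota.
# Usage: quota_exceeded DOWNLOADED_BYTES QUOTA
quota_exceeded() {
    [ "$1" -gt "$(quota_to_bytes "$2")" ]
}

quota_to_bytes 5m   # prints 5242880 under the 1024-based assumption
```

The comparison is strictly "greater than", matching the wording that retrieval stops after the download sum has become greater than the quota.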
Wget also supports proxies. It can take proxy settings from the environment, or they can be specified explicitly. Here is how to specify proxy settings via the environment:
export http_proxy="http://www-proxy.mycompany.com:8081/"
export https_proxy="https://www-proxy.mycompany.com:8081/"
If the proxy requires authentication, you also need the two command-line options --proxy-user=USER and --proxy-passwd=PASSWORD (note the use of the minus sign, not the underscore).
Most of the URL conventions described in RFC1738 are supported. Two alternative syntaxes are also supported, which means you can use three forms of address to specify a file:
Normal URL (recommended form):
FTP only (ncftp-like): hostname:/dir/file
HTTP only (netscape-like):
You may encode your username and/or password to URL using
If you do not understand these syntaxes, just use the plain ordinary syntax with which you would call lynx or netscape. Note that the alternative forms are deprecated, and may cease being supported in the future.
There are quite a few command-line options for wget. Note that you do not have to know or to use them unless you wish to change the default behaviour of the program. For simple operations you need no options at all. It is also a good idea to put frequently used command-line options in .wgetrc, where they can be stored in a more readable form.
This is the complete list of options with descriptions, sorted in descending order of importance:
-h --help: Print a help screen. You will also get help if you do not supply command-line arguments.
-V --version: Display version of wget.
-v --verbose: Verbose output, with all the available data. The default output consists only of saving updates and error messages. If the output is stdout, verbose is the default.
-q --quiet: Quiet mode, with no output at all.
-d --debug: Debug output; works only if wget was compiled with -DDEBUG. Note that when the program is compiled with debug output, it is not printed unless you specify -d.
-i filename --input-file=filename: Read URLs from filename, in which case no URLs need to be on the command line. If there are URLs both on the command line and in a filename, those on the command line are first to be retrieved. The filename need not be an HTML document (but no harm if it is); it is enough if the URLs are just listed sequentially.
However, if you specify --force-html, the document will be regarded as HTML. In that case you may have problems with relative links, which you can solve either by adding <base href="url"> to the document or by specifying --base=url on the command line.
-o logfile --output-file=logfile: Log messages to logfile, instead of the default stdout. Verbose output is now the default for logfiles. If you do not wish it, use -nv (non-verbose).
-a logfile --append-output=logfile: Append to logfile. Same as -o, but appends to the logfile (creating a new one if the old one does not exist) instead of overwriting the old log file.
-t num --tries=num: Set number of retries to num. Specify 0 for infinite retrying.
--follow-ftp: Follow FTP links from HTML documents.
-c --continue-ftp: Continue retrieval of FTP documents, from where it was left off. If you specify "wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z", and there is already a file named ls-lR.Z in the current directory, wget will continue retrieval from the offset equal to the length of the existing file. Note that you do not need to specify this option if all you want is for wget to continue retrieving where it left off when the connection is lost; wget does this by default. You need this option when you want to continue retrieval of a file already halfway retrieved, saved by other FTP software, or left by wget being killed.
-g on/off --glob=on/off: Turn FTP globbing on or off. By default, globbing will be turned on if the URL contains globbing characters (an asterisk, for example). Globbing means you may use the special characters (wildcards) to retrieve more files from the same directory at once, like wget ftp://gnjilux.cc.fer.hr/*.msg. Globbing currently works only on UNIX FTP servers.
-e command --execute=command: Execute command, as if it were a part of the .wgetrc file. A command invoked this way will take precedence over the same command in .wgetrc, if there is one.
-N --timestamping: Use the so-called time-stamps to determine whether to retrieve a file. If the last-modification date of the remote file is equal to, or older than, that of the local file, and the sizes of the files are equal, the remote file will not be retrieved. This option is useful for weekly mirroring of HTTP or FTP sites, since it will not permit downloading of the same file twice.
-F --force-html: When input is read from a file, force it to be HTML. This enables you to retrieve relative links from existing HTML files on your local disk, by adding <base href="url"> to the HTML, or using --base.
-B base href --base=base href: Use base href as base reference, as if it were in the file, in the form <base href="base href">. Note that the base in the file will take precedence over the one on the command line.
-r --recursive: Recursive web-suck. According to the protocol of the URL, this can mean two things. Recursive retrieval of an HTTP URL means that Wget will download the URL you want, parse it as an HTML document (if an HTML document it is), and retrieve the files this document is referring to, down to a certain depth (default 5; change it with -l). Wget will create a hierarchy of directories locally, corresponding to the one found on the HTTP server.
This option is ideal for presentations, where slow connections should be bypassed. The results will be especially good if relative links were used, since the pages will then work in the new location without change.
When using this option with an FTP URL, it will retrieve all the data from the given directory and subdirectories, similar to HTTP recursive retrieval.
You should be warned that invoking this option may cause grave overloading of your connection. The load can be minimized by lowering the maximal recursion level (see -l) and/or by lowering the number of retries (see -t).
-m --mirror: Turn on mirroring options. This will set recursion and time-stamping, combining -r and -N.
-l depth --level=depth: Set recursion depth level to the specified level. Default is 5. After the given recursion level is reached, the sucking will proceed from the parent. Thus specifying -r -l1 should equal a recursion-less retrieve from file. Setting the level to zero makes recursion depth (theoretically) unlimited. Note that the number of retrieved documents will increase exponentially with the depth level.
-H --span-hosts: Enable spanning across hosts when doing recursive retrieving. See -r and -D. Refer to FOLLOWING LINKS for a more detailed description.
-L --relative: Follow only relative links. Useful for retrieving a specific homepage without any distractions, not even those from the same host. Refer to FOLLOWING LINKS for a more detailed description.
-D domain-list --domains=domain-list: Set domains to be accepted and DNS looked-up, where domain-list is a comma-separated list. Note that it does not turn on -H. This speeds things up, even if only one host is spanned. Refer to FOLLOWING LINKS for a more detailed description.
-A acclist / -R rejlist --accept=acclist / --reject=rejlist: Comma-separated list of extensions to accept/reject. For example, if you wish to download only GIFs and JPEGs, you will use -A gif,jpg,jpeg. If you wish to download everything except cumbersome MPEGs and .AU files, you will use -R mpg,mpeg,au.
-X list --exclude-directories list: Comma-separated list of directories to exclude from FTP fetching.
-P prefix --directory-prefix=prefix: Set directory prefix ("." by default) to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to.
-T value --timeout=value: Set the read timeout to a specified value. Whenever a read is issued, the file descriptor is checked for a possible timeout, which could otherwise leave a pending connection (uninterrupted read). The default timeout is 900 seconds (fifteen minutes).
-Y on/off --proxy=on/off: Turn proxy on or off. The proxy is on by default if the appropriate environment variable is defined.
-Q quota[KM] --quota=quota[KM]: Specify download quota, in bytes (default), kilobytes or megabytes. More useful for the rc file; see the quota command in the .wgetrc command list.
-O filename --output-document=filename: The documents will not be written to the appropriate files, but all will be appended to a unique file name specified by this option. The number of tries will be automatically set to 1. If this filename is `-', the documents will be written to stdout, and --quiet will be turned on. Use this option with caution, since it turns off all the diagnostics Wget can otherwise give about various errors.
-S --server-response: Print the headers sent by the HTTP server and/or responses sent by the FTP server.
-s --save-headers: Save the headers sent by the HTTP server to the file, before the actual contents.
--header: Define an additional header. You can define more than one additional header. Do not try to terminate the header with CR or LF.
--http-user / --http-passwd: Use these two options to set the username and password Wget will send to HTTP servers. Wget supports only the basic WWW authentication scheme.
-nv: Non-verbose. Turn off verbose without being completely quiet (use -q for that), which means that error messages and basic information still get printed.
-nd: Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions .n).
-x: The opposite of -nd. Force creation of a hierarchy of directories even if it would not have been done otherwise.
-nh: Disable time-consuming DNS lookup of almost all hosts. Refer to FOLLOWING LINKS for a more detailed description.
--no-parent: Do not ascend to the parent directory.
-k: Convert the non-relative links to relative ones locally.
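The rule stated for -N in the option list above can be written as a small decision function. This is a sketch of the rule as worded in the text, not Wget's implementation; should_retrieve is a hypothetical name, and passing modification times as seconds since the epoch is an assumption made for the example.

```sh
#!/bin/sh
# Sketch of the -N decision rule as worded above (not Wget's own code).
# Arguments: remote mtime, remote size, local mtime, local size.
# Times are ASSUMED to be seconds since the epoch; sizes are in bytes.

should_retrieve() {
    remote_mtime=$1
    remote_size=$2
    local_mtime=$3
    local_size=$4
    if [ "$remote_mtime" -le "$local_mtime" ] && [ "$remote_size" -eq "$local_size" ]; then
        return 1    # remote is not newer and sizes match: skip it
    fi
    return 0        # otherwise fetch the remote file
}

# Same timestamp and same size: nothing to do.
if should_retrieve 1000 2048 1000 2048; then echo fetch; else echo skip; fi
```

The example prints skip; changing either the remote timestamp to a later value or the remote size to a different value makes it print fetch.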
Recursive retrieving has a mechanism that allows you to specify which links wget will follow.
Only relative links: When only relative links are followed (option -L), recursive retrieving will never span hosts. gethostbyname will never get called, and the process will be very fast, with minimal strain on the network. This will suit your needs most of the time, especially when mirroring the output of *2html converters, which generally produce only relative links.
Host checking: The drawback of following only relative links is that humans often tend to mix them with absolute links to the very same host, and the very same page. In this mode (which is the default), all URLs that refer to the same host will be retrieved.
The problem with this option is the aliases of hosts and domains. There is no way for wget to know that regoc.srce.hr and www.srce.hr are the same host, or that fly.cc.fer.hr is the same as fly.cc.etf.hr. Whenever an absolute link is encountered, gethostbyname is called to check whether we are really on the same host. Although the results of gethostbyname are hashed, so that it will never get called twice for the same host, it still presents a nuisance, e.g. in large indexes of different hosts, when each of them has to be looked up. You can use -nh to prevent such complex checking, and then wget will just compare the hostnames. Things will run much faster, but also much less reliably.
Domain acceptance: With the -D option you may specify domains that will be followed. The nice thing about this option is that hosts that are not from those domains will not get DNS-looked up. Thus you may specify -Dmit.edu, just to make sure that nothing outside .mit.edu gets looked up. This is very important and useful. It also means that -D does not imply -H (it must be explicitly specified). Feel free to use this option, since it will speed things up greatly, with almost all the reliability of host checking of all hosts.
Of course, domain acceptance can be used to limit the retrieval to particular domains while freely spanning hosts within the domain, but then you must explicitly specify -H.
All hosts: When -H is specified without -D, all hosts are spanned. It is useful to set the recursion level to a small value in those cases. This option is rarely useful.
FTP: The rules for FTP are somewhat specific, since they have to be. To have FTP links followed from HTML documents, you must specify -f (follow_ftp). If you do specify it, FTP links will be able to span hosts even if span_hosts is not set. Option relative_only (-L) has no effect on FTP. However, domain acceptance (-D) and suffix rules (-A/-R) still apply.
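The domain acceptance (-D) and suffix (-A/-R) rules discussed in this section both take comma-separated lists, and their effect can be approximated in shell. This is a rough sketch, not Wget's matching code: domain_accepted and suffix_accepted are hypothetical helpers, and plain suffix matching on the host name and on the file name is an assumption.

```sh
#!/bin/sh
# Sketch of -D (domain list) and -A (suffix list) checks. Not Wget's own code;
# plain suffix matching on the host name and file name is an ASSUMPTION.

# domain_accepted HOST DOMAIN-LIST: succeeds if HOST equals or ends in a listed domain.
domain_accepted() {
    host=$1
    old_ifs=$IFS
    IFS=,
    for domain in $2; do
        case "$host" in
            "$domain"|*."$domain") IFS=$old_ifs; return 0 ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

# suffix_accepted FILE EXT-LIST: succeeds if FILE ends in ".ext" for a listed ext.
suffix_accepted() {
    file=$1
    old_ifs=$IFS
    IFS=,
    for ext in $2; do
        case "$file" in
            *."$ext") IFS=$old_ifs; return 0 ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

domain_accepted www.mit.edu mit.edu && echo "www.mit.edu is inside mit.edu"
suffix_accepted picture.jpeg gif,jpg,jpeg && echo "picture.jpeg matches the accept list"
```

A reject list (-R) would be the negation of suffix_accepted with the reject list passed in. The file name picture.jpeg is made up for the demonstration.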
Wget will catch the SIGHUP (hangup signal) and ignore it. If the output was on stdout, it will be redirected to a file named wget-log. This is also convenient when you wish to redirect the output of Wget interactively.
$ wget http://www.ifi.uio.no/~larsi/gnus.tar.gz &
$ kill -HUP %% # to redirect the output
Wget will not try to handle any signals other than SIGHUP. Thus you may interrupt Wget using ^C or SIGTERM.
Get URL http://fly.cc.fer.hr/:
wget http://fly.cc.fer.hr/
Force non-verbose output:
wget -nv http://fly.cc.fer.hr/
Unlimit number of retries:
wget -t0 http://www.yahoo.com/
Create a mirror image of fly's web (with the same directory structure the original has), up to six recursion levels, with only one try per document, saving the verbose output to log file 'log':
wget -r -l6 -t1 -o log http://fly.cc.fer.hr/
Retrieve from yahoo host only (depth 50):
wget -r -l50 http://www.yahoo.com/
Hrvoje Niksic is the author of Wget. Thanks to the beta testers and all the other people who helped with useful suggestions.
"Single-threaded downloading has its benefits, especially when Wget is concerned. Other download managers have internal databases to help them keep track of which parts of files are already downloaded. Wget gets this information simply by scanning a file's size. This means that Wget is able to continue downloading a file which another application started to download; most other download managers lack this feature. Usually I start by downloading a file with my browser, and if it is too large, I stop downloading and finish it later with Wget."
Command Line. Advanced users will be able to continue to access files from external ftp sites using the wget command.
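The size-based resuming mentioned for -c and in the quotation above (continue from the offset equal to the length of the existing file) comes down to reading a file size. The sketch below shows only that step; resume_offset is a hypothetical helper, not part of Wget.

```sh
#!/bin/sh
# Sketch: the resume offset is simply the size of the partial file on disk.
# Not Wget's own code; resume_offset is a hypothetical helper.

resume_offset() {
    if [ -f "$1" ]; then
        # wc -c prints the byte count; tr strips the padding some systems add.
        wc -c < "$1" | tr -d ' '
    else
        echo 0      # nothing on disk yet: start from the beginning
    fi
}

# Demonstration with a throwaway five-byte file.
tmp=$(mktemp)
printf 'hello' > "$tmp"
resume_offset "$tmp"   # prints 5
rm -f "$tmp"
```

Because the offset comes from the file itself and not from a private database, a file started by another program can be continued the same way.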
Use the --proxy-user=USER --proxy-passwd=PASSWORD command line options.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.
ABUSE: IPs or network segments from which we detect a stream of probes might be blocked for no less than 90 days. Multiple types of probes increase this period.
Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.
Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: October 20, 2015