Wget

Wget is a utility for retrieving documents across the Web via HTTP (Hypertext Transfer Protocol) and FTP (File Transfer Protocol) and saving them to disk. Unlike most web browsers, Wget is non-interactive, which means it can work in the background while the user is not logged in: you may start the program and log off, letting it do its work. By analyzing server responses, it distinguishes between correctly and incorrectly retrieved documents, and retries retrieving them as many times as necessary, or until a user-specified limit is reached. The FTP REST command is used to restart transfers on hosts that support it. Proxy servers are supported to speed up retrieval and lighten network load.

Wget supports a full-featured recursion mechanism, through which you can retrieve large parts of the web, creating local copies of remote directory hierarchies. The maximum recursion level and other parameters can be specified, and infinite recursion loops are avoided by hashing the retrieved data. All of this works for both HTTP and FTP.

The retrieval is conveniently traced by printing dots, each dot representing one kilobyte of received data. Built-in mechanisms let you tune which links to follow (cf. -L, -D and -H).

Environment variables used

http_proxy, ftp_proxy, no_proxy, WGETRC, HOME


Startup files used

/usr/local/lib/wgetrc, $HOME/.wgetrc

Startup file .wgetrc

Wget supports the use of the initialization file .wgetrc. First, a system-wide init file is looked for (/usr/local/lib/wgetrc by default) and loaded. Then the user's file is searched for in two places: the path named by the environment variable WGETRC (presumed to hold the full pathname), and $HOME/.wgetrc. Note that settings in the user's startup file override the system settings, including the quota settings.

The syntax of each line of the startup file is simple:

variable = value

Valid values differ from variable to variable. The complete set of commands is listed below, with the notation after the equals sign denoting the kind of value a command takes: on/off for on or off (which can also be 1 or 0), string for any string, and N for a positive integer. For example, you may specify "use_proxy = off" to disable the use of proxy servers by default. You may use inf for an infinite value (the role of 0 on the command line), where appropriate. The commands are case-insensitive and underscore-insensitive, so DIr_Prefix is the same as dirprefix. Empty lines, lines consisting of spaces, and lines beginning with '#' are skipped.

Most of the commands have their equivalent command-line option, except some more obscure or rarely used ones. A sample init file is provided in the distribution, named sample.wgetrc.
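For illustration, a short init file might look like this. The values are examples rather than defaults, and the directory path is hypothetical; all commands used here are described in the list below:

```shell
# Sample ~/.wgetrc (illustrative values only)

# Retry each URL up to 10 times
num_tries = 10

# Do not use proxies by default
use_proxy = off

# Stop after 5 megabytes have been downloaded in total
quota = 5m

# Save everything under this directory (hypothetical path)
dir_prefix = /tmp/downloads
```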

accept/reject = string

Same as -A/-R.

add_hostdir = on/off

Enable/disable host-prefixed directories. -nH disables it.

always_rest = on/off

Enable/disable continuation of the retrieval, the same as -c.

base = string

Set base for relative URL-s, the same as -B.

convert_links = on/off

Convert non-relative links locally. The same as -k.

debug = on/off

Debug mode, same as -d.

dir_mode = N

Set permission modes of created subdirectories (default is 755).

dir_prefix = string

Top of directory tree, the same as -P.

dirstruct = on/off

Turning dirstruct on or off, the same as -x or -nd, respectively.

domains = string

Same as -D.

follow_ftp = on/off

Follow FTP links from HTML documents, the same as -f.

force_html = on/off

If set to on, force the input filename to be regarded as an HTML document, the same as -F.

ftp_proxy = string

Use the string as FTP proxy, instead of the one specified in environment.

glob = on/off

Turn globbing on/off, the same as -g.

header = string

Define an additional header, like --header.

http_passwd = string

Set HTTP password.

http_proxy = string

Use the string as HTTP proxy, instead of the one specified in environment.

http_user = string

Set HTTP user.

input = string

Read the URL-s from filename, like -i.

kill_longer = on/off

Consider data longer than specified in the content-length header invalid (and retry getting it). The default behaviour is to save as much data as there is, provided it is no less than the value in content-length.

logfile = string

Set logfile, the same as -o.

login = string

Your user name on the remote machine, for FTP. Defaults to "anonymous".

mirror = on/off

Turn mirroring on/off. The same as -m.

noclobber = on/off

Same as -nc.

no_parent = on/off

Same as --no-parent.

no_proxy = string

Use the string as the comma-separated list of domains for which the proxy should be avoided, instead of the one specified in the environment.

num_tries = N

Set number of retries per URL, the same as -t.

output_document = string

Set the output filename, the same as -O.

passwd = string

Your password on the remote machine, for FTP. Defaults to username@hostname.domainname.

quiet = on/off

Quiet mode, the same as -q.

quota = quota

Specify the download quota, which is useful to put in /usr/local/lib/wgetrc. When a download quota is specified, wget stops retrieving after the downloaded total exceeds the quota. The quota can be specified in bytes (the default), kilobytes ('k' appended) or megabytes ('m' appended). Thus "quota = 5m" sets the quota to 5 megabytes. Note that the user's startup file overrides system settings.

reclevel = N

Recursion level, the same as -l.

recursive = on/off

Recursive on/off, the same as -r.

relative_only = on/off

Follow only relative links (the same as -L). Refer to section FOLLOWING LINKS for a more detailed description.

robots = on/off

Use (or not) robots.txt file.

server_response = on/off

Choose whether or not to print the HTTP and FTP server responses, the same as -S.

simple_host_check = on/off

Same as -nh.

span_hosts = on/off

Same as -H.

timeout = N

Set timeout value, the same as -T.

timestamping = on/off

Turn timestamping on/off. The same as -N.

use_proxy = on/off

Turn proxy support on/off. The same as -Y.

verbose = on/off

Turn verbose on/off, the same as -v/-nv.


Using proxy

Wget also supports proxies. It can take proxy settings from the environment, or they can be specified explicitly. Here is how to specify proxy settings via environment variables:

export http_proxy=""
export https_proxy=""

If the proxy requires authentication, you also need to use two command-line options, --proxy-user=USER and --proxy-passwd=PASSWORD (note the use of a minus sign, not an underscore):
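As a sketch, assuming a hypothetical proxy at proxy.example.com port 8080 and hypothetical credentials (--proxy-user and --proxy-passwd are the two options in question):

```shell
# Hypothetical proxy host -- substitute your own.
export http_proxy="http://proxy.example.com:8080/"
export ftp_proxy="http://proxy.example.com:8080/"

# Pass the proxy credentials on the command line:
wget --proxy-user=alice --proxy-passwd=secret http://www.example.com/file.txt
```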




URL Conventions

Most of the URL conventions described in RFC 1738 are supported. Two alternative syntaxes are also supported, which means you can use three forms of address to specify a file:

Normal URL (recommended form): http://host[:port]/path or ftp://host/path

FTP only (ncftp-like): hostname:/dir/file

HTTP only (netscape-like): hostname[:port]/dir/file

You may encode your username and/or password into the URL using the form: ftp://user:password@host/dir/file
If you do not understand these syntaxes, just use the plain ordinary syntax with which you would call lynx or netscape. Note that the alternative forms are deprecated, and may cease being supported in the future.


Command-line options

There are quite a few command-line options for wget. Note that you do not have to know or use them unless you wish to change the default behaviour of the program. For simple operations you need no options at all. It is also a good idea to put frequently used command-line options in .wgetrc, where they can be stored in a more readable form.

This is the complete list of options with descriptions, sorted in descending order of importance:

-h --help

Print a help screen. You will also get help if you do not supply command-line arguments.

-V --version

Display version of wget.

-v --verbose

Verbose output, with all the available data. The default output consists only of saving updates and error messages. When output goes to stdout, verbose is the default.

-q --quiet

Quiet mode, with no output at all.

-d --debug

Debug output; works only if wget was compiled with -DDEBUG. Note that even when the program is compiled with debug support, debug output is not printed unless you specify -d.

-i filename --input-file=filename

Read URL-s from filename, in which case no URL-s need to be given on the command line. If there are URL-s both on the command line and in a file, those on the command line are retrieved first. The file need not be an HTML document (though no harm if it is); it is enough if the URL-s are listed sequentially.

However, if you specify --force-html, the document will be regarded as HTML. In that case you may have problems with relative links, which you can solve either by adding <base href="url"> to the document or by specifying --base=url on the command line.
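A sketch of the -i workflow, using hypothetical URLs:

```shell
# Put one URL per line in a plain-text file (hypothetical addresses):
cat > urls.txt <<'EOF'
http://www.example.com/a.html
ftp://ftp.example.com/pub/b.txt
EOF

# Retrieve every URL listed in the file:
wget -i urls.txt
```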

-o logfile --output-file=logfile

Log messages to logfile instead of the default stdout. Verbose output is the default when logging to a file; if you do not want it, use -nv (non-verbose).

-a logfile --append-output=logfile

Append to logfile; the same as -o, but appends to the log file (or creates a new one if it does not exist) instead of overwriting the old one.

-t num --tries=num

Set number of retries to num. Specify 0 for infinite retrying.


-f --follow-ftp

Follow FTP links from HTML documents.

-c --continue-ftp

Continue retrieval of an FTP document from where it was left off. For example, if you specify wget -c for a URL whose file, say ls-lR.Z, already partially exists in the current directory, wget will continue retrieval from an offset equal to the length of the existing file. Note that you do not need this option if you only want wget to continue retrieving where it left off when the connection is lost; wget does this by default. You need this option when you want to continue retrieval of a file that is already halfway retrieved, saved by other FTP software, or left behind by a killed wget.
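A sketch of resuming a partial download (the URL is hypothetical):

```shell
# Start a large download and interrupt it (Ctrl-C, or a killed session):
wget ftp://ftp.example.com/pub/ls-lR.Z

# Resume later; wget continues from the existing file's length:
wget -c ftp://ftp.example.com/pub/ls-lR.Z
```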

-g on/off --glob=on/off

Turn FTP globbing on or off. By default, globbing is turned on if the URL contains globbing characters (an asterisk, for example). Globbing means you may use wildcard characters to retrieve several files from the same directory at once, e.g. *.msg. Globbing currently works only on UNIX FTP servers.

-e command --execute=command

Execute command, as if it were a part of .wgetrc file. A command invoked this way will take precedence over the same command in .wgetrc, if there is one.

-N --timestamping

Use so-called time-stamps to determine whether to retrieve a file. If the last-modification date of the remote file is equal to or older than that of the local file, and the file sizes are equal, the remote file will not be retrieved. This option is useful for weekly mirroring of HTTP or FTP sites, since it will not permit downloading the same file twice.

-F --force-html

When input is read from a file, force it to be treated as HTML. This enables you to retrieve relative links from existing HTML files on your local disk, by adding <base href="url"> to the HTML, or by using --base.

-B base-href --base=base-href

Use base-href as the base reference, as if it appeared in the file in the form <base href="base-href">. Note that a base element in the file itself takes precedence over the one given on the command line.

-r --recursive

Recursive web-suck. Depending on the protocol of the URL, this can mean two things. Recursive retrieval of an HTTP URL means that Wget will download the URL you request, parse it as an HTML document (if it is one), and retrieve the files that document refers to, down to a certain depth (default 5; change it with -l). Wget will create a local hierarchy of directories corresponding to the one found on the HTTP server.

This option is ideal for presentations, where slow connections should be bypassed. The results are especially good if relative links were used, since the pages will then work in the new location without change.

When using this option with an FTP URL, it will retrieve all the data from the given directory and subdirectories, similar to HTTP recursive retrieval.

You should be warned that invoking this option may cause grave overloading of your connection. The load can be minimized by lowering the maximal recursion level (see -l) and/or by lowering the number of retries (see -t).

-m --mirror

Turn on mirroring options. This will set recursion and time-stamping, combining -r and -N.
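A mirroring sketch (the host is hypothetical):

```shell
# -m combines -r (recursive) and -N (time-stamping): on a second run,
# only files newer than the local copies are fetched again.
wget -m http://www.example.com/
```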

-l depth --level=depth

Set the recursion depth to the specified level. The default is 5. After the given recursion level is reached, retrieval proceeds from the parent. Thus -r -l1 should be equivalent to a recursion-less retrieval from the file. Setting the level to zero makes the recursion depth (theoretically) unlimited. Note that the number of retrieved documents tends to grow exponentially with the depth level.

-H --span-hosts

Enable spanning across hosts when doing recursive retrieving. See -r and -D. Refer to FOLLOWING LINKS for a more detailed description.

-L --relative

Follow only relative links. Useful for retrieving a specific homepage without any distractions, not even those from the same host. Refer to FOLLOWING LINKS for a more detailed description.

-D domain-list --domains=domain-list

Set domains to be accepted and DNS looked-up, where domain-list is a comma-separated list. Note that it does not turn on -H. This speeds things up, even if only one host is spanned. Refer to FOLLOWING LINKS for a more detailed description.

-A acclist / -R rejlist --accept=acclist / --reject=rejlist

Comma-separated list of extensions to accept/reject. For example, if you wish to download only GIFs and JPEGs, you will use -A gif,jpg,jpeg. If you wish to download everything except cumbersome MPEGs and .AU files, you will use -R mpg,mpeg,au.
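A filtering sketch (the site is hypothetical):

```shell
# Recursively fetch only GIF and JPEG images:
wget -r -A gif,jpg,jpeg http://www.example.com/gallery/

# Fetch everything except bulky MPEG and .AU files:
wget -r -R mpg,mpeg,au http://www.example.com/archive/
```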

-X list --exclude-directories=list

Comma-separated list of directories to exclude from FTP fetching.

-P prefix --directory-prefix=prefix

Set directory prefix ("." by default) to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to.

-T value --timeout=value

Set the read timeout to a specified value. Whenever a read is issued, the file descriptor is checked for a possible timeout, which could otherwise leave a pending connection (uninterrupted read). The default timeout is 900 seconds (fifteen minutes).

-Y on/off --proxy=on/off

Turn proxy on or off. The proxy is on by default if the appropriate environmental variable is defined.

-Q quota[KM] --quota=quota[KM]

Specify download quota, in bytes (default), kilobytes or megabytes. More useful for rc file. See below.

-O filename --output-document=filename

The documents will not be written to their respective files; instead, all of them will be concatenated and written to the single file named by this option. The number of tries is automatically set to 1. If the filename is '-', the documents are written to stdout and --quiet is turned on. Use this option with caution, since it turns off all the diagnostics Wget can otherwise give about various errors.
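A sketch of piping a document through stdout (the URL is hypothetical):

```shell
# Write the document to stdout and filter it, leaving no file on disk:
wget -O - http://www.example.com/index.html | grep -i '<title>'
```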

-S --server-response

Print the headers sent by the HTTP server and/or responses sent by the FTP server.

-s --save-headers

Save the headers sent by the HTTP server to the file, before the actual contents.


--header=string

Define an additional header. You can define more than one additional header. Do not try to terminate the header with CR or LF.

--http-user --http-passwd

Use these two options to set the username and password Wget will send to HTTP servers. Wget supports only the basic WWW authentication scheme.


-nc

Do not clobber existing files when saving to a directory hierarchy within recursive retrieval of several files. This option is extremely useful when you wish to continue where you left off. If the files are .html or (yuck) .htm, they will be loaded from disk and parsed as if they had been retrieved from the Web.


-nv

Non-verbose: turn off verbose output without being completely quiet (use -q for that); error messages and basic information still get printed.


-nd

Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files are saved to the current directory without clobbering (if a name shows up more than once, the filenames get extensions .n).


-x

The opposite of -nd: force creation of a hierarchy of directories even if it would not have been created otherwise.


-nh

Disable the time-consuming DNS lookup of almost all hosts. Refer to FOLLOWING LINKS for a more detailed description.


-nH

Disable host-prefixed directories. By default, retrieval produces a directory named after the host, in which everything else is saved. This option disables such behaviour.


--no-parent

Do not ascend to the parent directory.

-k --convert-links

Convert the non-relative links to relative ones locally.

Following links

Recursive retrieving has a mechanism that allows you to specify which links wget will follow.

Only relative links

When only relative links are followed (option -L), recursive retrieval will never span hosts. gethostbyname will never get called, and the process will be very fast, with minimum strain on the network. This will suit your needs most of the time, especially when mirroring the output of *2html converters, which generally produce only relative links.

Host checking

The drawback of following only relative links is that humans often mix them with absolute links to the very same host, and the very same page. In this mode (which is the default), all URL-s that refer to the same host will be retrieved.

The problem with this option is host and domain aliases: wget has no way of knowing that two different hostnames (for example, a host and its www alias) refer to the same machine. Whenever an absolute link is encountered, gethostbyname is called to check whether we are really on the same host. Although the results of gethostbyname are hashed, so it is never called twice for the same host, it is still a nuisance, e.g. with large indexes of different hosts, when each of them has to be looked up. You can use -nh to prevent such thorough checking, in which case wget just compares the hostnames. Things will run much faster, but also much less reliably.

Domain acceptance

With the -D option you may specify the domains that will be followed. The nice thing about this option is that hosts outside those domains will not get DNS-looked-up. Thus you may specify your local domains, just to make sure that nothing outside them gets looked up. It also means that -D does not imply -H (spanning must be explicitly specified). Feel free to use this option, since it speeds things up greatly, with almost all the reliability of checking all hosts.

Of course, domain acceptance can be used to limit the retrieval to particular domains while freely spanning hosts within those domains, but then you must explicitly specify -H.

All hosts

When -H is specified without -D, all hosts are spanned. In that case it is useful to set the recursion level to a small value. This option is rarely useful.

FTP

The rules for FTP are somewhat specific, since they have to be. To have FTP links followed from HTML documents, you must specify -f (follow_ftp). If you do, FTP links will be able to span hosts even if span_hosts is not set. The relative_only option (-L) has no effect on FTP. However, domain acceptance (-D) and suffix rules (-A/-R) still apply.



Signals

Wget catches SIGHUP (the hangup signal) and ignores it. If the output was going to stdout, it will be redirected to a file named wget-log. This is convenient when you wish to redirect Wget's output after the fact:
$ wget &
$ kill -HUP %% # to redirect the output
Wget does not try to handle any signals other than SIGHUP; thus you may interrupt Wget using ^C or SIGTERM.



Examples

Force non-verbose output:

wget -nv

Unlimit number of retries:

wget -t0

Create a mirror image of fly's web (with the same directory structure the original has), up to six recursion levels, with only one try per document, saving the verbose output to log file 'log':

wget -r -l6 -t1 -o log

Retrieve from yahoo host only (depth 50):

wget -r -l50
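For concreteness, here are the same invocations with a hypothetical host (www.example.com) standing in for the real URLs:

```shell
# Non-verbose retrieval of a single page:
wget -nv http://www.example.com/

# Retry forever:
wget -t0 http://www.example.com/big.iso

# Mirror up to six levels deep, one try per document, verbose log saved to 'log':
wget -r -l6 -t1 -o log http://www.example.com/

# Recursive retrieval of one host, depth 50:
wget -r -l50 http://www.example.com/
```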


Hrvoje Niksic is the author of Wget. Thanks to the beta testers and all the other people who helped with useful suggestions.


Old News ;-)

[Jan 14, 2007] Make Wget cater to your needs

"Single-threaded downloading has its benefits, especially when Wget is concerned. Other download managers have internal databases to help them keep track of which parts of files are already downloaded. Wget gets this information simply by scanning a file's size. This means that Wget is able to continue downloading a file which another application started to download; most other download managers lack this feature. Usually I start by downloading a file with my browser, and if it is too large, I stop downloading and finish it later with Wget."

MPA Effects on Software (ITS)

Command Line. Advanced users will be able to continue to access files from external ftp sites using the wget command.
Use the --proxy-user=USER --proxy-passwd=PASSWORD command line options.

Recommended Links


Wget - Wikipedia, the free encyclopedia

GNU Wget - GNU Project - Free Software Foundation (FSF)

Make Wget cater to your needs

Wget's Website

WGET for Windows


Copyright © 1996-2016 by Dr. Nikolai Bezroukov. The site was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.


Last modified: October 20, 2015