The Linux wget command is a command-line utility for downloading files from the internet over HTTP, HTTPS, and FTP. It ships with most Linux distros by default, but if it is not installed on yours, you can install it with your distribution’s package manager.
Ubuntu/Debian-based distros:
sudo apt install wget
Red Hat-based distros (use yum on older releases that do not have dnf):
sudo dnf install wget
sudo yum install wget
Arch Linux-based distros:
sudo pacman -S wget
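If you are not sure which of the commands above applies to your system, a small check like the following can help. This is only a sketch: the function name suggest_wget_install is made up for illustration, and it only prints a suggestion without installing anything.

```shell
# Print the install command that matches the first package manager found.
# Nothing is installed; the function only echoes a suggestion.
suggest_wget_install() {
    if command -v apt >/dev/null 2>&1; then
        echo "sudo apt install wget"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf install wget"
    elif command -v yum >/dev/null 2>&1; then
        echo "sudo yum install wget"
    elif command -v pacman >/dev/null 2>&1; then
        echo "sudo pacman -S wget"
    else
        echo "no supported package manager found"
    fi
}

# Only suggest an install command when wget is missing.
if command -v wget >/dev/null 2>&1; then
    echo "wget is already installed"
else
    suggest_wget_install
fi
```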
Linux wget Features and Usage
The wget command in Linux has the following features that make it very useful for scripting:
- Can download files in the background
- Allows resuming interrupted downloads easily
- Can mirror websites for local browsing
- Can crawl web pages to find broken links
- And many more…
The basic syntax of the wget command is:
wget [options] [URL]
Download a File in Linux with wget Command
What better way to learn something than to do it yourself? Let’s download the Debian ISO file from the official Debian download page. In the command below, I’m downloading the net-install ISO for Debian 10.
wget https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-10.3.0-amd64-netinst.iso
The first thing that the Linux wget command does is look up and resolve the host name in the URL. After that, it shows you detailed information about which server it’s connected to for downloading the file. Lastly, it shows the download progress along with the speed and the ETA.
Without any options, the wget command saves the file under the name taken from the last part of the URL. If you want to save it under a different name, you can make use of the -O option (uppercase letter O) followed by the new file name:
wget https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-10.3.0-amd64-netinst.iso -O debian.iso
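To see which name wget would pick when -O is not given, you can approximate its default with shell parameter expansion. This is a rough sketch, not wget’s actual logic: the helper default_name is hypothetical, and it ignores cases such as redirects or server-supplied names.

```shell
# Approximate the default file name: the text after the last "/" in the URL.
default_name() {
    name="${1##*/}"
    # A URL that ends in "/" has no file part; wget falls back to index.html.
    [ -n "$name" ] || name="index.html"
    echo "$name"
}

url="https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-10.3.0-amd64-netinst.iso"
echo "default name: $(default_name "$url")"
echo "with -O:      debian.iso"
```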
Linux wget Command Options
Now that you know how to download a file and have a basic idea of what the wget command is capable of, let’s see how to make use of the options it offers.
1. Resume an interrupted download with wget
The -c option allows us to continue or resume an interrupted download by repeating the same command. Have a look at the example below:
wget -c https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-10.3.0-amd64-netinst.iso
While the above download is running, if the network disconnects or there’s a keyboard interrupt that stops the download, you can repeat the same command in the directory where the partially downloaded file is saved.
The Linux wget command will continue downloading from where it was interrupted. This is especially useful on unstable networks where you can simply re-run the wget command to resume the download without losing progress.
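On a flaky connection, re-running the command by hand gets tedious, so the retry can be scripted. The sketch below is one possible approach, not a built-in wget feature: the function resume_download and the variables WGET and RETRY_DELAY are names invented for this example.

```shell
# Retry a resumable download until it succeeds or the attempts run out.
# WGET can be pointed at a stub for a dry run; it defaults to the real wget.
WGET="${WGET:-wget}"

resume_download() {
    url="$1"
    max_attempts="${2:-5}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -c continues from the partial file if one already exists
        if "$WGET" -c "$url"; then
            return 0
        fi
        echo "attempt $attempt failed, retrying" >&2
        attempt=$((attempt + 1))
        # Pause between attempts (5 seconds unless RETRY_DELAY is set)
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}
```

You would call it as resume_download <URL> 10 to allow up to ten attempts.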
2. Turn off wget output
Once you know what to expect, most of the wget output is unnecessary. So how can we avoid it? As with most commands in Linux, the wget command gives us options to quiet the output.
wget -q <URL>
wget -nv <URL>
The -q option stands for quiet and turns off all output. The -nv option stands for non-verbose and only shows the messages that are required (completion notices or errors).
But there’s one issue with the above command options. The progress bar gets hidden too. That can be fine depending on your usage but I like to have it displayed.
How to quiet all output except the progress bar with Linux wget?
The wget command provides us with another option for this: --show-progress. If your system has a recent version of wget, the --show-progress option alone brings back the progress bar. If you’re on an older version, you might also need to add the --progress=bar:force option. Let’s use the same example as above and add our new options.
wget -q --show-progress https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-10.3.0-amd64-netinst.iso
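Because -q also hides error messages, a script that uses it has to rely on the exit status of wget to know what happened. The sketch below maps the exit statuses listed in the wget manual to short descriptions. The function name describe_wget_status is made up for this example.

```shell
# Translate a wget exit status into a short description.
describe_wget_status() {
    case "$1" in
        0) echo "success" ;;
        1) echo "generic error" ;;
        2) echo "parse error" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Example: what a status of 4 means
describe_wget_status 4
```

In a script, you would run wget -q <URL> and pass $? to the function right after it.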
3. Download Multiple URLs with wget
You aren’t limited to downloading one file at a time. To download multiple files, list all the URLs in a text file (one URL per line). We’ll use the -i option to pass that file to the wget command.
wget -i <file name>
The wget command will now download the files one after another. You can combine this option with the options above to quiet the output.
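As a quick illustration of the list format, the sketch below writes two URLs to a temporary file and counts them before handing the file to wget. The second URL is a made-up placeholder, and the final line only prints the command instead of running it.

```shell
# Build a URL list with one URL per line.
list_file="$(mktemp)"
cat > "$list_file" <<'EOF'
https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-10.3.0-amd64-netinst.iso
https://example.com/hypothetical-file.tar.gz
EOF

# Count the lines that look like HTTP or HTTPS URLs.
url_count=$(grep -c -E '^https?://' "$list_file")
echo "would download $url_count files with: wget -i $list_file"
```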
4. Downloading Entire Websites With wget
Okay, we talked about downloading and saving entire websites in our introduction and said it is one of the features of this command. Let’s learn how to do it now.
The wget command provides us with a neat option to download or “mirror” an entire website. When the -m option (short for mirror) is paired with the -k option, which rewrites the links within the downloaded HTML files so that they point to the local pages, you get a fully browsable offline copy. The -p option also downloads the page requisites such as images and other media, making it a completely offline website for your use.
To download a website, run the following command:
wget -mk -p <URL>
The command starts downloading all pages of the website one by one. So make sure you don’t leave this command unattended when working on a website with hundreds of thousands of pages.
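For larger sites, it can be worth slowing the mirror down. The sketch below adds three extra wget options that are not used elsewhere in this article: --no-parent (stay below the starting directory), --wait (pause between requests), and --limit-rate (cap the bandwidth). The values and the helper name mirror_cmd are arbitrary choices for illustration, and the function prints the command rather than running it.

```shell
# Print a gentler mirror command for the given site instead of running it.
mirror_cmd() {
    # $1: starting URL of the site or directory to mirror
    printf 'wget -m -k -p --no-parent --wait=1 --limit-rate=500k %s\n' "$1"
}

mirror_cmd "https://example.com/docs/"
```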
5. Crawl a Website for 404 Links With wget
This feature is especially useful when you own a website and simply want to run through your entire website to find any broken links that can be replaced.
There’s a command option, --spider, that crawls through the website page by page, URL by URL, without saving anything, and reports any link that returns a 404 error. We’ll also make use of the -r (recursive crawl) and -l (maximum recursion depth) options to make our crawler follow links. Finally, we’ll add the -o option (lowercase letter o), which saves the wget output in the log file specified after it.
wget -r -l 10 --spider www.google.com -o google.log
If you run the command above, you can expect it to run for a very long time, even with the depth limited to 10 levels, owing to the sheer number of interlinks Google has. I let it run for just a couple of seconds and the command had already generated a 1 MB log file (which is huge for a plain text file).
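A log of that size is easier to search than to read. The sketch below counts the lines that flag a broken link. It rests on an assumption: that the spider marks such entries with the phrase “broken link” in the log, so check a few lines of your own log before relying on it. The function name count_broken_links is made up for this example.

```shell
# Count the log lines that flag a broken link.
# $1: path to a log file written with the -o option
# Assumption: broken links are marked with the phrase "broken link".
count_broken_links() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    grep -c 'broken link' "$1" || true
}
```

With the example above, you would run count_broken_links google.log once the crawl has finished.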
6. Run a Download in the Background
If you’re running a multi-file download that will take a lot of time, it makes no sense to keep a terminal tied up until it finishes. Instead, you can send the wget command to the background using the -b option.
wget -b https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-10.3.0-amd64-netinst.iso
The command immediately prints the process ID, which you can kill if you wish to stop the background download. Since no log file is specified with -o, the output is written to a file named wget-log in the current directory.
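If you want to keep that process ID in a script, you can pull it out of the message wget prints when it detaches. This sketch assumes the message has the form “Continuing in background, pid 1234.”, and the helper name extract_pid is hypothetical.

```shell
# Pull the process ID out of the line wget prints when it detaches.
# Assumption: the message looks like "Continuing in background, pid 1234."
extract_pid() {
    sed -n 's/.*pid \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical usage (needs network access, so it is left commented out):
# pid="$(wget -b "$url" 2>&1 | extract_pid)"
# tail -f wget-log      # follow the download progress
# kill "$pid"           # stop the background download
```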
The Linux wget command can be used for a whole lot more than just downloading files, and I’m sure you now understand the potential of this seemingly simple command. Explore the command options and you’ll discover plenty of other things that are possible with it.
Don’t forget that the man command in Linux is always there to help you out with such commands, apart from the built-in help that wget --help prints.