Fetching Web Pages With C Using Libcurl

Let’s learn to use the libcurl library to fetch web pages. C is a fairly low-level language, and if you have ever worked with sockets in C, you know that dealing with network connections can sometimes be a nightmare. Fetching web pages that way is no easier. In this module, however, we’ll learn how libcurl makes it easy!

What Is Libcurl?

Libcurl is an easy-to-use client-side transfer library that lets us make requests over a wide range of protocols, such as:

  • HTTP
  • HTTPS
  • FTP
  • FTPS
  • IMAP
  • IMAPS
  • And many more…

This saves us a great deal of the trouble of dealing with socket programming, which would otherwise be a nightmare for C programmers. Another benefit of libcurl is that it is highly portable!

Its syntax and build process remain the same regardless of whether you are using Linux, Windows, macOS, etc. This makes working with libcurl very easy and lets us write uniform code without worrying about OS-specific factors.

Installing Libcurl

Libcurl doesn’t usually come installed by default. On Debian-based systems such as Ubuntu, you can install the development package with:

$ sudo apt-get install libcurl4-gnutls-dev

Fetching Web Pages with libcurl

#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>

int main(void)
{
	CURL *curl = curl_easy_init();

	if (!curl)
	{
		fprintf(stderr,"[-] Failed Initializing Curl\n");
		exit(-1);
	}

	CURLcode res;
	curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
	res = curl_easy_perform(curl);
	
	if (res != CURLE_OK)
	{
		fprintf(stderr,"[-] Could Not Fetch Webpage\n[+] Error : %s\n",curl_easy_strerror(res));
		exit(-2);
	}

	curl_easy_cleanup(curl);
	return 0;
}

Explaining The Code

Let’s break down the code and understand how it works in detail.

Lines 1-3: Including Required Headers

#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>

We need the following header files:

  • stdio.h: for printing messages to the console
  • stdlib.h: for calling exit() if we encounter an error
  • curl/curl.h: declares the functions we use to request our web page

Notice how we have written the last header as curl/curl.h? Unlike standard headers such as stdio.h or stdlib.h, which live directly under /usr/include, it is stored in /usr/include/curl (on some systems it may be in /usr/include/x86_64-linux-gnu/curl instead), and hence it is included with the curl/ prefix.

Line 5: Declaring Main

int main(void)

We declare the main function, stating that it returns an integer (as specified by the leading int) and takes no arguments (as specified by void).

Line 7: Creating CURL Object

CURL *curl = curl_easy_init();

Here we create a CURL easy handle by calling curl_easy_init() and store the returned pointer in the variable curl.

curl_easy_init() must be the first libcurl function we call. It returns a CURL easy handle that must be passed to every other function in the easy interface, which is why we keep it in the curl pointer we just declared.

Lines 9-13: Checking For Errors

if (!curl)
{
	fprintf(stderr,"[-] Failed Initializing Curl\n");
	exit(-1);
}

This part is rather easy to understand: we check whether the CURL easy handle was successfully created. If anything went wrong during initialization, curl holds a NULL value and we enter the if block.

It then prints a message to stderr and exits, preventing any further execution.

Lines 15-17: Fetching The Webpage

CURLcode res;
curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
res = curl_easy_perform(curl);

First we create a variable of type CURLcode, the typedef libcurl uses for its error codes. It will hold the return code of our transfer and help us identify errors later.

Next, we call curl_easy_setopt(), one of the most important functions here: it is how we tell libcurl how to behave. It takes three arguments:

  • curl : the CURL easy handle
  • CURLOPT_URL : the option being set, which tells libcurl we are passing in the URL to work with
  • http://example.com : the URL we will request

curl_easy_setopt() is a very extensive function that lets us control nearly every aspect of the request. To learn more about it and the many available options, type:

$ man 3 curl_easy_setopt

Finally, we perform the request with curl_easy_perform() and store the resulting return code in the variable “res”. Note that res holds only a status code: by default libcurl writes the received data to stdout, which is why the web page is printed onto our terminal.

Lines 19-23: Checking The Response

if (res != CURLE_OK)
{
	fprintf(stderr,"[-] Could Not Fetch Webpage\n[+] Error : %s\n",curl_easy_strerror(res));
	exit(-2);
}

Next, we check whether the request was successful. curl_easy_perform() returns CURLE_OK (0) on success and a non-zero error code otherwise. You can check all the possible error codes and their meanings with:

$ man 3 libcurl-errors

If an error was encountered, we print it to stderr using curl_easy_strerror(), which returns a string describing the error code, and then exit the program.

Lines 25-26: Cleanup

curl_easy_cleanup(curl);
return 0;

curl_easy_cleanup() should always be the last function called when working with a Curl easy session. It frees the handle passed to it along with all memory associated with it. Then we simply return 0 and the program ends.

Compiling And Executing The Program

To compile the program, use:

$ gcc test.c -o fetch -O2 -Wall -lcurl

Where,

  • gcc: the name of the compiler we are using
  • test.c: the name of our source file
  • -o fetch: tells the compiler to name the resulting executable “fetch”
  • -O2: asks the compiler to produce optimized code
  • -Wall: enables a more comprehensive set of warnings
  • -lcurl: tells the linker to link the program against libcurl

You can learn more about these flags in the GCC manual. If libcurl is installed in a non-standard location, running curl-config --cflags --libs prints the exact compiler and linker flags to use.

Then executing the resultant executable should fetch us the webpage:

$ ./fetch
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>

Conclusion

That completes our program to fetch a web page. Though it is a very basic program, it paves the way for more complex operations that would otherwise be nightmares: you can fetch data from FTP servers in the same way, or parse the fetched data to fulfill your requirements.