pdfgrep – Search for text inside a PDF document

Search Text In A PDF Document From The Terminal

The grep command in Linux is used to search for a specific text string in any file. This is a really powerful tool which you can use in several ways by searching for new lines, lines having no uppercase letters and many more ways. The grep command, however, does not work with PDF files.

That’s where pdfgrep command comes in. It is essentially ‘grep’ but for PDF files. In this tutorial, I will guide you through the usage and installation process of this command.

Installing pdfgrep

This command, although it does not ship with all the Linux distributions, is available in the official repositories of all the package managers. To install it, you can use the following command, depending upon your Linux distribution :

# On Debian and Ubuntu-based distributions
sudo apt update && sudo apt install pdfgrep
# On Fedora Workstation
sudo dnf install pdfgrep
# On Arch Linux
sudo pacman -S pdfgrep
Installing Pdfgrep On Fedora
Installing pdfgrep On Fedora

Usage of pdfgrep

If you have used the grep command before, then this utility will feel familiar to you. The basic usage of this command is as follows:

pdfgrep Search_String FILENAME.pdf
Searching For A Text In PDF
Searching For A Text In PDF

Perhaps you would want to perform a case-insensitive search because the search string can be capitalized in the document. You can use the --ignore-case flag with the command.

pdfgrep --ignore-case Search_Strng FILENAME.pdf
Ignoring Case While Searching For Text
Ignoring Case While Searching For Text

You can also get the total number of search results directly in the terminal by using the -c option along with the full command:

pdfgrep --ignore-case Search_Strng FILENAME.pdf --count

Since PDF documents have page numbers, you can also get the page number on which your search string is present. You can use the --page-number option along with the whole command:

pdfgrep --page-number --ignore-case Search_String FILENAME.pdf
Displaying Page Number In The Search Result
Displaying Page Number In The Search Result

There is also a way through which you can search in a password-protected PDF file. Keep the rest of the command same and then just add --password option to it along with the password of the locked document.

pdfgrep --password YOUR-PASSWORD Search_String FILENAME.pdf

Summary

What makes pdfgrep great, in my opinion, is its similarity with the grep command, therefore making it easier for the users, by not making them remember new command and options for basically the same functionality.

References

Arch Linux Manual page on pdfgrep