Introduction to pentesting: Reconnaissance

The Harvester

An excellent tool to use in reconnaissance is The Harvester. The Harvester is a simple yet highly effective Python script written by Christian Martorella at Edge Security. This tool allows us to quickly and accurately catalogue both e-mail addresses and subdomains that are directly related to our target.

It’s important to always use the latest version of The Harvester, as many search engines regularly update and change their systems. Even subtle changes to a search engine’s behaviour can render automated tools ineffective. In some cases, search engines will actually filter the results before returning information to you. Many search engines also employ throttling techniques that attempt to prevent you from running automated searches.

The Harvester can be used to search Google, Bing and PGP servers for e-mails, hosts and subdomains. It can also search LinkedIn for user names. We’ve already discussed the dangers of posting to public forums using your corporate e-mail address; however, there are additional hazards you should be aware of. Let us assume that during your reconnaissance you discover the e-mail address of an employee from your target organization. By twisting and manipulating the information before the “@” symbol, we should be able to create a series of potential network user names. It is not uncommon for organizations to use exactly the same string for a user name and for the part of the e-mail address before the “@” symbol. With a handful of prospective user names, we can attempt to brute-force our way into any services, like SSH, VPNs or FTP, that we discover during the next step (Scanning).
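The "twisting and manipulating" of the part before the “@” symbol can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name candidate_usernames, the sample address jane.doe@example.com and the particular naming patterns are all hypothetical, and real organizations may follow conventions not listed here.

```python
import re


def candidate_usernames(email):
    """Return guessed login names derived from the part of an address before '@'."""
    local = email.split("@", 1)[0].lower()
    # Many organizations reuse the local part verbatim, so it comes first.
    candidates = [local]
    # Split on common separators (dot, underscore, hyphen) to find name parts.
    parts = [p for p in re.split(r"[._-]+", local) if p]
    if len(parts) >= 2:
        first, last = parts[0], parts[-1]
        candidates += [
            first + last,        # e.g. janedoe
            first[0] + last,     # e.g. jdoe
            first + last[0],     # e.g. janed
            last + first[0],     # e.g. doej
            first + "." + last,  # e.g. jane.doe
            first + "_" + last,  # e.g. jane_doe
            last + "." + first,  # e.g. doe.jane
            first,
            last,
        ]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))


# Hypothetical address, used purely for illustration.
for name in candidate_usernames("jane.doe@example.com"):
    print(name)
```

An address with no separator in the local part (for example admin@example.com) yields only the local part itself, because there is nothing to recombine.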

The Harvester is built into BlackArch. To access it, run “theharvester” in any terminal emulator.

If you’re using an operating system other than BlackArch, you can download the tool directly from Edge Security. Once you’ve got it downloaded, you can unpack the downloaded tar file by running the following command in a terminal:

tar xf theHarvester

Please note the capital “H” that is used when untarring the code. Linux is case sensitive, so the operating system sees a difference between “theHarvester” and “theharvester”. You’ll need to pay attention to the name of the executable to determine whether to use a capital or lowercase “h”. If the cases don’t match exactly, you’ll get a message saying “no such file or directory”. This is a good indication that you’ve mistyped the name of the file.
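If you are unsure which capitalization a file actually uses, a short Python sketch like the following can list the real names in a directory that match while ignoring case. The helper name find_ignoring_case is hypothetical, and the demonstration uses an empty stand-in file in a temporary directory rather than the real download.

```python
import os
import tempfile


def find_ignoring_case(directory, name):
    """Return entries in `directory` whose names equal `name` when case is ignored."""
    wanted = name.lower()
    return sorted(entry for entry in os.listdir(directory) if entry.lower() == wanted)


# Demonstration in a throwaway directory containing one empty stand-in file.
with tempfile.TemporaryDirectory() as workdir:
    open(os.path.join(workdir, "theHarvester"), "w").close()
    # Typing the name in lowercase still reveals the real spelling on disk.
    print(find_ignoring_case(workdir, "theharvester"))
```

The name it prints is the spelling you need to type exactly on the command line.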

Regardless of whether you’ve downloaded the Harvester or used the version installed in BlackArch, we’ll use it to collect additional information about our target. Execute the following command:

theharvester -d -l 10 -b google

This command will search for e-mails, subdomains and hosts that belong to the target domain.

Let us examine the command a little more closely. “theharvester” is used to invoke the tool. A lowercase “-d” is used to specify the target domain. A lowercase “-l” (that is an L, not a 1) is used to limit the number of results returned to us. In this case, the tool was instructed to return only 10 results. The “-b” is used to specify which public repository we want to search. We can choose among Google, Bing, PGP or LinkedIn (for this example, we chose to search using Google).
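To make the role of each switch concrete, here is a minimal Python sketch that assembles the same command from its parts. It only builds and prints the argument list; it does not run the tool. The function name build_harvester_command and the domain example.com are hypothetical, and the lowercase source names are an assumption based on the four repositories named above, so check the help output of your installed version for the values it actually accepts.

```python
import shlex

# Repositories named in the text; the exact spellings the tool accepts are an assumption.
SOURCES_FROM_TEXT = {"google", "bing", "pgp", "linkedin"}


def build_harvester_command(domain, limit=10, source="google", executable="theharvester"):
    """Assemble the argument list: -d target domain, -l result limit, -b data source."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    if source not in SOURCES_FROM_TEXT:
        raise ValueError("unknown source: " + source)
    return [executable, "-d", domain, "-l", str(limit), "-b", source]


# example.com is a placeholder domain, not a real target.
command = build_harvester_command("example.com")
print(shlex.join(command))
```

Keeping the command as a list rather than a single string means it could later be handed to subprocess.run without any shell quoting concerns, and the executable parameter lets you switch to “theHarvester” if your copy uses the capital “H”.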

Reconnaissance, the first step, is cyclical: in-depth reconnaissance often leads to the discovery of new targets, which, in turn, leads to additional reconnaissance. As a result, the amount of time needed to complete this phase will vary from several hours to several weeks. Remember, a determined malicious hacker understands not only the power of good reconnaissance, but also that of a nearly limitless amount of time. As an aspiring penetration tester, you should devote as much time as possible to practicing and conducting information gathering.