Using Linux grep and Windows findstr to Manipulate Files
Last Updated: 2023-04-01 14:24:24 UTC
by Guy Bruneau (Version: 1)
Over the years I have found grep to be very versatile. Its most common use is to check whether logs contain a string that matches an IP address, a domain, a service or protocol, an application that was logged, etc.
Years ago, when I built my first DNS Sinkhole, I used several combinations of grep to parse and compare files to build the BIND lists of domains to sinkhole. I now use Pi-Hole, which works on the same principles but is managed through its own interface.
My early sinkhole used a series of grep commands to compare two files. The following example compares a wildcard list of country codes that were already blocked by the sinkhole against a list of known bad domains published on various websites. To demonstrate how to use a list to compare and remove the blocked domains, I will use my pi-hole domain list.
The first step is to create the filter list, a file called toremove, which contains the following top-level domains that are already blocked (it can hold as many as the organization needs). Another list could be applied for domains already blocked (e.g. google.com, sans.isc):
Before we start, let's get a count of how many lines are in the file list.2.pihole.xxxx.ca.domains with wc -l to establish a baseline:
This picture shows there are 505196 records in this file. The options used with grep are as follows:
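A baseline count of this kind can be sketched as follows. This is only an illustration: the three-line sample.domains file and the temporary directory are hypothetical stand-ins for list.2.pihole.xxxx.ca.domains, not the real list.

```shell
#!/bin/sh
# Hypothetical sample file standing in for the real domain list.
workdir=$(mktemp -d)
printf '%s\n' 'a.example.com' 'b.example.net' 'c.example.org' > "$workdir/sample.domains"

# Redirecting the file into wc -l prints only the number, without the file name.
baseline=$(wc -l < "$workdir/sample.domains")
baseline=$((baseline))   # strip any padding some wc implementations add

echo "baseline: $baseline lines"
rm -rf "$workdir"
```

Keeping the number in a variable makes it easy to compare against the count taken after filtering.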
- -w - Select only those lines containing matches that form whole words.
- -h - Suppress the prefixing of file names on output.
- -v - Invert the sense of matching, to select non-matching lines.
- -f - Obtain patterns from FILE, one per line.
The next step is to compare the top-level domain list against a downloaded domain list:
grep -whvf /root/toremove list.2.pihole.xxxx.ca.domains
This picture shows the first run of grep, with the result above the command. The same command was then re-run and its output searched for any domains ending with .xyz$, which confirms that they have been removed from the list. The $ at the end of xyz indicates the end of the line. Note that an unescaped . in a grep pattern matches any single character, so \.xyz$ is the stricter form.
Let’s recheck what we have left after removing the three top-level domains from the list:
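The filter-then-verify sequence can be tried on a small scale. The sketch below is not the author's data: the two patterns in toremove (xyz$ and top$) and the four sample domains are assumptions chosen for illustration, since the real filter list is not reproduced in the text.

```shell
#!/bin/sh
# Hypothetical filter list and domain list, created in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'xyz$' 'top$' > "$workdir/toremove"
printf '%s\n' 'ads.example.xyz' 'tracker.example.top' \
              'bad.example.com' 'xyz.example.net' > "$workdir/domains"

# -w whole words, -h no file name prefix, -v invert the match, -f patterns from file
grep -whvf "$workdir/toremove" "$workdir/domains" > "$workdir/filtered"

# Verify: count leftover entries ending in a literal ".xyz" (the dot is escaped).
leftover=$(grep -c '\.xyz$' "$workdir/filtered")
kept=$(grep -c '' "$workdir/filtered")

echo "remaining .xyz entries: $leftover"
echo "domains kept: $kept"
cat "$workdir/filtered"
rm -rf "$workdir"
```

With these sample patterns, xyz.example.net survives because xyz is not at the end of the line, while ads.example.xyz and tracker.example.top are dropped.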
We now have 375686 domains left in the list. The command removed 129510 records.
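The removed count is simply the baseline minus the number of lines left. A minimal check of that arithmetic, using the two counts reported above:

```shell
#!/bin/sh
# Counts taken from the text: lines before and after filtering.
before=505196
after=375686

removed=$((before - after))
echo "removed: $removed"   # prints "removed: 129510"
```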
It is possible to repeat the same search using Windows findstr. Let's list the options used to filter the file:
- /v - Prints only lines that don't contain a match.
- /g:filename - Gets search strings from the specified file.
This is how to do it:
findstr /v /g:toremove list.2.pihole.xxxx.ca.domains | findstr .xyz$
These are the options used with find (find /?) to count the number of lines left:
- /V - Displays all lines NOT containing the specified string.
- /C - Displays only the count of lines containing the string.
- "" - Specifies the text string to find.
Let’s recheck to confirm that findstr (findstr /?) removed the three top-level domains from the list:
findstr /v /g:toremove list.2.pihole.xxxx.ca.domains | find /v /c ""
This outputs the same result as grep: 375686 domains left in the list. The command removed 129510 records.
This highlights the versatility of both tools, which can work through a large amount of data quickly and still obtain the same result. This is another example of Living Off the Land Binaries (LOLBins).
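For comparison, the same filter-and-count pipeline can be written on the Linux side in one line. This is a sketch under assumptions: the toremove pattern and the three sample domains are hypothetical, and grep -c '' is used here to count every line, playing the role that find /v /c "" plays in the Windows pipeline.

```shell
#!/bin/sh
# Hypothetical filter list and domain list, created in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'xyz$' > "$workdir/toremove"
printf '%s\n' 'ads.example.xyz' 'bad.example.com' 'worse.example.net' > "$workdir/domains"

# Filter, then count what is left; the empty pattern '' matches every line.
remaining=$(grep -whvf "$workdir/toremove" "$workdir/domains" | grep -c '')

echo "domains left: $remaining"
rm -rf "$workdir"
```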
Guy Bruneau IPSS Inc.
My Handler Page
gbruneau at isc dot sans dot edu