Posted by Charlie Recksieck
How to Use It, How To Protect Yourself From it
First of all, your intentions don't have to be nefarious to use Google's advanced search options. Whether you're looking up info on a person, finding out more about a business, or looking for existing contact lists, these advanced search options can be phenomenally helpful. You can search an entire website for every page and file that Google has indexed from its public server. Here's an example of a Google search to find all indexed pages on the US Coca-Cola website: site:us.coca-cola.com. You can look for any mention of a certain person on the web by putting their full name in quotes. Also, in case you are unfamiliar with the useful parameters of Google advanced search, here's Google's friendly guidelines/tool and also a great list of search operators, current as of 2018.
One thing to keep in mind: Google search results only show you what Google has been able to "crawl". Any site can, and should, place a robots.txt file in its root directory to control which files Google and other search engines are allowed to crawl. If a page is disallowed by robots.txt, it should not show up in Google results, although a disallowed URL can occasionally still be listed if other sites link to it. For a good definition of robots.txt files and best practices, click here. We will get a little more into this later.
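To get a feel for how allow and disallow rules behave, here is a minimal Python sketch that uses the standard library's urllib.robotparser to test a few paths against a sample robots.txt. The rules, the example.com domain, the folder names and the is_crawlable helper are all made up for illustration. Python's parser applies the first rule that matches, so the Allow lines are listed first here; other crawlers may resolve overlapping rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow two public folders, disallow everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /blog/
Allow: /products/
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative paths only; none of these exist on a real site.
    for path in ("/blog/post.html", "/products/", "/backup/index.php.bak"):
        url = "https://example.com" + path
        verdict = "crawlable" if is_crawlable(SAMPLE_ROBOTS_TXT, url) else "blocked"
        print(path, "->", verdict)
```

Keep in mind that this only models what a well-behaved crawler would do. The robots.txt file is itself public, and it does not stop anyone from requesting a listed path directly.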
I want to break this into two sections: 1) A fairly innocuous Google dorking tip that you can likely use, and 2) The dangers out there and what you can do to protect the files on your site.
Doing Your Own Google Dorking
For years, I have been in indie bands. When you're in a completely do-it-yourself (DIY) situation (aka no manager or record company), you need to maintain good contacts in a lot of fields: record reviewers, music directors for potential airplay on small radio stations, music bookers at performing venues, etc. Where do you get good lists of this contact information? Besides some paid services (which may or may not even be current or well-maintained), you can take matters into your own hands with some smart Google dorking. Say you already know good email addresses for music bookers (these are fictitious, fyi) "firstname.lastname@example.org" and "email@example.com". If you want to find an Excel list of similar bookers, then search this: "firstname.lastname@example.org" "email@example.com" filetype:xls. You might get yourself a treasure trove of bookers. If that doesn't work, look for text files instead: "firstname.lastname@example.org" "email@example.com" filetype:txt. It's that simple.
I do want to take a moment to defend the innocuousness of this trick. We do not advocate spamming anybody; it's a persistent problem on the internet that costs everybody. But in the case of music booking, it is customary to reach out unsolicited to bookers, so we do not really consider this spam.
Another use might be trying to find a list of important people in Chicago. In this case, a good idea would be to look for an Excel sheet containing a couple of significant Chicagoans (former mayor Rahm Emanuel and Chicago Blackhawks owner Rocky Wirtz) and also a couple of local zip codes (60611 and 60607). Search filetype:xlsx "rahm emanuel" "Rocky Wirtz" "60607" "60611" and maybe you'll find something. Again, reaching out for unsolicited donations is trickier etiquette, and you should make sure you're not making yourself a spammer.
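If you find yourself typing these searches often, the pattern is easy to script. Here is a minimal Python sketch that puts each phrase in quotes, adds the optional filetype: and site: operators, and turns the result into a Google search URL. The function names build_query and search_url are hypothetical helpers invented for this example, not part of any Google tool.

```python
from urllib.parse import urlencode


def build_query(phrases, filetype=None, site=None):
    """Compose a search string from exact phrases plus optional operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    # Wrap each phrase in double quotes so it is matched exactly.
    parts.extend(f'"{phrase}"' for phrase in phrases)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_query(
        ["rahm emanuel", "Rocky Wirtz", "60607", "60611"], filetype="xlsx"
    )
    print(query)
    print(search_url(query))
```

Calling build_query([], site="us.coca-cola.com") gives the plain site: search from earlier in the article.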
The dark side, people Google dorking your site, has some serious consequences: you may have sensitive information viewable to the public, and what people find on your site could actually lead to security breaches. Perhaps your web developer left a backup copy of an .asp or .php file up in a crawlable section of your site. That backup page may show full credentials for your database access, or other info that can be exploited to take you down or have your data altered or stolen. Nefarious "dorks" could access your webcams, security cameras or routers. Here's an excellent article on the repercussions.
If you're up for some further reading on the subject, we recommend this article and also this article. But here are some suggestions for what you can do to get started:
1 See What's Exposed On Your Site - For starters, you can search your website with site:yourwebsitename.com and scroll through what's publicly indexed. (If you or your IT people are extra-curious, try looking at the Google Hacking Database.)
2 Stop Google From Indexing Your Problem Pages - You can use Google Webmaster Tools (now called Google Search Console) to remove the links to pages you no longer want indexed. Here are some good instructions on using Google Webmaster Tools.
3 Control Your Robots.txt File - Do you have a robots.txt file? Are you keeping any and all sensitive directories from being crawled by search engines? Here's a great intro to using robots.txt to shore up vulnerabilities. When in doubt, disallow any directory you are unsure about - or be safe by disallowing all folders, then explicitly allowing only the folders that are part of your public website.
4 Take Files Down - Once you've seen the list of what you really have on your public server space (either from Google dorking yourself or from looking through your FTP or File Manager), take down any file that does not really need to be public.
5 Password-Protecting - If that sensitive file really needs to be there for you and your employees to be able to access it, then at the very least put some password protection on the files themselves or that folder on your server. (We haven't written a good instructional article on this, but feel free to contact us if you need quick assistance with this.)
6 Update Your Software - Make sure you are up to date on all of your software's versions and security patches. We are shocked to see the number of clients and other companies we know, even large ones, that do not run security patches. Again, if you need any help with this, reach out and we'll consult on it. Brad here with us is the best in the business when it comes to this stuff. At the very least, though, ask your IT department or contractor what they are doing to stay current and protect you.
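Steps 1 and 4 above can be partly automated if you keep a local copy of your site's files. Here is a rough Python sketch that walks a folder and lists files whose names end in a suffix that often marks a leftover backup or dump. The suffix list and the find_suspect_files helper are assumptions made for this example; adjust them to whatever your own developers tend to leave behind.

```python
import os

# Assumed suffixes that often mark leftover or sensitive files; adjust for your site.
SUSPECT_SUFFIXES = (".bak", ".old", ".orig", ".sql", ".log", "~")


def find_suspect_files(web_root):
    """Walk web_root and return relative paths whose names end in a suspect suffix."""
    hits = []
    for folder, _dirs, files in os.walk(web_root):
        for name in files:
            # Compare in lowercase so DUMP.SQL is caught as well as dump.sql.
            if name.lower().endswith(SUSPECT_SUFFIXES):
                full_path = os.path.join(folder, name)
                hits.append(os.path.relpath(full_path, web_root))
    return sorted(hits)


# Example usage (the path is a placeholder for your own local copy):
# print("\n".join(find_suspect_files("/path/to/local/copy/of/site")))
```

A hit is only a prompt to take a look, not proof of a problem. Whether a flagged file should be deleted, moved or password-protected is still your call.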
* * * * * * * *
Good luck, everybody. Stay safe out there!