WARNING: This post was originally published in 2009 and hasn't been updated since.
The tips, techniques and technology explained here may be outdated. If you spot any errors, please let me know in the comments so I can adjust the article. Thanks!

Ever heard of MetaData? Wikipedia describes it best:

Metadata (meta data, or sometimes metainformation) is "data about data", of any sort in any media.

So I hear you thinking: who cares? Well, for starters: you should.

MetaData contains a lot more information than "data about data". Documents such as .PDF, .DOC, .XLS, .PPT, ... contain information such as

  • Revision history of files (in case of Word documents)
  • Usernames of the person creating/editing the file
  • Paths to where the file was/is located
  • Software version used (Word 5.0, Word 10.0, ...)
  • Public network shares
  • ...
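
To make that list a bit more concrete, here is a minimal sketch of reading a few of those fields yourself. It assumes a modern .docx file, which is a ZIP archive that keeps author details in docProps/core.xml; the legacy .DOC format mentioned above is a different binary format and would need another parser. The function name read_core_properties and the file name report.docx are my own, purely for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside docProps/core.xml of an Office Open XML file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_file):
    """Return author-related metadata from a .docx path or file-like object.

    Hypothetical helper: only three fields are read, and a missing
    field is reported as None.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))

    fields = {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "revision": "cp:revision",
    }
    result = {}
    for key, tag in fields.items():
        node = root.find(tag, NS)
        result[key] = node.text if node is not None else None
    return result
```

Calling read_core_properties("report.docx") on a file of your own returns a small dictionary; anyone who downloads the file can run the same few lines.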

If you're still saying "so what?", ask yourself the following question: should this data really be public? Should everyone really know the username I use on my computer? Or the names of everyone who contributed to a certain file? Or where I saved it, and what software I used?

If I were a malicious person, I could use that information for a targeted attack: I can send you a phishing e-mail with the names of some of your colleagues in it, or with one of those names as the FROM-address, so it looks legitimate. I could use that software version number to attach a very specific software exploit, so I can gain control over your system. I can use your username to brute-force your password.

See a trend there? The MetaData is giving out a lot of info that can be abused, and there are plenty of ways to get it. Consider our good friend Google for a second: they have some very nifty filters you can use in order to search efficiently. Ever searched for the string " filetype:doc"? It gives you a list of all .DOC files found on the site.

Guess what information is in those files?

Revision info, for everyone who worked on a file:

revision history -- Revision #7: Author 'benjaxxx' worked on "
revision history -- Revision #6: Author 'waly xxx' worked on "
revision history -- Revision #5: Author 'Steve xxx' worked on "
revision history -- Revision #4: Author 'waly xxx' worked on "
revision history -- Revision #3: Author 'waly xxx' worked on "
revision history -- Revision #2: Author 'waly xxx' worked on "
revision history -- Revision #1: Author 'waly xxx' worked on "
revision history -- Revision #0: Author 'waly xxx' worked on "

Paths used on that computer:


And the list goes on!
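
Turning output like the revision history shown earlier into a list of usernames takes very little code. The sketch below is an assumption-heavy illustration: it expects lines in exactly the format shown in this post, and the names REVISION_LINE and authors_by_revision_count are made up for this example.

```python
import re
from collections import Counter

# Matches the revision number and the quoted author name in lines
# formatted like the revision history output shown in this post.
REVISION_LINE = re.compile(r"Revision #(\d+): Author '([^']*)'")


def authors_by_revision_count(dump):
    """Count how many revisions each author name appears in."""
    counts = Counter()
    for line in dump.splitlines():
        match = REVISION_LINE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# A few of the lines from the post, used as sample input.
sample = """\
revision history -- Revision #7: Author 'benjaxxx' worked on "
revision history -- Revision #6: Author 'waly xxx' worked on "
revision history -- Revision #5: Author 'Steve xxx' worked on "
"""

for author, revisions in authors_by_revision_count(sample).most_common():
    print(author, revisions)
```

The author who shows up most often is probably the owner of the document, which makes that name a natural first guess for a username.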

Using only publicly available information, I can learn enough to get an idea of the internal layout of a company. And I haven't even set foot inside it yet. Tools such as Metagoofil simplify the act of getting this information, by searching Google for you -- and extracting the metadata.


