MetaData: I’ll Bet You Thought That Was Private?

WARNING: This post was originally published in 2009 and hasn't been updated since.
The tips, techniques and technology explained here may be outdated. If you spot any errors, please let me know in the comments so I can adjust the article. Thanks!

Ever heard of MetaData? Wikipedia describes it best:

Metadata (meta data, or sometimes metainformation) is "data about data", of any sort in any media.

So I hear you thinking: who cares? Well, for starters: you should.

In practice, MetaData reveals a lot more than the phrase "data about data" suggests. Documents such as .PDF, .DOC, .XLS, .PPT, ... can contain information such as:

  • Revision history of files (in the case of Word documents)
  • Usernames of the people who created or edited the file
  • Paths to where the file was/is located
  • Software version used (Word 5.0, Word 10.0, ...)
  • Public network shares
  • ...
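
To make that list a bit more concrete, here is a minimal sketch of how such fields can be read out of a document. Note the assumptions: the examples in this post are legacy binary .DOC files, while this sketch targets the newer zip-based Office formats (.docx, .xlsx, .pptx), which keep comparable fields in a part called docProps/core.xml. The function name read_core_properties and the selection of fields are my own choices for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside docProps/core.xml of a zip-based Office file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical selection of fields; the part may contain more (or fewer).
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return the core properties found in a .docx/.xlsx/.pptx file.

    `source` may be a path or a binary file object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    result = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        # Skip fields that are absent or empty in this document.
        if element is not None and element.text:
            result[name] = element.text
    return result
```

Calling read_core_properties("report.docx") on a file of your own (the filename is just a placeholder) returns a dictionary with whichever of these fields the document carries. No Office installation is needed, which is exactly why this data is so easy to harvest.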

If you're still saying "so what?", ask yourself the following question: should this data really be public? Should everyone really know the username I use on my computer? Or the names of everyone who contributed to a certain file? Or where I saved it, and what software I used?

If I were a malicious person, I could use that information for a targeted attack: I can send you a phishing e-mail, with the names of some of your colleagues in it, or one of those names as the FROM-address, so it looks legitimate. I could use that software version number to attach a very specific software exploit, so I can gain control over your system. I can use your username to brute-force your password.

See a trend there? The MetaData is giving out a lot of info that can be abused, and there are plenty of ways to get it. Consider our good friend Google for a second: it has some very nifty filters you can use to search efficiently. Ever searched for the string "site:microsoft.com filetype:doc"? It gives you a list of the .DOC files Google has indexed on the microsoft.com site.
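
The query above is easy to generate for any domain and file type. A small sketch, assuming the usual https://www.google.com/search?q=... URL form; the function names build_query and search_url are made up for this example:

```python
from urllib.parse import urlencode


def build_query(domain, filetype):
    """Combine the site: and filetype: filters into one search string."""
    return f"site:{domain} filetype:{filetype}"


def search_url(domain, filetype):
    """Return a Google search URL for the combined query."""
    # urlencode takes care of escaping the colon and the space.
    return "https://www.google.com/search?" + urlencode(
        {"q": build_query(domain, filetype)}
    )


if __name__ == "__main__":
    for extension in ("doc", "xls", "ppt", "pdf"):
        print(search_url("microsoft.com", extension))
```

This only builds the URLs. Opening them in a browser is up to you, and fetching results automatically may run into the search engine's terms of use.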

Guess what information is in those files?

Revision info, for everyone who worked on a file:

revision history -- Revision #7: Author 'benjaxxx' worked on "
revision history -- Revision #6: Author 'waly xxx' worked on "
revision history -- Revision #5: Author 'Steve xxx' worked on "
revision history -- Revision #4: Author 'waly xxx' worked on "
revision history -- Revision #3: Author 'waly xxx' worked on "
revision history -- Revision #2: Author 'waly xxx' worked on "
revision history -- Revision #1: Author 'waly xxx' worked on "
revision history -- Revision #0: Author 'waly xxx' worked on "
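
Output like this turns into a list of names very quickly. A rough sketch, assuming lines in exactly the format shown above (the format of whatever tool you use may differ, and count_authors is a name I picked for illustration):

```python
import re
from collections import Counter

# Matches e.g.: Revision #7: Author 'benjaxxx'
REVISION_LINE = re.compile(r"Revision #(\d+): Author '([^']*)'")


def count_authors(lines):
    """Count how many revisions each author name appears in."""
    authors = Counter()
    for line in lines:
        match = REVISION_LINE.search(line)
        if match:
            authors[match.group(2).strip()] += 1
    return authors
```

Feed it the lines of a report and authors.most_common() tells you who touched the document most often, which is a decent guess at whose name to put in that phishing e-mail.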

Paths used on that computer:

H:\SQL\SQL70_sp2\Langs\Spanish\updated_Readme_Localised\test\
\\MULTIMED-SERVER\WWWROOT\Peru\ftpfiles\
C:\WINDOWS\TEMP\
\\Dolphin\adcu\IDEAS\

And the list goes on!
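
Paths like these often sit in the file as plain strings, so a simple pattern match over the raw bytes can already find many of them. The sketch below is a heuristic of my own, not how any particular tool does it: it assumes the paths are stored either as single-byte text or as UTF-16LE, and the pattern will miss or mangle unusual paths.

```python
import re

# Drive-letter paths (C:\dir\...) and UNC paths (\\server\share\...).
# Each segment must start with a word character, dot or hyphen.
PATH_PATTERN = re.compile(
    r"(?:[A-Za-z]:|\\\\[\w.-]+)(?:\\[\w.-][\w .-]*)+\\?",
    re.ASCII,
)


def extract_paths(raw):
    """Return Windows-style paths found in raw bytes, in order of discovery."""
    # Try single-byte text first, then UTF-16LE at both byte alignments.
    candidates = [raw.decode("latin-1")]
    for offset in (0, 1):
        candidates.append(raw[offset:].decode("utf-16-le", errors="ignore"))

    found = []
    for text in candidates:
        for path in PATH_PATTERN.findall(text):
            if path not in found:
                found.append(path)
    return found
```

To try it, read a document with open(filename, "rb").read() and pass the bytes to extract_paths. Expect some noise in the results, since binary data can accidentally look like a path.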

Using nothing but publicly available files, I can piece together an idea of the internal layout of a company: who works there, what their servers and shares are called, and which software they run. And I haven't even set foot inside it yet. Tools such as Metagoofil simplify the act of getting this information, by searching Google for you -- and extracting the metadata.

Want to get in touch? Find me as @mattiasgeniar on Twitter or via the contact-page on this blog.