MetaData: I’ll Bet You Thought That Was Private?

Mattias Geniar, September 19, 2009

Ever heard of MetaData? Wikipedia describes it best:

Metadata (meta data, or sometimes metainformation) is “data about data”, of any sort in any media.

So I hear you thinking: who cares? Well, for starters: you should.

In practice, MetaData covers a lot more than that dry “data about data” definition suggests. Documents such as .PDF, .DOC, .XLS, .PPT, … can contain information such as:

  • Revision history of files (in case of Word documents)
  • Usernames of the people who created or edited the file
  • Paths to where the file was/is located
  • Software version used (Word 5.0, Word 10.0, …)
  • Public network shares
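
To make that list a bit more concrete, here is a minimal sketch of how little effort it takes to read author-related fields from a document. It assumes the newer .docx format rather than the older binary .DOC files: a .docx is a ZIP archive, and its core properties normally live in docProps/core.xml. The function name read_core_properties and the labels in FIELDS are my own, chosen for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml inside a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical labels (left) mapped to the core-property elements (right).
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_bytes: bytes) -> dict:
    """Return the core-property fields that are present in a .docx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        # Skip fields that are missing or empty in this document.
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Feeding it the bytes of any .docx you have lying around (for example a hypothetical report.docx opened in "rb" mode) returns a small dictionary of whatever fields the file carries. Nothing here needs more than the Python standard library, which is rather the point.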

If you’re still saying “so what?”, ask yourself the following question: should this data really be public? Should everyone really know the username I use on my computer? Or the names of everyone who contributed to a certain file? Or where I saved it, and what software I used?

If I were a malicious person, I could use that information for a targeted attack: I can send you a phishing e-mail that mentions the names of some of your colleagues, or uses one of those names as the FROM-address, so it looks legitimate. I could use that software version number to attach an exploit written for exactly that version, so I can gain control over your system. I can use your username to brute-force your password.

See a trend there? The MetaData is giving out a lot of info that can be abused, and there are plenty of ways to get it. Consider our good friend Google for a second: it has some very nifty search operators that let you search efficiently. Ever searched for the string “site:microsoft.com filetype:doc”? It gives you a list of all the .DOC files Google has indexed on the microsoft.com site.
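
If you want to try the same search for another domain or file type, the query is easy to put together. The snippet below is only a sketch: the helper names build_filetype_query and build_search_url are made up for this example, and it does nothing more than compose the search string and the URL you would paste into a browser.

```python
from urllib.parse import urlencode


def build_filetype_query(domain: str, extension: str) -> str:
    """Compose a 'site:' plus 'filetype:' search string."""
    # Accept ".DOC", "doc", "Doc" and normalise them all to "doc".
    return f"site:{domain} filetype:{extension.lstrip('.').lower()}"


def build_search_url(query: str) -> str:
    """URL-encode the query so it can be opened in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_filetype_query("microsoft.com", ".DOC")
print(query)                    # site:microsoft.com filetype:doc
print(build_search_url(query))  # https://www.google.com/search?q=site%3Amicrosoft.com+filetype%3Adoc
```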

Guess what information is in those files?

Revision info, for everyone who worked on a file:

revision history – Revision #7: Author ‘benjaxxx’ worked on "

revision history – Revision #6: Author ‘waly xxx’ worked on "

revision history – Revision #5: Author ‘Steve xxx’ worked on "

revision history – Revision #4: Author ‘waly xxx’ worked on "

revision history – Revision #3: Author ‘waly xxx’ worked on "

revision history – Revision #2: Author ‘waly xxx’ worked on "

revision history – Revision #1: Author ‘waly xxx’ worked on "

revision history – Revision #0: Author ‘waly xxx’ worked on "

Paths used on that computer:

H:\SQL\SQL70_sp2\Langs\Spanish\updated_Readme_Localised\test\

\\MULTIMED-SERVER\WWWROOT\Peru\ftpfiles\

C:\WINDOWS\TEMP\

\\Dolphin\adcu\IDEAS\

And the list goes on!

Using nothing but publicly available files, I can piece together a picture of a company’s internal layout: usernames, server names, network shares and software versions. And I haven’t even set foot inside it yet. Tools such as Metagoofil simplify the act of getting this information, by searching Google for you and extracting the metadata from the documents it finds.
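
To show how quickly a handful of leaked paths turns into a rough map, here is a small sketch that sorts the paths quoted in this post into drive letters and server/share pairs. It relies on PureWindowsPath from the Python standard library, which treats the \\server\share part of a UNC path as its "drive". The names LEAKED_PATHS and summarise_paths are made up for this example.

```python
from pathlib import PureWindowsPath

# The paths quoted in this post, in UNC form where applicable. The trailing
# backslash is dropped because a Python raw string cannot end in one.
LEAKED_PATHS = [
    r"H:\SQL\SQL70_sp2\Langs\Spanish\updated_Readme_Localised\test",
    r"\\MULTIMED-SERVER\WWWROOT\Peru\ftpfiles",
    r"C:\WINDOWS\TEMP",
    r"\\Dolphin\adcu\IDEAS",
]


def summarise_paths(paths):
    """Split Windows paths into drive letters and (server, share) pairs."""
    drives, shares = set(), set()
    for raw in paths:
        drive = PureWindowsPath(raw).drive
        if drive.startswith("\\\\"):
            # UNC path: the drive looks like \\server\share.
            server, _, share = drive[2:].partition("\\")
            shares.add((server, share))
        elif drive:
            # Local or mapped drive such as C: or H:.
            drives.add(drive)
    return sorted(drives), sorted(shares)


drives, shares = summarise_paths(LEAKED_PATHS)
print("Drive letters:", drives)
print("Servers and shares:", shares)
```

Four paths already give two server names, two share names and a mapped H: drive. Run the same thing over a few hundred documents and the picture gets uncomfortably detailed.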
