MetaData: I’ll Bet You Thought That Was Private?

Mattias Geniar, Saturday, September 19, 2009

Ever heard of MetaData? Wikipedia describes it best:

Metadata (meta data, or sometimes metainformation) is "data about data", of any sort in any media.

So I hear you thinking: who cares? Well, for starters: you should.

MetaData contains a lot more information than "data about data". Documents such as .PDF, .DOC, .XLS, .PPT, ... contain information such as

  • Revision history of files (in case of Word documents)
  • Usernames of the person creating/editing the file
  • Paths to where the file was/is located
  • Software version used (Word 5.0, Word 10.0, ...)
  • Public network shares
  • ...

If you're still saying "so what?", ask yourself the following question: should this data really be public? Should everyone really know my username to my computer? Or everyone who contributed to a certain file? Or where I saved it, and what software I used?

If I were a malicious person, I could use that information for a targetted attack: I can send you a phishing e-mail, with the name of some of your colleagues in it, or one of those names as the FROM-address, so it looks legitimate. I could use that software version number to attach a very specific software exploit, so I can gain control over your system. I can use your username to brute-force your password.

See a trend there? The MetaData is giving out a lot of info that can be abused, and there are plenty of ways to get it. Consider our good friend Google for a second, they have some very nifty filters you can use in order to search efficiently. Ever searched for the string " filetype:doc"? It gives you a list of all .DOC files, found on the site.

Guess what information is in those files?

Revision info, for everyone who worked on a file:

revision history -- Revision #7: Author 'benjaxxx' worked on "
revision history -- Revision #6: Author 'waly xxx' worked on "
revision history -- Revision #5: Author 'Steve xxx' worked on "
revision history -- Revision #4: Author 'waly xxx' worked on "
revision history -- Revision #3: Author 'waly xxx' worked on "
revision history -- Revision #2: Author 'waly xxx' worked on "
revision history -- Revision #1: Author 'waly xxx' worked on "
revision history -- Revision #0: Author 'waly xxx' worked on "

Paths used in that computer:


And the list goes on!

By using publicly available information, I can get enough information to get an idea of the internal layout of a company. And I haven't even set foot inside it yet. Tools such as Metagoofil simplify the act of getting this information, by searching Google for you -- and extracting the metadata.


Hi! My name is Mattias Geniar. I'm a Support Manager at Nucleus Hosting in Belgium, a general web geek, public speaker and podcaster. If you're interested in keeping up with me, have a look at my podcast and weekly newsletter below. For more updates, follow me on Twitter as @mattiasgeniar.

I respect your privacy and you won't get spam. Ever.
Just a weekly newsletter about Linux and open source.

SysCast podcast

In the SysCast podcast I talk about Linux & open source projects, interview sysadmins or developers and discuss web-related technologies. A show by and for geeks!

cron.weekly newsletter

A weekly newsletter - delivered every Sunday - for Linux sysadmins and open source users. It helps keeps you informed about open source projects, Linux guides & tutorials and the latest news.

Share this post

Did you like this post? Will you help me share it on social media? Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *