Input Validation: Using filter_var() Over Regular Expressions

Mattias Geniar, February 07, 2009

Just about the biggest time sink on any project is the amount of input validation that needs to be done. You _have_ to assume your visitor is a maniac serial killer, out to destroy your application. And you have to prevent it.

Thus starts our never-ending battle for user input validation. We can’t allow it all (think XSS or SQL injection), so we check every value presented to us: correct e-mail formats, IPs, integers, HTML code, …

For a long time, a generic e-mail validation regular expression looked like this.

// eregi() was removed in PHP 7; preg_match() with the /i modifier is the equivalent.
$filter = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';
if (!preg_match($filter, $user_email)) {
	echo "Invalid e-mail address.";
}

But with PHP’s filter_var() function, this becomes a single call, with no pattern to write or maintain!

if (!filter_var($user_email, FILTER_VALIDATE_EMAIL)) {
	echo "Invalid e-mail";
}

By passing the correct filter constant as the second argument to filter_var($input, $filter), we can quickly determine whether the supplied input complies with the format we require. The function returns the filtered value on success and false on failure.
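
One detail is worth sketching here: because filter_var() returns the filtered value on success and false on failure, a valid value that happens to be falsy, such as the integer 0, is rejected by a plain negation check. The snippet below is a minimal illustration; the helper name is_valid_int() is made up for this example.

```php
<?php
// Hypothetical helper, not part of PHP: true when $input is a valid integer string.
function is_valid_int($input)
{
    // Compare strictly against false: "0" validates to int 0, which is falsy.
    return filter_var($input, FILTER_VALIDATE_INT) !== false;
}

$loose_check  = !filter_var("0", FILTER_VALIDATE_INT); // true: "0" is wrongly treated as invalid
$strict_check = is_valid_int("0");                     // true: "0" is correctly accepted
$garbage      = is_valid_int("abc");                   // false: not an integer

var_dump($loose_check, $strict_check, $garbage);
```

For filters whose valid results are never falsy, such as FILTER_VALIDATE_EMAIL, the shorter negation check behaves the same way.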

Some validation types also accept an extra “flag”, which acts as a setting for that validation. It’s often easier to explain this in code, so here’s an example.

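$user_url = "http://google.be";        // Valid URL, but without a path
// The scheme ('http://') is always required; FILTER_FLAG_SCHEME_REQUIRED was removed in PHP 8.
if (!filter_var($user_url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED)) {
	echo "Invalid URL";
}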
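
To make the effect of flags more concrete, here is a small sketch that runs a few sample URLs (picked for illustration) through FILTER_VALIDATE_URL, with and without flags. It assumes a current PHP version, where a scheme such as http:// is required even when no flag is passed.

```php
<?php
// Sample URLs chosen for this illustration only.
$no_scheme   = filter_var("google.be", FILTER_VALIDATE_URL);
$with_scheme = filter_var("http://google.be", FILTER_VALIDATE_URL);

// FILTER_FLAG_PATH_REQUIRED rejects URLs without a path after the host.
$no_path   = filter_var("http://google.be", FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);
$with_path = filter_var("http://google.be/search", FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);

// Flags are bit masks, so several can be combined with the | operator.
$path_and_query = filter_var(
    "http://google.be/search?q=php",
    FILTER_VALIDATE_URL,
    FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED
);

// Failed validations show up as bool(false), successful ones as the URL string.
var_dump($no_scheme, $with_scheme, $no_path, $with_path, $path_and_query);
```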

And there’s more. Besides the FILTER_VALIDATE_* options, there are also FILTER_SANITIZE_* filters. These are meant not to validate the input, but to alter it so that it complies with the given filter.

// $user_int: the tainted input string, which needs cleansing
// $sanitized_int: the input string, stripped of everything but digits and plus/minus signs
$user_int = "1+7-3=5 and then do - 5 + 4 which equals: 4";
$sanitized_int = filter_var($user_int, FILTER_SANITIZE_NUMBER_INT);

// Results in: 1+7-35-5+44
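
Sanitizing is not the same as validating: the cleaned-up string above is still not a valid integer. A common pattern, sketched here with a made-up input, is to sanitize first and then validate the result.

```php
<?php
// Made-up input for illustration: a number surrounded by noise.
$raw = " 42 apples";

// Step 1: strip everything except digits, plus and minus signs.
$sanitized = filter_var($raw, FILTER_SANITIZE_NUMBER_INT);

// Step 2: validate the cleaned string; yields an int, or false on failure.
$validated = filter_var($sanitized, FILTER_VALIDATE_INT);

// A sanitized string is not automatically valid: this one fails validation.
$still_invalid = filter_var("1+7-35-5+44", FILTER_VALIDATE_INT);

var_dump($sanitized, $validated, $still_invalid);
```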

Because PHP’s manual page on filter_var() doesn’t include a detailed list of the possible validation & sanitization constants, here they are – with help from W3Schools.com.

Sanitizing input

  • FILTER_SANITIZE_STRING: This filter removes data that is potentially harmful for your application. It is used to strip tags and remove or encode unwanted characters. Deprecated as of PHP 8.1.

    Optional flags available:

    • FILTER_FLAG_NO_ENCODE_QUOTES – This flag does not encode quotes
    • FILTER_FLAG_STRIP_LOW – Strip characters with ASCII value below 32
    • FILTER_FLAG_STRIP_HIGH – Strip characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_LOW – Encode characters with ASCII value below 32
    • FILTER_FLAG_ENCODE_HIGH – Encode characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_AMP – Encode the & character to &amp;
  • FILTER_SANITIZE_STRIPPED: Alias of FILTER_SANITIZE_STRING, shown above.

  • FILTER_SANITIZE_ENCODED: Filter strips or URL-encodes unwanted characters. Similar to urlencode().

    Optional flags available:

    • FILTER_FLAG_STRIP_LOW – Strip characters with ASCII value below 32
    • FILTER_FLAG_STRIP_HIGH – Strip characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_LOW – Encode characters with ASCII value below 32
    • FILTER_FLAG_ENCODE_HIGH – Encode characters with ASCII value above 127
  • FILTER_SANITIZE_SPECIAL_CHARS: Filter HTML-escapes special characters.

    Optional flags available:

    • FILTER_FLAG_STRIP_LOW – Strip characters with ASCII value below 32
    • FILTER_FLAG_STRIP_HIGH – Strip characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_HIGH – Encode characters with ASCII value above 127
  • FILTER_SANITIZE_EMAIL: Filter removes all illegal e-mail characters from a string.

  • FILTER_SANITIZE_URL: Filter removes all illegal URL characters from a string.

  • FILTER_SANITIZE_NUMBER_INT: Filter removes all characters except digits and the plus and minus signs.

  • FILTER_SANITIZE_NUMBER_FLOAT: Filter removes all characters that are not allowed in a float number.

  • FILTER_SANITIZE_MAGIC_QUOTES: Filter applies the addslashes() function to a string. Removed in PHP 8.0 in favor of FILTER_SANITIZE_ADD_SLASHES.
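
Here is a short sketch of a few of the sanitizing filters above, using made-up inputs. Note that FILTER_FLAG_ALLOW_FRACTION is a flag for FILTER_SANITIZE_NUMBER_FLOAT that is not covered in the list above; without it, the decimal point is stripped along with everything else.

```php
<?php
// All inputs are invented for this illustration.

// Removes characters that are not allowed in an e-mail address (spaces, parentheses, ...).
$email = filter_var("john(.doe)@exa mple.com", FILTER_SANITIZE_EMAIL);

// Removes characters that are not allowed in a URL, such as spaces.
$url = filter_var("http://exa mple.com/", FILTER_SANITIZE_URL);

// Without flags, only digits, plus and minus signs survive.
$float_plain = filter_var("price: 12.50 EUR", FILTER_SANITIZE_NUMBER_FLOAT);

// With FILTER_FLAG_ALLOW_FRACTION the decimal point is kept as well.
$float_fraction = filter_var("price: 12.50 EUR", FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION);

// HTML-escapes characters such as < > & and quotes.
$html = filter_var('<b>"Hi" & bye</b>', FILTER_SANITIZE_SPECIAL_CHARS);

var_dump($email, $url, $float_plain, $float_fraction, $html);
```

Keep in mind that a sanitized value is only cleaned up, not checked: the result of FILTER_SANITIZE_EMAIL can still be an invalid address.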

Validating input

  • FILTER_VALIDATE_INT: Validates value as integer.

    Optional options and flags available:

    • min_range – specifies the minimum integer value
    • max_range – specifies the maximum integer value
    • FILTER_FLAG_ALLOW_OCTAL – allows octal number values
    • FILTER_FLAG_ALLOW_HEX – allows hexadecimal number values
  • FILTER_VALIDATE_BOOLEAN: Validates value as a boolean option.

  • FILTER_VALIDATE_FLOAT: Validates value as a float number.

  • FILTER_VALIDATE_REGEXP: Validates value against a Perl-compatible regular expression.

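  • FILTER_VALIDATE_URL: Validates value as a URL.

    Optional flags available:

    • FILTER_FLAG_SCHEME_REQUIRED – Requires URL to include a scheme (like http://example). Deprecated in PHP 7.3 and removed in PHP 8.0, since the scheme is always required.
    • FILTER_FLAG_HOST_REQUIRED – Requires URL to include host name (like http://www.example.com). Deprecated in PHP 7.3 and removed in PHP 8.0, since the host is always required.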
    • FILTER_FLAG_PATH_REQUIRED – Requires URL to have a path after the domain name (like www.example.com/example1/test2/)
    • FILTER_FLAG_QUERY_REQUIRED – Requires URL to have a query string (like “example.php?name=Peter&age=37”)
  • FILTER_VALIDATE_EMAIL: Validates value as an e-mail address.

  • FILTER_VALIDATE_IP: Validates value as an IPv4 or IPv6 address.

    Optional flags available:

    • FILTER_FLAG_IPV4 – Requires the value to be a valid IPv4 IP (like 255.255.255.255)
    • FILTER_FLAG_IPV6 – Requires the value to be a valid IPv6 IP (like 2001:0db8:85a3:08d3:1319:8a2e:0370:7334)
    • FILTER_FLAG_NO_PRIV_RANGE – Requires the value not to be within a private range (rejects addresses like 192.168.0.1, 10.0.0.1, …)
    • FILTER_FLAG_NO_RES_RANGE – Requires that the value is not within the reserved IP range. This flag takes both IPV4 and IPV6 values. A reserved IP could be 255.255.255.255 (broadcast address).
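
To tie the options and flags above to actual calls, here is a minimal sketch with made-up inputs. Options such as min_range and the regular expression are passed in an "options" array, while flags can be passed directly as the third argument. FILTER_NULL_ON_FAILURE is not in the list above; it makes a filter return null instead of false on failure, which is handy for booleans.

```php
<?php
// Integer within a range: options go in an "options" array.
$age_range = ["options" => ["min_range" => 18, "max_range" => 99]];
$age_ok  = filter_var("25", FILTER_VALIDATE_INT, $age_range);
$age_bad = filter_var("120", FILTER_VALIDATE_INT, $age_range);

// Hexadecimal notation is only accepted with FILTER_FLAG_ALLOW_HEX.
$hex = filter_var("0xff", FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX);

// Booleans: "yes", "on", "true" and "1" become true; unknown values become null here.
$bool_yes     = filter_var("yes", FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
$bool_unknown = filter_var("maybe", FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);

// IP addresses: flags can be combined with the | operator.
$ip_flags   = FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE;
$public_ip  = filter_var("8.8.8.8", FILTER_VALIDATE_IP, $ip_flags);
$private_ip = filter_var("192.168.0.1", FILTER_VALIDATE_IP, $ip_flags);
$ipv6       = filter_var("2001:0db8:85a3:08d3:1319:8a2e:0370:7334", FILTER_VALIDATE_IP, FILTER_FLAG_IPV6);

// Custom pattern (a hypothetical product code format) via FILTER_VALIDATE_REGEXP.
$pattern = ["options" => ["regexp" => '/^[A-Z]{2}-\d{4}$/']];
$code    = filter_var("AB-1234", FILTER_VALIDATE_REGEXP, $pattern);

var_dump($age_ok, $age_bad, $hex, $bool_yes, $bool_unknown, $public_ip, $private_ip, $ipv6, $code);
```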

An excellent source for more code examples is the Devolio blog post Data Filtering Using PHPs Filter Functions (Part One).

While the filter_var() function can’t replace every possible type of input validation, it can help a great deal. Tricky validations such as e-mail, URL or IPv6 addresses in particular are so much easier this way.


