Input Validation: Using filter_var() Over Regular Expressions

Just about the biggest time-sink on any project is the amount of input validation that needs to be done. You _have_ to assume your visitor is a maniac serial killer, out to destroy your application. And you have to prevent it.

Thus starts our never-ending battle for user input validation. We can’t allow it all (think XSS or SQL injection), so we check every value presented to us: correct e-mail formats, IPs, integers, HTML code, and so on.

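For a long time, a generic e-mail validation regular expression looked like this (written here with preg_match(), because the eregi() function it was usually paired with was removed in PHP 7.0).

// Case-insensitive pattern; the dots are escaped so they only match a literal "."
$filter = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';
if (!preg_match($filter, $user_email)) {
	echo "Invalid e-mail address.";
}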

But with PHP’s filter_var() function, the same check becomes a one-liner that needs no hand-written pattern.

if (!filter_var($user_email, FILTER_VALIDATE_EMAIL)) {
	echo "Invalid e-mail";
}

By passing the right filter constant as the second argument of filter_var($input, $type), we can quickly determine whether the supplied input matches the format we require.
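
One detail worth seeing in code is what filter_var() hands back: the filtered value on success and false on failure. The sketch below uses made-up input strings to show why a strict comparison against false is the safer check, since a valid "0" is converted to the integer 0, which is itself falsy.

```php
<?php
// filter_var() returns the filtered value on success and false on failure.
$age = filter_var("42", FILTER_VALIDATE_INT);     // integer 42
$bad = filter_var("42abc", FILTER_VALIDATE_INT);  // false, not a clean integer

// A valid "0" becomes the integer 0, which is falsy in PHP.
// A plain if (!$zero) would wrongly reject it, so compare with === false.
$zero = filter_var("0", FILTER_VALIDATE_INT);
if ($zero === false) {
	echo "Invalid integer";
} else {
	echo "Valid integer: " . $zero;
}
```

For filters that return strings, such as FILTER_VALIDATE_EMAIL, the loose !filter_var(...) check is usually fine; the strict form matters mostly for integers, floats and booleans.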

Some validation types also accept an extra “flag”, which acts like a setting for the validation. It’s often easier to explain this in code, so here’s an example.

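$user_url = "http://google.be";  // A valid URL, but without a path
// A scheme such as 'http://' is always required (FILTER_FLAG_SCHEME_REQUIRED was removed in PHP 8.0).
// FILTER_FLAG_PATH_REQUIRED additionally demands a path, so this URL is rejected.
if (!filter_var($user_url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED)) {
	echo "Invalid URL";
}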

And there’s more. Besides the FILTER_VALIDATE_* options, there are also FILTER_SANITIZE_* filters. Rather than merely checking the input, these alter it so that it complies with the given filter.

// $user_int: the tainted input string, which needs cleansing
// $sanitized_int: the input string, stripped of anything but digits and plus/minus signs
$user_int = "1+7-3=5 and then do - 5 + 4 which equals: 4";
$sanitized_int = filter_var($user_int, FILTER_SANITIZE_NUMBER_INT);

// Results in: 1+7-35-5+44

Because PHP’s Manual pages on filter_var() don’t include a detailed list of possible validation & sanitization constants, here they are listed – with help from W3Schools.com.

Sanitizing input

  • FILTER_SANITIZE_STRING: This filter removes data that is potentially harmful for your application. It is used to strip tags and remove or encode unwanted characters. (Deprecated as of PHP 8.1.)

    Optional flags available:

    • FILTER_FLAG_NO_ENCODE_QUOTES – This flag does not encode quotes
    • FILTER_FLAG_STRIP_LOW – Strip characters with ASCII value below 32
    • FILTER_FLAG_STRIP_HIGH – Strip characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_LOW – Encode characters with ASCII value below 32
    • FILTER_FLAG_ENCODE_HIGH – Encode characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_AMP – Encode the & character to &amp;
  • FILTER_SANITIZE_STRIPPED: Alias of FILTER_SANITIZE_STRING, shown above.

  • FILTER_SANITIZE_ENCODED: Filter strips or URL-encodes unwanted characters. Similar to urlencode() .

    Optional flags available:

    • FILTER_FLAG_STRIP_LOW – Strip characters with ASCII value below 32
    • FILTER_FLAG_STRIP_HIGH – Strip characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_LOW – Encode characters with ASCII value below 32
    • FILTER_FLAG_ENCODE_HIGH – Encode characters with ASCII value above 127
  • FILTER_SANITIZE_SPECIAL_CHARS: Filter HTML-escapes special characters.

    Optional flags available:

    • FILTER_FLAG_STRIP_LOW – Strip characters with ASCII value below 32
    • FILTER_FLAG_STRIP_HIGH – Strip characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_HIGH – Encode characters with ASCII value above 127
  • FILTER_SANITIZE_EMAIL: Filter removes all illegal e-mail characters from a string.

  • FILTER_SANITIZE_URL: Filter removes all illegal URL characters from a string.

  • FILTER_SANITIZE_NUMBER_INT: Filter removes all illegal characters from a number.

  • FILTER_SANITIZE_NUMBER_FLOAT: Filter removes all illegal characters from a float number.

  • FILTER_SANITIZE_MAGIC_QUOTES: Filter applies the addslashes() function to a string. (Deprecated in PHP 7.3 and removed in PHP 8.0; FILTER_SANITIZE_ADD_SLASHES is its replacement.)
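
A few of the sanitizing filters above can be sketched as follows. The input strings are invented for illustration, and FILTER_FLAG_ALLOW_FRACTION is a flag that is not in the list above: it tells FILTER_SANITIZE_NUMBER_FLOAT to keep the decimal point, which would otherwise be stripped.

```php
<?php
// Hypothetical tainted inputs, for illustration only.
$raw_email = "john(.doe)@exa//mple.com";
$raw_price = '$19.99 total';
$raw_html  = '<b>Tom & "Jerry"</b>';

// Removes every character that is not allowed in an e-mail address.
$clean_email = filter_var($raw_email, FILTER_SANITIZE_EMAIL);

// Keeps digits, + and -; the extra flag also keeps the decimal point.
$clean_price = filter_var($raw_price, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION);

// HTML-escapes <, >, &, and quotes so the string can be echoed safely.
$safe_html = filter_var($raw_html, FILTER_SANITIZE_SPECIAL_CHARS);

// Sanitizing only cleans up; it does not prove the result is valid.
// Run the matching validation filter afterwards.
$valid_email = filter_var($clean_email, FILTER_VALIDATE_EMAIL);
```

Sanitizing and validating are two separate steps here: the first makes the value well-formed where it can, and the second decides whether the cleaned value is acceptable at all.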

Validating input

  • FILTER_VALIDATE_INT: Validates value as integer.

    Optional flags available:

    • min_range – specifies the minimum integer value (passed in the options array, not as a flag)
    • max_range – specifies the maximum integer value (passed in the options array, not as a flag)
    • FILTER_FLAG_ALLOW_OCTAL – allows octal number values
    • FILTER_FLAG_ALLOW_HEX – allows hexadecimal number values
  • FILTER_VALIDATE_BOOLEAN: Validates value as a boolean option.

  • FILTER_VALIDATE_FLOAT: Validates value as a float number.

  • FILTER_VALIDATE_REGEXP: Validates value against a Perl-compatible regular expression.

  • FILTER_VALIDATE_URL: Validates value as a URL.

    Optional flags available:

    • FILTER_FLAG_SCHEME_REQUIRED – Requires URL to be an RFC compliant URL (like http://example). Deprecated in PHP 7.3 and removed in PHP 8.0, because the filter always requires a scheme.
    • FILTER_FLAG_HOST_REQUIRED – Requires URL to include host name (like http://www.example.com). Deprecated in PHP 7.3 and removed in PHP 8.0, because the filter always requires a host.
    • FILTER_FLAG_PATH_REQUIRED – Requires URL to have a path after the domain name (like http://www.example.com/example1/test2/)
    • FILTER_FLAG_QUERY_REQUIRED – Requires URL to have a query string (like “example.php?name=Peter&age=37”)
  • FILTER_VALIDATE_EMAIL: Validates value as an e-mail address.

  • FILTER_VALIDATE_IP: Validates value as an IPv4 or IPv6 address.

    Optional flags available:

    • FILTER_FLAG_IPV4 – Requires the value to be a valid IPv4 IP (like 255.255.255.255)
    • FILTER_FLAG_IPV6 – Requires the value to be a valid IPv6 IP (like 2001:0db8:85a3:08d3:1319:8a2e:0370:7334)
    • FILTER_FLAG_NO_PRIV_RANGE – Requires the value to be outside the RFC-specified private ranges (so addresses like 192.168.0.1 or 10.0.0.1 are rejected)
    • FILTER_FLAG_NO_RES_RANGE – Requires that the value is not within the reserved IP range. This flag takes both IPV4 and IPV6 values. A reserved IP could be 255.255.255.255 (broadcast address).
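
The options and flags above are passed as the third argument of filter_var(). The following is a minimal sketch with invented sample values. It also uses FILTER_NULL_ON_FAILURE, a general flag not listed above, which makes the boolean filter return null for unrecognised input instead of false.

```php
<?php
// min_range and max_range go inside an 'options' array.
$range   = ['options' => ['min_range' => 1, 'max_range' => 120]];
$age_ok  = filter_var("37", FILTER_VALIDATE_INT, $range);   // integer 37
$age_bad = filter_var("150", FILTER_VALIDATE_INT, $range);  // false, out of range

// Flags can be passed directly as the third argument.
$hex = filter_var("0xff", FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX);  // integer 255

// Flags are bit masks, so several can be combined with |.
$private = filter_var("192.168.0.1", FILTER_VALIDATE_IP,
	FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE);  // false, private range
$ipv6 = filter_var("2001:0db8:85a3:08d3:1319:8a2e:0370:7334",
	FILTER_VALIDATE_IP, FILTER_FLAG_IPV6);          // returns the address string

// With FILTER_NULL_ON_FAILURE, unrecognised input yields null rather than false.
$flag    = filter_var("yes", FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);    // true
$unknown = filter_var("maybe", FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);  // null

// FILTER_VALIDATE_REGEXP takes its pattern through the 'regexp' option.
// The four-digit postal code pattern is just an example.
$zip = filter_var("9000", FILTER_VALIDATE_REGEXP,
	['options' => ['regexp' => '/^[0-9]{4}$/']]);   // returns "9000"
```

When a filter needs both options and flags, both can go in the same array under the 'options' and 'flags' keys.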

An excellent source for more code examples can be found at the Devolio blog post Data Filtering Using PHPs Filter Functions (Part One).

While filter_var() can’t replace every possible type of input validation, it can help a great deal. Tricky validations such as e-mail, URL or IPv6 addresses in particular are much easier this way.