Wednesday, December 18, 2013

Data Validation and Sanitizing with PHP

To keep a website secure, protect it from attacks, and prevent attackers from gaining access to its data, it is very important to validate and sanitize data from external sources before doing anything with it. Validation is the process of verifying that the data is in the format we expect. Sanitization is the process of removing unwanted characters or malicious code from the data. We cannot trust any data we collect from external sources, such as user-submitted data. We need to first validate the data and then sanitize it before displaying it or inserting it into a database.

When users submit data to our website, we need to make sure that the data is in the form we expect. If we expect the input to be an integer, we need to verify that the value the user submitted is an integer. In the same way, we need to validate the data the user enters for other fields: a name should contain only letters and periods, and an email address should contain only characters that are valid in addresses, such as letters, digits, the at sign (@), underscores, and periods. If a field shouldn't contain HTML, we need to remove any HTML from it. If a field may contain HTML, we need to make sure only the tags we allow are kept. The following are some simple methods to validate and sanitize user-submitted data:
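The type checks described above can be sketched with PHP's built-in filter_var() and preg_match() functions. This is a minimal sketch, not the author's code: the helper names is_valid_age(), is_valid_name() and is_valid_email() are hypothetical, and the name rule also allows spaces so that full names pass, which is an assumption beyond the rule stated above.

```php
<?php
// Minimal validation sketch using PHP's built-in filters.
// Helper names and the exact name rule are illustrative assumptions.

function is_valid_age($value): bool
{
    // FILTER_VALIDATE_INT returns the integer on success, false otherwise.
    return filter_var($value, FILTER_VALIDATE_INT) !== false;
}

function is_valid_name($value): bool
{
    // ASCII letters, periods and spaces only (spaces are an assumed addition).
    return preg_match('/^[A-Za-z. ]+$/', $value) === 1;
}

function is_valid_email($value): bool
{
    // FILTER_VALIDATE_EMAIL checks the general user@domain syntax.
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_age('42'));                  // bool(true)
var_dump(is_valid_age('42abc'));               // bool(false)
var_dump(is_valid_name('John Q. Smith'));      // bool(true)
var_dump(is_valid_name('<b>John</b>'));        // bool(false)
var_dump(is_valid_email('john@example.com'));  // bool(true)
var_dump(is_valid_email('john@@example'));     // bool(false)
```

Using filter_var() for integers and email addresses avoids writing fragile regular expressions by hand; a custom pattern is only needed for rules PHP has no built-in filter for, such as the name rule.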

Numbers Only

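The following code keeps only numeric characters. It takes a value and strips out everything except digits, minus signs, and decimal points, so negative numbers and decimals are preserved. Note that this sanitizes rather than validates: an input such as 1-2.3.4 passes through unchanged even though it is not a valid number.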

$output = preg_replace("/[^0-9\-.]/", "", $data);
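Because the regular expression above only strips characters, it does not guarantee a valid number. A minimal sketch, assuming we want either a float or nothing, runs the stripped value through filter_var() with FILTER_VALIDATE_FLOAT; parse_number() is a hypothetical helper name.

```php
<?php
// Sketch contrasting sanitizing (stripping characters) with validating
// (checking the whole value). parse_number() is a hypothetical helper.

function parse_number(string $data): ?float
{
    // Same regex as above: keep digits, minus signs and periods only.
    $stripped = preg_replace("/[^0-9\-.]/", "", $data);

    // FILTER_VALIDATE_FLOAT returns false unless the whole string is a number.
    $number = filter_var($stripped, FILTER_VALIDATE_FLOAT);

    return $number === false ? null : $number;
}

foreach (['-12.5', 'abc-12.5xyz', '1-2.3.4'] as $data) {
    $number = parse_number($data);
    echo $data, ' => ', $number === null ? 'invalid' : $number, "\n";
}

// PHP's built-in sanitizer strips in a similar way (it also keeps '+');
// FILTER_FLAG_ALLOW_FRACTION keeps the decimal point.
echo filter_var('abc-12.5xyz', FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION), "\n";
```

With these inputs, '-12.5' and 'abc-12.5xyz' both yield -12.5, while '1-2.3.4' is rejected even though the regex leaves it unchanged.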

Strip Tags or Display Tags

To remove HTML tags from the data we can use the following PHP function.

$output = strip_tags($data);
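strip_tags() also accepts an optional list of tags to keep, which covers the case where a field may contain some HTML. The sketch below uses a made-up input. Note that strip_tags() removes only the tags themselves, so the text inside a script element survives as plain text, and attributes on allowed tags are kept.

```php
<?php
// Sketch: strip_tags() with and without an allow-list of tags.
// The sample input is made up for illustration.

$data = '<p>Hello <b>world</b><script>alert(1)</script></p>';

// Remove every tag. The text between the script tags is kept as plain text.
$plain = strip_tags($data);

// Keep only <b> and <i>; all other tags are removed.
// Attributes on allowed tags (e.g. onclick) are NOT removed.
$limited = strip_tags($data, '<b><i>');

echo $plain, "\n";    // Hello worldalert(1)
echo $limited, "\n";  // Hello <b>world</b>alert(1)
```

Because allowed tags keep their attributes, an allow-list alone is not enough when the field must be safe to display; the output should still be checked or escaped.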

If we want to display HTML tags in the output instead of having the browser render them, we can use the following PHP function. It converts special characters such as <, >, & and " into HTML entities, so the tags are shown as text and are not parsed.

$output = htmlspecialchars($data);
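htmlspecialchars() also takes flags and a character set. Passing ENT_QUOTES and 'UTF-8' explicitly is a common choice: before PHP 8.1 the default flags did not escape single quotes, which matters when the output lands inside a single-quoted HTML attribute. escape_html() is a hypothetical wrapper name used only for this sketch.

```php
<?php
// Sketch: escaping user data for HTML output with explicit flags and charset.
// ENT_QUOTES escapes both double and single quotes.

function escape_html(string $data): string
{
    return htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
}

$comment = "<a href='x'>Tom & \"Jerry\"</a>";
echo escape_html($comment), "\n";
// &lt;a href=&#039;x&#039;&gt;Tom &amp; &quot;Jerry&quot;&lt;/a&gt;
```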

Escaping Strings in MySQL

The following functions can be used to sanitize data before it is inserted into a database.

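<?php
function clean_data($data) {

    // Order matters: remove <script> and <style> blocks (with their contents)
    // and comments before the generic tag filter strips their tags alone.
    $filters = array(
        '@<script[^>]*?>.*?</script>@si',   // Remove javascript code
        '@<style[^>]*?>.*?</style>@si',     // Remove style tags and their CSS (no U flag: it would make .*? greedy)
        '@<![\s\S]*?--[ \t\n\r]*>@',        // Remove multi-line comments
        '@<[\/\!]*?[^<>]*?>@si'             // Remove remaining HTML tags
    );

    $output_data = preg_replace($filters, '', $data);
    return $output_data;
}

function sanitize_data($data, mysqli $link) {
    if (is_array($data)) {
        $output_data = array();
        foreach ($data as $key => $val) {
            $output_data[$key] = sanitize_data($val, $link);
        }
    }
    else {
        // Magic quotes were removed in PHP 5.4, so the old
        // get_magic_quotes_gpc()/stripslashes() step is no longer needed.
        $data = clean_data($data);
        // mysql_real_escape_string() was removed in PHP 7; the mysqli
        // version needs an open connection to know the character set.
        $output_data = mysqli_real_escape_string($link, $data);
    }
    return $output_data;
}
?>

The following is a usage example of the above functions.

<?php
  // Hypothetical connection details; replace them with your own.
  $link = mysqli_connect('localhost', 'db_user', 'db_password', 'db_name');

  $string = "This is my <script src='http://www.example.com/malicious_script.js'></script> profile.";
  $output_string = sanitize_data($string, $link);

  echo "Original String : ".$string;
  echo "<br> Sanitized String : ".$output_string;
?>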


If you run the above script and view the page source in the browser, you can see that the original string contains the embedded JavaScript, while the sanitized string does not.
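Escaping strings by hand is one way to protect an INSERT. Another widely used approach, which is a different technique from the one in this post, is a prepared statement, where the value is sent to the database separately from the SQL text. Below is a minimal sketch with PDO, assuming the pdo_sqlite extension is available: an in-memory SQLite database stands in for MySQL so it runs without a server, and the profiles table and save_bio() helper are hypothetical. With MySQL, only the DSN (a mysql:host=...;dbname=... string) and the credentials would change.

```php
<?php
// Sketch: parameterized INSERT with PDO prepared statements.
// In-memory SQLite stands in for MySQL; table and helper names are hypothetical.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE profiles (id INTEGER PRIMARY KEY, bio TEXT)');

function save_bio(PDO $pdo, string $bio): int
{
    // The value is bound to :bio, so quotes or SQL inside $bio
    // cannot change the structure of the query.
    $stmt = $pdo->prepare('INSERT INTO profiles (bio) VALUES (:bio)');
    $stmt->execute([':bio' => $bio]);
    return (int) $pdo->lastInsertId();
}

$id = save_bio($pdo, "O'Reilly said \"hi\"; DROP TABLE profiles; --");

// Read the row back: the text is stored exactly as given.
$stmt = $pdo->prepare('SELECT bio FROM profiles WHERE id = ?');
$stmt->execute([$id]);
echo $stmt->fetchColumn(), "\n";
```

Prepared statements cover the database side only; data stored this way should still be escaped with htmlspecialchars() when it is displayed.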
