Thursday, December 19, 2013

Getting content from a webpage in PHP using CURL

We can fetch the content of a web page in PHP with the built-in function file_get_contents(). Fetching a URL this way requires the allow_url_fopen setting to be enabled. The sample code below displays the content of a website.

<?php
$content=file_get_contents('http://www.example.com');
echo $content;
?>

But some websites do not allow their content to be accessed by anything other than a web browser. Such websites block other programs by checking the User Agent string, a request header that browsers send to every website they visit. To access this type of website, we have to write a program that identifies itself as a browser. In this tutorial we will learn how to write a program that fetches the content of a web page while sending a browser's User Agent string. For this program we use cURL (Client URL), a PHP extension built on the libcurl library. It only works when this extension is enabled in our server or PHP installation.
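
Whether the extension is enabled can be checked before any cURL function is called. The sketch below shows one way to do that. The helper name curl_is_available() is made up for this illustration, and the php.ini line mentioned in the message is an assumption that varies with the PHP version and platform.

```php
<?php
// Hypothetical helper: reports whether the cURL extension can be used.
function curl_is_available(): bool
{
    // extension_loaded() checks the extension itself; function_exists()
    // also catches setups where curl_init() has been disabled.
    return extension_loaded('curl') && function_exists('curl_init');
}

if (!curl_is_available()) {
    // Assumption: the exact php.ini entry depends on your PHP version and OS.
    echo "The cURL extension is not enabled. Check php.ini (for example extension=curl) and restart the server.\n";
}
```

Running phpinfo() or `php -m` on the command line also lists the loaded extensions.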

The following is an example User Agent string:
Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201
For more information and a list of User Agent strings, you can visit www.useragentstring.com

The following is the code to get content from a website using CURL:

<?php
$url='http://www.example.org';
$user_agent='Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201';

$curl=curl_init(); //Open a session using CURL

//Setting options for CURL
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_ENCODING, 'gzip');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_FAILONERROR, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 8);
curl_setopt($curl, CURLOPT_TIMEOUT, 8);

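$content=curl_exec($curl); //Execute the request; returns false on failure

if($content===false){
    echo 'cURL error: '.curl_error($curl); //Report why the request failed
}else{
    echo $content;
}
curl_close($curl); //Close session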
?>
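
The same options can be wrapped in a reusable function that reports failures instead of silently printing nothing. This is only a sketch under a few assumptions: the function name fetch_page(), its parameters, and the choice to throw a RuntimeException are all made up for this illustration and are not part of the cURL API.

```php
<?php
/**
 * Hypothetical helper: fetch a URL with cURL while sending the given
 * User Agent string. Returns the response body, or throws a
 * RuntimeException that describes what went wrong.
 */
function fetch_page(string $url, string $userAgent, int $timeoutSeconds = 8): string
{
    if (!function_exists('curl_init')) {
        throw new RuntimeException('The cURL extension is not enabled.');
    }

    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HEADER         => false, // do not include response headers in the output
        CURLOPT_ENCODING       => 'gzip', // accept gzip-compressed responses
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_FAILONERROR    => true,  // treat HTTP status codes >= 400 as failures
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    $content = curl_exec($curl);

    // Read the error details before the session is closed.
    $errorNumber  = curl_errno($curl);
    $errorMessage = curl_error($curl);
    curl_close($curl);

    if ($content === false) {
        throw new RuntimeException("cURL error $errorNumber: $errorMessage");
    }

    return $content;
}
```

It could be called as fetch_page('http://www.example.org', $user_agent) inside a try/catch block, echoing the returned content on success and the exception message on failure.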

Note: Some websites allow only browsers to access their pages and do not permit other programs to do so. Please check whether you are allowed to access a website's content, for example by reading its terms of use, before using this code on that website.
