In PHP, we can fetch the content of a web page with the built-in function file_get_contents().
The sample code below displays the content of a website.
<?php
$content=file_get_contents('http://www.example.com');
echo $content;
?>
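file_get_contents() returns false when the request fails, so it can help to check the result before printing it. The sketch below is one way to do that; the helper name fetch_page() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the content as a string,
// or null when file_get_contents() fails.
function fetch_page(string $url): ?string
{
    // @ suppresses the warning PHP raises on failure; the result is checked instead.
    $content = @file_get_contents($url);
    return $content === false ? null : $content;
}

// Example usage (makes a network request, so it is left commented out):
// $content = fetch_page('http://www.example.com');
// echo $content === null ? 'Could not fetch the page.' : $content;
```

Returning null keeps a failed request distinct from a page that is simply empty.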
However, some websites do not allow their content to be accessed by anything
other than a web browser. Such websites block other programs by checking
the User Agent string, which browsers send to the websites they visit.
To access this type of website, we have to write a program that identifies
itself as a browser. In this tutorial we will learn how to write a program
that fetches the content of a web page while presenting a browser's User Agent string.
For this program, we use cURL (Client URL), a library available as a PHP extension.
The program only works when this extension is enabled
on our server or in our PHP installation.
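Before relying on cURL, we can check whether the extension is enabled. This is a minimal sketch; extension_loaded() and function_exists() are PHP built-ins, while the wrapper name curl_is_available() is only an illustrative choice.

```php
<?php
// Hypothetical wrapper: true when the cURL extension is loaded
// and its entry-point function curl_init() is defined.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    echo 'cURL is enabled.';
} else {
    echo 'cURL is not enabled. Enable the curl extension in your PHP installation.';
}
```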
The following is an example User Agent string:
Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201
For more information and a list of User Agent strings, you can
visit www.useragentstring.com
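To make the User Agent check more concrete, here is a rough sketch of the kind of test a website might run on incoming requests. The function name looks_like_browser() and the "starts with Mozilla/" rule are assumptions for illustration only; real sites use their own, often more elaborate, rules.

```php
<?php
// Hypothetical server-side check: treat a request as coming from a browser
// only when its User Agent string is present and starts with "Mozilla/".
function looks_like_browser(?string $user_agent): bool
{
    if ($user_agent === null || $user_agent === '') {
        return false; // no User Agent was sent
    }
    return strncmp($user_agent, 'Mozilla/', 8) === 0;
}

// On a web server, PHP exposes the client's User Agent here when one was sent.
$incoming = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : null;
echo looks_like_browser($incoming) ? 'Request accepted' : 'Request blocked';
```

A program that sends no User Agent, or one that does not look like a browser's, would be blocked by a check of this kind.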
The following code gets the content of a website
using cURL:
<?php
$url='http://www.example.org';
$user_agent='Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201';
$curl=curl_init(); //Open a session using CURL
//Setting options for CURL
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_ENCODING, 'gzip');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_FAILONERROR, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 8);
curl_setopt($curl, CURLOPT_TIMEOUT, 8);
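$content=curl_exec($curl); //Execute the request; returns false on failure
if($content===false){
    echo 'Request failed: '.curl_error($curl); //Report why the request failed
}
curl_close($curl); //Close session
echo $content;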
?>
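If the same request is needed in several places, the options above can be collected into an array and applied with curl_setopt_array(). The following is only a sketch: the function names browser_curl_options() and fetch_as_browser() are hypothetical, and the option values are copied from the code above.

```php
<?php
// Hypothetical helper: returns the same options as the code above, as one array.
function browser_curl_options(string $url, string $user_agent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $user_agent,
        CURLOPT_HEADER         => false,  // do not include response headers in the output
        CURLOPT_ENCODING       => 'gzip', // accept gzip-compressed responses
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_FAILONERROR    => true,   // treat HTTP codes >= 400 as failures
        CURLOPT_CONNECTTIMEOUT => 8,      // seconds to wait while connecting
        CURLOPT_TIMEOUT        => 8,      // seconds allowed for the whole request
    ];
}

// Hypothetical helper: returns the page body, or null when the request fails.
function fetch_as_browser(string $url, string $user_agent): ?string
{
    $curl = curl_init();
    curl_setopt_array($curl, browser_curl_options($url, $user_agent));
    $content = curl_exec($curl);
    // The handle is released when $curl goes out of scope at the end of the function.
    return $content === false ? null : $content;
}

// Example usage (makes a network request, so it is left commented out):
// $page = fetch_as_browser('http://www.example.org',
//     'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201');
// echo $page === null ? 'Request failed.' : $page;
```

Keeping the options in a separate function makes it easy to inspect or adjust them without touching the request code.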