webcodebase.com Snippets database & Pastebin

PHP Website Parser Class

Submitted on 10/11/2008
Author's Comment: A PHP website parser class with detailed descriptions of most of its functions. It can be a useful learning aid for new web developers and an easy way to retrieve information from websites when needed.
How to use:
Read the code; everything is explained in the comments.
Snippet:

<?php
/*

* PHP Website Parser

* This class can be used to parse websites and receive
* information from them.


* Author: 2008 Arne Chr. Blystad
* Website: http://www.webcodebase.com

*/

/* Website to parse */
$website        =       "http://www.webcodebase.com/index.php?page=codebase";

/* Regex to parse [only needed for some functions] */
$regex  =       "/PHP \\(([0-9]*)\\)/";

// Create the parser and tell it which website is going to be parsed
$parse = new parse($website);

// Different ways to use the class:

// Parse a line, following the regex
# echo $parse->parse_line($regex);
# OR:
# $c = $parse->parse_line($regex);
# echo "Currently there are {$c} scripts on WebCodeBase.com";
# (note: $c holds the full matched text, not only the captured number)

// Parse the whole website
# echo $parse->parse_site();

/*

* [Class] Parse

* [Function] Parse Websites

*/
class parse {
    // Declared explicitly; dynamic properties are deprecated as of PHP 8.2
    public $website;
    public $web;

    /*
    * [Function] __construct($website)

    * __construct runs when the class is instantiated,
    * for example $parse = new parse("http://www.webcodebase.com")
    */

    public function __construct($website)
    {
        // Define variables
        $this->website = $website;

        // Fetch the page once; the other methods work on this copy
        $this->web = $this->parse_website($this->website);
    }


    /*
    * [Function] parse_line($regex, $output)
    * $regex = Regex of the info that is going to be parsed
    * $output = either "echo" or "return"; decides how the data is output
    */
    public function parse_line($regex, $output = "return")
    {
        // Page content fetched earlier with file_get_contents()
        $content = $this->web;

        if (preg_match($regex, $content, $regs)) {
            // $regs[0] is the full match; capture groups start at $regs[1]
            $result = $regs[0];
        } else {
            $result = "[ERROR] Couldn't match REGEX";
        }

        if ($output == "return") {
            return $result;
        }
        else {
            echo $result;
        }
    }
    /*

    * [Function] parse_website($website)
    * Fetches the website and either returns its content
    * or stops with an error

    */
    public function parse_website($website)
    {
        $c = file_get_contents($website);

        // file_get_contents() returns false on failure
        if ($c === false)
        {
            die("[ERROR] Invalid Website");
        }
        else
        {
            return $c;
        }
    }
    /*

    * [Function] parse_site
    * Parses the whole website, but removes the HTML tags,
    * single-line CSS/JS comments and @import rules.
    * This can be useful for a lot of different things; only your
    * imagination will limit you.

    */
    public function parse_site() {
        // This can be done in a lot of ways, I prefer this simple method.
        // <[^>]*> matches a tag up to its first ">", including across newlines.
        $pattern = '%<[^>]*>|/\*.*?\*/|@import url(.*?);%';
        $replace = "";
        $source  = $this->web;

        // Replace every match with nothing, so that only the text remains.
        $output = preg_replace($pattern, $replace, $source);

        return $output;
    }
}
?>
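
Example (not part of the original snippet): parse_line() returns the full match ($regs[0]), so with the $regex above the result is text like "PHP (n)" and not just the number. The sketch below shows the difference between the full match and the first capture group, and a simplified tag-only version of the stripping done in parse_site(). It runs offline: $html is made-up sample markup standing in for a fetched page, and the names $fullMatch, $count and $text are only illustrative.

```php
<?php
// Hypothetical sample markup (an assumption, not real output from the site).
$html = "<ul>\n<li><a href=\"#\">PHP (42)</a></li>\n<li>Perl (7)</li>\n</ul>";

// Same pattern as $regex in the snippet above.
$regex = '/PHP \(([0-9]*)\)/';

$fullMatch = null;
$count     = null;

if (preg_match($regex, $html, $regs)) {
    $fullMatch = $regs[0]; // the entire matched text
    $count     = $regs[1]; // only the first capture group (the digits)
    echo "Full match: {$fullMatch}\n";
    echo "Count only: {$count}\n";
}

// Simplified tag stripping: remove everything between "<" and the next ">".
$text = trim(preg_replace('%<[^>]*>%', '', $html));
echo $text, "\n";
```

If only the number is wanted, one option is to read $regs[1] where parse_line() currently reads $regs[0]; that would change the result for regexes without a capture group, so treat it as a suggestion only.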

 
