Web Scraping

I want to learn how to scrape data from certain sites into a CSV file or a database. Which language would be better, PHP or Python? I have zero Python knowledge and intermediate PHP knowledge. Are there any existing PHP classes so that I don't reinvent the wheel? Any advice is appreciated. Thank you.

@Deorro, come over here!

1 Like

Use the PHP class simplehtmldom.

1 Like

Don’t fret about languages. You know PHP, go with it. It also happens to be very good for this kind of thing.

Yes. Use Guzzle [guzzlehttp/guzzle] or Httpful [nategood/httpful] to make HTTP requests. Httpful is a bit easier to use. To parse the response HTML, if you need to, use Simple HTML DOM [simple-html-dom/simple-html-dom].

This is very simple once you start doing it, I promise.

EDIT: It won't be so simple if you have to get past CAPTCHAs or log in to the site.

2 Likes

I don't know PHP, but with Python he can use BeautifulSoup4 or Scrapy.

For PHP, @TerribleWaste can help.

1 Like

Use PHP since you already know it; it can handle pretty much all of this.

1 Like

I understood nothing in this thread, but what matters is that the brother gets help. :D:D:D

2 Likes

:D:D Don't worry… I would feel the same way if you mentioned forex.

1 Like

[PHP]<?php
// Assumed URL: the forum replaced the original link with its page title,
// "Pokémon evolution charts | Pokémon Database".
$html = file_get_contents('https://pokemondb.net/evolution'); // fetch the page HTML

$pokemon_doc = new DOMDocument();

libxml_use_internal_errors(true); // collect libxml errors internally instead of printing warnings

if (!empty($html)) { // only parse if some HTML was actually returned

    $pokemon_doc->loadHTML($html);
    libxml_clear_errors(); // discard the errors collected while parsing messy HTML

    $pokemon_xpath = new DOMXPath($pokemon_doc);

    // get all the h2 elements that have an id attribute
    $pokemon_row = $pokemon_xpath->query('//h2[@id]');

    if ($pokemon_row->length > 0) {
        foreach ($pokemon_row as $row) {
            echo $row->nodeValue . "<br/>";
        }
    }

}[/PHP]

Source: [MEDIA=gist]anchetaWern/6150297[/MEDIA]
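
Since the question was about getting scraped data into a CSV, here is a rough sketch of that last step using only PHP's built-in DOM and CSV functions. The helper names (extractHeadings, writeCsv), the sample HTML, the column names and the output path are all made up for illustration; in practice the HTML string would come from file_get_contents or one of the HTTP libraries mentioned earlier in the thread.

```php
<?php
// Hypothetical helper: collect the id and text of every <h2 id="..."> in an HTML string.
function extractHeadings(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    libxml_use_internal_errors(true); // keep malformed-HTML warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//h2[@id]') as $node) {
        $rows[] = [$node->getAttribute('id'), trim($node->nodeValue)];
    }
    return $rows;
}

// Hypothetical helper: write a header line followed by the data rows to a CSV file.
function writeCsv(string $path, array $header, array $rows): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    // Separator, enclosure and escape are passed explicitly so the behaviour
    // does not depend on the PHP version's defaults.
    fputcsv($handle, $header, ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($handle, $row, ',', '"', '\\');
    }
    fclose($handle);
}

// Inline sample HTML stands in for a downloaded page (assumption for the demo).
$sample = '<html><body>'
    . '<h2 id="gen1">Generation 1</h2>'
    . '<h2>No id here</h2>'
    . '<h2 id="gen2">Generation 2</h2>'
    . '</body></html>';

$path = sys_get_temp_dir() . '/headings.csv'; // assumed output location
$rows = extractHeadings($sample);
writeCsv($path, ['id', 'heading'], $rows);
```

Keeping the parsing and the CSV writing in separate functions means you can later swap the CSV step for a database insert without touching the scraping part.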

Next time visit Google.

1 Like

If you can do web scraping, get in touch with me; I have a job opportunity.