
Semalt: 3 Steps to PHP Web Page Scraping


Web scraping, also called web data extraction or web harvesting, is the process of extracting data from a website or blog. The extracted information is used to set up meta tags, meta descriptions, keywords and links on a site, improving its overall performance in search engine results.

Two main techniques are used to scrape data:

  • Document parsing: converts an XML or HTML document into a DOM (Document Object Model) tree that you can navigate. PHP provides a rich DOM extension for this.
  • Regular expressions: extract data from web pages by matching their text against patterns.
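To make the two techniques concrete, here is a minimal sketch that pulls the same headings out of a small HTML string, first with PHP's DOM extension (DOMDocument and DOMXPath) and then with preg_match_all(). The sample markup, the "title" class and the function names titlesViaDom() and titlesViaRegex() are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical sample markup standing in for a downloaded page.
$html = '<html><body>'
    . '<h2 class="title">First post</h2>'
    . '<h2 class="title">Second post</h2>'
    . '</body></html>';

// Technique 1, document parsing: build a DOM tree and query it with XPath.
function titlesViaDom(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $titles = [];
    foreach ($xpath->query('//h2[@class="title"]') as $node) {
        $titles[] = trim($node->textContent);
    }
    return $titles;
}

// Technique 2, regular expressions: match the same elements as a text pattern.
function titlesViaRegex(string $html): array
{
    // Lazy (.*?) stops at the first closing tag; /s lets . also match newlines.
    preg_match_all('/<h2 class="title">(.*?)<\/h2>/s', $html, $matches);
    return $matches[1];
}

print_r(titlesViaDom($html));
print_r(titlesViaRegex($html));
```

The XPath query keeps matching if the tag gains extra attributes or whitespace, whereas the regular expression only matches the exact text <h2 class="title">, which is why the DOM approach is usually the more robust of the two.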

The problem with scraping data from third-party websites is copyright: you may not have permission to use that data. PHP makes the scraping itself straightforward, but you remain responsible for copyright and for the quality of the data you collect. As a PHP programmer, you may need data from various websites for your projects. Here we describe how to get data from other sites effectively. By the end you will have two files: index.php and scrape.php.

Step 1: Build a Form to Enter the Website URL:

First, create a form in index.php with a Submit button, where you enter the URL of the website you want to scrape data from.
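A minimal sketch of the Step 1 form, assuming it lives in index.php and posts back to the same page. The helper name renderUrlForm() is hypothetical; the field name website_url matches the key read in Step 3, and the submit button is named submit. The submitted value is escaped with htmlspecialchars() before it is echoed back into the page.

```php
<?php
// Hypothetical helper that renders the URL form for index.php.
function renderUrlForm(string $currentUrl = ''): string
{
    // Escape the echoed value so a submitted URL cannot inject markup into the page.
    $value = htmlspecialchars($currentUrl, ENT_QUOTES, 'UTF-8');

    // An empty action posts the form back to the current page (index.php).
    return '<form method="post" action="">'
        . '<label for="website_url">Website URL:</label> '
        . '<input type="url" id="website_url" name="website_url" value="' . $value . '" required> '
        . '<input type="submit" name="submit" value="Submit">'
        . '</form>';
}

// Re-fill the field with the last submitted URL, if any.
echo renderUrlForm($_POST['website_url'] ?? '');
```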


Step 2: Create a PHP Function to Fetch the Website Data:

The second step is to create a PHP function in the scrape.php file that fetches the data using the cURL library. cURL lets you connect and communicate with many different types of servers over many different protocols.

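function scrapeWebsiteData($website_url) {
    // Stop early if the PHP cURL extension is not available
    if (!function_exists('curl_init')) {
        die('cURL is not installed.');
    }
    // Start a cURL session
    $curl = curl_init();
    // The page to download
    curl_setopt($curl, CURLOPT_URL, $website_url);
    // Return the response as a string instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Run the request: $output holds the page HTML, or false on failure
    $output = curl_exec($curl);
    // Close the session
    curl_close($curl);
    return $output;
}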

Here we first check whether PHP cURL is installed. Three main cURL functions are used: curl_init() starts the session, curl_exec() executes the request, and curl_close() closes the connection. The CURLOPT_URL option sets the URL of the web page we want to scrape. The second option, CURLOPT_RETURNTRANSFER, makes curl_exec() return the downloaded page as a string instead of printing it directly, so the function ends up returning the entire web page.
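Because the URL comes straight from a form field and cURL understands more schemes than HTTP (file:// among them), it can be worth validating the input before fetching it. The sketch below is one possible hardened variant of the Step 2 function, assuming PHP 7.0 or later with the cURL extension enabled. The names isScrapableUrl() and fetchPage(), the redirect following and the 10-second timeout are illustrative choices, not part of the original code.

```php
<?php
// Hypothetical helper: accept only absolute http(s) URLs before handing them to cURL.
// cURL also supports schemes such as file://, so unchecked form input could read local files.
function isScrapableUrl(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

// Hypothetical variant of the Step 2 function: same cURL calls, plus validation and error reporting.
function fetchPage(string $url): string
{
    if (!isScrapableUrl($url)) {
        throw new InvalidArgumentException("Refusing to fetch: $url");
    }
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
    ]);
    $body = curl_exec($curl);
    $error = curl_error($curl); // read the error message before closing the handle
    curl_close($curl);

    if ($body === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    return $body;
}
```

Throwing exceptions instead of calling die() lets the calling page decide how to report a failed request.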

Step 3: Extract Specific Website Details:

Now it is time to work with the downloaded page in your PHP file and mark out the specific section you need. If you do not want all the data from a given URL, take the string returned thanks to CURLOPT_RETURNTRANSFER and cut out only the parts you want to scrape.

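if (isset($_POST['submit'])) {
    // Download the page whose URL was entered in the form
    $html = scrapeWebsiteData($_POST['website_url']);
    // Example markers: adjust both strings to the section you want
    $start_point = strpos($html, 'Latest Posts');
    $end_point = ($start_point === false) ? false : strpos($html, '</div>', $start_point);
    // Only slice when both markers were found
    if ($end_point !== false) {
        $length = $end_point - $start_point;
        $html = substr($html, $start_point, $length);
        echo $html;
    }
}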
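strpos() returns false when a marker is not found, and PHP will quietly do arithmetic on that false, so it is worth checking both positions before slicing. Here is a small sketch that packages the Step 3 idea as a reusable function; the name extractBetween(), the markers and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper generalising the strpos()/substr() logic from Step 3.
// Returns the text from $startMarker up to (not including) $endMarker, or null if either is missing.
function extractBetween(string $html, string $startMarker, string $endMarker): ?string
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null; // start marker not on the page
    }
    // Search for the end marker only after the start marker itself.
    $end = strpos($html, $endMarker, $start + strlen($startMarker));
    if ($end === false) {
        return null; // no closing marker after the start
    }
    return substr($html, $start, $end - $start);
}

// Made-up page fragment for demonstration.
$page = '<div id="latest"><h3>Latest Posts</h3><ul><li>One</li></ul></div><footer>Footer</footer>';
var_dump(extractBetween($page, '<h3>Latest Posts</h3>', '</div>'));
```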

We encourage you to build a basic knowledge of PHP and regular expressions before using any of this code or scraping a particular blog or website for your own purposes.

December 8, 2017