
Semalt: How to Scrape Web Pages?

1 answer:

Beautiful Soup is a Python library commonly used to scrape web pages by building a parse tree from XML and HTML documents. Web scraping, the process of extracting data from websites and pages, is widely used in data analysis and by data management companies. In many cases, the Python programming language is a prerequisite for data science.

Python 3 has scraping tools and modules that you can use in your data management project. Currently available as Beautiful Soup 4, this module is compatible with both Python 3 and Python 2.7. The Beautiful Soup 4 module can also build a parse tree for "tag soup", that is, markup with unclosed tags. In this tutorial, you will learn how to scrape a page and write the extracted data to a CSV file.
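To illustrate the point about unclosed tags, here is a minimal sketch. The markup string is made up for this example, and the choice of Python's built-in "html.parser" is an assumption; other parsers may arrange the resulting tree slightly differently.

```python
from bs4 import BeautifulSoup

# Deliberately malformed markup (hypothetical): <p> and <b> are never closed.
broken_html = "<html><body><p>First artist<p>Second <b>artist</body></html>"

# Beautiful Soup still builds a navigable tree from it.
soup = BeautifulSoup(broken_html, "html.parser")

# Both paragraphs and the bold tag can be found despite the missing end tags.
print(len(soup.find_all("p")))
print(soup.b.get_text())
print(soup.prettify())
```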

Getting started

To get started, set up a server-based or local Python programming environment on your machine. You must also install the Beautiful Soup and Requests modules. A working knowledge of both modules is also required. Familiarity with HTML tagging and structure is an added advantage.

Understanding your data

In this tutorial, real data from the National Gallery of Art is used to help you learn how to work with Beautiful Soup 4. The National Gallery of Art holds 120,000 pieces made by about 13,000 artists. The gallery is based in Washington, D.C., United States.

Web data scraping with Beautiful Soup is not that complicated. For example, if you focus on the letter Z, find and note down the first name on the list. In this case, the first name is Zabaglia, Niccola. For consistency, also note the number of pages and the name of the last artist on that page.

How to import the Requests and Beautiful Soup libraries

To import the libraries, activate your Python 3 environment. Check that you are in the same directory as your programming environment, then run the following command to get started: . my_env/bin/activate

Create a new file and start by importing the Requests and Beautiful Soup libraries. The Requests library lets you make HTTP requests from your Python programs in a human-readable way. Beautiful Soup, on the other hand, does the work of scraping the pages quickly. Beautiful Soup is imported from the bs4 package.
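The imports described above might look like the following at the top of the new file. Printing the version numbers is an extra step added here only as a quick check that both packages are installed in the active environment.

```python
import requests
import bs4
from bs4 import BeautifulSoup

# Quick sanity check that both libraries are available in this environment.
print("requests", requests.__version__)
print("beautifulsoup4", bs4.__version__)
```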

How to collect and parse a web page

Use Requests to collect the URL of your first page. The response for the first page is assigned to a variable. Then create a BeautifulSoup object from that Requests response and parse it with Python's built-in parser.
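A minimal sketch of this step follows. The helper names make_soup and fetch_soup, the timeout value, and the example.com URL are all placeholders introduced for illustration; the source does not give the actual page address.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical placeholder; replace with the URL of the first page to scrape.
PAGE_URL = "https://example.com/artists/z"


def make_soup(html):
    """Parse an HTML string with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Download a page with Requests and return it as a BeautifulSoup object."""
    page = requests.get(url, timeout=timeout)
    page.raise_for_status()  # fail loudly on 4xx/5xx responses
    return make_soup(page.text)
```

Calling fetch_soup(PAGE_URL) would then return the parsed tree for that page. Keeping the parsing step in its own function makes it easy to try the rest of the tutorial on a saved HTML string without touching the network.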

In this tutorial, the aim is to collect the links and the artists' names. You could also collect other data, such as the dates associated with the artists. Windows users should right-click the first artist's name, which in this case is Zabaglia, Niccola. Mac OS users should hold "CTRL" and click the name. In the menu that pops up on your screen, click "Inspect Element" to open the web developer tools. Then print the artists' names to see how Beautiful Soup has parsed the tree.
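Once the developer tools show which element wraps the names, the lookup can be sketched as below. The sample markup, the "artist-list" class name, the link targets, and the second artist entry are all hypothetical stand-ins; use whatever the inspector shows on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the part of the page that lists artists.
sample_html = """
<div class="artist-list">
  <table>
    <tr><td><a href="/artists/1">Zabaglia, Niccola</a></td></tr>
    <tr><td><a href="/artists/2">Example, Artist</a></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Narrow the search to the wrapping element, then collect every link inside it.
artist_block = soup.find(class_="artist-list")
artist_links = artist_block.find_all("a")

for link in artist_links:
    print(link.prettify())
```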

Removing the bottom links

To remove the bottom links from your web page, inspect the DOM by right-clicking the element. You will see that the links sit inside an HTML table. With Beautiful Soup, use the decompose method to remove those tags from the parse tree.
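Here is a small sketch of decompose in action. The markup and the "nav-links" class are invented for this example; on the real page, use the class or id that the inspector shows for the table of bottom links.

```python
from bs4 import BeautifulSoup

# Hypothetical page: one artist link plus a table of navigation links.
sample_html = """
<div class="artist-list"><a href="/artists/1">Zabaglia, Niccola</a></div>
<table class="nav-links">
  <tr><td><a href="/a">A</a></td><td><a href="/b">B</a></td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Remove the navigation table, and everything inside it, from the tree.
nav_table = soup.find(class_="nav-links")
if nav_table is not None:
    nav_table.decompose()

# Only the artist link is left.
remaining = [a.get_text() for a in soup.find_all("a")]
print(remaining)  # ['Zabaglia, Niccola']
```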

How to pull the contents from a tag

You do not need to print the entire link tag; use Beautiful Soup to pull just the contents out of the tag. You can also capture the URLs associated with the artists by using Beautiful Soup 4.
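As a rough sketch, the text of a link is available through its contents, and the URL through its href attribute. The markup and link targets below are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical artist links, as they might appear after the cleanup step.
sample_html = """
<a href="/artists/1">Zabaglia, Niccola</a>
<a href="/artists/2">Example, Artist</a>
"""

soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for link in soup.find_all("a"):
    name = str(link.contents[0])  # first child of the tag: the link text
    url = link.get("href")        # attribute lookup; None if href is missing
    rows.append((name, url))

for name, url in rows:
    print(name, url)
```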

Writing the scraped data to a CSV file

A CSV file lets you store structured data as plain text, a format commonly used for spreadsheets. Some knowledge of handling plain-text files in Python is recommended.
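Writing the collected rows out could look like the sketch below, which uses Python's standard csv module. The file name, the column headings, the helper name write_artists_csv, and the sample rows are assumptions made for the example.

```python
import csv
import os
import tempfile


def write_artists_csv(path, rows):
    """Write (name, link) pairs to a CSV file with a header row."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Link"])
        writer.writerows(rows)


# Hypothetical scraped data; names containing commas are quoted automatically.
rows = [
    ("Zabaglia, Niccola", "/artists/1"),
    ("Example, Artist", "/artists/2"),
]

out_path = os.path.join(tempfile.gettempdir(), "artists.csv")
write_artists_csv(out_path, rows)
print("wrote", out_path)
```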

Web data scraping is used to crawl pages and collect information. Be careful about which websites you extract information from. Some dynamic websites restrict the scraping of data from their pages. Scraping a page with Beautiful Soup and Python 3 is that simple.
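The source does not say how to find out whether a site restricts scraping. One common convention, assumed here, is the site's robots.txt file, which Python's standard urllib.robotparser module can interpret. The rules and URLs below are made up so that the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at <site>/robots.txt.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# Ask whether a generic crawler may fetch each (hypothetical) URL.
allowed = parser.can_fetch("*", "https://example.com/artists/z.html")
blocked = parser.can_fetch("*", "https://example.com/private/data.html")
print(allowed, blocked)  # True False
```

A robots.txt check is only one signal; a site's terms of use may impose further limits.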

December 22, 2017