[Python] HTML/XML Parsing Library - BeautifulSoup

NineKY 2009. 10. 28. 13:59

BeautifulSoup is a library that makes working with HTML/XML in Python easier. It is not hard to use, so a quick question to the almighty Google will turn up most of the answers.
Author's site: http://www.crummy.com/software/BeautifulSoup/

The BeautifulSoup API is documented at the following site.
Reference: http://api.plone.org/Plone/3.0/private/frames/src/kss.core/kss/core/private/kss.core.BeautifulSoup-module.html

The source below is a simple piece of code written with BeautifulSoup.

import socket
import urllib

import BeautifulSoup  # BeautifulSoup 3 on Python 2, as used in this post

# vitourl and timeout are assumed to be defined by the surrounding code.
try:
    socket.setdefaulttimeout(timeout)
    # Fetch the page HTML from vitourl.
    text = urllib.urlopen(vitourl).read()
    # Hand the HTML to BeautifulSoup.
    soup = BeautifulSoup.BeautifulSoup(text)
    # Search for '<table ...>' and keep only the one whose id is tablaMotores.
    table = soup.find("table", {"id": "tablaMotores"})
    # Search the table for every '<tr ...>'.
    for TRs in table.findAll("tr"):
        # Within the row, search for '<td ...>' whose class is positivo.
        node = TRs.find("td", {"class": "positivo"})
        if node:
            # Calling a tag is shorthand for findAll: all <td> cells in the row.
            TDs = TRs("td")
            print("%-20s : %s" % (TDs[0].contents[0], node.contents[0]))
except Exception as msg:
    print("Error:Exception GetVirustotalResult : %s --> %s" % (msg, vitourl))
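
The same find / findAll pattern can be tried without a network connection. The following is only a minimal sketch, assuming Python 3 and the newer bs4 package (BeautifulSoup 4), where findAll is spelled find_all. The sample HTML and the positive_rows helper are hypothetical; only the id tablaMotores and the class positivo come from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the id and class names come from the post.
SAMPLE_HTML = """
<table id="tablaMotores">
  <tr><td>EngineA</td><td class="positivo">Trojan.Generic</td></tr>
  <tr><td>EngineB</td><td class="negativo">-</td></tr>
  <tr><td>EngineC</td><td class="positivo">Worm.Sample</td></tr>
</table>
"""


def positive_rows(html):
    """Return (first cell text, positivo cell text) for each matching row."""
    soup = BeautifulSoup(html, "html.parser")
    # Find the table by id; give up quietly if the page has no such table.
    table = soup.find("table", {"id": "tablaMotores"})
    if table is None:
        return []
    results = []
    for row in table.find_all("tr"):
        # Keep only rows that contain a <td class="positivo"> cell.
        node = row.find("td", {"class": "positivo"})
        if node is not None:
            first_cell = row.find_all("td")[0]
            results.append(
                (first_cell.get_text(strip=True), node.get_text(strip=True))
            )
    return results


for name, verdict in positive_rows(SAMPLE_HTML):
    print("%-20s : %s" % (name, verdict))
```

Checking for a missing table explicitly, rather than relying on a broad except clause, makes it easier to tell a page layout change apart from a network failure.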

Below is the usage guide provided by the author.