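[Python] The BeautifulSoup Package

A package commonly used for web crawling (scraping).

It takes the response to a request (fetched with a library such as requests) and parses it into a tidy, easy-to-navigate structure.

Maybe that's why it's called Beautiful Soup.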
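As a rough illustration of what "parsing" means here, the sketch below feeds a small HTML string to BeautifulSoup once the package is installed. The HTML snippet and the variable names are made up for this example; only `BeautifulSoup`, `find`, `get_text`, and `prettify` come from the library.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a server response body.
html = "<html><body><h1>Movies</h1><p class='intro'>Top <b>rated</b> list</p></body></html>"

# Parse with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

heading = soup.h1.text                  # text of the first <h1>
intro = soup.find("p", class_="intro")  # first <p> with class "intro"
intro_text = intro.get_text()           # text of the <p>, including nested tags
pretty = soup.prettify()                # indented rendering of the parse tree

print(heading)
print(intro_text)
print(pretty)
```

The parsed `soup` object can be navigated by tag name or searched, which is usually easier than working on the raw string.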

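Let's install it. The package is published on PyPI as beautifulsoup4 (pip install bs4 also works, since bs4 is only a wrapper that pulls in beautifulsoup4), and the example below also needs requests.

pip install beautifulsoup4 requests

The example is taken from a Sparta Coding Club lecture.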

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the server does not treat the request as a bot.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.naver?sel=pnt&date=20210829', headers=headers)

# Parse the response body with Python's built-in HTML parser.
soup = BeautifulSoup(data.text, 'html.parser')
# print(soup)

# select_one() returns the first match, or None if nothing matches.
a = soup.select_one('#old_content > table > tbody > tr:nth-child(3) > td.title > div > a')
# print(a.text)
# print(a['href'])

# select() returns a list of every match; rows without a title link are skipped.
trs = soup.select('#old_content > table > tbody > tr')
for tr in trs:
    a = tr.select_one('td.title > div > a')
    if a is not None:
        print(a.text)

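requests.get() lets you set request headers, cookies, and so on.

If you do not set the User-Agent header when crawling, some sites decide the client is a bot and drop the connection or do not respond at all, so it is a good idea to include it.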
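To see what setting headers and cookies does, here is a minimal sketch that builds a request without sending it and inspects the outgoing headers. The URL, the shortened User-Agent string, and the cookie name and value are hypothetical placeholders, not taken from the lecture example.

```python
import requests

# Hypothetical values, for illustration only.
url = "https://example.com/ranking"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
cookies = {"session_id": "abc123"}

# Build the request without sending it, so the outgoing headers can be inspected.
prepared = requests.Request("GET", url, headers=headers, cookies=cookies).prepare()

print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])

# To actually send it, the same arguments go to requests.get():
# response = requests.get(url, headers=headers, cookies=cookies, timeout=10)
```

Preparing the request first is just a convenient way to check the headers offline; in a real crawler you would normally call `requests.get()` directly.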

There are several ways to select elements in HTML, such as XPath and CSS selectors, but I find CSS selectors the most convenient.
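The CSS selector approach can be tried offline on a small inline table. This is a sketch under assumed markup: the `#ranking` id, the table contents, and the links are invented for illustration, while `select` and `select_one` are the library's own methods.

```python
from bs4 import BeautifulSoup

# Hypothetical table mimicking a ranking page; the header row has no title link.
html = """
<div id="ranking">
  <table>
    <tr><th>Rank</th><th>Title</th></tr>
    <tr><td class="title"><div><a href="/movie/1">First Movie</a></div></td></tr>
    <tr><td class="title"><div><a href="/movie/2">Second Movie</a></div></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns every match as a list.
links = soup.select("#ranking > table > tr > td.title > div > a")
titles = [a.text for a in links]
hrefs = [a["href"] for a in links]

# select_one() returns the first match, or None when nothing matches.
first = soup.select_one("td.title a")
missing = soup.select_one("#ranking > table > tbody > tr")  # no <tbody> in this source

print(titles)
print(hrefs)
print(first.text)
print(missing)
```

One thing to watch for: browser developer tools show a `<tbody>` that the browser inserts on its own, while `html.parser` keeps the markup as served. A selector copied from the browser that contains `> tbody >` therefore matches only when the served HTML really includes `<tbody>`, which is why checking for `None` is worthwhile.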

Official documentation

https://www.crummy.com/software/BeautifulSoup/bs4/doc/


License: Attribution, NonCommercial, NoDerivatives
