Computer Science Blog


Python - gazpacho - Working with HTML: Web Scraping - string, slice

Working out an answer to the SHARPEN YOUR PENCIL exercise (435/682) in Chapter 9 (Working with HTML: Web Scraping) of Head First Python: A Learner’s Guide to the Fundamentals of Python Programming, A Brain-Friendly Guide, by Paul Barry (O’Reilly Media).

Jupyter (code, input and output)

webapp/WorldRecords.ipynb

import gazpacho
URL = 'https://en.wikipedia.org/wiki/List_of_world_records_in_swimming'
html = gazpacho.get(URL)
html[:50]
'<!DOCTYPE html>\n<html class="client-nojs vector-fe'
html[-50:]
'"Wikimedia list article"}</script>\n</body>\n</html>'
html.find('<table')
66328
i = html.find('<table')
html[i:i+500]
'<table class="wikitable sortable" style="font-size: 95%;">\n<caption>\n</caption>\n<tbody><tr>\n<th>Event\n</th>\n<th style="width:4em" class="unsortable">Time\n</th>\n<th class="unsortable">\n</th>\n<th>Name</th>\n<th>Nationality</th>\n<th>Date</th>\n<th>Meet</th>\n<th>Location\n</th>\n<th class="unsortable">Ref\n</th></tr>\n\n<tr>\n<td><span data-sort-value="01&#160;!"><a href="/wiki/World_record_progression_50_metres_freestyle" title="World record progression 50 metres freestyle">50m freestyle</a></span>\n</td>\n<'
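The same find-and-slice idea can be taken one step further to cut out the whole first table rather than a fixed 500 characters. The following is only a minimal sketch: `sample` is a made-up stand-in for the downloaded page (so it runs without network access), and `first_table` is a hypothetical helper name, not something from the book or from gazpacho. It relies only on `str.find`, which returns -1 when the substring is absent, and it does not handle nested tables.

```python
# Hypothetical sample standing in for the downloaded page;
# this is NOT the real Wikipedia HTML.
sample = (
    '<!DOCTYPE html>\n<html>\n<body>\n<p>intro</p>\n'
    '<table class="wikitable">\n<tr><th>Event</th></tr>\n</table>\n'
    '</body>\n</html>'
)


def first_table(html: str) -> str:
    """Return the first <table>...</table> span, or '' if there is none.

    Assumption: tables are not nested, so the first '</table>' after
    the opening tag is the one that closes it.
    """
    start = html.find('<table')          # -1 when '<table' is absent
    if start == -1:
        return ''
    closing = '</table>'
    end = html.find(closing, start)      # search only from the opening tag on
    if end == -1:
        return html[start:]              # unterminated table: take the rest
    return html[start:end + len(closing)]


table = first_table(sample)
print(table)
```

Checking for -1 matters because a slice such as `html[i:i+500]` with `i == -1` does not raise an error; it quietly returns an empty or misleading result.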