Python - Working with HTML: Web Scraping - string, slices, index
Working through the EXERCISE (page 435 of 682) in Chapter 9 (Working with HTML: Web Scraping) of Head First Python: A Learner’s Guide to the Fundamentals of Python Programming, A Brain-Friendly Guide, by Paul Barry (O’Reilly Media).
Jupyter (code, input and output)
webapp/WorldRecords.ipynb
html[:500]
'<!DOCTYPE html>\n<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vect'
html[-500:]
'anization","name":"Contributors to Wikimedia projects"},"publisher":{"@type":"Organization","name":"Wikimedia Foundation, Inc.","logo":{"@type":"ImageObject","url":"https:\\/\\/www.wikimedia.org\\/static\\/images\\/wmf-hor-googpub.png"}},"datePublished":"2007-03-15T21:20:10Z","dateModified":"2025-04-23T11:08:46Z","image":"https:\\/\\/upload.wikimedia.org\\/wikipedia\\/commons\\/a\\/ae\\/Caeleb_Dressel_before_winning_100_fly_%2842769914221%29.jpg","headline":"Wikimedia list article"}</script>\n</body>\n</html>'
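The two slices above take the first and last 500 characters of the page. As a minimal sketch of the same idea on a short, made-up string (the name `text` is only an illustration, not part of the notebook):

```python
# Hypothetical short string standing in for the much longer page source.
text = 'abcdefghij'

# text[:n] keeps the first n characters; text[-n:] keeps the last n.
head = text[:3]
tail = text[-3:]

# Slicing past the end does not raise; it simply returns what is there.
whole = text[:500]

print(head)   # abc
print(tail)   # hij
print(whole)  # abcdefghij
```

This is why `html[:500]` is safe to run even on a page shorter than 500 characters.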
html.find('<table>')
-1
html.find('<table')
67383
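The first search returns -1 because `str.find` looks for the exact substring, and the table's opening tag carries attributes, so the literal text `<table>` never appears in the page. A small sketch with a made-up stand-in string (`sample` is an assumption, not the real Wikipedia HTML) shows the difference:

```python
# Hypothetical miniature page; the real page source is far longer.
sample = '<html><body><table class="wikitable sortable"><tr><td>50m freestyle</td></tr></table></body></html>'

# The exact text '<table>' is absent, because the tag continues with attributes.
exact = sample.find('<table>')

# Searching only for the start of the tag matches whatever attributes follow.
prefix = sample.find('<table')

print(exact)   # -1
print(prefix)  # 12
```

Note that `'<table'` would also match a tag whose name merely starts with those characters; for this page that does not seem to matter.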
i = html.find('<table')
html[i:i+500]
'<table class="wikitable sortable" style="font-size: 95%;">\n<caption>\n</caption>\n<tbody><tr>\n<th>Event\n</th>\n<th style="width:4em" class="unsortable">Time\n</th>\n<th class="unsortable">\n</th>\n<th>Name</th>\n<th>Nationality</th>\n<th>Date</th>\n<th>Meet</th>\n<th>Location\n</th>\n<th class="unsortable">Ref\n</th></tr>\n\n<tr>\n<td><span data-sort-value="01 !"><a href="/wiki/World_record_progression_50_metres_freestyle" title="World record progression 50 metres freestyle">50m freestyle</a></span>\n</td>\n<'
print(_)
<table class="wikitable sortable" style="font-size: 95%;">
<caption>
</caption>
<tbody><tr>
<th>Event
</th>
<th style="width:4em" class="unsortable">Time
</th>
<th class="unsortable">
</th>
<th>Name</th>
<th>Nationality</th>
<th>Date</th>
<th>Meet</th>
<th>Location
</th>
<th class="unsortable">Ref
</th></tr>
<tr>
<td><span data-sort-value="01 !"><a href="/wiki/World_record_progression_50_metres_freestyle" title="World record progression 50 metres freestyle">50m freestyle</a></span>
</td>
<
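The fixed 500-character window cuts the table off mid-tag. One possible next step, sketched here on a made-up string rather than the real page, is to look for the closing `</table>` after the start index and slice between the two positions. The names `sample`, `start`, `end` and `table_html` are illustrative assumptions, and this simple approach would not handle nested tables:

```python
# Hypothetical miniature page with text before and after the table.
sample = (
    '<html><body><p>intro</p>'
    '<table class="wikitable sortable"><tr><td>50m freestyle</td></tr></table>'
    '<p>outro</p></body></html>'
)

closing = '</table>'

# Locate the opening tag, then search for the closing tag from that point on.
start = sample.find('<table')
end = sample.find(closing, start) if start != -1 else -1

# Slice up to and including the closing tag; fall back to '' if either is missing.
if start != -1 and end != -1:
    table_html = sample[start:end + len(closing)]
else:
    table_html = ''

print(table_html)
```

For anything beyond a quick exploration, an HTML parser is usually a more robust choice than string searching.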