Computer Science Blog

Python - Flask - gazpacho - Working with HTML - Web Scraping - string, slice, find method

Working out an answer to SHARPEN YOUR PENCIL (435/682) in Chapter 9 (Working with HTML: Web Scraping) of Head First Python: A Learner’s Guide to the Fundamentals of Python Programming, A Brain-Friendly Guide, by Paul Barry (O’Reilly Media).

Jupyter (code, input and output)

WorldRecords.ipynb

html[:500]
'<!DOCTYPE html>\n<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vect'
print(_)
<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vect
print(html[-500:])
anization","name":"Contributors to Wikimedia projects"},"publisher":{"@type":"Organization","name":"Wikimedia Foundation, Inc.","logo":{"@type":"ImageObject","url":"https:\/\/www.wikimedia.org\/static\/images\/wmf-hor-googpub.png"}},"datePublished":"2007-03-15T21:20:10Z","dateModified":"2025-08-04T13:23:30Z","image":"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/a\/ae\/Caeleb_Dressel_before_winning_100_fly_%2842769914221%29.jpg","headline":"Wikimedia list article"}</script>
</body>
</html>
html.find('<table>')
-1
html.find('<table')
67432
n = _
html[n:n+500]
'<table class="wikitable sortable" style="font-size: 95%;">\n<caption>\n</caption>\n<tbody><tr>\n<th>Event\n</th>\n<th style="width:4em" class="unsortable">Time\n</th>\n<th class="unsortable">\n</th>\n<th>Name</th>\n<th>Nationality</th>\n<th>Date</th>\n<th>Meet</th>\n<th>Location\n</th>\n<th class="unsortable">Ref\n</th></tr>\n\n<tr>\n<td><span data-sort-value="01&nbsp;!"><a href="/wiki/World_record_progression_50_metres_freestyle" title="World record progression 50 metres freestyle">50m freestyle</a></span>\n</td>\n<'
print(_)
<table class="wikitable sortable" style="font-size: 95%;">
<caption>
</caption>
<tbody><tr>
<th>Event
</th>
<th style="width:4em" class="unsortable">Time
</th>
<th class="unsortable">
</th>
<th>Name</th>
<th>Nationality</th>
<th>Date</th>
<th>Meet</th>
<th>Location
</th>
<th class="unsortable">Ref
</th></tr>

<tr>
<td><span data-sort-value="01&nbsp;!"><a href="/wiki/World_record_progression_50_metres_freestyle" title="World record progression 50 metres freestyle">50m freestyle</a></span>
</td>
<
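The session above finds the opening table tag with `find` and then slices a fixed 500 characters from that index. As a minimal sketch of how the same idea could pull out the whole table, the helper below also searches for the closing `</table>` tag. The function name `extract_first_table` and the `sample` string are hypothetical, made up for illustration; they are not from the book or the notebook. It assumes the first table contains no nested tables, and note that the prefix `'<table'` would also match any other tag name that happens to start with those characters.

```python
def extract_first_table(html: str) -> str:
    """Return the first '<table ...>...</table>' span in html, or '' if none is found."""
    # Search for the prefix '<table' rather than '<table>', because the
    # opening tag usually carries attributes such as class="...".
    start = html.find('<table')
    if start == -1:
        return ''
    end_tag = '</table>'
    # Look for the closing tag only after the opening tag.
    end = html.find(end_tag, start)
    if end == -1:
        return ''
    # Slice up to and including the closing tag.
    return html[start:end + len(end_tag)]


# Hypothetical stand-in for the downloaded page, so the example runs offline.
sample = (
    '<html><body><p>intro</p>'
    '<table class="wikitable sortable"><tr><th>Event</th></tr></table>'
    '<p>outro</p></body></html>'
)

print(sample.find('<table>'))       # -1, since the real tag has attributes
print(sample.find('<table'))        # index where the opening tag starts
print(extract_first_table(sample))  # the table markup only
```

For real pages, an HTML parser is generally more robust than string searching; this sketch only mirrors the string, slice, and find approach used in the session.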