Processing real-world HTML: a quick introduction to html5lib
Edward O’Connor
Edward O’Connor, Django San Diego / SD Ruby Joint Meeting, 6 August 2009
<B>Hi, <I>Joe</b>!
<p/>
So good to </i><BLINK>finally
meet you & stuff.
Tag Soup
Browsers handle such markup well and
mostly uniformly.
HTML 5
html5lib
import html5lib

# Open in binary mode so the parser can detect the character encoding itself.
with open("mydocument.html", "rb") as f:
    parser = html5lib.HTMLParser()
    document = parser.parse(f)
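As a rough illustration, the tag soup from the earlier slide can be fed straight to the parser. This sketch assumes a recent html5lib release that provides the module-level `html5lib.parse` helper and its `namespaceHTMLElements` keyword; the variable names are made up for the example.

```python
import html5lib

# The tag soup from the earlier slide, verbatim.
tag_soup = """<B>Hi, <I>Joe</b>!
<p/>
So good to </i><BLINK>finally
meet you & stuff."""

# Build an xml.etree.ElementTree tree. namespaceHTMLElements=False keeps
# tag names plain ("b") rather than namespace-qualified.
root = html5lib.parse(tag_soup, treebuilder="etree", namespaceHTMLElements=False)

# Collect the element names and the text the parser recovered.
tags = sorted({element.tag for element in root.iter()})
text = "".join(root.itertext())

print(tags)
print(text)
```

The element names come back lowercased and the result is a regular tree with `html`, `head` and `body` filled in, even though the input closed almost nothing correctly.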
require 'html5lib/html5parser'
include HTML5
f = File.open("mydocument.html")
document = HTMLParser.parse(f)
Thousands of tests
"Python html5lib implements the spec
so well, it even implements an infinite
loop." — @gsnedders
Tree building
Plugs into your favorite DOM or DOM-like API
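A minimal sketch of what "plugs into" could look like in Python, assuming a recent html5lib where `html5lib.parse` accepts a `treebuilder` name. The sample markup and variable names are illustrative only.

```python
import html5lib

markup = "<p>Hello <b>world"

# Same input, two different tree builders.
etree_root = html5lib.parse(markup, treebuilder="etree", namespaceHTMLElements=False)
dom_document = html5lib.parse(markup, treebuilder="dom")

# Query the result with the xml.etree.ElementTree API...
bold = etree_root.find("body/p/b")
print(bold.text)

# ...or with the xml.dom.minidom API.
dom_bold = dom_document.getElementsByTagName("b")[0]
print(dom_bold.firstChild.data)
```

The parsing rules are the same in both cases; only the kind of object handed back changes.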
Tree walking
Python: dom, ElementTree, genshi, lxml, pulldom, Beautiful Soup
Ruby: REXML, Hpricot
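A tree walker goes the other way: it turns a finished tree back into a flat stream of tokens. The sketch below assumes a recent html5lib with `html5lib.getTreeWalker` and `html5lib.serializer.HTMLSerializer`; the sample markup is made up.

```python
import html5lib
from html5lib.serializer import HTMLSerializer

markup = "<p>Hello <b>world"
root = html5lib.parse(markup, treebuilder="etree", namespaceHTMLElements=False)

# Pick the walker that matches the tree type that was built.
walker = html5lib.getTreeWalker("etree")

# Each token is a dict; start tags carry the element name.
start_tags = [token["name"] for token in walker(root) if token["type"] == "StartTag"]
print(start_tags)

# The same kind of stream can be handed to the serializer to get markup back.
serialized = HTMLSerializer().render(walker(root))
print(serialized)
```

Because everything downstream consumes the token stream, the same serializer works regardless of which tree library produced the tree.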
Filters
Sanitizer (whitelists)
Conformance checker (validator)
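Filters sit between a tree walker and the serializer and rewrite the token stream as it passes through. The following is a sketch of the whitelist sanitizer, assuming an html5lib 1.x release where it lives at `html5lib.filters.sanitizer.Filter`. Newer releases emit a deprecation warning for this filter, and the untrusted input here is invented for the example.

```python
import html5lib
from html5lib.filters.sanitizer import Filter
from html5lib.serializer import HTMLSerializer

untrusted = '<p onclick="steal()">Hi <script>alert(1)</script></p>'

# Parse as a fragment so no html/head/body wrapper is added.
fragment = html5lib.parseFragment(untrusted, treebuilder="etree")
walker = html5lib.getTreeWalker("etree")

# Pipeline: tree walker -> sanitizer filter -> serializer.
clean_stream = Filter(walker(fragment))
cleaned = HTMLSerializer().render(clean_stream)
print(cleaned)
```

Elements and attributes that are not on the whitelist are escaped or dropped rather than passed through, so the `script` tag comes out as inert text.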
Questions?
http://edward.oconnor.cx/2009/08/djangosd-html5lib
CC BY-SA 3.0