"<>"
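Full form: <tagname>…</tagname>
Short form for an element with no content: <tagname/>
<hr> is legal in HTML; XML requires <hr/> or <hr></hr>
Elements must be properly nested: <X>…<Y>…</Y></X> is legal…
…<X>…<Y>…</X></Y> is not
Literal "<" and ">" in text must be escaped
Escape sequences have the form &name;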
Sequence | Character | Description |
---|---|---|
&lt; | < | Less than |
&gt; | > | Greater than |
&quot; | " | Double quote |
&apos; | ' | Apostrophe |
&amp; | & | Ampersand |
&Aring; | Å | Angstrom |
&nbsp; |   | Non-breaking space |
&lambda; | λ | Greek small letter lambda |
&Lambda; | Λ | Greek capital letter lambda |
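As a rough sketch of how these escape sequences are produced and consumed in practice, Python's standard html module can convert in both directions. The sample strings are made up for illustration.

```python
import html

# Escape the characters that have special meaning in markup.
raw = 'if a < b & b > c: print("ok")'
escaped = html.escape(raw)   # the default quote=True also escapes " and '
print(escaped)

# Convert named escape sequences back into characters.
decoded = html.unescape('&lambda; &Lambda; &Aring;')
print(decoded)
```

Escaping the ampersand first matters: otherwise the "&" introduced by "&lt;" would itself be escaped a second time.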
Tag | Usage |
---|---|
<html> | Root element of entire HTML document. |
<body> | Body of page (i.e., visible content). |
<h1> | Top-level heading. Use <h2>, <h3>, etc. for second- and third-level headings. |
<p> | Paragraph. |
<em> | Emphasized text; browser or editor will usually display it in italics. |
<address> | Address of document author (also usually displayed in italics). |
<html>
<body>
<h1>Software Carpentry</h1>
<p>This course will introduce <em>essential software
development skills</em>,
and show where and how they should be applied.</p>
<address>Greg Wilson (gvwilson@third-bit.com)</address>
</body>
</html>
Figure 1: Simple Page Rendered by Firefox
<h1/> (level-1 heading) is semantic (meaning)
<i/> (italics) is display (formatting)
<h1 align="center">A Centered Heading</h1>
<p id="disclaimer" align="center">This planet provided as-is.</p>
<p align="left" align="right">…</p> is illegal
Old-style HTML allowed <p align=center>…<p>, but strict XML parsers will reject it
Pages usually have a <head/> element as well as a <body/>
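The rules above can be checked mechanically. The following is a minimal sketch, assuming Python 3 and its standard xml.dom.minidom parser; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False

samples = [
    '<p id="disclaimer" align="center">This planet provided as-is.</p>',
    '<p align="left" align="right">text</p>',   # attribute repeated
    '<p align=center>text</p>',                 # value not quoted
    '<X><Y>text</X></Y>',                       # badly nested
]
for s in samples:
    print(is_well_formed(s), s)
```

Only the first sample is accepted. A browser's HTML parser is far more forgiving than this, so the check reflects XML rules rather than what a browser will display.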
Comments start with <!-- , and end with -->
<html>
<head>
<title>Comments Page</title>
<meta name="author" content="aturing"/>
</head>
<body>
<!-- House style puts all titles in italics -->
<h1><em>Welcome to the Comments Page</em></h1>
<!-- Update this paragraph to describe the forum. -->
<p>Welcome to the Comments Forum.</p>
</body>
</html>
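Comments are not displayed, but an XML parser still keeps them. As a small sketch (assuming Python 3's xml.dom.minidom; find_comments is a hypothetical helper, and the page is a shortened copy of the one above), they can be collected like this:

```python
import xml.dom.minidom

page = '''<html>
<body>
<!-- House style puts all titles in italics -->
<h1><em>Welcome to the Comments Page</em></h1>
</body>
</html>'''

def find_comments(node):
    """Recursively collect the text of every comment below node."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.COMMENT_NODE:
            found.append(child.data.strip())
        else:
            found.extend(find_comments(child))
    return found

doc = xml.dom.minidom.parseString(page)
comments = find_comments(doc)
print(comments)
```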
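Use <ul/> for an unordered (bulleted) list, and <ol/> for an ordered (numbered) one
Each list item is wrapped in <li/>
Use <table/> for tables
Each row is wrapped in <tr/> (for “table row”)
Each cell within a row is wrapped in <td/> (for “table data”)
<html>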
<head>
<title>Lists and Tables</title>
<meta name="svn" content="$Id: xml.html,v 1.15 2010/04/23 20:41:32 scooter Exp $"/>
</head>
<body>
<table cellpadding="3" border="1">
<tr>
<td align="center"><em>Unordered List</em></td>
<td align="center"><em>Ordered List</em></td>
</tr>
<tr>
<td align="left" valign="top">
<ul>
<li>Hydrogen</li>
<li>Lithium</li>
<li>Sodium</li>
<li>Potassium</li>
<li>Rubidium</li>
<li>Cesium</li>
<li>Francium</li>
</ul>
</td>
<td align="left" valign="top">
<ol>
<li>Helium</li>
<li>Neon</li>
<li>Argon</li>
<li>Krypton</li>
<li>Xenon</li>
<li>Radon</li>
</ol>
</td>
</tr>
</table>
</body>
</html>
Figure 2: Lists and Tables
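Lists like these are often generated by a program rather than typed by hand. A minimal sketch, assuming Python 3; make_list is a hypothetical helper, not part of any library:

```python
from xml.sax.saxutils import escape

def make_list(items, ordered=False):
    """Build an HTML list: <ol> if ordered, otherwise <ul>."""
    tag = 'ol' if ordered else 'ul'
    lines = ['<%s>' % tag]
    for item in items:
        # escape() protects any <, > or & inside the item text.
        lines.append('<li>%s</li>' % escape(item))
    lines.append('</%s>' % tag)
    return '\n'.join(lines)

noble_gases = make_list(['Helium', 'Neon', 'Argon'], ordered=True)
print(noble_gases)
```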
Metadata goes in <meta/> elements in document head
Put images in pages using the <img/> tag
The src attribute specifies where to find the image file
<html>
<head>
<title>Images</title>
<meta name="svn" content="$Id: xml.html,v 1.15 2010/04/23 20:41:32 scooter Exp $"/>
</head>
<body>
<h1>Our Logo</h1>
<img src="../../.swc/lec/img/sc_powered.jpg" alt="[Powered by Software Carpentry]"/>
</body>
</html>
Figure 3: Images in Pages
Use the alt attribute to specify alternative text
Use the <a/> element to create a link
The href attribute specifies what the link is pointing at
<html>
<head>
<title>Links</title>
<meta name="svn" content="$Id: xml.html,v 1.15 2010/04/23 20:41:32 scooter Exp $"/>
</head>
<body>
<h1>A Few of My Favorite Places</h1>
<ul>
<li><a href="http://www.google.com">Google</a></li>
<li><a href="http://www.python.org">Python</a></li>
<li><a href="http://www.nature.com/index.html">Nature Online</a></li>
<li>Examples in this lecture:
<ul>
<li><a href="comments.html">Comments</a></li>
<li><a href="image.html">Images</a></li>
<li><a href="list_table.html">Lists and Tables</a></li>
</ul>
</li>
</ul>
</body>
</html>
Figure 4: Links in Pages
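Because the link target lives in the href attribute, a program can list every link on a well-formed page. A short sketch, assuming Python 3's xml.dom.minidom and a cut-down copy of the list above:

```python
import xml.dom.minidom

page = '''<ul>
<li><a href="http://www.python.org">Python</a></li>
<li><a href="comments.html">Comments</a></li>
</ul>'''

doc = xml.dom.minidom.parseString(page)
# Pair each link's target (href) with its visible text.
links = [(a.getAttribute('href'), a.firstChild.data)
         for a in doc.getElementsByTagName('a')]
for target, label in links:
    print(label, '->', target)
```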
Styles can be set with the style attribute
<p style="text-align: center">
<p style="text-align:center; font-weight:bold;">
Styles can be set with <style/> tags in the <head/> section:
<style type="text/css" media="all"> followed by a number of CSS instructions
Styles can be loaded from a separate file with a link in the <head/> section:
<link rel="stylesheet" href="/reveal.js/css/reveal.min.css"/>
CSS instructions have the form selector {property1:value1; property2:value2;...}
A selector can be element.class, where class is the value of the class attribute, and element is either an HTML element or an element you've "invented".
a:hover can be used to change the style when the mouse is over a link
p:first-letter can be used to change the style for the first letter of a paragraph
p#paragraph1 would refer to the paragraph whose ID attribute is "paragraph1"
"ul.inc li.active" would refer to <LI/> elements with a class attribute of "active" that are descendants of <UL/> elements with a class attribute of "inc".
Example style:
<style type="text/css">
body {font-family:arial;}
p.example {font-family:courier; margin-left:5em; margin-right:5em; background-color:LightBlue;}
.center {text-align:center;}
myTitle {font-weight:bold; display:block; color:green; text-align:center; font-size:150%}
</style>
Example input:
<body>
<myTitle>This is our header</myTitle>
<p>We will now introduce an example. This
is a standard paragraph, with all of the default
styles set up by the browser. Can you think of
a way you might be able to override at least one
of those defaults? Back to our example, we now
want to highlight a section of text, which might
be a quote or some other kind of example</p>
<p class="example">This is our example. Note that
the margins have been adjusted and we also now have
a background color. We could also have drawn a box
around our example, or we could have made other
adjustments.</p>
<p>Now we're back to normal text.</p>
</body>
Figure 5: Simple CSS Example Rendered by Firefox
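To make the meaning of an element.class selector concrete, here is a rough sketch in Python 3 that finds the elements such a selector would match. It is only an illustration of the matching rule, not how a browser implements CSS; select and the sample page are hypothetical.

```python
import xml.dom.minidom

def select(doc, element, css_class):
    """Return elements named `element` whose class attribute includes css_class."""
    return [node for node in doc.getElementsByTagName(element)
            if css_class in node.getAttribute('class').split()]

page = '''<body>
<p>Standard paragraph.</p>
<p class="example">This is our example.</p>
<p class="example center">Centered example.</p>
</body>'''

doc = xml.dom.minidom.parseString(page)
matches = select(doc, 'p', 'example')   # plays the role of the selector p.example
matched_text = [m.firstChild.data for m in matches]
print(matched_text)
```

The class attribute is split on whitespace because one element may carry several classes at once.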
display:inline
Used for elements like <i/>, <span/>, and <b/> that can be laid out within a line (no line break)
display: inline
<html>
<body>
This is a sentence with a
<myStyle style="display:inline; border: thin red solid">"myStyle" element</myStyle>
embedded in it.
</body>
</html>
display:block
Used for elements like <p/>, <div/> and <li/> that cause the line of text to break
display: block
<html>
<body>
This is a sentence with a
<myStyle style="display:block; border: thin red solid">"myStyle" element</myStyle>
embedded in it.
</body>
</html>
margin-top, -bottom, -left, and -right
border-top, -bottom, -left, and -right
padding-top, -bottom, -left, and -right
minidom
Figure 6: A DOM Tree
<root> <first>element</first> <second attr="value">element</second> <third-element/> </root>
Other libraries, such as ElementTree, use dictionaries instead to store attributes
<?xml version="1.0" encoding="utf-8"?>
<planet name="Mercury">
<period units="days">87.97</period>
</planet>
import xml.dom.minidom

# Parse the file, then write the whole document back out as text.
doc = xml.dom.minidom.parse('mercury.xml')
print(doc.toxml('utf-8').decode('utf-8'))

<?xml version="1.0" encoding="utf-8"?><planet name="Mercury">
<period units="days">87.97</period>
</planet>
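For comparison, here is a brief sketch of the dictionary-based approach taken by ElementTree, using the xml.etree.ElementTree module from Python 3's standard library. The document is parsed from a string here so the example does not depend on a file.

```python
import xml.etree.ElementTree as ET

src = '''<planet name="Mercury">
<period units="days">87.97</period>
</planet>'''

planet = ET.fromstring(src)
print(planet.attrib)                 # attributes are held in a plain dict

period = planet.find('period')
units = period.attrib['units']
days = float(period.text)            # element text is a plain string
print(units, days)
```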
The toxml method can be called on the document, or on any element node, to create text
import xml.dom.minidom

my_xml = '''<name>Donald Knuth</name>'''
my_doc = xml.dom.minidom.parseString(my_xml)
# The root element's first child is the text node holding the name.
name = my_doc.documentElement.firstChild.data
print('name is:', name)
print('but name in full is:', repr(name))

name is: Donald Knuth
but name in full is: 'Donald Knuth'

Under Python 2 the second line reads u'Donald Knuth': the u in front of the string marks it as Unicode
Python 2's print statement converts the Unicode string to ASCII for display; in Python 3 every str is Unicode, so no prefix appears
import xml.dom.minidom

# Parse XML held in a string rather than in a file.
src = '''<planet name="Venus">
<period units="days">224.7</period>
</planet>'''
doc = xml.dom.minidom.parseString(src)
print(doc.toxml('utf-8').decode('utf-8'))

<?xml version="1.0" encoding="utf-8"?><planet name="Venus">
<period units="days">224.7</period>
</planet>
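Since toxml also works on element nodes, a single element can be turned back into text on its own. A small sketch, assuming Python 3 and the same Venus document:

```python
import xml.dom.minidom

src = '''<planet name="Venus">
<period units="days">224.7</period>
</planet>'''

doc = xml.dom.minidom.parseString(src)
period = doc.getElementsByTagName('period')[0]

# Called on an element, toxml returns just that element's markup,
# without the XML declaration.
fragment = period.toxml()
print(fragment)
```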
import xml.dom.minidom

# Build a document from scratch instead of parsing one.
impl = xml.dom.minidom.getDOMImplementation()
doc = impl.createDocument(None, 'planet', None)
root = doc.documentElement
root.setAttribute('name', 'Mars')
period = doc.createElement('period')
root.appendChild(period)
text = doc.createTextNode('686.98')
period.appendChild(text)
print(doc.toxml('utf-8').decode('utf-8'))

<?xml version="1.0" encoding="utf-8"?><planet name="Mars"><period>686.98</period></planet>
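xml.dom.minidom is really just a wrapper around other platform-specific XML libraries
getDOMImplementation returns the object used to create the document node
The second argument to createDocument specifies the type of the document's root node
The first and third arguments to createDocument are the namespace URI and the document type (both None here)
Attributes are set with setAttribute(attributeName, newValue)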
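Creating an element, setting its attributes, and attaching its text always follow the same pattern, so they can be wrapped in one function. This is a sketch assuming Python 3; add_child is a hypothetical helper, not part of minidom, and the units attribute is added only for illustration.

```python
import xml.dom.minidom

def add_child(doc, parent, tag, text=None, **attrs):
    """Create <tag> under parent with optional text and attributes."""
    node = doc.createElement(tag)
    for name, value in attrs.items():
        node.setAttribute(name, value)
    if text is not None:
        node.appendChild(doc.createTextNode(text))
    parent.appendChild(node)
    return node

impl = xml.dom.minidom.getDOMImplementation()
doc = impl.createDocument(None, 'planet', None)
root = doc.documentElement
root.setAttribute('name', 'Mars')
add_child(doc, root, 'period', '686.98', units='days')

result = root.toxml()
print(result)
```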
Example: find all the <experimenter/> nodes, extract names, and print a sorted list
Use the getElementsByTagName method to do this
import xml.dom.minidom

src = '''<heavenly_bodies>
<planet name="Mercury"/>
<planet name="Venus"/>
<planet name="Earth"/>
<moon name="Moon"/>
<planet name="Mars"/>
<moon name="Phobos"/>
<moon name="Deimos"/>
</heavenly_bodies>'''
doc = xml.dom.minidom.parseString(src)
# Matching elements are returned in document order.
for node in doc.getElementsByTagName('moon'):
    print(node.getAttribute('name'))

Moon
Phobos
Deimos
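getElementsByTagName returns nodes in document order, so producing a sorted list takes one more step. A minimal sketch, assuming Python 3 and a shortened version of the document above:

```python
import xml.dom.minidom

src = '''<heavenly_bodies>
<planet name="Mercury"/>
<moon name="Phobos"/>
<planet name="Earth"/>
<moon name="Moon"/>
<moon name="Deimos"/>
</heavenly_bodies>'''

doc = xml.dom.minidom.parseString(src)
# Extract the name attribute of every <moon/> and sort alphabetically.
names = sorted(node.getAttribute('name')
               for node in doc.getElementsByTagName('moon'))
print(names)
```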
nodeType: ELEMENT_NODE, TEXT_NODE, ATTRIBUTE_NODE, DOCUMENT_NODE
childNodes: the node's children
data: the text held by a text node
import xml.dom.minidom

src = '''<solarsystem>
<planet name="Mercury"><period units="days">87.97</period></planet>
<planet name="Venus"><period units="days">224.7</period></planet>
<planet name="Earth"><period units="days">365.26</period></planet>
</solarsystem>
'''

def walkTree(currentNode, indent=0):
    # Print one line per node, indented by its depth in the tree.
    spaces = ' ' * indent
    if currentNode.nodeType == currentNode.TEXT_NODE:
        print(spaces + 'TEXT' + ' (%d)' % len(currentNode.data))
    else:
        print(spaces + currentNode.tagName)
        for child in currentNode.childNodes:
            walkTree(child, indent+1)

doc = xml.dom.minidom.parseString(src)
walkTree(doc.documentElement)

solarsystem
 TEXT (1)
 planet
  period
   TEXT (5)
 TEXT (1)
 planet
  period
   TEXT (5)
 TEXT (1)
 planet
  period
   TEXT (6)
 TEXT (1)
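The TEXT (1) entries in this output are the newlines between elements, which the parser keeps as text nodes. Here is a sketch of one way to skip them, assuming Python 3; element_children is a hypothetical helper.

```python
import xml.dom.minidom

src = '''<solarsystem>
<planet name="Mercury"><period units="days">87.97</period></planet>
<planet name="Venus"><period units="days">224.7</period></planet>
</solarsystem>'''

def element_children(node):
    """Return only the element children of node, skipping text nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

doc = xml.dom.minidom.parseString(src)
root = doc.documentElement
print(len(root.childNodes))      # counts the whitespace text nodes too

planet_names = [p.getAttribute('name') for p in element_children(root)]
print(planet_names)
```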
Figure 7: Modifying the DOM Tree
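Replace the first word of each paragraph with an <em/> element whose only child is a text node containing that word
Skip paragraphs whose first child is not a text node, such as ones that already start with <em/>
Find all the paragraphs with getElementsByTagName, and iterate over them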
import re
import xml.dom.minidom

def emphasize(doc):
    paragraphs = doc.getElementsByTagName('p')
    for para in paragraphs:
        first = para.firstChild
        # Skip empty paragraphs and ones that do not start with text.
        if first is not None and first.nodeType == first.TEXT_NODE:
            emphasizeText(doc, para, first)

def emphasizeText(doc, para, textNode):
    # Look for optional spaces, a word, and the rest of the paragraph.
    m = re.match(r'^(\s*)(\S*)\b(.*)$', str(textNode.data))
    if not m:
        return
    leadingSpace, firstWord, restOfText = m.groups()
    if not firstWord:
        return

    # If there's text after the first word, re-save it.
    if restOfText:
        restOfText = doc.createTextNode(restOfText)
        para.insertBefore(restOfText, para.firstChild)

    # Emphasize the first word.
    emph = doc.createElement('em')
    emph.appendChild(doc.createTextNode(firstWord))
    para.insertBefore(emph, para.firstChild)

    # If there's leading space, re-save it.
    if leadingSpace:
        leadingSpace = doc.createTextNode(leadingSpace)
        para.insertBefore(leadingSpace, para.firstChild)

    # Get rid of the original text.
    para.removeChild(textNode)

if __name__ == '__main__':
    src = '''<html><body>
<p>First paragraph.</p>
<p>Second paragraph contains <em>emphasis</em>.</p>
<p>Third paragraph.</p>
</body></html>'''
    doc = xml.dom.minidom.parseString(src)
    emphasize(doc)
    print(doc.toxml('utf-8').decode('utf-8'))

<?xml version="1.0" encoding="utf-8"?><html><body>
<p><em>First</em> paragraph.</p>
<p><em>Second</em> paragraph contains <em>emphasis</em>.</p>
<p><em>Third</em> paragraph.</p>
</body></html>