<div class="you-are-classy"> is a div tag whose class attribute is set to "you-are-classy".
xpath = '//table' selects all table elements anywhere in the HTML document.
xpath = '/html/body/div[2]//table' selects all table elements nested at any depth within the second
div child of the body.
xpath = '//span[@class="span-class"]' selects all span elements whose class attribute is
equal to "span-class". We could substitute div, or any other tag, for span.
The asterisk (*) is a wildcard character. For example, xpath = '/html/body/*' will lead to all child
elements within the body, regardless of tag.
xpath = '//p[@class="class-1"]' This directs to all paragraph elements with a class attribute equal to
class-1
xpath = '//*[@id="uid"]' The wildcard matches any element, whatever its tag, that has an id attribute
equal to uid
xpath = '//div[@id="uid"]/p[2]' A step further from above: the second paragraph child of the div
element that has an id attribute equal to uid
xpath = '//*[contains(@class,"class-1")]' This expression chooses all elements where the class
attribute contains a string of “class-1”. This may include “class-1”, “class-1 class-2”, or “class-12” for
example.
An HTML string is passed to Scrapy's Selector class to create a selector object. An example setup is as follows:
from scrapy import Selector
sel = Selector(text=html)
Here, "html" is a previously defined string. Consider "sel" as having selected the entire HTML
document.
sel.xpath('//p') selects all paragraphs from our running example (the html text). The output will be a
selector list of two selector objects such as [<Selector xpath='//p' data='<p>Hello World!</p>'>,
<Selector xpath='//p' data='<p>Enjoy DataCamp!</p>'>]
ps = sel.xpath('//p') stores that selector list, with all paragraph selector objects, in ps.
second_p = ps[1] picks the second selector object from that list (indexing starts at 0).
second_p.extract() applies the extract method to the single selector. A selector holds only one piece of
data, so the output is a single string such as '<p>Enjoy DataCamp!</p>'
Xpath Chaining
Using the Selector (assuming the name is “sel” in this case), you can chain an xpath to produce the same
results. For example:
sel.xpath('/html/body/div[2]')
sel.xpath('/html').xpath('./body/div[2]')
sel.xpath('/html').xpath('./body').xpath('./div[2]')
These all produce the same results. Make certain to "glue" them together with the period that
comes before the forward slash of each subsequent link in the chain; the period makes the path relative to the current selection.
Eventually we need to get a webpage's HTML code. This can be accomplished with the requests.get
method, after importing the Python library "requests".
url = 'https://www.datacamp.com/courses/all'
html = requests.get(url).text
The above puts the HTML contents from https://www.datacamp.com/courses/all into a string called
"html".
sel = Selector(text=html)
The above passes the content of the HTML source to the selector.
CSS LOCATOR
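XPath: /html/body/div
CSS Locator: html > body > div
Each of the two examples above moves forward one generation at a time on the HTML tree. The single
forward slash in XPath corresponds to the > character in CSS Locator notation.
XPath: //div/span//p
CSS Locator: div > span p
From the two examples immediately above, the double forward slash in XPath is equivalent to a blank
space in CSS Locator notation. Both perform the task of looking forward to all generations.
XPath: //div/p[2]
CSS Locator: div > p:nth-of-type(2)
xpath = '/html/body//div/p[2]'
css = 'html > body div > p:nth-of-type(2)'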
Usage example:
css = 'div#uid > p.class1'
The above line first navigates to the div element whose id is uid (marked with #) and then further to the
paragraph element whose class is class1 (marked with a period).
css = '.class1'
This directs to all elements in the HTML document that belong to the class class1.
It matches elements of that class even if they also belong to other classes. For example: <p
class="class1"> … </p> and <div class="class1 class2"> … </div>
This is different from xpath = '//*[@class="class1"]', which forces an exact match of the whole attribute. It is
also different from xpath = '//*[contains(@class,"class1")]', which matches any class attribute containing the
substring class1, such as "class12".
To select all children of an element, remember to add the star (*) after the child separator, for example css = '#uid > *'.
XPath: <xpath-to-element>/@attr-name
xpath = '//div[@id="uid"]/a/@href'
CSS: <css-to-element>::attr(attr-name)
css = 'div#uid > a::attr(href)'
The double colon followed by attr() selects the desired attribute (the one named between the parentheses);
href holds a web address in this case
Text Extraction
In some instances, you may want to extract text from an element. For example:
<p id="p-example">
Hello world!
Try <a href="http://www.datacamp.com">DataCamp</a> today!
</p>
We may just want to extract the text. To do this, we navigate to the paragraph whose id is equal to
"p-example" and append text() to the path.
sel.xpath('//p[@id="p-example"]/text()').extract()
The above line takes a Scrapy selector with an XPath that navigates to what we need. With a single forward
slash before text(), only the text sitting directly inside the paragraph is extracted; text inside child
elements is skipped: ['\n Hello world!\n Try ', ' today!\n']
sel.xpath('//p[@id="p-example"]//text()').extract()
With a double forward slash, text from all descendant elements is included as well:
['\n Hello world!\n Try ', 'DataCamp', ' today!\n']