
C.1 Creating the web


C.1.1 Distinguish between the internet and World Wide Web (web).
C.1.2 Describe how the web is constantly evolving
C.1.3 Identify the characteristics of the following:
C.1.4 Identify the characteristics of the following:
C.1.5 Describe the purpose of a URL.
C.1.6 Describe how a domain name server functions.
C.1.7 Identify the characteristics of:
C.1.8 Outline the different components of a web page.
C.1.9 Explain the importance of protocols and standards on the web.
C.1.10 Describe the different types of web page.
C.1.11 Explain the differences between a static web page and a dynamic web page.
C.1.12 Explain the functions of a browser.
C.1.13 Evaluate the use of client-side scripting and server-side scripting in web pages.
C.1.14 Describe how web pages can be connected to underlying data sources.
C.1.15 Describe the function of the common gateway interface (CGI).
C.1.16 Evaluate the structure of different types of web pages.
C.2 Searching The Web
C.2.1 Define the term search engine
C.2.2 Distinguish between the surface web and the deep web.
C.2.3 Outline the principles of searching algorithms used by search engines.
C.2.4 Describe how a web crawler functions.
C.2.5 Discuss the relationship between data in a meta-tag and how it is accessed by a
web crawler.
C.2.6 Discuss the use of parallel web crawling
C.2.7 Outline the purpose of web-indexing in search engines.
C.2.8 Suggest how web developers can create pages that appear more prominently in
search engine results.
C.2.9 Describe the different metrics used by search engines.
C.2.10 Explain why the effectiveness of a search engine is determined by the
assumptions made when developing it.
C.2.11 Discuss the use of white hat and black hat search engine optimization.
C.2.12 Outline future challenges to search engines as the web continues to grow.
C.3 Distributed Approaches To The Web
C.3.1 Define the terms: mobile computing, ubiquitous computing, peer-2-peer network,
grid computing.
C.3.2 Compare the major features of:
C.3.3 Distinguish between interoperability and open standards.
C.3.4 Describe the range of hardware used by distributed networks.
C.3.5 Explain why distributed systems may act as a catalyst to a greater decentralization
of the web.
C.3.6 Distinguish between lossless and lossy compression.
C.3.7 Evaluate the use of decompression software in the transfer of information.
C.4 The Evolving Web
C.4.1 Discuss how the web has supported new methods of online interaction such as
social networking. (Karan)
C.4.2 Describe how cloud computing is different from a client-server architecture. (Mitra)
C.4.3 Discuss the effects of the use of cloud computing for specified organizations.
(Stefan)
C.4.4 Discuss the management of issues such as copyright and intellectual property on
the web. (Thomas)
C.4.5 Describe the interrelationship between privacy, identification and authentication.
(Tian)
C.4.6 Describe the role of network architecture, protocols and standards in the future
development of the web.
C.4.7 Explain why the web may be creating unregulated monopolies.
C.4.8 Discuss the effects of a decentralized and democratic web.
C.5 Analysing the Web
C.5.1 Describe how the web can be represented as a directed graph.
C.5.2 Outline the difference between the web graph and sub-graphs.
C.5.3 Describe the main features of the web graph such as bowtie structure, strongly
connected core (SCC), diameter.
C.5.4 Explain the role of graph theory in determining the connectivity of the web.
C.5.5 Explain that search engines and web crawling use the web graph to access
information.
C.5.6 Discuss whether power laws are appropriate to predict the development of the
web.
C.6 The Intelligent Web
C.6.1 Define the term semantic web.
C.6.2 Distinguish between the text-web and the multimedia-web.
C.6.3 Describe the aims of the semantic web.
C.6.4 Distinguish between an ontology and folksonomy.
C.6.5 Describe how folksonomies and emergent social structures are changing the web.
C.6.6 Explain why there needs to be a balance between expressivity and usability on the
semantic web.
C.6.7 Evaluate methods of searching for information on the web.
C.6.8 Distinguish between ambient intelligence and collective intelligence.
C.6.9 Discuss how ambient intelligence can be used to support people.
C.6.10 Explain how collective intelligence can be applied to complex issues.

Here is a page with many good links for web science:


http://csopedia.wikispaces.com/C_Web_Science
C.1 Creating the web 
C.1.1 Distinguish between the internet and World Wide Web (WWW). 
The internet is a massive network of networks: a global networking infrastructure. It connects many different networks together and allows computers to communicate with one another as long as they are connected to a network. Protocols ensure that information can travel across the internet.
The World Wide Web is a way of accessing information over the internet. It is an information-sharing model built on top of the internet. It uses HTTP (Hypertext Transfer Protocol) to transmit data; HTTP allows applications such as web browsers to communicate with one another.

How the internet works

C.1.2 Describe how the web is constantly evolving 
The web evolves constantly: languages are updated, new specifications and extensions appear, browsers evolve, and increased bandwidth allows for richer, more dynamic pages with more functions and new kinds of programs.

At first, Web 1.0 simply presented information. With social media websites, users became the ones who post the information: blogging, tagging and social media have developed the web. Web 3.0 is a continuation of 2.0 where, for example, recommender systems such as Amazon's suggest items that other people with the same interests have bought.
Describe and compare Web 1.0, 2.0 and 3.0
Web 1.0 is static web pages.

Web 2.0 is dynamic web pages that are driven by user created content, such as Facebook and
Youtube.

Web 3.0 is not clearly defined, but it is the idea that the web will become more omnipresent (internet-enabled phones, fridges, cars, etc.) and more intelligent (your email will alert you to a conflict between your calendar and an event described in your email, and suggest alternative dates). There are three ways this might happen:
1) Expanded application programming interfaces (APIs) from websites such as
Facebook, that will allow this kind of functionality.

2) Mashups - combining separate systems to provide more intelligent help (Google Maps suggests restaurants based upon your and others' recommendations elsewhere, and suggests a good day to visit based upon your calendar).

3) The semantic web - web pages are encoded with data (invisible to the user) to add
this functionality. This can be done with a (formal) ontology system or an informal
folksonomy (tagging by users). Currently available markup systems for this include RDFa and microformats.

C.1.3 Identify the characteristics of the following: 

Hypertext transfer protocol (HTTP)


HTTP is the set of rules for transferring files on the world wide web and is its underlying protocol. It defines how messages are formatted and transmitted, and the actions web servers and browsers should take in response to commands.

Hypertext transfer protocol secure (HTTPS)


HTTPS is the secure version of HTTP. It layers regular HTTP over SSL or TLS, encrypting the page requests sent by the browser and the pages returned by the server.

Hypertext markup language (HTML)


HTML is the standard markup language used to create web pages. HTML describes the
structure of the website semantically. HTML defines the structure of a website using tags and
attributes.
Uniform resource locator (URL)
A URL is the global address of documents, pages and other resources on the world wide web. The first part of a URL is the protocol identifier and tells the browser which protocol to use. The second part is the resource name and specifies the IP address or domain name where the resource is located. In http://www.google.com, "http://" is the protocol identifier and "www.google.com" is the resource name.

Extensible markup language (XML)


XML is a text-based format for structuring electronic documents and is not limited to a fixed set of labels. XML is used to describe data; among other things, it allows designers to create their own customized tags.

Extensible stylesheet language transformations (XSLT)


XSLT is a standard way to describe how to transform an XML document into an XML document with a different structure. It is usually used to convert XML into HTML; however, it can also convert XML into other document types that are recognized by the browser.

You can add/remove elements and attributes to the output file. You can also rearrange and sort
elements, perform tests, and decide which elements to hide and display.

JavaScript
JavaScript is a scripting language that enables developers to make interactive sites. It shares some features with Java but has important differences. Using JavaScript you can interact with the HTML of a page, which helps create dynamic content.

JavaScript is an object-based language. It is interpreted rather than compiled. JavaScript gives the user more control over the browser: it can detect the user's browser and OS, and it performs most of its computation client-side rather than server-side. JavaScript can also be used to validate data and to generate HTML.

Cascading style sheet (CSS).


CSS gives website developers and users control over how pages are displayed. With CSS, designers and users can create style sheets that define how different elements, such as headers and links, appear. These style sheets can then be applied to any web page.
YouTube: Intro to XML

https://www.youtube.com/watch?v=Q0k5ySZGPBc
XML is a meta-language that gives meaning to data so that other applications can use it: information about other information, in short. It helps information systems share structured data. It is application- and platform-independent, allows various types of data, is extensible to accommodate new tags and processing methods, and allows users to define their own tags. It is a simpler version of Standard Generalized Markup Language (SGML), easy to read and understand, supported by a large number of platforms and used across open standards.
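As a small illustration of user-defined tags being read by an application, here is a minimal Python sketch using the standard library's xml.etree module; the document and tag names are invented for the example:

    import xml.etree.ElementTree as ET

    # A tiny XML document using custom, user-defined tags (hypothetical example).
    xml_text = """
    <library>
        <book isbn="978-0-00-000000-0">
            <title>Web Science Notes</title>
            <author>A. Student</author>
        </book>
    </library>
    """

    root = ET.fromstring(xml_text)      # parse the text into an element tree
    for book in root.findall("book"):   # iterate over every <book> element
        title = book.find("title").text
        isbn = book.get("isbn")         # read an attribute value
        print(isbn, title)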
YouTube: XSLT Example

C.1.4  Identify the characteristics of the following:  
● Uniform resource identifier (URI)
● (URL)
URI ​is the term used for all the types of names/addresses that refer to objects on the world wide
web. A URL is one kind of URI. The URI identifies the object, but does not give a protocol to find
it or a location. For example, a URI could be an ISBN number for a book. It identifies a ‘thing’,
but does not give any information about where to find it.

A URL is the global address of documents, pages and other resources on the world wide web. The first part of a URL is the protocol identifier and tells the browser which protocol to use. The second part is the resource name and specifies the IP address or domain name where the resource is located on the world wide web.

This is a challenging topic. ​This article goes into more depth.

C.1.5 Describe the purpose of a URL. 
A URL is the global address of documents, pages and other resources on the world wide web. The first part of a URL is the protocol identifier and tells the browser which protocol to use; the second part is the resource name and specifies the IP address or domain name where the resource is located. Example of a URL: http://www.google.com, where "http://" is the protocol identifier and "www.google.com" is the resource name.
The purpose of a URL is to tell the browser and server which web page or resource to locate and display, and which protocol to use to access it.
A URL is a formatted text string used by web browsers, email clients and other software to identify a network resource on the internet. Network resources are files that can be plain web pages, other text documents, graphics, or programs.
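As a small illustration of the two parts of a URL described above, the following Python sketch splits an example URL using the standard library's urllib.parse (the URL itself is just an example):

    from urllib.parse import urlparse

    url = "http://www.google.com/search?q=web+science"
    parts = urlparse(url)

    print(parts.scheme)   # 'http' -> the protocol identifier
    print(parts.netloc)   # 'www.google.com' -> the resource name (domain/host)
    print(parts.path)     # '/search' -> path to the resource on that host
    print(parts.query)    # 'q=web+science' -> query string passed to the page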

C.1.6 Describe how a domain name server functions. 
The domain name system (DNS) is the standard technology for translating domain names into addresses. DNS allows you to type a name such as google.com into your web browser and turns it into an Internet Protocol (IP) address, such as 70.42.251.42, that computers use to identify each other on the network.
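To see the translation in action, the one-line Python sketch below asks the operating system's resolver (which in turn queries DNS servers) to look up a name; the domain is only an example and the returned address will vary:

    import socket

    # Resolve a human-readable domain name to an IP address via DNS.
    ip_address = socket.gethostbyname("google.com")
    print(ip_address)   # e.g. '142.250.180.14' -- the exact address will vary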

Understanding DNS:

C.1.7  Identify the characteristics of: 
● Internet protocol (IP)
● Transmission control protocol (TCP)
● File transfer protocol (FTP)

Internet Protocol (IP) is responsible for letting your machine, routers, switches and so on know where a specific packet is going. IP simply forwards whatever it is given and can be unreliable, as it does no error checking.

Transmission control protocol (TCP) breaks data down into packets known as TCP segments. TCP makes data transmission very reliable: it recovers from data that is damaged, lost, duplicated, or delivered out of order by the network layer. It achieves multiplexing by using port numbers.
TCP assigns sequence numbers to each byte that is transmitted and rearranges TCP segments into the correct order as they arrive. It is full duplex: TCP provides concurrent data streams in both directions.
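To make the layering concrete, here is a minimal Python sketch (example.com is used purely as an illustrative host) that opens a TCP connection and sends a raw HTTP request over it; TCP handles the reliable, ordered delivery of the bytes:

    import socket

    # Open a TCP connection (reliable, ordered byte stream) to a web server.
    with socket.create_connection(("example.com", 80)) as conn:
        # Send a minimal HTTP request over the TCP stream.
        conn.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        response = b""
        while True:
            chunk = conn.recv(4096)   # TCP delivers the bytes in order, without loss
            if not chunk:
                break
            response += chunk

    print(response.decode(errors="replace")[:200])  # first part of the reply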

File Transfer Protocol (FTP): an FTP client initiates a connection with a remote computer running FTP server software. After that, the client can choose to send and/or receive files. Clients identify FTP servers by their IP address or by their hostname. FTP supports two types of data transfer: plain text (ASCII) and binary.

C.1.8 Outline the different components of a web page. 
Metadata: tags on a page to suggest keywords for search engines.
Title: In the title bar.
Other: Footers, headers, content, navigation bar.

C.1.9 Explain the importance of protocols and standards on the web. 
Protocols are a set of rules and procedures that the sender and receiver must adhere to to
ensure coherent data transfer. Examples are TCP, IP, HTTP and FTP.

A standard is anything that has been agreed upon, which could be a protocol or could be something like how a network cable is constructed. Most protocols, including the ones listed above, are also standards, or they would be useless.

Without protocols that are standards, communication networks would not be possible.

C.1.10 Describe the different types of web page. 
Personal Pages: A personal webpage is created by an individual for his/her own
personal need.

Blogs: a blog is a discussion or informational site on the world wide web, usually run by an individual or small group, where readers can often leave comments and input.

Search Engine Pages: A search engine results page (SERP) is the listing of results
returned by a search ​engine in response to a keyword query. The results normally
include a list of items with titles, a reference to the full version, and a short description
showing where the keywords have matched content within the page.

Forums: a place, meeting, or medium where ideas and views on a particular issue can
be exchanged.
C.1.11 Explain the differences between a static web page and a dynamic web page.
Static web pages are the same every time you visit them.

Dynamic web pages use a database to change depending upon when you visit them or what you enter into them (e.g. Facebook, in both cases).

C.1.12 Explain the functions of a browser. 
The primary function of a web browser is to provide the resources or information the user asks for and to display web documents, rendering HTML and other web formats. It processes user input in the form of a URL, such as http://www.google.com, which is used to identify the resource, fetch it from the server and display it to the client. The browser allows the user to interact with web pages and dynamic content such as surveys and forms, to navigate through a complete web page and to view its source code in HTML format. It also provides security for the data and resources available on the web by using secure methods.

C.1.13 Evaluate the use of client-side scripting and server-side scripting in web pages.
Client side scripts run in the browser and do not require a server to run them. They are
appropriate for processing that does not require any remote information. For example, a good
use would be using javascript to validate a form before sending the information to a server.

Server-side scripts run on a server; the user never sees them (think PHP). Server-side scripts are appropriate for processing that is best done on the server, for example processing an SQL request against a database that is on the server.

C.1.14 Describe how web pages can be connected to underlying data sources.
Use server-side scripts in languages such as PHP to connect to and query databases on a server. The resulting data can then be sent to the web browser.
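As a minimal sketch of this idea, not tied to any particular framework, the Python snippet below queries a local SQLite database and builds an HTML fragment that a server-side script could return to the browser; the database file, table and column names are invented for the example:

    import sqlite3

    def render_products_page(db_path="shop.db"):
        """Server-side: query the database and build HTML to send to the browser."""
        conn = sqlite3.connect(db_path)
        rows = conn.execute("SELECT name, price FROM products").fetchall()
        conn.close()

        # Turn the query result into an HTML list the browser can display.
        items = "".join(f"<li>{name}: ${price:.2f}</li>" for name, price in rows)
        return f"<html><body><ul>{items}</ul></body></html>"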
C.1.15 Describe the function of the common gateway interface (CGI). 
The common gateway interface (CGI) is a standard way for a web server to pass a web user's request to an application program and to receive data back to forward to the user. When the user requests a web page, the server sends back the requested page. This method, or convention, for passing data back and forth between the server and the application is called the common gateway interface (CGI). It works alongside the web's Hypertext Transfer Protocol (HTTP).

Websites using CGI usually have a CGI folder with small scripts (typically Perl) that are run on request. These scripts usually access a database, much as we have been doing with PHP, and then return the information to the browser that requested it.
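A minimal CGI-style script might look like the Python sketch below: the web server runs the program, passes the query string to it through the environment, and forwards whatever it prints back to the browser. The parameter name used here is purely illustrative:

    #!/usr/bin/env python3
    # Minimal CGI script: the web server runs this program on each request
    # and forwards whatever it prints back to the user's browser.
    import os
    from urllib.parse import parse_qs

    query = parse_qs(os.environ.get("QUERY_STRING", ""))   # e.g. "name=Alice"
    name = query.get("name", ["world"])[0]

    print("Content-Type: text/html")   # HTTP header required by CGI
    print()                            # blank line separates header from body
    print(f"<html><body><h1>Hello, {name}!</h1></body></html>")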

C.1.16 Evaluate the structure of different types of web pages.  

C.2 Searching The Web 

C.2.1 Define the term search engine  
Search engines are programs that search documents for specified keywords and return a list of the documents where the keywords were found.

C.2.2 Distinguish between the surface web and the deep web. 
https://www.youtube.com/watch?v=_UOK7aRmUtw
The deep web is any part of the web that cannot be indexed by a search engine; only around 10% of the web is indexed by search engines. Much of the deep web is content you have to log in to see: access is restricted unless you are a member of that website. For example, your bank account portal is part of the deep web, because the general public does not have access to that information. The surface web is what we see when we use a search engine: search engines send out bots (crawlers) that index pages, and the links they find appear in search results on the surface.

C.2.3 Outline the principles of searching algorithms used by search engines. 

How google search works:

Description of Pagerank, from 2014 Paper 2 IB Markscheme:


● Search engines use algorithms such as the Google PageRank or HITS to
determine the ranking of any web page;
● The Google PageRank calculates the rank as follows:
● Rank is determined by number of votes for it. This is based on the number of “in”
links and importance of pages voting for it;
● Page rank uses a recursive algorithm;
● However, some web masters use link farms to “artificially” raise the rank of the
web page, some algorithms remove this information before calculating the rank;
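A toy version of the recursive PageRank calculation described in the points above, sketched in Python; the mini link graph is invented and the damping value 0.85 is the commonly cited default, not something taken from the markscheme:

    def pagerank(links, damping=0.85, iterations=50):
        """links maps each page to the list of pages it links to."""
        pages = list(links)
        rank = {p: 1 / len(pages) for p in pages}          # start with equal rank

        for _ in range(iterations):                        # iterative (recursive) update
            new_rank = {p: (1 - damping) / len(pages) for p in pages}
            for page, outgoing in links.items():
                if not outgoing:
                    continue
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:                    # each "in" link acts as a vote
                    new_rank[target] += share
            rank = new_rank
        return rank

    # Hypothetical mini web: A and C link to B, B links back to A.
    print(pagerank({"A": ["B"], "B": ["A"], "C": ["B"]}))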

Description of HITS algorithm from 2014 Paper 2 Markscheme:


HITS (hyperlink-induced topic search) is based on the following principles:
Websites may be hubs (point to lots of authorities) or authorities (are pointed to by a
number of hubs);

The HITS algorithm calculates the rank as follows:


A root set R of pages is located by a search engine for the query
The base set S (a closed network) initially contains the pages in R
Add to S all pages pointed to by any page in R
Add to S all pages that point to any page in R
Maintain for each page p in S:
○ Authority score: a_p (vector a)
○ Hub score: h_p (vector h)
Calculate the authority weighting for each web page
Calculate the hub weighting for each web page
Normalize the values;

Webpage about google search: ​http://computer.howstuffworks.com/internet/basics/google1.htm

A good comparison of HITS and pagerank, just watch it up until 2:13 ---

C.2.4 Describe how a web crawler functions. 
https://www.youtube.com/watch?v=CDXOcvUNBaA A web crawler is an internet bot that systematically browses the world wide web, typically for the purpose of web indexing. Web indexing refers to the various methods for indexing the contents of a website or of the internet as a whole. As a web crawler visits URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit.
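A very small, single-threaded crawler sketch in Python (standard library only; the starting URL is just an example) showing the fetch, extract-links, enqueue cycle described above:

    from html.parser import HTMLParser
    from urllib.request import urlopen
    from urllib.parse import urljoin

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":                            # every hyperlink on the page
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url, max_pages=10):
        to_visit, seen = [start_url], set()
        while to_visit and len(seen) < max_pages:
            url = to_visit.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urlopen(url, timeout=5).read().decode(errors="replace")
            except Exception:
                continue                              # skip pages that fail to load
            parser = LinkCollector()
            parser.feed(html)
            # add newly discovered links to the list of URLs to visit
            to_visit.extend(urljoin(url, link) for link in parser.links)
        return seen

    print(crawl("https://example.com"))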
C.2.5 Discuss the relationship between data in a meta-tag and how it is accessed by a web crawler.
The meta-tag provides metadata about the HTML document. Metadata is data that describes and gives information about other data; it is not displayed on the page, but it is machine-parsable. When web crawlers are sent out to find pages, they look at the metadata and determine whether the description of the web page is relevant enough to be shown as a search result. If not, the crawler does not include that link in the list of results for the search.
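As an illustration, the Python sketch below reads the description meta-tag out of a page's HTML the way a crawler-style program might; the page content is invented:

    from html.parser import HTMLParser

    class MetaReader(HTMLParser):
        def __init__(self):
            super().__init__()
            self.description = None
        def handle_starttag(self, tag, attrs):
            if tag == "meta":
                attrs = dict(attrs)
                if attrs.get("name") == "description":
                    self.description = attrs.get("content")

    page = '<html><head><meta name="description" content="IB Web Science notes"></head></html>'
    reader = MetaReader()
    reader.feed(page)
    print(reader.description)   # the metadata a crawler could store for this page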

IB Spec
#2c The organizers of the theatre want to ensure that their web pages appear higher up
the ranking of search engines. Suggest whether the use of meta-tags can help achieve
this aim. (4)

C.2.6 Discuss the use of parallel web crawling  
A parallel crawler is a crawler that runs multiple processes in parallel. The goal is to maximize
the download rate of information. To avoid downloading the same page more than once, the
crawling system requires a policy which states that if the same link is found by two separate
crawlers, the second one should disregard it.

C.2.7 Outline the purpose of web-indexing in search engines.

A search engine would likely store more than just the keywords and URLs that the user searches for; it would also store, for example, the number of times each word appears on a page. Web indexing is the process by which crawlers, called spiders, visit pages and build an index, so that when users type keywords into the search box the engine can quickly find the pages that contain those keywords, synonyms of them, or links to what the users are looking for.
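The core data structure behind web indexing is an inverted index that maps each keyword to the pages containing it. A toy Python sketch, with invented document URLs and text:

    from collections import defaultdict

    # Map each keyword to the set of pages (URLs) it appears on.
    def build_index(pages):
        index = defaultdict(set)
        for url, text in pages.items():
            for word in text.lower().split():
                index[word].add(url)
        return index

    pages = {
        "site-a.example/tigers": "bengal tigers live in india",
        "site-b.example/cats":   "tigers are large cats",
    }
    index = build_index(pages)
    print(index["tigers"])   # pages the search engine can return for this keyword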

Youtube video:

Link: ​https://www.google.com/insidesearch/howsearchworks/crawling-indexing.html

C.2.8 Suggest how web developers can create pages that appear more prominently in search engine results.
The ideal optimized web page has a clear editorial content focus, with the keywords or phrases
present in these elements of the page, in order of importance:
● Page titles: titles are the most important element in page topic relevance; they also have
another important role in the web interface—the page title becomes the text of
bookmarks users make for your page
● Major headings at the top of the page (​<h1>​, ​<h2>​)
● The text of links to other pages: link text tells a search engine that the linked words are
important; “Click here” gives no information about the destination content, is a poor
practice for markup and universal usability, and contributes nothing to the topical
information of the page
● The alternate text for any relevant images on the page: accurate, carefully crafted
alternate text for images is essential to universal usability and is also important if you
want your images to appear in image search results such as Google Images and Yahoo!
Images; the alternate text of the image link is the primary way search engines judge the
topical relevance of an image
● The ​HTML​ file name and the names of directories in your site: for example, the ideal
name for a page on Bengal tigers is “bengal-tigers.html”

C.2.9 Describe the different metrics used by search engines.  
Pagerank score
PageRank works by counting the number and quality of links to a page to determine a rough
estimate of how important the website is. More important websites are likely to receive more
links from other websites

The quality of pages that link to the page (in bound)


The number of backlinks is one indication of the popularity or importance of that website or
page.

Hub score
A page is given a high hub score by linking to nodes that are considered authorities on the
subject

● The quality of pages that are linked to by the hub (outbound)


○ External links are publicly visible and easily stored. For this reason and others,
external links are a great metric for determining the popularity of a given web
page. This metric (which is roughly similar to toolbar PageRank) is combined with
relevancy metrics to determine the best results for a given search query.
● The Domain
○ Domains such as .edu, .gov are usually more reliable than domains such as .com
or .net
Keyword is in the title.
How many times the keyword appears.
Keyword is in the url.
Synonyms to the keyword appear on the page.
Keywords appear directly adjacent to each other.

C.2.10 Explain why the effectiveness of a search engine is determined by the assumptions made when developing it.
Search engines are programmed to search based on a fixed algorithm and ranking order
making general assumptions about what the user is looking for, regardless of the user’s intent or
circumstances

For example Google’s pagerank ranking order makes the assumption that if more sites link to
you, you are a more valuable site. HITS ranking order assumes that if you are linking to other
sites you are a valuable site yourself, regardless of content.

The same search may be valuable to one user while useless to the next

To improve the quality of searches, some superficial features have been added, such as custom Google searches or user preferences in settings. However, these do not solve the problem of the search engine lacking insight into the semantics of what the user wants.
One way a search engine could perform a better search is with follow-up questions: if you search for a camera, it could ask whether you are interested in comparing models and prices, in purchasing one, or in the general art of photography.
This information could be used later to allow the engine to make better assumptions about the user and provide a better search.

SOURCE

C.2.11 Discuss the use of white hat and black hat search engine optimization.

Black Hat SEO vs White Hat SEO:
● Black hat SEO is also known as unethical SEO; white hat SEO is also known as ethical SEO.
● Black hat manipulates the search system in order to make a page show up more often in searches; white hat caters to human search techniques to make the page easy to find when it is looked for.
● Black hat sites are often filled with links and ads; white hat sites may feature relevant multimedia (videos, etc.).
● Black hat sites generally don't look very appealing (the purpose is just to get users to see the content); white hat sites are often very visually pleasing (the purpose is to attract users who are genuinely interested in the page content).

Some characteristics of Black Hat SEO:


● Keyword Stuffing--associating lots of keywords with the page even if they are unrelated
so that search engines will find the page regardless of whether the content is what the
user was looking for.
● Doorway Pages--clicking on the link to a website that looks like what you’re searching
for, but which only takes you to a completely different page.
● Page Swapping--building up a respectable reputation in Pagerank before completely
switching all the material on the page.
● Paid Links--people pay pages with a high Pagerank to link to their page in order to boost
their own Pagerank.

Some characteristics of White Hat SEO:


● Keyword Analysis--including only relevant and related keywords based on synonyms
and related phrases
● Backlinking--when a page links to a second page, the second page will have a backlink,
or a link back to the original page.
● Link Building--linking and backlinking to other relevant websites in order to boost
Pagerank and improve the user experience and ease-of-access.

Image:
http://www.prolificwebsolutions.com/wp-content/uploads/black-hat-white-hat-seo-comparison.jpg

C.2.12 Outline future challenges to search engines as the web continues to grow.
Challenges to future search engines:
● Error management
● Lack of quality assurance of info uploaded

   
C.3 Distributed Approaches To The Web 
C.3.1 Define the terms: mobile computing, ubiquitous computing, peer-2-peer network, grid computing.
Mobile Computing - Mobile computing is human–computer interaction by which a computer is
expected to be transported during normal usage. Mobile computing involves mobile
communication, mobile hardware, and mobile software. (i.e. laptops and smartphones, smart
cards (normally for payment or transport) and wearable computers like smart watches)

Ubiquitous Computing - Ubiquitous computing is a concept in software engineering and computer science where computing is made to appear everywhere and anywhere. In contrast to desktop computing, ubiquitous computing can occur using any device, in any location, and in any format. It is the opposite of virtual reality: computers are everywhere but not obviously visible.

Peer-2-Peer Network - A network of computers configured to allow certain files and folders to
be shared with everyone or with selected users. Peer-to-peer networks are quite common in
small offices that do not use a dedicated file server. Equal clients without a server.

Grid Computing - Grid computing is the collection of computer resources from multiple
locations to reach a common goal. The grid can be thought of as a distributed system with
non-interactive workloads that involve a large number of files. Has a server that is coordinating.
Usually used to solve technical problems, such as SETI.

C.3.2 Compare the major features of:  
● Mobile Computing ​- Mobile computing is human–computer interaction by which a
computer is expected to be transported during normal usage. Mobile computing involves
mobile communication, mobile hardware, and mobile software.
● Ubiquitous Computing- ​Ubiquitous computing is a concept in software engineering and
computer science where computing is made to appear everywhere and anywhere. In
contrast to desktop computing, ubiquitous computing can occur using any device, in any
location, and in any format.
● peer-2-peer network- ​P2P is a category of distributed system which optimizes the
benefits of resources such as storage or processing time that are available on the
internet. Implementing P2P requires the creation of overlay networks
● Grid Computing is a type of distributed computing which permits and ensures the sharing of aggregated resources across dispersed locations. Resources are connected over the internet through required middleware such as Globus, Legion or gLite, which provides various services for resource management and security. Grid computing is limited to tasks that have no time dependency, since all nodes work independently and many are likely to connect and disconnect from the network in any given time period. It is best suited to tasks that are highly time-independent, since failure of one node cannot affect another node.

The point of peer to peer networks is to share resources (i.e. bittorrent). The point of grid
computing is to solve problems (i.e. SETI).

C.3.3 Distinguish between interoperability and open standards. 
Interoperability - The ability of software and hardware on different machines from different
people or companies to share data.

Open Standards facilitate interoperability, but are not strictly necessary.

C.3.4 Describe the range of hardware used by distributed networks. 
Grid Computing: ​Desktop computers running middleware to coordinate them. A network
connection such as the internet.

Ubiquitous computing: ​Small processors and input/output devices, wireless data connections
such as wifi or bluetooth. All kinds of sensors.

Parallel computing: ​Fast and small networks such as fiber optics and powerful computers with
specialized operating systems to coordinate highly time dependent tasks.

Peer to peer networks: ​Personal computers running sharing software.


C.3.5 Explain why distributed systems may act as a catalyst to a greater decentralization of the web.
A distributed system is an application that executes a collection of protocols to coordinate the actions of multiple processes on a network; in other words, the processes cooperate to perform a single task or a small set of related tasks. (Google Code)

Decentralization of the web allows remote users to connect to remote resources in an open and scalable way. A distributed system can be larger and more powerful than any single machine, given the combined capabilities of its components. It also provides easier access for multiple people at once, instead of everyone working on one centralized system.

Link for Distributed System

C.3.6 Distinguish between lossless and lossy compression. 
In lossless compression, every single bit of data that was originally in the file is kept after the file
is uncompressed. Essentially, all the information is restored.

Lossy compression is when a file is permanently reduced in size by eliminating some data, usually redundant or less important information. When it is uncompressed, only part of the original data is still there. It is commonly used for pictures, sound and video, where some loss in quality is hard to notice.

For example, a picture stored in Raw format records the color for every pixel in an image. This
is (mostly) lossless. The files are very large. A jpg compresses this information and makes the
image much smaller, but loses some quality.
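Lossless compression can be demonstrated with Python's built-in zlib module: after decompression, every byte of the original is restored, which is exactly what a lossy format such as JPEG does not guarantee:

    import zlib

    original = b"AAAAABBBBBCCCCC" * 100          # repetitive data compresses well
    compressed = zlib.compress(original)
    restored = zlib.decompress(compressed)

    print(len(original), "->", len(compressed))   # far fewer bytes to transfer
    print(restored == original)                   # True: lossless, nothing discarded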

IB Question C3 Specimen paper:

Answer:
C.3.7 Evaluate the use of decompression software in the transfer of information.
Compression software is essential for lowering file sizes, and decompression software is needed at the receiving end to restore the information. Smaller files give users faster loading times when they open a website, which makes the site more responsive, and mobile users who open files on their phones experience lower data costs when the files are compressed. The trade-off is that the receiving device must have the appropriate decompression software and spend processing time restoring the data.

   
C.4 The Evolving Web 

C.4.1 Discuss how the web has supported new methods of online interaction such as social networking.

To answer this, we must first look at the stakeholders of the Web: 
● S/holder of the web: Developers, Administrators, Governments
● S/holder of companies: Customers,Vendors, Employees

Online Interaction has been developed through the different iterations of the Web :
● Web 1.0
● Web 2.0
● Web 3.0 ( If it exists )

Web 1.0 ​- Static websites - Close to NO interaction, as users do not have an input

Web 2.0 - User to website model - users have the ability to submit information to a website, so
the level of interaction is greater - however, there is not a full integration of user input onto the
web, as the interaction is often done through a website. Examples include Facebook and
Instagram, which allows users to communicate and share information through these websites.

Web 3.0 - Although the existence of Web 3.0 is questioned, the idea is that interaction is user
to user, with no interference from websites. An example would be if there are many low ratings
for a restaurant on a review site such as hellofood.com, the restaurant/website may delete the
reviews, whereas in web 3.0, users would be able to input “real” ratings without interference.

To summarize:
– Web 1.0: No interaction between computer and users.
– Web 2.0: Interaction between both computers and users.
– Web 3.0: Real ratings, true interactions between users without websites interfering.

C.4.2 Describe how cloud computing is different from a client-server architecture.
Client Server: 
Client-server is a method where information processing is split between a client and a server. In the old days, we had time-share computers (minis, mainframes, etc.) that were accessed by terminals which only handled the display of information but didn't do any processing. Much of what we do with web apps today is not really client-server, for a similar reason.
Client-server describes how applications are modelled.
Cloud Computing:
Cloud computing embodies the ideas that you can abstract the software from hardware,
have applications that can scale up and down based on reasons such as demand, time, etc.
The act of provisioning services in the cloud is automated and requires no user intervention.
Clouds are also on-demand and can be metered meaning that you are only charged for the
resources that you use. It's a consumption model.
Cloud computing describes the environment that applications reside in.

C.4.3 Discuss the effects of the use of cloud computing for specified organizations.
Cloud computing is appealing to organizations due to its lower costs, reduced need for physical space, and built-in disaster recovery measures. Cloud computing allows users in the organization to connect and share files and data with one another.

Public:
Public cloud computing is low cost and is based upon infrastructure and servers located
externally that are already put in place. This makes it easy to transition to public cloud
computing and saves time and money. However due to the fact that the servers and information
are being stored externally, and other users may be using the same cloud computing services,
security is an issue as there is no way to secure the information other than the protocols already
put in place by the cloud provider.
Private:
Private cloud computing has been gaining traction amongst organizations. Private cloud
computing is built on a locally hosted server and computer network, so it requires more cost to
set up. However these organizations will likely already have a server and local network setup so
a private cloud computing network would be a return on the company's investment. The storage
and effectiveness of the computing system is reliant on the hardware that the company owns,
yet a private cloud system is much more secure as the information and data is stored locally.
The company still maintains the advantages of cloud computing, such as disaster recovery.

IB Questions:
Specimen Paper C1:
May 2015 11D
C.4.4 Discuss the management of issues such as copyright and intellectual property on the web.

Intellectual property rights include:


● patents - right granted by gov’t to exclude others from reproducing work
● copyright - right of the creator to all work produced
● trademark - recognizable sign/design/expression that distinguishes products/services
● design rights

Issues concerning intellectual property on the web occur when the owner's rights collide with users' rights and the public's need to access and use resources on the web.

Copyright is a type of intellectual property right, giving you the right to copy and distribute the relevant intellectual property. However, you might have intellectual property that you do not hold copyright (legal rights) to, e.g. an idea for an invention that you have not yet registered or protected.

C.4.5 Describe the interrelationship between privacy, identification and authentication.

Identification is when a user makes themselves known to a network or database. Some examples of identification include creating an account on a site such as Google, or the IP address of your computer on the internet network.

Authentication is the verification of the claimed identity. An example would be checking the
database to find a match between the username and password entered by the user, and if a
match is found they are granted entry to the site.

Privacy is the safekeeping of identification. For example, most websites that require a login are
private, because your password and/or username are not displayed for others to see. However,
some wifi networks are not private, and other computers on the network may be able to see
your IP address.
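As a small sketch of how identification and authentication fit together (the username, password and storage scheme are invented for illustration), a site might store only a salted hash of each password and compare hashes at login, keeping the password itself private:

    import hashlib, hmac, os

    def hash_password(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    # Identification: the stored account record says who the user claims to be.
    salt = os.urandom(16)
    accounts = {"alice": (salt, hash_password("correct horse battery", salt))}

    def authenticate(username, password):
        """Authentication: verify the claimed identity against the stored hash."""
        record = accounts.get(username)
        if record is None:
            return False
        salt, stored_hash = record
        return hmac.compare_digest(stored_hash, hash_password(password, salt))

    print(authenticate("alice", "correct horse battery"))   # True
    print(authenticate("alice", "wrong password"))           # False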

IB Question: Specimen C1 (d)


A publishing company, ABC Publications, based in London has a large IT department. This
department is responsible for:
● providing IT services to the company
● maintaining the company’s website
● creating and maintaining web based learning resources that are sold to schools and
colleges.
The company is finding it difficult to recruit and retain sufficient high quality IT staff to keep these
functions operating at an optimal level. It is investigating transferring at least part of its IT
operations to a cloud computing solution. At the moment it has not been decided how much of
this should be implemented by a private cloud and how much by a public cloud.

(d) Comment on the privacy ​and ​security issues relating to ABC’s use of cloud computing.

Markscheme:
Award marks as follows up to ​[4 marks max].​
Award ​[2 marks]​ for a privacy issue identified and elaborated;
Award ​[2 marks]​ for a security issue identified and elaborated;

Privacy
Sensitive data is accessible to a third party;
If outsourcing occurs, potential exposure of data is increased;

Security
How secure is the data?;
Can it be guaranteed that this data will not be inadvertently passed to another company?;

Short Youtube video (< 4mins) detailing relationship between identification and authentication:
https://www.youtube.com/watch?v=bv6QL0H8kAY

C.4.6 Describe the role of network architecture, protocols and standards in the future development of the web.
All three must develop to support increasing traffic and new developments such as ubiquitous computing, cloud services and other developments that we cannot anticipate now. For example, a new IP system is needed to support more IP addresses (IPv6). New network architecture will be needed to support ever more devices and higher speeds, and this new architecture will require new protocols and standards.

C.4.7 Explain why the web may be creating unregulated monopolies. 
Often, as a website grows larger, it becomes more dominant and more difficult for rivals to challenge. For example, everyone joins Facebook because that's what most people use; it's hard to start a rival because networks with fewer users are less useful. It's hard to challenge Amazon's dominance in e-commerce because sellers use it because it has the most customers, and customers use it because it has the most sellers. Its scale allows it to negotiate very favorable terms with shipping companies and suppliers.
C.4.8 Discuss the effects of a decentralized and democratic web. 
Decentralized refers to the idea that content creation can come from anywhere, which promotes
participation (democracy). Note that in this context, democratic has nothing to do with politics, it
just means everyone can participate.

C.5 Analysing the Web 

The graphic below is linked to a longish but good video about all the topics in this section.

C.5.1 Describe how the web can be represented as a directed graph.
The web can be represented as a directed graph: the nodes (circles) are websites or pages and the arrows are the hyperlinks between them. The web graph is the theoretical graph of the entire web; in practice it is impossible to model completely, because there are so many connections and new connections are made every day.
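A directed graph like this is commonly stored as an adjacency list, where each node (page) maps to the pages its links point to. A toy Python sketch with invented page names:

    # Adjacency-list representation of a tiny web graph (pages invented).
    web_graph = {
        "home.example": ["news.example", "shop.example"],
        "news.example": ["home.example"],
        "shop.example": [],                 # links out to nothing
    }

    # Out-degree = number of links on a page; in-degree = links pointing to it.
    out_degree = {page: len(links) for page, links in web_graph.items()}
    in_degree = {page: 0 for page in web_graph}
    for links in web_graph.values():
        for target in links:
            in_degree[target] += 1

    print(out_degree, in_degree)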

C.5.2 Outline the difference between the web graph and sub-graphs. 
A sub-graph is a portion of the web graph that only shows part of the web. While the web graph represents the connections between all the pages on the web, a sub-graph only shows a portion of those connections, for example the pages and links at ISK.
C.5.3 Describe the main features of the web graph such as bowtie structure, strongly connected core (SCC), diameter.
A bow-tie map of the internet offers a model for understanding the structure of the web. It reveals a subtler structure that may lead to more efficient search-engine crawling techniques. The structure resembles a bow tie, consisting of three major regions (a knot and two bows) and a fourth group of pages that are disconnected.
The left bow consists of origination pages that allow users to reach the core, but which cannot themselves be reached from the core. Origination pages have not attracted enough attention from the rest of the web for any pages in the core to link to them; they usually link to each other and into the core.
The right bow consists of termination pages that can be reached from the core but do not link back to it. Pages in the right bow typically only contain information about themselves.
Disconnected pages are not part of the bow tie. These pages can connect to origination or termination pages but cannot reach, or be reached from, the core directly.
The knot of the bow tie is the strongly connected core (SCC): every page in it can reach every other page in it by following links. The diameter of the web graph is the number of links (clicks) needed to get from one page to another across the graph.

C.5.4 Explain the role of graph theory in determining the connectivity of the web.
Graph theory is essential for learning and determining how the web is connected. For example, using the bow-tie model you can see how web pages are interconnected and how sites link through to the core.

C.5.5 Explain that search engines and web crawling use the web graph to access information.
Search engines and web crawlers use the web graph to determine the importance of information. Search engines currently surface only part of all the information on the internet (these notes estimate 40-70%), so not all of the information is shown. Web crawlers only follow links to pages with relevant data and sufficient quality, and a website's popularity is also determined by the number of links to it and the visits it receives. Using the bow-tie model, we can see how disconnected pages and origination pages will not be reached by web crawlers.
C.5.6 Discuss whether power laws are appropriate to predict the development of the web.
(The rate at which the internet grows) Power laws suggest that the internet is developing
exponentially. However, it seems unlikely that exponential growth can continue indefinitely.

From 2014 Paper 2 MS:

The network diameter of the web grows no more than logarithmically with respect to the network size. Sources suggest there were 26 million pages in 1998, one billion pages by 2000 and 34 billion by 2011.

Therefore the diameter is not growing in a linear relationship with the number of web pages, so the ability of the web surfer to access all sites remains possible; a 10-fold increase in web pages results in only about 2 additional “clicks”.
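The logarithmic growth claim can be illustrated with a quick calculation using the page counts quoted above (a rough sketch, not an exact model of the diameter):

    import math

    # Diameter is claimed to grow roughly with the logarithm of the number of pages.
    for year, pages in [(1998, 26e6), (2000, 1e9), (2011, 34e9)]:
        print(year, f"{pages:.0e} pages", "log10 =", round(math.log10(pages), 1))

    # A 10-fold increase in pages raises log10 by exactly 1, so the diameter only
    # grows by a small, roughly constant number of extra "clicks" between pages.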

C.6 The Intelligent Web 

C.6.1 Define the term semantic web. 
Semantics is related to syntax. Syntax is how you say something and semantics is the meaning
behind what you said. Although syntax can change, the meaning behind the syntax can stay the
same. For example, “I love you” and “I heart you” have different syntax (grammar), but the
same semantics (meaning).
Computers don’t understand the content of web pages; they can only understand the syntax. The semantic web aims to help computers understand the meaning of a website. Web 2.0 is about documents, while the semantic web is about things: pointing things out to the computer and letting it know what they mean. If a computer can understand what is going on, it can help you in many ways.

The ‘semantic web’ is the idea of using emerging technologies such as ‘microformats’ to store
information about all kinds of objects on the web, and how they interconnect. So, reservations,
can be connected to calendars, which can be connected to recommendations, etc…

May 14 Paper 2 #13a

C.6.2 Distinguish between the text-web and the multimedia-web. 
Text-web is a web browser that renders only the text of web pages, and ignores graphic
content. Usually, they render pages faster than graphical web browsers due to lowered
bandwidth demands. Multimedia refers to ​content that uses a combination of different content
forms. Unlike text-web, multimedia web pages use different forms of graphic content.

C.6.3 Describe the aims of the semantic web.  
The aim of the semantic web is to evolve the current web by enabling users to find, share and combine information more easily. The semantic web is a vision of information that can be interpreted by machines, so that they can perform more of the tedious tasks involved in finding, sharing and combining information.

The purpose of the semantic web is to enhance the usability and usefulness of the web and its
resources by creating semantic web services such as:

- Documents marked up with semantic information: machine-understandable information about the content of the document.
- Web-based services to supply information to agents, e.g. finding out if some sites have a history of spamming or poor service.

https://www.youtube.com/watch?v=bEYQrmPwjPA

C.6.4 Distinguish between an ontology and folksonomy. 
A folksonomy is a distributed form of tagging - users decide on tags themselves.

An ontology is a standardized form of tagging - a group or organization agrees on the tags that
are possible - this makes it easier to merge and process data, but is more difficult to achieve.

An ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse. A folksonomy is a system of classification derived from the practice and method of collaboratively creating and translating tags to annotate and categorize content. An ontology is hard to implement on a large scale and isn't always web-based; it is key for the semantic web because of its high expressive power. A folksonomy is created by users and is quick and easy to implement. It is used on a large scale for document collections; most of the time it is web-based, and it is important in Web 2.0.
http://www.cl.cam.ac.uk/~aac10/R207/ontology_vs_folksonomy.pdf

From the 2014 Paper 2 MS:

An ontology is the formal description of the concepts and relationships that exist
within a specified domain of discourse.
“A folksonomy is a type of distributed classification system. It is usually created
by a group of individuals, typically the resource users. Users add tags to online
items, such as images, videos, bookmarks and text. These tags are then shared
and sometimes refined.”

Folksonomies may be imprecise and informal, developing organically through social networking.
C.6.5 Describe how folksonomies and emergent social structures are changing the web.
Folksonomies and emergent social structures are changing the web because they put classification in the hands of users. A folksonomy is created by users and is quick and easy to implement. If we look at Facebook, for example, most of the content is created by users: they upload images, create statuses and much more. The web is changing because the content is determined by the users, as opposed to the owners of the companies.

An example of folksonomies is the tag system in image sites or hashtags in social media. Users
are defining more and more tags and as the volume of users tagging increases, the accuracy of
tags increases such that the web is becoming more and more precise.
Tagging increases user participation in the web while enhancing searching and semantics of the
web.
C.6.6 Explain why there needs to be a balance between expressivity and usability on the semantic web.
With the idea of using meta-tagging as part of the semantic web comes the problem of finding a common ground between usability and expressivity. Even if a computer can understand the information that is on a page, there needs to be a balance so that the expressivity does not compromise the usefulness that comes with proper syntax and language.

Without a balance, the idea of semantics becomes inefficient: if there is a lack of structure to the way data concerning the semantics of a page is formatted, it is difficult for search engines and other applications to understand and make use of this information. Hence, it is necessary to define a language or structure by which the semantics of a webpage or application can be expressed without making it too hard for ordinary authors to use.
C.6.7 Evaluate methods of searching for information on the web.   
Some methods of searching for information on the web are using unique and specific terms, using filters to narrow the search, and using custom searches. By using specific terms that pertain to your search, you reduce the number of results that display. By using filters, you can rank or restrict results by relevance, date added and so on. Custom searches also limit the number of results by using different operators.

https://www.youtube.com/watch?v=HYbRJ7vDRV8 - Advanced and Alternative Internet Search Methods

C.6.8 Distinguish between ambient intelligence and collective intelligence. 
Ambient intelligence​ refers to electronic environments that are responsive to the presence of
people. Ambient intelligence is aimed at supporting people in their daily lives and activities.
Technology that becomes small enough to be considered a part of the environment.
Youtube about ambient intelligence: ​ https://www.youtube.com/watch?v=dy-QKM3i36E
Collective intelligence​ refers to intelligence shared by collaboration such that many people are
involved in consensus decision making. For example a google doc with many collaborators
inputting ideas, wikipedia, or using a website to help people collectively analyze and discuss
climate change.
Youtube about collective intelligence: ​https://www.youtube.com/watch?v=k7-CEDyoibQ

IB 2014 P2
#4f: Identify two characteristics of collective intelligence. (2 marks)

Answer:

C.6.9 Discuss how ambient intelligence can be used to support people.  
Ambient Intelligence (AmI) means bringing technology into our environment and enabling the environment to become sensitive to us. Researchers build upon advances in sensors and sensor networks, pervasive computing and artificial intelligence in order to make the environment more aware of us and able to react accordingly when necessary.

Biometrics is an example that can be used to identify people. In India, there are many people
without identification and can’t use banking services or microcredit loans. Having everyone
tagged with biometrics will allow them to use these services. Another way ambient intelligence
can support people is if culprit x is identified by an intelligent surveillance camera, it will
immediately contact the police and inform them of his location.
Reading: ​How AmI can help with Health Services
Reading: ​Further information about Ambient Technology
Video:​ ​The Future of Ambient Life

IB Question May 2015 #14:


C.6.10 Explain how collective intelligence can be applied to complex issues.  
Collective intelligence is the idea that many people coming together to make key decisions will
benefit the group because people have expertise in different areas. Combining different people's
strengths is more likely to result in a better outcome than having one person make all the
decisions relating to the group. This falls into technology as the services we use become more
and more advanced, and future interactions between humans and machines can be much more
intelligent. One example of this is rather than searching via keywords, search engines can
search for items via the semantics of the user inputted phrase.

Complex issues, such as climate change, can be addressed using methods of collective
intelligence. One example is the monitoring of temperature across the globe. Weather probes
located in airports, television stations, and more, automatically transmit data gathered about the
temperature to a central source such as the National Weather Service (NWS) in the US. The
NWS then uses this data to make predictions, graph weather patterns, and more. This is an
example of technology being used in collective intelligence to solve complex issues, which
would be made much more difficult without the benefit of technology.

Picture: ​http://www.henshall.com/blog/archives/images/mainresources-thumb.jpeg
