
The textbook Master your Academic English is intended for
comprehensive English language instruction of master's students
at the Faculty of Computational Mathematics and Cybernetics
(CMC) of Lomonosov Moscow State University.
The aim of the textbook is to develop skills in reading and
comprehension, as well as in summarizing and abstracting
English-language scientific literature, and the ability to
understand spoken English in the area of professional interest.
The textbook is based on original scientific and popular-science
texts suggested by the departments of the CMC faculty. Thanks to
the help of the faculty's departments, the textbook covers a wide
range of topics relevant to the work of master's students.
The textbook consists of 10 units and a block of supplementary
materials. The units are arranged in order of increasing
difficulty. The tasks within a unit are divided into thematic
blocks (for example, Vocabulary Study and Practice, Reading
Comprehension). The supplementary block includes materials
developed for audio lectures and documentary films on the
faculty's scientific topics.
The audio and video materials are recorded in digital form,
which makes them suitable for distance learning and for work
outside the classroom.
The textbook may also be useful for senior students in
conversation groups and for anyone who wants to improve their
English skills for professional purposes.

The Authors

CONTENTS

UNIT 1
UNIT 2
UNIT 3
UNIT 4
UNIT 5
UNIT 6
UNIT 7
UNIT 8
UNIT 9
UNIT 10
SUPPLEMENTARY MATERIALS

UNIT 1

Pre-reading exercise.
Skim through the text and identify the main topic of the article.

PRINCIPLES OF DYNAMIC PROGRAMMING


This article concerns the use of a method known as dynamic
programming (DP) to solve large classes of optimization problems.
We will focus on discrete optimization problems for which a set or
sequence of decisions must be made to optimize (minimize or
maximize) some function of the decisions. There are numerous
methods to solve discrete optimization problems, many of which
are collectively known as mathematical programming methods.
Our objective here is not to compare these other mathematical
programming methods with dynamic programming. Each has
advantages and disadvantages. However, we will note that the
most prominent of these other methods is linear programming
(LP). As its name suggests, it has limitations associated with its
linearity assumptions whereas many problems are nonlinear.
Nevertheless, linear programming and its variants and extensions
(some that allow nonlinearities) have been used to solve many real
world problems, in part because very early in its development
software tools (based on the simplex method) were made available
to solve linear programming problems. On the other hand, no such
tools have been available for the much more general method of
dynamic programming, largely due to its absolute generality. One
of the objectives is to describe a software tool for solving dynamic
programming problems that is general, practical, and easy to use,
certainly relative to any of the other tools that have appeared from
time to time.
One reason that simplex-based tools for solving linear
programming problems have been successful is that, by the nature
of linear programming, problem specification is relatively easy. A
basic LP problem can be specified essentially as a system or matrix
of equations with a finite set of numerical variables as unknowns.
That is, the input to an LP software tool can be provided in a
tabular form, known as a tableau. This also makes it easy to
formulate LP problems as a spreadsheet. This led spreadsheet
system providers to include an LP solver in their products, as is the
case with Excel.
A software tool for solving dynamic programming problems is
much more difficult to design, in part because the problem
specification task in itself presents difficulties. A DP problem
specification is usually in the form of a complex (nonlinear)
recursive equation, called the dynamic programming functional
equation (DPFE).
We discuss the basic principles underlying the use of dynamic
programming to solve discrete optimization problems. The key
task is to formulate the problem in terms of an equation, the DPFE,
such that the solution of the DPFE is the solution of the given
optimization problem. For numerous dissimilar DP problems, a
significant amount of additional effort is required to obtain their
computational solutions.
Dynamic programming is a method that in general solves
optimization problems that involve making a sequence of
decisions by determining, for each decision, subproblems that can
be solved in like fashion, such that an optimal solution of the
original problem can be found from optimal solutions of
subproblems. This method is based on Bellman’s Principle of
Optimality, which he phrased as follows: “An optimal policy has
the property that whatever the initial state and initial decision are,
the remaining decisions must constitute an optimal policy with
regard to the state resulting from the first decision.” More
succinctly, this principle asserts that “optimal policies have
optimal subpolicies”. The fact that the principle is valid follows
from the observation that, if a policy has a subpolicy that is not
optimal, then replacement of the subpolicy by an optimal
subpolicy would improve the original policy. The principle of
optimality is also known as the “optimal substructure” property in
the literature. Here we are primarily concerned with the
computational solution of problems for which the principle of
optimality holds.
For DP to be computationally efficient (especially relative to
evaluating all possible sequences of decisions), there should be
common subproblems such that subproblems of one are
subproblems of another. In this event, a solution to a subproblem
need only be found once and can be reused as often as necessary;
however, we do not incorporate this requirement as part of our
definition of DP.
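
The effect of common subproblems can be illustrated with a minimal sketch. The small acyclic graph below is a hypothetical example that does not come from the article; f_plain recomputes every subproblem each time it is reached, while f_memo stores each solution once (here with Python's functools.lru_cache) and reuses it.

```python
from functools import lru_cache

# Hypothetical acyclic graph: EDGES[s] maps each next state to the cost of moving there.
EDGES = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2, "d": 6},
    "c": {"d": 3},
    "d": {},  # terminal state: no eligible decisions
}

calls = {"plain": 0, "memo": 0}  # how many times each version evaluates a state


def f_plain(s):
    """Cheapest cost from s to the terminal state, recomputing shared subproblems."""
    calls["plain"] += 1
    if not EDGES[s]:
        return 0
    return min(cost + f_plain(t) for t, cost in EDGES[s].items())


@lru_cache(maxsize=None)
def f_memo(s):
    """Same recursion, but each subproblem is solved once and then reused."""
    calls["memo"] += 1
    if not EDGES[s]:
        return 0
    return min(cost + f_memo(t) for t, cost in EDGES[s].items())


best_plain = f_plain("a")
best_memo = f_memo("a")
print(best_plain, best_memo, calls)
```

Both versions return the same optimal cost; only the number of evaluations differs, and the gap grows quickly as more subproblems are shared.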
We will first elaborate on the nature of sequential decision
processes and on the importance of being able to separate the costs
for each of the individual decisions. This will lead to the
development of a general equation, the dynamic programming
functional equation (DPFE) that formalizes the principle of
optimality. The methodology of dynamic programming requires
deriving a special case of this general DPFE for each specific
optimization problem we wish to solve.
The solution of a DP problem generally involves more than only
computing the value of f(S) for the goal state S∗. We may also wish
to determine the initial optimal decision, the optimal second
decision that should be made in the next-state that results from the
first decision, and so forth; that is, we may wish to determine the
optimal sequence of decisions, also known as the optimal “policy”,
by what is known as a reconstruction process.
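
A reconstruction process can be sketched as follows, assuming a hypothetical shortest-path problem (the graph, the function names solve and reconstruct, and the state labels are illustrative, not taken from the article). While computing f, the solver records the best decision for every state; the policy is then read off by following those decisions from the goal state.

```python
# Hypothetical acyclic graph: EDGES[s] maps each next state to the cost of moving there.
EDGES = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2, "d": 6},
    "c": {"d": 3},
    "d": {},  # terminal state
}


def solve(edges, goal):
    """Return (f, best): optimal values and the optimal decision for each state."""
    f, best = {}, {}

    def value(s):
        if s in f:
            return f[s]
        if not edges[s]:  # terminal state: nothing left to decide
            f[s], best[s] = 0, None
            return 0
        # Evaluate every eligible decision and keep the cheapest one.
        decision, cost = min(
            ((t, c + value(t)) for t, c in edges[s].items()),
            key=lambda pair: pair[1],
        )
        f[s], best[s] = cost, decision
        return cost

    value(goal)
    return f, best


def reconstruct(best, goal):
    """Follow the recorded decisions from the goal state to a terminal state."""
    policy, s = [], goal
    while best[s] is not None:
        policy.append(best[s])
        s = best[s]
    return policy


values, best = solve(EDGES, "a")
policy = reconstruct(best, "a")
print(values["a"], policy)  # 6 ['b', 'c', 'd']
```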
The Elements of Dynamic Programming
The basic form of a dynamic programming functional equation is
f(S) = OPT{R(S, d) ◦ f(T(S, d)) : d ∈ D(S)},
where S is a state in some state space Z, d is a decision chosen
from a decision space D(S), R(S, d) is a reward function (or
decision cost, denoted C(d|S) above), T(S, d) is a next-state
transformation (or transition) function, and ◦ is a binary
operator. We will restrict ourselves to
discrete DP, where the state space and decision space are both
discrete sets. (Some problems with continuous states or decisions
can be handled by discretization procedures.) The elements of a
DPFE have the following characteristics.
State. The state S, in general, incorporates information about the
sequence of decisions made so far. In some cases, the state may be
the complete sequence, but in other cases only partial information
is sufficient; for example, if the set of all states can be partitioned
into equivalence classes, each represented by the last decision. In
some simpler problems, the length of the sequence, also called the
stage at which the next decision is to be made, suffices. The initial
state, which reflects the situation in which no decision has yet been
made, will be called the goal state and denoted S∗.
Decision Space. The decision space D(S) is the set of possible or
“eligible” choices for the next decision d. It is a function of the
state S in which the decision d is to be made. Constraints on
possible next-state transformations from a state S can be imposed
by suitably restricting D(S). If D(S) = ∅, so that there are no eligible
decisions in state S, then S is a terminal state.
Objective Function. The objective function f, a function of S, is the
optimal profit or cost resulting from making a sequence of
decisions when in state S, i.e., after making the sequence of
decisions associated with S. The goal of a DP problem is to find
f(S∗), the value of f for the goal state S∗.
Reward Function. The reward function R, a function of S and d, is
the profit or cost that can be attributed to the next decision d made
in state S. The reward R(S, d) must be separable from the profits or
costs that are attributed to all other decisions. The value of the
objective function for the goal state, f(S∗), is the combination of the
rewards for the complete optimal sequence of decisions starting
from the goal state.
Transformation Function(s). The transformation (or transition)
function T, a function of S and d, specifies the next-state that
results from making a decision d in state S. As we shall later see,
for nonserial DP problems, there may be more than one
transformation function.
Operator. The operator is a binary operation, usually addition,
multiplication, or minimization/maximization, that allows us to
combine the returns of separate decisions. This operation must be
associative if the returns of decisions are to be independent of the
order in which they are made.
Base Condition. Since the DPFE is recursive, base conditions must
be specified to terminate the recursion. Thus, the DPFE applies for
S in the state space Z, but f(S0) = b for S0 in a set of base states
not in Z. Base values b are frequently zero or infinity, the latter to
reflect constraints. For some problems, setting f(S0) = ±∞ is
equivalent to imposing a constraint on decisions so as to disallow
transitions to state S0, or to indicating that S0, which is not in Z,
is a state in which no decision is eligible.
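
The elements listed above can be put together in a short sketch. The function solve_dpfe and its parameter names are hypothetical, chosen only to mirror the notation D(S), R(S, d) and T(S, d); the 0/1 knapsack instance is likewise an assumed example and not a problem from the article. Here OPT is maximization, the operator is addition, and the base value is zero.

```python
import operator
from functools import lru_cache


def solve_dpfe(goal, decisions, reward, transition, is_base, base_value,
               opt=max, combine=operator.add):
    """Evaluate f(goal) for f(S) = OPT{R(S, d) o f(T(S, d)) : d in D(S)}."""

    @lru_cache(maxsize=None)  # each state is solved only once
    def f(state):
        if is_base(state):  # base condition terminates the recursion
            return base_value(state)
        return opt(combine(reward(state, d), f(transition(state, d)))
                   for d in decisions(state))

    return f(goal)


# Hypothetical 0/1 knapsack instance: (weight, value) for each item.
ITEMS = [(2, 3), (3, 4), (4, 5), (5, 6)]
CAPACITY = 5

# State S = (index of the next item, remaining capacity); decision d = 1 takes the item.
best_value = solve_dpfe(
    goal=(0, CAPACITY),
    decisions=lambda s: [0, 1] if ITEMS[s[0]][0] <= s[1] else [0],  # D(S)
    reward=lambda s, d: d * ITEMS[s[0]][1],                         # R(S, d)
    transition=lambda s, d: (s[0] + 1, s[1] - d * ITEMS[s[0]][0]),  # T(S, d)
    is_base=lambda s: s[0] == len(ITEMS),
    base_value=lambda s: 0,
)
print(best_value)  # 7
```

Restricting D(S) to the single decision 0 when an item does not fit is how the capacity constraint is imposed in this sketch; an alternative would be to allow the transition and give the infeasible state a base value of minus infinity.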
To solve a problem using DP, we must define the foregoing
elements to reflect the nature of the problem at hand. We give
several examples below. We note first that some problems require
certain generalizations. For example, some problems require a
second-order DPFE having the form
f(S) = OPT{R(S, d) ◦ f(T1(S, d)) ◦ f(T2(S, d)) : d ∈ D(S)},
where T1 and T2 are both transformation functions to account for
the situation in which more than one next-state can be entered, or
f(S) = OPT{R(S, d) ◦ p1 f(T1(S, d)) ◦ p2 f(T2(S, d)) : d ∈ D(S)},
where T1 and T2 are both transformation functions and p1 and p2
are multiplicative weights. In probabilistic DP problems, these
weights are the probabilities of their respective state transitions,
only one of which can actually occur. In deterministic DP problems,
these weights can serve other purposes, such as “discount factors”
to reflect the time value of money.
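
A probabilistic DPFE of this weighted form can be illustrated with a small betting problem. The problem, the constants P_WIN and TARGET, and the function name f are all assumptions made for illustration. The state is (rounds left, tokens held), the decision is the size of the bet, T1 and T2 are the next states after a win and a loss, the reward R is zero, and the weights p1 and p2 are the win and loss probabilities. Exact fractions are used so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 5)  # p1: probability that a bet is won (assumed value)
P_LOSE = 1 - P_WIN      # p2: probability that a bet is lost
TARGET = 4              # tokens needed at the end to count as success


@lru_cache(maxsize=None)
def f(rounds_left, tokens):
    """Maximum probability of holding at least TARGET tokens when play ends."""
    if rounds_left == 0:  # base condition
        return Fraction(1) if tokens >= TARGET else Fraction(0)
    # f(S) = max over bets d of p1 * f(T1(S, d)) + p2 * f(T2(S, d)), with R(S, d) = 0.
    return max(
        P_WIN * f(rounds_left - 1, tokens + bet)
        + P_LOSE * f(rounds_left - 1, tokens - bet)
        for bet in range(tokens + 1)
    )


win_probability = f(2, 2)
print(win_probability)  # 2/5
```

Only one of the two next states actually occurs after each bet, but both enter the equation, weighted by their probabilities.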

VOCABULARY STUDY AND PRACTICE

Glossary
• discrete optimization - дискретная оптимизация
• sequence of decisions - последовательность решений
• prominent - выдающийся
• can be attributed to - можно отнести к
• linearity assumptions - предположения о линейности
• simplex-based tools - симплекс-инструменты
• more succinctly - более кратко
• valid principle - действующий принцип

1. Translate the following sentences into Russian:
1. Linear programming and its variants and extensions (some
that allow nonlinearities) have been used to solve many
real world problems.
2. The fact that the principle is valid follows from the
observation that, if a policy has a subpolicy that is not
optimal, then replacement of the subpolicy by an optimal
subpolicy would improve the original policy.
3. The methodology of dynamic programming requires
deriving a special case of this general DPFE for each
specific optimization problem we wish to solve.

2. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given.
1. The IT company CEO wanted nobody to know about his new
idea.
KEEP
The IT company CEO wanted ______________________ a secret.
2. This online-programme is doing translation of the text in
various languages.
VARIETY
The text is ____________________________ of languages.
3. There needs to be stricter control over what happens in reality
online shows.
STRICTLY
Reality online shows should be ______________ than they are now.
4. Had he spent more time at the University classes, he wouldn’t
have failed his exams.
GONE
If he __________________ often, he wouldn’t have failed his exams.
5. The laptop was redesigned and, as a result, sales rose fast.
RESULTED
The successful redesigning of the laptop ____________ in sales rise.

6. Peter realized what he had forgotten to do as soon as he
arrived.
SOONER
No ________________ Peter realized what he had forgotten to do.
7. Downloading pirate audio and video is illegal in most
European countries.
AGAINST
It is _______________________ pirate audio and video in most
European countries.
8. Could I borrow your DVD-drive this evening, please?
LEND
Would _________________ this evening, please?
9. Andrew regretted not visiting the presentation.
WISHED
Andrew _________________ the presentation.
10. Many people think that the destruction of the ozone layer has
been exaggerated by the media.
WIDELY
The destruction of the ozone layer is __________________________
exaggerated by the media.

3. Complete the sentences by writing a form of the word in
capitals in each space.
After reading in 1975 about the Altair 8800, Bill Gates contacted
the _________________ (CREATE) of the new microcomputer to
inform them about his work on a BASIC ______ (INTERPRET) for
the platform.
In ______________ (REAL), Gates and Paul Allen did not have
either an Altair or a piece of _________________ (WRITE) code
for it.
They ____________ (MERE) wanted to attract attention, and
__________ (FINAL) they were invited by the MITS president.
The ________ (DEMONSTRATE) of their newly developed Altair
____________ (EMULATE) was a success and resulted in a deal
with MITS.
Gates took a leave of ___________ (ABSENT) from Harvard to
work with Allen, and they called their _____________ (PARTNER)
“Micro-Soft”.

READING COMPREHENSION AND TEXT DISCUSSION

1. Scan the text to find information on:
• Dynamic programming (DP)
• The DPFE
• Linear programming (LP)
• The elements of a DPFE and their characteristics

2. Answer the following questions:
1. What does this article concern?
2. What are the limitations of linear programming (LP)?
3. How would you define the dynamic programming method?
4. State Bellman’s Principle of Optimality. How is it related
to the article?

3. Summarize the text
1. Sum up the main points presented in the text.
2. Write the plan of the text in the form of statements.
3. Develop your plan into a summary.

GRAMMAR PRACTICE

Passive Voice

Model 1
A number of programming tools are commercially available now and
excellent results have been obtained by several of these.
Несколько программ в настоящее время коммерчески
доступны, и при помощи их были получены отличные
результаты.

Translate the following sentences into Russian taking into
account the model:
1. New materials must be carefully screened prior to
acceptance.
2. The problem was dealt with in 1998.
3. The experiment was followed by a number of
demonstrations.
4. What kind of OS was used in the early 1980s: text-based or
graphic-based?
5. People find Windows very easy to use because everything
is presented in graphic images.
6. The user interface has been redesigned with new icons and
a new visual style.
7. Many questions were answered correctly.
8. The new equipment was sent for yesterday.
9. He was relied upon by the majority of the committee.
10. The process of natural selection can be greatly assisted by
the two methods mentioned above.
11. Writability is a measure of how easily a language can be
used to create programs for a chosen problem domain.

Model 2
The results were affected by many factors.
На результаты влияли многие факторы.

Translate the following sentences into Russian taking into
account the model:
1. This phenomenon has been dealt with by Prof. S. Podvalny.
2. There is hardly any aspect of human life that would not be
affected by the changes that computers have brought
about.
3. The sequence of reasonable operations has been performed
by the computer.
4. Many books on computer organization and architecture
had been translated from Russian into English by the end
of the last year.
5. The instructions are recorded in the order in which they are
to be carried out.
6. The instruction format is the way in which the different
digits are allocated to represent specific functions.

UNIT 2

Pre-reading exercise.
Skim through the text and identify the main ideas of the paper.

SCHEDULING
Sequencing and scheduling is a form of decision-making that
plays a crucial role in manufacturing and service industries. In the
current competitive environment effective sequencing and
scheduling have become a necessity for survival in the
marketplace. Companies have to meet shipping dates that have been
committed to customers, as failure to do so may result in a
significant loss of goodwill. They also have to schedule activities in
such a way as to use the resources available in an efficient manner.
Scheduling began to be taken seriously in manufacturing at the
beginning of the twentieth century with the work of Henry Gantt and other
pioneers. However, it took many years for the first scheduling
publications to appear in the industrial engineering and operations
research literature. Some of the first publications appeared in
Naval Research Logistics Quarterly in the early fifties and
contained results by W.E. Smith, S.M. Johnson and J.R. Jackson.
During the sixties, a significant amount of work was done on
dynamic programming and integer programming formulations of
scheduling problems. After Richard Karp’s famous paper on
complexity theory, the research in the seventies focused mainly on
the complexity hierarchy of scheduling problems. In the eighties
several different directions were pursued in academia and
industry with an increasing amount of attention paid to stochastic
scheduling problems. Also, as personal computers started to
permeate manufacturing facilities, scheduling systems were being
developed for the generation of usable schedules in practice. This
system design and development was, and is, being done by
computer scientists, operations researchers and industrial
engineers.

Scheduling is a decision-making process that is used on a regular
basis in many manufacturing and service industries. It deals with
the allocation of resources to tasks over given time periods and its
goal is to optimize one or more objectives.
The resources and tasks in an organization can take many different
forms. The resources may be machines in a workshop, runways at
an airport, crews at a construction site, processing units in a
computing environment, and so on. The tasks may be operations
in a production process, take-offs and landings at an airport,
stages in a construction project, executions of computer programs,
and so on. Each task may have a certain priority level, an earliest
possible starting time and a due date. The objectives can also take
many different forms. One objective may be the minimization of
the completion time of the last task and another may be the
minimization of the number of tasks completed after their
respective due dates.
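
The two objectives just mentioned can be computed with a minimal sketch, assuming a single resource that processes tasks one after another without idle time. The Task class, the function names and the numbers are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    processing_time: int
    due_date: int


def completion_times(sequence):
    """Completion time of each task when the tasks run back to back in this order."""
    clock, result = 0, {}
    for task in sequence:
        clock += task.processing_time
        result[task.name] = clock
    return result


def makespan(sequence):
    """Completion time of the last task."""
    return max(completion_times(sequence).values())


def late_count(sequence):
    """Number of tasks completed after their due dates."""
    done = completion_times(sequence)
    return sum(1 for task in sequence if done[task.name] > task.due_date)


a, b, c = Task("A", 3, 9), Task("B", 2, 2), Task("C", 1, 3)
order_1 = [a, b, c]
order_2 = [b, c, a]
print(makespan(order_1), late_count(order_1))  # 6 2
print(makespan(order_2), late_count(order_2))  # 6 0
```

On a single resource the order of the tasks does not change the completion time of the last task, but it does change how many tasks are late, so the two objectives can call for different decisions.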
Scheduling, as a decision-making process, plays an important role
in most manufacturing and production systems as well as in most
information processing environments. It is also important in
transportation and distribution settings and in other types of
service industries. The following examples illustrate the role of
scheduling in a number of real world environments.
Gate Assignments at an Airport
Consider an airline terminal at a major airport. There are dozens of
gates and hundreds of planes arriving and departing each day.
The gates are not all identical and neither are the planes. Some of
the gates are in locations with a lot of space where large planes
(widebodies) can be accommodated easily. Other gates are in
locations where it is difficult to bring in the planes; certain planes
may actually have to be towed to their gates.
Planes arrive and depart according to a certain schedule.
However, the schedule is subject to a certain amount of
randomness, which may be weather related or caused by
unforeseen events at other airports. During the time that a plane
occupies a gate the arriving passengers have to be deplaned, the
plane has to be serviced and the departing passengers have to be
boarded. The scheduled departure time can be viewed as a due
date and the airline’s performance is measured accordingly.
However, if it is known in advance that the plane cannot land at
the next airport because of anticipated congestion at its scheduled
arrival time, then the plane does not take off (such a policy is
followed to conserve fuel). If a plane is not allowed to take off,
operating policies usually prescribe that passengers remain in the
terminal rather than on the plane. If boarding is postponed, a
plane may remain at a gate for an extended period of time, thus
preventing other planes from using that gate.
The scheduler has to assign planes to gates in such a way that the
assignment is physically feasible while optimizing a number of
objectives. This implies that the scheduler has to assign planes to
suitable gates that are available at the respective arrival times. The
objectives include minimization of work for airline personnel and
minimization of airplane delays.
In this scenario the gates are the resources and the handling and
servicing of the planes are the tasks. The arrival of a plane at a gate
represents the starting time of a task and the departure represents
its completion time.
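
The feasibility side of this example (a suitable gate must be free when the plane arrives) can be sketched as a simple greedy rule. This is an illustrative assumption, not the method used by any airline or described in the text: planes are handled in order of arrival, each one takes the first suitable free gate, and no objective is optimized. All names and numbers are hypothetical.

```python
def assign_gates(planes, gates):
    """Greedy feasibility sketch.

    planes: list of (name, arrival, departure, is_widebody)
    gates:  dict mapping gate name to True if it can take widebodies
    Returns a dict plane -> gate, or None when no suitable gate is free on arrival.
    """
    free_at = {gate: 0 for gate in gates}  # time at which each gate becomes free
    # Try narrow gates first so that wide gates stay available for widebodies.
    gate_order = sorted(gates.items(), key=lambda item: item[1])
    assignment = {}
    for name, arrival, departure, is_widebody in sorted(planes, key=lambda p: p[1]):
        assignment[name] = None
        for gate, accepts_widebody in gate_order:
            if free_at[gate] <= arrival and (accepts_widebody or not is_widebody):
                assignment[name] = gate
                free_at[gate] = departure  # the gate is occupied until departure
                break
    return assignment


gates = {"G1": True, "G2": False}
planes = [
    ("P1", 0, 5, False),
    ("P2", 1, 4, True),
    ("P3", 4, 8, False),
    ("P4", 5, 9, True),
]
assignment = assign_gates(planes, gates)
print(assignment)
```

In this instance the widebody P4 finds no free suitable gate on arrival, because the only wide gate was given to a narrow plane; a scheduler that also minimizes delays would have to look ahead rather than assign greedily.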
Scheduling Tasks in a Central Processing Unit (CPU)
One of the functions of a multi-tasking computer operating system
is to schedule the time that the CPU devotes to the different
programs that have to be executed. The exact processing times are
usually not known in advance. However, the distribution of these
random processing times may be known in advance, including
their means and their variances. In addition, each task usually has
a certain priority level (the operating system typically allows
operators and users to specify the priority level or weight of each
task). In such a case, the objective is to minimize the expected sum
of the weighted completion times of all tasks.
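
For the simpler deterministic case, in which processing times are known and tasks are not interrupted, the sum of weighted completion times on a single processor is minimized by ordering the tasks by decreasing ratio of weight to processing time (Smith's rule, also called WSPT). The sketch below is not taken from the text and does not handle random processing times; the Job class and the data are hypothetical. A brute-force search over all orders is included only to check the rule on this small instance.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Job:
    name: str
    processing_time: int
    weight: int


def weighted_completion(sequence):
    """Sum of weight * completion time for jobs run back to back in this order."""
    clock, total = 0, 0
    for job in sequence:
        clock += job.processing_time
        total += job.weight * clock
    return total


def wspt_order(jobs):
    """Order jobs by decreasing weight / processing time (Smith's rule)."""
    return sorted(jobs, key=lambda j: j.weight / j.processing_time, reverse=True)


jobs = [Job("A", 3, 1), Job("B", 1, 2), Job("C", 2, 2)]
rule_order = wspt_order(jobs)
rule_cost = weighted_completion(rule_order)
# Exhaustive check, feasible only because the instance is tiny.
brute_cost = min(weighted_completion(p) for p in permutations(jobs))
print([j.name for j in rule_order], rule_cost, brute_cost)  # ['B', 'C', 'A'] 14 14
```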
To avoid the situation where relatively short tasks remain in the
system for a long time waiting for much longer tasks that have a
higher priority, the operating system “slices” each task into little
pieces. The operating system then rotates these slices on the CPU
so that in any given time interval, the CPU spends some amount of
time on each task. This way, if by chance the processing time of
one of the tasks is very short, the task will be able to leave the
system relatively quickly.
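
The rotation of slices can be sketched as a round-robin simulation. This is a simplified illustration under stated assumptions: all slices have the same length (the quantum), priorities and switching overhead are ignored, and the task names and times are hypothetical.

```python
from collections import deque


def round_robin(processing_times, quantum):
    """Completion time of each task when the CPU rotates fixed-length slices."""
    queue = deque(processing_times.items())
    clock, finished = 0, {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))  # preempted: back of the queue
        else:
            finished[name] = clock
    return finished


def run_to_completion(processing_times):
    """Completion times when each task runs without interruption, in the given order."""
    clock, finished = 0, {}
    for name, time_needed in processing_times.items():
        clock += time_needed
        finished[name] = clock
    return finished


tasks = {"long": 8, "short": 1, "medium": 3}
sliced = round_robin(tasks, quantum=2)
unsliced = run_to_completion(tasks)
print(sliced)    # {'short': 3, 'medium': 8, 'long': 12}
print(unsliced)  # {'long': 8, 'short': 9, 'medium': 12}
```

With slicing, the short task leaves the system at time 3 instead of waiting until time 9 behind the long task.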
An interruption of the processing of a task is often referred to as a
preemption. It is clear that the optimal policy in such an
environment makes heavy use of preemptions. It may not be
immediately clear what impact schedules may have on objectives
of interest. Does it make sense to invest time and effort searching
for a good schedule rather than just choosing a schedule at
random? In practice, it often turns out that the choice of schedule
does have a significant impact on the system’s performance and
that it does make sense to spend some time and effort searching
for a suitable schedule.
Scheduling can be difficult from a technical as well as from an
implementation point of view. The difficulties encountered
on the technical side are similar to the difficulties encountered in
other forms of combinatorial optimization and stochastic
modeling. The difficulties on the implementation side are of a
completely different kind. They may depend on the accuracy of
the model used for the analysis of the actual scheduling problem
and on the reliability of the input data that are needed.
The Scheduling Function in an Enterprise
The scheduling function in a production system or service
organization must interact with many other functions. These
interactions are system-dependent and may differ substantially
from one situation to another. They often take place within an
enterprise-wide information system.
A modern factory or service organization often has an elaborate
information system in place that includes a central computer and
database. Local area networks of personal computers,
workstations and data entry terminals, which are connected to this
central computer, may be used either to retrieve data from the
database or to enter new data. The software controlling such an
elaborate information system is typically referred to as an
Enterprise Resource Planning (ERP) system. A number of software
companies specialize in the development of such systems,
including SAP, J.D. Edwards, and PeopleSoft. Such an ERP system
plays the role of an information highway that traverses the
enterprise with, at all organizational levels, links to decision
support systems.
Scheduling is often done interactively via a decision support
system that is installed on a personal computer or workstation
linked to the ERP system. Terminals at key locations connected to
the ERP system can give departments throughout the enterprise
access to all current scheduling information. These departments, in
turn, can provide the scheduling system with up-to-date
information concerning the statuses of jobs and machines.
There are, of course, still environments where the communication
between the scheduling function and other decision making
entities occurs in meetings or through memos.
In a manufacturing environment, the scheduling function has to
interact with other decision making functions. One popular system
that is widely used is the Material Requirements Planning (MRP)
system. After a schedule has been generated it is necessary that all
raw materials and resources are available at the specified times.
The ready dates of all jobs have to be determined jointly by the
production planning/scheduling system and the MRP system.
MRP systems are normally fairly elaborate. Each job has a Bill of
Materials (BOM) itemizing the parts required for production. The
MRP system keeps track of the inventory of each part.
Furthermore, it determines the timing of the purchases of each one
of the materials. In doing so, it uses techniques such as lot sizing
and lot scheduling that are similar to those used in scheduling
systems. There are many commercial MRP software packages
available and, as a result, there are many manufacturing facilities
with MRP systems. In the cases where the facility does not have a
scheduling system, the MRP system may be used for production
planning purposes. However, in complex settings it is not easy for
an MRP system to do the detailed scheduling satisfactorily.

VOCABULARY STUDY AND PRACTICE

Glossary
• a crucial role - важнейшая роль
• to pursue - преследовать
• to permeate - проникать
• the allocation of resources - распределение ресурсов
• a preemption - вытеснение
• elaborate - тщательно продуманный, детально
разработанный

1. Translate the following sentences into Russian:


1. In the current competitive environment effective
sequencing and scheduling have become a necessity for
survival in the market-place.
2. The distribution of these random processing times may be
known in advance, including their means and their
variances. In addition, each task usually has a certain
priority level (the operating system typically allows
operators and users to specify the priority level or weight
of each task).
3. Scheduling is often done interactively via a decision
support system that is installed on a personal computer or
workstation linked to the ERP system. Terminals at key
locations connected to the ERP system can give
departments throughout the enterprise access to all current
scheduling information.

2. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given.
1. The use of mobile phones is absolutely forbidden inside the
laboratory.
MUST
Under ___________________ used inside the laboratory.

2. The machine really needs cleaning.
TIME
It’s high _________________ cleaned.
3. Our boss doesn’t allow us to eat at our desks.
LINE
Our boss ______________________ at our desks.
4. My University mate took an active part in crowdfunding for
this new project.
ENGAGED
My University mate ________________ in crowdfunding for this
new project.
5. The parking is only to be used by staff.
EXCLUSIVE
The parking is for __________________ staff.
6. Nobody can do anything about the present situation.
NOTHING
There ____________________ about the present situation.
7. It was the first time that James had tried mental activities to
help him concentrate during the exam.
BEFORE
Never _________________ mental activities to help him
concentrate during the exam.
8. My friend always gets almost angry when tourists do not show
respect to local traditions.
IS
The thing __________________ when tourists do not show respect
to local traditions.
9. Larry has never been at all interested in learning to program in
any assembly language.
SLIGHTEST
Larry has never _________________________ learning to program
in any assembly language.

10. The speaker performed brilliantly and received a standing
ovation.
BRILLIANT
The speaker gave _____________________ and received a standing
ovation.

3. Complete the sentences by writing a form of the word in
capitals in each space.
IBM is an American company which _____________ (ORIGIN) in
1911 as the Computing-Tabulating-Recording Company (CTR) and
was _________________ (NAME) "International Business
Machines" in 1924. It produces both _______________ (SOFT)
and ________________ (HARD), and offers _____________
(CONSULT) services in various areas, from mainframe computers
to nanotechnology.
IBM is also a major research _________________ (ORGANISE).
_________ (INVENT) by the company include the ATM, the floppy
disk, the hard disk drive, the relational __________ (DATA), the
SQL programming language, the UPC __________ (BAR), dynamic
random-access memory (DRAM), and many others which made it a
highly __________ (PROFIT) company.

READING COMPREHENSION AND TEXT DISCUSSION

1. Scan the text to find information on these aspects:
• Sequencing and scheduling
• Decision-making
• Enterprise Resource Planning (ERP) system
• Material Requirements Planning (MRP) system
2. Answer the following questions:
1. When was scheduling taken seriously?
2. What are the main objectives of a decision-making process?
3. Why can scheduling be difficult from a technical as well as
from an implementation point of view?
4. What other decision-making functions does the scheduling
function in a manufacturing environment have to interact
with?

3. Summarize the text


1. Sum up the main points presented in the text.
2. Write the plan of the text in the form of statements.
3. Develop your plan into a summary.

GRAMMAR PRACTICE

“Be to”: modal verb

Model
We were to meet at 5.
Мы должны были встретиться в 5.
This medicine is to be taken after meals.
Лекарство нужно принимать после еды.

Translate the following sentences into Russian taking into
account the model:
1. When are you to send the letter?
2. I was not to see them again.
3. What are we to do next week?
4. There was no one to meet me at the station, as I was to
have arrived two days before.
5. The work was considered to be important and is under
way to be completed.

6. According to the time-table you are to begin your classes at
8 o'clock.
7. The main task of the article was to show the result of
research work.
8. The general purpose of this unit (block) is to perform
different arithmetic operations.
9. The participants of the scientific conference are to arrive
tomorrow.
10. A more sophisticated approach - partition memory
management - is to have more than one application
program in memory at a time, sharing memory space and
CPU time.
11. Programs in such languages do not state exactly how a
result is to be computed but rather describe the form of the
result.

UNIT 3

Pre-reading exercise.
Skim through the text and identify the main ideas of the paper.

SPIRAL WAVES IN NONLOCAL EQUATIONS

We present a numerical study of rotating spiral waves in a partial
integro-differential equation defined on a circular domain. This
type of equation has been previously studied as a model for large
scale pattern formation in the cortex and involves spatially
nonlocal interactions through a convolution.
The main results involve numerical continuation of spiral waves
that are stationary in a rotating reference frame as different
parameters are varied. We find that parameters controlling the
strength of the nonlinear drive, the strength of local inhibitory
feedback, and the steepness and threshold of the nonlinearity must
all lie within particular intervals for stable spiral waves to exist.
Beyond the ends of these intervals, either the whole domain
becomes active or the whole domain becomes quiescent. An
unexpected result is that the boundaries seem to play a much more
significant role in determining stability and rotation speed of
spirals, as compared with reaction-diffusion systems having only
local interactions.
Rotating spiral waves are ubiquitous spatiotemporal patterns that
appear in two-dimensional active media. They have been observed
in a variety of experimental chemical and biological systems and
in mathematical models of reaction-diffusion type. In cardiac
systems, spiral waves are thought to be associated with
pathological conditions such as fibrillation. There has been much
interest in observing spiral waves on intact hearts and in
simulating such waves with a view to perturbing the system so
that the spiral waves are destroyed. They are the simplest form of
wave propagation in excitable media that is self-maintained; i.e.,
once initiated they will persist indefinitely.

Most previous work on mathematical models of spiral waves has
involved reaction-diffusion equations, where spatial interactions
are local. Several authors have directly studied the stability of
spiral waves by considering a circular domain and moving into a
coordinate frame that rotates with the spiral. Spiral waves then
become solutions of a time-independent two-dimensional PDE,
and their stability can be found by examining the eigenvalues of a
large matrix that results from a discretization of the PDE. This
approach also allows one to numerically continue spiral waves as
one or more parameters of the system are varied and thus
investigate whole families of spiral waves, some members of
which may be unstable.
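The eigenvalue test described in this paragraph can be illustrated on a much simpler problem. The sketch below is not the spiral-wave model of the text: it assumes a toy one-dimensional equation u_t = D*u_xx + a*u with u = 0 at both ends, discretizes it by finite differences, and calls the zero state stable when every eigenvalue of the resulting matrix has a negative real part. The names `discretized_operator` and `is_stable` and all parameter values are illustrative assumptions.

```python
import numpy as np

def discretized_operator(a, n=100, length=np.pi, diffusion=1.0):
    """Finite-difference matrix for D*u_xx + a*u on n interior grid points,
    with u = 0 imposed at both ends of the interval [0, length]."""
    h = length / (n + 1)
    main = np.full(n, -2.0)
    off = np.ones(n - 1)
    laplacian = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2
    return diffusion * laplacian + a * np.eye(n)

def is_stable(a):
    """Stable if the eigenvalue with the largest real part lies left of zero."""
    eigenvalues = np.linalg.eigvals(discretized_operator(a))
    return bool(np.max(eigenvalues.real) < 0.0)

for a in (0.5, 1.5):
    print(a, is_stable(a))
```

For this toy problem the continuous operator loses stability at a = 1, so the two values of `a` above fall on opposite sides of the boundary. Repeating the test while a parameter is varied step by step is the basic idea behind the numerical continuation mentioned in the text.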
There is another class of pattern-forming systems that has been
studied recently as a model of large scale pattern formation in the
cortex for which spatial interactions are nonlocal, as a result of a
spatial convolution. These systems have mostly been studied on
one-dimensional domains and are known to support stationary
"bumps" of activity, multibump solutions, travelling wave fronts,
and travelling pulses.
Some study of these models in two-dimensional domains has
recently been done by Laing and Troy. These authors studied
circularly symmetric solutions and their stability with respect to
perturbations that broke that symmetry, concentrating on spatially
localized solutions ("bumps"). Folias and Bressloff have also
studied two-dimensional neural field equations, looking
specifically at circular solutions that are centered at the maximum
of a spatially localized input current. They study the stability of
such pulses and find saddle-node and Hopf bifurcations, the latter
leading to localized "breathers." Kistler, Seitz, and van Hemmen
studied similar equations, analytically treating plane waves and
circular rings. They also performed simulations of large (10^6
neurons) networks of spiking neurons and observed spiral waves,
among other patterns.
However, as far as we know, spiral waves have not been studied
in nonlocal continuum models of this form. Spiral waves have
been seen previously in two-dimensional networks of model
spiking neurons with nonlocal coupling but have not been
analyzed in any detail.
Spiral waves have been observed in numerical simulations of
reaction-diffusion systems with nonlocal terms. For example,
Middya and Luss consider the effects of adding a term to the
"reaction" part of such an equation that is proportional to the
difference between the average value over the domain of one
variable and a reference value. This averaging introduces a
nonlocal coupling.
In this paper we have numerically demonstrated the existence of
spiral waves in a spatially nonlocal integro-differential equation
of the form commonly used to model two-dimensional neural
fields. For this model, we investigated the dependence of spiral
waves on the parameters of that system. We have determined the
ranges of those parameters over which stable spirals exist. This
information could be used in two different ways. If spiral waves
are viewed as undesirable (in the same way as in cardiac systems),
we can use this information to determine how sensitive the system
is to a change in a particular parameter, by knowing how wide the
parameter range is in which stable spirals exist. We can also
determine the necessary change in a particular parameter to make
the system no longer capable of supporting a stable spiral wave.
Conversely, if spiral waves are desirable, we could use this
information to steer the system toward a region in parameter
space far from any bifurcations, thus making it robust to
perturbations in those parameters. Of course, for this to be useful
in a particular system we would need to know the relationships
between the manipulable parameters in the system and the
parameters of the model we are studying. However, this study
does provide the first understanding of the dependence of spiral
waves in these systems on generic parameters such as strength
and timescale of the recovery variable.
This model could also be used to study the general problem of the
destruction of spiral waves by applying appropriate transient
stimuli instead of changing bulk parameters of the system.

We have also found two supercritical Hopf bifurcations of spirals,
one that occurs for a single-armed spiral as ρ is decreased, and one
for a two-armed spiral as R is increased. These seem very similar
to Hopf bifurcations found in reaction-diffusion systems. We have
not investigated these in any depth, but it would be interesting to
see what happens to the quasi-periodic patterns created in these
bifurcations as parameters are changed. We also observed
multiarmed spirals which are unstable for moderate domain sizes,
in agreement with results observed in reaction-diffusion systems.
Most of the results we have seen are not dissimilar to those
observed in reaction-diffusion systems with local interactions.
However, the issue of the influence of domain size seems to be
unresolved. We saw that the complex conjugate pair of
eigenvalues of the Jacobian with the most positive real part were
well away from the imaginary axis, in contrast to the situation for
reaction-diffusion systems. This may indicate that even for this
domain size, the boundaries are affecting the stability of the spiral
wave. We showed the rotation speed as a function of domain size
and observed no saturation even when the radius was
approximately three times the wavelength, indicating that even for
such a large domain, the effects of the boundary were being felt.
Unfortunately, due to numerical limitations we could not reliably
trace the eigenvalues of the Jacobian as the domain size was
increased.
Regarding possible extensions of the work presented here, one
interesting feature to include in the type of model we have
presented would be the effects of propagation delays. Although
including these in models in one spatial dimension does not seem
to change the stability of travelling waves, this does not seem to
carry over to two spatial dimensions. Another feature to include is
the presence of inhibitory neurons, which also have spatially
extended coupling.
There exist many results relating to spiral waves in reaction-
diffusion systems; these include their response to temporally
periodic forcing or anisotropies of the domain, their motion near
boundaries, the effects of differently shaped domains, and the
interaction between two or more spirals. A large open issue is to
determine whether or not generic behavior of spiral waves in
reaction-diffusion systems also occurs in systems with nonlocal
spatial interactions. Our work suggests that in at least one aspect it
does not.

VOCABULARY STUDY AND PRACTICE

Glossary
• rotating spiral waves - вращающиеся спиральные волны
• a convolution - свёртка
• quiescent - неподвижный
• eigenvalues - собственные значения
• nonlocal continuum models - модели нелокального
континуума

1. Match the terms from the left-hand column with the
corresponding definition or phrase on the right

1) equation       a) any place or point of entering or beginning
2) threshold      b) existing or being everywhere, especially at
                     the same time; omnipresent
3) ubiquitous     c) a point at which a function of two variables
                     has partial derivatives equal to zero but at
                     which the function has neither a maximum nor
                     a minimum value
4) saddle-node    d) an expression or a proposition, often
                     algebraic, asserting the equality of two
                     quantities

2. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given.

1. My brother eventually managed to persuade our mother that he
was telling the truth.
SUCCEEDED
My brother eventually ____________________ our mother that he
was telling the truth.
2. He does not intend to stay in his current job very much longer.
NO
He has ____________________ in his current job very much longer.
3. John's teeth really need curing.
TIME
It's ___________________ teeth cured.
4. This new student wants everyone to realize that he is an
intellectual.
SEEN
This new student wants _____________________ an intellectual.
5. It is not usual that you find a person with such a good memory
as Winston has.
COME
Seldom _________________________ a person with such a good
memory as Winston has.
6. The people in the photograph made in the computer class look
like experienced programmers.
IF
The people in the photograph made in the computer class look
__________________ experience of programming.
7. Everyone believes that the company moved away from the city.
HAVE
The company is _________________ from the city.

8. What are the chances of Mike becoming the leader of the
project?
LIKELY
How ____________________ will become the leader of the project?
9. The two fellow software analysts were sitting on their own
looking at a computer screen.
FRONT
The two fellow software analysts were sitting by
___________________ a computer screen.
10. Jake will probably be very successful.
CHANCES
The ______________________ be very successful.

3. Complete the sentences by writing a form of the word in
capitals in each space
The Internet of things (IoT) is the __________ (NET) of physical
devices, vehicles, ______________ (BUILD) and other items
____________ (BED) with electronics, software, sensors,
actuators, and network __________________ (CONNECT) that enable
these objects to collect and exchange data.
The IoT allows objects to be controlled __________ (REMOTE)
across existing infrastructure, creating opportunities for more
direct _____________ (INTEGRATE) of the physical world into
computer-based systems, and resulting in _____________ (PROVE)
efficiency.
Each thing in such a network is uniquely ____________ (IDENTIFY)
and can refer to a wide _______________ (VARY) of devices
including automobiles with built-in sensors, electric clams in
coastal waters, DNA ________ (ANALYSE) devices for environmental
monitoring, etc.

READING COMPREHENSION AND TEXT DISCUSSION

1. Scan the text to find information on three aspects:

• Rotating spiral waves
• Two-dimensional PDE
• Hopf bifurcations

2. Answer the following questions:


1. What is the unexpected result of the study?
2. What has most previous work on mathematical models of
spiral waves involved?
3. How are rotating spiral waves organized?

3. Summarize the text


1. Sum up the main points presented in the text.
2. Write the plan of the text in the form of statements.
3. Develop your plan into a summary.

GRAMMAR PRACTICE

Gerund and Gerundial Constructions

Model
We look forward to much attention being given to this question.
Мы рассчитываем на то, что этому вопросу будет уделено
большое внимание.

Translate the following sentences into Russian taking into
account the model:
1. Sometimes a system error makes the computer stop
working altogether and you will have to restart the
computer.
2. A sensible way of avoiding system errors is to write code to
check that peripherals are present before any data is sent
to them.
3. Time-sharing is a method of meeting the demands of
multi-access systems.
4. Word-processing is used to automate some secretarial
tasks such as printing letters.
5. A high-level language is a simple and convenient means of
describing the information structure.
6. Is there any possibility of developing an artificial
intelligence system?
7. Before switching off, make sure you have saved your work.
8. Business languages are characterized by facilities for
producing elaborate reports, precise ways of describing
and storing decimal numbers and character data, and the
ability to specify arithmetic operation.
9. After performing calculations a computer displays a result.
10. This paper presents a novel design procedure for Class E
amplifiers without using waveform equations.
11. Netscape included a facility called Secure Sockets Layer
(SSL) for carrying out encrypted commercial transactions
online.

UNIT 4

Pre-reading exercise.
• What do you know about brain source imaging? How does
it work and why is it useful?
• Find information about the following: inverse problem,
forward problem, sparse and tensor-based approaches, lead-field
matrix (mixing matrix), convex optimization algorithms, source-
imaging algorithms.
• Skim through the text and identify the main ideas of the
article.

BRAIN SOURCE IMAGING


A number of application areas such as biomedical engineering
require solving an underdetermined linear inverse problem. In
such a case, it is necessary to make assumptions on the sources to
restore identifiability. This problem is encountered in brain-source
imaging when identifying the source signals from noisy
electroencephalographic or magnetoencephalographic
measurements. This inverse problem has been widely studied
during recent decades, giving rise to an impressive number of
methods using different priors. Nevertheless, a thorough study of
the latter, including especially sparse and tensor-based
approaches, is still missing. In this article, we propose 1) a
taxonomy of the algorithms based on methodological considerations;
2) a discussion of the identifiability and convergence properties,
advantages, drawbacks, and application domains of various
techniques; and 3) an illustration of the performance of selected
methods on identical data sets.
In brain-source imaging, one is confronted with the analysis of a
linear static system—the head volume conductor—that relates the
electromagnetic activity originating from a number of sources
located inside the brain to the surface of the head, where it can be
measured with an array of electric or magnetic sensors using
electroencephalography (EEG) or magnetoencephalography
(MEG). The source signals and locations contain valuable
information about the activity of the brain, which is crucial for the
diagnosis and management of diseases such as epilepsy or for the
understanding of the brain functions in neuroscience research.
However, without surgical intervention, the source signals cannot
be directly observed and have to be identified from the noisy
mixture of signals originating from all over the brain, which is
recorded by the EEG/MEG sensors at the surface of the head. This
is known as the inverse problem. On the other hand, deriving the
EEG/MEG signals for a known source configuration is referred to
as the forward problem.
Thanks to refined models of head geometry and advanced
mathematical tools that allow the computation of the so-called
lead-field matrix (referred to as the mixing matrix in other
domains), solving the forward problem has become
straightforward, whereas finding a solution to the inverse problem
is still a challenging task.
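The contrast between the two problems can be sketched numerically. The example below is only an illustration under stated assumptions: the lead-field matrix `G` is a random stand-in rather than one computed from a head model, the measurements are noise-free, and the sizes (8 sensors, 50 dipoles) are arbitrary. The forward problem is a single matrix product, whereas the inverse problem has infinitely many exact solutions; the pseudo-inverse merely selects the one of minimum Euclidean norm, which is generally not the true source vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_dipoles = 8, 50                      # far fewer sensors than sources
G = rng.standard_normal((n_sensors, n_dipoles))   # hypothetical lead-field matrix

# Forward problem: known source amplitudes -> sensor measurements.
s_true = np.zeros(n_dipoles)
s_true[[5, 20]] = [1.0, -2.0]                     # two active dipoles
x = G @ s_true

# Inverse problem: the system is underdetermined, so a prior is needed.
# Here the (assumed) prior is "smallest Euclidean norm".
s_min_norm = np.linalg.pinv(G) @ x

fits_data = np.allclose(G @ s_min_norm, x)
recovers_sources = np.allclose(s_min_norm, s_true)
print(fits_data, recovers_sources)
```

The minimum-norm estimate explains the measurements exactly and still differs from the true sources, which is why additional assumptions (priors) on the sources are required.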
The methods that are currently available for solving the inverse
problem of the brain can be broadly classified into two types of
approaches that are based on different source models: the
equivalent current dipole and the distributed source. Each
equivalent current dipole describes the activity within a spatially
extended brain region, leading to a small number of active sources
with free orientations and positions anywhere within the brain.
The lead-field matrix is, hence, not known but parameterized by
the source positions and orientations. Equivalent current dipole
methods also include the well-known multiple signal classification
(MUSIC) algorithm, and beam-forming techniques. These methods
are based on a fixed source space with a large number of dipoles,
from which a small number of equivalent current dipoles are
identified. On the other hand, the distributed source approaches
aim at identifying spatially extended source regions, which are
characterized by a high number of dipoles (largely exceeding the
number of sensors) with fixed locations. As the positions of the
source dipoles are fixed, the lead-field matrix can be computed
and, thus, is known.

We concentrate on the solution of the inverse problem for the case
where the lead-field matrix is known and focus on the distributed
source model. This inverse problem is one of the main topics in
biomedical engineering and has been widely studied in the signal
processing community, giving rise to an impressive number of
methods. Our objective is to provide an overview of the currently
available source-imaging methods that takes into account the
recent advances in the field.

IDENTIFIABILITY
For methods that solve the inverse problem by exploiting sparsity,
the uniqueness of the solution depends on the conditioning of the
lead-field matrix. More particularly, sufficient conditions that are
based on the mutual or cumulative coherence of the lead-field
matrix are available in the literature and can easily be verified for a
given lead-field matrix. However, in brain-source imaging, these
conditions are generally not fulfilled because the lead-field vectors
of adjacent grid dipoles are often highly correlated, making the
lead-field matrix ill conditioned.
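As a rough illustration of the coherence check mentioned above, the mutual coherence of a matrix is the largest absolute normalized inner product between two different columns. One commonly cited sufficient condition states that a solution with k non-zero entries is unique if k < (1 + 1/mu) / 2, where mu is the mutual coherence. The function name `mutual_coherence` and the two small test matrices below are illustrative assumptions, not lead-field matrices from the article.

```python
import numpy as np

def mutual_coherence(G):
    """Largest absolute normalized inner product between two different columns."""
    columns = G / np.linalg.norm(G, axis=0, keepdims=True)
    gram = np.abs(columns.T @ columns)
    np.fill_diagonal(gram, 0.0)       # ignore each column's product with itself
    return float(gram.max())

orthogonal_columns = np.eye(3)
nearly_parallel_columns = np.array([[1.0, 1.0],
                                    [0.0, 0.01]])

print(mutual_coherence(orthogonal_columns))
print(mutual_coherence(nearly_parallel_columns))
```

Highly correlated columns, like the lead-field vectors of adjacent grid dipoles, push the coherence close to 1, so the bound above no longer guarantees uniqueness for any useful sparsity level.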
A strong motivation for the use of tensor-based methods is the fact
that the CP decomposition is essentially unique under mild
conditions on the tensor rank. These conditions are generally
verified in brain-source imaging because the rank of the noiseless
tensor corresponds to the number of distributed sources, which is
usually small (fewer than ten) compared to the tensor dimensions.
The limitations of the tensor-based approach thus arise from the
approximations that are made when imposing a certain structure
on the data and not from the identifiability conditions. Note,
however, that these identifiability conditions only concern the CP
decomposition, which separates the distributed sources.
Additional conditions are indeed required for the uniqueness of
the results of the subsequent source localization step that is
applied for each distributed source separately. Nevertheless, the
separation of the distributed sources facilitates their identification
and may alleviate the identifiability conditions for the source
localization step. Finally, for subspace-based approaches, the
number of sources that can be identified depends on the
dimensions of the signal and noise subspaces of the cumulant
matrix.

CONVERGENCE
The source-imaging methods exploiting sparsity may be
implemented using two types of convex optimization algorithms:
interior point methods such as second-order cone programming
(SOCP) and proximal splitting methods such as the fast iterative
shrinkage-thresholding algorithm (FISTA) or the alternating
direction method of multipliers (ADMM). Both types of solvers are
known to converge to the global solution of a convex optimization
problem. However, the interior point methods are
computationally too expensive to solve large-scale problems as
encountered in brain source imaging, and the simpler and more
efficient proximal splitting methods are to be preferred in this
case. To solve the optimization problem associated with the CP
decomposition, a wide panel of algorithms, including alternating
methods such as alternating least squares, derivative-based
techniques such as gradient descent (GD) or Levenberg-
Marquardt, and direct techniques have been used. Even if the local
convergence properties hold for most of these methods, there is no
guarantee that they will converge to the global minimum because
the cost function generally features a large number of local
minima. However, in practical situations, it has been observed
that good results can be achieved, e.g., by combining a direct
method such as the direct algorithm for canonical polyadic
decomposition (DIAG) with a derivative-based technique like GD.
Similar to the tensor decomposition
algorithm, there is no guarantee of global convergence for the EM
algorithm, which is popular in empirical Bayesian approaches, or
in the alternating optimization method employed by the
Champagne algorithm.
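A proximal splitting method of the kind named above can be sketched in a few lines. The example assumes one particular sparsity-promoting cost, 0.5*||G s - x||^2 + lam*||s||_1, which the text does not specify; the function names `soft_threshold` and `fista` and the iteration count are likewise illustrative. Each iteration takes a gradient step on the quadratic term, applies soft thresholding (the proximal operator of the l1 norm), and adds the momentum step that distinguishes FISTA from plain iterative shrinkage.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(G, x, lam, n_iter=500):
    """Minimize 0.5 * ||G s - x||^2 + lam * ||s||_1 (assumed cost function)."""
    step = 1.0 / np.linalg.norm(G, 2) ** 2    # 1 / Lipschitz constant of the gradient
    s = np.zeros(G.shape[1])
    y, t = s.copy(), 1.0
    for _ in range(n_iter):
        grad = G.T @ (G @ y - x)              # gradient of the quadratic term at y
        s_next = soft_threshold(y - step * grad, step * lam)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = s_next + ((t - 1.0) / t_next) * (s_next - s)   # momentum step
        s, t = s_next, t_next
    return s
```

Because this cost is convex, the iterates approach a global minimizer, in line with the convergence statement in the text; no such guarantee carries over to the non-convex tensor decomposition problem.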

ADVANTAGES AND DRAWBACKS
Since strengths and weaknesses are often specific to a given source
imaging method and cannot be generalized to other techniques of
the same family of approaches, we subsequently focus on several
representative algorithms. On the one hand, the regularized least-
squares techniques sLORETA, MCE, and MxNE are simple and
computationally efficient, but the source estimates obtained by
these algorithms tend to be very focal (for MCE and MxNE) or
blurred (for sLORETA). On the other hand, VB-SCCD, STWV-DA,
and 4-ExSo-MUSIC, which allow for the identification of spatially
extended sources, feature a higher computational complexity.
Furthermore, STWV-DA and 4-ExSo-MUSIC have additional
requirements such as knowledge of the number of sources or the
signal subspace dimension, a certain structure of the data (for
STWV-DA), or a sufficiently high number of time samples (for 4-
ExSo-MUSIC). While all of these methods require adjusting certain
parameters, which are tedious to tune in practice, the main
advantage of the Champagne algorithm consists in the fact that
there is no parameter to adjust. However, this method also has a
high computational complexity and leads to very sparse source
estimates.

APPLICATION DOMAINS
Brain-source imaging finds application both in the clinical domain
and in cognitive neuroscience. The most frequent clinical
application is in epilepsy, where the objective consists in
delineating the regions from where interictal spikes or ictal
discharges arise. For this purpose, brain-source-imaging methods
such as VB-SCCD, STWV-DA, or 4-ExSo-MUSIC, which can
identify both the spatial extent and the shape of a small number of
distributed sources, are well suited. In cognitive neuroscience,
multiple brain structures are often simultaneously activated,
particularly when the subjects are asked to perform complex
cognitive tasks during the experimental sessions. The source-
imaging methods employed for the analysis of these data should
thus be able to deal with multiple correlated sources. This is, e.g.,
the case for VB-SCCD and other regularized least-squares
techniques, but not for STWV-DA or 4-ExSo-MUSIC. On the other
hand, during simple tasks such as those related to perceptual
processes, the analysis of EEG signals of ERPs can also aim at
identifying focal sources, in which case methods such as MCE,
MxNE, or Champagne are preferred. Finally, there is a rising
interest in the analysis of source connectivity. While sLORETA,
MCE, MxNE, or Champagne can be employed for this purpose,
VB-SCCD, STWV-DA, and 4-ExSo-MUSIC, which enforce identical
signals for dipoles belonging to the same patch, would
theoretically be less suited, especially for the analysis of local
cortical networks. Nevertheless, at a macroscopic level, these
algorithms may be employed to identify cortical networks that
characterize the connectivity between distinct brain regions.

RESULTS
In this section, we give the reader an idea of the kind of source
imaging results that can be obtained with different types of
algorithms by illustrating and comparing the performance of
representative algorithms on simulated data for an example of
epileptic EEG activity. To do this, we consider two or three quasi-
simultaneous active patches and model epileptiform spike-like
signals that spread from one brain region to another. The sources
are localized using the sLORETA, MCE, MxNE, VB-SCCD, STWV-
DA, Champagne, and 4-ExSo-MUSIC algorithms. To
quantitatively evaluate the performance of the different methods,
we use a measure called the distance of localization error (DLE),
which characterizes the difference between the original and the
estimated source configuration. The DLE is averaged over 50
realizations of EEG data with different epileptiform signals and
background activity. We first consider two scenarios with two
patches of medium distance composed of a patch in the inferior
frontal region (InfFr) combined once with a patch in the inferior
parietal region (InfPa) and once with a patch in the middle
posterior temporal gyrus (MidTe). The patches are all located on
the lateral aspect of the left hemisphere, but the patch MidTe is
partly located in a sulcus, leading to weaker surface signals than
the patches InfFr and InfPa, which are mostly on a gyral convexity.
This has an immediate influence on the performance of all source-
imaging algorithms except for Champagne. For the first scenario,
the algorithms exhibit high dipole amplitudes for dipoles
belonging to each of the true patches. For the second scenario, on
the other hand, the weak patch is difficult to make out on the
estimated source distribution of sLORETA, slightly more visible
on the MCE and MxNE solutions, and completely missing for 4-
ExSo-MUSIC. VB-SCCD and STWV-DA both recover the second
patch, but with a smaller amplitude in the case of VB-SCCD and a
smaller size for STWV-DA.
According to the DLE, MCE leads to the best results among the
focal source-imaging algorithms while STWV-DA outperforms the
other distributed source localization methods.
In the third scenario, we add a patch at the temporo-occipital
junction (OccTe) to the InfFr and MidTe patches, which further
complicates the correct recovery of the active grid dipoles. The
best result in terms of the DLE is achieved by VB-SCCD. Even
though this method mostly identifies the brain regions that
correspond to the active patches, it does not allow the patches
MidTe and OccTe to be distinguished into two separate active
sources. STWV-DA, on the other hand, identifies all three patches,
even though the extent of the estimated active source region that
can be associated to the patch MidTe is too small. However, this
method also identifies several spurious source regions of small
size located between the patches MidTe and InfFr. 4-ExSo-MUSIC
and Champagne recover only one of the two patches located in the
temporal lobe. Similar to VB-SCCD, sLORETA does not allow the
patches MidTe and OccTe to be distinguished. This distinction is
performed better by MCE and especially by MxNE, which
displays three foci of brain activity.

CONCLUSIONS AND PERSPECTIVES


We classified existing source-imaging algorithms based on
methodological considerations. Furthermore, we discussed the
different techniques, both under theoretical and practical
considerations, by addressing questions of identifiability and
convergence, advantages and drawbacks of certain algorithms as
well as application domains, and by illustrating the performance
of representative source-imaging algorithms through a simulation
study.
While uniqueness conditions are available for both tensor- and
sparsity-based techniques, in the context of brain-source imaging,
these conditions are generally only fulfilled for tensor-based
approaches, which exploit the concept of distributed sources,
whereas the bad conditioning of the lead-field matrix practically
prohibits the unique identification of a sparse source distribution.
On the other hand, while convex optimization algorithms used for
sparse approaches usually converge to the global minimum, such
algorithms are not available for tensor decompositions, which
suffer from multiple local minima, making it almost impossible to
find the global optimum. In practice, despite the limitations
concerning identifiability and convergence, both tensor-based and
sparse approaches often yield good source reconstruction. Since
the various source localization algorithms have different
advantages, drawbacks, and requirements, source-imaging
solutions may vary depending on the application. As discussed
previously, for each problem, an appropriate source-imaging
technique has to be chosen depending on the desired properties of
the solution, the characteristics of the algorithm, and the validity
of the hypotheses employed by the method. Furthermore, it is
advisable to compare the results of different methods for
confirmation of the identified source region(s).
To summarize the findings of the simulation study, we can say
that sLORETA, Champagne, MCE, and MxNE recover the source
positions well, though not their spatial extent as they are
conceived for focal sources, while ExSo-MUSIC, STWV-DA, and
VB-SCCD also allow for an accurate estimate of the source size.
We noticed that most of the methods, except for ExSo-MUSIC and
STWV-DA, require prewhitening of the data or a good estimate of
the noise covariance matrix (in the case of Champagne) to yield
accurate results. On the one hand, this can be explained by the
hypothesis of spatially white Gaussian noise made by some
approaches, while on the other hand, the prewhitening also leads
to a decorrelation of the lead-field vectors and, therefore, to a
better conditioning of the lead-field matrix, which consequently
facilitates the correct identification of active grid dipoles.
Furthermore, the source-imaging algorithms generally have some
difficulties in identifying mesial sources located close to the
midline as well as multiple quasi-simultaneously active sources.
On the whole, for the situations addressed in our simulation
study, STWV-DA seems to be the most promising algorithm for
distributed source localization, both in terms of robustness and
source reconstruction quality. However, more detailed studies are
required to confirm the observed performances of the tested
algorithms before drawing further conclusions. Based on these
results, we can identify several promising directions for future
research. As the VB-SCCD algorithm demonstrates, imposing
sparsity in a suitable spatial transform domain may work better
than applying sparsity constraints directly to the signal matrix.
This type of approach should, thus, be further developed. Another
track for future research consists in further exploring different
combinations of a priori information, e.g., by merging the
successful strategies of different recently established source-
imaging approaches, such as tensor or subspace-based approaches
and sparsity. In a similar way, one could integrate the steps of
two-step procedures such as STWV-DA into a single step to
process all of the available information and constraints at the same
time.

VOCABULARY STUDY AND PRACTICE

Glossary
• identifiability - распознаваемость, отождествляемость
• sparse - разреженный, немногочисленный
• taxonomy - биосистематика, классификация и
систематизация
• domain - область определения, домен
• adjacent - смежный, сопредельный
• coherence - совокупность предельных точек множества,
целостность
• cumulant - семиинвариант, кумулянт

1. Fill in the gaps using the following terms:


identifiability, tedious, coherent, drawbacks, patch, subsequently,
confronted, adjacent, alleviate, objective, cumulant, convex.

1. Each of these techniques has advantages and certain
____________.
2. We are _______________ with unprecedented challenges
and complex problems.
3. In statistics, ___________ is a property which a model must
satisfy in order for precise inference to be possible.
4. One way to _____________ the problem is to install ACID.
5. Two vertices are _____________ if they are connected by an
edge (two such vertices are often called neighbours).
6. A jet algorithm was initially proposed by Georgi and
____________ further developed into the class of
"JET algorithms".
7. The ____________ interaction of light with atoms can cause
quantum interference between the excitation amplitudes of
different optical transitions.

2. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given.
1. The reliability of social networks as a source of information is
often difficult to determine.

HOW
It is sometimes difficult to know __________________ as a source
of information.
2. Peter gave up his job at the support desk because there were too
few challenges.
ENOUGH
Peter gave up his job at the support desk because it
____________________ him.
3. At least fifty virtual reality glasses may still be in existence.
THOUGHT
At least fifty virtual reality glasses _________________ exist.
4. The manager says that our department has recovered from the
accident quite fast.
SAID
Our department __________________ from the accident quite fast.
5. ‘Having several IT certificates changed my position in the
company dramatically’, said Lesley.
WHICH
Lesley said that it _____________________ his position in the
company dramatically.
6. There are many things to discuss before accepting a contract
offer.
TAKEN
There are many things that should _______________ consideration
before accepting a contract offer.
7. They were able to leave the press show unobserved because the
room was full of the media crowd.
OWING
They were able to leave the press show unobserved
_____________________ the room was full of the media crowd.
8. I don’t think you’ll find it hard to write proper code for this
operating system.
DIFFICULTY

I don’t think ______________________ writing proper code for
this operating system.
9. The community’s popularity increased thanks to word-of-
mouth recommendations.
LED
Word-of-mouth recommendations __________________ popularity
of the community.
10. It would be great if taking exams were voluntary for college
students.
HAVE
It would be great if college students ____________________ exams.

3. Complete the sentences by writing a form of the word in
capitals in each space.

The ___________ (DEVELOP) of what was to become the iPhone
began in 2004 with the team of 1000 ____________ (EMPLOY) to
work on the highly ___________ (CONFIDENT) "Project Purple".
Apple created the device during a secretive ______________
(LABOUR) with AT&T Mobility which lasted for over thirty
months. Later AT&T even paid Apple a fraction of its monthly
service revenue in __________ (CHANGE) for four years of
exclusive US sales, until 2011. Jobs ______________ (VEIL) the
iPhone to the public on January 9, 2007. It went on sale six
months later, while hundreds of ______________ (CUSTOM) lined
up outside the stores nationwide. The ______________ (PASSION)
reaction to the launch of the iPhone resulted in sections of the
media __________ (DUB) it the 'Jesus phone'. ___________
(FOLLOW) this successful release in the US, the first generation
iPhone was made available in the UK, France, and Germany in
November, 2007.

READING COMPREHENSION AND TEXT DISCUSSION

1. Answer the following questions:


1. What are the two types of approaches available for solving
the inverse problem of the brain?
2. Where does brain-source imaging find application?
3. Name the brain-source imaging algorithms and methods
mentioned in the text. Give some of their advantages and
drawbacks.
4. What are the findings of the simulation study?
5. What do you think the prospects of brain-source
imaging are?

2. Summarize the text


1. Summarize the main points of the text.
2. Write the plan in the form of statements.
3. Use your plan and key terms to summarize the article

GRAMMAR PRACTICE

Participle and Participle Constructions


Participle I Attribute
Model
The input unit consists of some devices using different means.
Устройство ввода состоит из нескольких приборов,
использующих различные средства.

Translate the following sentences into Russian taking into
account the model:
1. The operator pressing the key makes the adding machine
operate.
2. The density of memorizing elements in MOS memory is
very high.
3. There are few researchers discussing the stability analysis
and synthesis for the discrete LPV T-S fuzzy models.
4. Most main memory is made of integrated circuits
containing random access memory.
5. The 3rd generation computers beginning in the mid 1960s
introduced processing made of integrated circuits.
6. Computing is a concept embracing not only arithmetic but
also literacy.
7. Hence, programming is a technique requiring attention to
details without losing sight of the overall plan.
8. Although HTTP is most often used to retrieve HTML-
formatted Web documents using other protocols, such as
ftp news or Gopher.
9. The control unit interpreting instructions is one of the
important parts of any computer system.

Participle II: Attribute


Model
“PC” means personal computer, but it actually stands for the kind of
personal computer IBM invented.
Аббревиатура “PC” означает персональный компьютер, но в
действительности она относится к любому персональному
компьютеру, изобретенному IBM.
Translate the following sentences into Russian taking into
account the model:
1. “Software” is like a set of directions - turn left, go two
miles, turn right at the light - written in a language a
computer can understand.

2. The development of this equipment was possible largely
because of continued improvements.
3. New equipment used there far exceeded accepted
standards of the day.
4. This method, previously mentioned as affording good
results, is widely used.
5. The effects described in this article are worth considering.
6. During the war, British and American code-breakers built a
specialized electronic computer called Colossus, which
read encoded transmissions from tape and broke the code
of the supposedly impregnable German Enigma machine.
7. … the abacus developed in ancient China could still beat
the best mechanical calculators as late as the 1940s.
8. An output unit is a device through which results stored in
the computer memory are made available to the outside
world.
9. … the components in a von Neumann machine reside
physically on a printed circuit board called the motherboard.
10. The other letters attached to CD refer to various properties
of the disk, such as formatting, and whether or not the
information on it can be changed.

UNIT 5

Pre-reading exercise.
• What types of systems do you know? How do you think a
system and control are interrelated?
• Find information about the following: suppressed output
variables, notion of state, notion of linearity, the “principle of
superposition”.
• Skim through the text and identify the main ideas of the
paper.

SYSTEM AND CONTROL BASICS

The Concept of System


System is one of those primitive concepts (like set or mapping)
whose understanding might best be left to intuition rather than an
exact definition. Nonetheless, we can provide three representative
definitions found in the literature:
An aggregation or assemblage of things so combined by nature or
man as to form an integral or complex whole.
A regularly interacting or interdependent group of items forming
a unified whole.
A combination of components that act together to perform a
function not possible with any of the individual parts.
There are two salient features in these definitions. First, a system
consists of interacting “components”, and second, a system is
associated with a “function” it is presumably intended to perform.
It is also worth pointing out that a system should not always be
associated with physical objects and natural laws. For example,
system theory has provided very convenient frameworks for
describing economic mechanisms or modelling of human behavior
and population dynamics.
The Input–Output Modelling Process
As scientists and engineers, we are primarily concerned with the
quantitative analysis of systems, and the development of
techniques for design, control, and the explicit measurement of
system performance based on well-defined criteria. Therefore, the
purely qualitative definitions given above are inadequate. Instead,
we seek a model of an actual system. Intuitively, we may think of
a model as a device that simply duplicates the behavior of the
system itself. To be more precise than that, we need to develop
some mathematical means for describing this behavior.
To carry out the modelling process, we start out by defining a set
of measurable variables associated with a given system. Examples
are particle positions and velocities, or voltages and currents
in an electrical circuit, all of which are real numbers. By measuring
these variables over a period of time we may then collect data.
Next, we select a subset of these variables and assume that we
have the ability to vary them over time. This defines a set of time
functions that we shall call the input variables. Then, we select
another set of variables that we assume we can directly measure
while varying the input; these define the output variables. Note
that there may well be some variables that
have not been associated with either the input or the output; these
are sometimes referred to as suppressed output variables.
To simplify notation, we represent the input variables through a
column vector u(t) and the output variables through another
column vector y(t); for short, we refer to them as the input and
output respectively. To complete a model, it is reasonable to
postulate that there exists some mathematical relationship
between input and output. This is the simplest possible modelling
process.
Strictly speaking, a system is “something real” (e.g., an amplifier, a
car, a factory, a human body), whereas a model is an “abstraction”
(a set of mathematical equations). Often, the model only
approximates the true behavior of the system. However, once we
are convinced we have obtained a “good” model, this distinction is
usually dropped, and the terms system and model are used
interchangeably. This is what we will be doing in the sequel. But,
before doing so, it is worth making one final remark. For any
given system, it is always possible (in principle) to obtain a model;
the converse is not true, since mathematical equations do not
always yield real solutions.
It is important to emphasize the flexibility built into the modelling
process, since no unique way to select input and output variables
is imposed. Thus, it is the modeller’s task to identify these
variables depending on a particular point of view or on the
constraints imposed upon us by a particular application.
Static and Dynamic Systems
We define a static system to be one where the output is
independent of past values of the input. A dynamic system is one
where the output generally depends on past values of the input.
Thus, determining the output of a static system requires no
“memory” of the input history, which is not the case for a dynamic
system.
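
The distinction can be illustrated with a minimal Python sketch for sampled (discrete) inputs. The two example systems, a constant gain and a running sum, and all function names are illustrative assumptions and do not come from the text.

```python
def static_system(inputs, gain=2.0):
    """Static system: each output depends only on the current input value."""
    return [gain * u for u in inputs]


def dynamic_system(inputs):
    """Dynamic system: each output is the sum of all inputs seen so far."""
    outputs = []
    total = 0.0  # the "memory" of the input history
    for u in inputs:
        total += u
        outputs.append(total)
    return outputs


u = [1.0, 0.0, 3.0]
print(static_system(u))   # [2.0, 0.0, 6.0]
print(dynamic_system(u))  # [1.0, 1.0, 4.0]
```

In the dynamic case the second output is 1.0 although the second input is 0.0, because the system still remembers the first input.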
Time-Varying and Time-Invariant Dynamic Systems
In considering the various types of input–output relationships in
systems, it is reasonable to pose the following question: Is the
output always the same when the same input is applied?
The answer cannot always be “yes”, and gives rise to another
important way for classifying systems. More precisely, a system is
said to be time-invariant if it has the following property: if an
input u(t) results in an output y(t), then the input u(t − τ) results in
the output y(t − τ), for any τ. In other words, if the input function is
applied to the system τ units of time later than t, the resulting
output function is identical to that obtained at t, translated by τ.
When a replica of the function u(t) is applied as input at time t =
τ > 0, the resulting output is an exact replica of the function y(t).
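
The shift property can be checked numerically for sampled signals. The sketch below is only an illustration under stated assumptions: signals are finite lists, a delay is modelled by padding with zeros (the system is assumed to be at rest before the first sample), and the two example systems are invented for this purpose. A check on one input can reveal time variance, but it cannot prove time invariance in general.

```python
def shift(signal, k):
    """Delay a finite signal by k samples, padding with zeros at the start."""
    return [0.0] * k + signal[: len(signal) - k]


def running_sum(u):
    """Example of a time-invariant system: y[n] = u[0] + ... + u[n]."""
    out, total = [], 0.0
    for x in u:
        total += x
        out.append(total)
    return out


def time_scaled(u):
    """Example of a time-varying system: y[n] = n * u[n]."""
    return [n * x for n, x in enumerate(u)]


def is_shift_consistent(system, u, k):
    """True if delaying the input by k samples delays the output by k samples."""
    return system(shift(u, k)) == shift(system(u), k)


u = [1.0, 2.0, 3.0, 0.0, 0.0]
print(is_shift_consistent(running_sum, u, 2))  # True
print(is_shift_consistent(time_scaled, u, 2))  # False
```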
The Concept of State
Roughly speaking, the state of a system at a time instant t should
describe its behaviour at that instant in some measurable way. In
system theory, the term state has a much more precise meaning
and constitutes the cornerstone of the modelling process and many
analytical techniques.
The state space of a system is the set of all possible values that the
state may take.
Linear and Nonlinear Systems
The notion of linearity is fundamental in science and engineering,
and is closely associated with the “principle of superposition”,
which is described by the following property: If a stimulus S1
produces a response R1, and a stimulus S2 produces a response
R2, then the superposition of the two stimuli, (S1 + S2), will
produce the superposition of the two responses, (R1 + R2). In its
simplest form, i.e., S1 = S2, superposition amounts to
proportionality; for example, doubling the input to a system
results in doubling the output.
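
The principle of superposition can be tested on concrete stimuli. The following Python sketch is a hedged illustration: the two systems (a gain of 3 and a squaring device) and the helper names are assumptions chosen for the example, and the check covers only the given pair of stimuli.

```python
import math


def linear_system(u):
    """Multiplies every input sample by 3: a linear system."""
    return [3.0 * x for x in u]


def nonlinear_system(u):
    """Squares every input sample: a nonlinear system."""
    return [x * x for x in u]


def add(a, b):
    """Superposition (sample-by-sample sum) of two signals of equal length."""
    return [x + y for x, y in zip(a, b)]


def satisfies_superposition(system, s1, s2):
    """Compare the response to (S1 + S2) with the sum of responses (R1 + R2)."""
    combined = system(add(s1, s2))
    summed = add(system(s1), system(s2))
    return all(math.isclose(c, s) for c, s in zip(combined, summed))


s1 = [1.0, 2.0]
s2 = [0.5, -1.0]
print(satisfies_superposition(linear_system, s1, s2))     # True
print(satisfies_superposition(nonlinear_system, s1, s2))  # False
```

With S1 = S2 the same check reduces to proportionality: doubling the input of `linear_system` doubles its output, while doubling the input of `nonlinear_system` multiplies its output by four.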
The class of linear systems is a small subset of all possible systems.
Fortunately, it covers many cases of interest, or provides adequate
approximations we can use for practical purposes. Much of system
and control theory is in fact based on the analysis of linear
systems, and has led to plenty of success stories, from designing
complex electromechanical structures to describing the behavior of
economies and population growth.
It is tempting to claim that all dynamic systems can be modelled
through differential equations, no matter how nonlinear and
complex they might be. Although it is certainly true that these
models are immensely useful in system and control theory, one
can see that for the discrete event systems we need to consider,
differential equations simply do not capture the essential dynamic
behavior, or they lead to design and control solutions that are not
sufficiently accurate for many practical purposes.
State Spaces
Thus far, the values of the state variables we have considered are
real numbers. Real variables are of course very convenient when it
comes to deriving models based on differential equations.
However, there is nothing sacred about state variables always
taking real number values, as opposed to integer values or just
values from a given discrete set. In fact, one should always
keep in mind that the modelling process allows for substantial
flexibility in defining the state, input, and output of a system
depending on the application or problem of interest.
The Concept of Control
Our discussion thus far has been limited to the basic issue: What
happens to the system output for a given input? Systems,
however, do not normally exist in a vacuum. In fact, we saw that
the very definition of a system contains the idea of performing a
particular function. In order for such a function to be performed,
the system needs to be controlled by selecting the right input so as
to achieve some “desired behavior”.
The Concept of Feedback
The idea of feedback is intuitively simple: Use any available
information about the system behaviour in order to continuously
adjust the control input. Feedback is used in our everyday life in a
multitude of forms. In a conversation, we speak when the other
party is silent, and switch to listening when the other party is
beginning to talk. In driving, we monitor the car’s position and
speed in order to continuously make adjustments through our
control of the steering wheel and accelerator and brake pedals. In
heating a house, we use a thermostat which senses the actual
temperature in order to turn a furnace on or off.
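
The thermostat example can be sketched as a small simulation. Everything numerical here is an assumption made for illustration: the heat-balance rule, the parameter names (`heat_rate`, `loss_coeff`, `band`) and their default values are not taken from the text.

```python
def simulate_thermostat(setpoint, start_temp, outside_temp, steps,
                        heat_rate=2.0, loss_coeff=0.1, band=0.5):
    """On/off feedback control of room temperature (hypothetical model)."""
    temp = start_temp
    furnace_on = False
    history = []
    for _ in range(steps):
        # Feedback: the sensed temperature decides the control input.
        if temp < setpoint - band:
            furnace_on = True
        elif temp > setpoint + band:
            furnace_on = False
        heating = heat_rate if furnace_on else 0.0
        # Assumed heat balance: furnace gain minus loss to the outside.
        temp += heating - loss_coeff * (temp - outside_temp)
        history.append((temp, furnace_on))
    return history


history = simulate_thermostat(setpoint=20.0, start_temp=15.0,
                              outside_temp=5.0, steps=100)
temps = [t for t, _ in history]
print(min(temps[50:]), max(temps[50:]))  # stays in a band around the setpoint
```

Changing `outside_temp` (a disturbance) or `loss_coeff` (a model parameter) moderately still leaves the temperature near the setpoint, which is the reduced sensitivity that feedback is meant to provide.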
There are certain obvious advantages to the use of feedback.
Briefly, without getting into details, we can point out the
following:
- The desired behavior of the system becomes less sensitive to
unexpected disturbances.
- The desired behavior of the system becomes less sensitive to
possible errors in the parameter values assumed in the model.
On the other hand, feedback comes at some cost:
- Sensors or other potentially complex equipment may be required
to monitor the output and provide the necessary information to
the controller.
- Feedback requires some effort (measured in terms of the gain of
the system), which may adversely affect the overall system
performance.
- Feedback could actually create some problems of undesirable
system behavior, while correcting others.
As in many other areas of engineering, using feedback entails
several tradeoffs. Control theory is to a large extent devoted to the
study of the tradeoffs outlined above.

Discrete-Time Systems
We have assumed thus far that time is a continuous variable. This
certainly corresponds to our basic notion of time in the physical
world. Moreover, it allows us to develop models based on
differential equations, which are particularly attractive from a
mathematical standpoint.
Suppose that we were to define the input and output variables of a
system at discrete time instants only. As a result, we obtain what is
called a discrete-time system, in contrast to the continuous-time
systems considered up to this point. There are several good
reasons why we might want to adopt such an approach.
1. Any digital computer we might use as a component of a system
operates in discrete-time fashion, that is, it is equipped with an
internal discrete-time clock. Whatever variables the computer
recognizes or controls are only evaluated at those time instants
corresponding to “clock ticks”.
2. Many differential equations of interest in our continuous-time
models can only be solved numerically through the use of a
computer. Such computer-generated solutions are actually
discrete-time versions of continuous-time functions. Therefore,
starting out with discrete-time models is reasonable if the ultimate
solutions are going to be in this form anyway.
3. Digital control techniques, which are based on discrete-time
models, often provide considerable flexibility, speed, and low cost.
This is because of advances in digital hardware and computer
technology.
4. Some systems are inherently discrete-time, such as economic
models based on data that is recorded only at regular discrete
intervals (e.g., quarterly).
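
Reason 2 above can be illustrated with a short Python sketch. As an assumed example (not from the text), the differential equation dx/dt = −a·x is solved numerically with the forward Euler method, which replaces the continuous-time model by the discrete-time model x[k+1] = x[k] + h·(−a·x[k]) with step size h.

```python
import math


def euler_decay(a, x0, step, n_steps):
    """Discrete-time (forward Euler) approximation of dx/dt = -a * x."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + step * (-a * xs[-1]))
    return xs


# Simulate one unit of time with 100 clock ticks.
xs = euler_decay(a=1.0, x0=1.0, step=0.01, n_steps=100)
exact = math.exp(-1.0)  # continuous-time solution x(1) = exp(-a * 1)
print(abs(xs[-1] - exact) < 0.01)  # True
```

The computer never sees the continuous function x(t); it only produces the sequence of values at the sampling instants, which is why starting from a discrete-time model is often reasonable.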

VOCABULARY STUDY AND PRACTICE

Glossary
• aggregation - группирование, сведение в блок
• column vector - вектор-столбец
• inadequate - неподходящий, не отвечающий
требованиям, неэффективный
• particle - материальная точка, частица
• converse - обратная теорема
• stimulus - воздействие, тест-вектор
• to derive - брать производную, отклонять, выводить
• multitude - множество, совокупность

1. Fill in the gaps using the following words:


tempting, presumably, entail, inadequate, sacred, yield, multitude,
tradeoff, aggregation, particle, salient, sequel

1. Any decision will ___________ inconvenience for one group or
another.
2. These microdata do not include either direct identification
variables or possible data ______________or assemblage.
3. As a ____________ to the project, a guide and instruction book
are being developed.
4. While it is ______________ to be optimistic given the factors I
have outlined, the current situation remains very fragile.
5. Steps are therefore taken to restructure obstetric and neonatal
services, and these efforts have already begun to
____________________ positive results.
6. The collected data are ________________ to formulate firm
conclusions.
7. Of a ____________ of algorithms used for fault diagnosis and
testing of digital circuits, VICTOR stands out because of its
multi-step approach to determine the test vectors needed for
detection of a particular fault.
8. The ___________ observations, conclusions and
recommendations of this report are highlighted in bold type.

2. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given.
1. Unless the CEO gets the designer he wants for the new series of
laptops, the project will be cancelled.
MEAN
If the CEO doesn’t get the designer he wants for the new series of
laptops, it _____________ the project.
2. There isn’t as much space in our new office as there is in the
neighbouring one.
SPACIOUS
The new office is _____________________ the neighbouring one.
3. Jessica’s train should have arrived at 16.25, but there is no sign
of it yet.
SCHEDULED
Jessica’s train _____________ at 16.25, but there is no sign of it yet.
4. Today children ought to get computer skills before they start
school.
TAUGHT
Computer skills _______________ pre-school children.
5. The wind was so strong that walking along the beach became
exhausting.
STRENGTH
It was ____________ made walking along the beach so exhausting.
6. A lucid and concise CV is a must if you are applying for a job in
a big IT-company.
ESSENTIAL
If you are applying for a job in a big IT-company, it _____________
which is both lucid and concise.
7. We should make the password more sophisticated if we want it
to be secure enough.
COMPLICATE
We’ll ___________________ order to make it secure.

8. Alex was very surprised to be offered the position of a software
engineer in this company.
CAME
The offer of the position of a software engineer in this company
_________________ Alex.
9. That young man is the leading programmer of the airspace
department I told you about.
HAS
That’s the young man ______________________ the airspace
department I told you about.
10. Mary promised to call today, so I’m sure that’s her on the
phone now.
MUST
Mary promised to call today, so _____________ on the phone now.

3. Complete the sentences by writing a form of the word in
capitals in each space.

Windows 8 introduced major _____________ (IMPROVE) to the
user experience on tablets, where Windows was now competing
with mobile operating systems, including Android and iOS.
In ___________ (PART), these changes included a touch-optimized
Windows shell based on Microsoft's "Metro" design language,
with an emphasis on touchscreen ________________ (PUT) and
the ________________ (INTEGRATE) with online services.
_________________ (ADD) security features were introduced,
such as built-in ______________ (VIRUS) software and
integration with phishing filtering service.
Windows 8 was released to a mixed reception, _____________
(SPECIAL) for being potentially ________________ (CONFUSE)
and difficult to learn. Despite these ________________ (SHORT),
60 million Windows 8 licenses have been sold through
January 2013, including ____________ (GRADE)

READING COMPREHENSION AND TEXT DISCUSSION

1. Answer the following questions:


1. What is a system according to the article?
2. Name different systems and define the difference between
them.
3. What are the pros and cons of using feedback?
4. What is a discrete-time system?

2. Summarize the text


1. Summarize the main points of the text.
2. Write the plan in the form of statements.
3. Use your plan and key terms to summarize the article

GRAMMAR PRACTICE

Participle I. Adverbial Modifier.

Model
Performing addition, the computer must have two numbers to be added.
Производя сложение, компьютер должен иметь два числа,
которые будут складываться.

Translate the following sentences into Russian taking into
account the model:
1. Discussing the advantages of the new memory unit, the
professor gave the students all the necessary explanations.
2. Having punched holes in a card, the operator put it into the
computer.
3. Having carried out a modest amount of research, I was
surprised to find very little information on the total energy
footprint consumed.
4. Having processed the information, C updates the
information on C-B8 and transmits in its turn.
5. Opening his case, he took out a “PC Magazine”.
6. When entering the Internet, I always find a lot of
interesting information.
7. While operating on the basis of analogy, analog computers
simulate physical systems.
8. Being discrete events, commercial transactions are in a
natural form for a digital computer.
9. While dealing with discrete quantities, digital computers
count rather than measure.
10. When using a microcomputer, you are constantly making
choices - to open a file, to close a file and so on.
11. Having unknown properties, the elements cannot be used
for experiments.

Participle II. Adverbial Modifier.

Model.
Though never built, Babbage’s analytical engine was the basis for
designing today’s computers.
Так никогда и не построенная, аналитическая машина
Бэббиджа (несмотря на это) стала основой для создания
современных компьютеров.

Translate the following sentences into Russian taking into
account the model:
1. When written in a symbolic language, programs require
the translation into the machine language.
2. When used, voltage represents other physical quantities in
analog computers.
3. As constructed with analyst, the computer system architect
designs computers for many different applications.
4. If arranged according to their atomic weight, elements exhibit an
evident periodicity of properties.
5. When passed through the reading equipment, the
characters are read in a way similar to a way used for a
magnetic tape.

UNIT 6

Pre-reading exercise.
The paper describes a kind of abstraction. What is abstraction?
Define memory abstraction.

RESILIENT DISTRIBUTED DATASETS: A FAULT-TOLERANT
ABSTRACTION FOR IN-MEMORY CLUSTER COMPUTING

We present Resilient Distributed Datasets (RDDs), a distributed
memory abstraction that lets programmers perform in-memory
computations on large clusters in a fault-tolerant manner. RDDs
are motivated by two types of applications that current computing
frameworks handle inefficiently: iterative algorithms and
interactive data mining tools. In both cases, keeping data in
memory can improve performance by an order of magnitude. To
achieve fault tolerance efficiently, RDDs provide a restricted form
of shared memory, based on coarse-grained transformations rather
than fine-grained updates to shared state. However, we show that
RDDs are expressive enough to capture a wide class of
computations, including recent specialized programming models
for iterative jobs, such as Pregel, and new applications that these
models do not capture. We have implemented RDDs in a system
called Spark, which we evaluate through a variety of user
applications and benchmarks.
Cluster computing frameworks like MapReduce and Dryad have
been widely adopted for large-scale data analytics. These systems
let users write parallel computations using a set of high-level
operators, without having to worry about work distribution and
fault tolerance.
Although current frameworks provide numerous abstractions for
accessing a cluster’s computational resources, they lack
abstractions for leveraging distributed memory. This makes them
inefficient for an important class of emerging applications: those
that reuse intermediate results across multiple computations. Data
reuse is common in many iterative machine learning and graph
algorithms, including PageRank, K-means clustering, and logistic
regression. Another compelling use case is interactive data mining,
where a user runs multiple ad hoc queries on the same subset of
the data. Unfortunately, in most current frameworks, the only way
to reuse data between computations (e.g., between two
MapReduce jobs) is to write it to an external stable storage system,
e.g., a distributed file system. This incurs substantial overheads
due to data replication, disk I/O, and serialization, which can
dominate application execution times.
Recognizing this problem, researchers have developed specialized
frameworks for some applications that require data reuse. For
example, Pregel is a system for iterative graph computations that
keeps intermediate data in memory, while HaLoop offers an
iterative MapReduce interface. However, these frameworks only
support specific computation patterns (e.g., looping a series of
MapReduce steps), and perform data sharing implicitly for these
patterns. They do not provide abstractions for more general reuse,
e.g., to let a user load several datasets into memory and run ad-hoc
queries across them.
We propose a new abstraction called resilient distributed datasets
(RDDs) that enables efficient data reuse in a broad range of
applications. RDDs are fault-tolerant, parallel data structures that
let users explicitly persist intermediate results in memory, control
their partitioning to optimize data placement, and manipulate
them using a rich set of operators.
The main challenge in designing RDDs is defining a programming
interface that can provide fault tolerance efficiently. Existing
abstractions for in-memory storage on clusters, such as distributed
shared memory, key-value stores, databases, and Piccolo, offer an
interface based on fine-grained updates to mutable state (e.g., cells
in a table). With this interface, the only ways to provide fault
tolerance are to replicate the data across machines or to log
updates across machines. Both approaches are expensive for data-
intensive workloads, as they require copying large amounts of
data over the cluster network, whose bandwidth is far lower than
that of RAM, and they incur substantial storage overhead.
In contrast to these systems, RDDs provide an interface based on
coarse-grained transformations (e.g., map, filter and join) that apply
the same operation to many data items. This allows them to
efficiently provide fault tolerance by logging the transformations
used to build a dataset (its lineage) rather than the actual data. If a
partition of an RDD is lost, the RDD has enough information about
how it was derived from other RDDs to recompute just that
partition. Thus, lost data can be recovered, often quite quickly,
without requiring costly replication.
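The recovery idea above can be illustrated with a minimal Python sketch. It is not Spark code: the names STABLE_STORAGE, lineage and compute_partition are hypothetical, and the "cluster" is just a list held in one process. The sketch only shows that the log holds transformations rather than data, so a lost partition is rebuilt by replaying them on that partition's own input.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for stable storage: three input partitions.
STABLE_STORAGE: List[List[int]] = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The lineage: an ordered log of coarse-grained transformations.
# Each entry describes an operation on a whole partition, not the data.
lineage: List[Callable[[List[int]], List[int]]] = [
    lambda part: [x * 10 for x in part],             # map
    lambda part: [x for x in part if x % 20 == 0],   # filter
]


def compute_partition(index: int) -> List[int]:
    """Rebuild one partition by replaying the lineage on its input."""
    data = STABLE_STORAGE[index]
    for transform in lineage:
        data = transform(data)
    return data


# Materialize the dataset in memory.
partitions: List[Optional[List[int]]] = [
    compute_partition(i) for i in range(len(STABLE_STORAGE))
]
before_failure = list(partitions[1])

# Simulate a node failure that loses partition 1 only.
partitions[1] = None

# Recovery recomputes just the lost partition; the others are untouched.
partitions[1] = compute_partition(1)
```

Note that nothing was replicated or logged per record: the cost of fault tolerance here is two small function references, whatever the size of the dataset.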
Although an interface based on coarse-grained transformations
may at first seem limited, RDDs are a good fit for many parallel
applications, because these applications naturally apply the same
operation to multiple data items. Indeed, RDDs can efficiently
express many cluster programming models that have so far been
proposed as separate systems, including MapReduce, DryadLINQ,
SQL, Pregel and HaLoop, as well as new applications that these
systems do not capture, like interactive data mining. The ability of
RDDs to accommodate computing needs that were previously met
only by introducing new frameworks is the most credible evidence
of the power of the RDD abstraction.
RDDs have been implemented in a system called Spark, which is
being used for research and production applications at UC
Berkeley and several companies. Spark provides a convenient
language-integrated programming interface similar to DryadLINQ
in the Scala programming language. In addition, Spark can be
used interactively to query big datasets from the Scala interpreter.
Spark is the first system that allows a general-purpose
programming language to be used at interactive speeds for in-
memory data mining on clusters.
RDD Abstraction
Formally, an RDD is a read-only, partitioned collection of records.
RDDs can only be created through deterministic operations on
either (1) data in stable storage or (2) other RDDs. We call these

operations transformations to differentiate them from other
operations on RDDs. Examples of transformations include map,
filter, and join.
RDDs do not need to be materialized at all times. Instead, an RDD
has enough information about how it was derived from other
datasets (its lineage) to compute its partitions from data in stable
storage. This is a powerful property: in essence, a program cannot
reference an RDD that it cannot reconstruct after a failure.
Finally, users can control two other aspects of RDDs: persistence
and partitioning. Users can indicate which RDDs they will reuse
and choose a storage strategy for them (e.g., in-memory storage).
They can also ask that an RDD's elements be partitioned across
machines based on a key in each record. This is useful for
placement optimizations, such as ensuring that two datasets that
will be joined together are hash-partitioned in the same way.
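A small Python sketch may help to see why partitioning two datasets in the same way pays off for a join. Everything here is an assumption made for illustration: NUM_PARTITIONS, partition_of, hash_partition and local_join are hypothetical names, keys are integers, and the partitioner is a simple modulo rather than a real hash function.

```python
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # assumed number of partitions for this sketch

Record = Tuple[int, str]


def partition_of(key: int) -> int:
    """Deterministic partitioner: the same key always lands in the same partition."""
    return key % NUM_PARTITIONS


def hash_partition(records: List[Record]) -> List[List[Record]]:
    """Distribute records over partitions according to their key."""
    parts: List[List[Record]] = [[] for _ in range(NUM_PARTITIONS)]
    for key, value in records:
        parts[partition_of(key)].append((key, value))
    return parts


def local_join(left: List[Record], right: List[Record]) -> List[Tuple[int, Tuple[str, str]]]:
    """Join two co-located partitions without reading any other partition."""
    by_key: Dict[int, List[str]] = {}
    for key, value in left:
        by_key.setdefault(key, []).append(value)
    return [(key, (lv, rv)) for key, rv in right for lv in by_key.get(key, [])]


users = [(1, "ann"), (2, "bob"), (4, "eve")]
visits = [(1, "/home"), (4, "/docs"), (4, "/faq"), (5, "/x")]

user_parts = hash_partition(users)
visit_parts = hash_partition(visits)

# Partition i of one dataset only ever needs partition i of the other.
joined = sorted(
    row
    for i in range(NUM_PARTITIONS)
    for row in local_join(user_parts[i], visit_parts[i])
)
```

Because both datasets use the same partitioner, matching keys are guaranteed to sit in partitions with the same index, so no data has to move between machines at join time.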
Advantages of the RDD Model
To understand the benefits of RDDs as a distributed memory
abstraction, we compare them against distributed shared memory
(DSM). In DSM systems, applications read and write to arbitrary
locations in a global address space. Note that under this definition,
we include not only traditional shared memory systems, but also
other systems where applications make fine-grained writes to
shared state, including Piccolo, which provides a shared DHT, and
distributed databases. DSM is a very general abstraction, but this
generality makes it harder to implement in an efficient and fault-
tolerant manner on commodity clusters.
The main difference between RDDs and DSM is that RDDs can
only be created ("written") through coarse-grained
transformations, while DSM allows reads and writes to each
memory location. This restricts RDDs to applications that perform
bulk writes, but allows for more efficient fault tolerance. In
particular, RDDs do not need to incur the overhead of
checkpointing, as they can be recovered using lineage. Furthermore,
only the lost partitions of an RDD need to be recomputed upon

failure, and they can be recomputed in parallel on different nodes,
without having to roll back the whole program.
A second benefit of RDDs is that their immutable nature lets a
system mitigate slow nodes (stragglers) by running backup copies
of slow tasks as in MapReduce. Backup tasks would be hard to
implement with DSM, as the two copies of a task would access the
same memory locations and interfere with each other's updates.
Finally, RDDs provide two other benefits over DSM. First, in bulk
operations on RDDs, a runtime can schedule tasks based on data
locality to improve performance. Second, RDDs degrade
gracefully when there is not enough memory to store them, as
long as they are only being used in scan-based operations.
Partitions that do not fit in RAM can be stored on disk and will
provide similar performance to current data-parallel systems.
Applications Not Suitable for RDDs
RDDs are best suited for batch applications that apply the same
operation to all elements of a dataset. In these cases, RDDs can
efficiently remember each transformation as one step in a lineage
graph and can recover lost partitions without having to log large
amounts of data. RDDs would be less suitable for applications that
make asynchronous fine-grained updates to shared state, such as a
storage system for a web application or an incremental web
crawler. For these applications, it is more efficient to use systems
that perform traditional update logging and data checkpointing,
such as databases, RAMCloud, Percolator and Piccolo. Our goal is
to provide an efficient programming model for batch analytics and
leave these asynchronous applications to specialized systems.
Representing RDDs
One of the challenges in providing RDDs as an abstraction is
choosing a representation for them that can track lineage across a
wide range of transformations. Ideally, a system implementing
RDDs should provide as rich a set of transformation operators as
possible and let users compose them in arbitrary ways. We
propose a simple graph-based representation for RDDs that
facilitates these goals. In a nutshell, we propose representing each
RDD through a common interface that exposes five pieces of
information: a set of partitions, which are atomic pieces of the
dataset; a set of dependencies on parent RDDs; a function for
computing the dataset based on its parents; and metadata about its
partitioning scheme and data placement.
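The five pieces of information listed above could be written down as a small Python class. This is only a sketch under assumptions: SketchRDD, from_storage and mapped are hypothetical names, not the interface of Spark, and partitions are identified by simple integers.

```python
from typing import Callable, List, Optional


class SketchRDD:
    """Illustrative container for the five pieces of information in the text."""

    def __init__(
        self,
        partitions: List[int],
        dependencies: List["SketchRDD"],
        compute: Callable[[int], list],
        partitioner: Optional[str] = None,
        preferred_locations: Optional[Callable[[int], List[str]]] = None,
    ) -> None:
        self.partitions = partitions        # 1. atomic pieces of the dataset
        self.dependencies = dependencies    # 2. parent RDDs
        self.compute = compute              # 3. builds one partition from the parents
        self.partitioner = partitioner      # 4. metadata about the partitioning scheme
        # 5. metadata about data placement (no preference by default)
        self.preferred_locations = preferred_locations or (lambda index: [])


def from_storage(blocks: List[list]) -> SketchRDD:
    """An RDD with no parents whose partitions are read from (simulated) storage."""
    return SketchRDD(
        partitions=list(range(len(blocks))),
        dependencies=[],
        compute=lambda index: list(blocks[index]),
        preferred_locations=lambda index: [f"node-{index}"],  # made-up node names
    )


def mapped(parent: SketchRDD, fn: Callable) -> SketchRDD:
    """A map transformation: same partitions and placement, one parent."""
    return SketchRDD(
        partitions=parent.partitions,
        dependencies=[parent],
        compute=lambda index: [fn(x) for x in parent.compute(index)],
        preferred_locations=parent.preferred_locations,
    )


base = from_storage([[1, 2], [3]])
doubled = mapped(base, lambda x: x * 2)
```

A new transformation only has to say how its partitions, parents and compute function relate to those of its parent, which is in line with the observation that most transformations need very little code.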
The most interesting question in designing this interface is how to
represent dependencies between RDDs. We found it both
sufficient and useful to classify dependencies into two types:
narrow dependencies, where each partition of the parent RDD is
used by at most one partition of the child RDD, and wide
dependencies, where multiple child partitions may depend on it. For example,
map leads to a narrow dependency, while join leads to wide
dependencies (unless the parents are hash-partitioned).
This distinction is useful for two reasons. First, narrow
dependencies allow for pipelined execution on one cluster node,
which can compute all the parent partitions. For example, one can
apply a map followed by a filter on an element-by-element basis.
In contrast, wide dependencies require data from all parent
partitions to be available and to be shuffled across the nodes using
a MapReduce-like operation. Second, recovery after a node failure
is more efficient with a narrow dependency, as only the lost parent
partitions need to be recomputed, and they can be recomputed in
parallel on different nodes. In contrast, in a lineage graph with
wide dependencies, a single failed node might cause the loss of
some partition from all the ancestors of an RDD, requiring a
complete re-execution.
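The distinction between the two kinds of dependencies can be checked mechanically. The Python sketch below assumes a hypothetical description of a dependency as a dictionary that maps each child partition to the parent partitions it reads; classify and partitions_to_recompute are illustrative names, not part of any real system.

```python
from collections import Counter
from typing import Dict, List


def classify(child_to_parents: Dict[int, List[int]]) -> str:
    """Narrow if every parent partition is used by at most one child partition."""
    uses = Counter(
        parent
        for parents in child_to_parents.values()
        for parent in set(parents)
    )
    return "narrow" if all(count <= 1 for count in uses.values()) else "wide"


def partitions_to_recompute(child_to_parents: Dict[int, List[int]], lost_child: int) -> List[int]:
    """Parent partitions needed to rebuild one lost child partition."""
    return sorted(set(child_to_parents[lost_child]))


# map: child partition i reads only parent partition i.
map_deps = {0: [0], 1: [1], 2: [2]}

# join on parents that are not co-partitioned: every child reads every parent.
join_deps = {0: [0, 1, 2], 1: [0, 1, 2]}

map_kind = classify(map_deps)
join_kind = classify(join_deps)
```

With the narrow dependency, losing child partition 1 requires only parent partition 1; with the wide one, the same loss requires all three parent partitions.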
This common interface for RDDs made it possible to implement
most transformations in Spark in less than 20 lines of code. Indeed,
even new Spark users have implemented new transformations
(e.g., sampling and various types of joins) without knowing the
details of the scheduler.

VOCABULARY STUDY AND PRACTICE

Glossary
 resilient - отказоустойчивый
 fault-tolerant - отказоустойчивый; устойчивый к сбоям
 magnitude - абсолютная величина
 overhead - затраты вычислительных ресурсов,
перегрузка (интеллектуальная) (прогр.)
 abstraction - выделение главных признаков (вчт.),
абстракция
 coarse-grained - крупномодульный
 fine-grained - мелкомодульный, мелкоструктурный
 to leverage - эффективно использовать
 ad hoc query - произвольный (нерегламентированный)
запрос
 serialization - сериализация, преобразование в
последовательную форму ( из параллельной)
 bandwidth - пропускная способность, полоса
 persistence - инерционность изображения,
сохранность, долговременное хранение(объектов)
 partitioning - разбиение на разделы, форматирование
 checkpoint - контрольная цифра
 immutable - постоянный(комп.), неизменяемый
 dependency - (вчт) отношение, взаимосвязь,
взаимозависимость

1. Translate the following sentences into Russian:


1. Existing abstractions for in-memory storage on clusters,
such as distributed shared memory, key-value stores,
databases, and Piccolo, offer an interface based on fine-
grained updates to mutable state (e.g., cells in a table).
2. Furthermore, only the lost partitions of an RDD need to be
recomputed upon failure, and they can be recomputed in

parallel on different nodes, without having to roll back the
whole program.
3. This restricts RDDs to applications that perform bulk
writes, but allows for more efficient fault tolerance.
4. Second, RDDs degrade gracefully when there is not
enough memory to store them, as long as they are only
being used in scan-based operations.
5. Ideally, a system implementing RDDs should provide as
rich a set of transformation operators as possible and let
users compose them in arbitrary ways.

2. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given.
1. The new employee suggested some improvements to the
system which would make it easier to support.
FORWARD
The new employee _________________________ improving the
system to make it easier to support.
2. You can't blame Geoffrey for breaking the mainframe because
he wasn't even here this morning.
BEEN
It ___________________ broke the mainframe because he wasn't
even here this morning.
3. Wendy originally intended to travel to the conference by car
rather than by train.
WAS
Wendy's _____________________ travel to the conference by car
rather than by train.
4. More people are programming in C++ now than they did ten
years ago.
WIDELY
C++ _____________________ than it was ten years ago.

5. I was just about to send you a message in WhatsApp with my
address.
POINT
I was ___________ you a message in WhatsApp with my address.
6. 'I'm sorry that I've broken your keyboard,' said Carey.
APOLOGISED
Carey _____________________ my keyboard.
7. I'm assuming that you haven't heard the news about the brand
new Chinese smartphone yet.
UNLIKELY
I think you ________________________ the news about the brand
new Chinese smartphone yet.
8. I don't mind where we decide to celebrate the Christmas and New
Year holidays this year.
DIFFERENCE
It doesn't __________________________ where we decide to
celebrate the Christmas and New Year holidays this year.
9. Why are some computer brands more popular than others?
MAKES
What is ______________________ some brands more popular than
others?
10. Working in the laboratory is a compulsory part of a physicist's
training.
HAS
Every physicist ___________________ in the laboratory as part of
their training.

3. Complete the sentences by writing a form of the word in
capitals in each space.
Unix is a family of ______________ multiuser TASK
computer operating systems developed in
the 1970s by Ken Thompson, Dennis Ritchie,
and others.

AT&T licensed Unix to outside parties,
which led to both academic and __________ COMMERCE
variants of the OS. Unix systems are
_________________ CHARACTER
by a _______________ design that is MODULE
sometimes called the "Unix philosophy".
______________ from it, Unix is also said to SIDE
be the first portable operating
system _____________ written in the C ENTIRE
programming language that allowed
this OS to reach ____________ platforms. NUMBER
Many clones of Unix have ___________ over RISE
the years, of which Linux is the most
popular.
BSD _____________ were developed through TRIBUTE
the _______________ by a worldwide network of LABOUR
programmers.

READING COMPREHENSION AND TEXT DISCUSSION

1. Discuss the following:


 The applications that RDDs are motivated by. The main
challenge in designing RDDs.
 The advantages of RDDs over DSM ( Distributed Shared
Memory).
 Representation for RDDs.

2. Summarize the text.


1. Summarize the main points of the text.
2. Write the plan in the form of statements.
3. Use your plan and key terms to summarize the article

GRAMMAR PRACTICE
Absolute Participle Construction
Model
Personal computers being used for many purposes, scientists go on
improving their characteristics.
Так как персональные компьютеры используются для
различных целей, ученые продолжают улучшать их
характеристики.
Translate the following sentences into Russian taking into
account the model:
1. Data being accessed randomly, semiconductor memories
are called random access memory (RAM).
2. The information capacity of a single bit being limited to
two alternatives, codes are based on combinations of bits.
3. An electron leaving the surface, the metal becomes
positively charged.
4. Computer system architecture being organized around the
primary unit, all instructions must pass through it.
5. Electromechanical memories depend upon moving
mechanical parts, their data access time being longer than
that of electronic memories.
6. Large capacity tape devices are used with large data
processing systems, cassettes and cartridges being applied
with small systems.
7. The CPU controls the operation of the entire system,
commands being issued to other parts of the system.
8. The results of arithmetic operations being returned to the
accumulator, the storage registers transfer them to the
main memory.
9. Instructions being obtained, the control unit causes other
units to perform the necessary operations.
10. Electronics being used not only in industry but in many
other fields of human activity as well, one should have an
idea of what it is.

UNIT 7

Pre-reading exercise.
 The paper describes a distributed storage system that
resembles a database. What is a database? Enumerate
different types of databases.
 Define a relational data model.

BIGTABLE: A DISTRIBUTED STORAGE SYSTEM FOR
STRUCTURED DATA

Bigtable is a distributed storage system for managing structured
data that is designed to scale to a very large size: petabytes of data
across thousands of commodity servers. Bigtable has achieved
several goals: wide applicability, scalability, high performance,
and high availability. Bigtable is used by more than sixty Google
products and projects, including Google Analytics, Google
Finance, Orkut, Personalized Search, Writely, and Google Earth.
These products use Bigtable for a variety of demanding
workloads, which range from throughput-oriented batch-
processing jobs to latency-sensitive serving of data to end users.
The Bigtable clusters used by these products span a wide range of
configurations, from a handful to thousands of servers, and store
up to several hundred terabytes of data.
In many ways, Bigtable resembles a database: it shares many
implementation strategies with databases. Parallel databases and
main-memory databases have achieved scalability and high
performance, but Bigtable provides a different interface than such
systems. Bigtable does not support a full relational data model;
instead, it provides clients with a simple data model that supports
dynamic control over data layout and format, and allows clients to
reason about the locality properties of the data represented in the
underlying storage. Data is indexed using row and column names
that can be arbitrary strings. Bigtable also treats data as
uninterpreted strings, although clients often serialize various
forms of structured and semi-structured data into these strings.
Clients can control the locality of their data through careful choices
in their schemas. Finally, Bigtable schema parameters let clients
dynamically control whether to serve data out of memory or from
disk.

Data Model
A Bigtable is a sparse, distributed, persistent multidimensional
sorted map. The map is indexed by a row key, column key, and a
timestamp; each value in the map is an uninterpreted array of
bytes.
(row:string, column:string, time:int64) → string
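The data model can be mimicked with an ordinary Python dictionary whose keys are (row, column, timestamp) triples and whose values are bytes. This is a sketch for study purposes only, not the Bigtable API: put, get_latest and the sample column name anchor:example.org are hypothetical.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (row key, column key, timestamp)

# The map is sparse: only cells that were actually written are stored.
table: Dict[Key, bytes] = {}


def put(row: str, column: str, timestamp: int, value: bytes) -> None:
    """Write one version of one cell; the value is not interpreted."""
    table[(row, column, timestamp)] = value


def get_latest(row: str, column: str) -> Optional[bytes]:
    """Return the most recent version of a cell, or None if it was never written."""
    versions = [
        (ts, value)
        for (r, c, ts), value in table.items()
        if r == row and c == column
    ]
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


put("com.google.maps/index.html", "contents:", 3, b"<html>old</html>")
put("com.google.maps/index.html", "contents:", 5, b"<html>new</html>")
put("com.google.maps/index.html", "anchor:example.org", 9, b"Maps")

latest = get_latest("com.google.maps/index.html", "contents:")
missing = get_latest("com.google.maps/index.html", "language:")

# Sorting the keys orders cells by row key first, as a sorted map would.
ordered_keys = sorted(table)
```

The dictionary holds three entries for one row, which shows the "multidimensional" part of the definition: a row does not have a fixed set of columns, and one cell can carry several timestamped versions.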
Rows
The row keys in a table are arbitrary strings (currently up to 64KB
in size). Every read or write of data under a single row key is
atomic (regardless of the number of different columns being read
or written in the row), a design decision that makes it easier for
clients to reason about the system's behavior in the presence of
concurrent updates to the same row.
Bigtable maintains data in lexicographic order by row key. The
row range for a table is dynamically partitioned. Each row range is
called a tablet, which is the unit of distribution and load balancing.
As a result, reads of short row ranges are efficient and typically
require communication with only a small number of machines.
Clients can exploit this property by selecting their row keys so that
they get good locality for their data accesses. For example, in
Webtable, pages in the same domain are grouped together into
contiguous rows by reversing the hostname components of the
URLs. For example, we store data for
maps.google.com/index.html under the key
com.google.maps/index.html. Storing pages from the same
domain near each other makes some host and domain analyses
more efficient.
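The key transformation described above takes only a few lines of Python. The function name row_key is hypothetical, and the sketch assumes URLs written without a scheme, as in the example in the text.

```python
def row_key(url: str) -> str:
    """Reverse the hostname components of a URL and keep the path unchanged."""
    host, slash, path = url.partition("/")
    reversed_host = ".".join(reversed(host.split(".")))
    return reversed_host + slash + path


key = row_key("maps.google.com/index.html")

# After the transformation, pages of the same domain sort next to each other.
urls = ["maps.google.com/index.html", "example.org/a", "mail.google.com/inbox"]
sorted_keys = sorted(row_key(u) for u in urls)
```

In the sorted list the two google.com pages are adjacent, so a scan over one domain reads a short, contiguous row range.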
Column Families
Column keys are grouped into sets called column families, which
form the basic unit of access control. All data stored in a column
family is usually of the same type. A column family must be
created before data can be stored under any column key in that
family; after a family has been created, any column key within the
family can be used. It is our intent that the number of distinct
column families in a table be small (in the hundreds at most), and
that families rarely change during operation. In contrast, a table
may have an unbounded number of columns.
A column key is named using the following syntax: family:qualifier.
Column family names must be printable, but qualifiers may be
arbitrary strings. An example column family for the Webtable is
language, which stores the language in which a web page was
written. We use only one column key in the language family, and
it stores each web page's language ID. Another useful column
family for this table is anchor; each column key in this family
represents a single anchor. The qualifier is the name of the
referring site; the cell contents is the link text.
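The family:qualifier syntax can be illustrated with a short parser in Python. The function split_column_key and the sample site example.org are hypothetical; the checks only reflect the two rules stated in the text (a printable family name, an arbitrary qualifier).

```python
from typing import Tuple


def split_column_key(column_key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' at the first colon."""
    family, colon, qualifier = column_key.partition(":")
    if not colon:
        raise ValueError("a column key must contain ':'")
    if not family or not family.isprintable():
        raise ValueError("the family name must be non-empty and printable")
    # The qualifier is left as it is: it may be any string, even an empty one.
    return family, qualifier


anchor = split_column_key("anchor:example.org")
contents = split_column_key("contents:")

# One row of a Webtable-like table: the qualifier of an anchor column
# names the referring site, and the cell holds the link text.
row = {"anchor:example.org": b"Maps", "language:": b"EN"}
families = sorted({split_column_key(k)[0] for k in row})
```

Splitting at the first colon only means that a qualifier may itself contain colons, which matters when the qualifier is a site name with a port or a full URL.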
Access control and both disk and memory accounting are
performed at the column-family level. In our Webtable example,
these controls allow us to manage several different types of
applications: some that add new base data, some that read the base
data and create derived column families, and some that are only
allowed to view existing data (and possibly not even to view all of
the existing families for privacy reasons).
Timestamps
Each cell in a Bigtable can contain multiple versions of the same
data; these versions are indexed by timestamp. Bigtable
timestamps are 64-bit integers. They can be assigned by Bigtable,
in which case they represent real time in microseconds, or be
explicitly assigned by client applications. Applications that need to
avoid collisions must generate unique timestamps themselves.
Different versions of a cell are stored in decreasing timestamp
order, so that the most recent versions can be read first.
To make the management of versioned data less onerous, we
support two per-column-family settings that tell Bigtable to
garbage-collect cell versions automatically. The client can specify
either that only the last n versions of a cell be kept, or that only
new-enough versions be kept (e.g., only keep values that were
written in the last seven days).
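The two garbage-collection settings can be sketched in Python as two small filters over the versions of one cell. The names keep_last_n and keep_newer_than are hypothetical, and timestamps are small integers here instead of real microsecond values.

```python
from typing import List, Tuple

Version = Tuple[int, bytes]  # (timestamp, value)


def newest_first(versions: List[Version]) -> List[Version]:
    """Order versions by decreasing timestamp, as the text describes."""
    return sorted(versions, key=lambda version: version[0], reverse=True)


def keep_last_n(versions: List[Version], n: int) -> List[Version]:
    """Policy 1: keep only the n most recent versions of a cell."""
    return newest_first(versions)[:n]


def keep_newer_than(versions: List[Version], now: int, max_age: int) -> List[Version]:
    """Policy 2: keep only versions written within the last max_age time units."""
    return [v for v in newest_first(versions) if now - v[0] <= max_age]


cell = [(10, b"a"), (30, b"c"), (20, b"b"), (40, b"d")]

last_three = keep_last_n(cell, 3)
recent = keep_newer_than(cell, now=45, max_age=20)
```

In both cases the result stays in decreasing timestamp order, so the first element is always the version a reader is most likely to want.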
In our Webtable example, we set the timestamps of the crawled
pages stored in the contents: column to the times at which these
page versions were actually crawled. The garbage-collection
mechanism described above lets us keep only the most recent
three versions of every page.

API
The Bigtable API provides functions for creating and deleting
tables and column families. It also provides functions for changing
cluster, table, and column family metadata, such as access control
rights.
Client applications can write or delete values in Bigtable, look up
values from individual rows, or iterate over a subset of the data in
a table.
Bigtable supports several other features that allow the user to
manipulate data in more complex ways. First, Bigtable supports
single-row transactions, which can be used to perform atomic
read-modify-write sequences on data stored under a single row
key. Bigtable does not currently support general transactions
across row keys, although it provides an interface for batching
writes across row keys at the clients. Second, Bigtable allows cells
to be used as integer counters. Finally, Bigtable supports the
execution of client-supplied scripts in the address spaces of the
servers. The scripts are written in a language developed at Google
for processing data called Sawzall. At the moment, our Sawzall-
based API does not allow client scripts to write back into Bigtable,
but it does allow various forms of data transformation, filtering
based on arbitrary expressions, and summarization via a variety of
operators.
Bigtable can be used with MapReduce, a framework for running
large-scale parallel computations developed at Google. We have
written a set of wrappers that allow a Bigtable to be used both as
an input source and as an output target for MapReduce jobs.

Building Blocks
Bigtable is built on several other pieces of Google infrastructure.
Bigtable uses the distributed Google File System (GFS) to store log
and data files. A Bigtable cluster typically operates in a shared
pool of machines that run a wide variety of other distributed
applications, and Bigtable processes often share the same
machines with processes from other applications. Bigtable
depends on a cluster management system for scheduling jobs,
managing resources on shared machines, dealing with machine
failures, and monitoring machine status.
The Google SSTable file format is used internally to store Bigtable
data. An SSTable provides a persistent, ordered immutable map
from keys to values, where both keys and values are arbitrary byte
strings. Operations are provided to look up the value associated
with a specified key, and to iterate over all key/value pairs in a
specified key range. Internally, each SSTable contains a sequence
of blocks (typically each block is 64KB in size, but this is
configurable). A block index (stored at the end of the SSTable) is
used to locate blocks; the index is loaded into memory when the
SSTable is opened. A lookup can be performed with a single disk
seek: we first find the appropriate block by performing a binary
search in the in-memory index, and then read the appropriate
block from disk. Optionally, an SSTable can be completely
mapped into memory, which allows us to perform lookups and
scans without touching disk.
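The lookup procedure (binary search in the in-memory index, then one block read) can be sketched in Python with the standard bisect module. The class SketchSSTable is hypothetical. In particular, storing the last key of each block in the index is an assumption made for this sketch, and "disk" is simulated by a list plus a counter of block reads.

```python
import bisect
from typing import List, Optional, Tuple

Item = Tuple[bytes, bytes]  # (key, value), both arbitrary byte strings


class SketchSSTable:
    """Immutable sorted map split into fixed-size blocks."""

    def __init__(self, sorted_items: List[Item], block_size: int = 2) -> None:
        # Blocks stand in for the on-disk part of the file.
        self.blocks = [
            sorted_items[i:i + block_size]
            for i in range(0, len(sorted_items), block_size)
        ]
        # The block index is kept in memory: here, the last key of each block.
        self.index = [block[-1][0] for block in self.blocks]
        self.disk_reads = 0

    def _read_block(self, number: int) -> List[Item]:
        self.disk_reads += 1  # each call models one disk seek
        return self.blocks[number]

    def lookup(self, key: bytes) -> Optional[bytes]:
        # Binary search: first block whose last key is >= the wanted key.
        number = bisect.bisect_left(self.index, key)
        if number == len(self.index):
            return None  # key is beyond the last block; no disk access needed
        for k, v in self._read_block(number):
            if k == key:
                return v
        return None


sstable = SketchSSTable(
    [(b"a", b"1"), (b"c", b"2"), (b"e", b"3"), (b"g", b"4"), (b"i", b"5")]
)
found = sstable.lookup(b"e")
reads_after_hit = sstable.disk_reads
beyond = sstable.lookup(b"z")
reads_after_miss = sstable.disk_reads
```

The counter shows the property stated in the text: a successful lookup costs one block read, because the search for the right block happens entirely in memory.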
Bigtable relies on a highly-available and persistent distributed lock
service called Chubby. Bigtable uses Chubby for a variety of tasks:
to ensure that there is at most one active master at any time; to
store the bootstrap location of Bigtable data; to discover tablet
servers and finalize tablet server deaths; to store
Bigtable schema information (the column family information for
each table); and to store access control lists. If Chubby becomes
unavailable for an extended period of time, Bigtable becomes
unavailable.

Implementation
The Bigtable implementation has three major components: a
library that is linked into every client, one master server, and
many tablet servers. Tablet servers can be dynamically added to (or
removed from) a cluster to accommodate changes in workloads.
The master is responsible for assigning tablets to tablet servers,
detecting the addition and expiration of tablet servers, balancing
tablet-server load, and garbage collection of files in GFS. In
addition, it handles schema changes such as table and column
family creations.
Each tablet server manages a set of tablets (typically we have
somewhere between ten and a thousand tablets per tablet server).
The tablet server handles read and write requests to the tablets
that it has loaded, and also splits tablets that have grown too large.
As with many single-master distributed storage systems, client
data does not move through the master: clients communicate
directly with tablet servers for reads and writes. Because Bigtable
clients do not rely on the master for tablet location information,
most clients never communicate with the master. As a result, the
master is lightly loaded in practice.
A Bigtable cluster stores a number of tables. Each table consists of
a set of tablets, and each tablet contains all data associated with a
row range. Initially, each table consists of just one tablet. As a table
grows, it is automatically split into multiple tablets, each
approximately 100-200 MB in size by default.
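How a growing table ends up as several contiguous row ranges can be shown with a simplified Python sketch. The text does not describe the actual splitting algorithm, so the greedy rule below is purely an assumption: split_into_tablets is a hypothetical function that walks the rows in key order and starts a new tablet whenever the size limit would be exceeded.

```python
from typing import List, Tuple


def split_into_tablets(rows: List[Tuple[str, int]], max_size: int) -> List[List[str]]:
    """Group (row key, size) pairs into contiguous row ranges of limited size."""
    tablets: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for key, size in sorted(rows):  # rows are kept in lexicographic key order
        if current and current_size + size > max_size:
            tablets.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        tablets.append(current)
    return tablets


# Sizes are in megabytes; 150 stands in for the default tablet size limit.
rows = [("a", 60), ("b", 60), ("c", 60), ("d", 60), ("e", 60)]
tablets = split_into_tablets(rows, max_size=150)
```

Each resulting tablet covers an unbroken range of row keys, which is what allows a tablet to serve as the unit of distribution and load balancing.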

Real Applications
Google Analytics
Google Analytics (analytics.google.com) is a service that helps
webmasters analyze traffic patterns at their web sites. It provides
aggregate statistics, such as the number of unique visitors per day
and the page views per URL per day, as well as site-tracking
reports, such as the percentage of users that made a purchase,
given that they earlier viewed a specific page.
To enable the service, webmasters embed a small JavaScript
program in their web pages. This program is invoked whenever a
page is visited. It records various information about the request in
Google Analytics, such as a user identifier and information about
the page being fetched. Google Analytics summarizes this data
and makes it available to webmasters.
Google Earth
Google operates a collection of services that provide users with
access to high-resolution satellite imagery of the world's surface,
both through the web-based Google Maps interface
(maps.google.com) and through the Google Earth
(earth.google.com) custom client software.
These products allow users to navigate across the world's surface:
they can pan, view, and annotate satellite imagery at many
different levels of resolution. This system uses one table to
preprocess data, and a different set of tables for serving client data.
The preprocessing pipeline uses one table to store raw imagery.
During preprocessing, the imagery is cleaned and consolidated
into final serving data. This table contains approximately 70
terabytes of data and therefore is served from disk. The images are
efficiently compressed already, so Bigtable compression is
disabled.
Each row in the imagery table corresponds to a single geographic
segment. Rows are named to ensure that adjacent geographic
segments are stored near each other. The table contains a column
family to keep track of the sources of data for each segment. This
column family has a large number of columns: essentially one for
each raw data image. Since each segment is only built from a few
images, this column family is very sparse.
The preprocessing pipeline relies heavily on MapReduce over
Bigtable to transform data. The overall system processes over 1
MB/sec of data per tablet server during some of these MapReduce
jobs.
The serving system uses one table to index data stored in GFS.
This table is relatively small (~500 GB), but it must serve tens of
thousands of queries per second per datacenter with low latency.
As a result, this table is hosted across hundreds of tablet servers
and contains in-memory column families.
Personalized Search
Personalized Search (www.google.com/psearch) is an opt-in
service that records user queries and clicks across a variety of
Google properties such as web search, images, and news. Users
can browse their search histories to revisit their old queries and
clicks, and they can ask for personalized search results based on
their historical Google usage patterns.
Personalized Search stores each user's data in Bigtable. Each user
has a unique userID and is assigned a row named by that userID.
All user actions are stored in a table. A separate column family is
reserved for each type of action (for example, there is a column
family that stores all web queries). Each data element uses as its
Bigtable timestamp the time at which the corresponding user
action occurred. Personalized Search generates user profiles using
a MapReduce over Bigtable. These user profiles are used to
personalize live search results.
The Personalized Search data is replicated across several Bigtable
clusters to increase availability and to reduce latency due to
distance from clients. The Personalized Search team originally
built a client-side replication mechanism on top of Bigtable that
ensured eventual consistency of all replicas. The current system
now uses a replication subsystem that is built into the servers.
The design of the Personalized Search storage system allows other
groups to add new per-user information in their own columns,
and the system is now used by many other Google properties that
need to store per-user configuration options and settings. Sharing
a table amongst many groups resulted in an unusually large
number of column families. To help support sharing, we added a
simple quota mechanism to Bigtable to limit the storage
consumption by any particular client in shared tables; this
mechanism provides some isolation between the various product
groups using this system for per-user information storage.

VOCABULARY STUDY AND PRACTICE

Glossary
 commodity server - стандартный (типовой) сервер
 batch processing - пакетная обработка данных,
обработка данных в пакетном режиме
 latency - задержка, период ожидания
 arbitrary string - произвольная строка
 uninterpreted string- неинтерпретируемая строка
 serialize - упорядочивать, преобразовывать из
параллельной в последовательную форму,
сериализовывать
 schema - схема (логическая структура в базах данных),
схема (управления данными) прогр.
 timestamp - отметка времени, временная отметка
 row key - ключ строки
 column key - ключ столбца
 column family - семейство столбцов
 GFS - Google File System
 anchor – точка привязки
 cluster management system – кластерная система
управления
 lock service- сервис блокировок
 preprocessing pipeline- первичная конвейерная
обработка
 user ID- идентификатор пользователя
 to crawl (pages)- просматривать (страницы веб-сайтов)

1. Translate the following sentences into Russian:


1. Bigtable also treats data as uninterpreted strings, although
clients often serialize various forms of structured and semi-
structured data into these strings.
2. Every read or write of data under a single row key is
atomic (regardless of the number of different columns
being read or written in the row), a design decision that

makes it easier for clients to reason about the system's
behavior in the presence of concurrent updates to the same
row.
3. As with many single-master distributed storage systems,
client data does not move through the master: clients
communicate directly with tablet servers for reads and
writes.
4. The Personalized Search team originally built a client-side
replication mechanism on top of Bigtable that ensured
eventual consistency of all replicas.

2. Match the terms in the left column with the lines in the right
column:
1. tablet               a. the column family information for each
                           table
2. column families      b. a sequence of characters or encoded
                           information identifying when a certain
                           event occurred
3. timestamp            c. the delay from input into a system to
                           desired outcome
4. schema information   d. the organization or structure for a
                           database
5. schema               e. the unit of distribution and load
                           balancing
6. latency              f. form the basic unit of access control

3. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given.
1. Even a common office clerk felt a sense of pride in the
achievements of the company.
PROUD
Even a common office clerk ___________________ the company
had achieved.
2. It is not likely that the effects of Antarctic ice melting can be
reversed.
LIKELIHOOD
There is _____________________ reversing the effects of Antarctic
ice melting.
3. Randall didn’t know all the arguments, but his attitude would
soon change.
ABOUT
Randall didn’t know it, but there ____________________ change in
his attitude.
4. Not a lot of people pay much attention when teachers and
professors complain about falling educational standards.
NOTICE
Little ________ who complain about falling educational standards.
5. Matthew is obsessed with buying old baseball cards on the
Internet.
BECOME
Buying old baseball cards on the Internet __________ for Matthew.
6. ‘I’m afraid I wasn’t brilliant enough during my interview this
morning,’ said Alan.
ADMITTED
Alan ____________________________ gone very well.
7. The company will not present another version of its operating
system this year.
INTENTION
The company has ___________________ another version of its
operating system this year.
8. Tom’s friends persuaded him not to go on that long business
trip to India this winter.
TALKED
It was Tom’s friends ___________________ going on that long
business trip to India this winter.
9. Unfortunately, I didn’t have enough money to buy each new
iPhone version.
ABLE
If I’d had more money, _______________________ to buy each new
iPhone version.
10. Life without the Internet would be very difficult for many
modern people.
LIVE
Many modern people would find ___________________ the
Internet.

4. Complete the sentences by writing a form of the word in
capitals in each space
In 1968 Dennis Ritchie defended his PhD
thesis on "Program Structure and
_____________ Complexity" COMPUTE
under the _____________ of Patrick C. VISION
Fischer. However, Ritchie never
_______________ received his PhD degree. He OFFICE
was best known as the
______________ of the C programming language CREATE
and the Unix operating system.
The C language is widely used today in
__________________, and its influence is seen APPLY
in most modern programming languages.
Ritchie said that the Linux phenomenon was
quite _____________, being strongly based on DELIGHT
his Unix OS.
He added that it seemed to be among the
______________ of the direct HEALTHY
Unix ___________. He viewed both Unix and DERIVE
Linux as the
_________________ of ideas started many CONTINUE
years ago by him and Ken Thompson,
his friend and __________________. AUTHOR

READING COMPREHENSION AND TEXT DISCUSSION

1. Discuss the following points:


 Bigtable as compared to a database.
 The data model that Bigtable supports. Compare it with a
relational data model.
 The Bigtable implementation.

2. Summarize the text


1. Summarize the main points of the text.
2. Write the plan in the form of statements.
3. Use your plan and key terms to summarize the article

GRAMMAR PRACTICE

Infinitive: Attribute

Model
Mark I was the first machine to figure out mathematical problems.
Марк I (Модель I) был первой машиной, которая решала
математические задачи.

Translate the following sentences into Russian taking into
account the model:
1. Vacuum tubes to control and amplify electric signals were
invented by Lee de Forest.
2. This question will be discussed at the conference shortly to
open in Moscow.
3. The experiments to be carried out will be very important.
4. Information to be completed is stored usually in registers -
units of hardware.
5. Information to be put into the computer for processing
should be coded for processing into ones and zeroes.
6. The high-speed devices to be used as secondary storage are
both input and output devices.
7. The progress of electronics to have resulted in the
invention of electronic computers was a breakthrough of
the second part of the 20th century.
8. Computers to have been designed originally for arithmetic
purposes are applicable for a great variety of tasks at
present.
9. The CPU of a computer to be arranged in a single or very
small number of integrated circuits is called a
microprocessor.
10. Russia was the first country to start the cosmic era.

Infinitive: Adverbial Modifier

Model
A CD-drive uses a laser to read information stored optically on a plastic
disc.
Дисковод использует лазер, чтобы считывать информацию,
хранящуюся оптически на пластиковом диске.

Translate the following sentences into Russian taking into
account the model:
1. It is simply not reasonable to compare the writability of
two languages in the realm of a particular application
when one was designed for that application and the other
was not.
2. Mainframe operating systems are designed primarily to
optimize utilization of hardware.
3. They obviously need a network interface controller and
some low-level software to drive it, as well as programs to
achieve remote login and remote file access.
4. A software engineer studies methods of working within an
organization to decide how tasks can be done efficiently by
computers.
5. He uses telecommunication software, electronic skills and
knowledge of networking software to locate and correct
faults.
6. A computer salesperson discusses computing needs with
the client to ensure that a suitable system can be supplied.
7. He then wrote a simple coding system, called HTML, to
create links to files on any computer connected to the
network.
8. To make computers more reliable transistors were used.
9. To integrate large numbers of circuit elements into a small
chip, transistors should be reduced in size.
10. To protect your Google account, keep your password
confidential.

UNIT 8

Pre-reading exercise.
Discuss the following:
"Those outside of the computing are outside of
competition. There is simply no success without
computing"
(Karolj Skala, Croatia)

FROM SAFETY TO ECOBOOST: HPC ENABLES
INNOVATION AND PRODUCTIVITY AT FORD MOTOR
COMPANY

After an intense 30-year working relationship with
supercomputers—ranging from early water-cooled Crays to
today’s commodity clusters—engineers at the Ford Motor
Company view modeling and simulation with high performance
computing (HPC) not as a high-tech miracle, but as an integral
part of the business.
Ford’s executive technical leader for global computer-aided
engineering (CAE) and chief engineer for global materials and
standards engineering, Nand K. Kochhar, says: “The combination
of HPC and CAE simulation technology is a key enabler of our
product development process. We provide advanced
computational capabilities for Ford not just as a service, but as an
integrated enabler of company business strategy. HPC is key to
delivering on our overall business plan: optimizing product
development, creating high quality products and improving time-
to-market. With advances in computing technologies, it is possible
to accomplish this in a cost-effective manner.”
The Ford Motor Company, based in Dearborn, Mich.,
manufactures and distributes automobiles in 200 markets across
six continents. The company has about 201,000 employees and 90
plants worldwide, and its core automotive brands include Ford
and Lincoln.
Kochhar dates Ford’s involvement with HPC back to the 1980s, the
early days of the supercomputer industry. The company’s first
machine came from Control Data Corporation, one of the original
supercomputer leaders; Ford then moved to an early Cray, the X-MP,
which was the world’s fastest computer in the mid-1980s. Ford
stayed with Cray systems along with solutions from SGI, IBM and
Digital/Compaq through the 1990s and was among the leaders in
adopting commodity Linux cluster technology as it became
available at the turn of the century.
Today the company uses a mix of HPC clusters based on x86-64
processors supplied primarily by IBM and HP, along with
commercial applications software for its CAE applications—a
move that has substantially reduced the need for in-house
software development.
Alex Akkerman, Ford’s senior HPC technical specialist, points out
that even though the clusters are located in two separate data
centers: “We operate the various systems as one monolithic virtual
environment—our internal customers interface with it as if it is
one system. Although we have many touch points within Ford’s IT
organization, our group, made up of a dozen or so people, is an
independent entity specifically dedicated to the HPC environment.
The vast majority of the work that runs on our HPC resources is
CAE-based analyses.”
Akkerman says that the demand for HPC services by their in-
house customers—Ford’s cohort of engineers and designers—is
insatiable. “We add capacity based on our customers’
requirements and normally upgrade the systems at least once a
year. But the users tend to very quickly use up whatever we’ve
installed, so deploying new HPC resources has become an
ongoing process. Our hardware utilization is about 85 to 90
percent of capacity, which is as high as we can go without
affecting our key metric, job turnaround time. Our primary service
level objective is to manage our resources to provide optimal and
predictable time-to-solution. This is in line with the company’s
business objective of constantly reducing time-to-market to make
Ford more competitive in world markets.”
Better Fuel Economy, Safer Rides and Quieter Cabins
Just a few examples of key initiatives that rely on the company’s
extensive HPC computational resources include: fuel economy
and Ford’s EcoBoost engine technology; safety, always a prime
attribute at Ford; and internal cabin noise, a major factor in
consumer satisfaction.
HPC and CAE played a pivotal role in the development of Ford’s
EcoBoost engine technology. Ford’s Powertrain team used HPC
technology along with computational fluid dynamics (CFD) and
CAE applications to optimize the design of the EcoBoost. In
particular, the engineers worked on optimizing combustion and
structural aspects of the EcoBoost powertrain technologies.
“A lot of HPC-based computational analysis is involved in
simulating the trade-offs between performance, shift quality and
fuel economy. In the case of the engine, we conduct combustion
analysis—optimizing a fuel-air mix, for example. And to develop
overall vehicle fuel efficiency, we use CFD calculations to compute
the optimal aerodynamics of the proposed vehicle,” Kochhar says.
HPC resources are also used to develop both passive and active
safety attributes. Passive safety focuses on improving structural
performance and airbag deployment to reduce intrusion into the
vehicle and help protect the occupants. Ford’s active safety
initiatives include Adaptive Cruise Control and Collision Warning
with Brake Support, which uses radar to detect moving vehicles
directly ahead. When the danger of a collision is detected, the
system warns the driver, automatically pre-charges brakes, and
engages a brake-assist feature that helps drivers quickly reach
maximum braking once the brakes are engaged. The technology
was introduced in the summer of 2009 on the 2010 Ford Taurus,
Lincoln MKS sedan and Lincoln MKT crossover, and will be made
available on other Ford vehicles.
The entire vehicle is modeled to assess both the active and passive
safety designs. Engineers simulate the results of crashes based on a
wide variety of design and environmental factors without actually
building a physical prototype for testing. HPC is also used to
model what is known internally as “NVH”—noise, vibration and
harshness. Controlling interior noise in the automobile is a major
factor in customer satisfaction. As Kochhar notes: “In some of our
products, like the Mustang, we want that powerful sound to come
in, so we need to tune the powertrain accordingly.” HPC comes
into play because of the computing complexity involved in
allowing a certain amount of noise to come into the cabin while at
the same time minimizing noises generated by the road, wind and
the vehicle’s powertrain, all of which are influenced by the driving
dynamics of the vehicle. “These are complicated interactions that
take a large amount of computational resources to deliver an
optimum design,” he says.

From the Physical to the Virtual


Over time, HPC has allowed Ford’s engineers to perform
increasing amounts of virtual road testing and wind tunnel
simulations, and reduce the company’s reliance on physical
prototyping, resulting in the ability to bring new products to
market faster and with higher quality.
“HPC modeling and simulation is allowing us to deliver the time-
to-market with the advanced designs our customers have come to
expect,” Kochhar says. “We want fresh products showing up in
our showrooms more quickly than we have in the past. Using
simulation rather than relying heavily on physical testing allows
us to shorten product development cycle times. Instead of
building full-scale physical prototypes at every step in the
development process and subjecting them to actual road testing as
well as crash and wind tunnel tests, we can now use
computational capabilities to get many of the results needed. We
still use physical testing to validate our HPC-based results, but
over the years we have become very proficient in offsetting the
physical build technologies with analytical technologies.
“Not only does this reduce costs; it allows us to bring more
robustness, quality and creativity to vehicle designs,” he
continues. “The flexibility and speed made possible by HPC lets us
simulate a wider range of scenarios, component combinations and
associated trade-offs than would have been possible with physical
testing. The result is that over the years, with continuous
improvements in technology, we have been able to maximize
creativity while reducing product development costs
dramatically.”
Simulations help engineers reduce the number of costly design
level changes on any given component. With HPC, the number of
changes to parts is kept to a minimum by providing a level of
analytical validation of functional requirements and performance
factors early in the development process. Then, when a final
verification of the design is conducted using physical testing, the
product has a greater chance of being successful due to all the
analytical testing and virtual design modifications that have taken
place up front. “If our prototype vehicle does not pass our
rigorous tests at the end of the development process, we follow
what’s called a ‘closed loop lessons learned process’ to see if we
need to update some of the assumptions in our computer models,”
Kochhar adds.
This continuous improvement process is being used to help Ford
realize its overall electrification strategy in the development of
hybrids and battery powered electric vehicles. HPC makes
materials simulation, weight optimization and systems modeling
possible, improving the quality and design of greener vehicles.
Included is the use of recyclable materials—the 2008 Ford
Mustang, for example, was the first automobile Ford introduced
with soy-based foam seating.
“At Ford, HPC is a strategic enabler of our product development
process and an indispensable tool for continuous innovation,”
Kochhar concludes. “The technology allows us to build an
environment that continuously improves the product
development process, speeds up time-to-market and lowers costs.
HPC is an integral part of Ford’s competitiveness in a very tough
marketplace.”

VOCABULARY STUDY AND PRACTICE

Glossary
 high performance computing (HPC) –
высокопроизводительные вычисления
 computer-aided engineering (CAE) -
автоматизированное конструирование
 insatiable - неутолимый, бесконечный
 combustion - воспламенение, горение, внутреннее
сгорание
 trade-off - компромисс, согласование, выбор
оптимального соотношения
 subject to - подвергать чему-либо, воздействовать
 upfront - предварительно
 competitiveness – конкурентоспособность

1. Translate the following sentences into Russian:


1. Kochhar dates Ford’s involvement with HPC back to the
1980s, the early days of the supercomputer industry.
2. Our hardware utilization is about 85 to 90 percent of
capacity, which is as high as we can go without affecting
our key metric, job turnaround time.
3. Our primary service level objective is to manage our
resources to provide optimal and predictable time-to-
solution.
4. HPC comes into play because of the computing complexity
involved in allowing a certain amount of noise to come into
the cabin while at the same time minimizing noises
generated by the road, wind and the vehicle’s powertrain.

2. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given
1. I only realized that I’d forgotten a flash card when I got to the
presentation.
ARRIVED
It wasn’t __________________ the presentation that I realized I’d
forgotten a flash card.
2. After several years, heavy data traffic caused the network
quality to reduce.
DUE
The reduction ___________________________ several years of
heavy data traffic.
3. Someone ought to have let the boss know about the incident in
the office at once.
REPORTED
The incident should ________________ the boss at once.
4. Someone asked the IT-manager to explain why all department
PCs were blocked.
GIVE
The IT-manager ______________________ explanation for blocking
all department PCs.
5. I think it would have been nice to have had a college diploma to
get this position at Google.
WISH
I _______________ a college diploma to get this position at Google.
6. It’s a waste of time attending interviews unless you really want
to get the job.
POINT
There __________ interviews unless you really want to get the job.
7. The completion of the new office is scheduled for next July.
DUE
The new office ___________________ completed next July.
8. Why did nobody tell me that the presentation had been
cancelled?
INFORMED
Why ______________________ the cancellation of the presentation?
9. Francis is sure to finish coding in first place during the
competition.
CROSS
Francis is sure to be the first person to __________________ during
the competition.
10. Their office is the one with the green door.
WHICH
Their office _____________________ a green door.

3. Complete the sentences by writing a form of the word in
capitals in each space
___________________ to media, PlayStation 3 ACCORD
systems were sold within 24 hours
of its_____________ in Japan. The console was INTRODUCE
________________ planned for a global release ORIGIN
and immediately became a hit.
Its features included a slimmer form factor,
decreased power ________________, CONSUME
and a quieter _______________ system. The COOL
games written for PS3 were
______________ praised by numerous video HEAVY
game websites,
_________ very influential GameSpot and IGN. INCLUDE
PS3's ______________ has also been used to HARD
build supercomputers
for high-____________ computing. In December PERFORM
2008, a group of hackers used
a cluster of PlayStation 3 computers to crack
SSL_____________________. AUTHENTIC

READING COMPREHENSION AND TEXT DISCUSSION

1. Discuss the following:


 The key enabler of Ford’s product development process.
 Evolution (time span) of Ford’s involvement with HPC.
 Use of the company’s extensive HPC computational
resources.
 Advantages of HPC modeling and simulation.

2. Summarize the text.


1. Summarize the main points of the text.
2. Write the plan in the form of statements.
3. Use your plan and key terms to summarize the paper.

GRAMMAR PRACTICE

Infinitive Construction: for + noun (pronoun) + infinitive

Model
It is quite necessary for the programmer to understand the work of all
units of a computer.
Для программиста совершенно необходимо понимать работу
всех блоков компьютера.

Translate the following sentences into Russian taking into
account the model:
1. The speed of the computer may be found by measuring the
time which is required for it to transmit one word out of
the memory to where it will be used.
2. There is a good reason for us to use this kind of bubble
memory in a personal computer.
3. In the middle of the 17th century it was possible for
B. Pascal to invent only the mechanical computer.
4. The possibility for the problem to be solved is illustrated by
the given formula.
5. It was not difficult for the people to understand the
function of the mouse in computer operation.
6. There is no reason for computer experts to use computers
of the first generation nowadays.
7. The mechanism is provided with special devices for the
whole system to function automatically.
8. The text was very interesting but rather difficult for the
students to translate without a dictionary.

Infinitive Construction: Complex Object

Model 1
I want this computer to be repaired. = I want this computer repaired.
Я хочу, чтобы мне починили компьютер.

Model 2
I let him transmit information to another network.
Я позволил ему переместить информацию в другую сеть.
Before writing a program my boss made me write an algorithm.
Мой начальник заставил меня написать алгоритм перед тем,
как писать программу.

Translate the following sentences into Russian taking into
account the models:
1. We know metal to conduct electricity.
2. The Internet originated in the early 1970s when the
United States wanted to make sure the people could
communicate after a nuclear war.
3. Syntax errors cause the program to fail.
4. I have seen the microcomputer processing a large
amount of data.
5. How do you expect the next generation computers to
work?
6. They advised us to use this operating system.
7. The specialists expect the new generation to get tired of
stereotypes.
8. High-level languages use words from natural
languages and allow these words and mathematical
symbols to be combined according to various rules.
9. We know her to investigate this problem.
10. The invention of a transistor let computers work more
quickly and have fewer failures.

UNIT 9

Pre-reading exercise.
 The subject of the document is exascale computing. Explain the
meaning of the term “exascale”. In terms of exascale, where
can modern supercomputers be placed? What is the previous
(lower) “scale”? What is the current one?
 The document seeks to present a “roadmap”. Define the term
in English and give Russian equivalent(s).

INTERNATIONAL EXASCALE SOFTWARE PROJECT
ROADMAP

Part I
The technology roadmap presented here is the result of more than
a year of coordinated effort within the global software community
for high-end scientific computing. It is the product of a set of first
steps taken to address a critical challenge that now confronts
modern science and is produced by a convergence of three factors:
(1) the compelling science case to be made, in both fields of deep
intellectual interest and fields of vital importance to humanity, for
increasing usable computing power by orders of magnitude as
quickly as possible; (2) the clear and widely recognized inadequacy
of the current high end software infrastructure, in all its
component areas, for supporting this essential escalation; and (3)
the near complete lack of planning and coordination in the global
scientific software community in overcoming the formidable
obstacles that stand in the way of replacing it. At the beginning of
2009, a large group of collaborators from this worldwide
community initiated the International Exascale Software Project
(IESP) to carry out the planning and the organization building
necessary to solve this vitally important problem.

The guiding purpose of the IESP is to empower ultra-high resolution and
data-intensive science and engineering research through the year 2020 by
developing a plan for (1) a common, high-quality computational
environment for petascale/exascale systems and (2) catalyzing,
coordinating, and sustaining the effort of the international open source
software community to create that environment as quickly as possible.

Part II
There exist good reasons to think that such a plan is urgently
needed. First and foremost, the magnitude of the technical
challenges for software infrastructure that the novel architectures
and extreme scale of emerging systems bring with them is
daunting. These problems, which are already appearing on the
leadership-class systems, are more than sufficient to require the
wholesale redesign and replacement of the operating systems,
programming models, libraries, and tools on which high-end
computing necessarily depends.
Second, the complex web of interdependencies and side effects
that exist among such software components means that making
sweeping changes to this infrastructure will require a high degree
of coordination and collaboration. Failure to identify critical holes
or potential conflicts in the software environment, to spot
opportunities for beneficial integration, or to adequately specify
component requirements will tend to retard or disrupt everyone’s
progress, wasting time that can ill afford to be lost. Since creating a
software environment adapted for extreme-scale systems (e.g.,
NSF’s Blue Waters) will require the collective effort of a broad
community, this community must have good mechanisms for
internal coordination.
Third, it seems clear that the scope of the effort must be truly
international. In terms of its rationale, scientists in nearly every
field now depend on the software infrastructure of high-end
computing to open up new areas of inquiry (e.g., the very small,
very large, very hazardous, very complex), to dramatically
increase their research productivity, and to amplify the social and
economic impact of their work. It serves global scientific
communities who need to work together on problems of global
significance and leverage distributed resources in transnational
configurations. In terms of feasibility, the dimensions of the task—
totally redesigning and recreating, in the period of just a few years,
the massive software foundation of computational science in order
to meet the new realities of extreme-scale computing—are simply
too large for any one country, or small consortium of countries, to
undertake on its own.
The IESP was formed to help achieve this goal. Beginning in April
2009, we held a series of three international workshops, in order to
work out a plan for doing so. Information about, and the working
products of, all these meetings can be found at the project website,
www.exascale.org.

Part III
Destination of the IESP Roadmap
… Building on the background knowledge that motivated the
work of IESP participants, we define the goal that the roadmap is
intended to help our community reach as follows:
By developing and following the IESP roadmap, the international
scientific software research community seeks to create a common, open
source software infrastructure for scientific computing that enables
leading-edge science and engineering groups to develop applications that
exploit the full power of the exascale computing platforms that will come
on-line in the 2018–2020 timeframe. We call this integrated collection of
software the extreme-scale/exascale software stack, or X-stack.
Unpacking the elements of this goal statement in the context of the
work performed so far by the IESP reveals some of the
characteristics that the X-stack must possess, at minimum:
 The X-stack must enable suitably designed science applications to
exploit the full resources of the largest systems: The main goal
of the X-stack is to support groundbreaking research on
tomorrow’s exascale computing platforms. By using these
massive platforms and X-stack infrastructure, scientists
should be empowered to attack problems that are much
larger and more complex, make observations and
predictions at much higher resolution, explore vastly larger
data sets, and reach solutions dramatically faster. To
achieve this goal, the X-stack must enable scientists to use
the full power of exascale systems.
 The X-stack must scale both up and down the platform
development chain: Science today is done on systems at a
range of different scales, from departmental clusters to the
world’s largest supercomputers. Since leading research
applications are developed and used at all levels of this
platform development chain, the X-stack must support
them well at all these levels.
 The X-stack must be highly modular, so as to enable alternative
component contributions: The X-stack is intended to provide
a common software infrastructure on which the entire
community builds its science applications. For both
practical and political reasons (e.g., sustainability, risk
mitigation), the design of the X-stack should strive for
modularity that makes it possible for many groups to
contribute and accommodate more than one choice in each
software area.
 The X-stack must offer open source alternatives for all
components in the X-stack: For both technical and mission-
oriented reasons, the scientific software research
community has long played a significant role in the open
source software movement. Continuing this important
tradition, the X-stack will offer open source alternatives for
all of its components, even though it is clear that exascale
platforms from particular vendors may support, or even
require, some proprietary software components as well.

Part IV
Technology Trends and Their Impact on Exascale
The design of the extreme-scale platforms that are expected to
become available in 2018 will represent a convergence of
technological trends and the boundary conditions imposed by
over half a century of algorithm and application software
development. Although the precise details of these new designs
are not yet known, it is clear that they will embody radical changes
along a number of different dimensions as compared to the
architectures of today’s systems and that these changes will render
obsolete the current software infrastructure for large-scale
scientific applications. The first step in developing a plan to ensure
that appropriate system software and applications are ready and
available when these systems come on line, so that leading edge
research projects can actually use them, is to carefully review the
underlying technological trends that are expected to have such a
transformative impact on computer architecture in the next
decade. These factors and trends provide essential context for
thinking about the looming challenges of tomorrow’s scientific
software infrastructure; therefore, describing them lays the
foundation on which subsequent sections of this roadmap
document build.
In developing a roadmap for the X-stack software infrastructure,
the IESP has been able to draw on several thoughtful and
extensive studies of impacts of the current revolution in computer
architecture. As these studies make clear, technology trends over
the next decade – broadly speaking, increases of 1000X in
capability over today’s most massive computing systems, in
multiple dimensions, as well as increases of similar scale in data
volumes – will force a disruptive change in the form, function, and
interoperability of future software infrastructure components and
the system architectures incorporating them. The momentous
nature of these changes can be illustrated for several critical
system-level parameters:
Concurrency – Moore’s law scaling in the number of transistors is
expected to continue through the end of the next decade, at which
point the minimal VLSI geometries will be as small as five
nanometers. Unfortunately, the end of Dennard scaling means that
clock rates are no longer keeping pace, and may in fact be reduced
in the next few years to reduce power consumption. As a result,
the exascale systems on which the X-stack will run will likely be
composed of hundreds of millions of arithmetic logic units
(ALUs). Assuming there are multiple threads per ALU to cover
main-memory and networking latencies, applications may contain
ten billion threads.
Reliability – System architecture will be complicated by the
increasingly probabilistic nature of transistor behavior due to
reduced operating voltages, gate oxides, and channel
widths/lengths resulting in very small noise margins. Given that
state-of-the-art chips contain billions of transistors and the
multiplicative nature of reliability laws, building resilient
computing systems out of such unreliable components will
become an increasing challenge. This cannot be cost-effectively
addressed with pairing or triple modular redundancy (TMR);
rather, it must be addressed by X-stack software and perhaps even
scientific applications.
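The "multiplicative nature of reliability laws" mentioned above can be illustrated with a small calculation. The sketch below is not taken from the roadmap: it assumes that components fail independently and that the system works only if every component works, and the per-component reliability value used in the example is purely hypothetical.

```python
import math


def system_reliability(component_reliability: float, n_components: int) -> float:
    """Probability that all n independent components work (series system).

    Assumption: failures are independent, so probabilities multiply: R = r ** n.
    Computed as exp(n * log(r)) to stay accurate for r very close to 1.
    """
    return math.exp(n_components * math.log(component_reliability))


def required_component_reliability(target: float, n_components: int) -> float:
    """Per-component reliability needed so that the whole system reaches `target`."""
    return math.exp(math.log(target) / n_components)


# Hypothetical figures: a billion components, each failing with probability 1e-9.
whole_system = system_reliability(1 - 1e-9, 10**9)
print(f"system reliability: {whole_system:.3f}")
```

Even with each component failing only once in a billion cases, the product over a billion components leaves the whole system working barely more than a third of the time, which is why the text expects software to take part in providing resilience.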
Power consumption – Twenty years ago, HPC systems consumed
less than a megawatt. The Earth Simulator was the first such
system to exceed 10 MW. Exascale systems could consume over
100 MW, and few of today’s computing centers have either
adequate infrastructure to deliver such power or the budgets to
pay for it. The HPC community may find itself measuring results
in terms of power consumed, rather than operations performed.
The X-stack and the applications it hosts must be conscious of this
situation and act to minimize it.
Similarly dramatic examples could be produced for other key
variables, such as storage capacity, efficiency, and programmability.
More important, a close examination shows that changes in these
parameters are interrelated and not orthogonal. For example,
scalability will be limited by efficiency, as will power and
programmability.
Other cross correlations can be perceived through analysis. The
DARPA Exascale Technology Study exposes power as the
pacesetting parameter. Although an exact power consumption
constraint value is not yet well defined, with upper limits of
today’s systems on the order of 5 megawatts, increases of an order
of magnitude in less than 10 years will extend beyond the practical
energy demands of all but a few strategic computing
environments. A politico-economic pain threshold of 25
megawatts has been suggested (by DARPA) as a working
boundary.

Part V
… exascale system architecture characteristics are beginning to
emerge, though the details will become clear only as the systems
themselves actually develop. Among the critical aspects of future
systems, available by the end of the next decade, which we can
predict with some confidence are the following:
• Feature size of 22 to 11 nanometers, CMOS in 2018
• Total average of 25 picojoules per floating point operation
• Approximately 10 billion-way concurrency for
simultaneous operation and latency hiding
• 100 million to 1 billion cores
• Clock rates of 1 to 2 GHz
• Multithreaded, fine-grained concurrency of 10- to 100-way
concurrency per core
• Hundreds of cores per die (varies dramatically depending
on core type and other factors)
• Global address space without cache coherence; extensions
to PGAS (e.g., AGAS)
• 128-petabyte capacity mix of DRAM and nonvolatile
memory (most expensive subsystem)
• Explicitly managed high-speed buffer caches; part of deep
memory hierarchy
• Optical communications for distances > 10 centimeters,
possibly intersocket
• Optical bandwidth of 1 terabit per second
• Systemwide latencies on the order of tens of thousands of
cycles
• Active power management to eliminate wasted energy by
momentarily unused cores
• Fault tolerance by means of graceful degradation and
dynamically reconfigurable structures
• Hardware-supported rapid thread context switching
• Hardware-supported efficient message-to-thread
conversion for message-driven computation
• Hardware-supported, lightweight synchronization
mechanisms
• 3-D packaging of dies for stacks of 4 to 10 dies each
including DRAM, cores, and networking.
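Several of the figures in this list can be cross-checked with simple arithmetic. The sketch below is an illustration rather than part of the roadmap: it assumes a sustained rate of one exaflop per second (10^18 floating point operations per second) and simply multiplies the quoted ranges; the function and variable names are chosen for this example only.

```python
PICO = 1e-12   # joules in one picojoule
EXA = 1e18     # operations per second in one exaflop/s (assumed sustained rate)
MEGA = 1e6     # watts in one megawatt


def power_megawatts(joules_per_flop: float, flops_per_second: float) -> float:
    """Power drawn by arithmetic alone: energy per operation times operation rate."""
    return joules_per_flop * flops_per_second / MEGA


def total_concurrency(cores: int, threads_per_core: int) -> int:
    """System-wide concurrency if every core runs the same number of threads."""
    return cores * threads_per_core


# 25 pJ per floating point operation at one exaflop per second
exaflop_power_mw = power_megawatts(25 * PICO, EXA)

# 100 million to 1 billion cores, 10- to 100-way concurrency per core
low_threads = total_concurrency(100_000_000, 10)
high_threads = total_concurrency(1_000_000_000, 100)

print(f"power: {exaflop_power_mw:.0f} MW")
print(f"concurrency: {low_threads:.0e} to {high_threads:.0e} threads")
```

Under these assumptions the energy budget of 25 picojoules per operation corresponds to 25 megawatts, the working boundary quoted in Part IV, and the figure of roughly 10 billion-way concurrency falls inside the range obtained from the core and thread counts.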

VOCABULARY STUDY AND PRACTICE

Glossary
• daunting - ошеломляющий, пугающий
• high-end computing - высокопроизводительные
вычисления на суперкомпьютерах предельной
вычислительной мощности
• tend to - как правило
• rationale - (логическое) обоснование, мотивация,
мотивы
• feasibility - осуществимость
• proprietary software - защищённое авторским
правом или патентом программное обеспечение;
проприетарное ПО
• boundary conditions - граничные условия, краевые
условия
• latency - задержка, время ожидания
• noise margins - запас помехоустойчивости
• given that - если; при условии; учитывая, что;
поскольку
• state-of-the-art chips - ультрасовременные микросхемы
• resilient computing systems - отказоустойчивые
вычислительные системы
• key variables - ключевые переменные (факторы)
• orthogonal - независимый
• cross correlation - взаимная корреляция
• upper limit - верхняя граница, верхний предел,
максимум
• an order of magnitude - порядок величины
• pain threshold - болевой порог
• working boundary - эксплуатационная / действующая /
производственная / рабочая граница
• feature size - размер топологического элемента;
размер, характеризующий прибор; (минимальный)
топологический размер
• latency hiding - маскировка задержек (при параллельных
вычислениях)
• clock rate - тактовая частота
• fine-grained concurrency - мелкомодульный
параллелизм
• nonvolatile memory - энергонезависимое ЗУ,
энергонезависимая память
• active power management - управление питанием в
активном состоянии
• fault tolerance - отказоустойчивость
• context switching - переключение контекста,
контекстно-зависимое переключение программ

1. Match English and Russian equivalents


sweeping changes                     срочно необходим
to spot opportunities                прежде всего
side effects                         определять требования к компонентам
to dramatically increase             новаторское исследование
collective effort                    принципиально новые архитектуры
will be complicated by               найти возможность
urgently needed                      технические трудности
groundbreaking research              (энергично) взяться за решение задачи
disruptive change                    первая система, которая превзошла
to attack problems                   резко увеличивать
novel architectures                  будет осложняться (чем-л.)
first and foremost                   совместные усилия
to specify component requirements    радикальные перемены
technical challenges                 побочные эффекты
the first such system to exceed      необратимые изменения

2. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given.
1. Some people can work better in a pressured environment.
CONSTANT
Some people work better when they are ________________ work.
2. We should leave about five, otherwise we might not get to the
station in time.
SET
If _________________ five, we might not get to the station in time.
3. The guests of the presentation experienced an hour delay in the
hall.
HELD
The guests of the presentation were __________ an hour in the hall.
4. Some linguists think that language developed through trade
negotiations.
THOUGHT
Language ______________________ through trade negotiations.
5. The aim of the competition is to acquire the best contract for
department software.
PROVIDE
The competition aims _________________ acquiring the best
contract for department software.
6. ‘Miranda, I think you’ve been leaving the office early, haven’t
you?’ said her boss.
ACCUSED
Miranda’s boss _________________ the office early.

7. I’m just about to buy a new video card for my desktop PC.
POINT
I’m _____________________ a new video card for my desktop PC.
8. Steven intends to complain about the work of support in the
online shop.
GOING
Steven _______________________ complaint about the work of
support in the online shop.
9. People should take more responsibility for their online
commentaries.
BE
People should __________________ for their online commentaries.
10. It is quite usual for boys to start playing warlike video games.
MEANS
It is by _____________________ for boys to start playing warlike
video games.

3. Complete the sentences by writing a form of the word in
capitals in each space
3D printing refers to processes used to
_____________ SYNTHESIS
a three-dimensional object in which
____________ layers of material are formed SUCCESS
under computer control to create an object.
Objects can be of almost any shape and are
produced from __________ model data 3D DIGIT
model or another electronic data source.
Some scientists and futurologists claim 3D
printing to be the third _____________ INDUSTRY
revolution succeeding the production line
_________________ ASSEMBLE
Early 3D _____________ and materials were EQUIP
developed in the 1980s.

Nowadays higher ____________ has proven EDUCATE
to be a major buyer of desktop and
professional 3D printers.
_____________ desktop 3D printer SIGN
purchases by universities help sustain a 3D
printer market.
Several projects and companies are making
efforts to develop _____________ 3D AFFORD
printers for home use.
It is said that this technology being applied
at home may reduce the ____________ ENVIRONMENT
impacts of manufacturing by reducing
material use and distribution impacts.

READING COMPREHENSION AND TEXT DISCUSSION

1. Translate Part I into Russian (written assignment).


2. Read Part II and re-formulate the three reasons to make
them as clear and concise as possible.
3. Part III outlines the main characteristics of the X-stack. Scan
Part III, choose one of the characteristics to find extra
information.
4. Read Part IV and render it in English.
5. Study the target characteristics of future systems (Part V).
Which of them are we likely to see in the immediate
future?
6. The full text of the document is available at
http://www.exascale.org/mediawiki/images/2/20/IESP-roadmap.pdf.
Go through Section 4 of the document and
prepare a 5-minute talk summarizing one of the aspects of
X-stack technology. Use information in relevant tables as a
plan.
GRAMMAR PRACTICE

Infinitive Construction: Complex Subject

Model 1
A static RAM is used for cache memory that is smaller but faster, and
can hold portions that are likely to be used again shortly.
Статическое ОЗУ используется для более компактной и
быстрой кэш-памяти, содержащей фрагменты, которые
вероятнее всего будут снова использованы в ближайшее
время.

Translate the following sentences into Russian taking into
account the model:
1. We are likely to find out how the brain works and to
recreate its operation using powerful computers.
2. But we are unlikely to program in human emotions, moral
responsibilities, and the uniqueness of the individual.
3. Printed books are still sure to be the best way to preserve
knowledge, as paper lasts from 50 to 500 years.
4. A site for an online store is likely to have more graphics
and other attention-getting features than an academic or
governmental site.

Model 2
seemed to, chanced to, happened to, proved to, turned out to.
You can use an antivirus program if your computer happens to be
infected.
Вы можете использовать антивирусную программу, если
окажется, что ваш компьютер заражен.

Translate the following sentences into Russian taking into
account the model:
1. When new architectures emerge, they may appear to be
evolutionary because they evince strong family
resemblance to earlier architectures from the same vendor.
2. The way we use machines today is sure to change very
soon.
3. Computers are certain to be used to develop other faster
computers.
Model 3
The lifetime of the electrode is assumed to depend on the material it is
made of.
Предполагается, что срок службы электрода зависит от
материала, из которого он сделан.
Translate the following sentences into Russian taking into
account the model:
1. These factors were found to have had no influence on the
result.
2. This method is said to yield good results.
3. The product has been proved to affect the overall yield.
4. The human being doesn’t seem to be able to add or
multiply without using auxiliary devices such as pencil
and paper.
5. Devices for accepting information are said to have been
described in some magazines.
6. Automated Management Systems are known to have
appeared quite recently.
7. Our programmers are known to be studying the theory of
programming.
8. B. Pascal is known to be the first inventor of the mechanical
computer.
9. Human beings seem to be able to find facts, or even logical
consequences of facts, in their memory by association.

UNIT 10

PART 1
HOW TO READ A PAPER

ABSTRACT
Researchers spend a great deal of time reading research papers.
However, this skill is rarely taught, leading to much wasted effort.
This article outlines a practical and efficient three-pass method for
reading research papers. I also describe how to use this method to
do a literature survey.
General Terms: Documentation.
Keywords: Paper, Reading, Hints.

1. INTRODUCTION
Researchers must read papers for several reasons: to review them
for a conference or a class, to keep current in their field, or for a
literature survey of a new field. A typical researcher will likely
spend hundreds of hours every year reading papers.
Learning to efficiently read a paper is a critical but rarely taught
skill. Beginning graduate students, therefore, must learn on their
own using trial and error. Students waste much effort in the
process and are frequently driven to frustration.
For many years I have used a simple approach to efficiently read
papers. This paper describes the ‘three-pass’ approach and its use
in doing a literature survey.

2. THE THREE-PASS APPROACH


The key idea is that you should read the paper in up to three
passes, instead of starting at the beginning and plowing your way
to the end. Each pass accomplishes specific goals and builds upon
the previous pass: The first pass gives you a general idea about the
paper. The second pass lets you grasp the paper’s content, but not
its details. The third pass helps you understand the paper in
depth.
2.1 The first pass
The first pass is a quick scan to get a bird’s-eye view of the paper.
You can also decide whether you need to do any more passes. This
pass should take about five to ten minutes and consists of the
following steps:
1. Carefully read the title, abstract, and introduction
2. Read the section and sub-section headings, but ignore
everything else
3. Read the conclusions
4. Glance over the references, mentally ticking off the ones
you’ve already read
At the end of the first pass, you should be able to answer the five
Cs:
1. Category: What type of paper is this? A measurement
paper? An analysis of an existing system? A description of
a research prototype?
2. Context: Which other papers is it related to? Which
theoretical bases were used to analyze the problem?
3. Correctness: Do the assumptions appear to be valid?
4. Contributions: What are the paper’s main contributions?
5. Clarity: Is the paper well written?
Using this information, you may choose not to read further. This
could be because the paper doesn’t interest you, or you don’t
know enough about the area to understand the paper, or because
the authors make invalid assumptions. The first pass is adequate for
papers that aren’t in your research area, but may someday prove
relevant.
Incidentally, when you write a paper, you can expect most
reviewers (and readers) to make only one pass over it. Take care to
choose coherent section and sub-section titles and to write concise
and comprehensive abstracts. If a reviewer cannot understand the
gist after one pass, the paper will likely be rejected; if a reader
cannot understand the highlights of the paper after five minutes,
the paper will likely never be read.

2.2 The second pass
In the second pass, read the paper with greater care, but ignore
details such as proofs. It helps to jot down the key points, or to
make comments in the margins, as you read.
1. Look carefully at the figures, diagrams and other illustrations in
the paper. Pay special attention to graphs. Are the axes properly
labeled? Are results shown with error bars, so that conclusions are
statistically significant? Common mistakes like these will separate
rushed, shoddy work from the truly excellent.
2. Remember to mark relevant unread references for further
reading (this is a good way to learn more about the background of
the paper).
The second pass should take up to an hour. After this pass, you
should be able to grasp the content of the paper. You should be
able to summarize the main thrust of the paper, with supporting
evidence, to someone else. This level of detail is appropriate for a
paper in which you are interested, but which does not lie in your
research specialty.
Sometimes you won’t understand a paper even at the end of the
second pass. This may be because the subject matter is new to you,
with unfamiliar terminology and acronyms. Or the authors may
use a proof or experimental technique that you don’t understand,
so that the bulk of the paper is incomprehensible. The paper may
be poorly written with unsubstantiated assertions and numerous
forward references. Or it could just be that it’s late at night and
you’re tired. You can now choose to: (a) set the paper aside,
hoping you don’t need to understand the material to be successful
in your career, (b) return to the paper later, perhaps after reading
background material or (c) persevere and go on to the third pass.

2.3 The third pass


To understand a paper fully, particularly if you are a reviewer,
requires a third pass. The key to the third pass is to attempt to
virtually re-implement the paper: that is, making the same
assumptions as the authors, re-create the work. By comparing this
re-creation with the actual paper, you can easily identify not only a
paper’s innovations, but also its hidden failings and assumptions.
This pass requires great attention to detail. You should identify
and challenge every assumption in every statement. Moreover,
you should think about how you yourself would present a
particular idea. This comparison of the actual with the virtual
lends a sharp insight into the proof and presentation techniques in
the paper and you can very likely add this to your repertoire of
tools. During this pass, you should also jot down ideas for future
work.
This pass can take about four or five hours for beginners, and
about an hour for an experienced reader. At the end of this pass,
you should be able to reconstruct the entire structure of the paper
from memory, as well as be able to identify its strong and weak
points. In particular, you should be able to pinpoint implicit
assumptions, missing citations to relevant work, and potential
issues with experimental or analytical techniques.

3. DOING A LITERATURE SURVEY


Paper reading skills are put to the test in doing a literature survey.
This will require you to read tens of papers, perhaps in an
unfamiliar field. What papers should you read? Here is how you
can use the three-pass approach to help.
First, use an academic search engine such as Google Scholar or
CiteSeer and some well-chosen keywords to find three to five
recent papers in the area. Do one pass on each paper to get a sense
of the work, then read their related work sections. You will find a
thumbnail summary of the recent work, and perhaps, if you are
lucky, a pointer to a recent survey paper. If you can find such a
survey, you are done. Read the survey, congratulating yourself on
your good luck.
Otherwise, in the second step, find shared citations and repeated
author names in the bibliography. These are the key papers and
researchers in that area. Download the key papers and set them
aside. Then go to the websites of the key researchers and see
where they’ve published recently. That will help you identify the
top conferences in that field because the best researchers usually
publish in the top conferences.
The third step is to go to the website for these top conferences and
look through their recent proceedings. A quick scan will usually
identify recent high-quality related work. These papers, along
with the ones you set aside earlier, constitute the first version of
your survey. Make two passes through these papers. If they all cite
a key paper that you did not find earlier, obtain and read it,
iterating as necessary.
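The second step of the survey procedure (finding shared citations and repeated author names) is easy to automate once the bibliographies are available as data. The sketch below is only an illustration of that idea: the input format, the function name and the sample entries are hypothetical, and obtaining the bibliographies from a search engine is left out entirely.

```python
from collections import Counter


def key_papers_and_authors(bibliographies, min_count=2):
    """Return cited titles and author names that recur across seed papers.

    `bibliographies` holds one list per seed paper; each list contains
    (title, authors) pairs, where `authors` is a tuple of names.
    """
    paper_counts = Counter()
    author_counts = Counter()
    for references in bibliographies:
        # A dict keeps one entry per title, so a work listed twice in the
        # same bibliography is counted only once for that seed paper.
        unique_refs = {title: authors for title, authors in references}
        paper_counts.update(unique_refs.keys())
        for authors in unique_refs.values():
            author_counts.update(authors)
    key_papers = [t for t, c in paper_counts.most_common() if c >= min_count]
    key_authors = [a for a, c in author_counts.most_common() if c >= min_count]
    return key_papers, key_authors


# Hypothetical bibliographies of three seed papers
seed_a = [("Paper X", ("Ada", "Bob")), ("Paper Y", ("Cy",))]
seed_b = [("Paper X", ("Ada", "Bob")), ("Paper Z", ("Ada",))]
seed_c = [("Paper Z", ("Ada",)), ("Paper W", ("Dee",))]

key_papers, key_authors = key_papers_and_authors([seed_a, seed_b, seed_c])
print(key_papers, key_authors)
```

In this toy data the two titles cited by more than one seed paper are reported as key papers, and the most frequently recurring author name comes first in the author list.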

4. EXPERIENCE
I’ve used this approach for the last 15 years to read conference
proceedings, write reviews, do background research, and to
quickly review papers before a discussion. This disciplined
approach prevents me from drowning in the details before getting
a bird’s-eye view. It allows me to estimate the amount of time
required to review a set of papers. Moreover, I can adjust the
depth of paper evaluation depending on my needs and how much
time I have.

5. RELATED WORK
If you are reading a paper to do a review, you should also read
Timothy Roscoe’s paper on “Writing reviews for systems
conferences” [1]. If you’re planning to write a technical paper, you
should refer both to Henning Schulzrinne’s comprehensive web
site [2] and George Whitesides’s excellent overview of the process
[3].

PART 2
Pre-reading exercise
• Guess the key words for the paper.
• Write an abstract for the paper based solely on its title.
• What is the story behind Alice and Bob? What other name is
commonly used in cryptographic contexts?
• Study the “Our contribution” part for the use of articles. Explain
the choice (definite/indefinite/zero article) for the most
typical mathematical contexts.
• Go through the “Definitions and Notation” part to identify the
language of definitions and notation. Use the resulting list
to write the corresponding (definitions and notation) part
of a paper in your research area.
• Apply the three-pass approach to “re-implement” the
paper:

CERTIFICATES OF NON-MEMBERSHIP FOR CLASSES OF
READ-ONCE FUNCTIONS

1. Introduction
Let Alice and Bob share the truth table of some Boolean
function $f\colon \{0,1\}^n \to \{0,1\}$. Suppose that Alice learns that $f$ does not
belong to some fixed class of functions $C$. Now she wants to prove
this fact to Bob, who does not trust her word and is willing to
carry out all needed computation by himself. If the class $C$ is
known to both Alice and Bob beforehand, then Alice may want
just to point Bob to some of the values in the truth table of $f$. If the
combination of these values is inconsistent with all possible
functions $h \in C$, then Bob will be convinced that $f \notin C$. Suppose
that Alice only cares to point Bob to as few values of $f$ as possible,
that is, all computational issues are ignored and the problem is
combinatorial. How many values are sufficient to prove that
$f \notin C$?
To capture this setting, we construct sets of input strings called
certificates of non-membership. Basically, such a certificate for a
function $f$ with respect to a class $C$ can be used to prove that
$f \notin C$. In this paper, we study this concept for several classes of
read-once functions, obtaining bounds on the smallest possible
certificate size.
While we delay most formal definitions until later, some
background on read-once functions is needed in the introduction.
A function $h$ is said to be read-once over a finite set of functions $B$,
called a basis, if it can be expressed by a formula over $B$ where no
input variable appears more than once. All other functions will be
called read-many over $B$. Read-once functions have been studied
from various points of view for more than half a century. Classes
of read-once functions emerge in different areas of discrete
mathematics and computer science, from formula (circuit)
complexity and positional games to computational learning theory
and probabilistic databases.

Related work
The idea of certifying non-membership in concept classes has been
studied in computational learning theory for more than a decade.
Perhaps the most well-known is the work of Hellerstein et al. [11],
who defined so-called polynomial certificates to characterize
polynomial-query learnable representation classes in Angluin’s
learning model (a standard model for exact learning; a
representation class basically provides some language for
expressing functions). Following this line of research, Arias,
Khardon, and Servedio [1] studied certificate size, in the sense of
[11], for classes of Boolean functions representable by monotone
CNF (conjunctive normal forms), unate CNF, Horn CNF, and
so-called renamable Horn CNF.
Since classes of read-once functions over all finite bases are known
to be polynomial-time learnable in Angluin’s model as proved by
Bshouty, Hancock, and Hellerstein [3], it follows that appropriate
representation classes have polynomial certificates. We shall see
that with our definition of certificates of non-membership this
conclusion can, in a sense, be strengthened (although our current
paper does not deal with arbitrary finite bases).
We wish to emphasize that our definition of certificates is different
from, although not unrelated to, that in [11]: our certificates show
that a certain object cannot be represented within a class, while
certificates in [11, 1] also show (not necessarily infinite) lower
bounds on the representation size.
However, a different characterization of polynomial-query
learnable classes, involving almost literally (under the name of
unique specification dimension) the (worst-case) size of certificates of
non-membership as used in the present paper, was obtained by
Hegedűs [8]. We discuss this characterization and its implications
for our results in sections 2 and 5. The same characteristic (worst-
case size of a certificate of non-membership) was also studied by
Hellerstein [10], who characterized and studied classes of
functions admitting constant-size certificates.
Another related area of research is the development of certifying
algorithms for various computational tasks such as decision
problems (see, e.g., a general survey by McConnell et al. [12]).
This area is motivated by software engineering and builds upon
the idea that a kind of certificate should be provided as a part of an
algorithm’s output. The entire output can afterwards be verified
(authenticated) by a separate algorithm, which in certain cases can
be expected to run faster than the main (original) algorithm. When
certifying algorithms are used to decide membership in some fixed
class of discrete objects, they augment each yes-answer and no-
answer with a certificate of membership and non-membership,
respectively.
As an example of the implementation of this approach we refer the
reader to a series of linear-time certifying algorithms for deciding
membership in (or, put differently, recognition of) various classes
of graphs, developed by Heggernes and Kratsch. Nonmembership
certificates output by these algorithms are based on
characterizations in terms of forbidden induced subgraphs. For an
overview of a related subject of characterizing graph classes with
sets of forbidden minors, we refer the reader to a paper by
Thomas.

For various classes of Boolean functions, certificates of non-
membership can take the form of forbidden projections (a
projection is obtained from a given function by substituting some
constants for input variables). For the class of unate functions, a
characterization in these terms was given by Feigelson and
Hellerstein, who thus captured the family of all minimal non-unate
functions [5]. Stetsenko obtained the list of all minimal forbidden
projections of read-once functions over the standard basis
$B_0 = \{\&, \vee, \neg\}$ in [16]. This result was subsequently extended to larger
bases; most of the papers in this subarea are only available in
Russian (in the current paper, for instance, we use a theorem by
Peryazev [13]).

Our contribution
We obtain several bounds on the size of certificates of
non-membership for classes of read-once functions. For the standard
basis $B_0 = \{\&, \vee, \neg\}$ we show in section 3 that all read-many
functions over $B_0$ have constant-sized certificates of
non-membership. In other words, the number of strings in a (shortest
possible) certificate does not grow with the number of input
variables. For each read-many function $f$ we construct a certificate
and prove its optimality, that is, show that no shorter certificate
exists.
We next turn to generalizing these results to larger bases $B$. In
section 4 we consider a family of bases of the form
$B^{(s)} = B_0 \cup \{h_t^{(s)}\}$, where $s$-variable functions $h_t^{(s)}$ are
taken from Stetsenko’s
list of all minimal read-many functions over $B_0$. For every fixed $s$,
we construct a sequence of $n$-variable read-many functions that
require $\Omega(n^{s-1})$-long certificates as $n \to \infty$.
Next, for $s = 2$ we complement this result by proving that each
read-many function over the basis $B^{(2)}$ has a certificate of
non-membership of size at most $O(n)$, so our lower bound turns out to
be tight in this special case. This basis $B^{(2)}$ is especially interesting,
because it is equivalent to the (standard in some areas) basis of all
two-variable functions, in the sense that an arbitrary Boolean
function is read-once over the former if and only if it is read-once
over the latter.
Last but not least, using the aforementioned characterization of
polynomial-query learning algorithms due to Hegedűs, we
improve existing upper bounds by Angluin, Hellerstein, and
Karpinski and Bshouty, Hancock, and Hellerstein on the query
complexity of learning read-once functions over bases $B_0$ and $B^{(2)}$,
respectively, in Angluin’s learning model, i.e., with membership
and equivalence queries. We discuss these conclusions along with
open problems in section 5.
Some of the results obtained in this paper improve upon our
previous work. More specifically, the upper bound for the basis
$B_0$ can be thought of as a result of, although a direct proof was not
given there. The results on the bases $B^{(s)}$ in general and $B^{(2)}$ in
particular generalize and improve over previously known ones for
the basis $B^{(2)}$: a lower bound of the form $\Omega(n)$ with a rather
involved proof, together with a weaker upper bound, which is
only available in Russian.

2. Definitions and Notation


In this section, we give basic definitions, including that of a
certificate of non-membership, and fix some notation. We first
define terms related to certificates, and then review other, mostly
standard, concepts. All mappings of the form $g\colon \{0,1\}^n \to \{0,1,*\}$
will be called partial Boolean functions. The domain of such a
function $g$ is the inverse image $g^{-1}(\{0,1\})$, and $g$ is said to be
undefined on all input strings outside its domain. A total function
is a partial function whose domain is $\{0,1\}^n$. A total function
$f\colon \{0,1\}^n \to \{0,1\}$ is called an extension of $g$ if $f$ and $g$ agree on all
strings from the domain of $g$. Unless explicitly stated otherwise,
the term “function” will only be used to refer to total functions.
Now let $C$ be an arbitrary class of functions, and consider some
function $f$ not contained in $C$. Call a set $S \subseteq \{0,1\}^n$ a certificate of
non-membership for the function $f$ with respect to the class $C$ if for
any $n$-variable function $h \in C$ there exists a string $x \in S$ such that
$f(x) \neq h(x)$. Alternatively, consider a (unique) partial function $g$
with domain $S$ whose extension is $f$. Then $S$ is a certificate of
non-membership for $f$ with respect to $C$ if and only if $g$ has no
extensions inside $C$. The size of a certificate $S$ is its cardinality $|S|$; a
certificate is optimal if no certificate of smaller size exists.
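The definition above can be checked directly by exhaustive search when $n$ is small. The sketch below is an illustration and not the authors' method: functions are represented as Python callables on tuples of bits, the class $C$ is an explicit list of such callables, and the toy class used in the example (two constants and two projections) is a hypothetical choice made only to keep the search tiny.

```python
from itertools import combinations, product


def is_certificate(f, function_class, strings):
    """S is a certificate iff every h in the class differs from f on some x in S."""
    return all(any(f(x) != h(x) for x in strings) for h in function_class)


def smallest_certificate(f, function_class, n):
    """Brute-force search for an optimal certificate; None if f lies in the class."""
    inputs = list(product((0, 1), repeat=n))
    for size in range(len(inputs) + 1):
        for subset in combinations(inputs, size):
            if is_certificate(f, function_class, subset):
                return subset
    return None  # no set of strings separates f from the class


# Hypothetical toy class on two variables: constants 0 and 1, and projections.
toy_class = [
    lambda x: 0,
    lambda x: 1,
    lambda x: x[0],
    lambda x: x[1],
]


def conjunction(x):
    return x[0] & x[1]


certificate = smallest_certificate(conjunction, toy_class, 2)
print(certificate)
```

Here the conjunction needs three strings: (1, 1) rules out the constant 0, (1, 0) rules out the first projection, and (0, 1) rules out the second projection and the constant 1, so no smaller set can work. The search is exponential in $2^n$ and is meant only to make the definition concrete.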
Recall the setting sketched in the introduction, where Alice wants
to convince Bob that the function $f$ is not contained in the class $C$.
The smallest possible number of input strings Alice needs to point
Bob to is exactly the smallest cardinality of a certificate of
non-membership for the function $f$. Indeed, let $S$ be a certificate of the
smallest possible size. Then the values of $f$ on strings from $S$ prove
that $f \notin C$, and for any set $S'$ of input strings such that $|S'| < |S|$
there exists a function $h' \in C$ which agrees with $f$ on all strings
from $S'$.
Now let $B$ be a finite set of Boolean functions. A function $f$ is
called read-once over $B$ if it is either 0, 1, or some variable $x_i$, or if it
can be expressed as $h(f_1, \ldots, f_s)$, where, firstly, $h \in B$ and,
secondly, all $f_i$ depend on disjoint sets of variables and are
read-once over $B$. All other functions will be called read-many over $B$.
The set $B$ is usually referred to as the basis.
Let $ROF[B]$ be the set of all read-once functions over $B$, and
suppose that $f$ is a read-many function over $B$. Denote by $M_B(f)$
the smallest possible size of a certificate of non-membership for $f$
with respect to $ROF[B]$. By $M_B(n)$ we denote the maximum of
$M_B(f)$ over all read-many functions $f\colon \{0, 1\}^n \to \{0, 1\}$. The value
$M_B(f)$ captures the size of an optimal certificate for a specific
function $f$, and the value $M_B(n)$ provides a tight upper bound on
the optimal certificate size for all $n$-variable (read-many) functions.
In this paper, we are primarily interested in obtaining lower and
upper bounds on $M_B(f)$ and $M_B(n)$.
Remark 2.1. Hellerstein et al. introduced the following definition
to characterize polynomial-query learnable classes. A
representation class for a Boolean concept class $C$ is said to have
polynomial certificates if there exist two-variable polynomials $p$
and $q$ with the following property: for all $m$, $n$ and for all
$n$-variable functions $f$ of (minimal representation) size greater than
$p(m, n)$ there exists a set $Q$ of input strings such that, first,
$|Q| \le q(m, n)$ and, second, for any $n$-variable function $h$ from $C$ of
size $m$ or less there exists a string $x \in Q$ such that $f(x) \ne h(x)$. It is
implied that non-representable functions have infinite size.
One can formally check that any class $C$ (or, more precisely, a
representation class for $C$) has polynomial certificates, as defined
by Hellerstein et al., with $p(m, n)$ set, for all $m$, to the largest
possible (representation) size of an $n$-variable function in $C$ and
with some polynomial $q(m, n)$ if and only if every function $f \notin C$
has a certificate of non-membership of size $q(p(m, n), n)$ with
respect to $C$. Therefore, the fact that (certain representation) classes
of read-once functions have polynomial certificates implies that
the values $M_B(n)$ are bounded from above by polynomials in $n$. In
this paper, we strengthen this conclusion and strive to obtain tight
bounds on $M_B(n)$.
As mentioned in the introduction, a different characterization of
polynomial-time learnable classes, which turns out to be more
closely connected to our work, was independently obtained by
Hegedus. It follows from his results that for an arbitrary basis $B$,
the query complexity $L_B(n)$ of learning the class $ROF[B]$ with
membership and equivalence queries satisfies the inequality

$$M_B(n) \le L_B(n) \le \frac{2 M'_B(n)}{\log_2 M'_B(n)} \cdot \log_2 |ROF[B]|,$$

where the value $M'_B(n)$ is called the unique specification
dimension and satisfies $|M'_B(n) - M_B(n)| \le 1$. We discuss the
implications in more detail in the concluding section 5.
We shall sometimes appeal to rooted trees representing read-once
functions. Leaves of these trees are labeled with variables (we shall
not use 0 and 1 here) without repetitions, and non-leaf nodes with
functions from 𝐵. Basically, any such tree is a Boolean formula
with function symbols (gate operations) from 𝐵.
We call functions $f(x_1, \dots, x_n)$ and $g(x_1, \dots, x_n)$ similar if there
exist constant values $\sigma, \sigma_1, \dots, \sigma_n$ from $\{0, 1\}$ and a permutation $\pi$
on $\{1, \dots, n\}$ such that
$f(x_1, \dots, x_n) = g^{\sigma}(x_{\pi(1)}^{\sigma_1}, \dots, x_{\pi(n)}^{\sigma_n})$, where $z^{\tau}$
stands for $z$ if $\tau = 1$ and for $\bar{z}$ if $\tau = 0$. We shall sometimes use
this concept for partial functions, in which case it is understood
that $\bar{*} = *$.
As usual, a function $f(x_1, \dots, x_n)$ is called monotone if the
inequalities $\alpha_i \le \beta_i$, $i = 1, \dots, n$, imply that $f(\alpha) \le f(\beta)$ (here
$\alpha_i$ and $\beta_i$ are the $i$th bits of $\alpha$ and $\beta$, respectively). A function is called
unate if it is similar to some monotone function.
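
Both properties can be checked by brute force for small n. The sketch below is illustrative only; the helper names `is_monotone` and `is_unate` are assumptions of this example. It tries every choice of negated inputs and does not negate the output: the negation of a monotone function can itself be written as a monotone function of the negated inputs, so output negation should add nothing, and permuting variables does not affect monotonicity.

```python
from itertools import product

def is_monotone(f, n):
    """Check f(a) <= f(b) for every pair of strings with a <= b bitwise."""
    pts = list(product((0, 1), repeat=n))
    return all(f(a) <= f(b)
               for a in pts for b in pts
               if all(ai <= bi for ai, bi in zip(a, b)))

def is_unate(f, n):
    """Check whether negating some subset of the inputs makes f monotone."""
    for flips in product((0, 1), repeat=n):
        def g(x, flips=flips):
            # Negate exactly the inputs marked with 1 in `flips`.
            return f(tuple(xi ^ fi for xi, fi in zip(x, flips)))
        if is_monotone(g, n):
            return True
    return False

f_and = lambda x: x[0] & x[1]          # monotone
f_mixed = lambda x: x[0] & (1 - x[1])  # not monotone, but unate
f_xor = lambda x: x[0] ^ x[1]          # not unate
```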
A function $h$ is a projection of another function $f$ if $h$ can be
obtained from $f$ by substituting constants for some $k \ge 0$ input
variables. The projection obtained from $f$ by substituting $\sigma$ for $x_i$ is
denoted $f^{x_i}_{\sigma}$. A variable $x_i$ is called relevant to $f$ if
$f^{x_i}_0 \not\equiv f^{x_i}_1$, that is, if
these two projections disagree on at least one input string. The
function $f$ is then said to depend on the variable $x_i$.
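
As a small illustration (a sketch with hypothetical helper names `projection` and `is_relevant`, using 0-based variable indices), relevance of a variable can be tested by comparing the two projections on every input string:

```python
from itertools import product

def projection(f, i, sigma):
    """Return f with the constant sigma substituted for the variable at index i.

    The result still accepts n-bit tuples; the bit at position i is ignored.
    """
    return lambda x: f(x[:i] + (sigma,) + x[i + 1:])

def is_relevant(f, n, i):
    """True if the two projections of f on variable i differ somewhere."""
    f0, f1 = projection(f, i, 0), projection(f, i, 1)
    return any(f0(x) != f1(x) for x in product((0, 1), repeat=n))

# Example: a 3-variable function that ignores its middle variable.
f = lambda x: x[0] ^ x[2]
relevant = [i for i in range(3) if is_relevant(f, 3, i)]
```

Here `relevant` lists the indices 0 and 2, since the example function depends on its first and third variables only.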
We shall usually write input strings as $\alpha = (\alpha_1, \dots, \alpha_n)$, where
$\alpha_i \in \{0, 1\}$, but sometimes use a comma to denote concatenation, as
in $f(x_1, \dots, x_n)$ and $g(x_1, \beta)$, where $x_i \in \{0, 1\}$ and $\beta \in \{0, 1\}^{n-1}$.
We write $0$ and $1$ to denote strings $(0 \dots 0)$ and $(1 \dots 1)$,
respectively, and $e_i$ to denote the string with all 0s and a unique
1 in the $i$th position. The length of the string is in these cases
understood from the context.
The sign ⊕ denotes the binary sum modulo two function (parity,
XOR). When applied to strings, the sum is calculated component-
wise. Boolean conjunction is denoted by the & sign and by
juxtaposition.

VOCABULARY STUDY AND PRACTICE

1. Complete the second sentence so that it has a similar meaning
to the first sentence, using the word given. Do not change the
word given.
1. We are sure that the new reforms will successfully reduce
unemployment.
BOUND
We think that the new reforms __________________ in reducing
unemployment.

2. Philip picked up the working laptop gently, because he didn’t
want to stop the checking process.
TO
Philip picked up the working laptop gently so _________________ the
checking process.
3. The original plan was to finish the project by early spring.
HAVE
In the original plan, the project _________________ by early spring.
4. There don’t seem to be quite as many visitors at the annual IT
exposition this year.
SLIGHT
There seems to have been ___________________ in the number of
visitors at the annual IT-exposition this year.
5. Paul regretted not going to the party.
WISHED
Paul _____________ the party.
6. Despite the improvement in the company’s performance, it is still
not in the top three for consultancy.
LED
The improvement in the company’s performance
____________________ in the top three for consultancy.
7. Fewer people use ICQ messenger nowadays preferring Skype
or Viber.
COMMON
It is _____________________ to use ICQ messenger nowadays
rather than Skype or Viber.
8. Could I borrow your cellphone just for one call, please?
LEND
Would _____________ for one call, please?
9. Charles is a supporter of several charity organizations.
NUMBER
Charles gives __________________ charity organizations.

10. It’s possible that this manual might help you in your PhD thesis
research.
HELPFUL
You ____________________ for your PhD thesis research.

2. Complete the sentences by writing a form of the word in
capitals in each space.
LinkedIn has been described as having
become a tool for __________ networking. PROFESSION
It has also been praised for its __________ in USE
fostering business relationships.
It is said to be the most __________ social ADVANTAGE
networking tool available to job seekers and
business professionals today.
LinkedIn has also received criticism,
primarily regarding e-mail address mining
and auto-_____________. DATE
The sign up process includes a step for
entering your email _______________. WORD
LinkedIn will then offer to send out contact
______ to all members in your address book. INVITE
Up to 1500 invitations can then be sent out in
one click, with no possibility to undo or
______________ them. DRAW
Changing the __________ below a member's DESCRIBE
name is seen as a change in a job title.
______________ a member opts to "turn off LESS
activity updates", an update is sent to all of
that person's contacts.
But nevertheless LinkedIn is considered to
be the most __________ business-oriented PROPER
social networking service today.

GRAMMAR PRACTICE
Inversion

Model 1
Had we applied new highly efficient equipment, we would have decreased
the production cost.
Если бы мы использовали новое высокоэффективное
оборудование, мы бы снизили себестоимость.
Model 2
In Table 3 are given the results of a new experiment.
В Таблице 3 представлены результаты нового эксперимента.
Model 3
Important for this result is temperature.
Для этого результата важным фактором является температура.
Model 4
Though, although, only, never, rarely, hardly… when, neither… nor
Never in this case will speed remain constant.
Никогда в этом случае скорость не будет оставаться
постоянной.

Translate the following sentences into Russian taking into
account the models:
1. To the sophisticated eye today, software components of the
late 80s appear primitive.
2. Hardly had economic depression gripped the world when
the stock market crashed the machine-building industry.
3. Closely related to the problem is the problem of encoding.
4. Faraday was no mathematician, nor was Hamilton much of
a physicist.

Noun Chains

Model
Read Only Memory is a permanent memory chip for program storage.

Постоянное запоминающее устройство - это чип постоянной
памяти для хранения программ.

Translate the following terms into Russian taking into account
the model:
1. High speed generator
2. Access control system
3. Service and component based development
4. Information security risk management
5. Wireless sensor network architecture
6. IBM WebSphere Portal primer
7. Storage area network design
8. Computer-aided design, engineering and manufacturing systems
9. Geographic information system implementation
10. File compression utility

“Once”: Time

Model
Once the execution of a command has been initiated, the indication of the
neon bulb can be seen on the control panel.
Как только выполнение команды инициируется, индикацию
неоновой лампы можно увидеть на контрольной панели.

Translate the following sentences into Russian taking into
account the model:
1. Once set, the condition code remains unchanged until
modified by an instruction that reflects a different
condition.
2. This method is called a compiler implementation, and has
the advantage of very fast program execution, once the
translation process is complete.
3. In most cases once a user has entered a person’s e-mail
address into the “address book”, e-mail can be sent with a
few clicks of the mouse.

4. Once the general approach is settled on, the design must be
implemented.

“As”: Time

Model
As programs are loaded, space is “carved out”, using only the space
needed to accommodate the program and leaving a new, smaller empty
partition, which may be used by another program later.
По мере того, как программы загружаются, пространство
«выделяется», при этом используется только пространство,
необходимое для размещения программы, и остается новая,
меньшая пустая область, которая может быть использована
другой программой позже.

Since/ as/ for/ once in adverbial clauses

“Since”: Reason

Model
Since most of the memory is volatile and limited, it is essential that there
be other types of storage devices where programs and data can be stored
when they are no longer being processed.
Поскольку большая часть памяти является энергозависимой
и ограниченной, важно, чтобы существовали другие типы
устройств памяти для хранения программ и данных, не
обрабатывающихся в данный момент.

“As”: Reason

Model
As the ordinary adding machine has the special equipment inside, it can
store information.
Поскольку/так как обычное счетно-решающее устройство
имеет специальное оборудование внутри, оно может хранить
информацию.

Translate the following sentences into Russian taking into
account the models:
1. E-mail can be defined as the sending of a message to one or
more individuals via a computer connection.
2. As e-mail use increased and new features were developed,
the question of a standardized protocol for messages
became more important.
3. As more companies begin to use e-mail for providing
routine bills and statements, government-run postal
systems are seeing the first-class mail revenue drop
considerably.

Emphatic Construction

Model
It was P. L. Chebyshev who in 1882 invented in Russia the first
arithmometer that automatically performed multiplication and division.
Именно П.Л. Чебышев в России в 1882 году изобрел первый
арифмометр, выполняющий автоматически умножение и
деление.

Translate the following sentences into Russian taking into
account the model:
1. However, we believe it is the bounty of services that will
ultimately demonstrate the potential of digital libraries.
2. It is this very phenomenon that is of interest to us.
3. It was that result which stimulated us to continue the
investigation.
4. It was Prof. Blacksmith who demonstrated this equipment
for the first time.
5. It was a new kind of technology that permitted higher
operation speed.

Provided, providing

Model
The process could be repeated, providing we wanted to receive the final
result.
Процесс мог быть повторен в случае, если мы хотели получить
конечный результат.

Translate the following sentences into Russian taking into
account the model:
1. A programmer can operate a computer provided he has the
proper training.
2. Any main storage location provided in the system can be
used to transfer data to or from an I/O device, provided that
during an output operation the location is not protected.

SUPPLEMENTARY MATERIALS

PART I. LISTENING COMPREHENSION SKILLS

INFORMATION SOCIETY
(4.51) (Intermediate)
https://learnenglish.britishcouncil.org/en/magazine/information-society

I. Listen to the recording and answer the questions.


1. How were societies organized in the past?
2. What is becoming more and more important nowadays?
3. What opportunities does the growth in
telecommunications give?
4. Does everyone have access to information?
5. What is the ‘digital divide’?
6. What does the spread of technology mean?
7. Who controls the flow of information?

II. Listen to the text again and find the synonyms to the
following words.

Essential; basis; evolve; concern; transmit; pass; unite; represent;
stay.

III. Answer the questions raised in the text.


1. Does only good come with freedom of information?
2. If information is power, why will people share it?
3. How can the exchange of information keep local cultures
alive if most of that information is only in one language?

CRYPTOLOGY. PART I.
(3.17) (Intermediate)
https://learnenglish.britishcouncil.org/en/magazine/cryptology

I. Listen to the recording and answer the questions.

1. How old are secret codes?
2. What was the reason for creating secret codes?
3. Who invented the Enigma code? When was it invented?
4. What was so special about the Enigma code?
5. Who tried to break the Enigma code?
6. Why was it so important to break the Enigma code?
7. Who succeeded in breaking the Enigma code?
8. What influence did code-breaking have on the history of
Europe?

II. Listen to the text again and find the synonyms to the
following words.

Essential, seize, conquer, sophisticated, great, awesome,
worry, encounter, perform, enhance.

CRYPTOLOGY. PART II.
(3.17 – 4.36) (Intermediate)

Listen to the recording and fill in the gaps.

From code-breaking to computer-building


Turing _______________ working with machines and electronics
and in 1944 he talked about ‘building a brain’. Turing had an idea
for an electronic ‘_______________ machine’ that could do any
logical task. _____________ after the war, he went to work at
Manchester University and in ___________ the ‘Manchester Baby’
was born. It was Turing’s second great ______________ and the
world’s first _____________ computer. When he sent a message
from his computer to a telex machine, Alan Turing wrote the first
e-mail in history. So, what ________________ next in the life of this
highly talented man? His great _________________ in code-
breaking and computing happened in his twenties and thirties. He
was still a young man - in the same year that his computer
________________ for the first time, he nearly ran in the Olympic
Games for Britain. We know that he had many ideas to
________________ in digital computing, ______________ physics,
biology and philosophy. Sadly, he _______ _________ to work fully
on these ideas. Turing’s personal life became more and more
problematic.

ARTIFICIAL INTELLIGENCE
(5.38) (Upper-intermediate)
http://www.bbc.co.uk/programmes/b05372sx

I. Listen to the recording and answer the questions.

1. What did Professor Stephen Hawking say in his interview?
2. What is the difference between Full Artificial Intelligence
and Narrow Artificial Intelligence?
3. Are the fears about AI new? What film brought these fears
to life?
4. What possible risk does AI have according to Swedish
philosopher Nick Bostrom?
5. What example of such a risk does Professor Bostrom give?
6. How can we replicate the human brain?

II. What is your attitude to AI? Is it possible to build a Full
Artificial Intelligence? Answer these questions and write
an essay.

ALGORITHMIC SPECIFIED COMPLEXITY
(6.30) (Advanced)
http://www.discovery.org/multimedia/audio/2016/02/robert-marks-and-winston-ewert-newly-published-papers-on-information/

I. Listen to the recording and answer the questions.

1. What is specified complexity?
2. What are the examples of specified complexity?
3. What is the difference between Dr. Ewert’s and Dr.
Dembski’s models?
4. What is Kolmogorov Complexity?

II. Listen to the text again and then retell it.

QUANTUM COMPUTING
(6.30) (Advanced)
http://www.pri.org/stories/2016-03-29/scientists-just-created-holy-grail-computing-first-quantum-computer

I. Listen to the recording and answer the questions.

1. What tasks can a quantum computer solve?
2. What quantum computers were scientists able to build?
3. How does this computer work?
4. What is Shor's algorithm?
5. Why is a quantum computer ‘a holy grail’ of computing?
6. What is the difference between a digital computer and a
quantum computer?
7. What quantum mechanical principles does a quantum
computer take advantage of?

II. Listen to the text again and then retell it.

PART II. DOCUMENTARIES
THE GREAT MATH MYSTERY

1. Warm-up tasks
1. What field of mathematics do you consider to be the most
interesting/dull?
2. What is the most ground-breaking discovery in
mathematics you know?
3. Give some examples of how mathematics helps us in our
everyday life.
4. What is called the language of the universe?
5. Choose the most interesting topic (word) from the list and
speak on it for 1 minute: a car-size rover/
mathematician/to advance research/ constellation/
electromagnetic waves/cell phone/ random numbers/
evolution/ sequence/probability theory/ the software/ the
law of physics
6. Rank these with your partner. Put the things most likely to
spell the end of the human race at the top.
Cell phone; artificial intelligence; electromagnetic waves;
mathematical techniques; software; nuclear war; a virus;
overpopulation
7. Spend one minute writing down all of the different words
you associate with the word COMPUTATION.

2. Translate into Russian.


1. Physicists probe the essence of all matter, while we
communicate wirelessly on a vast worldwide network.
2. There doesn’t really seem to be an upper limit to the
numerical abilities of animals.
3. Major funding for “The Great Math Mystery” is provided
by working to advance research in the basic sciences and
mathematics.

4. Eons ago, we gazed at the stars and discovered patterns we
call constellations, even coming to believe they might
control our destiny.
5. While most of those claims remain unproven, it is curious
how evolution seems to favor these numbers.
6. But somehow, pi is a whole lot more.
7. Since pi relates a round object, a circle, with a straight one,
its diameter, it can show up in the strangest of places.
8. Pi is but one example of a vast interconnected web of
mathematics that seems to reveal an often hidden and deep
order to our world.
9. That's exactly what I perceive in this reality, too, as a
physicist, that the closer I look at things that seem non-
mathematical, like my arm here and my hand, the more
mathematical it turns out to be.
10. While the universe is vast in its size and complexity,
requiring an unbelievably large collection of numbers to
describe it, Max sees its underlying mathematical structure
as surprisingly simple.
11. The part that I enjoy about math I get to experience
through music, too.
12. In the sixth century BCE, the Greek philosopher
Pythagoras is said to have discovered that those beautiful
musical relationships were also beautiful mathematical
relationships by measuring the lengths of the vibrating
strings.
13. Seeing a common pattern throughout sound, that could be
a big eye opener of saying…

3. Guess the answers. Watch to check.


1. Where do Fibonacci numbers appear a lot?
2. Can you give the definition of Pi? Where can it show up?
3. Can you recollect the names of all the scientists mentioned
in the film?

4. Many physicists say that mathematics describes our
physical reality at least in some approximate sense. Can
you prove it?
5. What did Greek philosopher and mystic Pythagoras
explore?

4. Find English equivalents of the following expressions:


1. суть всей материи
2. беспроводная связь
3. всемирная паутина
4. математические методы
5. случайные числа
6. последовательность
7. теория вероятности
8. законы физики
9. изобретения и открытия
10. разумный подход
11. выделять
12. перспективные исследования
13. инновационное открытие

5. Explain and find extra information about the following.

1. Our physical reality is a bit like a digital photograph…
2. I am really confident that what will go here will be
mathematical equations.
3. … but it has deep roots in history… going back to ancient
Greece…
4. In the sixth century BCE, the Greek philosopher
Pythagoras is said to have discovered…
5. … that could be a big eye opener of saying…
6. The stable cube was earth…

6. Look at the words below. With your partner, try to recall how
they were used in the documentary (reproduce the context).
random numbers; Fibonacci sequence; wireless communication;
Large Hadron Collider; a subatomic particle
7. Multiple choice.

1. Isaac Newton worked
   1. at Trinity College in Cambridge, England
   2. for NASA’s Jet Propulsion Lab
   3. as a speaker of the British Parliament
2. Newton cultivated the reputation of being a solitary genius
because he
   1. didn’t go to the theatre
   2. was afraid of students
   3. would walk meditatively up and down the paths,
      drawing mathematical diagrams.
3. Adam Steltzner was the lead engineer on the team that
   1. designed the landing system
   2. designed the bicycle
   3. first landed on the Moon
   4. built a ramp

8. Listen and fill in the gaps:

In Ancient Greece, Pythagoras and his ___________1 had a
___________ 2 on another Greek philosopher, Plato, whose ideas
also ___________3 to this day, especially among
mathematicians. Plato believed that geometry and mathematics
exist in their own ideal world. So when we draw a circle on a piece
of paper, this is not the real circle. The real circle is in that world,
and this is just an ___________4 of that real circle, and the same
with all other shapes. And Plato liked very much these five solids,
the platonic solids we call them today, and he ___________ 5 each
one of them to one of the elements that formed the world as he
saw it.

CODE-BREAKERS: BLETCHLEY PARK'S LOST HEROES

1. Warm-up tasks
1. What names and stories do you associate with the name of
Bletchley Park? Can you recall any movies based on the
history of the place?
2. Mathematics and War: suggest possible links between the
two.
3. Which adjectives are likely to be used to describe a
mathematician?
4. Spend one minute writing any words and phrases
associated with the word ‘code’ (synonyms, derivatives etc.)
5. In pairs/groups, talk about these topics or words from the
documentary. What is their likely context in the film?
Military intelligence, pattern, insight, effort, feat, an Achilles
heel, trust, vulnerable.

2. Translate into Russian:


1. This is a British mathematician called Bill Tutte. You won't
have heard of him.
2. He died in 2002 without ever being officially recognized for
his achievement.
3. The secrecy about Tunny and Colossus has completely
distorted the history of computing and it's also left the
story of the World War Two codebreaking effort
incomplete.
4. … this sprawling complex was home to a clandestine army
engaged in a shadowy struggle for military intelligence.
5. In its heyday, the place was really buzzing.
6. So increasingly, they came to rely upon radio technology…
7. …the British radio engineers were considerably more
advanced than their German counterparts…
8. So, the vast volumes of information needed to fight a
modern war at that time would simply have overwhelmed
a system based upon using an Enigma machine.

9. And if you looked at that set of impulses in terms of the
standard international teleprinter code…
10. But they (...) were the lifeblood of the German command,
feeding out to the furthest fingertips of the Third Reich's
reach.
11. I imagine he might have been frustrated at the school itself
in that he was apparently so much ahead of all the other
pupils so he would've been a bit isolated in that respect…
12. Bletchley's habit of raiding the best academic talent.
13. …that was the best thing that could have happened to Bill.
14. Now, it might be argued that that could have occurred by
chance, but it's very unlikely…
15. I can still remember him staring into the middle distance
and making counts on reams and reams of paper.
16. From the beginning, the Nazis were in the impossible
position of having to trust these machines.

4. Find Russian equivalents of the following expressions:


Intellectual feat, a former GPO engineer, the course of the war, as
the war progressed, heyday, intercept, deliver victory, a tightly
guarded secret, preceded, cipher machine, encipher, to pick up,
wordsmiths, the throughput of information, plaintext, apparent
randomness, pseudo-random character, to crack a code, utterly
unimaginable, keen intelligence, gained a scholarship, the cream of
the cryptographic people, invincible Tunny code, sloppy, despatch
rider, perseverance, to decompose, to hand-break and unravel the
transmission, a phenomenal piece of decryption, work out the
system, an act of desperation, disparagingly, applied brute mental
force, the chief operating officer, massive surge, major assault, a
pincer attack, forewarned and forearmed, handicap,
apprenticeship, valve tubes, flaky kind of devices, worked until
their eyes dropped out, chi wheels, right-hand men, to play Hitler
like a fish on a line, ill-equipped, pretty jolly important, (secrecy
would) remain intact.

5. Watch the documentary and answer the questions
1. What was a “not-Enigma” machine?
2. Name the feats of the three heroes of Bletchley Park
(respectively).
3. What was the nickname of Bletchley Park? What is the
story behind the nickname?
4. What is described as the toughest and most rewarding
struggle?
5. Why was wireless rather than wired communication used
during the Second World War?
6. What was the German for “Tunny”?
7. How was the internal wiring changed within the Enigma?
8. Which part of the process of transmission is more
important for cryptographers?
9. Why did the authorities (in Bletchley Park) begin to
recruit mathematicians?
10. In what ways was the Lorenz more sophisticated than the
Enigma?
11. What were the early years of Bill Tutte like?
12. What was the “gift” the Germans happened to give
code-breakers in Britain?
13. How did the Tunny decrypt influence the Kursk battle?
14. What were the two major contributions of William Tutte?
15. What kind of a person was Tommy Flowers?
16. What improvements over Robinson did Flowers propose?
17. How did the Bletchley Park departments split the job of
decoding Tunny?
18. Why did the people operating Colossus have to wear
Wellington boots?
19. What were the two contributions that Tunny decrypt made
to the success of the D-Day landing? What details became
known thanks to it?
20. What is said about Russian Cold War communication?
What did the Russians allegedly borrow from the
Germans?

6. What is the plural of Colossus?

7. Explain and find extra information about the following:


1. It was Hitler's Blackberry really.
2. An elite group known as "The Testery"
3. Or as the Allies called it, "Tunny".
4. You need depth to break any cipher.
5. Using Tutte's insight and a method known as Turingery
6. Max's department, called the Newmanry

8. Listen and fill in the gaps:

This is Colossus. And what it did was, you took the ___________1
cipher text, on a lot of paper tape. Five ________ 2 code there. And
that is received by us on our radio station, ___________ 3 on a
paper tape, and loaded on to this part of Colossus here, called the
___________4. That's the part of Colossus that holds the
intercepted cipher _________ 5, and that is joined into a ________6,
and being read continuously. And that is being read at 5,000
characters per second. That's the data going into Colossus. They
put the results of those __________7 up on to a lamp panel here,
and here are the results of a particular run. So this is ________ 8
every time the tape goes round one continuous cycle.

QUANTUM COMPUTER IN A NUTSHELL

1. Warm-up tasks:
1. What do you associate with quantum computing?
2. Choose the most interesting topic (word) from the list and
speak on it for one minute:
Knowledge deepening, development, mysteries of the
universe, crossing a new threshold of scientific knowledge,
computer development, quantum, superconducting
materials, turning points, the laws of nature, humanity.

2. Translate into Russian:


1. Our drive to explore has opened the door to new
possibilities to improve our quality of life.
2. The premise behind Feynman’s model rested in the
conviction that it would be impossible to conduct the
simulation of a quantum system with the use of a classic
computer.
3. He based his reasoning on the laws of nature.
4. Although it’s not possible to describe this particular feature
through the use of classical mechanics, it can be likened to
a magnetic bar capable of deviations.
5. However, beyond this point, all similarity ends.
6. The advantage of quantum computing mainly rests in the
quantum mechanical feature thanks to which an
elementary particle can be in multiple states
simultaneously.
7. Working with qubits provides us with the incredible new
possibilities for the effective processing of databases,
beyond what we could have ever before imagined.
8. The possibility of actually developing such a system for
practical applications is not readily conceivable.
9. Each elementary particle is subject to wave-particle duality,
meaning that sometimes it behaves like a particle, and
other times, it behaves like a wave.

10. Such evolution of entanglement and mutual decoherence
may be analyzed and controlled in time, which allows for
the processing of information in a completely new way.
11. Aside from nuclear magnetic resonance, other solutions
and phenomena may be used to create a quantum
computer.
12. Regardless of the method used, the goal is to achieve the
capability to control quantum states in such a way that it
would be possible to program the computer, perform the
calculations, and finally, read the desired result.
13. In light of the many positive and interesting results of the
research on the control of quantum states, the team of
Australian researchers, led by Michelle Simmons, has
garnered worldwide recognition.
14. This type of electron detachment from the atom is
equivalent to a particular direction of spin corresponding
to the number “1” in binary notation.
15. Quantum tunneling is a unique phenomenon which allows
the particles of the micro-world to cross the walls, contrary
to the law of conservation of energy.
16. Nevertheless, there are many people who have risen to the
challenge.
17. Most notably, since its creation Shor’s algorithm has
generated a great deal of discussion among the scientific
community, as it could be used to break the modern
encryption keys such as RSA.
18. In order to find a given telephone number, you would have
to search through each and every listing, which would
undoubtedly be cumbersome and time-consuming.
19. The more times the computer performs the calculations, the
more likely it is to find the proper solutions to the problem.

3. Give Russian equivalents of:


To contemplate, to be driven by the conviction, to be confronted
with, to be within arm’s reach, as of 2014, internal angular
momentum of the particle, to be likened to a magnetic bar, an
exemplary particle, at the most, due to the phenomenon of
superposition, to be subject to wave-particle duality, susceptible
quantum information, entanglement, a leakage of information, to
propel into, to garner worldwide recognition, pulse voltage,
dangling bond, molecular beam epitaxy, turning points, with
respect to, pendulum, to reach one's culminating point, the flow of
electric charge, electron tunneling, sensitive measurements,
adiabatic quantum computer, the intensity of magnetic field, to
take precedence over, the (counter)clockwise-flowing current,
quantum annealing, to be derived from, quantum tunneling, to
pose allegations, counterintuitive laws of quantum mechanics, a
new rung of possibilities.

4. Watch the documentary and answer the questions


1. Who was the first to propose the idea of quantum computing?
2. What is considered to be a key moment in the development of
quantum computer theory?
3. Which is the most powerful commercially available processor,
as of 2014? How many transistors does it possess?
4. What is described in zeroes and ones in quantum computing?
5. How can we describe quantum states using the binary
system?
6. Describe the phenomenon of superposition.
7. How is the advantage of working with qubits illustrated?
8. What is one of the biggest problems faced by scientists
working to develop quantum computers?
9. Why is it essential to isolate and cool the quantum computer
processor?
10. What other solutions and phenomena may be used to create a
quantum computer aside from nuclear magnetic resonance?
11. When and where was the first single-atom silicon transistor
created?
12. Describe electron tunneling.
13. What can open up a new door in the world of quantum
computing?

14. What paved the way to the idea of building a quantum
computer system?
15. What does SQUID stand for? What are SQUIDs used for?
16. What allows quantum uniqueness to take precedence over the
classic principles of physics?
17. What do zeroes and ones describe in D-Wave's computer
processors?
18. What is quantum annealing?
19. Describe quantum tunneling.
20. In what areas is the D-Wave Two computer used by the AI
laboratory researchers?
21. Creating quantum algorithms is a very difficult task. Why?
22. Name the most well-known quantum algorithms.
23. What perspective do quantum computers provide us with?

5. Find extra information about the following:


1. Schrödinger's equation
2. Bose-Einstein condensate
3. STM technique
4. Josephson junction
5. The Meissner effect

6. Listen and fill in the gaps:

The role of a quantum computer is ________ 1 in capturing what is
beyond the ________ 2 imposed by time and energy needs.
Perhaps, in the not so distant future, we will be able to ________ 3
to a new rung of possibilities, such as the creation of new drugs,
________ 4 in research on climate change, and the development of
new ________ 5. It is the hope that these new discoveries will
provide us with a ________ 6 of the structure of the reality that
________ 7 us. And all of this thanks to the ________ 8 and the
desire to ________ 9, which defines humanity.

REFERENCES

1. A. Lew, H. Mauch, Dynamic Programming: a Computational
Tool, Springer, 2007
2. M. Pinedo, Scheduling: Theory, Algorithms and Systems,
Springer, 2008
3. H. Becker, L. Albera, P. Comon, Brain-Source Imaging, Signal
Processing Magazine, 2015
4. C. Laing, Spiral Waves in Nonlocal Equations, Applied
Dynamical Systems, 2005
5. C. Cassandras, S. Lafortune, Introduction to Discrete Event
Systems, Springer, 2010
6. F. Chang, J. Dean, S. Ghemawat, Bigtable: a Distributed Storage
System for Structured Data, ACM Transactions on Computer
Systems, Volume 26 Issue 2, 2008
7. P. Domingos, A Few Useful Things to Know about Machine
Learning, Communications of the ACM, Volume 55, Issue 10,
2012
8. M. Zaharia, M. Chowdhury, T. Das, Resilient Distributed Datasets:
A Fault-Tolerant Abstraction for In-Memory Cluster Computing,
USENIX, 2012
9. Case Study: From Safety Performance to EcoBoost Technology:
HPC Enables Innovation and Productivity at Ford Motor
Company, 2010
10. J. Dongarra, P. Beckman, T. Moore, P. Aerts, The International
Exascale Software Project Roadmap, International Journal of
High Performance Computing Applications, Volume 25 Issue 1,
2011
11. S. Keshav, How to Read a Paper, Computer Communication
Review, 2007
12. D. Chistikov, V. Fedorova, A. Voronenko, Certificates of Non-
Membership for Classes of Read-Once Functions, Fundamenta
Informaticae, 201
