
The Fortress Programming Language

Presented by Nimrod Partush. Based on various presentations by the Fortress team.

Agenda

Goal & context
Fortress motivation
Main ideas and features
  Focusing on parallelism
State of the art

Context

Improving programmer productivity for scientific and engineering applications

Research funded in part by DARPA through their High Productivity Computing Systems (HPCS) program
  Alongside IBM and Cray

Goal: economically viable technologies for both government and industrial applications by the year 2010 and beyond

The Goal of Fortress

To boldly go where no programming language has gone before!

No.

To seek out great programming language design ideas and make them our own

And let it grow

Motivation: To Do for Fortran What Java™ Did for C

Catch "stupid" mistakes
  array bounds errors
  garbage collection
Scientific-centered computation
Extensive libraries (e.g., for the network environment)
Security model (including type safety)
Dynamic compilation
Platform independence
Parallelism

Growing A Language

Wherever possible, consider whether a proposed language feature can be provided by a library rather than having it wired into the compiler

Main Ideas

Contracts
OOP
Mathematical syntax
Parallelism
Generators & reducers
Transactional memory

Contracts

Part of the function declaration
Better documentation for clients
An exception is thrown on violation

  factorial(var n: ℤ32)
    requires { n ≥ 0 }
    ensures { result ≥ 0, result ≥ 2 provided n ≥ 2 }
    invariant { n }
  = if n = 0 then 1 else n factorial(n-1) end

OOP: Objects and Traits

Multiple vs. single inheritance
  Single inheritance is limiting
  Multiple inheritance is complicated
Java works around this by having single inheritance augmented by interfaces
In Fortress: traits and objects
Parametric polymorphism also exists (not covered in this talk)

OOP: Objects and Traits

Traits
  Like Java's interfaces
  Multiple-inheritance tree
  Only methods
  May suggest a concrete implementation; otherwise abstract

  trait Moving extends { Tangible, Object }
    position(): ℝ3
    velocity(): ℝ3
  end

  trait Fast extends Moving
    velocity() = [0, 0, 9999999]
  end

Objects
  Like Java's classes
  Fields and methods
  Constructor arguments are implicit fields
  Must implement inherited abstract methods
  Leaves of the inheritance tree
  In-language support for singletons

  object Particle(position: ℝ3, velocity: ℝ3) extends Moving
    mass = 0 gram
  end

  object Sun extends { Moving, Stellar }
    temperature = 5800 kelvin
    position() = [0, 0, 0]
    velocity() = [0, 0, 0]
  end

Mathematical Notation

Math is old, concise, convenient, and widely taught

Make the programming language closer to mathematical notation

Implicitly parallel computation

What you write on your whiteboard works

Juxtaposition parsing presents an interesting research question
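
For example, juxtaposition denotes multiplication, so a whiteboard polynomial types in almost verbatim, and big operators such as ∑ are ordinary library code. A sketch (a, b, c, x, u, v, and n are assumed to be in scope):

  y = a x² + b x + c
  dot = ∑[i←1#n] u[i] v[i]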

Mathematical Notation

The NAS (NASA Advanced Supercomputing) Kernel CG benchmark algorithm: a conjugate-gradient computation typical of the irregular long-distance communication done in grid computing.

Mathematical Notation

Which would you rather code?

The Fortran version (from the NAS CG reference implementation):

      do j=1,naa+1
         q(j) = 0.0d0
         z(j) = 0.0d0
         r(j) = x(j)
         p(j) = r(j)
         w(j) = 0.0d0
      enddo
      sum = 0.0d0
      do j=1,lastcol-firstcol+1
         sum = sum + r(j)*r(j)
      enddo
      rho = sum
      do cgit = 1,cgitmax
         do j=1,lastrow-firstrow+1
            sum = 0.d0
            do k=rowstr(j),rowstr(j+1)-1
               sum = sum + a(k)*p(colidx(k))
            enddo
            w(j) = sum
         enddo
         do j=1,lastcol-firstcol+1
            q(j) = w(j)
         enddo
         do j=1,lastcol-firstcol+1
            w(j) = 0.0d0
         enddo
         sum = 0.0d0
         do j=1,lastcol-firstcol+1
            sum = sum + p(j)*q(j)
         enddo
         d = sum
         alpha = rho / d
         rho0 = rho
         do j=1,lastcol-firstcol+1
            z(j) = z(j) + alpha*p(j)
            r(j) = r(j) - alpha*q(j)
         enddo
         sum = 0.0d0
         do j=1,lastcol-firstcol+1
            sum = sum + r(j)*r(j)
         enddo
         rho = sum
         beta = rho / rho0
         do j=1,lastcol-firstcol+1
            p(j) = r(j) + beta*p(j)
         enddo
      enddo
      do j=1,lastrow-firstrow+1
         sum = 0.d0
         do k=rowstr(j),rowstr(j+1)-1
            sum = sum + a(k)*z(colidx(k))
         enddo
         w(j) = sum
      enddo
      do j=1,lastcol-firstcol+1
         r(j) = w(j)
      enddo
      sum = 0.0d0
      do j=1,lastcol-firstcol+1
         d = x(j) - r(j)
         sum = sum + d*d
      enddo
      d = sum
      rnorm = sqrt( d )

The Fortress version was shown on the slide in mathematical notation, created by the Fortress-to-LaTeX compiler.
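
That Fortress rendering is not preserved in this transcript. A sketch close to the version shown in the Fortress team's own talks (simplified: the static type parameters on the matrix and vector types are elided here):

  conjGrad(A: Matrix, x: Vector): (Vector, ℝ64) = do
    cgitMax = 25
    z: Vector := 0
    r: Vector := x
    p: Vector := r
    ρ: ℝ64 := r^T r
    for j ← seq(1:cgitMax) do
      q = A p
      α = ρ / (p^T q)
      z := z + α p
      r := r - α q
      ρ₀ = ρ
      ρ := r^T r
      β = ρ / ρ₀
      p := r + β p
    end
    (z, ‖x - A z‖)
  end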

Parallelism in Fortress

The creators of Fortress admit to hating parallelism

Parallel programming is difficult and error-prone. (This is not a property of machines, but of people.)
It would be much easier if we could just make sequential execution faster, but we can't (the power wall)

Parallel programming is not a goal, but a pragmatic compromise
The Fortress language encourages you to be parallel and efficient
Fortress tries to protect you from errors
  It can't always succeed

Explicit Parallelism

Explicitly creating a thread to do some computation is easy:

  spawn do factorial(42) end

This is not advised.

Implicit Parallelism

Implicitly parallel statements in Fortress (a sketch of each follows):

  Tuples
  also do blocks
  for loops, where parallelism is achieved by the generator
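
A brief sketch of each form (factorial, computeLeft, computeRight, and the arrays are assumed to be defined elsewhere):

  (* the elements of a tuple may be evaluated in parallel *)
  (x, y) = (factorial(10), factorial(20))

  (* also do: each block runs as a separate implicit thread *)
  do
    computeLeft()
  also do
    computeRight()
  end

  (* for loops are parallel by default; iterations are unordered *)
  for i ← 1#1000 do
    a[i] := b[i] + c[i]
  end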

Implementation - Work Stealing Queues

Implicitly parallel work is divided among the runtime thread pool
Work is pushed onto a per-thread queue
Idle threads may steal work from the top of another thread's queue

Built on top of Doug Lea's jsr166y fork-join library

Generators drive parallelism

Generators (defined by libraries) manage parallelism and the division of work among threads

Examples:
  Aggregates
    Lists ⟨1,2,4,3,4⟩ and vectors [1 2 4 3 4]
    Sets {1,2,3,4} and multisets {|1,2,3,4,4|}
    Arrays (including multidimensional)
  Ranges 1:10 and 1:99:2 and 0#50
  Index sets a.indices
  Index-value sets ht.keyValuePairs
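
Any of these can drive a loop or a comprehension, for example (process, a, and ht are hypothetical):

  for x ← {1, 2, 3, 4} do process(x) end      (* set generator *)
  for i ← a.indices do a[i] := 2 a[i] end     (* index-set generator *)
  squares = ⟨ x² | x ← 1:10 ⟩                 (* comprehension over a range *)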

Whats that about generators?

When we execute a parallel for loop over a range such as 1#1000 (a representative example follows this list), the library generator # takes over:

  Creates an array of indexes 1..1000
  Gives each loop iteration an index to work on
  Divides execution between threads
    Distributed among the threads' work queues
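
The slide's own code is not preserved; a representative loop of the kind it describes:

  (* the range generator 1#1000 (start # count) hands out the indexes *)
  for i ← 1#1000 do
    a[i] := i²
  end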

Implicit Parallelism

Work-stealing

What About Locality?

The arrays may be spread in any manner across the architecture:

Regions

A hierarchical data structure describes the CPU and memory resources and their properties:
  Allocation heaps
  Parallelism
  Memory coherence

Regions Control Locality

Every object and thread has an interface for checking and manipulating the memory region in which it resides

Distributions assign regions

This can (and should) affect execution:
  Implicit threads (should?) run on the data in their region
  Explicit threads can be placed in a chosen region (see the sketch below)
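
A hedged sketch of region placement, using the region-placement form of spawn that appeared in early Fortress material (the exact API varied between spec drafts):

  r = a.region()                 (* ask in which region the array a resides *)
  spawn a.region() do f(a) end   (* run f(a) near a's memory *)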

Distributions

Describe how to map a data structure onto a region

Built-in or user-definable!
  Override the region() method

Some built-in distributions include:
  blocked(n) = blocked, block size a multiple of n
  ruler = hierarchical division at powers of 2
  etc.
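
A purely hypothetical sketch of a user-defined distribution (the real library API differs; leafRegions is an assumed list of the machine's leaf regions):

  object RoundRobin extends Distribution
    (* place chunk k round-robin over the machine's leaf regions *)
    region(chunk: ℤ64) = leafRegions[chunk MOD |leafRegions|]
  end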

Smarter Parallelism with Regions


Parallel is not enough; we also want local! Say we want to run an element-wise computation over several large arrays (the slide's code image is not preserved; see the sketch at the end of the next slide)

What's the best way to do this?
  Remember, you have multiple execution units with different memory locations

Smarter Parallelism with Regions

Write special allocation (Distribution) and iteration (Generator) functions for the arrays:
  Co-allocate chunks: d is the object implementing allocation (and extending Distribution)
  Co-locate iterations of the loop: the arrays' index generator implements iteration (and extends Generator)
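
Putting it together, a hedged sketch (d, n, and the d.array construction call are assumptions, since the slide's actual code is lost):

  (* allocate all three arrays with the same distribution d,
     so chunk i of a, b, and c land in the same region *)
  a = d.array(n)
  b = d.array(n)
  c = d.array(n)

  (* a.indices is a generator that runs each iteration
     in the region holding its chunk of the data *)
  for i ← a.indices do
    a[i] := b[i] + c[i]
  end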

Clever Parallel Computation with Reducers

Up to now we considered only computations whose iterations are independent

What if we want:

  ∑[i←1#1000] a[i] + b[i]

This is a reduction over an expression

For true parallelism we need to cleverly collect the result

Reducers

Reduction operators are managed using an abstract collection:
  Leaf operator (unit)
  Binary operator
  Optional empty collection (zero)

This translates into an actual reducer by supplying actual operations and a unit. For example: ∑ is defined by (+, id, 0)
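
The library-level shape, close to the Reduction trait in the Fortress specification (the SumReduction instance below is an illustrative assumption):

  trait Reduction[\R\]
    empty(): R
    join(a: R, b: R): R
  end

  (* illustrative instance: ∑ over 64-bit integers *)
  object SumReduction extends Reduction[\ℤ64\]
    empty() = 0
    join(a: ℤ64, b: ℤ64) = a + b
  end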

Reducers

Summing up a blocked array A: the Distribution decides where the blocks live, and the Generator walks each block in place. Imagine the possibilities

Example: Blocked array generator
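
The slide's code is not reproduced here; a hedged sketch of how a blocked array's generate method could work (nBlocks, firstHalf, secondHalf, and seqGenerate are assumptions):

  (* split the blocks recursively; the tuple makes the two
     recursive calls implicitly parallel; join combines results *)
  generate[\R\](r: Reduction[\R\], body: E → R): R =
    if nBlocks = 1 then seqGenerate(r, body)
    else
      (left, right) = (firstHalf.generate[\R\](r, body),
                       secondHalf.generate[\R\](r, body))
      r.join(left, right)
    end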

Transactional Memory

Programming with locks is hard, often inefficient, and error-prone

Transactions are simple and easy to reason about

Transactional Memory

The atomic keyword defines a transaction
  Visibility: all or nothing
  Can be aborted in mid-execution
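
A minimal sketch (balance, total, and amount are assumed variables):

  (* either both writes become visible, or neither does *)
  atomic do
    balance := balance - amount
    total := total + amount
  end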

Transactional Memory

Fortress provides:
  Software transactional memory
    May have multiple threads cooperating in a single transaction
  Nested transactions
  Mixing atomic and non-atomic accesses to the same data

Built on top of the DSTM2 Java library for transactional memory
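
For instance, transactions nest (a sketch; x and y are assumed reference-cell variables):

  atomic do
    x := x + 1
    atomic do    (* a nested transaction *)
      y := y + 1
    end
  end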

Transactional Memory

All mutable values are represented by reference cells and may be part of a transaction:
  Rollback
  Collision detection
  Invalidate
  Status update via compare-and-set

Example: R/W Collision


Thread 1:
  x := 0
  atomic do
    x := 3
  end

Thread 2:
  atomic do
    z += x
    w += x
  end

After these two threads run, the values of (z, w) are either unchanged or (z+3, w+3): atomicity guarantees Thread 2 never sees the update to x in only one of its two reads

Example: R/W Collision


Walking through the collision: Transaction 1 writes x (value = 3, old value = 0) and its status moves from Active to Done while Transaction 2 is still running. Transaction 2 has already computed z from the old x (z = z), but would compute w = w + 3 from the new one.

Transaction 2 needs to abort at this point! Otherwise it would commit an inconsistent mix of old and new values of x.

Transactions: Read Sets

Why not have per-transaction read sets instead of per-object read sets?
  A transaction would keep track of every value it read, and then, prior to committing its updates, it would validate that the read values haven't changed

Validating the reads before a commit may take a long time; we can't block other threads for that long

Example: W/W Collision

Transactions: Contention

Aborting a transaction:
  Revert all written variables
  Back off via spin and retry

All of the aborted transactions are placed in a contention manager
  The transaction created by the lowest-numbered thread wins
  So one transaction always makes progress

State of the art


Not selected for HPCS phase 3 (2006)
The Fortress Language Specification, Version 1.0, was released (2008)
A working (partial) core implementation on top of the JVM exists
  With partial correctness & soundness proofs
  Last stable release at the end of 2010
The programmer community is probably small

Questions?
