
The Fortress Programming Language

Presented by Nimrod Partush. Based on various presentations by the Fortress team.

Agenda

Goal & context
Fortress motivation
Main ideas and features
  Focusing on parallelism
State of the art

Context

Improving programmer productivity for scientific and engineering applications

Research funded in part by DARPA through their High Productivity Computing Systems (HPCS) program
  Alongside IBM and Cray

Goal: economically viable technologies for both government and industrial applications by the year 2010 and beyond

The Goal of Fortress

To boldly go where no programming language has gone before!

No.

To seek out great programming language design ideas and make them our own

And let it grow

Motivation: To Do for Fortran What Java™ Did for C

Catch "stupid" mistakes
  array bounds errors
  garbage collection
Scientific-centered computation
Extensive libraries (e.g., for the network environment)
Security model (including type safety)
Dynamic compilation
Platform independence
Parallelism

Growing A Language

Wherever possible, consider whether a proposed language feature can be provided by a library rather than having it wired into the compiler

Main Ideas

Contracts
OOP
Mathematical syntax
Parallelism
Generators & reducers
Transactional memory

Contracts

Part of the function declaration
Better documentation for clients
An exception is thrown on violation

  factorial(var n: ℤ32)
    requires { n ≥ 0 }
    ensures { result ≥ 0, result ≥ 2 provided n ≥ 2 }
    invariant { n }
  = if n = 0 then 1 else n factorial(n-1) end

OOP: Objects and Traits

Multiple vs. single inheritance
  Single inheritance is limiting
  Multiple inheritance is complicated
Java works around this by having single inheritance augmented by interfaces
In Fortress: traits and objects
Parametric polymorphism also exists (not covered in this talk)

OOP: Objects and Traits

Traits
  Like Java's interfaces
  Multiple-inheritance tree
  Only methods
  May suggest a concrete implementation; otherwise abstract

  trait Moving extends { Tangible, Object }
    position(): ℝ3
    velocity(): ℝ3
  end

  trait Fast extends Moving
    velocity() = [0, 0, 9999999]
  end

Objects
  Like Java's classes
  Fields and methods
  Constructor arguments are implicit fields
  Must implement inherited abstract methods
  Leaves of the inheritance tree
  In-language support for singletons

  object Particle(position: ℝ3, velocity: ℝ3) extends Moving
    mass = 0 gram
  end

  object Sun extends { Moving, Stellar }
    temperature = 5800 kelvin
    position() = [0, 0, 0]
    velocity() = [0, 0, 0]
  end

Mathematical Notation

Math is old, concise, convenient, and widely taught

Make the programming language closer to mathematical notation

Implicitly parallel computation

What you write on your whiteboard works

Juxtaposition parsing presents an interesting research question
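
For example, juxtaposition denotes multiplication, so a whiteboard polynomial types in almost verbatim, and big operators such as ∑ are ordinary library code. A sketch (a, b, c, x, u, v, and n are assumed to be in scope):

  y = a x² + b x + c
  dot = ∑[i←1#n] u[i] v[i]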

Mathematical Notation

The NAS (NASA Advanced Supercomputing) Kernel CG benchmark algorithm: a conjugate-gradient computation typical of the irregular long-distance communication done in grid computing.

Mathematical Notation

Which would you rather code?

The Fortran version (from the NAS CG reference implementation):

      do j=1,naa+1
         q(j) = 0.0d0
         z(j) = 0.0d0
         r(j) = x(j)
         p(j) = r(j)
         w(j) = 0.0d0
      enddo
      sum = 0.0d0
      do j=1,lastcol-firstcol+1
         sum = sum + r(j)*r(j)
      enddo
      rho = sum
      do cgit = 1,cgitmax
         do j=1,lastrow-firstrow+1
            sum = 0.d0
            do k=rowstr(j),rowstr(j+1)-1
               sum = sum + a(k)*p(colidx(k))
            enddo
            w(j) = sum
         enddo
         do j=1,lastcol-firstcol+1
            q(j) = w(j)
         enddo
         do j=1,lastcol-firstcol+1
            w(j) = 0.0d0
         enddo
         sum = 0.0d0
         do j=1,lastcol-firstcol+1
            sum = sum + p(j)*q(j)
         enddo
         d = sum
         alpha = rho / d
         rho0 = rho
         do j=1,lastcol-firstcol+1
            z(j) = z(j) + alpha*p(j)
            r(j) = r(j) - alpha*q(j)
         enddo
         sum = 0.0d0
         do j=1,lastcol-firstcol+1
            sum = sum + r(j)*r(j)
         enddo
         rho = sum
         beta = rho / rho0
         do j=1,lastcol-firstcol+1
            p(j) = r(j) + beta*p(j)
         enddo
      enddo
      do j=1,lastrow-firstrow+1
         sum = 0.d0
         do k=rowstr(j),rowstr(j+1)-1
            sum = sum + a(k)*z(colidx(k))
         enddo
         w(j) = sum
      enddo
      do j=1,lastcol-firstcol+1
         r(j) = w(j)
      enddo
      sum = 0.0d0
      do j=1,lastcol-firstcol+1
         d = x(j) - r(j)
         sum = sum + d*d
      enddo
      d = sum
      rnorm = sqrt( d )

The Fortress version was shown on the slide in mathematical notation, created by the Fortress-to-LaTeX compiler.
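
That Fortress rendering is not preserved in this transcript. A sketch close to the version shown in the Fortress team's own talks (simplified: the static type parameters on the matrix and vector types are elided here):

  conjGrad(A: Matrix, x: Vector): (Vector, ℝ64) = do
    cgitMax = 25
    z: Vector := 0
    r: Vector := x
    p: Vector := r
    ρ: ℝ64 := r^T r
    for j ← seq(1:cgitMax) do
      q = A p
      α = ρ / (p^T q)
      z := z + α p
      r := r - α q
      ρ₀ = ρ
      ρ := r^T r
      β = ρ / ρ₀
      p := r + β p
    end
    (z, ‖x - A z‖)
  end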

Parallelism in Fortress

The creators of Fortress admit to hating parallelism

Parallel programming is difficult and error-prone. (This is not a property of machines, but of people.)
It would be much easier if we could just make sequential execution faster, but we can't (the power wall)

Parallel programming is not a goal, but a pragmatic compromise
The Fortress language encourages you to be parallel and efficient
Fortress tries to protect you from errors
  It can't always succeed

Explicit Parallelism

Explicitly creating a thread to do some computation is easy:

  spawn do factorial(42) end

This is not advised.

Implicit Parallelism

Implicitly parallel statements in Fortress (a sketch of each follows):

  Tuples
  also do blocks
  for loops, where parallelism is achieved by the generator
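
A brief sketch of each form (factorial, computeLeft, computeRight, and the arrays are assumed to be defined elsewhere):

  (* the elements of a tuple may be evaluated in parallel *)
  (x, y) = (factorial(10), factorial(20))

  (* also do: each block runs as a separate implicit thread *)
  do
    computeLeft()
  also do
    computeRight()
  end

  (* for loops are parallel by default; iterations are unordered *)
  for i ← 1#1000 do
    a[i] := b[i] + c[i]
  end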

Implementation - Work Stealing Queues

Implicitly parallel work is divided among the runtime thread pool
Work is pushed onto a per-thread queue
Idle threads may steal work from the top of another thread's queue

Built on top of Doug Lea's jsr166y fork-join library

Generators drive parallelism

Generators (defined by libraries) manage parallelism and the division of work among threads

Examples:
  Aggregates
    Lists ⟨1,2,4,3,4⟩ and vectors [1 2 4 3 4]
    Sets {1,2,3,4} and multisets {|1,2,3,4,4|}
    Arrays (including multidimensional)
  Ranges 1:10 and 1:99:2 and 0#50
  Index sets a.indices
  Index-value sets ht.keyValuePairs
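
Any of these can drive a loop or a comprehension, for example (process, a, and ht are hypothetical):

  for x ← {1, 2, 3, 4} do process(x) end      (* set generator *)
  for i ← a.indices do a[i] := 2 a[i] end     (* index-set generator *)
  squares = ⟨ x² | x ← 1:10 ⟩                 (* comprehension over a range *)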

Whats that about generators?

When we execute a parallel for loop over a range such as 1#1000 (a representative example follows this list), the library generator # takes over:

  Creates an array of indexes 1..1000
  Gives each loop iteration an index to work on
  Divides execution between threads
    Distributed among the threads' work queues
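
The slide's own code is not preserved; a representative loop of the kind it describes:

  (* the range generator 1#1000 (start # count) hands out the indexes *)
  for i ← 1#1000 do
    a[i] := i²
  end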

Implicit Parallelism

Work-stealing

What About Locality?

The arrays may be spread in any manner across the architecture:

Regions

A hierarchical data structure describes the CPU and memory resources and their properties:
  Allocation heaps
  Parallelism
  Memory coherence

Regions Control Locality

Every object and thread has an interface for checking and manipulating the memory region in which it resides

Distributions assign regions

This can (and should) affect execution:
  Implicit threads (should?) run on the data in their region
  Explicit threads can be placed in a chosen region (see the sketch below)
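
A hedged sketch of region placement, using the region-placement form of spawn that appeared in early Fortress material (the exact API varied between spec drafts):

  r = a.region()                 (* ask in which region the array a resides *)
  spawn a.region() do f(a) end   (* run f(a) near a's memory *)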

Distributions

Describe how to map a data structure onto a region

Built-in or user-definable!
  Override the region() method

Some built-in distributions include:
  blocked(n) = blocked, block size a multiple of n
  ruler = hierarchical division at powers of 2
  etc.
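
A purely hypothetical sketch of a user-defined distribution (the real library API differs; leafRegions is an assumed list of the machine's leaf regions):

  object RoundRobin extends Distribution
    (* place chunk k round-robin over the machine's leaf regions *)
    region(chunk: ℤ64) = leafRegions[chunk MOD |leafRegions|]
  end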

Smarter Parallelism with Regions


Parallel is not enough; we also want local! Say we want to run an element-wise computation over several large arrays (the slide's code image is not preserved; see the sketch at the end of the next slide)

What's the best way to do this?
  Remember, you have multiple execution units with different memory locations

Smarter Parallelism with Regions

Write special allocation (Distribution) and iteration (Generator) functions for the arrays:
  Co-allocate chunks: d is the object implementing allocation (and extending Distribution)
  Co-locate iterations of the loop: the arrays' index generator implements iteration (and extends Generator)
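
Putting it together, a hedged sketch (d, n, and the d.array construction call are assumptions, since the slide's actual code is lost):

  (* allocate all three arrays with the same distribution d,
     so chunk i of a, b, and c land in the same region *)
  a = d.array(n)
  b = d.array(n)
  c = d.array(n)

  (* a.indices is a generator that runs each iteration
     in the region holding its chunk of the data *)
  for i ← a.indices do
    a[i] := b[i] + c[i]
  end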

Clever Parallel Computation with Reducers

Up to now we considered only computations whose iterations are independent

What if we want:

  ∑[i←1#1000] a[i] + b[i]

This is a reduction over an expression

For true parallelism we need to cleverly collect the result

Reducers

Reduction operators are managed using an abstract collection:
  Leaf operator (unit)
  Binary operator
  Optional empty collection (zero)

This translates into an actual reducer by supplying actual operations and a unit. For example: ∑ is defined by (+, id, 0)
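
The library-level shape, close to the Reduction trait in the Fortress specification (the SumReduction instance below is an illustrative assumption):

  trait Reduction[\R\]
    empty(): R
    join(a: R, b: R): R
  end

  (* illustrative instance: ∑ over 64-bit integers *)
  object SumReduction extends Reduction[\ℤ64\]
    empty() = 0
    join(a: ℤ64, b: ℤ64) = a + b
  end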

Reducers

Summing up a blocked array A: the Distribution decides where the blocks live, and the Generator walks each block in place. Imagine the possibilities

Example: Blocked array generator
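
The slide's code is not reproduced here; a hedged sketch of how a blocked array's generate method could work (nBlocks, firstHalf, secondHalf, and seqGenerate are assumptions):

  (* split the blocks recursively; the tuple makes the two
     recursive calls implicitly parallel; join combines results *)
  generate[\R\](r: Reduction[\R\], body: E → R): R =
    if nBlocks = 1 then seqGenerate(r, body)
    else
      (left, right) = (firstHalf.generate[\R\](r, body),
                       secondHalf.generate[\R\](r, body))
      r.join(left, right)
    end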

Transactional Memory

Programming with locks is hard, often inefficient, and error-prone

Transactions are simple and easy to reason about

Transactional Memory

The atomic keyword defines a transaction
  Visibility: all or nothing
  Can be aborted in mid-execution
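
A minimal sketch (balance, total, and amount are assumed variables):

  (* either both writes become visible, or neither does *)
  atomic do
    balance := balance - amount
    total := total + amount
  end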

Transactional Memory

Fortress provides:
  Software transactional memory
    May have multiple threads cooperating in a single transaction
  Nested transactions
  Mixing atomic and non-atomic accesses to the same data

Built on top of the DSTM2 Java library for transactional memory
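
For instance, transactions nest (a sketch; x and y are assumed reference-cell variables):

  atomic do
    x := x + 1
    atomic do    (* a nested transaction *)
      y := y + 1
    end
  end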

Transactional Memory

All mutable values are represented by reference cells and may be part of a transaction:
  Rollback
  Collision detection
  Invalidate
  Status update via compare-and-set

Example: R/W Collision


Thread 1:
  x := 0
  atomic do
    x := 3
  end

Thread 2:
  atomic do
    z += x
    w += x
  end

After these two threads run, the values of (z, w) are either unchanged or (z+3, w+3): atomicity guarantees Thread 2 never sees the update to x in only one of its two reads

Example: R/W Collision


Walking through the collision: Transaction 1 writes x (value = 3, old value = 0) and its status moves from Active to Done while Transaction 2 is still running. Transaction 2 has already computed z from the old x (z = z), but would compute w = w + 3 from the new one.

Transaction 2 needs to abort at this point! Otherwise it would commit an inconsistent mix of old and new values of x.

Transactions: Read Sets

Why not have per-transaction read sets instead of per-object read sets?
  A transaction would keep track of every value it read, and then, prior to committing its updates, it would validate that the read values haven't changed

Validating the reads before a commit may take a long time; we can't block other threads for that long

Example: W/W Collision

Transactions: Contention

Aborting a transaction:
  Revert all written variables
  Back off via spin and retry

All of the aborted transactions are placed in a contention manager
  The transaction created by the lowest-numbered thread wins
  So one transaction always makes progress

State of the art


Not selected for HPCS phase 3 (2006)
The Fortress Language Specification, Version 1.0, was released (2008)
A working (partial) core implementation on top of the JVM exists
  With partial correctness & soundness proofs
  Last stable release at the end of 2010
The programmer community is probably small

Questions?
