AnySL
Efficient and Portable Multi-Language Shading

Philipp Slusallek
Sebastian Hack, Ralf Karrenberg, Dmitri Rubinstein
German Research Center for Artificial Intelligence (DFKI)

Intel Visual Computing Institute
Saarland University
Monday, August 15, 2011
Saarbrücken
Computer Science at the Saarland Campus
Multimodal Computing and Interaction
Shaders
Programmable Shading
Allows for controlling core rendering features
Today: Many different shading languages
HLSL, GLSL, Cg, RenderMan, MetaSL, OSL, OpenRL, many C++ dialects, ...
Mostly the same features, expressed differently
We need a portable way to exchange materials
Material properties, light emission, participating media, ...
Common specification of shading features
Ease implementation for different renderers and HW
Here: Efficient and Portable Implementation
Shaders
A plug-in for the innermost loops
From one-liners to thousands of lines of code
Run for every new ray, surface hit, light sample, ...
Sometimes once for every MADD along a ray
Efficient implementation
Low overhead interface to renderer
Ideally works directly on internal data structures
Highly optimized code for specific HW architectures
Use of SIMD (SSE, AVX, PTX, ...)
Implementation Choices
[Diagram: renderer (data, code), glue code, C/C++ API/ABI, shader DSO/DLL, C++ shader]
Shader code in C++
API specifies interface to renderer
Separate C/C++ compilation to DLL/DSO
API gets mapped directly to platform-specific ABI
Predefined data layout, function call overhead
No optimization options in interface
[Diagram: renderer (data, code), generated API/ABI, shader DSO/DLL, SL shader]
Using a Shading Language Compiler
Compiler can transform and optimize shader code
Requires a renderer- and language-specific compiler
E.g. use of renderer-internal APIs: No glue code
Transform shaders to SIMD
Most renderers support only one shading language
Renderer-specific code gets embedded in result
[Diagram: SL shader, compiler (LLVM), portable data and code, renderer (data, code) with renderer glue code (API), embedded compiler (LLVM), optimized shader]
AnySL: Embedded SL Compiler
Any language compiled into portable format
Types, data layout, interface not fixed yet
Renderer supplies implementations at runtime
Embedded compiler links and optimizes code
AnySL: Portable Shading
Any Shading Language Supported
Common Intermediate Format
Independent of renderer and HW architecture
Easy Implementation by Renderer
Currently: RenderMan, C++ dialects, JavaScript, ...
Need only supply the glue code
Different Backends
Ray Tracing: PBRT, Manta, RTfact, ...
Rasterization: Deferred shading (with RTT)
HW: x86, SSE, AVX, PTX, OpenCL, GLSL, ...
AnySL & XML3D: Interactive RenderMan in Your Web Browser
AnySL: Implementation
Designing an Interpreter: Options
Many opcodes with a large switch() statement
Replace opcodes with function calls
Subroutine Threaded Code
Long list of function calls
Nice for portability, implementations can be replaced
Even for control flow (if) and types (allocate a float)
E.g.: use predication for if, or substitute own float type
Can be directly encoded in compiled code
Use LLVM bitcode for representation Efficiency
Subroutine Threaded Code
Original shader code
Handling control flow: RM illuminance loop
Conversion to Threaded Code
Mapping to Threaded Code
Its implementation
(supplied by renderer)
Possible implementation
(supplied by renderer)
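A minimal sketch of the idea, written in Python rather than LLVM bitcode: the shader is stored as a flat list of calls to named operations, and each renderer supplies its own implementations. All names here (`SHADER`, `run`, `scalar_ops`, `packet_ops`) are hypothetical illustrations and are not AnySL's actual format or interface.

```python
# Hypothetical sketch of subroutine threaded code (STC); not AnySL's real format.
# The shader is a list of calls: (operation name, argument slots, result slot).
# It computes: color = diffuse * max(dot_nl, 0)
SHADER = [
    ("const", (0.0,), "zero"),
    ("max", ("dot_nl", "zero"), "clamped"),
    ("mul", ("diffuse", "clamped"), "color"),
]

def run(shader, ops, inputs):
    """Execute the call list with renderer-supplied operation implementations."""
    slots = dict(inputs)
    for name, args, result in shader:
        # Strings name slots; anything else is an immediate value.
        values = [slots[a] if isinstance(a, str) else a for a in args]
        slots[result] = ops[name](*values)
    return slots

# One renderer: every value is a plain scalar float.
scalar_ops = {
    "const": lambda c: c,
    "max": lambda a, b: a if a > b else b,
    "mul": lambda a, b: a * b,
}

# Another renderer: same shader, but every value is a packet of 4 floats.
packet_ops = {
    "const": lambda c: [c] * 4,
    "max": lambda a, b: [x if x > y else y for x, y in zip(a, b)],
    "mul": lambda a, b: [x * y for x, y in zip(a, b)],
}

scalar_result = run(SHADER, scalar_ops, {"diffuse": 0.8, "dot_nl": 0.5})["color"]
packet_result = run(SHADER, packet_ops, {
    "diffuse": [0.8, 0.8, 0.8, 0.8],
    "dot_nl": [0.5, -1.0, 1.0, 0.0],
})["color"]
```

The shader list never changes; only the table of implementations does. In this sketch the calls are dispatched by an interpreter loop, whereas the slides describe encoding them directly in compiled code so that an embedded compiler can inline them.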
But Interpreters are Slow?!?
STC is used for portable representation only
Type Replacement
Eliminated at runtime with embedded compiler
Substitute own types and operators
Inline all interpreter calls
Perform all usual scalar optimizations
Can be used for special shader functionality
Taking derivatives of arbitrary expressions
Bounding the result of a shader over intervals
E.g. using Affine Arithmetic [Heidrich et al., 1998]
How it All Fits Together
Special Functionality
Derivatives of arbitrary expressions
Implemented through Automatic Differentiation
Each type stores and maintains (2) derivatives
Each operation updates value and derivatives
Input provides initial derivatives (e.g. w.r.t. screen space)
Bounding the value of a shader over an interval
Implemented through Interval or Affine Arithmetic
Each type stores and maintains value plus interval
AA: plus terms for linear dependencies on (all) input values
Each operation updates value and interval
Input provides initial interval (e.g. w.r.t. parameter space)
All maps nicely to Type Replacement
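As a rough illustration of how both features can fall out of type replacement, the sketch below runs one unchanged shader function over plain floats, over a value type carrying two derivatives, and over an interval type. The classes `Dual` and `Interval` and the function `shader` are hypothetical; they support only `+` and `*` and use plain interval arithmetic, not the affine arithmetic mentioned in the slides.

```python
# Hypothetical sketch of type replacement; not AnySL's real types.
from dataclasses import dataclass

def _dual(x):
    return x if isinstance(x, Dual) else Dual(float(x))

def _interval(x):
    return x if isinstance(x, Interval) else Interval(float(x), float(x))

@dataclass
class Dual:
    """Value plus derivatives w.r.t. two input directions (e.g. screen x, y)."""
    v: float
    dx: float = 0.0
    dy: float = 0.0

    def __add__(self, o):
        o = _dual(o)
        return Dual(self.v + o.v, self.dx + o.dx, self.dy + o.dy)
    __radd__ = __add__

    def __mul__(self, o):
        o = _dual(o)
        # Product rule, applied to each derivative.
        return Dual(self.v * o.v,
                    self.dx * o.v + self.v * o.dx,
                    self.dy * o.v + self.v * o.dy)
    __rmul__ = __mul__

@dataclass
class Interval:
    """Conservative bounds [lo, hi] on a value."""
    lo: float
    hi: float

    def __add__(self, o):
        o = _interval(o)
        return Interval(self.lo + o.lo, self.hi + o.hi)
    __radd__ = __add__

    def __mul__(self, o):
        o = _interval(o)
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))
    __rmul__ = __mul__

def shader(u, v):
    """The 'shader' is written once; the numeric type is chosen by the caller."""
    return 3.0 * u * u + 2.0 * v + 1.0

plain = shader(2.0, 1.0)
# Seed derivatives: u varies along x, v varies along y.
d = shader(Dual(2.0, 1.0, 0.0), Dual(1.0, 0.0, 1.0))
# Bound the shader for u in [1, 2] and v in [0, 1].
b = shader(Interval(1.0, 2.0), Interval(0.0, 1.0))
```

Plain interval arithmetic does not know that both factors of `u * u` are the same variable, so its bounds can become loose when `u` spans zero. Tracking linear dependencies on the inputs, as affine arithmetic does, is one way to tighten them.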
Results
Automatic differentiation for anti-aliasing
Point sampling
Analytic AA: Blend to average near Nyquist
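One common way to realize "blend to average near Nyquist" is to estimate how many pattern cycles fall into one pixel from the screen-space derivative and fade the pattern toward its mean as that number approaches 0.5. The sketch below assumes a sine stripe pattern and fade thresholds of 0.25 and 0.5 cycles per pixel; the slides do not state the actual pattern or blend function, and the names `stripes`, `stripes_aa`, and `dx_dpixel` are hypothetical.

```python
import math

def smoothstep(e0, e1, x):
    """Smooth 0-to-1 ramp between e0 and e1."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def stripes(x, freq):
    """Point-sampled pattern: sine stripes in [0, 1], `freq` cycles per unit of x."""
    return 0.5 + 0.5 * math.sin(2.0 * math.pi * freq * x)

def stripes_aa(x, dx_dpixel, freq):
    """Blend the pattern toward its average as it nears the Nyquist limit.

    dx_dpixel is the derivative of x w.r.t. one pixel step, e.g. obtained
    through automatic differentiation.
    """
    cycles_per_pixel = freq * abs(dx_dpixel)
    # Nyquist limit is 0.5 cycles per pixel; the 0.25 start is an assumption.
    fade = smoothstep(0.25, 0.5, cycles_per_pixel)
    average = 0.5  # mean of the sine stripes over one period
    return (1.0 - fade) * stripes(x, freq) + fade * average

near = stripes_aa(0.1, 0.001, 10.0)  # 0.01 cycles per pixel: pattern kept
far = stripes_aa(0.1, 0.1, 10.0)     # 1 cycle per pixel: replaced by average
```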
Optimization: Packet-Based Shading
Modern ray tracers shoot packets of rays
Exploit SIMD instructions of modern CPUs
Can execute an instruction on n floats at once
Current architectures:
SSE (4), AVX (8), KNF (16), GPU (32)
Shader function has to shade n hit points at once
AnySL: Packetized Shaders
Writing packetized shaders is REALLY HARD
Not an option for any application
You may not want to do this by hand:
AnySL: Packetized Shaders
Given: A shader, as a control-flow graph of scalar instructions
Needed: A packetized shader, i.e. a new shader that executes k instances of the original shader at once
Control flow of instances can diverge!
Main Issues: Control Flow
Diverging control flow of a shader
Need to efficiently merge flows again!
Shaders are nested in a deep recursion
Must handle closures and reordering of packets
Packetized Shaders
Approach
Program transformation
Flatten control flow
Every instance executes all instructions
Mask out wrong results
Loops are iterated until last instance is done
Already exited instances are invalidated
Simulate what GPUs do in HW
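The transformation above can be sketched on a toy shader. The code below is a hand-written illustration in Python, with lists standing in for SIMD registers; it is not output of the AnySL packetizer, and `scalar_shader`, `packet_shader`, and `select` are hypothetical names.

```python
def scalar_shader(x):
    """Original scalar code with a branch and a data-dependent loop."""
    if x > 0.0:
        r = x * 2.0
    else:
        r = -x
    steps = 0
    while r > 1.0:
        r = r * 0.5
        steps += 1
    return r, steps

def select(mask, a, b):
    """Per-lane blend: take a where the mask is set, else b."""
    return [ai if m else bi for m, ai, bi in zip(mask, a, b)]

def packet_shader(xs):
    """Flattened version: all lanes run all instructions, masks pick results."""
    # if/else: evaluate both sides for every lane, then blend by the condition.
    cond = [x > 0.0 for x in xs]
    then_r = [x * 2.0 for x in xs]
    else_r = [-x for x in xs]
    r = select(cond, then_r, else_r)

    # loop: iterate until no lane is active; exited lanes keep their values.
    steps = [0] * len(xs)
    active = [ri > 1.0 for ri in r]
    while any(active):
        r = select(active, [ri * 0.5 for ri in r], r)
        steps = select(active, [s + 1 for s in steps], steps)
        active = [a and ri > 1.0 for a, ri in zip(active, r)]
    return r, steps

xs = [3.0, -0.5, 0.25, -8.0]
packet_r, packet_steps = packet_shader(xs)
```

Lanes 0 and 3 need three loop iterations while lanes 1 and 2 need none, so the whole packet iterates three times and the masks keep the finished lanes unchanged.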
AnySL: Dealing With Data Divergence
SSE has no gather/scatter support
Need to resort to serial load/store
Data must come in multiples of four and be properly aligned
Extract individual values from SSE vector
Load/Store
Merge/blend results back into SSE register
Very expensive (lots of dependencies)
Calling non-packetized functions
Essentially the same as scatter/gather
E.g. hand-crafted SSE noise() function
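The serialization described above can be pictured as a per-lane loop. This is only a Python illustration with lists standing in for SSE registers; `gather` and `call_scalar` are hypothetical helper names, and letting inactive lanes keep their previous value is an assumption of this sketch.

```python
def gather(table, indices):
    """Emulated gather: extract each lane's index, load serially, merge results."""
    result = [0.0] * len(indices)
    for lane, idx in enumerate(indices):  # one scalar load per lane
        result[lane] = table[idx]
    return result

def call_scalar(fn, packet, mask):
    """Call a non-packetized function once per active lane and merge results."""
    result = list(packet)  # inactive lanes keep their old value (assumption)
    for lane, (x, active) in enumerate(zip(packet, mask)):
        if active:
            result[lane] = fn(x)
    return result

loaded = gather([10.0, 20.0, 30.0, 40.0, 50.0], [4, 0, 0, 2])
squared = call_scalar(lambda x: x * x,
                      [1.0, 2.0, 3.0, 4.0],
                      [True, False, True, False])
```

Each lane costs one extract, one scalar operation, and one insert, which is why the slides call this path very expensive compared with a true vector operation.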
Packetized Shader Results
Packet size of 4 (SSE)
Completely automated (LLVM)
Shaders are packetized automatically
On average 3.2x speedup for complete rendering
Not specific to graphics
Can be used wherever data parallelism is available
AnySL Results
Applications Beyond Graphics
Whole Function Vectorization
Transform a function over one or more scalar parameters into a function over SIMD parameters
Maintaining semantics within each SIMD lane
Application to shader code & packet ray tracing
OpenCL-Compiler
Simply add an OpenCL frontend
Re-use existing AnySL backends
Currently fastest OpenCL compiler for CPUs & GPUs
AnyDSL
Vision
Language, enabling domain specific environments
A new base language (others are too complex already)
New environments can be written in AnyDSL
Meta programming
Think libraries of types, code, syntax, etc.
Ensures predictable performance
Programmer directly controls which parts of a program are evaluated at compile time
Convenient syntax, no special templates
Implicit support for parallelism
Based on continuation passing style
ECOUSS Project
Efficient and Open Compiler Environment for Semantically Annotated Parallel Simulations
German National Project
Application Partners
Supercomputing Center HLRS, Stuttgart
Cray Computer
BMW Group
Boehringer Ingelheim (pharmaceuticals)
Research Partners
Intel Visual Computing Institute
German Research Center for Artificial Intelligence (DFKI)
Karlsruhe Institute of Technology
Conclusions
AnySL
Shaders are compiled to platform-independent code
Reduce work for the renderer implementer
Eliminates interfaces and optimizes code
High-performance through packetization
Need only supply renderer-specific code and link to AnySL
Highly-optimizing JIT compiler within the renderer
Can be produced from any shading language
Significant speedup on benchmarks (~3.2x)
Eliminated need for SIMD shader coding
Many applications beyond graphics
