
Design Exploration of a Human-Machine Interface (HMI) Application
Francis Li
Sam Madden

The Application
Data glove interface
Wired, bulky

SmartDust scenario
A mote on each fingertip

Investigate implementations
Explore design alternatives

Proof-of-Concept Prototype
By SmartDust group

Atmel AVR Microcontroller
RFM TR1000 Radio
6 accelerometers
Host PC performs processing

Analysis
Power: 45 mW measured
Continuous operation of processor, accelerometers, and communication with host

Application Analysis
Processing (on PC)
Do 20 times per second, for each accelerometer
Read in X and Y samples (10 bits each)
Compute rolling average to smooth input data
Convert averages to polar coordinates
Dominant cost: sqrt, acos, atan
Secondary cost: floating-point operations

Periodically, calculate gesture via simple template matching (static hand positions)
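The per-sample pipeline above (rolling average, polar conversion, periodic template matching) can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's host code. The window length `WINDOW`, the zero-g offset `ZERO_G`, the squared-error metric and all function names are hypothetical. The sketch also uses `math.hypot` and `math.atan2` for brevity, where the slides list sqrt, acos and atan.

```python
import math
from collections import deque

# Hypothetical parameters: the slides give neither the averaging window
# nor the ADC zero-g offset.
WINDOW = 4
ZERO_G = 512  # mid-scale of a 10-bit reading


class AxisSmoother:
    """Rolling average over the most recent raw samples of one axis."""

    def __init__(self, window=WINDOW):
        self.samples = deque(maxlen=window)  # oldest sample drops out automatically

    def update(self, raw):
        self.samples.append(raw)
        return sum(self.samples) / len(self.samples)


def to_polar(x, y, zero=ZERO_G):
    """Map a smoothed (x, y) reading to (magnitude, angle in radians)."""
    dx, dy = x - zero, y - zero
    return math.hypot(dx, dy), math.atan2(dy, dx)


def match_gesture(features, templates):
    """Return the template name with the smallest sum of squared errors.

    `features` and every template hold one (magnitude, angle) pair per
    accelerometer. Angle wrap-around is ignored for simplicity.
    """
    def error(template):
        return sum((m - tm) ** 2 + (a - ta) ** 2
                   for (m, a), (tm, ta) in zip(features, template))
    return min(templates, key=lambda name: error(templates[name]))
```

In use, each of the six accelerometers would get one `AxisSmoother` per axis, updated 20 times per second. `to_polar` would run on the smoothed pair, and `match_gesture` would run on the six resulting pairs once per reporting period.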

Application Analysis (cont.)
Communication (from Atmel to PC)
20 samples/sec × 6 accelerometers × 4 bytes/sample = 480 bytes/sec
115.6 kb/sec RF link
Radio = 12 mA @ 3 V (36 mW) when transmitting
3,840 bits/sec on a 115.6 kb/sec link ≈ 3.3% duty cycle → 1.2 mW for radio alone

Real-world power >> 1.2 mW, due to software and analog overhead
(real-world analysis later)
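The radio estimate follows from the payload rate and the fraction of time the transmitter is on. The sketch below reproduces that arithmetic from the slide's figures. It assumes the 4 bytes/sample cover the X and Y readings (10 bits each, padded to 2 bytes apiece), which is an interpretation, not something the slide states. It also ignores packet overhead, preambles and receive current, which is one reason real-world power is much higher.

```python
# Figures from the slide; the 4 bytes/sample presumably hold the X and Y
# readings (10 bits each) padded to 2 bytes apiece, an assumption.
SAMPLES_PER_SEC = 20
ACCELEROMETERS = 6
BYTES_PER_SAMPLE = 4
LINK_BITS_PER_SEC = 115.6e3
TX_CURRENT_A = 12e-3
SUPPLY_V = 3.0


def payload_bits_per_sec():
    return SAMPLES_PER_SEC * ACCELEROMETERS * BYTES_PER_SAMPLE * 8


def radio_power_mw():
    """Average transmit power, ignoring packet overhead, preambles and receive."""
    duty = payload_bits_per_sec() / LINK_BITS_PER_SEC  # fraction of time on air
    return TX_CURRENT_A * SUPPLY_V * duty * 1e3        # W -> mW
```

The payload is 3,840 bits/sec. That keeps the 36 mW transmitter on about 3.3% of the time, which gives roughly 1.2 mW.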

Optimization Process
Match Application to HW
Local computation to reduce communication
Floating point → fixed point

Match Hardware to Application
Distributed vs. Centralized
TI vs. Atmel
DSP
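To illustrate the "floating point → fixed point" step, here is a minimal sketch of the two cheapest parts of the pipeline done with integers only. The power-of-two window and the function names are hypothetical choices, not the authors' code. The trigonometric part (acos/atan) is not shown.

```python
import math


def fixed_rolling_average(samples, shift=2):
    """Mean of the last 2**shift integer samples using only add and shift.

    A power-of-two window (a hypothetical choice) turns the divide into a
    right shift, which is cheap on a core without an FPU or divide instruction.
    """
    size = 1 << shift
    if len(samples) < size:
        raise ValueError("need at least 2**shift samples")
    return sum(samples[-size:]) >> shift


def fixed_magnitude(dx, dy):
    """Integer vector magnitude via an integer square root (no floats)."""
    return math.isqrt(dx * dx + dy * dy)
```

The shift truncates rather than rounds. For example, the mean of 101, 102, 103 and 104 comes out as 102, which is usually acceptable for smoothing 10-bit sensor data.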

Communication vs. Computation
Estimates of local processing cost on Atmel
(via simulation of GCC-compiled program)
Average: 2223 instr. × 2 (X and Y)
Loop: 6 × 20/sec
CalcPolar: 19017 instr.
→ 2.83×10^6 instructions/sec
Report gesture once per second
FindGestureError: 5444 instr.
10 gestures, 6 accelerometers: 5444 × 60 ≈ 3.26×10^5 instr./sec

Memory operations are 2 cycles/instruction
Total cycles ≈ 3.7M/sec → 4 MHz → 13.5 mW
Communication = 8 bits/sec → negligible cost
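The instruction budget above can be re-derived from the per-routine counts. This sketch reads the ×60 as one `FindGestureError` call per template per accelerometer, an interpretation of the slide. The products come to about 2.82×10^6 and 3.27×10^5 instructions/sec, matching the slide's rounded figures. The slide's 3.7M cycles/sec is higher than the ~3.14M instructions because memory instructions take 2 cycles. The instruction mix is not given, so that step is not modeled here.

```python
# Instruction counts from the slide (GCC build, simulated on the Atmel).
AVERAGE_INSTR = 2223        # rolling average, per axis per sample
CALC_POLAR_INSTR = 19017    # polar conversion, per accelerometer per sample
FIND_GESTURE_INSTR = 5444   # error against one template, per accelerometer

SAMPLE_LOOPS_PER_SEC = 6 * 20   # 6 accelerometers x 20 samples/sec
GESTURE_EVALS_PER_SEC = 10 * 6  # 10 templates x 6 accelerometers, once per second


def sampling_instr_per_sec():
    # Two averages (X and Y) plus one polar conversion per loop iteration.
    return (2 * AVERAGE_INSTR + CALC_POLAR_INSTR) * SAMPLE_LOOPS_PER_SEC


def gesture_instr_per_sec():
    return FIND_GESTURE_INSTR * GESTURE_EVALS_PER_SEC


def total_instr_per_sec():
    return sampling_instr_per_sec() + gesture_instr_per_sec()
```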

Communication vs. Computation (cont.)
Cost of communication to host PC (measured): 4317 nJ/bit
From Culler, Hill, Szewczyk, Woo, "System Architecture Directions for Networked Sensors."

4317 nJ/bit × 480 bytes/sec × 8 bits/byte = 16.57 mW

Processor still draws significant power
Current implementation requires 13.5 mW
Using sleep, only 1.17 mW → 17.74 mW total
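The measured energy per bit converts directly into average link power. The sketch below checks the slide's figures. `link_power_mw` is a hypothetical helper name, and the 1.17 mW sleep-mode processor figure is taken from the slide. It also shows why sending only an 8-bit gesture ID once per second is negligible by comparison.

```python
ENERGY_PER_BIT_NJ = 4317.0   # measured cost per bit sent to the host (slide)


def link_power_mw(bytes_per_sec):
    """Average radio power for a given payload rate: nJ/s -> mW."""
    return ENERGY_PER_BIT_NJ * bytes_per_sec * 8 * 1e-6


raw_stream_mw = link_power_mw(480)             # raw samples to the PC
gesture_only_mw = link_power_mw(1)             # one 8-bit gesture ID per second
pc_total_with_sleep_mw = raw_stream_mw + 1.17  # plus Atmel in sleep mode
```

Streaming raw samples costs about 16.58 mW. The gesture-only stream costs about 0.035 mW.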


Distributed vs. Centralized
Move some processing to each sensor
6 processors
Each computing average and polar transform
Transmitting 4 × 8 = 32 bits once/second

Using Atmel processor on each mote
Computation
~0.5M cycles/sec → 2 mA @ 2.7 V → 5.4 mW

Communication
Very small: 4317 nJ/bit × 32 bits/sec = 0.13 mW

5.53 mW/mote → 33.2 mW total for six motes (bad idea!)
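The per-mote budget is just compute power (current × voltage) plus link power (energy per bit × bit rate). This sketch uses the slide's numbers as defaults, and `mote_power_mw` is a hypothetical name. The exact link term is 0.138 mW, which the slide truncates to 0.13. The totals still come to about 5.54 mW per mote and 33.2 mW for six.

```python
def mote_power_mw(active_ma=2.0, volts=2.7, bits_per_sec=32, nj_per_bit=4317.0):
    compute_mw = active_ma * volts                 # mA x V = mW
    comm_mw = nj_per_bit * bits_per_sec * 1e-6     # nJ/s -> mW
    return compute_mw + comm_mw


per_mote = mote_power_mw()
six_motes = 6 * per_mote
```

Compute dominates. Moving work to each fingertip multiplies the processor cost by six while saving only a small amount of radio energy.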


TI Microcontroller Evaluation
A microcontroller with better specs
MSP430P112: 330 µA/MHz in active mode
1.5 µA standby (6 µs wakeup)

Used IAR Systems compiler, profiler, and development environment
Analysis
Centralized (3.3 V, 4 MHz): 3.8 mW
Distributed (2.5 V, 1 MHz): 0.48 mW per mote
Six processors → 2.9 mW
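The slide reports only the resulting powers. As a back-of-envelope consistency check, assume active current scales as 330 µA/MHz at the stated voltage and the core otherwise sits in 1.5 µA standby. Under that assumption, a fully active core at 3.3 V and 4 MHz would draw about 4.4 mW. The 3.8 mW figure therefore implies the CPU is busy most, but not all, of each second. The helper names are hypothetical, and the active fraction is inferred, not reported.

```python
def msp430_power_mw(volts, mhz, active_fraction, ua_per_mhz=330.0, standby_ua=1.5):
    """Average power of a duty-cycled MSP430 in mW.

    Assumes active current scales linearly with clock at the given voltage
    and the core sleeps in standby for the rest of each second.
    """
    active_ma = ua_per_mhz * mhz / 1000.0
    standby_ma = standby_ua / 1000.0
    avg_ma = active_ma * active_fraction + standby_ma * (1.0 - active_fraction)
    return volts * avg_ma


def implied_active_fraction(target_mw, volts, mhz, ua_per_mhz=330.0, standby_ua=1.5):
    """Invert msp430_power_mw: what busy fraction yields target_mw?"""
    active_ma = ua_per_mhz * mhz / 1000.0
    standby_ma = standby_ua / 1000.0
    return (target_mw / volts - standby_ma) / (active_ma - standby_ma)
```

Under these assumptions the centralized case is busy about 87% of the time and each distributed mote about 58%.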


TI DSP Evaluation
TMS320C54x
Used TI Code Composer Studio, compiler, simulator
Power
Active mode, 3.3 V, 10 MHz: 33 mW
IDLE1: 0.36 mW

Analysis
Centralized: 7.8 mW
Distributed: 1.6 mW per mote
Six processors = 9.6 mW total

TI DSP Evaluation Part 2
TMS320C55x (two parallel MACs)
Same tools, with C55x compiler and simulator
Power: no detailed figures available
Advertised: 0.9 V, 0.05 mW/MHz

Analysis
Centralized: 1,170,240 cycles/sec (vs. 2,290,440 on C54x)
At 2 MHz: 0.1 mW

Distributed: 195,040 cycles/sec (vs. 381,740 on C54x)
At 1 MHz: 0.05 mW
Six processors: 0.3 mW total
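The DSP figures can be reproduced with a simple duty-cycle model. This assumes the cycle counts are per second and that the C54x spends the remaining time in IDLE1. The C55x figures simply scale the advertised 0.05 mW/MHz by the chosen clock. The function names are hypothetical.

```python
def c54x_power_mw(cycles_per_sec, clock_hz=10e6, active_mw=33.0, idle_mw=0.36):
    """Busy at 33 mW for the needed cycles, IDLE1 at 0.36 mW otherwise."""
    busy = cycles_per_sec / clock_hz
    if not 0.0 <= busy <= 1.0:
        raise ValueError("workload does not fit in one second at this clock")
    return active_mw * busy + idle_mw * (1.0 - busy)


def c55x_power_mw(clock_mhz, mw_per_mhz=0.05):
    """Advertised power scaled by clock; no idle accounting on the slide."""
    return clock_mhz * mw_per_mhz


centralized_54x = c54x_power_mw(2_290_440)     # ~7.8 mW
distributed_54x = c54x_power_mw(381_740)       # ~1.6 mW per mote
```

This model gives about 7.84 mW centralized and 1.61 mW per mote for the C54x, consistent with the slide's 7.8, 1.6 and 9.6 mW. Both C55x workloads fit within their chosen clocks, so the clock rate alone sets the advertised power.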

Other Explorations
Hand-optimized code
Possible to massively reduce computation cost
FP/transcendentals conspicuously painful
Outside scope of our exploration

Radio Hardware
Bluetooth ~ 100 times more efficient

Reconfigurable Computing
Other circuitry (e.g. accelerometers)

Results Summary
Cost (mW) of various implementations

PC: 17.74 using sleep mode, 28 without
31% / 104% improvement with same hardware
170× improvement with new hardware

               PC         Atmel   TI MSP430   DSP 1 (C54x)   DSP 2 (C55x)
Centralized    17.74/28   13.5    3.8         7.8            0.1
Distributed    -          33.2    2.9         9.6            0.3

Conclusions
By finding better mappings between software, hardware, and application, big performance gains are possible.
Effective use of local processor resources can reduce communication overheads, which are significant.
DSPs and other specialized processors can be a big win, and they don't require hand-coded assembly or reconfigurable design.
