The Application
Data glove interface
Wired, bulky
SmartDust scenario
A mote on each fingertip
Investigate implementations
Explore design alternatives
Proof-of-Concept Prototype
By SmartDust group
Analysis
Power: 45 mW measured
Continuous operation of processor,
accelerometers, communication with host
Application Analysis
Processing (on PC)
Do 20 times per second, for each accelerometer
Read in X and Y samples (10 bits each)
Compute rolling average to smooth input data
Convert averages to polar coordinates
Dominates cost: sqrt, acos, atan
Secondary cost: floating point operations
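The per-sample processing listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the group's actual code: the window length `WINDOW`, the zero-g offset `ZERO_G`, and the helper names are all hypothetical, and `atan2` stands in for the `acos`/`atan` calls named on the slide.

```python
import math
from collections import deque

ZERO_G = 512  # assumed mid-scale value of a 10-bit sample (0..1023)
WINDOW = 8    # assumed rolling-average length; the slides do not give one


class AxisSmoother:
    """Rolling average over the most recent WINDOW samples of one axis."""

    def __init__(self, window=WINDOW):
        self.samples = deque(maxlen=window)

    def update(self, raw):
        self.samples.append(raw)
        return sum(self.samples) / len(self.samples)


def to_polar(avg_x, avg_y, zero_g=ZERO_G):
    """Convert smoothed X/Y readings to (magnitude, angle in radians)."""
    dx = avg_x - zero_g
    dy = avg_y - zero_g
    magnitude = math.sqrt(dx * dx + dy * dy)  # the costly sqrt
    angle = math.atan2(dy, dx)                # the costly inverse trig call
    return magnitude, angle


# One accelerometer, three consecutive (X, Y) samples.
smooth_x, smooth_y = AxisSmoother(), AxisSmoother()
for raw_x, raw_y in [(612, 512), (612, 512), (612, 512)]:
    r, theta = to_polar(smooth_x.update(raw_x), smooth_y.update(raw_y))
```

In the application this loop would run 20 times per second for each accelerometer, which is why the square root and inverse trigonometric calls dominate the cost.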
Optimization Process
Match Application to HW
Local computation to reduce communication
Floating point → Fixed point
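As a minimal sketch of the floating-point to fixed-point step, the snippet below stores fractional values as scaled integers and uses an integer square root. The Q8 format (`FRAC_BITS = 8`) and all helper names are assumptions chosen for illustration; the slides do not say which format was used.

```python
import math

FRAC_BITS = 8         # assumed format: 8 fractional bits
ONE = 1 << FRAC_BITS  # fixed-point representation of 1.0


def to_fixed(value):
    """Convert a float to the scaled-integer representation."""
    return int(round(value * ONE))


def to_float(fixed):
    """Convert a scaled integer back to a float (for inspection only)."""
    return fixed / ONE


def fixed_mul(a, b):
    """Multiply two fixed-point numbers; the shift restores the scale."""
    return (a * b) >> FRAC_BITS


def int_magnitude(dx, dy):
    """Integer-only vector length (floor of the true square root)."""
    return math.isqrt(dx * dx + dy * dy)


product = to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25)))
length = int_magnitude(3, 4)
```

Every operation here is an integer add, multiply, or shift, which is the kind of work a small microcontroller without a floating-point unit handles cheaply.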
Communication vs. Computation
Estimates of local processing cost on Atmel
(via simulation of GCC program)
Average: 2223 instr. × 2
Loop: 6 × 20/sec
CalcPolar: 19017 instr.
2.83 × 10^6 instructions
Report gesture once per second
FindGestureError: 5444 instr.
10 gestures, 6 accelerometers: 5444 × 60
3.26 × 10^5 instr.
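The arithmetic behind these estimates can be checked with a few lines. This sketch assumes the loop runs 20 times per second for each of 6 accelerometers, as the application analysis describes; the constant names are made up for the example.

```python
AVERAGE_INSTR = 2223             # per rolling average (X and Y each need one)
CALC_POLAR_INSTR = 19017         # per polar conversion
FIND_GESTURE_ERROR_INSTR = 5444  # per gesture/accelerometer comparison

ACCELEROMETERS = 6
SAMPLES_PER_SEC = 20
GESTURES = 10

# Sampling loop: two averages plus one polar conversion per sample.
per_sample = 2 * AVERAGE_INSTR + CALC_POLAR_INSTR
sampling_per_sec = per_sample * ACCELEROMETERS * SAMPLES_PER_SEC

# Gesture matching, reported once per second.
matching_per_sec = FIND_GESTURE_ERROR_INSTR * GESTURES * ACCELEROMETERS

total_per_sec = sampling_per_sec + matching_per_sec
print(sampling_per_sec, matching_per_sec, total_per_sec)
```

The sampling loop comes to roughly 2.8 × 10^6 instructions per second and the matching step to roughly 3.3 × 10^5, in line with the figures above; the small gap to the quoted 2.83 × 10^6 is not explained by the numbers on the slide.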
Communication vs. Computation 2
Cost of communication to Host PC
(measured)
4317 nJ/bit
From Culler, Hill, Szewczyk, Woo, "System
Architecture Directions for Networked Sensors."
Optimization Process
Match Application to HW
Local computation to reduce communication
Floating point → Fixed point
Communication
Very small: 4317 nJ × 32 = 0.13 mW
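To make the trade-off concrete, the sketch below compares the radio power for sending one gesture report per second with the power for streaming every raw sample to the host. It assumes the factor of 32 above means 32 bits per second, and that raw streaming would send exactly the 10-bit X and Y samples with no packet overhead; both are assumptions, not figures from the slides.

```python
ENERGY_PER_BIT_NJ = 4317  # measured cost of communication to the host PC

REPORT_BITS_PER_SEC = 32  # assumed: one 32-bit gesture report per second

# Assumed raw stream: 6 accelerometers x 20 samples/sec x 2 axes x 10 bits.
RAW_BITS_PER_SEC = 6 * 20 * 2 * 10


def radio_power_mw(bits_per_sec, energy_per_bit_nj=ENERGY_PER_BIT_NJ):
    """Average radio power in mW for a given bit rate."""
    watts = bits_per_sec * energy_per_bit_nj * 1e-9
    return watts * 1e3


report_mw = radio_power_mw(REPORT_BITS_PER_SEC)
raw_mw = radio_power_mw(RAW_BITS_PER_SEC)
print(f"report only: {report_mw:.3f} mW, raw stream: {raw_mw:.2f} mW")
```

Under these assumptions the report-only case lands near the 0.13 mW quoted above, while streaming raw samples would cost on the order of 10 mW, which is the motivation for doing the computation locally.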
TI Microcontroller Evaluation
A microcontroller with better specs
MSP430P112: 330 µA/MHz active mode
1.5 µA standby (6 µs wakeup)
TI DSP Evaluation
TMS320C54x
Used TI Code Composer Studio, compiler,
simulator
Power
Active mode, 3.3 V, 10 MHz: 33 mW
IDLE1: 0.36 mW
Analysis
Centralized: 7.8 mW
Distributed: 1.6 mW per mote
Six processors = 9.6 mW total
Analysis
Centralized: 1170240 cycles (vs. 2290440 on the 54x)
2 MHz: 0.1 mW
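The centralized 54x figure is consistent with a simple duty-cycle model: the processor is active for the cycles it needs each second and sits in IDLE1 for the rest. The sketch below assumes the 2290440-cycle count is the per-second workload on the 54x; the function name and the model itself are illustrative, not taken from the slides.

```python
def average_power_mw(busy_cycles_per_sec, clock_hz, active_mw, idle_mw):
    """Average power when the CPU idles whenever it has no work."""
    duty = busy_cycles_per_sec / clock_hz
    if not 0.0 <= duty <= 1.0:
        raise ValueError("workload does not fit in one second at this clock")
    return duty * active_mw + (1.0 - duty) * idle_mw


# TMS320C54x: 33 mW active at 10 MHz, 0.36 mW in IDLE1.
centralized_mw = average_power_mw(2_290_440, 10_000_000, 33.0, 0.36)

# Distributed: one processor per mote at the quoted 1.6 mW each.
distributed_total_mw = 6 * 1.6

print(f"centralized: {centralized_mw:.2f} mW")
print(f"distributed: {distributed_total_mw:.1f} mW")
```

With these inputs the model gives about 7.8 mW for the centralized case, matching the analysis slide, and the six-mote total of 9.6 mW follows directly from the per-mote figure.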
Other Explorations
Hand optimized code
Possible to massively reduce computation cost
FP/Transcendentals conspicuously painful
Outside scope of our exploration
Radio Hardware
Bluetooth ~ 100 times more efficient
Reconfigurable Computing
Other circuitry (e.g. accelerometers)
Results Summary
Cost, in mW, of various implementations

             PC        Atmel  TI    DSP 1  DSP 2
Centralized  17.74/28  13.5   3.8   7.8    0.1
Distributed            33.2   2.9   9.6    0.3
Conclusions
By finding better mappings between software,
hardware, and the application, big performance
gains are possible.
Effective use of local processor resources
can reduce communication overheads, which
are significant.
DSPs and other specialized processors can
be a big win and don't require hand-coded
assembly or reconfigurable design.