Machine learning for sensors and signal data is becoming easier than ever: hardware is becoming smaller and sensors are getting cheaper, making IoT devices widely available for a variety of applications ranging from predictive maintenance to user behavior monitoring.
Whether you are using sound, vibration, images, electrical signals, accelerometer readings, or other kinds of sensor data, you can build richer analytics by teaching a machine to detect and classify events happening in real time, at the edge, using an inexpensive microcontroller for processing - even with noisy, high-variation data.
Go beyond the Fast Fourier Transform (FFT). This definitive guide to machine learning for high-sample-rate sensor data is packed with tips from our signal processing and machine learning experts.
Enjoy your reading and don’t hesitate to get in touch with us if you have any questions!
In many cases, particularly in industrial applications, the purpose of the new instrumentation is to monitor machines in new ways to improve uptime and reduce cost by predicting maintenance problems before they occur. Vibration sensors are an obvious go-to here, as vibration analysis has a long history in industrial circles for machine diagnosis.
At Reality AI, we see our industrial customers trying to get results from all kinds of sensor implementations. Many of these implementations are carefully engineered to provide reliable, controlled, ground-truthed, rich data. And many are not.
For more subtle kinds of conditions, like identifying wear and maintenance issues, just knowing that a machine is shaking more isn't enough. You need to know whether it's shaking differently. That requires much richer information than a simple RMS energy; higher sample rates, and different measures, are often required.
Trained vibration analysts would generally go to the Fast Fourier Transform (FFT) to calculate how much energy is present in different frequency bands, typically looking for spectral peaks at multiples of the rotational frequency of the machine (for rotating equipment, that is; other kinds of equipment are harder to treat with Fourier analysis). Other tools, like Reality AI, do more complex transforms based on the actual multidimensional time-waveforms captured directly from the accelerometer.
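The analyst's approach can be sketched in a few lines: compute the spectrum, then sum the energy in a narrow band around each harmonic of the rotation frequency. The function name, the band half-width, and the synthetic test signal below are all illustrative choices, not Reality AI's method:

```python
import numpy as np

def harmonic_energies(signal, fs, rotation_hz, n_harmonics=3, half_width_hz=2.0):
    """Sum spectral energy in a narrow band around each multiple of the
    machine's rotation frequency (a classic vibration-analysis feature)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    energies = []
    for k in range(1, n_harmonics + 1):
        band = np.abs(freqs - k * rotation_hz) <= half_width_hz
        energies.append(float(spectrum[band].sum()))
    return energies

# Synthetic check: a 25 Hz "rotation" tone with a weaker 2x harmonic.
fs = 1000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 25 * t) + 0.3 * np.sin(2 * np.pi * 50 * t)
e1, e2, e3 = harmonic_energies(sig, fs, rotation_hz=25.0)
```

On the synthetic signal, most energy sits at the fundamental, a smaller amount at the second harmonic, and essentially none at the third - the kind of pattern an analyst reads off a spectrum by eye.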
Figure 2 - This example shows vibration data pre-processed through a Fast Fourier Transform (FFT) at high frequency resolution. The X-axis is frequency and the Y-axis is intensity. This data is much more useful than Figure 1: the spikes occurring at multiples of the base rotation frequency give important information about what's happening in the machine, and are most useful for rotating equipment. FFT data can be good for many applications, but it discards a great deal of information from the time domain. It shows only a snapshot in time - this entire chart is an expansion of a single data point from Figure 1.
But rich data brings rich problems: more expensive sensors, difficulty in interrupting the line to install instrumentation, bandwidth requirements for getting data off the local node. Many customers just go with the cheapest possible sensor packages, limit themselves to simple metrics like RMS and Peak-to-Peak, and basically discard almost all of the information contained in those vibrations.
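For concreteness, RMS and peak-to-peak are one-line computations - and it is easy to build two signals they cannot tell apart. The pair of sinusoids below (illustrative, not real machine data) is a slow wobble and a fast buzz with identical RMS energy:

```python
import numpy as np

def rms(x):
    """Root-mean-square energy of a waveform window."""
    return float(np.sqrt(np.mean(np.square(x))))

def peak_to_peak(x):
    """Max minus min of a waveform window."""
    return float(np.max(x) - np.min(x))

# A slow 10 Hz wobble and a fast 200 Hz buzz: the machine is shaking very
# differently in the two cases, yet the RMS summary statistic is identical.
fs = 1000
t = np.arange(fs) / fs
wobble = np.sin(2 * np.pi * 10 * t)
buzz = np.sin(2 * np.pi * 200 * t)
```

A classifier fed only these summary statistics has no way to distinguish the two conditions; the distinguishing information lives in the time and frequency structure that the statistics discard.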
Others use sensor packages that sample at higher rates and compute FFTs locally with
good frequency resolution, and tools like Reality AI can make good use of this kind of
data. Some, however, make the investment in sensors that can capture the original
time-waveform itself at high sample rates, and work with tools like Reality AI to get as
much out of their data as possible.
Do I really need high sample rates and time-waveforms, or at least a high-resolution FFT?
Maybe you do.
Are you trying to identify subtle anomalies that aren't manifested by large movements and heavy shaking? Then you do.
Is the environment noisy, with a good bit of variation in both target and background? Then you really, really do.
RMS and Peak-to-Peak kinds of measures, on the other hand, are “poor data.” They don’t
tell you much, and discard much of the information necessary to make the judgements
that you most want to make. They’re basically just high-level descriptive statistics that
discard almost all the essential signature information you need to find granular events
and conditions that justify the value of the sensor implementation in the first place. And
as this excellent example from another domain shows, descriptive statistics just don’t
let you see the most interesting things.
In practical terms for vibration analysis, what does that mean? It means that by relying only on high-level descriptive statistics (poor data) rather than the time and frequency domains (rich data), you will miss anomalies, fail to detect signatures, and basically sacrifice most of the value that your implementation could potentially deliver. Yes, it may be more complicated to implement. It may be more expensive. But it can deliver exponentially higher value.
But the truth is that algorithms are not the most important thing for building AI solutions -- data is. Algorithms aren't even #2. People in the trenches of machine learning know that once you have the data, it's really all about "features".
In machine learning parlance, features are the specific variables that are used as input to
an algorithm. Features can be selections of raw values from input data, or can be values
derived from that data. With the right features, almost any machine learning algorithm
will find what you’re looking for. Without good features, none will. And that’s especially
true for real-world problems where data comes with lots of inherent noise and variation.
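As a toy illustration of "values derived from the data", the sketch below reduces a window of raw samples to three hand-picked features: RMS energy, a zero-crossing count, and the dominant frequency. The particular features and the test tone are illustrative only:

```python
import numpy as np

def feature_vector(window, fs):
    """Reduce one window of raw samples to three derived features:
    RMS energy, zero-crossing count, and dominant frequency in Hz."""
    rms = np.sqrt(np.mean(window ** 2))
    sign_flips = np.diff(np.signbit(window).astype(int))
    zero_crossings = int(np.count_nonzero(sign_flips))
    spectrum = np.abs(np.fft.rfft(window))
    peak_bin = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
    dominant_hz = float(np.fft.rfftfreq(len(window), 1.0 / fs)[peak_bin])
    return [float(rms), zero_crossings, dominant_hz]

# One second of a 50 Hz test tone sampled at 1 kHz.
fs = 1000
t = np.arange(fs) / fs
fv = feature_vector(np.sin(2 * np.pi * 50 * t), fs)
```

Any downstream algorithm now sees three numbers per window instead of a thousand raw samples; whether those three numbers carry the predictive power is exactly the feature-discovery problem discussed here.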
My colleague Jeff (the other Reality AI co-founder) likes to use this example: Suppose I'm trying to detect when my wife comes home. I'll take a sensor, point it at the doorway and collect data. To use machine learning on that data, I'll need to identify a set of features that help distinguish my wife from anything else that the sensor might see. What would be the best feature to use? One that indicates, "There she is!" It would be perfect -- one bit with complete predictive power. The machine learning task would be rendered trivial.
If only we could figure out how to compute better features directly from the underlying
data… Deep Learning accomplishes this trick with layers of convolutional neural nets,
but that carries a great deal of computational overhead. There are other ways.
At Reality AI, where our tools create classifiers and detectors based on high sample rate signal inputs (accelerometer, vibration, sound, electrical signals, etc.) that often have high levels of noise and natural variation, we focus on discovering features that deliver the greatest predictive power with the lowest computational overhead. Our tools follow a mathematical process for discovering optimized features from the data before worrying about the particulars of the algorithms that will make decisions with those features. The closer our tools get to perfect features, the better the end results become. We need less data, use less training time, are more accurate, and require less processing power. It's a very powerful method.
Why do engineers so often default to the FFT? Probably because it's convenient, since all the tools these engineers use support it. Probably because they understand it, since everyone learns the FFT in engineering school. And probably because it's easy to explain, since the results are easily relatable back to the underlying physics. But the FFT rarely provides an optimal feature set, and it often blurs important time information that could be extremely useful for classification or detection in the underlying signals.
Take for example this early test comparing our optimized features to the FFT on a moderately complex, noisy group of signals. In the first graph below we show a time-frequency plot of FFT results on this particular signal input (this type of plot is called a spectrogram). The vertical axis is frequency, and the horizontal axis is time, over which the FFT is repeatedly computed for a specified window on the streaming signal. The colors are a heat map, with the warmer colors indicating more energy in that particular frequency range.
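A spectrogram of the kind described is just a loop of windowed FFTs. This numpy-only sketch (window length, hop size, and the synthetic chirp are arbitrary illustrative choices) makes the mechanics explicit:

```python
import numpy as np

def spectrogram(signal, fs, win=256, hop=128):
    """Magnitude spectrogram: an FFT repeatedly computed over a sliding,
    Hann-weighted window. Rows are frequency bins, columns are time steps."""
    window = np.hanning(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        segment = signal[start:start + win] * window
        frames.append(np.abs(np.fft.rfft(segment)))
    return np.array(frames).T  # shape: (win // 2 + 1, n_frames)

# An escalating chirp: the spectral peak should climb from column to column.
fs = 1000
t = np.arange(1024) / fs
chirp = np.sin(2 * np.pi * (50 * t + 200 * t ** 2))
S = spectrogram(chirp, fs)
```

Plotting `S` as a heat map reproduces the picture described in the text: frequency on the vertical axis, time on the horizontal, with warmer colors where a column has more energy in a bin.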
Compare that chart to one showing optimized features for this particular classification problem, generated using our methods. On this plot you can see what is happening with much greater resolution, and the facts become much easier to visualize. Looking at this chart it's crystal clear that the underlying signal consists of a multi-tone low background hum accompanied by a series of escalating chirps, with a couple of other transient things going on. The information is de-blurred, noise is suppressed, and you don't need to be a signal processing engineer to understand that the detection problem has just been made a whole lot easier.
As mentioned above, Deep Learning (DL) also discovers features, though they are rarely optimized. Still, DL approaches have been very successful with certain kinds of problems using signal data, including object recognition in images and speech recognition in sound. It can be a highly effective approach for a wide range of problems, but DL requires a great deal of training data, is not very computationally efficient, and can be difficult for a non-expert to use. There is often a sensitive dependence of classifier accuracy on a large number of configuration parameters, leading many of those who work with DL to focus heavily on tweaking previously used networks rather than focusing on finding the best features for each new problem. Learning happens "automatically", so why worry about it?
The very public successes of Deep Learning in products like Apple's Siri, the Amazon Echo, and the image tagging features available on Google and Facebook have led the community to over-focus a little on the algorithm side of things. There has been a tremendous amount of exciting innovation in ML algorithms in and around Deep Learning. But let's not forget the fundamentals.
“Duh!”
To those of us working in the field, including those at Carnegie Mellon, this was no great revelation. "Duh! Of course you can!" It was a nice-but-limited academic confirmation of what many people already know and are working on. TechCrunch, however, in typical breathless fashion, reported it as if it were news. Apparently the reporter was unaware of the many commercially available products that perform gesture recognition (among them Myo from Thalmic Labs, using its proprietary hardware, or some 20 others offering smartwatch tools). It seems he was also completely unaware of commercially available toolkits for identifying very subtle vibrations and accelerometry to detect machine conditions in noisy, complex environments (like our own Reality AI for Industrial Equipment Monitoring), or to detect user activity and environment in wearables (Reality AI for Consumer Products).
But my purpose is not to air sour grapes over lazy reporting. Rather, I’d like to use this
case to illustrate some key issues about using machine learning to make products for
the real world: Generalization vs Overtraining, and the difference between a laboratory
trial (like that study) and a real-world deployment.
Typically, the best guard against overtraining is to use a training set that captures as much of the expected variation in target and environment as possible. If you want to detect when a type of machine is exhibiting a particular condition, for example, include data from as many different machines, and as many different operating environments, as you can.
Figure - Illustration from the CMU study, using vibrations captured with an overclocked smartwatch to detect what object a person is holding. Small sample sizes. Reuse of training objects for validation. Limited variation. Very high accuracy... Classic overtraining.
K-fold validation involves repeatedly 1) holding out a randomly selected portion of the training data (say 10%), 2) training on the remainder (90%), 3) classifying the holdout data using the 90%-trained model, and 4) recording the results. Generally, holdouts do not overlap, so, for example, 10 independent trials would be completed for a 10% holdout. Holdouts may be balanced across groups and validation may be averaged over multiple runs, but the key is that in each iteration the classifier is tested on data that was not part of its training. The accuracy will almost certainly be lower than what you compute by applying the model to its training data (a stat we refer to as "class separation", rather than accuracy), but it will be a much better predictor of how well the classifier will perform in the wild - at least to the degree that your training set resembles the real world.
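The four numbered steps can be sketched directly. This toy version (numpy only, with a deliberately simple nearest-centroid stand-in for a real classifier, and synthetic two-cluster data) uses non-overlapping holdouts exactly as described:

```python
import numpy as np

def kfold_accuracy(X, y, k=10, seed=0):
    """K-fold validation: k non-overlapping random holdouts; train on the
    rest, classify the holdout, and average the recorded accuracies."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    scores = []
    for hold in np.array_split(idx, k):      # 1) hold out ~1/k of the data
        train = np.setdiff1d(idx, hold)      # 2) train on the remainder
        centroids = {c: X[train][y[train] == c].mean(axis=0)
                     for c in np.unique(y[train])}
        preds = [min(centroids, key=lambda c: np.linalg.norm(v - centroids[c]))
                 for v in X[hold]]           # 3) classify the holdout
        scores.append(float(np.mean(np.array(preds) == y[hold])))  # 4) record
    return float(np.mean(scores))

# Synthetic two-cluster data: well separated, so K-fold accuracy is high.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
acc = kfold_accuracy(X, y, k=10)
```

The point of the structure is the one emphasized above: in every iteration the model is scored only on observations it never trained on.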
Counter-intuitively, classifiers with weaker class separation often hold up better in K-fold. It is not uncommon that a near-perfect accuracy on the training data drops precipitously in K-fold while a slightly weaker classifier maintains excellent generalization performance. And isn't that what you're really after? Better performance in the real world on new observations?
Getting high class separation, but low K-fold accuracy? You have a model that has been overtrained, with poor ability to generalize. Back to the drawing board. Maybe select a less aggressive machine learning model, or revisit your feature selection. Reality AI does this automatically.
Be careful, though, because the converse is not true: A good K-fold does not guarantee
a deployable classifier. The only way to know for sure what you’ve missed in the lab is
to test in the wild. Not perfect? No problem: collect more training data capturing more
examples of underrepresented variation. A good development tool (like ours) will make
it easy to support rapid, iterative improvements of your classifiers.
But it's not the only thing. Deployment considerations matter too. Can it run in the cloud, or is it destined for a processor-, memory- and/or power-constrained environment? (To the CMU guys: good luck getting acceptable battery life out of an overclocked smartwatch!) How computationally intensive is the solution, and can it be run in the target environment with the memory and processing cycles available to it? What response time or latency is acceptable? These issues must be factored into a product design, and into the choice of machine-learning model supporting that product.
Tools like Reality AI can help. R&D engineers use Reality AI Tools to create machine learning-based signal classifiers and detectors for real-world products, including wearables and machines, and can explore the connections between sample rate, computational intensity, and accuracy. They can train new models and run K-fold diagnostics (among others) to guard against overtraining and to predict the ability to generalize. And when they're done, they can deploy to the cloud, or export code to be compiled for their specific embedded environment.
Reality AI tools are data-driven machine learning tools optimized for sensors and signals. To learn more about our data-driven methods, visit our Technology page and download our technical white paper.
Because the decision engine no longer owns the transducer, the underlying data is also available in its raw form. This means a smartphone app can increasingly leverage the same sensor data to make its own decisions in ways never specifically intended by the hardware designer. Am I running, walking, or standing in line? Am I on the bus or in the car? How's my driving? How's my workout going? Is it getting dark out? Is the ambient crowd noise loud enough that I should turn up the volume? What's the gender and age of the speaker?
A home security system based on similar thinking, with a microphone and a suitable microcontroller, can do much with a software-defined audio processing capability: a glass-break detector, a footstep detector, a heartbeat counter, a doorknob-rattle detector, a dog bark or shout detector, a trip-and-fall sensor for grandma, an unauthorized-teenager-party alarm -- all of these sensors defined in software within the same flexible hardware box. No longer the traditional "one box, one answer" security sensor design.
Industrial IoT applications are numerous, and many are already in production.
Automotive sensors are also trending this way: decision-making functions are moving away from hard-wired, end-point transducers and toward onboard computers. Makers know this places increasing flexibility and software-adaptive capability into the hands of the system designers.
Modularity is increasingly moving from the physical layer to a network layer, in which modules are connected on a peer-to-peer network, exchanging packetized information. While this network layer begins as a digital substitute for individual electrical circuits, with increasing bandwidth capacity it can also provide the flexibility for devices to share underlying data as well as local yes/no decisions. This creates an unprecedented opportunity both to integrate information across modalities and to add brand-new capability, ad hoc, in the form of software sensors.
This shift in thinking also opens the door to incorporating more complex, AI-based algorithms, rather than just simple condition thresholds. Sensor information can be integrated in ever more complex ways, and even the innocuous electrical-panel circuit breaker is becoming a microcontroller-powered, software sensing device.
Tools like Reality AI are enabling machine learning smarts in these environments.
So to apply machine learning, we compute "features" from the incoming data that reduce the large number of incoming data points to something more tractable. For data like sound or vibration, most engineers would probably try selecting peaks from a Fast Fourier Transform (FFT) - a process that reduces raw waveform data to a set of coefficients, each representing the amount of energy contained in a slice of the frequency spectrum. But there is a wide array of options available for feature selection, each most effective in different circumstances. For more on features, see our blog post "It's all about the features".
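Peak-picking from an FFT, as described, takes only a few lines. The function name, the number of peaks, and the two-tone test signal are illustrative choices:

```python
import numpy as np

def fft_peak_features(signal, fs, n_peaks=3):
    """Reduce a raw waveform to its n_peaks strongest spectral components,
    returned as (frequency_hz, magnitude) pairs, strongest first."""
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    order = np.argsort(mags[1:])[::-1][:n_peaks] + 1  # skip the DC bin
    return [(float(freqs[i]), float(mags[i])) for i in order]

# A two-tone test signal: a strong 60 Hz component and a weaker one at 120 Hz.
fs = 1000
t = np.arange(fs) / fs
sig = 2.0 * np.sin(2 * np.pi * 60 * t) + np.sin(2 * np.pi * 120 * t)
feats = fft_peak_features(sig, fs, n_peaks=2)
```

One second of waveform (a thousand samples) collapses to a handful of (frequency, magnitude) pairs - compact, but note how much time-domain structure this representation throws away.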
Here are our five top suggestions for data collection to make your project successful:
Minimize any unnecessary variation -- usually variation in the equipment is the easiest to eliminate or control -- and make sure you capture data that gets as much of the likely real-world target variation in as many different backgrounds as possible. The better you are at covering the gamut of background, target and equipment variation, the more successful your machine learning project will be, meaning the better it will be able to make accurate predictions in the real world.
5. Collect iteratively
Machine learning works best as an iterative process. Start off by collecting just enough
data to build a bare-bones model that proves the effectiveness of the technique, even
if not yet for the full range of variation expected in the real world, and then use those
results to fine-tune your approach.
Take the next data you get from the field and test your bare-bones model against it to get an accuracy benchmark. Take note of specific areas where it performs well and where it performs poorly. Retrain using the new data and test again. Use this to chart your progress, and also to guide your data collection: circumstances where the model performs poorly are circumstances where you'll want to collect more data. When you get to the point where you're getting acceptable accuracy on new data coming in - what we call "generalizing" - you're just about done.
Now you can focus on model optimization and tweaking to get the best possible performance.
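That benchmark-then-retrain loop can be shown end to end in plain Python. The nearest-centroid "model", the 1-D readings, and the labels below are all invented for illustration; the point is only the order of operations: score the new field batch first, then fold it into the training set:

```python
def centroid_model(X, y):
    """Toy stand-in for a real classifier: the mean of each class (1-D data)."""
    return {c: sum(x for x, label in zip(X, y) if label == c) / y.count(c)
            for c in set(y)}

def accuracy(model, X, y):
    """Fraction of points assigned to the class with the nearest centroid."""
    preds = [min(model, key=lambda c: abs(x - model[c])) for x in X]
    return sum(p == truth for p, truth in zip(preds, y)) / len(y)

# 1) Bare-bones model from an initial, deliberately narrow collection.
X0, y0 = [0.0, 0.2, 3.0, 3.2], [0, 0, 1, 1]
model = centroid_model(X0, y0)

# 2) A field batch exposes unseen variation: class-0 readings near 2.0.
Xf, yf = [1.9, 2.0, 2.1, 3.1], [0, 0, 0, 1]
before = accuracy(model, Xf, yf)  # benchmark on the new data first

# 3) Fold the new data into training, retrain, and re-benchmark.
model = centroid_model(X0 + Xf, y0 + yf)
after = accuracy(model, Xf, yf)
```

Tracking `before` against `after` on each incoming batch is the progress chart described above: when accuracy on new data stops improving and stays acceptable, the model is generalizing.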
Start Here!
Reality AI Tools supports engineers using AI-based tools to create signal classifiers and detectors -- code that recognizes the signatures of specific events and conditions in vibration, sound, accelerometry, current/voltage waveforms, imagery, LiDAR and remote sensing, as well as other sensor types.
www.reality.ai
Copyright 2018 Reality AI © All Rights Reserved