Machine learning for sensors and signal data is becoming easier than ever: hardware is becoming smaller and sensors are getting cheaper, making IoT devices widely available for a variety of applications ranging from predictive maintenance to user behavior monitoring.
Whether you are using sound, vibration, images, electrical signals, accelerometer readings, or other kinds of sensor data, you can build richer analytics by teaching a machine to detect and classify events happening in real time, at the edge, using an inexpensive microcontroller for processing - even with noisy, high-variation data.
Go beyond the Fast Fourier Transform (FFT). This definitive guide to machine learning for high-sample-rate sensor data is packed with tips from our signal processing and machine learning experts.
Enjoy your reading and don’t hesitate to get in touch with us if you have any questions!
In many cases, particularly in industrial applications, the purpose of the new instrumentation is to monitor machines in new ways to improve uptime and reduce cost by predicting maintenance problems before they occur. Vibration sensors are an obvious go-to here, as vibration analysis has a long history in industrial circles for machine diagnosis.
At Reality AI, we see our industrial customers trying to get results from all kinds of sensor implementations. Many of these implementations are carefully engineered to provide reliable, controlled, ground-truthed, rich data. And many are not.
For more subtle kinds of conditions, like identifying wear and maintenance issues, just knowing that a machine is shaking more isn't enough. You need to know whether it's shaking differently. That requires much richer information than a simple RMS energy; higher sample rates, and different measures, are often required.
Trained vibration analysts would generally go to the Fast Fourier Transform (FFT) to calculate how much energy is present in different frequency bands, typically looking for spectral peaks at multiples of the rotational frequency of the machine (for rotating equipment, that is; other kinds of equipment are harder to treat with Fourier analysis). Other tools, like Reality AI, do more complex transforms based on the actual multidimensional time-waveforms captured directly from the accelerometer.
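The analyst's approach can be sketched in a few lines: compute the spectrum, then sum the energy in a narrow band around each harmonic of the rotation frequency. The function name, the band half-width, and the synthetic test signal below are all illustrative choices, not Reality AI's method:

```python
import numpy as np

def harmonic_energies(signal, fs, rotation_hz, n_harmonics=3, half_width_hz=2.0):
    """Sum spectral energy in a narrow band around each multiple of the
    machine's rotation frequency (a classic vibration-analysis feature)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    energies = []
    for k in range(1, n_harmonics + 1):
        band = np.abs(freqs - k * rotation_hz) <= half_width_hz
        energies.append(float(spectrum[band].sum()))
    return energies

# Synthetic check: a 25 Hz "rotation" tone with a weaker 2x harmonic.
fs = 1000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 25 * t) + 0.3 * np.sin(2 * np.pi * 50 * t)
e1, e2, e3 = harmonic_energies(sig, fs, rotation_hz=25.0)
```

On the synthetic signal, most energy sits at the fundamental, a smaller amount at the second harmonic, and essentially none at the third - the kind of pattern an analyst reads off a spectrum by eye.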
Figure 2 - This example shows vibration data pre-processed through a Fast Fourier Transform (FFT) at high frequency resolution. The X-axis is frequency and the Y-axis is intensity. This data is much more useful than Figure 1: the spikes occurring at multiples of the base rotation frequency give important information about what's happening in the machine, and are most useful for rotating equipment. FFT data can be good for many applications, but it discards a great deal of information from the time domain. It shows only a snapshot in time - this entire chart is an expansion of a single data point from Figure 1.
But rich data brings rich problems: more expensive sensors, difficulty in interrupting the line to install instrumentation, bandwidth requirements for getting data off the local node. Many customers just go with the cheapest possible sensor packages, limit themselves to simple metrics like RMS and Peak-to-Peak, and basically discard almost all of the information contained in those vibrations.
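For concreteness, RMS and peak-to-peak are one-line computations - and it is easy to build two signals they cannot tell apart. The pair of sinusoids below (illustrative, not real machine data) is a slow wobble and a fast buzz with identical RMS energy:

```python
import numpy as np

def rms(x):
    """Root-mean-square energy of a waveform window."""
    return float(np.sqrt(np.mean(np.square(x))))

def peak_to_peak(x):
    """Max minus min of a waveform window."""
    return float(np.max(x) - np.min(x))

# A slow 10 Hz wobble and a fast 200 Hz buzz: the machine is shaking very
# differently in the two cases, yet the RMS summary statistic is identical.
fs = 1000
t = np.arange(fs) / fs
wobble = np.sin(2 * np.pi * 10 * t)
buzz = np.sin(2 * np.pi * 200 * t)
```

A classifier fed only these summary statistics has no way to distinguish the two conditions; the distinguishing information lives in the time and frequency structure that the statistics discard.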
Others use sensor packages that sample at higher rates and compute FFTs locally with
good frequency resolution, and tools like Reality AI can make good use of this kind of
data. Some, however, make the investment in sensors that can capture the original
time-waveform itself at high sample rates, and work with tools like Reality AI to get as
much out of their data as possible.
Do I really need high sample rates and time-waveforms, or at least a high-resolution FFT?
Maybe you do.
Are you trying to identify subtle anomalies that aren't manifested by large movements and heavy shaking? Then you do.
Is the environment noisy, with a good bit of variation in both target and background? Then you really, really do.
RMS and Peak-to-Peak kinds of measures, on the other hand, are “poor data.” They don’t
tell you much, and discard much of the information necessary to make the judgements
that you most want to make. They’re basically just high-level descriptive statistics that
discard almost all the essential signature information you need to find granular events
and conditions that justify the value of the sensor implementation in the first place. And
as this excellent example from another domain shows, descriptive statistics just don’t
let you see the most interesting things.
In practical terms for vibration analysis, what does that mean? It means that by relying only on high-level descriptive statistics (poor data) rather than the time and frequency domains (rich data), you will miss anomalies, fail to detect signatures, and basically sacrifice most of the value that your implementation could potentially deliver. Yes, it may be more complicated to implement. It may be more expensive. But it can deliver exponentially higher value.
But the truth is that algorithms are not the most important thing for building AI solutions -- data is. Algorithms aren't even #2. People in the trenches of machine learning know that once you have the data, it's really all about "features".
In machine learning parlance, features are the specific variables that are used as input to
an algorithm. Features can be selections of raw values from input data, or can be values
derived from that data. With the right features, almost any machine learning algorithm
will find what you’re looking for. Without good features, none will. And that’s especially
true for real-world problems where data comes with lots of inherent noise and variation.
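As a toy illustration of "values derived from the data", the sketch below reduces a window of raw samples to three hand-picked features: RMS energy, a zero-crossing count, and the dominant frequency. The particular features and the test tone are illustrative only:

```python
import numpy as np

def feature_vector(window, fs):
    """Reduce one window of raw samples to three derived features:
    RMS energy, zero-crossing count, and dominant frequency in Hz."""
    rms = np.sqrt(np.mean(window ** 2))
    sign_flips = np.diff(np.signbit(window).astype(int))
    zero_crossings = int(np.count_nonzero(sign_flips))
    spectrum = np.abs(np.fft.rfft(window))
    peak_bin = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
    dominant_hz = float(np.fft.rfftfreq(len(window), 1.0 / fs)[peak_bin])
    return [float(rms), zero_crossings, dominant_hz]

# One second of a 50 Hz test tone sampled at 1 kHz.
fs = 1000
t = np.arange(fs) / fs
fv = feature_vector(np.sin(2 * np.pi * 50 * t), fs)
```

Any downstream algorithm now sees three numbers per window instead of a thousand raw samples; whether those three numbers carry the predictive power is exactly the feature-discovery problem discussed here.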
My colleague Jeff (the other Reality AI co-founder) likes to use this example: Suppose I'm trying to detect when my wife comes home. I'll take a sensor, point it at the doorway and collect data. To use machine learning on that data, I'll need to identify a set of features that help distinguish my wife from anything else that the sensor might see. What would be the best feature to use? One that indicates, "There she is!" It would be perfect -- one bit with complete predictive power. The machine learning task would be rendered trivial.
If only we could figure out how to compute better features directly from the underlying
data… Deep Learning accomplishes this trick with layers of convolutional neural nets,
but that carries a great deal of computational overhead. There are other ways.
At Reality AI, where our tools create classifiers and detectors based on high sample rate signal inputs (accelerometer, vibration, sound, electrical signals, etc.) that often have high levels of noise and natural variation, we focus on discovering features that deliver the greatest predictive power with the lowest computational overhead. Our tools follow a mathematical process for discovering optimized features from the data before worrying about the particulars of the algorithms that will make decisions with those features. The closer our tools get to perfect features, the better the end results become. We need less data, use less training time, are more accurate, and require less processing power. It's a very powerful method.
Why do engineers so often default to the FFT? Probably because it's convenient, since all the tools these engineers use support it. Probably because they understand it, since everyone learns the FFT in engineering school. And probably because it's easy to explain, since the results are easily relatable back to the underlying physics. But the FFT rarely provides an optimal feature set, and it often blurs important time information that could be extremely useful for classification or detection in the underlying signals.
Take for example this early test comparing our optimized features to the FFT on a moderately complex, noisy group of signals. In the first graph below we show a time-frequency plot of FFT results on this particular signal input (this type of plot is called a spectrogram). The vertical axis is frequency, and the horizontal axis is time, over which the FFT is repeatedly computed for a specified window on the streaming signal. The colors are a heat map, with the warmer colors indicating more energy in that particular frequency range.
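A spectrogram of the kind described is just a loop of windowed FFTs. This numpy-only sketch (window length, hop size, and the synthetic chirp are arbitrary illustrative choices) makes the mechanics explicit:

```python
import numpy as np

def spectrogram(signal, fs, win=256, hop=128):
    """Magnitude spectrogram: an FFT repeatedly computed over a sliding,
    Hann-weighted window. Rows are frequency bins, columns are time steps."""
    window = np.hanning(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        segment = signal[start:start + win] * window
        frames.append(np.abs(np.fft.rfft(segment)))
    return np.array(frames).T  # shape: (win // 2 + 1, n_frames)

# An escalating chirp: the spectral peak should climb from column to column.
fs = 1000
t = np.arange(1024) / fs
chirp = np.sin(2 * np.pi * (50 * t + 200 * t ** 2))
S = spectrogram(chirp, fs)
```

Plotting `S` as a heat map reproduces the picture described in the text: frequency on the vertical axis, time on the horizontal, with warmer colors where a column has more energy in a bin.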
Compare that chart to one showing optimized features for this particular classification problem, generated using our methods. On this plot you can see what is happening with much greater resolution, and the facts become much easier to visualize. Looking at this chart it's crystal clear that the underlying signal consists of a multi-tone low background hum accompanied by a series of escalating chirps, with a couple of other transient things going on. The information is de-blurred, noise is suppressed, and you don't need to be a signal processing engineer to understand that the detection problem has just been made a whole lot easier.
As mentioned above, Deep Learning (DL) also discovers features, though they are rarely optimized. Still, DL approaches have been very successful with certain kinds of problems using signal data, including object recognition in images and speech recognition in sound. It can be a highly effective approach for a wide range of problems, but DL requires a great deal of training data, is not very computationally efficient, and can be difficult for a non-expert to use. There is often a sensitive dependence of classifier accuracy on a large number of configuration parameters, leading many of those who work with DL to focus heavily on tweaking previously used networks rather than focusing on finding the best features for each new problem. Learning happens "automatically", so why worry about it?
The very public successes of Deep Learning in products like Apple's Siri, the Amazon Echo, and the image tagging features available on Google and Facebook have led the community to over-focus a little on the algorithm side of things. There has been a tremendous amount of exciting innovation in ML algorithms in and around Deep Learning. But let's not forget the fundamentals.
“Duh!”
To those of us working in the field, including those at Carnegie Mellon, this was no great revelation. "Duh! Of course you can!" It was a nice-but-limited academic confirmation of what many people already know and are working on. TechCrunch, however, in typical breathless fashion, reported it as if it were news. Apparently the reporter was unaware of the many commercially available products that perform gesture recognition (among them Myo from Thalmic Labs, using its proprietary hardware, or some 20 others offering smartwatch tools). It seems he was also completely unaware of commercially available toolkits for identifying very subtle vibrations and accelerometry to detect machine conditions in noisy, complex environments (like our own Reality AI for Industrial Equipment Monitoring), or to detect user activity and environment in wearables (Reality AI for Consumer Products).
But my purpose is not to air sour grapes over lazy reporting. Rather, I’d like to use this
case to illustrate some key issues about using machine learning to make products for
the real world: Generalization vs Overtraining, and the difference between a laboratory
trial (like that study) and a real-world deployment.
Typically, the best guard against overtraining is to use a training set that captures as much of the expected variation in target and environment as possible. If you want to detect when a type of machine is exhibiting a particular condition, for example, include data from as many different machines, and as many different operating environments, as you can.
Figure - Illustration from the CMU study, using vibrations captured with an overclocked smartwatch to detect what object a person is holding. Small sample sizes. Reuse of training objects for validation. Limited variation. Very high accuracy... Classic overtraining.
K-fold validation involves repeatedly 1) holding out a randomly selected portion of the training data (say 10%), 2) training on the remainder (90%), 3) classifying the holdout data using the 90%-trained model, and 4) recording the results. Generally, holdouts do not overlap, so, for example, 10 independent trials would be completed for a 10% holdout. Holdouts may be balanced across groups and validation may be averaged over multiple runs, but the key is that in each iteration the classifier is tested on data that was not part of its training. The accuracy will almost certainly be lower than what you compute by applying the model to its training data (a stat we refer to as "class separation", rather than accuracy), but it will be a much better predictor of how well the classifier will perform in the wild - at least to the degree that your training set resembles the real world.
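The four numbered steps can be sketched directly. This toy version (numpy only, with a deliberately simple nearest-centroid stand-in for a real classifier, and synthetic two-cluster data) uses non-overlapping holdouts exactly as described:

```python
import numpy as np

def kfold_accuracy(X, y, k=10, seed=0):
    """K-fold validation: k non-overlapping random holdouts; train on the
    rest, classify the holdout, and average the recorded accuracies."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    scores = []
    for hold in np.array_split(idx, k):      # 1) hold out ~1/k of the data
        train = np.setdiff1d(idx, hold)      # 2) train on the remainder
        centroids = {c: X[train][y[train] == c].mean(axis=0)
                     for c in np.unique(y[train])}
        preds = [min(centroids, key=lambda c: np.linalg.norm(v - centroids[c]))
                 for v in X[hold]]           # 3) classify the holdout
        scores.append(float(np.mean(np.array(preds) == y[hold])))  # 4) record
    return float(np.mean(scores))

# Synthetic two-cluster data: well separated, so K-fold accuracy is high.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
acc = kfold_accuracy(X, y, k=10)
```

The point of the structure is the one emphasized above: in every iteration the model is scored only on observations it never trained on.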
Counter-intuitively, classifiers with weaker class separation often hold up better in K-fold. It is not uncommon that a near-perfect accuracy on the training data drops precipitously in K-fold while a slightly weaker classifier maintains excellent generalization performance. And isn't that what you're really after? Better performance in the real world on new observations?
Getting high class separation, but low K-fold accuracy? You have a model that has been overtrained, with poor ability to generalize. Back to the drawing board. Maybe select a less aggressive machine learning model, or revisit your feature selection. Reality AI does this automatically.
Be careful, though, because the converse is not true: A good K-fold does not guarantee
a deployable classifier. The only way to know for sure what you’ve missed in the lab is
to test in the wild. Not perfect? No problem: collect more training data capturing more
examples of underrepresented variation. A good development tool (like ours) will make
it easy to support rapid, iterative improvements of your classifiers.
But it's not the only thing. Deployment considerations matter too. Can it run in the cloud, or is it destined for a processor-, memory- and/or power-constrained environment? (To the CMU guys: good luck getting acceptable battery life out of an overclocked smartwatch!) How computationally intensive is the solution, and can it be run in the target environment with the memory and processing cycles available to it? What response time or latency is acceptable? These issues must be factored into a product design, and into the choice of machine-learning model supporting that product.
Tools like Reality AI can help. R&D engineers use Reality AI Tools to create machine learning-based signal classifiers and detectors for real-world products, including wearables and machines, and can explore the connections between sample rate, computational intensity, and accuracy. They can train new models and run K-fold diagnostics (among others) to guard against overtraining and to predict the ability to generalize. And when they're done, they can deploy to the cloud, or export code to be compiled for their specific embedded environment.
Reality AI tools are data-driven machine learning tools optimized for sensors and signals. To learn more about our data-driven methods, visit our Technology page and download our technical white paper.
Because the decision engine no longer owns the transducer, the underlying data is also available in its raw form. This means a smartphone app can increasingly leverage the same sensor data to make its own decisions in ways never specifically intended by the hardware designer. Am I running, walking, or standing in line? Am I on the bus or in the car? How's my driving? How's my workout going? Is it getting dark out? Is the ambient crowd noise loud enough that I should turn up the volume? What's the gender and age of the speaker?
A home security system based on similar thinking, with a microphone and a suitable microcontroller, can do much with a software-defined audio processing capability: a glass-break detector, a footstep detector, a heartbeat counter, a doorknob-rattle detector, a dog bark or shout detector, a trip-and-fall sensor for grandma, an unauthorized-teenager-party alarm -- all of these sensors defined in software within the same flexible hardware box. No longer the traditional "one box, one answer" security sensor design.
Industrial IoT applications are numerous, and many are already in production.
Automotive sensors are also trending this way: decision-making functions are moving away from hard-wired, end-point transducers and toward onboard computers. Makers know this places increasing flexibility and software-adaptive capability into the hands of the system designers.
Modularity is increasingly moving from the physical layer to a network layer, in which modules are connected on a peer-to-peer network, exchanging packetized information. While this network layer begins as a digital substitute for individual electrical circuits, with increasing bandwidth capacity it can also provide the flexibility for devices to share underlying data as well as local yes/no decisions. This creates an unprecedented opportunity both to integrate information across modalities and to add brand-new capability, ad hoc, in the form of software sensors.
This shift in thinking also opens the door to incorporating more complex, AI-based algorithms, rather than just simple condition thresholds. Sensor information can be integrated in ever more complex ways, and even the innocuous electrical-panel circuit breaker is becoming a microcontroller-powered, software sensing device.
Tools like Reality AI are enabling machine learning smarts in these environments.
So to apply machine learning, we compute "features" from the incoming data that reduce the large number of incoming data points to something more tractable. For data like sound or vibration, most engineers would probably try selecting peaks from a Fast Fourier Transform (FFT) - a process that reduces raw waveform data to a set of coefficients, each representing the amount of energy contained in a slice of the frequency spectrum. But there is a wide array of options available for feature selection, each most effective in different circumstances. For more on features, see our blog post "It's all about the features".
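Peak-picking from an FFT, as described, takes only a few lines. The function name, the number of peaks, and the two-tone test signal are illustrative choices:

```python
import numpy as np

def fft_peak_features(signal, fs, n_peaks=3):
    """Reduce a raw waveform to its n_peaks strongest spectral components,
    returned as (frequency_hz, magnitude) pairs, strongest first."""
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    order = np.argsort(mags[1:])[::-1][:n_peaks] + 1  # skip the DC bin
    return [(float(freqs[i]), float(mags[i])) for i in order]

# A two-tone test signal: a strong 60 Hz component and a weaker one at 120 Hz.
fs = 1000
t = np.arange(fs) / fs
sig = 2.0 * np.sin(2 * np.pi * 60 * t) + np.sin(2 * np.pi * 120 * t)
feats = fft_peak_features(sig, fs, n_peaks=2)
```

One second of waveform (a thousand samples) collapses to a handful of (frequency, magnitude) pairs - compact, but note how much time-domain structure this representation throws away.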
Here are our five top suggestions for data collection to make your project successful:
Minimize any unnecessary variation -- usually variation in the equipment is the easiest to eliminate or control -- and make sure you capture data that gets as much of the likely real-world target variation in as many different backgrounds as possible. The better you are at covering the gamut of background, target and equipment variation, the more successful your machine learning project will be, meaning the better it will be able to make accurate predictions in the real world.
5. Collect iteratively
Machine learning works best as an iterative process. Start off by collecting just enough
data to build a bare-bones model that proves the effectiveness of the technique, even
if not yet for the full range of variation expected in the real world, and then use those
results to fine-tune your approach.
Take the next data you get from the field and test your bare-bones model against it to get an accuracy benchmark. Take note of specific areas where it performs well and where it performs poorly. Retrain using the new data and test again. Use this to chart your progress, and also to guide your data collection: circumstances where the model performs poorly are circumstances where you'll want to collect more data. When you get to the point where you're getting acceptable accuracy on new data coming in - what we call "generalizing" - you're just about done.
Now you can focus on model optimization and tweaking to get the best possible performance.
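That benchmark-then-retrain loop can be shown end to end in plain Python. The nearest-centroid "model", the 1-D readings, and the labels below are all invented for illustration; the point is only the order of operations: score the new field batch first, then fold it into the training set:

```python
def centroid_model(X, y):
    """Toy stand-in for a real classifier: the mean of each class (1-D data)."""
    return {c: sum(x for x, label in zip(X, y) if label == c) / y.count(c)
            for c in set(y)}

def accuracy(model, X, y):
    """Fraction of points assigned to the class with the nearest centroid."""
    preds = [min(model, key=lambda c: abs(x - model[c])) for x in X]
    return sum(p == truth for p, truth in zip(preds, y)) / len(y)

# 1) Bare-bones model from an initial, deliberately narrow collection.
X0, y0 = [0.0, 0.2, 3.0, 3.2], [0, 0, 1, 1]
model = centroid_model(X0, y0)

# 2) A field batch exposes unseen variation: class-0 readings near 2.0.
Xf, yf = [1.9, 2.0, 2.1, 3.1], [0, 0, 0, 1]
before = accuracy(model, Xf, yf)  # benchmark on the new data first

# 3) Fold the new data into training, retrain, and re-benchmark.
model = centroid_model(X0 + Xf, y0 + yf)
after = accuracy(model, Xf, yf)
```

Tracking `before` against `after` on each incoming batch is the progress chart described above: when accuracy on new data stops improving and stays acceptable, the model is generalizing.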
Start Here!
Reality AI Tools supports engineers using AI-based tools to create signal classifiers and detectors -- code that recognizes the signatures of specific events and conditions in vibration, sound, accelerometry, current/voltage waveforms, imagery, LiDAR and remote sensing, as well as other sensor types.
www.reality.ai
Copyright 2018 Reality AI © All Rights Reserved