
Running head: GRAYCEMTIM8130-8 1

From Maersk Data Mining to Deep Machine Learning

Melissa Grayce

Northcentral University

TIM-8130: Data Mining

Dr. David Bouvin

March 31, 2019



From Maersk Data Mining to Deep Machine Learning

Gartner defines the Internet of Things (IoT) as a network of physical objects containing

embedded technology that communicates event data as the devices sense or interact with

their internal or external environment (Gartner, n.d.). Experts expect the number of devices

within the IoT to reach 50 billion by 2024 (Alam, Mehmood, Katib, & Albeshri, 2016). This

number of devices would generate a massive amount of valuable data, but can the existing data

mining tools turn that data into valuable information?

Answering that question is the goal of this paper. To begin the search for the answer, the

first section describes the journey of Maersk to turn IoT data into information to assist their

organization in reaching business goals. Specifically, the case study section discusses their

business goals, data collection, and system composition. The next section presents information

on logic components and statistical techniques drawn from studies that demonstrate similar

solutions using data elements comparable to those in the Maersk case study. Finally, the data

mining theory section focuses on current and future machine learning research. Researchers

assert the machine learning technique, known as deep learning, provides insights from IoT data

not possible using other data mining algorithms (Alam et al., 2016). It is that assertion which

provides the basis for the specificity of the final section.

Case Study

Maersk is the largest operating subsidiary of the Danish conglomerate AP Moller -

Maersk A/S. It is the world's largest container shipping company, operating in 343 ports across 121

countries (Murison, 2016). They understand the value of data and technology to their revenue.

Their appreciation of the value of data began with the information from Maersk.com. It is a

massive B2B transaction site supporting over 60,000 customers and averaging $1.3 million in

revenue per hour (Sharma, Shrivastava, Laghate, & Mendonca, n.d.). The significant amount of

detail in this system introduced the company to the value of data mining related to supply chain

and customer relationship management.

Maersk demonstrated its commitment to technology by creating a startup accelerator program,

known as OceanPro (Sharma et al., n.d.). The sole purpose of this organization is to enable

innovative companies to deliver state-of-the-art technology to traditional shipping processes.

The Maersk journey in implementing IoT technology began in 2012 when they partnered

with Ericsson to install real-time monitoring across its fleet (Murison, 2016). Luckily, it was a

well-publicized journey, resulting in the availability of the information in this section related to

their business goals, data collection, and system composition.

Business Goals

Maersk pursued four business goals through its IoT

initiatives. First, saving fuel is a business goal with a direct impact on the bottom line. They

installed flow meters and fuel sensors (Paris & Sudal, 2018). The company integrated data from

these sensors with weather and sea current forecasts to save fuel (Paris & Sudal, 2018). The

predictive models used in data mining can use historical and current data to predict fuel

consumption in time to allow for management decisions to reduce the impact of weather and

current changes. Equally crucial to managing fuel decisions at sea is monitoring refrigerated

containers.

While at sea, the containers are isolated from support in the event of a power failure

(Murison, 2016). The company uses over 300,000 containers to transport food items, which

require climate control throughout the shipment. Their business goal related to these containers

is to maintain the temperature requirements to prevent the loss of product.



The next business goal is to decrease the manual inspection of the containers while at a

port. Historically, each container receives a costly pre-trip inspection (PTI); however, data from

the container sensors allow the company to remotely analyze the container and compare the

current condition to the expected condition (Murison, 2016). Not performing inspections of

every container reduces labor and expense, while increasing the safety of the port staff by

limiting their interaction with the containers (Murison, 2016). Limiting inspections is only one

of the cost-saving business goals while in port.

The final goal is to reduce the idle time at ports. Through better coordination of port

activities, where predictive analysis aids the scheduling, the company can save time and money

(Paris & Sudal, 2018). These goals are all within reach using the data gained through the IoT

sensors deployed on Maersk ships and containers.

Data and Decisions

The Ericsson and Maersk partnership deployed thousands of sensors to each ship, making

one ship capable of transmitting more than 2 gigabytes of data to the Maersk systems each day

(Matthews, 2017). Maersk systems ingest the data directly from the sensor transmissions,

which means there is no manual input of data required from employees.

One of the systems consuming this data is the Remote Container Management (RCM)

system, which monitors the condition and temperature of the shipping containers. The company

was able to reduce the container inspection process through this system. Instead of conducting a

PTI on every container, the company monitors the condition of the container. If the container

performs according to expectations, then it only receives a quick visual inspection. Maersk used

this analysis to reduce the number of PTIs by 60% (Murison, 2016). Not only has this decreased

cost and increased safety, but it also reduced the carbon emissions associated with the ship idling

at a port.

Maersk also used RCM to monitor the temperature of the containers in real time. The

technology from Ericsson transmits vital statistics via satellite, including temperature, location,

and power supply (Murison, 2016). The company established climate thresholds, based on

historical data associated with the products shipped, for each container in a current shipment.

If there is a deviation in the temperature of a container, then the company analyzes the

measurements to identify the extent of the problem and takes action to avoid damage to the cargo

(Murison, 2016). Over a fifteen-week pilot, the company remotely changed the temperature set

points in the containers to avoid a potential claim of lost product (Murison, 2016). These actions

enabled the company to use data for cost avoidance. While all of these goals are achievable with

the right data, the hardware and personnel supporting the systems will determine the ultimate

success of the company.
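
The threshold comparison described above can be sketched in Python as follows. This is a minimal illustration only: the container identifiers, the temperature ranges, and the names `Reading` and `find_deviations` are hypothetical, and the sources do not describe how the RCM system implements this check.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    container_id: str
    temperature_c: float


# Hypothetical per-container limits (low, high) in degrees Celsius.
THRESHOLDS = {"C-001": (-20.0, -16.0), "C-002": (2.0, 6.0)}


def find_deviations(readings, thresholds):
    """Return (container_id, temperature, degrees outside the range)
    for every reading that falls outside its container's limits."""
    deviations = []
    for r in readings:
        low, high = thresholds[r.container_id]
        if r.temperature_c < low:
            deviations.append((r.container_id, r.temperature_c, round(low - r.temperature_c, 2)))
        elif r.temperature_c > high:
            deviations.append((r.container_id, r.temperature_c, round(r.temperature_c - high, 2)))
    return deviations


readings = [Reading("C-001", -18.0), Reading("C-002", 7.5)]
deviations = find_deviations(readings, THRESHOLDS)
print(deviations)
```

Reporting the size of the deviation, rather than a simple flag, reflects the step in which the company analyzes the extent of the problem before acting.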

Hardware and Personnel

Exploring the hardware and personnel provides insight into the operations of Maersk

systems. First, there are three components involved in the RCM system: a GPS unit, a 3G SIM

card, and a GSM antenna (Murison, 2016). The fleet of over 400 ships generates more than 30

terabytes of data per month (Matthews, 2017). This amount of data makes an on-premises

solution virtually impossible. To that end, Maersk announced its new agreement to use

Microsoft’s Azure as its preferred cloud service provider (Matthews, 2017). The system

transmits all of the data to the cloud for processing. The cloud provider should be able to

accommodate any future data growth experienced at Maersk. A cloud solution provides

future-proofing for their digital journey.
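
As a rough consistency check on the volumes cited in this paper, the short Python sketch below multiplies the lower bounds reported for fleet size and per-ship daily volume. The 30-day month and the decimal units (1 TB = 1,000 GB) are assumptions made here for illustration, not details taken from the sources.

```python
# Lower-bound figures cited in the text. The 30-day month and decimal units
# (1 TB = 1,000 GB) are illustrative assumptions.
SHIPS = 400
GB_PER_SHIP_PER_DAY = 2
DAYS_PER_MONTH = 30
GB_PER_TB = 1000

# Fleet-wide monthly volume implied by the lower bounds.
monthly_tb = SHIPS * GB_PER_SHIP_PER_DAY * DAYS_PER_MONTH / GB_PER_TB

# Daily volume per ship needed for the fleet to reach 30 TB in a month.
gb_per_day_for_30_tb = 30 * GB_PER_TB / (SHIPS * DAYS_PER_MONTH)

print(f"Lower-bound fleet volume: {monthly_tb:.0f} TB per month")
print(f"Per-ship volume implied by 30 TB: {gb_per_day_for_30_tb:.1f} GB per day")
```

Under these assumptions the lower bounds give 24 TB per month, and about 2.5 GB per ship per day would account for 30 TB, which is compatible with the "more than" qualifiers attached to both reported figures.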

While the hardware components and cloud deployment show how serious the company is

about using data, their investment in personnel shows the same future vision. Maersk Data was

established in 1970 as an in-house IT function. The group launched the Maersk

Communications System, a global satellite communication system, as well as customer service

and document management systems before being bought by International Business Machines in

2004. Maersk Data was too small to meet the future needs of the shipping company and the

investment required to properly grow the IT group would remove focus from the core business of

Maersk (Sherriff, 2004). By selling it to IBM, Maersk Data got the investment it needed to keep

pace with the industry and Maersk got a technology supplier at the leading edge of the industry.

Maersk’s commitment to IT innovation did not end there.

In 2018, the company started OceanPro, a technology startup accelerator program

(Sharma et al., n.d.). One of the companies funded by OceanPro is Dhruv, a company with a

focus on data mining. Currently, Dhruv is creating an E2E tracking solution to track the progress

of a container across the ocean using existing data in Maersk systems (Sharma et al., n.d.).

Another company benefiting from OceanPro, Linkeddots, specializes in industrial IoT. They are

in the process of developing a hardware- and provider-agnostic application using GPS data to

provide visibility into the container as other companies transport it over land (Sharma et al.,

n.d.). Both of these solutions have the potential to increase profit at Maersk, without requiring

Maersk to build the solutions in-house. Their investment in technology companies allows them

access to cutting-edge technology in the future without bearing full development costs. The

approach also has the side benefit of creating viable companies that can employ many more

people in the future.

Technical Implementation

While there was a lot of published information on the IoT journey of Maersk, there were

not many details on the actual design specifications. Therefore, this section derives its

information on logic components and statistical techniques from studies demonstrating similar

solutions from comparable data elements.

Logic Components

The Cross Industry Standard Process for Data Mining (CRISP-DM) project defined a six-

phase process for data mining efforts (Wirth & Hipp, 2000). In the modeling phase of data

mining, organizations use mathematical models to describe the business logic applied to the data.

The purpose of these models is to find patterns, which either describe the data or predict future

values based on the data. This process differs from traditional statistical analysis, which focuses

on using inference to estimate the parameters of a population (“What is data mining,” n.d.). In

data mining, the interaction between data preparation and modeling iteratively defines the

logic.

When considering the goal of monitoring container temperature, iteration to produce the

logical framework for a model begins with mining historical information regarding the container

type, shipping route, product type, and season to develop association rules to apply during

container monitoring. Previous research used the Apriori algorithm to find frequent itemsets and

extract 120 association rules from the input data including transit time, temperature, season,

conveyance, package, and product type (Wang & Yue, 2017). Maersk can use the same

technique to extract association rules from their historical shipping data. After validating the

rules generated by the algorithm, the company can use those rules as the threshold criteria for the

system monitoring the containers.
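
A minimal Python sketch of the Apriori approach is shown below to illustrate how frequent itemsets lead to association rules. The shipment records, item labels, and the support and confidence cutoffs are hypothetical; they are not taken from Wang and Yue (2017) or from Maersk data, and a production system would more likely rely on an established library.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for qualifying rules."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


# Hypothetical historical shipments described by season, transit, and outcome.
transactions = [
    {"summer", "long_transit", "spoiled"},
    {"summer", "long_transit", "spoiled"},
    {"summer", "short_transit", "ok"},
    {"winter", "long_transit", "ok"},
    {"winter", "short_transit", "ok"},
]
frequent = apriori(transactions, min_support=0.4)
found = rules(frequent, min_confidence=0.9)
for antecedent, consequent, confidence in found:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

In this toy data, the combination of summer and long transit always co-occurs with spoilage, so a rule of that form is the kind of candidate an analyst would validate before using it as a monitoring threshold.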

The goal of reducing container inspections can use outlier detection techniques. One

study used the K-means clustering algorithm to find groups in water consumption data (García

Valverde, González, Quevedo Casín, Puig Cayuela, & Saludes Closa, 2015). The algorithm

processes through the data iteratively using the features provided to assign each data point to a

group (Trevino, 2016). After the algorithm created the groups, and thereby the logic,

the researchers used scatter plots to visualize the data. The scatter plots illustrated the presence

of outliers, which required human intervention to either validate the cause or design the remedy.

Maersk could use the K-means clustering algorithm with their historical data to find groups of

shipment types related to container condition. After grouping the data, the company can use

scatter plots to visualize current data, enabling them to recognize the outliers and predict

container condition.
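
The clustering and outlier steps described above can be sketched in Python as follows. The two-dimensional points, the starting centroids, and the distance cutoff are hypothetical values chosen for illustration, and the distance-based flag stands in for the visual inspection of scatter plots used in the cited study.

```python
import math


def kmeans(points, centroids, max_iterations=20):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def find_outliers(centroids, clusters, max_distance):
    """Flag points lying farther than max_distance from their own centroid."""
    return [
        p
        for centroid, cluster in zip(centroids, clusters)
        for p in cluster
        if math.dist(p, centroid) > max_distance
    ]


# Hypothetical container measurements, e.g. two sensor-derived features.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
    (5.0, 5.0), (5.2, 4.8), (4.8, 5.2),
    (9.0, 1.0),
]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (5.0, 5.0)])
outliers = find_outliers(centroids, clusters, max_distance=3.0)
print(outliers)
```

The flagged point still belongs to a cluster, which mirrors the observation that outliers require human review to validate the cause or design the remedy.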

Statistical Techniques

One of the primary tools for predictive analytics is regression analysis, which is an

established statistical technique (Davenport, 2014). Regression analysis uses a model of the

relationship between variables to forecast the change of dependent variables based on the

fluctuations of an independent variable (Mishra & Silakari, 2012). In practice, a regression

analysis begins with an analyst defining the set of independent variables. The analyst performs a

regression analysis to identify the correlation of those variables to the dependent ones. This step

generally requires multiple iterations to produce the appropriate model. Once the analyst

establishes the model, they use the regression coefficients to generate a score forecasting the

likelihood of the future event (Davenport, 2014). To test whether the slope of the regression differs

significantly from zero, the analyst performs a t-test.
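
The slope test can be illustrated with a simple least-squares fit in Python. The speed and fuel values below are hypothetical and the function name `slope_t_statistic` is introduced here only for illustration; the sketch assumes a single independent variable.

```python
import math


def slope_t_statistic(x, y):
    """Fit y = intercept + slope * x by least squares and return
    (slope, intercept, t-statistic for the hypothesis slope = 0)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / standard_error


# Hypothetical observations: vessel speed (knots) and fuel use (tonnes per day).
speed = [10, 12, 14, 16, 18]
fuel = [20.5, 24.0, 30.5, 33.0, 39.5]
slope, intercept, t_stat = slope_t_statistic(speed, fuel)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, t={t_stat:.2f}")
```

With five observations there are three degrees of freedom, for which the two-sided 5% critical value is 3.182; a t-statistic above that value indicates the slope differs significantly from zero.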

In data science, researchers use the t-test to determine whether the difference between the means

of two sets of data is statistically significant. Although there are different types of t-tests, most are

parametric tests, which generally require the data to have a normal distribution. The analyst

needs to plot the frequency distribution to check this assumption. After checking the distribution, the next

step is to calculate the standard deviation and mean, which are inputs for the t-test calculation.

The larger the absolute value of the t-statistic, the stronger the evidence of a statistical difference.

However, the t-statistic takes its meaning from the probability (p-value) calculated from it.
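
The calculation from means and standard deviations can be sketched in Python as follows. The sketch uses Welch's form of the two-sample t-test, which is one of the several variants mentioned above and is an assumption of this illustration; the two samples are hypothetical temperature readings.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t-statistic, degrees of freedom) for two independent samples."""
    var_a = variance(a) / len(a)  # squared standard error of mean(a)
    var_b = variance(b) / len(b)  # squared standard error of mean(b)
    t_stat = (mean(a) - mean(b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t_stat, df


# Hypothetical container temperatures (degrees Celsius) on two routes.
route_a = [4.1, 3.9, 4.0, 4.2, 3.8]
route_b = [4.6, 4.4, 4.5, 4.7, 4.3]
t_stat, df = welch_t_test(route_a, route_b)
print(f"t={t_stat:.2f}, df={df:.1f}")
```

The probability is then read from the t-distribution with the computed degrees of freedom; for eight degrees of freedom, the two-sided 5% critical value is 2.306.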

One of the issues with linear regression is that the analyst must choose the types of basis

functions (Mahdavinejad et al., 2018). Of course, choosing the parameters is difficult. In these

cases, using machine learning techniques allows the model to adjust the parameters of the basis

functions as it trains on a dataset (Mahdavinejad et al., 2018). This technique highlights the

iterative nature between using data mining in preprocessing and using data mining to model.

There is a fine line between the portions of specific algorithms responsible for each activity.

The section below explores that fine line in more detail.

Data Mining Theory

A significant objective of IoT is to apply computational intelligence to data, through real-

time and historical feeds, creating information used to automatically make smart decisions (Alam

et al., 2016). IoT devices supply data, but data mining produces the patterns from which

organizations derive information desired for making decisions. Of course, the biggest challenge

in using data mining for IoT applications is the applicability of the conventional data mining

algorithms. A significant amount of research focuses on analyzing the performance and

accuracy of these algorithms against more extensive datasets.

One problem with this research is trying to distinguish conventional data mining algorithms

from machine learning algorithms. Experts assert that mined datasets provide the input for

machine learning (Cross, 2018). However, it is not that simple.

One study exploring the ability of conventional data mining algorithms to work for the IoT

datasets used eight data mining algorithms, which were all machine learning algorithms (Alam et

al., 2016). One vendor site asserted that machine learning takes the concept of data mining to the

next level by using the algorithms to automatically learn from and adapt to the data (“Data

mining vs. machine learning: What’s the difference?”, 2017). Since machine learning appears to

be the future of data mining, the remainder of this section focuses on current and future machine

learning research.

Machine Learning

To better understand artificial intelligence (AI), machine learning, and deep learning, this

section begins with an abbreviated overview of how the three concepts fit together. Over 60

years ago, John McCarthy coined the term AI to refer to machines capable of performing tasks

generally associated with human intelligence, including understanding language, learning, and

problem-solving as well as recognizing objects and sounds (McClelland, 2017). Machines use

algorithms to perform the tasks, but merely using algorithms does not make a machine

intelligent. It is the nature of the task that defines AI.

Machine learning is a type of AI that describes the ability of a computer to receive a data set

and learn from it (Venkatesan, 2018). In this case, learning involves the computer adjusting its

model based on the parameters returned from the training dataset.

An artificial neural network (ANN) is a machine learning technique, which consists of

algorithms inspired by the human brain (Brownlee, 2016). It is characterized by non-linear, high

parameter models containing sets of processing units, known as neurons, used to approximate the

relationship between inputs and outputs within a complex system (Zhang et al., 2018). The

algorithms are not task-based, but seek to discover representations from data input. The more

data it ingests, the better the representation it learns.



Deep Learning ANNs (DLANNs), an extension of ANNs based on deep learning

concepts, use their extreme learning ability to process an enormous amount of data and produce

highly accurate results (Alam et al., 2016). DLANNs achieve higher accuracy rates than other

conventional machine learning and data mining algorithms by using vectors of real numbers.

When inputs do not have a natural vector representation, embedding functions map

discrete objects, such as words, to vectors. Analysts typically use neural network embeddings to

find the nearest correlation between entities in a vector space, to provide input to a supervised

machine learning model, or to create visualizations of relationships between categories

(Koehrsen, 2018). However, previous research extended the use of embeddings to create the

output schema for a knowledge base by using patterns within mentions of concepts to define

relations (Riedel, Yao, McCallum, & Marlin, 2013). Additional research extended the use of

embeddings to create representations of textual metadata in different languages (Song, Batjargal,

& Maeda, 2017). This research shows the ability of machine learning to embrace ever more

complicated data types. While the research shows the art of the possible, it does not prove the

performance of machine learning is acceptable for typical business intelligence applications.
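
The nearest-neighbor use of embeddings can be illustrated with a small Python sketch. The three-dimensional vectors and cargo labels are hypothetical stand-ins for vectors a trained network would produce, and cosine similarity is assumed here as the measure of closeness.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(name, embeddings):
    """Return the other entity whose vector is most similar to `name`."""
    query = embeddings[name]
    others = (key for key in embeddings if key != name)
    return max(others, key=lambda key: cosine_similarity(query, embeddings[key]))


# Hypothetical learned embeddings for three cargo categories.
embeddings = {
    "bananas": (0.9, 0.1, 0.3),
    "mangoes": (0.8, 0.2, 0.4),
    "steel_coils": (0.1, 0.9, 0.2),
}
closest = nearest("bananas", embeddings)
print(closest)
```

Because discrete labels become points in a vector space, the same vectors can also feed a supervised model or a visualization, which are the other two uses named above.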

The study previously mentioned in this section compared the performance and

classification accuracy (CA) of eight common machine learning algorithms on IoT datasets. The

study was performed on the Aziz supercomputer, which has a total of 11,904 cores in 496 nodes

and delivers a peak performance of 230 teraflops (Alam et al., 2016). While this level of

computing power is not available to the typical business intelligence consumer, the results are still valid.

The study found that ANNs and DLANNs had the best CA but also required the most computing

power (Alam et al., 2016). The DLANN algorithm had the longest execution time (Alam et al.,

2016). However, the researchers suggest using Linear Discriminant Analysis (LDA) when

processing time matters. LDA achieved a CA of 81.85% in 0.98 seconds compared to 99.52% in

12,600 seconds by DLANN (Alam et al., 2016). LDA is an acceptable tradeoff for those businesses

without access to a supercomputer. There is no doubt that future research into DLANN

algorithms will work to decrease execution time by creating distributed and parallel processing

techniques.
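
The size of this tradeoff can be made concrete with a short calculation in Python. The accuracy and timing figures are those reported above from Alam et al. (2016); the derived quantities are simple arithmetic and are not results stated in that study.

```python
# Figures reported by Alam et al. (2016), as cited in the text.
lda_accuracy, lda_seconds = 81.85, 0.98
dlann_accuracy, dlann_seconds = 99.52, 12600

accuracy_gain = dlann_accuracy - lda_accuracy  # percentage points
slowdown = dlann_seconds / lda_seconds         # ratio of execution times
dlann_hours = dlann_seconds / 3600

print(f"Accuracy gain: {accuracy_gain:.2f} percentage points")
print(f"Execution time ratio: {slowdown:,.0f}x")
print(f"DLANN run time: {dlann_hours:.1f} hours")
```

On these figures, the DLANN gains about 17.7 percentage points of accuracy while running nearly 13,000 times longer, or 3.5 hours compared with under one second.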

Future Research

There are two additional areas of future research for DLANN algorithms. First, noisy

data exerts significant influence over these algorithms (Mahdavinejad et al., 2018). They require

additional research into removing noise to make them more commercially viable. One area

worth considering is contrastive principal component analysis (cPCA), which discovers low-

dimensional structures unique to a dataset useful for removing noise or selecting features (Abid,

Zhang, Bagaria, & Zou, 2017). The study introducing this technique conducted experiments

demonstrating the ability of cPCA to identify dataset-specific patterns missed by PCA (Abid et al.,

2017). Extending this research to enable the technique to function inside of the DLANN

algorithm could potentially circumvent the influence of the noise.

Another difficulty with neural network-based algorithms is their black box nature. Data

scientists cannot easily explain the rationale behind the model results (Mahdavinejad et al.,

2018). Specifically, the challenge is to identify the most critical descriptors or predictors and to

relate them to the property being modeled (Zhang et al., 2018). Some preliminary research into

this issue has been promising.

One study provided three methods used to understand neural network models (Zhang et

al., 2018). First, Garson’s algorithm dissects the model weights to describe the relative

importance of a descriptor or predictor in connection with outcome variables (Zhang et al.,



2018). Second, the study presents Lek’s profile method, which explores the relationship of

the outcome variable and a predictor (Zhang et al., 2018). Finally, the researchers presented the

local interpretable model-agnostic explanations (LIME) method, which locally approximates an

interpretable model to show classification or regression predictions (Zhang et al., 2018). While

researchers have begun to address noisy data and model interpretability, more work is needed to make DLANNs commercially

viable.
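
To illustrate the weight-dissection idea behind Garson's algorithm, the Python sketch below computes relative importance for a network with one hidden layer and one output. It follows one common formulation of the method and ignores bias weights; the weights are hypothetical, and the sketch is not necessarily the exact variant presented by Zhang et al. (2018).

```python
def garson_importance(input_hidden, hidden_output):
    """Relative importance of each input in a single-hidden-layer network.

    input_hidden[i][j] is the weight from input i to hidden unit j.
    hidden_output[j] is the weight from hidden unit j to the output.
    """
    n_inputs = len(input_hidden)
    scores = [0.0] * n_inputs
    for j, output_weight in enumerate(hidden_output):
        # Total absolute weight arriving at hidden unit j.
        column_total = sum(abs(input_hidden[i][j]) for i in range(n_inputs))
        for i in range(n_inputs):
            # Share of hidden unit j attributable to input i, scaled by
            # the strength of that unit's connection to the output.
            share = abs(input_hidden[i][j]) / column_total
            scores[i] += share * abs(output_weight)
    total = sum(scores)
    return [score / total for score in scores]


# Hypothetical weights: two inputs, two hidden units, one output.
input_hidden = [[0.8, -0.2], [0.2, 0.6]]
hidden_output = [1.0, -0.5]
importance = garson_importance(input_hidden, hidden_output)
print([round(value, 3) for value in importance])
```

Because the method uses absolute values, it ranks predictors by magnitude of influence but does not show the direction of the effect.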

Conclusion

The goal of this paper was to determine if the existing data mining tools were capable of

turning IoT data into valuable information. It began by exploring the business goals of Maersk

and how their data collection and analysis efforts led to decisions that enabled them to meet

their goals. Maersk illustrated their commitment to their digital journey by creating a startup

accelerator, OceanPro, to ensure the hardware and personnel supporting their data analysis

systems would not be a limiting factor.

The next section presented information on logic components and statistical techniques

from studies demonstrating similar solutions using data elements comparable to

those from the Maersk case study. It showed how the Apriori algorithm could create the logic

for climate thresholds for containers, as well as how k-means clustering could be used for

outlier detection when determining container inspection requirements. The statistical techniques

included a discussion of regression analysis and t-tests.

Finally, the data mining theory section began with a discussion regarding the fine line

between data mining and machine learning. Since machine learning appears to be an extension

of data mining encompassing many of the same algorithms, the remainder of the section focused

on current and future machine learning research. The section began with an abbreviated

overview of how artificial intelligence (AI), machine learning, and deep learning fit

together and how they differ. DLANN algorithms achieve the highest accuracy rates by using

vectors and embeddings, but they require more computing resources and have significantly

slower performance than other conventional machine learning and data mining algorithms.

Future research into DLANN algorithms needs to focus on methods to reduce the influence of

noisy data and methods to explain the black box areas of the models. The overall answer to the

question is that the tools exist to mine IoT data, but there is room for improvement.

References

Abid, A., Zhang, M. J., Bagaria, V. K., & Zou, J. (2017). Contrastive Principal Component

Analysis. ARXIV, 1. Retrieved from https://arxiv.org/abs/1709.06716

Alam, F., Mehmood, R., Katib, I., & Albeshri, A. (2016). Analysis of eight data mining

algorithms for smarter internet of things (IoT). Procedia Computer Science, 98, 437–442.

http://dx.doi.org/10.1016/j.procs.2016.09.068

Brownlee, J. (2016). What is deep learning? Retrieved from

https://machinelearningmastery.com/what-is-deep-learning/

Cross, A. (2018). Data mining vs. machine learning: What’s the difference? Retrieved from

https://www.ngdata.com/data-mining-vs-machine-learning/

Data mining vs. machine learning: What’s the difference? (2017). Retrieved from

https://www.import.io/post/data-mining-machine-learning-difference/

Davenport, T. H. (2014). A predictive analytics primer. Retrieved from

https://hbr.org/2014/09/a-predictive-analytics-primer

García Valverde, D., González, D., Quevedo Casín, J. J., Puig Cayuela, V., & Saludes Closa, J.

(2015). Water demand estimation and outlier detection from smart meter data using

classification and big data methods. 2nd New Developments in IT & Water Conference,

1–8. Retrieved from http://hdl.handle.net/2117/26473

Gartner. (n.d.). IT glossary. Retrieved from https://www.gartner.com/it-glossary/internet-of-

things/

Koehrsen, W. (2018). Neural network embeddings explained. Retrieved from

https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526

Mahdavinejad, M. S., Rezvan, M., Barekatain, M., Adibi, P., Barnaghi, P., & Sheth, A. P.

(2018). Machine learning for internet of things data analysis: A survey. Digital

Communications and Networks, 4, 161–175.

http://dx.doi.org/10.1016/j.dcan.2017.10.002

Matthews, K. (2017). What Maersk’s adoption of Microsoft Azure means for the future of

commercial shipping data. Retrieved from https://insidebigdata.com/2017/05/13/maersks-

adoption-microsoft-azure-means-future-commercial-shipping-data/

McClelland, C. (2017). The difference between artificial intelligence, machine learning, and

deep learning. Retrieved from https://medium.com/iotforall/the-difference-between-

artificial-intelligence-machine-learning-and-deep-learning-3aa67bff5991

Mishra, N., & Silakari, S. (2012). Predictive analytics: A survey, trends, applications,

oppurtunities & challenges. International Journal of Computer Science and Information

Technologies, 3, 4434–4438. http://dx.doi.org/10.1.1.301.7387

Murison, M. (2016). Maersk and Ericsson collaborate for IIoT success story. Retrieved from

https://internetofbusiness.com/maersk-ericsson-iot-success/

Paris, C., & Sudal, M. (2018). With container ships getting bigger, Maersk focuses on getting

faster. Wall Street Journal. Retrieved from https://www.wsj.com/articles/with-container-

ships-getting-bigger-maersk-focuses-on-getting-faster-11545301800

Riedel, S., Yao, L., McCallum, A., & Marlin, B. M. (2013). Relation extraction with matrix

factorization and universal schemas. Proceedings of the 2013 Conference of the North

American Chapter of the Association for Computational Linguistics: Human Language

Technologies, 74–84.

Sharma, M., Shrivastava, A., Laghate, G., & Mendonca, J. (n.d.). World’s largest shipping

company is looking for tech innovation. Indian startups maybe the answer. The Economic

Times. Retrieved from https://economictimes.indiatimes.com/small-

biz/startups/newsbuzz/worlds-largest-shipping-company-maersk-is-looking-for-tech-

innovation-indian-startups-maybe-the-answer/articleshow/68459732.cms

Sherriff, L. (2004). IBM buys Maersk Data. Retrieved from

https://www.theregister.co.uk/2004/08/17/ibm_buys_maersk/

Song, Y., Batjargal, B., & Maeda, R. (2017). Finding the identical Ukiyo-e prints across multiple

digital collections in different languages. Proceedings of the Conference of

Transdisciplinary Federation of Science and Technology, 1.

http://dx.doi.org/10.11487/oukan.2017.0_E-4-5

Trevino, A. (2016). Introduction to k-means clustering. Retrieved from

https://www.datascience.com/blog/k-means-clustering

Venkatesan, M. (2018). Artificial intelligence vs. machine learning vs. deep learning. Retrieved

from https://www.datasciencecentral.com/profiles/blogs/artificial-intelligence-vs-

machine-learning-vs-deep-learning

Wang, J., & Yue, H. (2017). Food safety pre-warning system based on data mining for a

sustainable food supply chain. Food Control, 73, 223–229.

http://dx.doi.org/10.1016/j.foodcont.2016.09.048

What is data mining. (n.d.). Retrieved from http://www.statsoft.com/textbook/data-mining-

techniques

Wirth, R., & Hipp, J. (2000). CRISP-DM: Towards a standard process model for data mining.

Proceedings of the 4th International Conference on the Practical Applications of

Knowledge Discovery and Data Mining, 29–39. http://dx.doi.org/10.1.1.198.5133

Zhang, Z., Beck, M. W., Winkler, D. A., Huang, B., Sibanda, W., & Goyal, H. (2018). Opening

the black box of neural networks: Methods for interpreting neural network models in

clinical applications. Annals of Translational Medicine, 6, 216.

http://dx.doi.org/10.21037/atm.2018.05.32
