Manufacturing big data ecosystem: A systematic literature review

Robotics and Computer Integrated Manufacturing 62 (2020) 101861
Review
Yesheng Cui a,⁎, Sami Kara a, Ka C. Chan a,b
a Sustainable Manufacturing and Life Cycle Engineering Research Group, School of Mechanical and Manufacturing Engineering, The University of New South Wales Sydney, Sydney, NSW 2052, Australia
b School of Management and Enterprise, Faculty of Business, Education, Law and Arts, University of Southern Queensland, Springfield, QLD 4305, Australia
ARTICLE INFO
Keywords:
Smart manufacturing
Big data
Cloud computing
Cloud manufacturing
Internet of things
NoSQL
ABSTRACT
Advanced manufacturing is one of the core national strategies in the US (AMP), Germany (Industry 4.0) and China (Made-in China 2025). The emergence of the concept of Cyber Physical System (CPS) and big data imperatively enable manufacturing to become smarter and more competitive among nations. Many researchers have proposed new solutions with big data enabling tools for manufacturing applications in three directions: product, production and business. Big data has been a fast-changing research area with many new opportunities for applications in manufacturing. This paper presents a systematic literature review of the state-of-the-art of big data in manufacturing. Six key drivers of big data applications in manufacturing have been identified. The key drivers are system integration, data, prediction, sustainability, resource sharing and hardware. Based on the requirements of manufacturing, nine essential components of big data ecosystem are captured. They are data ingestion, storage, computing, analytics, visualization, management, workflow, infrastructure and security. Several research domains are identified that are driven by available capabilities of big data ecosystem. Five future directions of big data applications in manufacturing are presented from modelling and simulation to real-time big data analytics and cybersecurity.
1. Introduction
Smart manufacturing is critical to national economies because it provides jobs, improves innovation and advances sustainability [1]. Several national strategies have been initiated to boost the competitiveness of manufacturing, such as ‘Industry 4.0’ in Germany [2], the ‘Advanced Manufacturing Partnership (AMP)’ program in the United States [3] and ‘Made in China 2025’. These initiatives provide massive potential to envision the future of manufacturing: smart manufacturing, which is defined by the National Institute of Standards and Technology (NIST) as a completely integrated, collaborative manufacturing system that responds in real time to meet changing demands and conditions in the factory, in the supply network and in customer needs [4].
The manufacturing industry uses a wide range of software and automation systems to increase efficiency and productivity from shop floors to enterprise layers, such as CNC machines, Programmable Logic Controllers (PLC), Supervisory Control And Data Acquisition (SCADA) systems [5], Manufacturing Execution Systems (MES) [6], product design and development (CAx: CAD, CAPP, CAM, CAE [7]), Product Lifecycle Management (PLM) [8], Enterprise Resource Planning (ERP) systems [9], Operation and Maintenance (O&M) [10], Energy Management Systems (EMS) [11], Supply Chain Management (SCM) [12] and Customer Relationship Management (CRM) [13]. However, smart manufacturing cannot be realized with traditional manufacturing software and technologies because of two main challenges. First, these systems and software cannot be fully integrated and collaborative since they are developed by multiple vendors using different interfaces or protocols. Second, manufacturers cannot perceive and respond in time to real-time changes in the factory, supply chain and market, since traditional manufacturing software lacks the sensory data needed to notice changes inside and outside the systems.
Digital thread and digital twin are two recent concepts, proposed respectively for integrating disparate systems over the product lifecycle [14] and for building up the real-time relationship between the physical space and the cyberspace in manufacturing [15]. Cyber-Physical Systems (CPS) are physical and engineered systems that are monitored, controlled, coordinated and integrated by a computing and communicating core [16]. The Internet of Things (IoT) comprises data-accessing and data-processing technologies in the cyberspace that perceive the real-time changes of the physical space with sensory tools [17,18]. Digital thread and digital twin can be enabled by using IoT and CPS. As these new concepts and technologies are put into practice, massive data will be generated from the systems, which have to go through the processes of data collection, storage, aggregation, analysis and exchange to provide
⁎ Corresponding author.
E-mail address: yesheng.cui@unsw.edu.au (Y. Cui).
https://doi.org/10.1016/j.rcim.2019.101861
Received 30 January 2019; Received in revised form 28 August 2019; Accepted 10 September 2019
0736-5845/ © 2019 Elsevier Ltd. All rights reserved.
Nomenclature
BDA Big data analytic
BI Business intelligence
CAD Computer-aided design
CAPP Computer-aided process planning
CAM Computer-aided manufacturing
CAE Computer-aided engineering
CNC Computer numerical control
CRM Customer relationship management
DSS Decision support system
EOL End-of-life
ERP Enterprise resource planning
HTML Hypertext markup language
IIoT Industry internet of things
JSON JavaScript object notation
KM Knowledge management
MES Manufacturing execution system
MOM Manufacturing operations management
NoSQL Not only structured query language
NIST National institute of standards and technology, USA
OLAP On-line analytic processing
OLTP On-line transaction processing
O&M Operation and maintenance
OPC-UA OPC unified architecture
OWL Web Ontology language
PDM Product data management
PLM Product lifecycle management
QMS Quality management system
RDF Resource description framework
RDBMS Relational database management system
SCADA Supervisory control and data acquisition
SCM Supply chain management
STEP Standard for exchange of product data
STEP-NC STEP for numerical control
XML Extensible Markup Language
timely information to manufacturers. Empowered by cloud computing [19], data science [20] and Artificial Intelligence (AI) [21], big data focuses on addressing the data issues in these processes that traditional manufacturing tools cannot handle. Big data could be the enabler of the concepts of digital thread and digital twin.
Smart manufacturing gains actionable knowledge in real time through the fusion of big data and manufacturing knowledge. Even when big data are collected and analyzed to extract timely information, the manufacturing industry may still not know which approach to use, or what its impacts are, without domain knowledge [22]. Actionable knowledge is created when manufacturers get timely information from big data and apply manufacturing knowledge in a specific application. Some examples found in the reviewed literature are: identifying the reasons for faults in the production process by analyzing real-time big processing data and manufacturing knowledge [23], predicting maintenance intervals by utilizing knowledge discovery and hundreds of machine data attributes [24], and making real-time scheduling and cost-effective decisions in an MES with streaming shop floor data and existing manufacturing systems [25].
The manufacturing industry faces the same challenges as big data in general, associated with the 5Vs (Volume, Velocity, Variety, Veracity and Value) [26]. IDC reports that manufacturing had the largest share of data (3584 Exabyte) in 2018 and will have a 30% annual growth rate of data from 2018 to 2025 [27]. Of the reported data types, structured data, such as tabular data in relational databases or spreadsheets, accounts for only 5% of all the data generated [23], while the rest is made up of semi-structured and unstructured data in formats such as JSON, XML, image, video and audio. The issues of velocity, variety and veracity arise because the same type of data comes from different devices with various sampling frequencies, formats and precisions, which leads to inconsistent data and makes it challenging to extract value-added insight for manufacturers. Amir et al. illustrate that traditional methods (relational database management systems (RDBMS) and on-premise software) cannot handle big data [28]. As more manufacturing enterprises generate big data, the issues of big data will become pressing.
Tapping into the capabilities of big data tools presents enormous opportunities for smart manufacturing. A large number of big data tools have been developed by the big players in the Internet industry, such as Google, Yahoo and Facebook, for their own applications in search engines, social media and business analytics [29,30,31]. Examples are Apache Hadoop [30], Apache Spark, Apache Flume, Apache Flink, Apache Storm, NoSQL and NewSQL databases [32], Apache Hive, Apache Pig and Apache Zookeeper. As these tools are enterprise-ready to use in manufacturing, many researchers have reported big data based solutions, which use a set of big data tools to address problems and enable smart manufacturing. In 2012, a Hadoop-based sensor data management framework was proposed for cloud manufacturing [33]. In 2014, Tao proposed an architecture for cloud manufacturing systems and investigated the application of cloud computing and IoT technology in manufacturing [34]. In 2015, the Cloud-Based Design Manufacturing paradigm (CBDM) was proposed by comparing other design methods [35]. In 2017, Nagorny et al. concluded that big manufacturing data and big data analytics would provide vast potential in smart manufacturing [26]. In 2017, Wu presented a fog computing-based framework to monitor machine health in cyber-manufacturing. In 2018, Tao proposed a data-driven conceptual framework to interoperate with ERP, MES, CRM and PLM systems in manufacturing [36]. Big data tools complement traditional approaches and provide additional functionalities to address manufacturing big data issues that those approaches could not solve.
Three challenging issues have to be addressed in the research of big data in manufacturing. Firstly, big data tools from the Internet industry do not consider the differences between the Internet and manufacturing. Most manufacturing data is standardized, and the standards are supported by various industrial vendors and associations such as manufacturers of CNC machines, meters and sensors, controllers and software companies. The manufacturers use different hardware interfaces, communication protocols, machine readable languages and semantic definitions. In contrast, most data on the Internet is based on natural languages and is easier to exchange without the difficulties associated with multiple interfaces and protocols. The differences between the two industries have not been taken into account in developing big data tools. Secondly, big data tools are numerous and diverse, with many overlapping functions. It is challenging to design a big data based solution by selecting suitable big data tools. Moreover, designing big data solutions depends not only on big data tools but is also closely associated with the specific manufacturing applications and scenarios. This paper categorizes similar types of big data tools and identifies their differences as preparation for this purpose. Thirdly, many manufacturing systems have dedicated aims and complicated, sophisticated functions, and are closely tied to their application scenarios. Social media uses big data tools to collect and store time series data from billions of customers who follow each other online, and to report trending events. For manufacturing, time series data from multiple data sources can be collected, integrated and analysed to explain the states of manufacturing entities. For example, the sensor data collected from a CNC machine reflects the state of the machine and can be used to develop
simulation models or prediction models for preventive maintenance. As many solutions have been proposed, a systematic literature review is required to identify the data issues in manufacturing, the capabilities of big data tools, the essential components for designing big data based solutions in manufacturing and the potential research directions of big data in manufacturing.
The rest of this paper is presented as follows: Section 2 presents the methodology to systematically review the state of the art of big data research in manufacturing; Section 3 presents the outcomes of the systematic review; Section 4 discusses several critical issues of big data ecosystem in manufacturing including critical drivers, system requirements, essential components, research innovation and future directions; and finally, Section 5 presents the conclusion of this systematic review.
2. Methodology
This paper presents a systematic literature review (SLR) on the current state of research associated with big data technologies in manufacturing [37]. To apply big data technologies in manufacturing successfully, it is essential to systematically review the literature of big data technologies in manufacturing from the following three perspectives: manufacturing data, big data technologies and data applications in manufacturing.
Firstly, manufacturing data is the foundation of data-driven manufacturing. It is impossible to propose one big data based solution to fit all manufacturing circumstances, since different applications have different data issues (data types, data formats and data sources) and require specific tools to address them. Therefore, systematically analyzing manufacturing data could provide a useful guideline for selecting appropriate big data enabling technologies.
Secondly, the similarities and differences among the big data tools in the big data ecosystem have to be identified. The 5Vs characteristics of big data are widely recognized as challenges: volume (TB/PB level of data size), velocity (ingesting or processing big data in streams or batches, in real time or non-real time), variety (dealing with complex big data formats, schemas, semantic models and information), value (analyzing data to deliver added value to some events) and veracity (validating data consistency and trustworthiness) [38]. In general, these big data technologies are intended to address some Vs of big data. Hence, their capabilities need to be effectively classified and analyzed to know which Vs are addressed.
Thirdly, gaps of data applications in manufacturing could be identified by systematically reviewing the capabilities of the traditional manufacturing systems and big data analysis. Since much traditional manufacturing software is already widely used in enterprises, big data could integrate the software and systems and make them collaborate, as well as providing timely information. The massive amount of data generated by these applications can be fed back to the big data ecosystem for analytics and innovative applications such as prediction, optimization, monitoring, simulation and visualization. Therefore, these application gaps would be the future research directions in academia and the demands in the industry.
In summary, knowing the data requirements of manufacturing applications, understanding the capabilities of big data tools, and identifying the gaps will help define future research directions and generate new ideas for innovative applications. This systematic literature review presents a holistic overview of big data in manufacturing to study the possible use cases for manufacturing. As shown in Fig. 1, the conceptual framework of this systematic literature review includes three layers: data source, big data ecosystem, and data consumers.
The first layer, at the bottom of Fig. 1, is the data source, which consists of five aspects:
1 Data types refer to the meaning of data, such as temperature and humidity from the physical space, and log, email and operational data from cyberspace;
2 Source devices collect the data and include sensors, controllers, actuators and software systems. The data type and source devices have a close relationship with the first four characteristics of big data (Volume, Velocity, Variety, Veracity) [39]. For example, in order to know the temperature of a production line, a temperature sensor is selected, which determines the sampling rate (speed of data generation), the size of the data accumulating over time, and the format and quality of the data from the sensor.
3 Data dynamics describe the states of data. Data-at-rest refers to the inactive data stored in spreadsheets, databases and data warehouses; while data-in-motion refers to the active data generated by sensors, equipment or machines, and fed into the big data ecosystem
Fig. 1. Conceptual framework of systematic literature review.
Fig. 2. Process of literature review methodology.
Table 1
Data formats in reviewed articles and NIST report.
Categories | Systems | Structured | Semi-structured | Unstructured
Product | CAD/CAE/CAPP/CAM | [53] | XML [54], G-code [55], STL [56], (aml, obj, UML, AutomationML) [57], IGES, DXF, AMF, RDL | ○
 | PLM | [58] | (XML, B2MML) [59], PMML [60], (RDF, SPARQL, STEP, QIF, STL) [61], PLMXML [60] | pdf [61]
Production | ERP | [62] | (XML, HTML, SCUFL) [63], PMML | e-mail [64]
 | MOM/MES | [65, 66] | JSON [67], RDF [68], AutomationML [69], BatchML | ○
 | SCADA/DCS/HMI | [70] | (RDF, OWL, XML) [71], SPDML [72] | Image [73]
 | IIoT/CNC/Robot | ○ | (XML, UML, AutomationML) [69], (OPC-UA, PLC Open, COLLADA) [74], BSON [75], JSON [76] | image [77, 78]
 | O&M | [79, 80] | cad [80], RDF [81], XML [82] | Image [83], video [82]
 | QMS | [84] | BSON [85], (XML, QIF) [86] | image [87], (audio, document) [88]
 | Safety | [89] | ○ | ○
Business | SCM//BI/AM | [90] | (XML, JSON, RDF) [91], (WSDL, EPL) [92], XPDL, ebXML, BPEL, UBL, WS-CDL, OAGIS | document [93]
ICT | CPS/CM/ICT | [94,95] | (JSON, STEP, JT Open) [96], HTML [97], (AutomationML, PLCopen) [98], (XML, XSD, RDF) [99], (UML, SysML, STEP, B2MML) [5], EDDL [100,101] | e-mail [64]
 | Data analytics/DM | [102] | BSON [102], JSON [103], Parquet [104] | (image, video, document) [105]
 | KM | ○ | (OWL, UML, RDF, SWRL) [23], (JSON, PMML, AMPL) [106], (STEP-NC, G-code, XML, DMIS, QIF) [107] | ○
○: Not found.
Text: Not found in reviewed articles (mentioned in the NIST report).
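To make the three format groups of Table 1 concrete, the following minimal Python sketch flattens a semi-structured XML sensor record into a table-like row (structured) and re-serializes it as JSON. The record layout, the names `XML_RECORD`, `xml_to_row` and `classify_format`, and the short format lists are illustrative assumptions, not part of any reviewed solution.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical semi-structured record; the element and attribute names are assumptions.
XML_RECORD = """
<reading machine="cnc-01" timestamp="2019-01-30T08:00:00Z">
  <temperature unit="C">41.5</temperature>
  <spindle_speed unit="rpm">1200</spindle_speed>
</reading>
"""

def xml_to_row(xml_text):
    """Flatten one XML reading into a dict that maps onto a relational table row."""
    root = ET.fromstring(xml_text.strip())
    row = dict(root.attrib)  # machine, timestamp
    for child in root:
        row[child.tag] = float(child.text)
        row[child.tag + "_unit"] = child.attrib.get("unit")
    return row

def classify_format(name):
    """Assign a format name to one of the groups used in Table 1 (partial lists)."""
    semi_structured = {"xml", "json", "html", "rdf", "bson"}
    unstructured = {"image", "video", "audio", "document", "e-mail", "text"}
    key = name.lower()
    if key in semi_structured:
        return "semi-structured"
    if key in unstructured:
        return "unstructured"
    # Structured data is identified by tabular storage rather than by a file format.
    return "unknown"

row = xml_to_row(XML_RECORD)
as_json = json.dumps(row, sort_keys=True)
print(row)
print(as_json)
print(classify_format("XML"), classify_format("image"))
```

The same row can be inserted into a relational table or forwarded as JSON, which is the kind of format transformation a big data based solution has to provide between manufacturing systems.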
in real-time.
4 Data formats are structures of the data. Data is exchangeable among various systems with consistent data formats and languages.
The system aspect of data source refers to the system where data originates. There is a diverse range of manufacturing systems used in different applications such as product design, manufacturing pyramid, product lifecycle management, supply chain management and logistics.
The second layer, the middle layer shown in Fig. 1, is the big data ecosystem comprising all the big data software. This layer plays the role of connecting the data sources from the layer below and the big data analytics applications at the layer above.
The big data ecosystem is a set of complex and interrelated components to process and analyse big data [40]. The ecosystem also needs to store data from various data sources for data integration and analytics as well as other applications. Therefore, the data storage layer in the ecosystem includes database and file system technologies to store big data. The ecosystem consists of the following components and tools:
• Data collection and ingestion: log data collection (Flume), bulk data collection from a relational database (Sqoop), distributed messaging system (Kafka), dataflow (NiFi);
• Computing engines: batch processing (MapReduce), iterative/near real-time processing (Spark, Flink), real-time processing/streaming (Storm, Flink) [41];
• Database: a relational database (RDBMS) has a standard schema but lacks scalability (MySQL, Oracle DB, SQL Server, PostgreSQL); a NoSQL database does not have a standard schema and is scalable (four types of NoSQL: Column-based: HBase; Document-based: MongoDB; Key-value-based: Redis; Graph-based: Neo4j); a NewSQL database is a scalable relational database (VoltDB) [42,43]; search engine (Solr, Elasticsearch);
• Data analysis (BDA): Machine Learning (MLlib, Caffe, Tensorflow, Python), statistics (SparkR, R), OLAP;
• Data visualization (Zeppelin, Matplotlib, Tableau, D3 [44], GraphX);
• Workflow, which is a scheduler of the jobs of various big data tools, and dataflow, which manages data transfer and data transformation among different big data tools: Oozie, Kepler, Apache NiFi;
• Data management and KM: Apache Falcon, Apache Atlas, Apache Sentry, Apache Hive, Operation (Zookeeper, Ambari), Apache Griffin, Apache Ranger, Apache Jena;
• Big data infrastructure (BDI): computing resources (general purpose computing and HPC), cluster management (YARN, Mesos) [37], network communication (Software-Defined
Fig. 3. Smart manufacturing systems and various data formats.
Table 2
Data issues in manufacturing.
Issues | 5Vs | Description | References
Large scale | Volume, Variety | Large volume dataset with massive features. | [109]
Inconsistent sampling frequencies or timestamp | Velocity | Sensors use various sampling frequency and timestamps; | [110,111,112]
 | | Unnecessary high sampling frequency affect the real time performance, it is related with “Smart data” topic. | [113]
Batching data and streaming data | Velocity | Data modification is sensitive to time or not, some literature also uses the terms: data-at-rest and data-in-motion [13]; | [114]
Missing value | Veracity | Record is empty when equipment is an anomaly; | [115]
 | | Product passes some portion of machines; | [109]
 | | Sensor is an anomaly or losing communication with sensor; | [110]
 | | Too costly to capture data by installing sensor or building models. | [113]
Imbalance | Veracity | Small probability data in a very large dataset with most of normal data. | [109]
Data outlier | Veracity | Data is out of range of measurement device. | [110,84]
Noise or an anomaly data | Veracity | Data is out of the similar clustering dataset; | [115]
 | | Noise data is possibly generated by replacing the missing data. | [109]
Drifting data | Veracity | Process drift caused by vulnerability to external environment; | [110]
 | | Sensor drift caused by modification in measuring device or calibration. | [113]
Asynchronization | Veracity | Several data producers (sensor and machine) use non-central time server in manufacturing enterprise. | [113]
Data correlation | Veracity | Nature and structure of data caused by redundant sensor arrangement; | [110]
 | | Data correlation is to improve data quality with process variables. | [116]
 | | Inconsistent simulation and collection of data from CNC machine. | [107]
Data model, data format exchange and data integration | Variety, Veracity, Value | Merge data from multiple data sources into a single view; | [115]
 | | Exchange different data formats from various systems: OPC-UA and IoT, OPC-UA and AutomationML, MTConnect with QIF, MTConnect and IEEE 1451 wired smart transducer | [117,98,86,118]
 | | Integrate data models to explain the relationships of data; | [113]
 | | Data information must be available for information sharing. OWL is the enabling technology on resource description. | [99,118]
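Three of the issues in Table 2 (missing values, out-of-range outliers and inconsistent sampling frequencies) can be illustrated with a small Python sketch. It is a simplified illustration using only the standard library, not a method taken from the reviewed articles; the sensor readings, the valid range and the one-second window are assumed values.

```python
from statistics import mean

# Hypothetical raw readings: (timestamp in seconds, value); None marks a missing value.
fast_sensor = [(0.0, 20.1), (0.5, 20.3), (1.0, None), (1.5, 20.2), (2.0, 250.0), (2.5, 20.4)]
slow_sensor = [(0.0, 1.00), (1.0, 1.02), (2.0, 1.01)]

VALID_RANGE = (-40.0, 125.0)  # assumed measurement range of the fast sensor

def clean(readings, valid_range):
    """Drop missing values and values outside the device's measurement range."""
    low, high = valid_range
    return [(t, v) for t, v in readings if v is not None and low <= v <= high]

def resample(readings, period):
    """Average readings into fixed windows so that streams share one time grid."""
    buckets = {}
    for t, v in readings:
        window_start = int(t // period) * period
        buckets.setdefault(window_start, []).append(v)
    return {start: mean(values) for start, values in sorted(buckets.items())}

def align(a, b):
    """Join two resampled streams on the window start times they have in common."""
    return {t: (a[t], b[t]) for t in a if t in b}

fast = resample(clean(fast_sensor, VALID_RANGE), period=1.0)
slow = resample(slow_sensor, period=1.0)
aligned = align(fast, slow)
print(aligned)
```

Dropping records is only one possible treatment. As Table 2 notes, replacing missing data can itself introduce noise, so the choice between dropping and imputing depends on the application.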
Networks (SDN) [45], InfiniBand [46], 5G [47]), etc.;
• Big data security: Apache Metron [44], Apache Knox.
Although Hadoop is a big part of the big data ecosystem with many big data tools [48], it lacks functions such as data flow, data management and security [41].
Finally, the top layer represents the way data is used and the data users. It includes the applications of big data analytics [26] and manufacturing applications [36]. The databases of traditional manufacturing software, such as SCADA or ERP, may be data sources of the ecosystem as well.
2.1. Literature identification
Pertinent articles are identified within the scope of manufacturing data, big data technologies and big data based solutions in manufacturing. Fig. 2 illustrates the research method in this literature review. First, four citation databases are chosen due to their comprehensive coverage and high relevance to the scope: Scopus, IEEE Xplore, ASME Digital Collection, and ACM Digital Library. Second, several big data technologies and popular manufacturing data collection tools are selected as keywords. Hadoop is chosen since it is the earliest big data technology and is well studied and widely used. The underlying technologies
Fig. 4. Chronological distribution of big data tools.
Fig. 5. Percentage allocation of Big Data applications (Based on our literature investigation).
of big data are computing and storage. There are a few big data computing engines. Three common ones are selected: Spark, Storm and Flink. For storage, because there are hundreds of available databases, it is not appropriate to limit the search to specific databases. All the databases can be categorized into three types: SQL, NoSQL and NewSQL. We select SQL and NoSQL because NewSQL can be regarded as an SQL database with better features than traditional SQL. The features of NewSQL will be discussed in the following sections. Time series data is widespread in manufacturing, coming from machines, sensors and controllers. A time-series database (OpenTSDB) is selected since it seems more suitable for manufacturing data. Two widely adopted manufacturing data collection tools are selected: OPC-UA [49] and MTConnect [14].
Third, to further focus on the nature of this research paper, the papers were filtered by searching the abstract for “Manufacturing”, “Industry 4.0”, “Industrial automation”, “Smart manufacturing”, “Digital twin” and “Digital thread”. Fourth, a manual review is carried out to select the papers about manufacturing data issues, or big data based solutions and applications in manufacturing. For example, some papers from other industrial sectors, such as Oil and Gas, Healthcare, Energy and Agriculture, are found since they merely mention manufacturing in the abstract. Hence, the 339 articles are reviewed and 128 relevant articles are selected. The search strings of four databases are listed in Table 1 in Appendices.
3. Results
3.1. Manufacturing systems
In 2016, NIST reported three dimensions of concerns in smart manufacturing systems (SMS): product, production and business. Many traditional manufacturing systems and software can be categorized into one of the dimensions [49] (Fig. 3). The business dimension is presented in the upper rectangle block with dashed lines, which includes suppliers, customers and manufacturing enterprises (SCM, CRM, BI, asset management). The product dimension is located at the bottom rectangle block with solid lines. It includes objects and activities from product design to end-of-life of the product (CAx: CAD, CAM, CAPP and CAE, PLM). The production dimension is the triangle block, which includes an entire production system (ERP, MOM/MES, SCADA/DCS/HMI, O&M, Safety, quality management). Industry Internet of Things (IIoT) and RFID technology (grey blue) are widely used in manufacturing, logistics [50], in-use and product End-of-Life (EOL) [51].
Afterwards, all the reviewed articles are classified into four categories: Product, Production, Business and ICT (Information Communication Technology). The first three categories focus on engineering functions and business, while ICT architecture underpins all three dimensions to provide the ICT infrastructure and digitalization to manufacturing and includes several topics: CPS, Cloud manufacturing (CM), ICT, Data analytics/Data management (DM), KM. Table 2 in Appendices illustrates the distribution of reviewed articles by these four categories.
3.2. Data source
3.2.1. Data format
Based on the three dimensions of SMS in the NIST report, the standards of data formats and computer languages are listed in italic style for every manufacturing system in Fig. 3. The bold black text represents some of the data formats found in the reviewed articles, whereas the black text is not found in the review but mentioned in the NIST report [49].
Table 1 demonstrates the complete data formats found from the reviewed articles and NIST report. To discuss the formats conveniently in the following chapter, all the data formats are categorized into three groups:
• Structured data: data that is presented in tables and can be stored in a relational database;
• Semi-structured data: data that has a self-described structure and is not presented in tables, such as XML, JSON, HTML [52];
• Unstructured data: data that does not have a self-described structure, such as document, image, audio, video, text and e-mail.
Table 3
Allocations of proposed solutions of reviewed literatures (Based on our literature investigation).
Columns: Field | Systems | Big data framework | Big data computing and storage | Big data analytics
Fields: Product, Production, Business, ICT architecture
Systems: CAD/CAE/CAPP/CAM, PLM, ERP, MOM/MES, SCADA/DCS/HMI, O&M, QMS, Safety, SCM/CRM/BI/AM, IIoT/CPS/CM/ICT, Data analytics/DM, KM
Column entries (row alignment not preserved):
Big data framework: [64,146,76,147]; [69,116]; [129,71]; [39,135]; [151,33]; [93,90]; [58]
Big data computing and storage: [104,130,131,104,132,112]; [68,148,95,149,150]; [136,81,82,80]; [57,56,125]; [152,94]; [144]; [145]; [66]
Big data analytics: logistic regression, naïve Bayes, and a decision tree [109], regression [137], LSM [138], SVM [83], Anomaly detection [139], DTW [140], RF [141], K-means, Markov [142], KD [24]; regression [127], Distance, Regression, Self-organizing map, principal component analysis [128]; SPC [87], ANN, decision tree, random forest, SVM [143, 84]; Classification [133], OPL [106], KM [134], GA [73]; KM [106], Semantic data integration [23, 153]; MP [63], regression [62], K-means [126]; Regressions [53], ANN [53]; Stream data analytics [103]; graph analytics [89]
Fig. 3 and Table 1 show that one challenging issue in realizing smart manufacturing is that different data formats are required by various manufacturing systems. In order to make these systems collaborative and integrated, the transformation of these data formats is an essential function of manufacturing big data solutions. It also illustrates that
Table 4
Summary of discussion section.
Topics | Sub-topics | Application systems | Enabling tools | References
6 drivers for big data in smart manufacturing | System integration | Product design, AM, ERP, MES, BI, SCM, PLM | Kepler, Hadoop, OPC-UA, RESTful API | [49,57,56,125,63,62,90,153,97,55,128,154,155]
 | Data | Predictive maintenance, KM, Production planning, Safety, Anomaly detection, industrial process control, model prediction, QC, shop floor scheduling | Cassandra, MongoDB, Blueflood, OpenTSDB, DalmatinerDB and InfluxDB, Storm, Spark, Flink, Apache Hive | [121,62,73,156,157,82,138,158,24,89,94,153,159,75,160,88]
 | Prediction | 3D printing, product performance, production planning, energy consumption, MES, QC, SCM | Random Forest, Bayesian Network, statistic | [39,109,63,62,93,136,137,138,141,24,87,143,90,55,161,85,88,25,162,79]
 | Sustainability | PLM, Maintenance | | [163,59]
 | Resource sharing and networking | SCM, ERP, Data integration | Public cloud, Private cloud, Hybrid cloud, hypervisor, container | [19,63,62,144,95,152,97,101,164]
 | Low cost hardware | SCM, Industrial automation | RFID, IoT, Robotic | [65,165,166,167]
9 essential components of big data ecosystem | Data ingestion | ERP, MES, SCADA, O&M, QC, PLM, Data management | Sqoop, Flume, Kafka | [39,116,58,71,138,139,140,151,85,79,65]
 | Storage | ERP, SCM, SCADA, OLAP, OLTP | Redis, HBase, Cassandra, MongoDB, Neo4j, HDFS, VoltDB, Clustrix, NuoDB | [30,32,43,168,169]
 | Computation | ERP, SCM, PLM, MES, SCADA, O&M, QC, IoT | MapReduce, Spark, Flink, Storm | [170,171]
 | Analytics | DSS, CRM, machine vision, QC, O&M | MLlib, Scikit-Learn, CNTK, Caffe, Kylin, CaffeOnSpark, Hive | [109,73,82,87,25,105,172,173]
 | Visualization | MES, SCADA, O&M, QC, Security | Zeppelin, Tableau, D3.Js, Matplotlib, QlikView | [126,76]
 | Workflow and dataflow | Business | Oozie, Kepler, InfoSphere, Wings/Pegasus, NiFi | [116,123,63,172]
 | Data management | ICT, KM, SCADA, SCM | Apache Falcon, Apache Atlas, Apache Sentry, Apache Griffin, Jena | [121,23,174,175,176,177,178,179]
 | Infrastructure and deployment model | ICT | HPC(AWS) | [180,181]
 | Cybersecurity | SCADA, ICT | | [182,183]
5 future directions | Modelling and simulation | VR/AR, PHM, PLM, CAx | Cloud computing, Quantum computing | [184,185]
 | Connectivity and interoperability | File formats of PLM (PLM XML) and 3D printing (AMF, 3MF) | NiFi | [186]
 | Standardized big data platform design | | | [5]
 | Real time big data analytics | SCADA, MES, Data warehousing | Spark, Storm, Flink, Beam, Spark R, MLlib, GraphX, SparkSQL | [26,187]
 | Cybersecurity | SCM, SCADA, Security, Safety | Apache Metron, Apache Ranger, Apache Knox | [49,188]
some data formats in the specific manufacturing systems are missing in the proposed big data solutions. Solutions are required to fill this gap and realize data exchange among the systems. Big data tools such as NiFi can address the variety issue of manufacturing data.

3.2.2. Data issues
Data issues are fundamental challenges to smart manufacturing, which extracts actionable information from good-quality data. Preparing suitable data for smart applications consumes a considerable amount of cost and time. For example, data scientists spend over 90% of their time on data preparation before analyzing data for innovative tasks such as machine learning and AI [108]. Enterprises spend billions of dollars on their data warehousing systems, which can only use well-defined, predefined methods (ETL) to process product structure data and produce business reports in non-real time. Therefore, understanding and addressing data issues is critical to designing big data-based solutions. From Table 2, we have identified 11 common issues of manufacturing data. Data issues are generally related to the big data 5Vs features [38]: volume and variety (large scale); velocity (inconsistent sampling frequencies or timestamps, batching and streaming data); veracity (missing values, imbalance, data outliers, noise, drifting, asynchronization, data correlation); variety, veracity and value (data model, data format exchange and data integration). The researchers in this kind of literature address these issues without using big data tools. Traditional software cannot address these data issues if the data is generated in large-scale systems with a large number of devices. Some big data tools are discussed to provide potential solutions in Sections 4.2.7 and 4.4.2.

3.3. Big data ecosystem

Through this SLR, new big-data based research innovation could be identified by closely tracking the attention that various big data tools receive in manufacturing and other research domains. Fig. 4 shows the distributions of big data tools over the years in the literature, including manufacturing engineering and other domains. Fig. 4(a) shows that the number of occurrences of the term Hadoop in all literature increased from 2008, reached its peak in 2016 and slightly decreased in 2017. The number of Spark articles has been higher than that of Hadoop since 2016. Spark and Hadoop can be compared on four factors: volume, velocity, fault tolerance and data analysis. MapReduce and Spark Streaming are the computation engines of Hadoop and Spark, respectively. MapReduce executes batch processing by reading and writing data on disk multiple times. Spark Streaming executes micro-batch processing in memory. This results in the differences between Hadoop and Spark, because disk can persistently store a larger volume of data at a slower velocity than memory, which temporarily stores a limited volume of data. In terms of data analysis, Spark has a built-in tool (MLlib) and supports third-party tools (Mahout, H2O), whereas Hadoop is only supported by the third-party tool Mahout [41]. Spark supports iterative computation with GraphX, which is its graph processing engine. Therefore, Hadoop is suitable for applications that need planned extraction of non-real-time, critical information from a larger volume of data with a guarantee against data loss, such as ERP and production planning. Spark can be used to provide near real-time monitoring and analytics by processing streaming data, such as monitoring process and product quality, MES, SCADA and predictive maintenance. The number of Storm articles has kept increasing since 2012; however, its total number is smaller than those of Hadoop and Spark, since Storm only executes streaming with limited data analysis supported by SAMOA, which is a version of Mahout [119]. Fig. 4(b) presents the growing number of publications about these three tools in the manufacturing research literature since their inception. In Fig. 4(c) and (d), NoSQL databases, as the most frequently mentioned databases, are compared with NewSQL and time series databases (OpenTSDB) over the last eight years. Fig. 4(e) also demonstrates that Kafka and OPC-UA are closely related to manufacturing, as seen by their increasing patterns. Because the OPC-UA protocol provides open connection to monitoring and automation systems as well as communications between MES and SCADA systems, these systems are prevalent in manufacturing [120]. Kafka is also widely used in applications at the shop floor level of manufacturing, such as processing machine and sensor data, due to its message streaming capability [121]. Apache NiFi is entirely new to manufacturing. Only four articles were found in the industry in general and no article in manufacturing industry [122,123,124].

3.4. Applications of big data in manufacturing

Data applications are essential to realize smart manufacturing. The identified scopes of data applications could provide a clear guideline for designing manufacturing big data platforms. Among the 128 reviewed articles, 78 are big-data based applications, which are categorized into 17 applications. The percentages of the applications are presented in Fig. 5. Monitoring (25%), prediction (23.8%), ICT framework (11.9%) and data analytics (9.5%) are the four most frequent big-data applications in manufacturing. These statistics show that research on big-data based solutions focuses on monitoring, prediction, data analytics and proposed ICT solutions in manufacturing, as listed in Table 3.

4. Discussion

Four fundamental questions about the relationship between the big data ecosystem and smart manufacturing are discussed: six drivers and requirements for big data applications in manufacturing, nine essential components of the big data ecosystem, harnessing big data capabilities for research innovation in manufacturing, and future directions of big data applications in manufacturing. They illustrate the driving factors of big data applications in manufacturing. These are summarized in Table 3 and Table 4 for ease of reading.

4.1. Question 1: what are the drivers and requirements for big data applications in smart manufacturing?

Identifying drivers for big data applications is essential to implement feasible smart manufacturing initiatives. Kusiak discusses the future developments in manufacturing and theoretically identifies six drivers for smart manufacturing [4]: manufacturing technology and processes, materials, data, predictive engineering, sustainability, and resource sharing and networking. By reviewing the proposed big-data based solutions, six drivers are identified: data, prediction, sustainability, resource sharing, system integration, and low-cost hardware. Four of Kusiak's six drivers are thus verified with big-data based solutions.

4.1.1. Driver 1: system integration
System integration with big data technologies is a crucial enabler of smart manufacturing: it integrates manufacturing systems and lets them cooperate so that they can adapt in a timely manner to dynamic demands from production and the supply chain [49]. Integration of production systems has demonstrated significant improvement in production efficiency and productivity since the 1980s [4]. In the context of smart manufacturing, the scope of system integration needs to expand further from production to the product and business domains. Fig. 3 shows that various manufacturing systems use different networks and protocols, which makes implementing data and information exchange among these systems challenging. Big data technologies can integrate these independent systems by using the “cloud” as a common place to collect data and to extract and exchange the required information. With the fusion of IoT, BDA can not only integrate manufacturing systems but also bring the physical and cyber worlds closer together [36]. From the system integration row in Table 4, many benefits of system integration are demonstrated with the
proposed big-data based solutions.
Integrating product design and additive manufacturing with big data provides many benefits. With this integration, product costs can be estimated by analyzing more features of the product model with BDA than with the traditional approach [55]; Hadoop clusters convert a huge 3D model to G-code for a 3D printer faster than traditional methods [56]. Spark and Cassandra demonstrate the capability to compute and store a large volume of streaming data in real-time monitoring of 3D printing, which traditional manufacturing systems cannot offer [125].
As to production, big-data based solutions drive the integration of ERP and MES. The big data scientific workflow management system Kepler demonstrates efficient scheduling capability for smart manufacturing [63]. Another paper concludes that the critical cycle time can be predicted with Hadoop for production planning [62].
In the business domain, several researchers focus on designing big data architectures to integrate BI, SCM, ERP, MES and PLM systems. A business intelligence architecture is proposed to advance the integration of business information from various existing systems such as ERP, CAx, SCM and PDM/PLM, but the specific technical framework is not provided [90]. It demonstrates that integrating various systems requires the capability to store data and information with various data formats, structures and models. Another big-data based solution is proposed to integrate supply chain and production planning by retrieving and integrating SCM, ERP and MES data [97]. However, the framework focuses on designing business functionality without providing support to process the collected big data. A cloud manufacturing collaboration system is proposed to achieve better performance and functionality in production and resource planning with the Hadoop ecosystem. It is concluded that information integration from the web and other data sources is a critical issue for implementing system collaboration [153]. Processing and analysing data from MES and SCADA with MapReduce and BDA can detect anomalies minutes beforehand in large-scale production [128]. In industry, Bosch presented a conceptual analytic platform with a data integration method to integrate various data sources [154].
System integration with BDA requires standard data formats and standard interfaces. New file format standards need to be developed to fill the deficiencies of the existing standards in transferring consistent content [49]. Additive Manufacturing File (AMF) is a new XML-based standard format that replaces STL by providing many new features for additive manufacturing, such as materials, material properties and colors [49]. Because RDBMS is not suitable for storing data with flexible data models (unstructured or semi-structured data), big data technologies such as Hadoop and NoSQL databases are mainly used to process unstructured data. Standard interfaces are important for seamlessly integrating systems in manufacturing. Because OPC-UA technology provides dedicated interfaces to production equipment such as PLCs, it has received increasing attention in the industry [128]. RESTful APIs have been demonstrated to be more efficient for connecting web data than the traditional SOAP method [155].

4.1.2. Driver 2: data
Timely, comprehensive data with enabling big data tools is the key driver of smart manufacturing. The data row in Table 4 shows that many manufacturing applications that were challenging with traditional tools could be implemented with more comprehensive data and big data tools. Firstly, a large volume of data collected from various sources provides sufficient data for big data analytics. Data from various industrial equipment, IoT devices, the web and smartphones is called data-in-motion [159]; it is continuously generated and ingested into the systems to provide real-time response from the physical world, for example in dynamic shop floor scheduling [75], predictive maintenance [141,136,24], anomaly detection [81], diagnosis [82], prognosis [161] and systems collaboration [153]. Databases in manufacturing, another significant data source for the big data ecosystem, hold data-at-rest, which represents static, historical data [159]. This data is mainly used to predict long-term performance in production planning [62], global manufacturing network design [94] and critical event detection in safety [89]. Both historical batch data and real-time streaming data are integrated to train models and monitor real-time condition information, such as anomaly detection on machines’ energy consumption data [139].
Secondly, big data technologies make the management of manufacturing big data feasible. Traditionally, RDBMS is mainly designed to store structured data with limited scalability. NoSQL databases, however, perform better at handling semi-structured (JSON, XML) and unstructured data (audio, video and email) with unlimited scalability. For example, the column NoSQL database Cassandra was used to store event data of an automation controller [131], and the document NoSQL database MongoDB was used to store machine data [85]. Time-series databases (TSDB) are beginning to receive increasing attention by providing dedicated applications for sensor data. A comprehensive evaluation was conducted on several TSDBs: Blueflood, OpenTSDB, DalmatinerDB and InfluxDB [82]. The collected data needs cleaning before use in order to resolve issues such as noise and incorrect formats, as shown in Table 2. Streaming (Flink, Storm), micro-batching (Spark) and batching (MapReduce) data processing technologies provide the capabilities to clean and process large volumes of manufacturing data. The following big-data based solutions are proposed and implemented in various manufacturing applications: complex event processing (CEP) with Storm [138], anomaly detection with Flink [121], industrial process control with Spark [73], model prediction [137] and quality control with MapReduce [88].
Lastly, past and new knowledge can emerge from the generated big data by harnessing big data technologies. Knowledge of predictive maintenance can be extracted with an Apache Hive-based platform [24]. Knowledge of intelligent applications of a smart factory is managed with Hadoop and OWL technologies [134].

4.1.3. Driver 3: prediction
Prediction enables manufacturing to change from reaction to prevention. Because of big data and the increasing application of data analysis in manufacturing, it is feasible to predict the behaviours of various manufacturing systems accurately.
Prediction in manufacturing attracts many researchers’ attention. According to Fig. 5, prediction is a common BDA application in manufacturing. In the product domain, the costs of 3D printed products can be predicted with the proposed big-data based solution and three machine learning algorithms [55]. In the production domain, machine learning is used to predict product performance degradation [63], cycle times for production planning [62], energy consumption and KPI values on MES systems [25], and production efficiency [162]. A number of papers focus on predicting product quality with different machine learning algorithms such as Bayesian Network [88] and statistical analytics [87]. ANN is identified as having the highest prediction accuracy in comparison with a decision tree, random forest and support vector machine [143]. Some applications, such as real-time quality control, need a trade-off between higher accuracy and shorter calculation time, since higher accuracy takes more time to compute. When both shorter calculation time and higher accuracy are considered, Random Forest outperforms Naive Bayesian, Multi-Layer Perceptron and Logistic Regression [85,109]. Many papers publish predictive maintenance solutions for machines that use machine learning algorithms to recommend the scheduling of proactive measures before outages occur [24,39,79,136–138,141,161]. In the business domain, some potential applications are found, such as predicting user behaviour using principal component analysis and Hadoop [128], and proactive inventories, location and throughput times in logistics with a big data based platform [90].
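The trade-off between prediction accuracy and calculation time described for real-time quality control can be sketched as a simple selection rule: among the candidate models, choose the most accurate one whose scoring time fits a latency budget. The sketch below is only an illustration under stated assumptions. The names `Candidate` and `pick_model`, the budget values and all accuracy and latency figures are hypothetical placeholders and are not results from the reviewed papers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One trained model with its measured validation metrics (hypothetical)."""
    name: str
    accuracy: float    # fraction of correct predictions on a validation set
    latency_ms: float  # time needed to score one part, in milliseconds


def pick_model(candidates: Sequence[Candidate], budget_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate whose latency fits the budget."""
    feasible = [c for c in candidates if c.latency_ms <= budget_ms]
    if not feasible:
        return None
    # Ties on accuracy are broken in favour of the faster model.
    return max(feasible, key=lambda c: (c.accuracy, -c.latency_ms))


# Illustrative placeholder measurements, NOT results from the reviewed papers.
CANDIDATES = [
    Candidate("logistic_regression", accuracy=0.88, latency_ms=0.2),
    Candidate("random_forest", accuracy=0.94, latency_ms=1.5),
    Candidate("neural_network", accuracy=0.96, latency_ms=12.0),
]

if __name__ == "__main__":
    for budget in (0.1, 2.0, 50.0):
        chosen = pick_model(CANDIDATES, budget)
        print(budget, chosen.name if chosen else "no model fits the budget")
```

With a tight budget the rule falls back to a faster, less accurate model, which is the kind of compromise a real-time quality control application may have to accept; with a generous budget the most accurate model is chosen.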
4.1.4. Driver 4: sustainability
Sustainability with big data technologies plays a vital role in smart manufacturing. Sustainable manufacturing considers four factors: material, manufacturing processes, energy and pollutants. Big data technologies can provide a data-driven solution to analyse the large volume of data for these four factors. For example, product design could be guided by analyzing the End-of-Life data of products; strategic decision making could be supported by analyzing marketing, production and supply network data from CRM, ERP, MES and SCM systems; energy consumption and pollutant influence could be monitored through IoT sensors and RFID tags. Although several conceptual big-data based frameworks are proposed, as shown in Table 4 [58,59], the performance of sustainability with big data technologies has not been evaluated. Further research is required on the effectiveness of big data technologies for sustainability in manufacturing.

4.1.5. Driver 5: resource sharing and networking
Manufacturing could benefit from sharing virtual and physical resources with the supply chain network. Information silos result in losses of productivity and economy in manufacturing.

• Sharing information

Sharing information is beneficial to manufacturing systems in the product, production and business domains. Helu et al. discuss how the digital thread can improve product design and manufacturing processes by sharing data of product lifecycle systems [14]. For instance, information on idle equipment among enterprises can be shared with big data tools in order to reduce holding costs of machines, 3D printers, equipment, transport and warehouses [101]. Sharing information about SCM and PLM systems could meet new business requirements such as finding the best supplier of a specific raw material [164]. Integrated data from various systems could identify potential production problems and improve work efficiency by selecting suitable maintenance times [152]. Collecting and sharing supply chain data with big data could help interactions among customers, manufacturers and suppliers [97].
Addressing information silos is challenging because data cannot be easily shared among traditional manufacturing systems that use different protocols. Some practical solutions are implemented, such as sharing data in various systems, which leads to data redundancy. Manufacturers also have to spend massive maintenance effort on synchronizing the shared data of every system. Because the big data platform can be hosted on the cloud, each system has to establish only one connection to synchronize data between the platform and the system. Through big data tools, data is much more easily collected from various systems into the data lake of the big data platform, then synchronized, managed and shared [121].

• Sharing BDA infrastructure and software

Sharing big-data infrastructure and software brings economic benefits to manufacturing. According to the NIST definition, there are four deployment models of cloud computing: private cloud, community cloud, public cloud and hybrid cloud [19]. Among the reviewed papers, 18 solutions use a private cloud while only two use a public cloud. The main reason for adopting a private cloud is to provide better privacy and security than a public cloud [62,63,144]. However, private cloud clusters require a significant investment in hardware. Wang et al. demonstrate that manufacturing can gain economic benefits from the public cloud in three aspects: the pay-as-you-go service model, reduced maintenance fees for data centers, and reduced operating costs [189]. No security function was identified in the 18 proposed solutions, which means the security and privacy of the private cloud solutions do not outperform the public cloud. Therefore, using the public cloud rather than a private cloud may bring significant economic benefits to manufacturing.
Virtualization technologies such as hypervisors and containers provide faster deployments and high efficiency in sharing software packages in enterprises [95]. Virtualization can be used to quickly test and validate the proposed big data solutions with minimum influence on the other systems.

4.1.6. Driver 6: low cost hardware
Low-cost hardware makes smart manufacturing more accessible. Low-cost actuators and IoT sensors reduce the wiring cost of collecting data and improve automation on the factory floor. Wang et al. present a collaboration mechanism to deploy large-scale robotics with big data technologies at a small factory [65]. A big-data based solution with smart sensors alleviates the constraints of time and geolocation in monitoring manufacturing processes [165]. Low-cost RFID tags make traceability of enormous numbers of resources more feasible at the supply chain level. An Industrial Internet of Things hub is proposed to realize the smart connection of various resources in a manufacturing facility with RFID tags and sensors [166]. Cost-competitive NC machines and 3D printers can be flexibly deployed on the demand side, for example for designing prototype products or serving customers. Big data technologies can secure critical data transmission between design departments and manufacturing equipment. Furthermore, low-cost data processing and storage technologies provide a cost-effective approach to managing large volumes of data from massive data sources in manufacturing [167]. Hence, cheap hardware makes it more feasible for manufacturers to monitor and respond in a timely manner to changes from production to supply chain networks.

4.2. Question 2: what are the essential components of big data ecosystem to better serve smart manufacturing?

As big data applications enable smart manufacturing, several essential components of the big data ecosystem should be utilized to build up a BDA platform for smart manufacturing, including data ingestion, storage, computing, analytics, visualization, workflow and dataflow, data management, infrastructure and security.

4.2.1. Data ingestion
Data ingestion, or inception, is necessary for manufacturing in order to bring large volumes of data into its BDA platform. There are two types of big data: data at rest and data in motion. Apache Sqoop is used to transfer bulk data from a relational database (MySQL, SQL Server) to Hadoop in several applications such as ERP and MES [116,65], SCADA [71], O&M [39,79] and data management [121,151]. Streaming data is data continuously generated by manufacturing systems and devices. Apache Flume is mainly used to collect large amounts of logs from controllers, sensors, equipment and actuators [65,71,116,121]. Kafka is a general-purpose messaging system that collects streaming data and publishes it to data consumers who subscribe to the corresponding topic. Kafka is applied in some cases of SCADA [140] and O&M [139]. Apache Storm is a real-time data processor, which is used to collect streaming data and ingest it to data consumers straight away (O&M [138,139], quality control [85], PLM [58]). Although some manufacturing data collection tools such as MTConnect and OPC-UA are widely used in production systems, big data ingestion tools can complement their limitations, as discussed in Section 4.4.2. Therefore, data ingestion is an essential component of the big-data ecosystem to collect batch data and streaming data.

4.2.2. Storage
Data storage is critical to big data applications in smart manufacturing. As the manufacturing industry increasingly benefits from the use of big data, it is important to store more data [4]. Various applications require different storage technologies, namely file systems and databases, to provide different features. Although both technologies can store structured, semi-structured and unstructured data, they differ in that a file system is suitable for storing data-at-rest or
unstructured data such as files, search and compile files manually. machine vision for robotics [105], speech recognition for alarm and
Database is suitable to store data-in-motion, semi-structured or struc- security, image processing for quality control [87]and O&M [82].
tured data, faster query data automatically. Manufacturers can save time to develop algorithms from scratch by
There are three types of databases: RDBMS, NoSQL and NewSQL using these tools in their big data platforms, such as image recognition
databases. RDBMS database has been used in manufacturing for general for product quality control [87]. Manufacturers also benefits from new
purpose applications for decades such as SQL Server, Oracle, MySQL. tools such as CaffeOnSpark [105], which is a deep learning framework
Whereas, RDBMS is unable to address the challenges of big data's 3Vs widely used for autonomous driving in the automotive industry [105].
(Volume, Velocity and Variety) for storage and query [32]. NoSQL and Because Caffe does not work on Spark clusters, it is challenging to meet
NewSQL databases can provide almost unlimited scalability and faster the two strengths of better data analytics algorithm and faster big data
query capability for industrial big data. Fig. 4 illustrates that the in- computation at the same time. CaffeOnSpark could address the issue to
creased use of a NoSQL database is already happening in manu- work on Spark and Hadoop clusters for Caffe applications.
facturing. NoSQL is suitable for one kind of OLTP application, which OLAP is an approach to analyse large multidimensional datasets for
does not require consistent data all the time, but simple query and complex business analytics, such as BI reporting, Decision Support
frequent updates to data [168]. Hence, manufacturing could use NoSQL System (DSS), and CRM in manufacturing. Analytics tools of OLAP are
databases for real-time big data analytics such as quality monitoring Apache Hive [172] and Apache Kylin [173]. Hive is one utility of Ha-
and prediction. Moreover, manufacturing could benefit from using the doop ecosystem which is used as a data warehouse. Kylin can query a
four types of NoSQL data models (Key-value, Wide column, Document, large volume of data faster than Hive.
Graph) to easily manage semi-structured data (XML and JSON) [43]. By analyzing the collected big data, data analytics tools can effi-
The widely used NoSQL databases are Redis, HBase, Cassandra, Mon- ciently extract timely information to manufacturers to make decisions.
goDB and Neo4j. Since NoSQL databases were not designed to meet It is challenging to the decision makers to apply their experience and
data consistency, they are not suitable for some OLTP applications, knowledge on the new circumstances. Their experience is acquired
where data consistency needs to be guaranteed anytime [169]. There- under the previous circumstance, which may be different from the
fore, NoSQL should not be used in an environment of many operations current one and their knowledge may be out of date. With the given
and controllers that are present since the control data may be incon- data and big data analytics tools, manufacturers can analyze historic
sistent. The NewSQL solution is used to provide relational query (SQL) data, discover new knowledge, build actionable intelligence to make
and data consistency all the time for an OLTP application. Manu- data-driven decisions. It would happen in some areas of different sys-
facturing could use NewSQL for the scenarios requiring consistent data tems feed with new data such as production planning with streaming
all the time (finance data in ERP, inventory data in SCM, control signal real-time IoT data.
in SCADA systems). Some examples of NewSQL are VoltDB, Clustrix
and NuoDB [43]. However, NewSQL focuses on relational data, which 4.2.5. Visualization
may not fully support unstructured data for big data analytics. Manufacturing systems require various visualization methods to
Unlike database solutions, in which the data is structured as a data present analytics results, such as interactive dashboard, reporting,
model for data consumers’ demands, file systems do not need a data graph, document. Most of the proposed solutions did not provide a tool
model to store unstructured data. Hadoop Distributed File System to construct visualization. As the Python language is widely used in big
(HDFS) could store petabytes of data with a redundantly low-cost data analytics, it should be convenient to use Python plotting library
method [30]. However, querying data in HDFS is much slower than the (Matplotlib) to present big data analytic results by data scientists in
speed in databases. HDFS could be utilized as a central data storage to manufacturing. However, Matplotlib works on a command line inter-
retain all the data in manufacturing. Therefore, selection of the correct storage solutions is required for big data applications in manufacturing.

4.2.3. Computation
Computation is the foundation of implementing big data applications. Three types of computation engines are available to manufacturing: batching, micro-batching and streaming. MapReduce in Hadoop provides the batching method, which processes petabyte-level big data with less memory usage but cannot provide real-time analytics [170]. Spark is a micro-batching processing engine, which provides near real-time computation with more memory resources than MapReduce. Flink and Storm are real-time streaming engines to process small volumes of data [171]. The differences and application scenarios of these computation engines have been discussed in Section 3.3. Based on the analysis of these computation engines and the outcomes of Fig. 4, there would be more Spark-based big data solutions on the factory floor, from device to production planning, in the near future. The proposed solutions could use Spark to replace Hadoop to get faster outcomes.

4.2.4. Analytics
Big data analytics is intended to extract information from collected big data. Big data analytics includes two types of analysis: 1) data mining and machine learning algorithms (clustering, regression, Bayesian networks, artificial neural networks (ANN), deep learning); 2) On-Line Analytical Processing (OLAP).
Big data analytics tools of the first type are identified as MLlib and Scikit-learn for machine learning [25,109], SparkR for high-level statistical analysis [73], and TensorFlow and CNTK for deep learning [105]. Analytics tools have many potential use cases in manufacturing, such as

face (CLI), which is not user-friendly to business users with less programming experience. The reviewed solutions could benefit from visualization tools such as Zeppelin, Tableau, and D3.js [76]. Zeppelin provides interpreters for big data tools (HBase, Cassandra, HDFS, Spark, Flink etc.) and supports multi-agents. Manufacturers could use Zeppelin in their existing big data platform with less developing effort. D3.js provides more complex visualization templates than Zeppelin. D3.js is a JavaScript library for producing dynamic, interactive data visualizations in the web browser. It is suitable for applications that require dynamic monitoring and control, such as MES, SCADA, O&M, quality control, safety and security systems [126]. D3.js has less support for producing reports, so it is not suitable for business analyses. Tableau and QlikView are commercial data visualization software focusing on BI and support many common databases. Big data and visualization can help bring data together and show the value of big data in meaningful ways. AI helps to make automatic decisions, and visualization helps to make manual decisions. As a result, AI and humans can collaboratively make data-driven decisions based on the actionable intelligence generated by big data or mined by machine learning.

4.2.6. Workflow and dataflow
Workflow and dataflow components provide efficient approaches to manage business workflows and data management processes. Two workflow tools are found in the reviewed papers: Oozie [116] and Kepler [63]. Oozie is specialized in managing Hadoop jobs, which would be suitable for a manufacturer with a Hadoop-ready platform [174]. Kepler provides a convenient method to share workflows with other business users in cloud manufacturing [63]. However, Kepler does not support horizontal scalability to store large volumes of data.
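As a rough illustration of what a workflow component manages, the following minimal Python sketch orders a set of data jobs by their dependencies, as a workflow scheduler would have to do before submitting them to a cluster. The job names and the `run_order` helper are hypothetical and chosen only for illustration; they are not taken from Oozie, Kepler or any reviewed paper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs mapped to the set of jobs each one depends on.
WORKFLOW = {
    "ingest_sensor_data": set(),
    "clean_data": {"ingest_sensor_data"},
    "train_model": {"clean_data"},
    "build_report": {"clean_data"},
    "publish_dashboard": {"train_model", "build_report"},
}


def run_order(workflow):
    """Return one valid execution order in which every job follows its dependencies."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for step, job in enumerate(run_order(WORKFLOW), start=1):
        print(step, job)
```

Jobs without a mutual dependency (here `train_model` and `build_report`) may appear in either order, which is where a scheduler could run them in parallel.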
Another two workflow tools, Wings and IBM InfoSphere, have not been found through this review [175]. Wings/Pegasus provides a more intelligent method to construct and execute workflows for users by optimizing the workflow automatically. For parallel execution of workflows, Wings has to work with a resource management framework (Pegasus), which may be a constraint to manufacturers with a Hadoop-based platform since Hadoop uses Yarn as its cluster resource manager. IBM InfoSphere is a commercial enterprise-ready framework, which is suitable for inexperienced users. It can work together with Hadoop and other IBM software. InfoSphere is not open source, so users are unable to build add-on functions. For the dataflow component, researchers had to construct their dataflow component from scratch. Usually, there is a long learning curve for beginners to go through trial-and-error processes. Apache NiFi is an efficient tool to construct dataflow and data integration with an interactive GUI. One paper uses NiFi to collect streaming data in the process industry [123]. NiFi was initially developed and used by the National Security Agency of the United States (NSA), so it has been verified in a real application environment. Through this study, there is limited work reported in this area. Smart manufacturing needs the workflow and dataflow components to get data automatically processed among various manufacturing systems. If assisted by AI and optimization, they could significantly improve the efficiency of workflow and dataflow in manufacturing, such as production planning and scheduling.

4.2.7. Data management
Data management would make the big data platform more feasible for manufacturers. Data management focuses on data governance, metadata management, data modeling, data quality management, master data management, data integration and knowledge management. Although data management is highly related to the traditional IT area, it provides a holistic management method to meet the requirements of enterprise compliance, data policy and data lifecycle.
Some data management tools are available to the big data ecosystem of manufacturing:

• Data lifecycle management is based on enterprises’ policy to manage the data lifecycle from creation and storage to obsolescence and deletion. Apache Falcon includes the functions of data retention (persistency), data replication for disaster recovery, aggregation and archive on Hadoop [176]. Falcon is beneficial to manufacturing research. For example, each acoustic experiment on an aerospace engine collects hundreds of gigabytes of audio data through over a hundred sensors. The traditional method uses a single disk to store them, which is costly to manage and carries a high risk of disk failure or data loss. Falcon could manage the data lifecycle with a friendly user interface.
• Data governance provides an enterprise policy approach to manage data availability, usability, and integrity. Apache Atlas is the Hadoop ecosystem tool for audit, lineage and service level agreement (SLA), which is used to apply agile enterprise compliance through consistent metadata management across the big data ecosystem [177].
• Data authorization manages users’ privilege to access sensitive data. Apache Sentry is utilized to authorize data and metadata on Hadoop clusters based on the roles of people in manufacturing [121].
• Data quality management is crucial to the outcomes of data analyses. Table 2 presents that data quality is a challenging data issue to manufacturing. Apache Griffin maintains data quality by automating data profiling and validation on Hadoop and Spark [177]. Apache Griffin improves data quality by pre-processing data automatically from various data sources, which reduces the amount of time data analysts need to prepare data for analysis.
• Data integration has two approaches to address the data silos issue [178]: Data Warehouse (DW) and Data Lake (DL). DW is the traditional approach that integrates data from various data sources into a central data store with a predefined extract-transform-load (ETL) method (schema-on-write) [176]. As data volume increases, a distributed DW solution (Hive) and an associated ETL tool (Apache Pig) are available to manufacturing for business applications such as BI reporting and DSS. However, the predefined ETL method is inflexible and expensive to build before the data can finally be used by data consumers. Data Lake is a new solution of data integration, which is defined as central storage to store any data (sizes, types, rates) in its raw format in an enterprise [190]. DL is more suitable for data consumers who issue ad-hoc queries, where the structure of the data is undefined until the query is issued (schema-on-read) [179]. HDFS is a popular DL tool to store extensive unstructured data in manufacturing (video, audio, image) [105]. However, a DL probably becomes a “data swamp” without practical data management tools (Falcon, Sentry, Atlas etc.) [176].
• Semantic KM includes a series of operations for including, creating, classifying, sharing and using information and knowledge in manufacturing [180]. Many technologies of the semantic web are utilized to manage knowledge in manufacturing, such as the application of web ontology language (OWL) in SCADA [134], Resource Description Framework (RDF) and RDF query language (SPARQL) in SCM [91], and an RDF database (Jena) in KM [23].

Data management tools cover broad areas to address the veracity and value issues of big data in manufacturing. The tools are still developing since most of them are Hadoop-based. As manufacturing begins to focus on the velocity of big data computation, some new data management tools based on faster computation engines such as Spark and Flink would be developed in the near future. The big data management tools are still manually operated by data stewards, which would be challenging to manufacturers when various massive data is ingested into the big data platform and users in different roles apply to use it. New methods of big data management could address this issue by using algorithms that are rule-based, machine learning–based or a hybrid of both.

4.2.8. Infrastructure and deployment model
The infrastructure and deployment model are the foundation of big data applications in manufacturing. There are two types of cloud computing infrastructures:

1. General purpose commodity computer cluster (Hadoop, Spark) to process completely parallel computing problems such as computing massive sensor data separately or responding to millions of users’ requests (Facebook);
2. High-Performance Computing (HPC) cluster, which provides faster computing speed with dedicated hardware.

Hence, HPC outperforms others in processing highly dependent data computing such as complex modelling and simulation workloads in manufacturing. However, the disadvantages of HPC are that its investment is enormous and its utilization rate is low. Some solutions address its low utilization by moving HPC to the cloud [182]. Some IaaS providers offer public cloud HPC (AWS) and hybrid cloud HPC service (Microsoft Azure).
Manufacturing enterprises have different perspectives on the economics, security and privacy of big data platforms. Four deployment models (Public cloud, Private cloud, Hybrid cloud, Community cloud [19]) are available to satisfy the requirements [183]. For example, for better privacy and less waste of computing resources, some platforms could adopt a hybrid cloud model, which puts sensitive data on a private cloud and processes insensitive data on a public cloud. However, Hybrid and Community cloud are not found by this review. To meet the diverse requirements of manufacturing enterprises, some deployment technologies, including OpenStack and Docker [191], are used to quickly deploy an agile software environment with different microservice frameworks and programming languages.
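The hybrid cloud example in this section (sensitive data on a private cloud, insensitive data on a public cloud) can be sketched as a simple routing rule. The snippet below is a minimal illustration only: the `Record` type, its `sensitive` flag (assumed to be set by the enterprise data policy) and the target labels are hypothetical and do not come from any reviewed platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str       # hypothetical identifier of the originating system
    payload: dict     # the data itself
    sensitive: bool   # assumption: classified upstream by the data policy


def route(record: Record) -> str:
    """Keep sensitive data on the private cloud; send the rest to the public cloud."""
    return "private" if record.sensitive else "public"


def partition(records):
    """Split records into the two deployment targets of a hybrid cloud."""
    targets = {"private": [], "public": []}
    for record in records:
        targets[route(record)].append(record)
    return targets


if __name__ == "__main__":
    batch = [
        Record("PLM", {"design": "bracket-rev-B"}, sensitive=True),
        Record("CNC-07", {"spindle_rpm": 8000}, sensitive=False),
    ]
    for target, items in partition(batch).items():
        print(target, [r.source for r in items])
```

In practice the classification step is the hard part; the sketch assumes it has already been done before routing.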
All deployment models and infrastructure should be taken into consideration in the big data ecosystem of manufacturing.

4.2.9. Cybersecurity
Cybersecurity is an essential component of the big data platform to protect data assets in manufacturing. As manufacturing becomes data-driven, standards and tools are required to completely secure data in the IT architecture of manufacturers. Although some security standards have been provided in manufacturing systems such as SCADA [192], the standards cannot fully address the challenges of SCADA on the Internet [193]. Traditional control systems are vulnerable to unauthorized attacks from the Internet since they are designed as closed systems with few cybersecurity capabilities [194]. Because the big data platform integrates the physical space and the cyber space closely, cybersecurity risks could quickly escalate to the physical system in manufacturing. If unauthorized people manage critical equipment or information, it will bring irreversible disaster to manufacturing, such as economic loss and threats to personal safety.

4.3. Question 3: how can we harness the capabilities available in the big data ecosystem to drive research innovations in manufacturing?

The big data ecosystem is a combination of many functional components with various enabling tools. Capabilities of the big data ecosystem are not only about computing and storing big data, but also about the advantages of its systematic platform and the potential of big data analytics. Hence, according to the proposed solutions of the reviewed literature and big data capabilities, the maturity of big data ecosystem application is categorized into three stages:

Stage 1: proposing a big data framework and platform;
Stage 2: harvesting cloud computing capacity for big data computing and storage;
Stage 3: analysing big data with various algorithms for the applications (prediction, fault detection, optimisation etc.).

Table 3 presents the allocation of proposed solutions by these three stages. MES, SCADA and O&M have been studied within the three stages. However, big data based solutions are missing in some manufacturing systems. For example, no general big data framework has been identified in CAx (Stage 1); no practices of Stage 2 and Stage 3 are found in PLM, SCM, BI and AM. Available capabilities of big data ecosystems could drive research innovation in various big data applications (Fig. 5) of the immature manufacturing areas.
Manufacturing could benefit from new development and deployment methods of the big data ecosystem. Because many tools change quickly with highly frequent updates, the design of the big data platform is dramatically challenging. Most of the proposed solutions in manufacturing constructed the platform from scratch by installing and testing every tool step by step. Researchers spent much time preparing the software rather than focusing on programming and big data analytics. Several popular vendors offer their Hadoop distributions to mitigate the issue, such as Cloudera (CDH), Hortonworks (HDP, HDF), MapR, IBM (InfoSphere BigInsights), Microsoft (HDInsight) and Pivotal HD [195]. Because the tools and versions were tested in the distributions, they are ready to be used by researchers and enterprises in manufacturing. Moreover, the packages could be easily deployed and shared with virtualization technology (VM and container) [191]. However, developers need to evaluate several conditions to select a suitable distribution, such as open source, pricing, customer support, size of community and so on [195].

4.4. Question 4: what are the future directions of big data applications in manufacturing?

Manufacturing could benefit from fast developing big data technology with new and matured tools. Hence, the full potential of big data has not been discovered in manufacturing. Some potential directions are proposed for the future work of interested researchers:

4.4.1. Modeling and simulation
Modeling and simulation will naturally play an important role in extracting value from data once the volume of data is available [4]. Digital twin and digital thread are the two essential methods with two important tasks: collecting real-time data from manufacturing input and output devices such as CNC machines and sensors, and building up the real-time simulation model with the collected data. These two tasks require considerable computing resources and data storage to process streaming data from manufacturing devices. Researchers commonly use Matlab or some scientific software to develop the selected algorithm to simulate the models [196]. As one simulation model may be implemented on a single computer, many models have to be designed and manually implemented on computers. It is a challenging and onerous task for manufacturers to convert the Matlab simulation models to executable programs on the selected cloud computing engine. However, from the reviewed articles, simulation is rare in the big data based solutions. The ongoing practice is that of a digital twin, which provides real-time, bi-directional management between a physical object and its digital object [36]. It is essential to simulate the digital twin model with parameters across different domains (e.g. product, process and logistics) in order to predict its dynamic performance, such as Virtual/Augmented Reality [4]. For example, simulation of the product model with predefined parameters includes FEA (ANSYS). However, the parameters from other domains (processes and logistics) are not accounted for in the FEA simulation. One potential research direction is customized products by integrating End-Of-Life (EOL) data from PLM into CAx software. It could improve the product's impact during its EOL phase, such as prognostics and health management (PHM) [184]. Another direction is that simulation of digital twin models with real-time data could be used for predictive maintenance. Moreover, simulation can provide integral data to data scientists or business users to design machine learning algorithms. Because Matlab simulation manager is available to simulate multiple models on the public cloud (Azure, AWS), more case studies of the methods above could be implemented to verify the performance of big data tools for digital twin and digital thread. Since Matlab is commercial software, unlike most of the software in the big data ecosystem that is open source and free to use, users have to purchase the specific licenses to run the models on the cloud. AnyLogic is also a very popular simulation software, which recently started to provide cloud-based simulation. Researchers could use AnyLogic Cloud to deploy and verify their digital twin models.
Simulations with these large volumes of data require enormous computing, storage and communication resources, and cloud computing is well placed to satisfy these. Another disruptive technology is quantum computing, which promises far greater resources to process and store the data [197]. However, the technology is still under development and still has limitations, such as fault tolerance and error correlation [185].
Future research directions could be:

• Develop big data tools to convert simulation models from scientific software for implementation on a public cloud or private cloud;
• Use a simulation method to generate testing and training data for machine learning in planning and decision-making processes;
• Use a general-purpose cloud computing cluster to simulate FEA in product design.

4.4.2. Connectivity and interoperability
The big data ecosystem complements existing manufacturing approaches to connectivity and interoperability. With the aim of systems integration and collaboration for smart manufacturing, the basis is that data and information have to be timely collected, correctly
formatted, analyzed and exchanged among the systems. MTConnect and OPC-UA are both data collection approaches in manufacturing. MTConnect focuses on the device level and control level by monitoring CNC machine tools using a predefined consistent data model, data format and definition, which matches different vendors' machine tools. OPC-UA focuses on SCADA, MES and ERP by using a generic data model which can flexibly match more industrial devices with additional configuration effort. There are three challenging issues with both approaches. One issue is the performance limitation of OPC-UA, in which the CPU of the OPC-UA server is identified as the main bottleneck in production [198]. Since both approaches are implemented on edge devices at the factory floor, the capabilities of data processing cannot be flexibly scaled up. The servers of both approaches have to be replaced with better performing hardware as more data sources, such as IoT devices, connect to the systems. Another issue is the interoperability of the systems using both approaches, since there is no common ontology on top of both information models. The last issue is the capability of data analysis and information exchange between both systems and other systems such as SCM, PLM, and CRM. Although some solutions are proposed to exchange both data formats and other formats (such as MTConnect and IEEE 1451 [118], OPC-UA and IoT [117], OPC-UA and AutomationML [98]), more solutions are required for data exchange of massive data formats between these two systems and other systems, as shown in Table 1.
Four existing big data ingestion tools (Sqoop, Flume, Kafka and Storm) and a new tool (NiFi) could complement the weaknesses of both approaches. All five big data tools can scale out the capability of data processing by adding new hardware to the clusters without replacing the old ones. There are some differences between the five tools. The first difference is that Sqoop and Flume are two tools of the Hadoop ecosystem, while Kafka and Storm are not dedicated to Hadoop. Secondly, data consumers are required to pull data with Kafka, whereas Flume pushes data to consumers. Thirdly, Kafka provides better fault tolerance and scalability than the others. Kafka provides event replication, which means other nodes continue to make data available when one node fails. Compared with Kafka, Storm, Flume and NiFi do not provide event replication, which is theoretically not suitable for mission-critical applications such as safety, security, and finance in manufacturing. Storm, Flume and NiFi require less developing effort to work with Hadoop.
Interoperability has two main issues: data format and data quality. With the aim of interoperability among systems on a BDA platform, data needs to be correctly formatted with good quality before exchange happens. Table 1 demonstrated various data formats in manufacturing systems. In terms of data formats, there are two types of data: data with schema (XML with given XSD/DTD) and data without schema, such as XML without given XSD/DTD, JSON and unstructured data (document, report, email). Since data with schema define the schemas in standard schema files (XSD/DTD: PLM XML), data can be exchanged automatically by mapping elements of each other's schemas. It is challenging to automatically transform data without schema to an intended schema since there is no mapping between both sides. One solution may be using natural language processing (NLP) to process human readable documents. The BDA platform also has to support new file formats such as AMF and 3MF [199].
Issues of data quality in Table 2 include missing values, noise or anomaly data, uncertainty, data outliers, data correlation, timing and synchronization. Data transformation can address these issues to extract correct and timely information from data. No generic transformation tool is identified in the reviewed papers which can be used as a “one size fits all” tool for all the applications in manufacturing. Because manufacturers use various manufacturing systems with different data characteristics, there are many combinations of data issues which require massive data transformation tools to address. It is challenging for manufacturers to develop a customized big data transformation tool for every specific scenario. Without addressing these data issues, correct information cannot be extracted and exchanged among systems. Systems integration and collaboration cannot be achieved for smart manufacturing.
Future research directions in this area could be:

• Review the availability and feasibility of data collection tools in various manufacturing scenarios;
• Develop a generic data transformation solution with big data technologies to exchange data of manufacturing systems.

4.4.3. Standardized big data platform design
Standardization of big data platform design improves the feasibility of enterprise-ready solutions in manufacturing. Because some essential components are missing in the proposed solutions, it is likely to be more difficult to apply them to manufacturing. Solutions with missing components would be insufficient for applying big data applications to smart manufacturing. However, there is no standard approach to design a big data platform in manufacturing. The reason may be that manufacturing enterprises with different profiles have a variety of system requirements for big data applications.
A systematic assessment method is required to analyze the limitations and strengths of the proposed big data solutions in the various manufacturing systems. Data issues are not entirely assessed, such as latency of data transmission among clusters, data quality and data format exchange. Therefore, it is essential to provide manufacturing with a standard approach to design a big data platform, together with a related assessment method.

4.4.4. Real time big data analytics
Big data analytics is moving from batch analysis to real-time streaming analysis in manufacturing. On the one hand, streaming big data analysis is considered a high research requirement in manufacturing [187]. On the other hand, enabling technologies are changing from non-real-time analytics to real time. Fig. 4 illustrates that the streaming computation engine Spark is becoming more popular relative to the traditional batching engine (MapReduce of Hadoop). However, Fig. 4 also shows that Hadoop still receives more focus than Spark in manufacturing. Batch processing is not able to provide real-time analytics responses such as real-time monitoring, dynamic scheduling and planning on systems of the workshop floor (SCADA, MES). Micro-batching and streaming engines (Spark, Storm, Flink) can provide real-time big data analytics of streaming data [26]. Spark could take advantage of batching and streaming to replace the MapReduce engine. Moreover, compared to Storm and Flink, Spark has more powerful analytics tools such as SparkSQL, SparkR, GraphX, and MLlib. However, Storm and Flink outperform Spark in real-time concerns. Hence, with the objective of streaming and analytics, it is necessary to use Storm or Flink together with Spark. The issue is that it requires more development effort to work on several computation engines. Apache Beam provides a uniform abstraction layer to run these real-time engines at the execution layer [44]. Although it has not been used in manufacturing, it would let researchers in manufacturing focus on analytics logic without spending time on learning the various usages of the engines. Another sign of big data analytics moving towards real time is that the BI data warehousing tools (Pig, Hive), which used MapReduce to batch-process large datasets, now support the Spark engine in their latest versions.

4.4.5. Cybersecurity in manufacturing
Cybersecurity will continuously challenge manufacturing since security standards are still not available in some systems such as SCM [49]. Recently, NIST published a framework to improve cybersecurity on critical infrastructures [188]. It is envisioned that there would be more developments in cybersecurity tools based on the new standard. No big data tool for cybersecurity was found in all reviewed papers, which is likely a promising research direction. Manufacturing could benefit from the following new security tools. Firstly, Apache Metron is an enterprise-ready real-time big data security tool, which is used by Telstra Company [200]. Secondly, Apache Ranger provides security
administration management on Hadoop clusters. Thirdly, Apache Knox provides a gateway service to access Hadoop clusters. The future direction in this area is to explore the capability of big data cybersecurity tools on critical systems in manufacturing such as safety, security and SCADA. Vulnerabilities in both the physical space and cyberspace of the manufacturing systems must be identified and protected. The potential risk and damages warrant high priority of future research in this direction.

5. Conclusions and future work

This paper systematically reviews the state of the art of big data research in manufacturing to evaluate the capabilities of the big data ecosystem and the requirements of smart manufacturing. Six key drivers of the big data ecosystem are identified for smart manufacturing, which are system integration, data, prediction, sustainability, resource sharing and hardware. Afterwards, the nine essential components of the big data ecosystem are presented to design a feasible big data solution for manufacturing enterprises. These are data ingestion, storage, computing, analytics, visualization, management, workflow, infrastructure and security. The evaluation reveals that there is no enterprise-ready big data solution in the reviewed literature.
It is important to note that some research areas, such as PLM, CAx, ERP and SCM, have received less attention from the manufacturing community. Many big data utilities are applicable to these areas, which could drive research innovation.
Regarding future work, there are five promising directions: modeling and simulation, connectivity and interoperability, standardized big data platform design, real-time big data analytics and cybersecurity.
Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.rcim.2019.101861.
Appendices
Tables A1 and A2.
Table A1
Search strings of four citation databases.

Database: Scopus
Number of papers: 228
Search string: TITLE-ABS-KEY (("hadoop") OR ("spark") OR ("Storm") OR ("Flink") OR ("SQL") OR ("nosyl") OR ("time-series database") OR ("opc-usa") OR ("MTconnect")) AND (TITLE-ABS-KEY ("manufacturing") OR TITLE-ABS-KEY ("Industry 4.0") OR TITLE-ABS-KEY ("Smart manufacturing") OR TITLE-ABS-KEY ("Digital twin") OR TITLE-ABS-KEY ("Digital thread")) AND NOT (TITLE-ABS-KEY ("construction") OR TITLE-ABS-KEY ("healthcare") OR TITLE-ABS-KEY ("oil") OR TITLE-ABS-KEY ("energy") OR TITLE-ABS-KEY ("Agriculture"))

Database: IEEE Xplore
Number of papers: 63
Search string: (("Author Keywords":hadoop OR "Author Keywords":spark OR "Author Keywords":storm OR "Author Keywords":flink OR "Author Keywords":sql OR "Author Keywords":nosql OR "Author Keywords":"time-series database" OR "Author Keywords":"opc-ua" OR "Author Keywords":"mtconnect") AND ("Abstract":manufacturing OR "Abstract":"Industrial 4.0" OR "Abstract":"industrial automation" OR "Abstract":"smart manufacturing" OR "Abstract":"digital twin" OR "Abstract":"digital thread"))

Database: ASME
Number of papers: 25
Search string: (hadoop, spark, storm, flink, sql, nosql, "time series database", "opc-ua", mtconnect) AND (manufacturing, "industry 4.0", "industrial automation", "smart manufacturing", "digital twin", "digital thread")

Database: ACM
Number of papers: 22
Search string: ((keywords.author.keyword:(hadoop, spark, storm, flink) AND content.ftsec:(apache)) OR keywords.author.keyword:(sql, nosql, "time series database", opc-ua, mtconnect)) AND recordAbstract:(manufacturing, "industrial automation", "smart manufacturing", "digital twin", "digital thread")

Table A2
Category distribution of reviewed articles.

Categories        Systems              Reference
Product           CAD/CAE/CAPP/CAM     [55,53,56,125]
                  PLM                  [58,59]
Production        ERP                  [126,62,62,97,94]
                  MOM/MES              [25,65,127,128,69,66,116,6]
                  SCADA/DCS/HMI        [137,73,71,39,134,201,133,131,130,129]
                  IIoT/CNC/Robot       [166,202,78,77,140]
                  O&M                  [138,24,83,139,161,156,157,142,82]
                  QMS                  [85,88,109,87]
                  Safety               [89,203]
Business          SCM/CRM/BI/AM        [93,90,181]
ICT architecture  CPS/CM/ICT           [64,146,68,160,153,99,132,101,143,75,148,95,149,204,96,76]
                  Data analytics/DM    [105,121,205,102,152,206,151,207,208]
                  KM                   [164,23,106]

References

[1] S. de Treville, M. Ketokivi, V. Singhal, Competitive manufacturing in a high-cost environment: introduction to the special issue, J. Oper. Manag. 49–51 (2017) 1–5, https://doi.org/10.1016/j.jom.2017.02.001.
[2] H. Kagermann, W. Wahlster, J. Helbig, Recommendations for implementing the strategic initiative INDUSTRIE 4.0, Final Rep. Ind. 4.0 WG. (2013) 82.
[3] Y. Liao, F. Deschamps, E. de F.R. Loures, L.F.P. Ramos, Past, present and future of industry 4.0 - a systematic literature review and research agenda proposal, Int. J. Prod. Res. 55 (2017) 3609–3629, https://doi.org/10.1080/00207543.2017.1308576.
[4] A. Kusiak, Smart manufacturing, Int. J. Prod. Res. 56 (2018) 508–517, https://doi.org/10.1080/00207543.2017.1351644.
[5] B. (Serm) Kulvatunyou, N. Ivezic, V. Srinivasan, On architecting and composing Informatics 10 (2014) 1435–1442, https://doi.org/10.1109/TII.2014.2306383.
engineering information services to enable smart manufacturing, J. Comput. Inf. [35] D. Wu, D.W. Rosen, L. Wang, D. Schaefer, Cloud-based design and manufacturing:
Sci. Eng. 16 (2016) 031002, https://doi.org/10.1115/1.4033725. A new paradigm in digital manufacturing and design innovation, CAD Comput.
[6] B.W. Jeon, J. Um, S.C. Yoon, S. Suk-Hwan, An architecture design for smart Aided Des. 59 (2015) 1–14, https://doi.org/10.1016/j.cad.2014.07.006.
manufacturing execution system, Comput. Aided. Des. Appl. 4360 (2016) 1–14, [36] F. Tao, Q. Qi, A. Liu, A. Kusiak, Data-driven smart manufacturing, J. Manuf. Syst.
https://doi.org/10.1080/16864360.2016.1257189. (2018), https://doi.org/10.1016/j.jmsy.2018.01.006.
[7] X. Xu, From cloud computing to cloud manufacturing, Robot. Comput. Integr. [37] M. Soualhia, F. Khomh, S. Tahar, Task scheduling in Big Data platforms: A sys-
Manuf. 28 (2012) 75–86, https://doi.org/10.1016/j.rcim.2011.07.002. tematic literature review, J. Syst. Softw. 134 (2017) 170–189, https://doi.org/10.
[8] J. Li, F. Tao, Y. Cheng, L. Zhao, Big Data in product lifecycle management, Int. J. 1016/j.jss.2017.09.001.
Adv. Manuf. Technol. 81 (2015) 667–684, https://doi.org/10.1007/s00170-015- [38] Y. Demchenko, P. Grosso, C. De Laat, P. Membrey, Addressing Big Data issues in
7151-x. scientific data infrastructure, Proc. 2013 Int. Conf. Collab. Technol. Syst. CTS
[9] J. Wang, Q. Chang, G. Xiao, N. Wang, S. Li, Data driven production modeling and 2013, 2013, pp. 48–55, , https://doi.org/10.1109/CTS.2013.6567203.
simulation of complex automobile general assembly plant, Comput. Ind. 62 (2011) [39] J. Moyne, J. Samantaray, M. Armacost, Big Data capabilities applied to semi-
765–775, https://doi.org/10.1016/j.compind.2011.05.004. conductor manufacturing advanced process control, IEEE Trans. Semicond. Manuf.
[10] S. Yang, B. Bagheri, H.-A. Kao, J. Lee, A unified framework and platform for de- 29 (2016) 283–291, https://doi.org/10.1109/TSM.2016.2574130.
signing of cloud-based machine health monitoring and manufacturing systems, J. [40] Y. Demchenko, C. De Laat, P. Membrey, Defining architecture components of the
Manuf. Sci. Eng. 137 (2015) 040914, , https://doi.org/10.1115/1.4030669. Big Data ecosystem, 2014 Int. Conf. Collab. Technol. Syst. CTS 2014, 2014, pp.
[11] H. Sequeira, P. Carreira, T. Goldschmidt, P. Vorst, Energy cloud: real-time cloud- 104–112, , https://doi.org/10.1109/CTS.2014.6867550.
native energy management system to monitor and analyze energy consumption in [41] S. Landset, T.M. Khoshgoftaar, A.N. Richter, T. Hasanin, A survey of open source
multiple industrial sites, Proc. - 2014 IEEE/ACM 7th Int. Conf. Util. Cloud Comput. tools for machine learning with big data in the Hadoop ecosystem, J. Big Data 2
UCC 2014, 2014, pp. 529–534, , https://doi.org/10.1109/UCC.2014.79. (2015) 24, https://doi.org/10.1186/s40537-015-0032-1.
[12] R.Y. Zhong, S.T. Newman, G.Q. Huang, S. Lan, Big Data for supply chain man- [42] S. Binani, A. Gutti, S. Upadhyay, SQL vs. NoSQL vs. NewSQL-a comparative study,
agement in the service and manufacturing sectors: challenges, opportunities, and Commun. Appl. Electron. 6 (2016) 43–46 http://www.caeaccess.org/archives/
future perspectives, Comput. Ind. Eng. 101 (2016) 572–591, https://doi.org/10. volume6/number1/binani-2016-cae-652418.pdf.
1016/j.cie.2016.07.013. [43] K. Grolinger, W.A. Higashino, A. Tiwari, M.A.M. Capretz, Data management in
[13] H.M. Chen, R. Schutz, R. Kazman, F. Matthes, Amazon in the air: innovating with cloud environments: NoSQL and NewSQL data stores, J. Cloud Comput. (2013) 2,
big data at Lufthansa, Proc. Annu. Hawaii Int. Conf. Syst. Sci. 2016-March, 2016, https://doi.org/10.1186/2192-113X-2-22.
pp. 5096–5105, , https://doi.org/10.1109/HICSS.2016.631. [44] M. Gökalp, K. Kayabay, M. Zaki, A. Koçyiğit, Big-Data data analytics architecture
[14] M. Helu, T. Hedberg, A. Barnard Feeney, Reference architecture to integrate for businesses: A comprehensive review on new open-source big-data tools,
heterogeneous manufacturing systems for the digital thread, CIRP J. Manuf. Sci. Cambridge Service Alliance, 2017, pp. 1–35, , https://doi.org/10.13140/RG.2.2.
Technol. 19 (2017) 191–195, https://doi.org/10.1016/j.cirpj.2017.04.002. 30306.84165.
[15] F. Tao, F. Sui, A. Liu, Q. Qi, M. Zhang, B. Song, Z. Guo, S.C.Y. Lu, A.Y.C. Nee, [45] J. Wan, S. Tang, Z. Shu, D. Li, S. Wang, M. Imran, A.V. Vasilakos, Software-Defined
Digital twin-driven product design framework, Int. J. Prod. Res. 7543 (2018) industrial internet of things in the context of industry 4.0, IEEE Sens. J. 16 (2016)
1–19, https://doi.org/10.1080/00207543.2018.1443229. 7373–7380, https://doi.org/10.1109/JSEN.2016.2565621.
[16] L. Monostori, B. Kádár, T. Bauernhansl, S. Kondoh, S. Kumara, G. Reinhart, [46] M.J. Koop, W. Huang, K. Gopalakrishnan, D.K. Panda, Performance analysis and
O. Sauer, G. Schuh, W. Sihn, K. Ueda, Cyber-physical systems in manufacturing, evaluation of PCIe 2.0 and quad-data rate InfiniBand, Proc. - Symp. High Perform.
CIRP Ann. Manuf. Technol. 65 (2016) 621–641, https://doi.org/10.1016/j.cirp. Interconnects, Hot Interconnects, 2008, pp. 85–92, , https://doi.org/10.1109/
2016.06.005. HOTI.2008.26.
[17] F. Bonomi, R. Milito, P. Natarajan, J. Zhu, Big Data and internet of things: A [47] J. Cheng, L. Da Xu, W. Chen, F. Tao, C.-L. Lin, Industrial IoT in 5G environment
roadmap for smart environments, 2014. doi:10.1007/978-3-319-05029-4. towards smart manufacturing, J. Ind. Inf. Integr. (2018), https://doi.org/10.1016/
[18] J. Gubbi, R. Buyya, S. Marusic, M. Palaniswami, Internet of things (IoT): a vision, j.jii.2018.04.001.
architectural elements, and future directions, Futur. Gener. Comput. Syst. 29 [48] P. Zikopoulos, C. Eaton, Understanding Big Data: Analytics for enterprise class
(2013) 1645–1660, https://doi.org/10.1016/j.future.2013.01.010.
[19] P. Mell, T. Grance, The NIST definition of cloud computing, NIST Spec. Publ. 145 Hadoop and streaming data, 2011. https://www.immagic.com/eLibrary/
(2011) 7, https://doi.org/10.1136/emj.2010.096966. ARCHIVES/EBOOKS/I111025E.pdf.
[20] V. Dhar, Data science and prediction, Commun. ACM 56 (2013) 64–73. [49] Y. Lu, K. Morris, S. Frechette, Current standards landscape for smart manufacturing
[21] A.J.C. Trappey, C.V. Trappey, U. Hareesh Govindarajan, A.C. Chuang, J.J. Sun, A systems, Natl. Inst. Stand. Technol. NISTIR 8107 (2016) 39, https://doi.
review of essential standards and patent landscapes for the internet of things: a key org/10.6028/NIST.IR.8107.
enabler for industry 4.0, Adv. Eng. Informatics. 33 (2017) 208–229, https://doi. [50] S. Evdokimov, RFID and the internet of things: Technology, applications, and
org/10.1016/j.aei.2016.11.007. security challenges, 2010. doi:10.1561/0200000020.
[22] R. Kosara, C. Healey, Visualization viewpoints: data, information and knowledge [51] D. Kiritsis, V.K. Nguyen, J. Stark, How closed-loop PLM improves knowledge
in visualization, Comput. Graph. …. (2003), http://ieeexplore.ieee.org/xpls/abs_ management over the complete product lifecycle and enables the factory of the
all.jsp?arnumber=1210860. future, Int. J. Prod. Lifecycle Manag. 3 (2008) 54, https://doi.org/10.1504/IJPLM.
[23] S. Wang, J. Wan, D. Li, C. Liu, Knowledge reasoning with semantic data for real- 2008.019970.
time data processing in smart factory, Sensors (Switzerland) 18 (2018) 1–10, [52] P. Buneman, Semistructured data, Proc. 16th Symp. Princ. Database Syst. 1997,
https://doi.org/10.3390/s18020471. pp. 117–121, , https://doi.org/10.1145/263661.263675.
[24] L. Spendla, M. Kebisek, P. Tanuska, L. Hrcka, Concept of predictive maintenance [53] Y. Xu, G. Chen, J. Zheng, An integrated solution—KAGFM for mass customization
of production systems in accordance with industry 4.0, SAMI 2017 - IEEE 15th Int. in customer-oriented product design under cloud manufacturing environment, Int.
Symp. Appl. Mach. Intell. Informatics, Proc. 2017, pp. 405–410, , https://doi.org/ J. Adv. Manuf. Technol. 84 (2016) 85–101, https://doi.org/10.1007/s00170-015-
10.1109/SAMI.2017.7880343. 8074-2.
[25] O. Morariu, C. Morariu, T. Borangiu, S. Răileanu, Manufacturing systems at scale [54] S. Chen, C. Yin, X. Li, Implementation of MTConnect in machine monitoring
with Big Data streaming and online machine learning, Stud. Comput. Intell. 762 system for CNCs, Proc. - 2017 5th Int. Conf. Enterp. Syst. Ind. Digit. by Enterp.
(2018) 253–264, https://doi.org/10.1007/978-3-319-73751-5_19. Syst. ES 2017, 2017, pp. 70–75, , https://doi.org/10.1109/ES.2017.19.
[26] K. Nagorny, P. Lima-Monteiro, J. Barata, A.W. Colombo, Big Data analysis in smart [55] S.L. Chan, Y. Lu, Y. Wang, Data-driven cost estimation for additive manufacturing
manufacturing: a review, Int. J. Commun. Netw. Syst. Sci. 10 (2017) 31–58, in cybermanufacturing, J. Manuf. Syst. 46 (2018) 115–126, https://doi.org/10.
https://doi.org/10.4236/ijcns.2017.103003. 1016/j.jmsy.2017.12.001.
[27] D. Reinsel, J. Gantz, J. Rydning, Data age 2025: The digitization of the world [56] S.K., K.C., H.Y., S.O. Yang, G-code conversion from 3D model data for 3D printers
from edge to core, 2018. https://www.seagate.com/files/www-content/our- on Hadoop systems, 2017 4th Int. Conf. Comput. Appl. Inf. Process. Technol. 2017,
story/trends/files/idc-seagate-dataage-whitepaper.pdf. pp. 1–4, , https://doi.org/10.1109/CAIPT.2017.8320709.
[28] A. Gandomi, M. Haider, Beyond the hype: Big data concepts, methods, and ana- [57] S. Sierla, V. Kyrki, P. Aarnio, V. Vyatkin, Automatic assembly planning based on
lytics, Int. J. Inf. Manage. 35 (2015) 137–144, https://doi.org/10.1016/j. digital product descriptions, Comput. Ind. 97 (2018) 34–46, https://doi.org/10.
ijinfomgt.2014.10.007. 1016/j.compind.2018.01.013.
[29] J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters, [58] Y. Zhang, S. Ren, Y. Liu, T. Sakao, D. Huisingh, A framework for Big Data driven
Communications of the ACM 51 (1) (2008) 107–113. product lifecycle management, J. Clean. Prod. 159 (2017) 229–240, https://doi.
[30] K. Shvachko, H. Kuang, S. Radia, R. Chansler, The Hadoop distributed file system, org/10.1016/j.jclepro.2017.04.172.
2010 IEEE 26th Symp. Mass Storage Syst. Technol. 2010, pp. 1–10, , https://doi. [59] Y. Zhang, S. Ren, Y. Liu, S. Si, A big data analytics architecture for cleaner man-
org/10.1109/MSST.2010.5496972. ufacturing and maintenance processes of complex products, J. Clean. Prod. (2016),
[31] J. Duda, Business intelligence and NoSQL databases, Inf. Syst. Manag. 1 (2012) https://doi.org/10.1016/j.jclepro.2016.07.123.
25–37. [60] D. Ramanujan, W.Z. Bernstein, M.A. Totorikaguena, C.F. Ilvig, K.B. Ørskov,
[32] R. Cattell, Scalable SQL and NoSQL data stores, Acm Sigmod Rec. 39 (2010) Generating contextual design for environment principles in sustainable manu-
12–27. facturing using visual analytics, J. Manuf. Sci. Eng. 141 (2018) 021016, , https://
[33] Y. Bao, L. Ren, L. Zhang, X. Zhang, Y. Luo, Massive sensor data management doi.org/10.1115/1.4041835.
framework in cloud manufacturing based on Hadoop, IEEE Int. Conf. Ind. [61] T. Hedberg, A.B. Feeney, M. Helu, J.A. Camelio, Toward a lifecycle information
Informatics, 2012, pp. 397–401, , https://doi.org/10.1109/INDIN.2012.6301192. framework and technology in manufacturing, J. Comput. Inf. Sci. Eng. 17 (2017)
[34] F. Tao, Y. Cheng, L. Da Xu, L. Zhang, B.H. Li, CCIoT-CMfg: cloud computing and 021010, , https://doi.org/10.1115/1.4034132.
internet of things-based cloud manufacturing service system, IEEE Trans. Ind. [62] J.W. Wang, J.Y. Yang, J. Zhang, X.X. Wang, W. (Chris) Zhang, Big data driven
cycle time parallel prediction for production planning in wafer manufacturing, 1216–1226, https://doi.org/10.1007/s10033-017-0179-0.
Enterp. Inf. Syst. 12 (2018) 1–19, https://doi.org/10.1080/17517575.2018. [89] S.A. Jacobs, A. Dagnino, Large-scale industrial alarm reduction and critical events
1450998. mining using graph analytics on spark, Proc. - 2016 IEEE 2nd Int. Conf. Big Data
[63] X. Li, J. Song, B. Huang, A scientific workflow management system architecture Comput. Serv. Appl. BigDataService 2016, 2016, pp. 66–71, , https://doi.org/10.
and its scheduling based on cloud service platform for manufacturing big data 1109/BigDataService.2016.21.
analytics, Int. J. Adv. Manuf. Technol. 84 (2015) 119–131, https://doi.org/10. [90] H. Kemper, H. Baars, H. Lasi, An integrated business intelligence framework:
1007/s00170-015-7804-9. closing the gap between IT support for management and for production, Bus.
[64] M.Y. Santos, J. Oliveira e Sá, C. Andrade, F. Vale Lima, E. Costa, C. Costa, Intell. Perform. Manag. Springer, London, 2013, pp. 13–27, https://doi.org/10.
B. Martinho, J. Galvão, Big Data system supporting Bosch Braga industry 4.0 1007/978-1-4471-4866-1.
strategy, Int. J. Inf. Manage. 37 (2017) 750–760, https://doi.org/10.1016/j. [91] B.R. Ferrer, W.M. Mohammed, J.L.M. Lastra, A solution for processing supply
ijinfomgt.2017.07.012. chain events within ontology-based descriptions, (2016) 4877–4883.
[65] S. Wang, C. Zhang, C. Liu, D. Li, H. Tang, Cloud-assisted interaction and nego- [92] M.J.A.G. Izaguirre, A. Lobov, J.L.M. Lastra, OPC-UA and DPWS interoperability
tiation of industrial robots for the smart factory, Comput. Electr. Eng. 63 (2017) for factory floor monitoring using complex event processing, IEEE Int. Conf. Ind.
66–78, https://doi.org/10.1016/j.compeleceng.2017.05.025. Informatics (2011) 205–210, https://doi.org/10.1109/INDIN.2011.6034874.
[66] N.M. Khushairi, N.A. Emran, M.M. Mohd Yusof, Database performance tuning [93] J. Campos, P. Sharma, U.G. Gabiria, E. Jantunen, D. Baglee, A Big Data analytical
methods for manufacturing execution system, World Appl. Sci. J. 30 (2014) architecture for the asset management, Procedia CIRP 64 (2017) 369–374,
91–99, https://doi.org/10.5829/idosi.wasj.2014.30.icmrp.14. https://doi.org/10.1016/j.procir.2017.03.019.
[67] B.W. Jeon, J. Um, S.C. Yoon, S. Suk-hwan, An architecture design for smart [94] P. Gölzer, L. Simon, P. Cato, M. Amberg, Designing global manufacturing networks
manufacturing execution system, 4360 (2017). doi:10.1080/16864360.2016. using Big Data, Procedia CIRP 33 (2015) 191–196, https://doi.org/10.1016/j.
1257189. procir.2015.06.035.
[68] V. Jirkovský, M. Obitko, Enabling semantics within industry 4.0, Int. Conf. Ind. [95] M. Zimmermann, U. Breitenbucher, M. Falkenthal, F. Leymann, K. Saatkamp,
Appl. Holonic Multi-Agent Syst. 2017, pp. 39–52, , https://doi.org/10.1007/978- Standards-based function shipping - How to use TOSCA for shipping and executing
3-642-40090-2. data analytics software in remote manufacturing environments, Proc. - 2017 IEEE
[69] O. Sauer, Developments and trends in shopfloor-related ICT systems, IEEE Int. 21st Int. Enterp. Distrib. Object Comput. Conf. EDOC 2017, 2017, pp. 50–60, ,
Conf. Ind. Eng. Eng. Manag. 2015-Janua, 2014, pp. 1352–1356, , https://doi.org/ https://doi.org/10.1109/EDOC.2017.16 2017-January.
10.1109/IEEM.2014.7058859. [96] L.K. B, C. Gr, K. Jan, E. Hoos, C. Kiefer, C. Weber, S. Silcher, B. Mitschang, The
[70] B.E.I. Systron, D. Division, Managing configuration control in an automotive stuttgart IT architecture for manufacturing an architecture for the data-driven
sensor mass customization manufacturing product line, 2006 World Autom. factory, Int. Conf. Enterp. Inf. Syst. 2016, pp. 53–80, , https://doi.org/10.1007/
Congr. 2006, pp. 1–16. 978-3-319-62386-3.
[71] J.-J. Kim, D.-W. Lee, D.-B. Ko, S.-I. Jeong, J.-M. Park, An autonomic computing [97] W.M. Mohammed, B.R. Ferrer, J.L.M. Lastra, D. Aleixo, C. Agostinho,
based on big data platform for high-reliable smart factory, J. Eng. Appl. Sci. 12 Configuring and visualizing the data resources in a cloud-based data collection
(2017) 2662–2666 https://www.scopus.com/inward/record.uri?eid=2-s2.0- framework, 2017 Int. Conf. Eng. Technol. Innov. Eng. Technol. Innov. Manag.
85041023417&partnerID=40&md5=003f099909f142b4879dffa79001fe1c. Beyond 2020 New Challenges, New Approaches, ICE/ITMC 2017 - Proc. 2018, pp.
[72] P. Reboredo, M. Keinert, Integration of discrete manufacturing field devices data 1201–1208, , https://doi.org/10.1109/ICE.2017.8280017 2018-January.
and services based on OPC UA, IECON Proc. (Industrial Electron. Conf.) 2013, pp. [98] X. Ye, T.Y. Park, S.H. Hong, Y. Ding, A. Xu, Implementation of a production-
4476–4481, https://doi.org/10.1109/IECON.2013.6699856. control system using integrated automation ML and OPC UA, 2018 Work. Metrol.
[73] A.R. Khan, H. Schioler, M. Kulahci, T. Knudsen, Big data analytics for industrial Ind. 4.0 IoT, MetroInd 4.0 IoT 2018 - Proc. 2018, pp. 242–247, , https://doi.org/
process control, 2017 22nd IEEE Int. Conf. Emerg. Technol. Fact. Autom. 2017, pp. 10.1109/METROI4.2018.8428310.
1–8, , https://doi.org/10.1109/ETFA.2017.8247658. [99] Y. Cao, S. Wang, L. Kang, C. Li, L. Guo, Study on machining service modes and
[74] X. Ye, S.H. Hong, An AutomationML/OPC UA-based industry 4.0 solution for a resource selection strategies in cloud manufacturing, Int. J. Adv. Manuf. Technol.
manufacturing system, IEEE Int. Conf. Emerg. Technol. Fact. Autom. ETFA. 2018- 81 (2015) 597–613, https://doi.org/10.1007/s00170-015-7222-z.
September, 2018, pp. 543–550, , https://doi.org/10.1109/ETFA.2018.8502637. [100] S.K. Panda, T. Schroder, L. Wisniewski, C. Diedrich, PlugProduce integration of
[75] D. Mourtzis, E. Vlachou, A cloud-based cyber-physical system for adaptive shop- components into OPC UA, based data-space, IEEE Int. Conf. Emerg. Technol. Fact.
floor scheduling and condition-based maintenance, J. Manuf. Syst. 47 (2018) Autom. ETFA, 2018, pp. 1095–1100, , https://doi.org/10.1109/ETFA.2018.
179–198, https://doi.org/10.1016/j.jmsy.2018.05.008. 8502663 2018-September.
[76] C. Toro, I. Barandiaran, J. Posada, A perspective on knowledge based and in- [101] F. Tao, L. Zhang, Y. Liu, Y. Cheng, L. Wang, X. Xu, Manufacturing service man-
telligent systems implementation in industrie 4.0, Procedia Comput. Sci. 60 (2015) agement in cloud manufacturing: overview and future research directions, J.
362–370, https://doi.org/10.1016/j.procs.2015.08.143. Manuf. Sci. Eng. 137 (2015) 040912, , https://doi.org/10.1115/1.4030510.
[77] E. Poormohammady, J.H. Reelfs, M. Stoffers, K. Wehrle, A. Papageorgiou, [102] H. Yang, M. Park, M. Cho, M. Song, S. Kim, A system architecture for manu-
Dynamic algorithm selection for the logic of tasks in IoT stream processing sys- facturing process analysis based on big data and process mining techniques, Big
tems, 2017 13th Int. Conf. Netw. Serv. Manag. CNSM 2017. 2018-Janua, 2018, pp. Data (Big Data), 2014 IEEE Int. Conf. 2014, pp. 1024–1029, , https://doi.org/10.
1–5, , https://doi.org/10.23919/CNSM.2017.8256009. 1109/BigData.2014.7004336.
[78] D. Mourtzis, E. Vlachou, N. Milas, Industrial Big Data as a result of IoT adoption in [103] J.C.C. Tseng, J.Y. Gu, P.F. Wang, C.Y. Chen, C.F. Li, V.S. Tseng, A scalable complex
manufacturing, Procedia CIRP 55 (2016) 290–295, https://doi.org/10.1016/j. event analytical system with incremental episode mining over data streams, 2016
procir.2016.07.038. IEEE Congr. Evol. Comput. CEC 2016, 2016, pp. 648–655, , https://doi.org/10.
[79] J. Moyne, J. Samantaray, M. Armacost, Big data emergence in semiconductor 1109/CEC.2016.7743854.
manufacturing advanced process control, Adv. Semicond. Manuf. Conf. (ASMC), [104] B. Suryajaya, C.C. Chen, M.H. Hung, Y.Y. Liu, J.X. Liu, Y.C. Lin, A fast large-size
2015 26th Annu. SEMI. 2015, pp. 130–135, , https://doi.org/10.1109/ASMC. production data transformation scheme for supporting smart manufacturing in
2015.7164483. semiconductor industry, IEEE Int. Conf. Autom. Sci. Eng. 2018, pp. 275–281, ,
[80] D. Mourtzis, A. Vlachou, V. Zogopoulos, Cloud-based augmented reality remote https://doi.org/10.1109/COASE.2017.8256114 2017-Augus.
maintenance through shop-floor monitoring: A product-service system approach, [105] A. Luckow, M. Cook, N. Ashcraft, E. Weill, E. Djerekarov, B. Vorster, Deep learning
J. Manuf. Sci. Eng. 139 (2017) 061011, , https://doi.org/10.1115/1.4035721. in the automotive industry: Applications and tools, Big Data Int. Conf. Big Data.
[81] Y. Busnel, N. Rivetti, A. Gal, FlinkMan: Anomaly detection in manufacturing 2016, pp. 3759–3768, https://doi.org/10.1109/BigData.2016.7841045.
equipment with Apache Flink: Grand challenge, (2017). [106] A. Brodsky, G. Shao, M. Krishnamoorthy, A. Narayanan, D. Menasce, R. Ak,
doi:10.1145/3093742.3095099. Analysis and optimization in smart manufacturing based on a reusable knowledge
[82] I. Yen, S. Zhang, F. Bastani, A framework for IoT-based monitoring and diagnosis base for process performance models, Proc. - 2015 IEEE Int. Conf. Big Data, IEEE
of manufacturing systems, Proc. - 11th IEEE Int. Symp. Serv. Syst. Eng. SOSE 2017, Big Data 2015, 2015, pp. 1418–1427, , https://doi.org/10.1109/BigData.2015.
2017, pp. 1–8, , https://doi.org/10.1109/SOSE.2017.26. 7363902.
[83] S. Kang, W.T.K. Chien, J.G. Yang, A study for big-data (Hadoop) application in [107] S.C. Feng, W.Z. Bernstein, T. Hedberg, A. Barnard Feeney, Toward knowledge
semiconductor manufacturing, IEEE Int. Conf. Ind. Eng. Eng. Manag. 2016-Decem, management for smart manufacturing, J. Comput. Inf. Sci. Eng. 17 (2017) 031016,
2016, pp. 1893–1897, , https://doi.org/10.1109/IEEM.2016.7798207. , https://doi.org/10.1115/1.4037178.
[84] X. Li, Z. Tu, Q. Jia, X. Man, H. Wang, X. Zhang, Deep-level quality management [108] M. Stonebraker, I.F. Ilyas, Data integration: The current status and the way for-
based on big data analytics with case study, Proc. - 2017 Chinese Autom. Congr. ward, IEEE Data Eng. Bull. 41 (2018) 3–9.
CAC 2017. 2017-Janua, 2017, pp. 4921–4926, , https://doi.org/10.1109/CAC. [109] D. Zhang, B. Xu, J. Wood, Predict failures in production lines: A two-stage ap-
2017.8243651. proach with clustering and supervised learning, Proc. - 2016 IEEE Int. Conf. Big
[85] M. Syafrudin, N.L. Fitriyani, D. Li, G. Alfian, J. Rhee, Y.S. Kang, An open source- Data, Big Data 2016, 2016, pp. 2070–2074, , https://doi.org/10.1109/BigData.
based real-time data processing architecture framework for manufacturing sus- 2016.7840832.
tainability, Sustain (2017) 9, https://doi.org/10.3390/su9112139. [110] V. Jirkovský, M. Obitko, P. Novák, P. Kadera, Big data analysis for sensor time-
[86] S. Venkatesh, Web-enabled real-time quality feedback for factory systems using series in automation, 19th IEEE Int. Conf. Emerg. Technol. Fact. Autom. ETFA
MTConnect, ASME 2012 Int. Des. Eng. Tech. Conf. Comput. Inf. Eng. Conf. 2012, 2014, 2014, https://doi.org/10.1109/ETFA.2014.7005183.
pp. 403–409. [111] J. Bakakeu, J. Fuchs, T. Javied, M. Brossog, J. Franke, H. Klos, W. Eberlein,
[87] H.-M. Hou, J.-F. Kung, Y.-B. Hsu, Y. Yamazaki, K. Maruyama, Y. Toyoshima, S. Tolksdorf, J. Peschke, L. Jahn, Multi-Objective design space exploration for the
C. Chen, Prediction of ppm level electrical failure by using physical variation integration of advanced analytics in cyber-physical production systems, IEEE Int.
analysis, Proc.SPIE. (2016) 9778, https://doi.org/10.1117/12.2229410. Conf. Ind. Eng. Eng. Manag. 2019, pp. 1866–1873, , https://doi.org/10.1109/
[88] M.-K. Zheng, X.-G. Ming, X.-Y. Zhang, G.-M. Li, MapReduce based parallel baye- IEEM.2018.8607483 2019-December.
sian network for manufacturing quality control, Chinese J. Mech. Eng. 30 (2017) [112] R. Lynn, W. Louhichi, M. Parto, E. Wescoat, T. Kurfess, Rapidly deployable
MTConnect-based machine tool monitoring systems, Proc. ASME 12TH Int. Manuf. factory, Int. J. Control Autom. 11 (2018) 91–98.
Sci. Eng. Conf. - 2017, 3 2017. [139] H. Chen, X. Fei, S. Wang, X. Lu, G. Jin, W. Li, X. Wu, Energy consumption data
[113] D. Libes, S. Shin, J. Woo, Considerations and recommendations for data avail- based machine anomaly detection, 2014 Second Int. Conf. Adv. Cloud Big Data,
ability for data analytics for manufacturing, 3rd IEEE Int. Conf. Big Data, IEEE Big 2014, pp. 136–142, , https://doi.org/10.1109/CBD.2014.24.
Data, 2015, pp. 68–75, , https://doi.org/10.1109/BigData.2015.7363743. [140] N. Stojanovic, M. Dinic, L. Stojanovic, A data-driven approach for multivariate
[114] C. Zhao, L. Zhang, X.Z.L. Zhang, Cloud manufacturing resource management based contextualized anomaly detection: industry use case, Proc. - 2017 IEEE Int. Conf.
on metadata, ASME 2015 Int. Manuf. Sci. Eng. Conf. MSEC 2015, 2 2015, pp. 1–8, Big Data, Big Data 2017. 2018-Janua, 2018, pp. 1560–1569, , https://doi.org/10.
, https://doi.org/10.1115/MSEC20159388. 1109/BigData.2017.8258090.
[115] Y. Cheng, W. Shang, L. Zhu, D. Zhang, D. Feng, Items analysis of postal super- [141] M. Canizo, E. Onieva, A. Conde, S. Charramendieta, S. Trujillo, Real-time pre-
vision, 2016 IEEE/ACIS 15th Int. Conf. Comput. Inf. Sci. ICIS 2016 - Proc. 2016, dictive maintenance for wind turbines using Big Data frameworks, IEEE Int. Conf.
pp. 3–5, , https://doi.org/10.1109/ICIS.2016.7550949. Progn. Heal. Manag. 2017, pp. 1–8, , https://doi.org/10.1109/ICPHM.2017.
[116] P. Gaj, K. Andrzej, P. Stera, Ontology-Based integrated monitoring of Hadoop 7998308.
clusters in industrial environments with OPC UA and RESTful web services, [142] T. Zaarour, N. Pavlopoulou, S. Hasan, U. ul Hassan, E. Curry, Grand challenge:
Commun. Comput. Inf. Sci. 522 (2015) 162–171, https://doi.org/10.1007/978-3- automatic anomaly detection over sliding windows, Proc. 11th ACM Int. Conf.
319-19419-6. Distrib. Event-Based Syst. - DEBS ’17, 2017, pp. 310–314, , https://doi.org/10.
[117] H. Derhamy, J. Ronnholm, J. Delsing, J. Eliasson, J. Van Deventer, Protocol in- 1145/3093742.3095105.
teroperability of OPC UA in service oriented architectures, Proc. - 2017 IEEE 15th [143] J.H. Lee, S. Do Noh, H.-J. Kim, Y.-S. Kang, Implementation of cyber-physical
Int. Conf. Ind. Informatics, INDIN 2017, 2017, pp. 44–50, , https://doi.org/10. production systems for quality prediction and operation control in metal casting,
1109/INDIN.2017.8104744. Sensors (Switzerland) (2018) 18, https://doi.org/10.3390/s18051428.
[118] K.B. Lee, E.Y. Song, P.S. Gu, Integration of MTConnect and standard-based sensor [144] A. Stojadinović, Industry paper: Dynamic monitoring for improving worker safety
networks for manufacturing equipment monitoring, ASME 2012 Int. Manuf. Sci. at the workplace: Use case from a manufacturing shop floor, (2015) 205–216.
Eng. Conf. MSEC, 2012, pp. 4–8. [145] H. Haskamp, F. Orth, J. Wermann, A.W. Colombo, Implementing an OPC UA in-
[119] A.N. Richter, T.M. Khoshgoftaar, S. Landset, T. Hasanin, A multi-dimensional terface for legacy PLC-based automation systems using the Azure cloud: An ICPS-
comparison of toolkits for machine learning with Big Data, Proc. - 2015 IEEE 16th architecture with a retrofitted RFID system, Proc. - 2018 IEEE Ind. Cyber-Physical
Int. Conf. Inf. Reuse Integr. IRI 2015, 2015, pp. 1–8, , https://doi.org/10.1109/ Syst. ICPS 2018, 2018, pp. 115–121, , https://doi.org/10.1109/ICPHYS.2018.
IRI.2015.12. 8387646.
[120] A. Jos, Integration of sensors, controllers and instruments, Sensors (2017), https:// [146] D. Cemernek, H. Gursch, R. Kern, Big data as a promoter of industry 4.0: Lessons of
doi.org/10.3390/s17071512. the semiconductor industry, Proc. - 2017 IEEE 15th Int. Conf. Ind. Informatics,
[121] A. Luckow, K. Kennedy, F. Manhardt, E. Djerekarov, B. Vorster, A. Apon, INDIN 2017, 2017, pp. 239–244, , https://doi.org/10.1109/INDIN.2017.8104778.
Automotive big data: applications, workloads and infrastructures, Proc. - 2015 [147] C. Ellwein, O. Riedel, O. Meyer, D. Schel, Rent'n’Produce: A secure cloud manu-
IEEE Int. Conf. Big Data, IEEE Big Data 2015, 2015, pp. 1201–1210, , https://doi. facturing platform for small and medium enterprises, 2018 IEEE Int. Conf. Eng.
org/10.1109/BigData.2015.7363874. Technol. Innov. ICE/ITMC 2018 - Proc. 2018, pp. 1–6, , https://doi.org/10.1109/
[122] C. Mathis, Data Lakes, Datenbank-Spektrum. (2017). doi:10.1007/s13222-017- ICE.2018.8436332.
0272-7. [148] N. Ferry, G. Terrazas, P. Kalweit, A. Solberg, S. Ratchev, D. Weinelt, Towards a big
[123] M. Sarnovsky, P. Bednar, M. Smatana, Data integration in scalable data analytics data platform for managing machine generated data in the cloud, Proc. - 2017
platform for process industries, 2017 IEEE 21st Int. Conf. Intell. Eng. Syst. 2017, IEEE 15th Int. Conf. Ind. Informatics, INDIN 2017, 2017, pp. 263–270, , https://
pp. 187–192, , https://doi.org/10.1109/INES.2017.8118553. doi.org/10.1109/INDIN.2017.8104782.
[124] M. Sarnovsky, P. Bednar, M. Smatana, Big data processing and analytics platform [149] A. Angrish, B. Starly, Y.S. Lee, P.H. Cohen, A flexible data schema and system
architecture for process industry factories, Big Data Cogn. Comput. 2 (2018) 3, architecture for the virtualization of manufacturing machines (VMM), J. Manuf.
https://doi.org/10.3390/bdcc2010003. Syst. 45 (2017) 236–247, https://doi.org/10.1016/j.jmsy.2017.10.003.
[125] G.L. Ooi, Y.-H. Wang, P.S. Tan, Z. Zhang, Y. Gao, J.K. Chow, Y. Wu, Q. Yuan, [150] R.S. Peres, A.D. Rocha, A. Coelho, J. Barata Oliveira, A. Highly Flexible,
Customizable and scalable geotechnical laboratory testing and field monitoring Distributed data analysis framework for industry 4.0 manufacturing systems, in:
with new sensing and big data technologies, ICSMGE 2017 - 19th Int. Conf. Soil T Borangiu, D. Trentesaux, A. Thomas, P. Leitão, J.B. Oliveira (Eds.), Serv.
Mech. Geotech. Eng. 2017, pp. 471–474. Orientat. Holonic Multi-Agent Manuf. Springer International Publishing, Cham,
[126] H. Liang, L. Feng, Z. Chun, Application of the Big Data technology for massive data 2017, pp. 373–381.
of the whole life cycle of EMU, in: F. Xhafa, S. Patnaik, Z. Yu (Eds.), Recent Dev. [151] M.Y. Santos, J.O. e Sá, C. Costa, J. Galvão, C. Andrade, B. Martinho, F.V. Lima,
Intell. Syst. Interact. Appl. Springer International Publishing, Cham, 2017, pp. E. Costa, A big data analytics architecture for industry 4.0, WorldCIST 2017
219–224. Recent Adv. Inf. Syst. Technol, 2017, https://doi.org/10.1007/978-3-319-56538-
[127] L. Zheng, L. Tang, T. Li, B. Duan, M. Lei, P. Wang, C. Zeng, L. Li, Y. Jiang, W. Xue, 5 0.
J. Li, C. Shen, W. Zhou, H. Li, Applying data mining techniques to address critical [152] P. Tanuska, L. Spendla, M. Kebisek, Data integration for incidents analysis in
process optimization needs in advanced manufacturing, Proc. 20th ACM SIGKDD manufacturing infrastructure, Comput. Conf. 2017, 2017, pp. 340–345.
Int. Conf. Knowl. Discov. Data Min. - KDD ’14, 2014, pp. 1739–1748, , https://doi. [153] H. Lin, J.A. Harding, C. Chen, A hyperconnected manufacturing collaboration
org/10.1145/2623330.2623347. system using the semantic web and Hadoop ecosystem system, Procedia CIRP 52
[128] S. Windmann, A. Maier, O. Niggemann, C. Frey, A. Bernardi, Y. Gu, H. Pfrommer, (2016) 18–23, https://doi.org/10.1016/j.procir.2016.07.075.
T. Steckel, M. Krüger, R. Kraus, Big data analysis of manufacturing processes, J. [154] C. Gröger, Building an industry 4.0 analytics platform, Datenbank-Spektrum.
Phys. Conf. Ser. (2015) 659, https://doi.org/10.1088/1742-6596/659/1/012055. (2018). doi:10.1007/s13222-018-0273-1.
[129] R. Bohlin, L. Lindkvist, J. Hagmar, J.S. Carlson, K. Bengtsson, Data flow and [155] M.C. Domenech, L.P. Rauta, M.D. Lopes, P.H. Da Silva, R.C. Da Silva, B.W. Mezger,
communication framework supporting digital twin for geometry assurance Robert, M.S. Wangham, Providing a smart industrial environment with the web of things
Proc. ASME 2017 Int. Mech. Eng. Congr. Expo. 2017, pp. 1–7. and cloud computing, Proc. - 2016 IEEE Int. Conf. Serv. Comput. SCC 2016, 2016,
[130] K.E. Harper, J. Zheng, S.A. Jacobs, A. Dagnino, A. Jansen, T. Goldschmidt, pp. 641–648, , https://doi.org/10.1109/SCC.2016.89.
A. Marinakis, Industrial analytics pipelines, Proc. - 2015 IEEE 1st Int. Conf. Big [156] J. Wan, S. Tang, D. Li, S. Wang, C. Liu, H. Abbas, A.V. Vasilakos, A manufacturing
Data Comput. Serv. Appl. BigDataService 2015, 2015, pp. 242–248, , https://doi. Big Data solution for active preventive maintenance, IEEE Trans. Ind. Informatics.
org/10.1109/BigDataService.2015.38. 13 2017, pp. 2039–2047, , https://doi.org/10.1201/b15906-13.
[131] T. Goldschmidt, M.K. Murugaiah, C. Sonntag, B. Schlich, S. Biallas, P. Weber, [157] N. Rivetti, Y. Busnel, A. Gal, Grand challenge: Flinkman - Anomaly detection in
Cloud-based control: a multi-tenant, horizontally scalable Soft-PLC, 2015 IEEE 8th manufacturing equipment with Apache Flink, Proc. 11th ACM Int. Conf. Distrib.
Int. Conf. Cloud Comput. 2015, https://doi.org/10.1109/CLOUD.2015.124. Event-Based Syst. - DEBS ’17, 2017, pp. 274–279, , https://doi.org/10.1145/
[132] H.-S. Park, J.-H. Kim, C.-H. Choi, B.-R. Jung, K.-H. Lee, S.-Y. Chi, W.-S. Cho, In- 3093742.3095099.
Memory data grid system for real-time processing of machine sensor data in a [158] M. Canizo, E. Onieva, A. Conde, S. Charramendieta, S. Trujillo, Real-time pre-
smart factory environment, Proc. 2015 Int. Conf. Big Data Appl. Serv. - BigDAS dictive maintenance for wind turbines using Big Data frameworks, (2017) 1–8.
’15, 2015, pp. 92–97, , https://doi.org/10.1145/2837060.2837073. [159] M.R. Brule, Big data in EP: Real-time adaptive analytics and data-flow archi-
[133] N. Ramakrishnanus, R. Ghosh, Distributed dynamic elastic nets : a scalable ap- tecture, Soc. Pet. Eng. - SPE Digit. Energy Conf. Exhib. 2013, 2013, pp. 305–311, ,
proach for regularization in dynamic manufacturing environments, 2015 IEEE Int. https://doi.org/10.2118/163721-MS.
Conf. Big Data, 2015, pp. 2752–2761. [160] D. Wu, J. Terpenny, L. Zhang, R. Gao, T. Kurfess, Fog-enabled architecture for
[134] S. Wang, J. Ouyang, D. Li, C. Liu, An integrated industrial ethernet solution for the data-driven cyber-manufacturing systems, ASME 2016 11th Int. Manuf. Sci. Eng.
implementation of smart factory, IEEE Access (2017) 25455–25462, https://doi. Conf. MSEC 2016, 2 2016, https://doi.org/10.1115/MSEC2016-8559
org/10.1109/ACCESS.2017.2770180. V002T04A032.
[135] S. Division, C. Hsing, N. Village, N. City, N. County, M. Availability, Developing a [161] R. Gao, L. Wang, R. Teti, D. Dornfeld, S. Kumara, M. Mori, M. Helu, Cloud-enabled
cloud virtual maintenance system for machine tools management, 2015 11th Int. prognosis for manufacturing, CIRP Ann. - Manuf. Technol. 64 (2015) 749–772,
Conf. Heterog. Netw. Qual. Reliab. Secur. Robustness. 2015, pp. 358–364. https://doi.org/10.1016/j.cirp.2015.05.011.
[136] J. Wan, S. Tang, D. Li, S. Wang, C. Liu, H. Abbas, S. Member, A.V. Vasilakos, A [162] Y. Wu, S. Wang, Streaming analytics processing in manufacturing performance
manufacturing big data solution for active preventive maintenance, IEEE Trans. monitoring and prediction, 2017 IEEE Int. Conf. Big Data (Big Data) (2017)
Ind. INFORMATICS, 2017, pp. 2039–2047. 3285–3289.
[137] J.-H. Ku, A study on prediction model of equipment failure through analysis of Big [163] Y. Zhang, S. Ren, Y. Liu, T. Sakao, D. Huisingh, A framework for Big Data driven
Data based on RHadoop, Wirel. Pers. Commun. 98 (2017) 3163–3176, https://doi. product lifecycle management, J. Clean. Prod. 159 (2017) 229–240, https://doi.
org/10.1007/s11277-017-4151-1. org/10.1016/j.jclepro.2017.04.172.
[138] W. Lee, J. Cho, L. Lee, S. Korea, Time series abnormal data detection for smart [164] M. Naeem, N. Moalla, Y. Ouzrout, A. Bouaras, An ontology based digital
19
preservation system for enterprise collaboration, 2014 IEEE/ACS 11th Int. Conf. standardization roadmap industry 4.0, (2016) 523.
Comput. Syst. Appl. 2014, pp. 691–698, , https://doi.org/10.1109/AICCSA.2014. [187] W. and S. Kagermann, H., Riemensperger, F., Hoke, D., Helbig, J., Stocksmeier, D.,
7073267. Wahlster, Recommendations for the strategic initiative coordination and editing,
[165] Q.P. He, J. Wang, D. Shah, N. Vahdat, Statistical process monitoring for IoT-en- 2015.
abled cybermanufacturing: Opportunities and challenges, IFAC-PapersOnLine 50 [188] NIST, Framework for improving critical infrastructure cybersecurity, 2018. doi:10.
(2017) 14946–14951, https://doi.org/10.1016/j.ifacol.2017.08.2546. 1109/JPROC.2011.2165269.
[166] F. Tao, J. Cheng, Q. Qi, IIHub: An industrial internet-of-things hub toward smart [189] P. Wang, R.X. Gao, Z. Fan, Cloud computing for cloud manufacturing: Benefits and
manufacturing based on cyber-physical system, IEEE Trans. Ind. Informatics. 14 limitations, J. Manuf. Sci. Eng. 137 (2015) 044002, , https://doi.org/10.1115/1.
2018, pp. 2271–2280, , https://doi.org/10.1109/TII.2017.2759178. 4030209.
[167] S. Ramírez-Gallego, A. Fernández, S. García, M. Chen, F. Herrera, Big Data, [190] H. Fang, Managing data lakes in Big Data era, Cyber Technol. Autom. Control.
Tutorial and guidelines on information and process fusion for analytics algorithms Intell. Syst. (CYBER), 2015 IEEE Int. Conf. 2015, pp. 820–824.
with MapReduce, Inf. Fusion. 42 (2018) 51–61, https://doi.org/10.1016/j.inffus. [191] D. Bernstein, Containers and cloud: From LXC to docker to kubernetes, IEEE Cloud
2017.10.001. Comput 1 (2014) 81–84, https://doi.org/10.1109/MCC.2014.51.
[168] S. Michael, SQL databases v. NoSQL databases, Commun. ACM. 53 (2010) 10–11, [192] K. Stouffer, J. Falco, K. Kent, Guide to supervisory control and data acquisition
https://doi.org/10.1145/1721654.1721659. (SCADA) and industrial control systems security, NIST Spec. Publ. SP800-82,
[169] I. Kovačević, I. Mekterović, Alternative business intelligence engines, Inf. 2006, pp. 800–882, , https://doi.org/10.6028/NIST.SP.800.82.
Commun. Technol. Electron. Microelectron. 2017, pp. 1617–1622, , https://doi. [193] S. Nazir, S. Patel, D. Patel, Assessing and augmenting SCADA cyber security: A
org/10.23919/MIPRO.2017.7973638. survey of techniques, Comput. Secur. 70 (2017) 436–454, https://doi.org/10.
[170] P. Kannan, Beyond Hadoop Mapreduce Apache Tez and Apache Spark, (2015). 1016/j.cose.2017.06.010.
[171] S. Chintapalli, D. Dagit, B. Evans, R. Farivar, T. Graves, M. Holderbaugh, Z. Liu, [194] H. Shih, W, Ludwig, The biggest challenges of data-driven manufacturing, Harv.
K. Nusbaum, K. Patil, B.J. Peng, P. Poulosky, Benchmarking streaming computa- Bus. Rev. (2016), https://hbr.org/2016/05/the-biggest-challenges-of-data-driven-
tion engines: Storm, flink and spark streaming, Proc. - 2016 IEEE 30th Int. Parallel manufacturing accessed June 6, 2019.
Distrib. Process. Symp. IPDPS 2016, 2016, pp. 1789–1792, , https://doi.org/10. [195] Allae Erraissi, Abdessamad Belangour, Abderrahim Tragha, Digging into Hadoop-
1109/IPDPSW.2016.138. based Big Data architectures, Int. J. Comput. Sci. Issues IJCSI 14 (2017) 52–59,
[172] D. Abadi, R. Agrawal, A. Ailamaki, M. Balazinska, P.A. Bernstein, M.J. Carey, https://doi.org/10.20943/01201706.5259.
S. Chaudhuri, J. Dean, A. Doan, M.J. Franklin, J. Gehrke, The Beckman report on [196] H. Fleischmann, S. Spreng, J. Kohl, D. Kisskalt, J. Franke, Distributed condition
database research, Commun. ACM. 59 (2016) 92–99. monitoring systems in electric drives manufacturing, 2016 6th Int. Electr. Drives
[173] S.V. Ranawade, Online analytical processing on hadoop Hadoop using Apache Prod. Conf. EDPC 2016 - Proc. 2016, pp. 52–57, , https://doi.org/10.1109/EDPC.
Kylin, 12 (2017) 1–5. 2016.7851314.
[174] M. Islam, A.K. Huang, M. Battisha, M. Chiang, S. Srinivasan, C. Peters, [197] J. Manyika, M. Chui, J. Bughin, R. Dobbs, P. Bisson, Marrs, Disruptive technolo-
A. Neumann, A. Abdelnur, Oozie: towards a scalable workflow management gies: Advances that Will Transform Life, Business, and the Global Economy,
system for Hadoop, Proc. 1st ACM SIGMOD Work. Scalable Work. Exec. Engines McKinsey Glob. Insitute, 2013, p. 163 http://www.mckinsey.com/insights/
Technol. - SWEET ’12, 2012, pp. 1–10, , https://doi.org/10.1145/2443416. business_technology/disruptive_technologies%5Cnhttp://www.chrysalixevc.com/
2443420. pdfs/mckinsey_may2013.pdf.
[175] K. Sundaravarathan, P. Martin, D. Rope, M. McRoberts, C. Statchuk, MEWSE: [198] A. Burger, H. Koziolek, J. Rückert, M. Platenius-Mohr, G. Stomberg, Bottleneck
multi-engine workflow submission and execution on Apache YARN, Proc. 26th identification and performance modeling of OPC UA opc ua communication
Annu. Int. Conf. Comput. Sci. Softw. Eng. 2016, pp. 194–200 http://dl.acm.org/ models, (2019) 231–242. doi:10.1145/3297663.3309670.
citation.cfm?id=3049877.3049897. [199] Deutsche kommission elektrotechnik DKE; din DIN e.V., german German stan-
[176] I. Suriarachchi, B. Plale, Crossing analytics systems : a case for integrated prove- dardization roadmap industry 4.0, 2016.
nance in data lakes, 2016 IEEE 12th Int. Conf. e-Science Crossing, Baltimore, MD, [200] D. Loshin, Data integration, Bus. Intell. (2013) 189–210, https://doi.org/10.1016/
USA, IEEE, 2016, pp. 349–354, , https://doi.org/10.1109/eScience.2016. B978-0-12-385889-4.00013-2.
7870919. [201] J.M. Gutierrez-Guerrero, J.A. Holgado-Terriza, IMMAS an industrial meta-model
[177] M.O. Gökalp, K. Kayabay, M. Zaki, A. Koçyiğit, P.E. Eren, A. Neely, Big-Data data for automation system using OPC UA, Elektron, Ir Elektrotechnika 23 (2017) 3–11,
analytics architecture for businesses : a comprehensive review on new open-source https://doi.org/10.5755/j01.eie.23.3.18324.
big-data tools, (2017). [202] Y.H. Wu, S. De Wang, L.J. Chen, C.J. Yu, Streaming analytics processing in
[178] B. Stein, A. Morrison, The enterprise data lake: Better integration and deeper manufacturing performance monitoring and prediction, Proc. - 2017 IEEE Int.
analytics, PWC Technol. Forecast Rethink. Integr. 2014 http://www.pwc.com/us/ Conf. Big Data, Big Data 2017, 2018, pp. 3285–3288, , https://doi.org/10.1109/
en/technology-forecast/2014/cloud-computing/assets/pdf/pwc-technology- BigData.2017.8258312 2018-January.
forecast-data-lakes.pdf. [203] A. Stojadinović, N. Stojanović, L. Stojanović, Dynamic monitoring for improving
[179] N. Miloslavskaya, A. Tolstoy, Big Data, fast data and data lake concepts, Procedia worker safety at the workplace, Proc. 9th ACM Int. Conf. Distrib. Event-Based Syst.
Comput. Sci. 88 (2016) 300–305, https://doi.org/10.1016/j.procs.2016.07.439. - DEBS ’15, 2015, pp. 205–216, , https://doi.org/10.1145/2675743.2771881.
[180] R. Bose, V. Sugumaran, Application of knowledge management technology in [204] S. Scholze, K. Nagorny, R. Siafaka, K. Krone, An approach for cloud-based situa-
customer relationship management, 10 (2003) 3–17. doi:10.1002/kpm.163. tional analysis for factories providing real-time reconfiguration services, in: 2017:
[181] B.R. Ferrer, W.M. Mohammed, J.L.M. Lastra, A solution for processing supply pp. 118–127. doi:10.1007/978-3-319-65151-4.
chain events within ontology-based descriptions, IECON Proc. (Industrial Electron. [205] L. Angrisani, G. Ianniello, I. Elettrica, N. Federico, Cloud based system for mea-
Conf. 2016, pp. 4877–4883, , https://doi.org/10.1109/IECON.2016.7793020. surement data management in large scale electronic production, Euro Med Telco
[182] M. Mercier, D. Glesser, Y. Georgiou, O. Richard, Big Data and HPC collocation : Conference (EMTC), IEEE, 2014, pp. 1–4.
using HPC idle resources for Big Data analytics, Big Data (Big Data), 2017 IEEE Int. [206] G. Hesse, B. Reissaus, C. Matthies, M. Lorenz, M. Kraus, M. Uflacker, Senska –
Conf. 2017, pp. 347–352. Towards an enterprise streaming benchmark, 2018. doi:10.1007/978-3-319-
[183] G. Adamson, L. Wang, M. Holm, P. Moore, Cloud manufacturing – a critical review 72401-0_3.
of recent development and future trends, Int. J. Comput. Integr. Manuf. (2017) [207] L. Banica, A. Hagiu, 4 - Using big data analytics to improve decision-making in
1–34, https://doi.org/10.1080/0951192X.2015.1031704. apparel supply chains A2 - Choi, Tsan-Ming BT - information systems for the
[184] J. Lee, H.D. Ardakani, S. Yang, B. Bagheri, Industrial Big Data analytics and cyber- fashion and apparel industry, Woodhead Publ. Ser. Text. Woodhead Publishing,
physical systems for future maintenance & service innovation, Procedia CIRP 38 2016, pp. 63–95 https://doi.org/10.1016/B978-0-08-100571-2.00004-X.
(2015) 3–7, https://doi.org/10.1016/j.procir.2015.08.026. [208] S. Saeidlou, M. Saadat, E.A. Sharifi, D. Guiovanni, An ontology-based intelligent
[185] E. Knill, Quantum computing with realistically noisy devices, Nature 434 (2005) data query system in manufacturing networks, Prod. Manuf. Res. 3277 (2017)
39–44, https://doi.org/10.1038/nature03350. 1–18, https://doi.org/10.1080/21693277.2017.1374887.
[186] Deutsche kommission elektrotechnik DKE; din DIN e.V., german German
20

