Abstract: In today's e-Business environment, ERP and CRM systems, collaboration tools, and networked sensors act as data-generating resources. Business Intelligence (BI) is a term that covers a range of analytical and decision support applications in business, including data mining, decision support systems, knowledge management systems, and online analytical processing. Processing data within these systems produces new data that grow rapidly, which limits how well they can be managed by a Relational Database Management System (RDBMS) or by statistical tools. Collectively, these structured and unstructured data are referred to as Big Data. Successful and efficient handling of Big Data requires deploying specific IT infrastructure components as well as adopting an emerging service model. In this research we introduce a conceptual model that abstracts the processing scheme of the big data processing lifecycle. The model addresses the main phases of the lifecycle: data acquisition, data serialization, data aggregation, data analysis, data mining, knowledge representation, and information dissemination. The model is driven by projecting Service Oriented Architecture attributes onto the building block of the lifecycle and adhering to the Lifecycle Modeling Language specification.

... the use of significant horizontal scaling (more nodes) for efficient processing.

II. DATA TYPES, SOURCES AND PROPERTIES

Data are classified as structured (relational data model), semi-structured (data model), or unstructured (no data model).

Big data are generated by executing business and personal data transformation processes. These data may be classified as transaction data, web text resources, log data (also known as machine data), events, e-mail, social media, sensor data, external feeds and live streams, RFID scans, form text, explicit geographic positioning information known as geospatial data, audio, still images, and videos.
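The structured, semi-structured, and unstructured classification of Section II can be illustrated with a small sketch. It is only a rough heuristic under stated assumptions, not a rule taken from this paper: a Python tuple stands in for a row under a relational schema, a string that parses as a JSON object or array stands in for semi-structured data, and everything else is treated as unstructured. The names `DataKind` and `classify` are hypothetical.

```python
import json
from enum import Enum


class DataKind(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


def classify(record):
    """Assign a record to one of the three data types (illustrative heuristic)."""
    # Assumption: a tuple represents a row that conforms to a relational schema.
    if isinstance(record, tuple):
        return DataKind.STRUCTURED
    if isinstance(record, str):
        try:
            parsed = json.loads(record)
        except ValueError:
            # Not parseable as JSON: treat as free text with no data model.
            return DataKind.UNSTRUCTURED
        # Assumption: JSON objects and arrays count as semi-structured.
        if isinstance(parsed, (dict, list)):
            return DataKind.SEMI_STRUCTURED
    # Raw bytes (audio, images, video) and bare scalars fall through to here.
    return DataKind.UNSTRUCTURED
```

For example, `classify((1001, "2014-03-01", 49.90))` models a transaction row, `classify('{"event": "login"}')` a log event, and `classify(b"\x89PNG")` a still image. A production system would more likely rely on declared schemas or content types than on inspection of each record.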
Fig. 2. A Context-level Data Flow Diagram (DFD) for the Big Data Processing System.
Fig. 3. The main building block of Big Data Processing Lifecycle.
Fig. 5. SOA Big Data Processing Conceptual Model.
Fig. 6. Level-0 Data Flow Diagram (DFD) for the Big Data Processing System.
Based on the context diagram, a logical Level-0 DFD is derived by exploding the context diagram (Fig. 6). The model illustrates how the main processes interact with the main entities of the system.

V. THE LIFECYCLE MODELING LANGUAGE

Lifecycle Modeling Language (LML) [5] is a modeling language based on an entity, relationship, and attribute meta-data model [5]. Entities in LML are represented as actors. LML is considered an extension to SysML [6]. Its advantages include support for capturing and tracing information throughout the lifecycle. LML supports a Documentation Model, a Functional Model (modeling actions and input/output processes), a Physical Model (modeling assets, resources, and connections), and Parametric Entities (addressing parametric entities such as measures, risk, and location). Functional models can be easily deduced from logical and physical DFDs. Lifecycles are defined by drafting the Action Diagram, the Asset Diagram, and the Spider Diagram; some other optional diagrams are also supported.

CONCLUSION AND FUTURE WORK

Formalizing the Big Data Processing Lifecycle is a complicated process. In this research, we have identified the main building blocks of the processing lifecycle and abstracted the main entities of the system. A context diagram and a level-0 data flow diagram are presented. We have used the Lifecycle Modeling Language to draft a big data processing lifecycle model. These models should give data scientists the knowledge they need to understand big data processing requirements.

Future work will cover generating a level-1 diagram by exploding the level-0 diagram and decomposing each of its processes into a set of sub-processes. A formal requirements engineering methodology will be evaluated and selected, and processes will be defined based on the requirements analysis process. A physical DFD will also be generated based on an implementation of a big data processing platform. Additionally, a system lifecycle model will be designed based on the logical and physical DFDs in conformance with the Lifecycle Modeling Language.
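As a supplementary illustration of the entity, relationship, and attribute meta-model described in Section V, the seven lifecycle phases named in the abstract can be sketched as entities linked by relationships. This is a minimal sketch under assumptions: the class names, the "Action" label, the `order` attribute, and the "precedes" relationship are hypothetical choices made for illustration and are not taken from the LML specification.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # hypothetical label, e.g. "Action" or "Asset"
    attributes: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    verb: str  # hypothetical relationship name
    target: str


# Lifecycle phases as listed in the abstract.
PHASES = [
    "data acquisition",
    "data serialization",
    "data aggregation",
    "data analysis",
    "data mining",
    "knowledge representation",
    "information dissemination",
]


def build_lifecycle(phases):
    """Model each phase as an entity and chain consecutive phases."""
    entities = [
        Entity(name=phase, kind="Action", attributes={"order": position})
        for position, phase in enumerate(phases, start=1)
    ]
    relationships = [
        Relationship(source=earlier, verb="precedes", target=later)
        for earlier, later in zip(phases, phases[1:])
    ]
    return entities, relationships


entities, relationships = build_lifecycle(PHASES)
```

A linear chain is the simplest possible arrangement. A fuller model would also record the data stores and external entities shown in the DFDs, along with any feedback between phases.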
REFERENCES
[1] Marakas O’Brien, Introduction to Information Systems, 6th Edition, McGraw-Hill, 2013.
[2] Raul F. Chong, Clara Liu, DB2 Essentials - Understanding DB2 in a Big Data World, IBM Press, 2014.
[3] The National Institute of Standards and Technology’s Joint Cloud and Big Data Workshop, http://www.nist.gov/itl/cloud/cloudbdworkshop.cfm.
[4] “OASIS Reference Model for Service Oriented Architecture 1.0,” public review draft 2, 2006; www.oasis-open.org/committees/download.php/18486/pr-2changes.pdf.
[5] Lifecycle Modeling Language Specification, V.1, http://www.lifecyclemodeling.org/specification/.
[6] OMG SysML 1.3 Specification. http://www.omg.org/spec/SysML/1.3/.