Conference Dates
December 8-10, 2017
Conference Venue
Metropolitan College, Thessaloniki, Greece
ISBN: 978-1-941968-46-8
©2017 SDIWC
Published by
The Society of Digital Information and Wireless
Communications (SDIWC)
Wilmington, New Castle, DE 19801, USA
www.sdiwc.net
Table of Contents
Enterprise System Maturity – Past, Present and Future: A Case Study of Botswana .......... 6
Using Dense Subgraphs to Optimize Ego-centric Aggregate Queries in Graph Databases .......... 59
A Secure Method for the Global Medical Information in Cloud Storage based on the Encryption and Data Embedding .......... 68
Ontology-Based Data Mining Approach for Judo Technical Tactical Analysis .......... 90
The Agent-Based Model of The Dynamic Spectrum Access Networks with Network Switching Mechanism .......... 106
Proceedings of the Third International Conference on Computing Technology and Information Management (ICCTIM2017), Thessaloniki, Greece, 2017
Pasquina Campanella
Department of Computer Science
University of Bari “Aldo Moro”
via Orabona, 4 – 70126 Bari – Italy
pasqua13.cp@libero.it
ABSTRACT

Current trends in the digital age suggest that in the near future access tools will become more contextual. It follows that it no longer makes sense to identify an e-learning system with a single monolithic platform; rather, it should be seen as a set of interoperable components and subcomponents that rationally manage the various heterogeneous activities a training process can undergo. In this scenario cloud learning is born, "cloud formation", which combines the ability to draw on distributed resources with contextual information. The problems that arise in the era of distributed computing, where a workload is fragmented into an arbitrary number of sub-tasks distributed to an unknown number of heterogeneous machines spread around the world, are that there is no absolute certainty that network machines are always available (latency, unpredictable network crashes), so continuous monitoring is essential. In this context, the e-learning platform Docebo Cloud has been studied to analyze its different response times.

KEYWORDS

platform, cloud computing, services, architecture, test.

1 INTRODUCTION

Since the '90s, with Grid Computing and, today, with the evolution of technologies and of the ways users employ them, we have been witnessing the proliferation of interaction between computing systems for computational cooperation, which moves the classical view of ICT towards large datacenters located across the territory [3], [6], [9], [22]. Hence, the rise of web 2.0 and of content sharing and publishing services has given users advanced services without having to resort to classical management of local resources [4], [16], [20], [25]. These advances in the performance of digital components have resulted in a huge increase in the scope of IT environments and, consequently, the need to manage them uniformly in a single "cloud" was born [6], [26], [30]. The need for such environments is particularly felt in the exponential growth of network-connected equipment and real-time data streaming processes, as well as in the spread of service-oriented architectures and applications and of collaborative and research projects [3], [19]. Cloud architecture has been the best candidate for solving some of the problems generated by large-scale data processing for many computer giants [27], [29]. In this context, a new hybrid model of resource utilization offered by computer networks emerged, which was named Cloud Computing [20], [28], [29]. Cloud computing is therefore a new approach to the provision of continuous ICT resources that enables easy on-demand network access to a configurable pool of computational resources [1], [7], [17], [19]. Although the cloud landscape is still extremely young, in recent years it has become increasingly important in Information and Communication Technology (ICT) and is the new technology that will enable the entire education system to change in the near future, with high-tech e-learning services that bring high economic savings [10], [18], [19]. This prospect leads to the development of C-Learning (Cloud-learning) and CMobile-Learning (CloudMobile-Learning), where users will have access to data shared in the cloud on request. The expression "cloud learning" can be translated as "cloud formation", pointing to a virtual space where you can store, share and consult training data (work documents, training formats, meeting records) on a remote server [6], [23]. There are currently several suppliers on the market that have enhanced their data center hosting applications in the cloud: from giants such as Amazon, IBM, Google, Microsoft and Sun Microsystems to cloud offerings from smaller companies such as GoGrid [24]. Applications can be accessed through a browser from any device connected to the network (PC, notebook, tablet, cell phone) (Fig.1). In a cloud computing environment three distinct actors are configured [2], [5], [12], [15], [17]:

Infrastructure Provider: provides platforms by delivering services (storage, applications, computing capabilities), generally following the pay-per-use model;

Service Provider / Cloud User: chooses and configures the services offered by the vendor, implements a service that uses the resources provided by the infrastructure provider, and offers it to the end user;

Final Client: uses the services configured by the service provider. In certain cases the administrator and the end customer may coincide.

Figure 1. Representation of Cloud connections

Cloud computing, therefore, is a new way of conceiving the supply and use of IT services based on the convergence of three key elements [8], [15], [17], [27]: utility computing; virtualization of computing resources; Software as a Service.

Cloud computing defines several service delivery models, the main ones being [6], [12], [17]:

Software as a Service (SaaS) – the provider's applications are accessible to the consumer from various client devices through a thin client interface (e.g. Gmail, Google Docs, salesforce.com CRM solutions, Zoho Docs);

Platform as a Service (PaaS) – consumers have control over deployed applications and hosting environment configurations (e.g. Google App Engine, Force.com);

Infrastructure as a Service (IaaS) – consumers can, depending on their needs, provision storage, processing and network-based resources, e.g. Amazon S3 (Simple Storage Service), Amazon EC2 (Elastic Compute Cloud), GoGrid.

This article discusses the Docebo Cloud platform in the first section, then the experimental results obtained at various learning times, and finally conclusions and future developments.

2 DOCEBO CLOUD PLATFORM

The spread of virtualization and cloud computing technologies, coupled with an increasing need to cut application and system management costs in the IT world, led to the spread of on-demand IT service delivery policies, allowing the diffusion of new models, or the extension of existing ones, for software distribution and access to software applications [11], [14], [21], [27]. Hence software, platforms and infrastructures are made available as services, and these services can be considered core components for the development of cloud computing [6], [24], [29]. On this basis, the Docebo Cloud platform, an e-learning platform as a service designed to allow teachers and educators to create and manage online courses with ample interaction possibilities, is studied here. Docebo was born as an evolution of Spaghetti Learning, an LMS developed in 2003 by the same team of developers [7], [13]. Today it
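The abstract states that Docebo Cloud was studied to analyze different response times, but this excerpt does not include the measurement procedure. A minimal Python sketch of how per-request response times could be collected and summarized; the `probe` callable, sample count and reported statistics are illustrative assumptions, not the authors' protocol:

```python
import statistics
import time

def measure_response_times(probe, samples=5):
    """Call `probe()` repeatedly and record the elapsed wall-clock
    time of each call, returning per-call latencies in milliseconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        probe()  # e.g. an HTTP GET against a platform page
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def summarize(latencies):
    """Reduce raw latencies to the figures typically reported:
    minimum, mean and maximum response time."""
    return {
        "min_ms": min(latencies),
        "mean_ms": statistics.mean(latencies),
        "max_ms": max(latencies),
    }
```

In a real test, `probe` would issue an actual request against the platform, and the measurement would be repeated at different times of day to capture the variability the abstract attributes to network latency and crashes.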
[5] P. Campanella, "Cloud Computing: un nuovo paradigma", Atti VIII convegno nazionale Sie-L, Connessi! Scenari di innovazione nella formazione e nella comunicazione, 14-16/09/2011, Ledizioni, Reggio Emilia, Italy.

[6] P. Campanella, "Cloud E-learning: un nuovo binomio", Atti Didamatica 2016 – Innovazione: sfida comune di scuola, università, ricerca e impresa, 30° edizione AICA, 19-21/04/2016, pp. 128-137, Udine, Italy.

[7] P. Campanella, "Method of experimental evaluation of ICT in teaching", Atti del convegno ELearn 2011, World Conference on E-Learning in Corporate, Government, Healthcare and Higher Education, organized by AACE, Honolulu, Hawaii, USA, 17-21/10/2011.

[8] D. Chandran, S. Kempegowda, "Hybrid E-learning Platform based on Cloud Architecture Model: a proposal", Proc. International Conference on Signal and Image Processing (ICSIP), 2010, pp. 534-537, IEEE.

[17] A. Manzalini, C. Moiso, E. Morandin, "Cloud computing: stato dell'arte e opportunità", Notiziario Tecnico Telecom Italia, n. 2, 2009.

[18] M. Miller, "Cloud Computing: Web-Based Applications that Change the Way you Work and Collaborate Online", 2008.

[19] E. Martins Morgado, "An exploratory essay on Cloud Computing and its Impact on the use of Information and Communication Technologies in Education", in Education in a Technological World: Communicating Current and Emerging Research and Technological Efforts, ed. Mendez-Vilas, Formatex, 2011.

[20] S. Ouf, M. Nasr, Y. Helmy, "An enhanced e-learning eco system based on an integration between cloud computing and Web 2.0", Proc. of the 10th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 48-55, 15-18/12/2010.

[21] P. Pocatilu, F. Alecu, M. Vetrici, "Using Cloud Computing for Elearning Systems", Proc. 8th
that was referred to as the data processing era by Ward and Peppard [1]. After this era, a number of critical ICT systems were implemented in Botswana government departments. The details are described using the government National Development Plan (NDP). Since independence in 1966, Botswana's development process has been guided by successive National Development Plans (NDPs). These have provided a medium-term planning and budgeting framework (typically 5-6 years) for capital and recurrent expenditure, and have been a key feature of Botswana's system of development management. The plans outline the government's development priorities for the plan period as well as the policies, programmes and projects required to achieve those priorities [19].

The information system development over the years is shown in Table 4, following the NDP plan.

Table 4: Information Technology Development According to the National Development Plan

NDP Phase | What was achieved
NDP 7 (1991/92-1996/97) | Voters' registers, a new payroll system and vehicle registration [13]
NDP 8 (1997-2002) | National Registration System, Payroll System, and Computerised Personnel Management System (CPMS) [13]
 | Livestock Identification and Trace-back System (LITS), Automated System for Customs Data (ASYCUDA), Vehicle Registration System [20,21]
 | Department of Supplies, Government Data Network I, Tax Payer Management System I, and Trade Statistics [13]
 | Water Utilities Corporation implemented the Master Information Systems Plan (MISP); Teacher Management System (TMS) and the Student Selection System (SSS) [13]
 | Automation of some post office counters, and the installation of a Performance Management System by Botswana Post [13]
 | The Ministry of Trade, Industry, Wildlife and Tourism (MTIWAT) developed the Ministry's website and automated the office of Wildlife and National Parks [13]
NDP 9 (2003/4-2008/09) | MTIWAT: automate the process of company registration and business names, trade and industrial licences and tourism information management; develop document management and workflow management processes; review and develop all Department of Wildlife and National Parks systems [13]
 | Ministry of Local Government (MLG): Tribal Land Administration System for land use planning and management, installation of a Human Resources and Payroll package in all the Local Authorities, Financial Management Computer System, Project Management, Document Management System, a website, a database for recording tribal ceremonies, and Social Benefits System [13]
 | Ministry of Works, Transport and Communications (MWTC): development of communications network infrastructure; a comprehensive review and computerisation of the Road Transport Permits system, earmarked to be executed as an integral part of the Vehicle Registration and Licensing [13]
Board (PPADB), the Botswana government announced that Korea IT Consulting had been selected as the winner of the "Professional Services for the Development of an e-Government Service Oriented Enterprise Architecture" project, initiated by the Ministry of Transport and Communications of Botswana [23].

Integration seems to be a never-ending process, because the enterprise continues to evolve in its ever-changing environment as a result of adaptation to external forces, advances in technology, emerging business models, new regulations and/or optimisation of internal solutions, making what is today a fully integrated system the partly integrated system of tomorrow. In the present state, the enterprise architecture practice is delivering value.

The country is now in a phase of defining integration, standards, etc., and how all this comes together within the various sectors of the nation. This is being done through the development of a Government Enterprise Architecture [24].

NDP 11 (2017-2023) hopes to achieve what is highlighted in the e-Government master plan [25]. The fifteen areas of focus are: Upgrade e-Government Strategy; Advancement of e-Governance; Project Evaluation System; e-Document System; Network Optimization; e-Education System; National Health Information System; Business Activity Support System; e-Procurement System; Local Government Informatization; Civil Affairs Single Portal; Administrative Information Sharing Centre; Job Portal; e-Agriculture; and Government Enterprise Architecture (GEA) [25].

Looking at this well laid out government plan, if all goes as planned it can be predicted that the end of NDP 11 should lead to a Ubiquitous state (level 5), where enterprise architecture success has a trickle-down effect across the nation. At that point, Enterprise Architecture becomes a natural way of working and the principles behind the discipline are widely adopted. This is depicted in Figure 1.

6. CONCLUSION

This paper reveals significant progress made by the Botswana Government in attaining maturity. The achievements of the Botswana government have improved service delivery and increased accessibility to government services. One lesson to be learnt from the Botswana government approach is the incorporation of major ICT projects into its national development plans, the phased implementation of the ICT projects, and the government's commitment to achieving the strategies developed. This approach has enabled a great deal of achievement. National development plans, among others, particularly serve as major statements of the government's development policies and strategies.

Enterprise plans in many countries are not yet very clear; however, Botswana has taken a step forward in planning its national architecture. It is therefore recommended that other countries in Africa emulate the example of the Botswana government by embarking on a strategised plan of an integrated enterprise. The planning and implementation approaches adopted (i.e. incorporation into national development plans and phased implementation) are particularly worthy of emulation. This explicitly shows how the development agenda of a country, if followed through, can impact positively on the country's development.

Defining a maturity model is not really where a nation will find its value; rather, the knowledge gained from the current position and the insight drawn from the model give the model its worth. This information can be used to influence development roadmaps for enterprise architecture practices.
REFERENCES

[1] J. Ward and J. Peppard, Strategic Planning for Information Systems, 3rd ed. UK: John Wiley, 2002.

[2] P. Johnson, R. Lagerström, and M. Ekstedt, IT Management with Enterprise Architecture. ePub, www.ics.kth.se/MAP.pdf, 2012.

[3] The Open Group (2015) A Historical Look at Enterprise Architecture with John Zachman. [Online]. https://blog.opengroup.org/2015/01/23/a-historical-look-at-enterprise-architecture-with-john-zachman/

[4] R. Sessions (2007, May) A Comparison of the Top Four Enterprise-Architecture Methodologies. [Online]. https://msdn.microsoft.com/en-us/library/bb466232.aspx

[5] D. Chen, G. Doumeingts, and F. Vernadat, "Architectures for enterprise integration and interoperability: Past, present and future," Computers in Industry, vol. 59, pp. 647-647, 2008.

[6] W. Engelsman, D. Quartel, H. Jonkers, and M. van Sinderen, "Extending enterprise architecture modelling with business goals and requirements," Enterprise Information Systems, vol. 5, no. 1, pp. 9-36, 2011.

[7] M. Lankhorst, Enterprise Architecture at Work: Modelling, Communication, and Analysis. New York: Springer-Verlag, 2005.

[8] O. Noran, A Meta-Methodology for Collaborative Networked Organisations. Brisbane: School of Computing and Information Technology, Griffith University, 2005.
Pasquina Campanella
Department of Computer Science
University of Bari “Aldo Moro”
via Orabona, 4 – 70126 Bari – Italy
pasqua13.cp@libero.it
ABSTRACT

In the face of the continuous development of the internet, training resources on the network have grown informally. For this reason, the need for tools to extract their knowledge is constantly widening. Starting from the evolution created by the social web, this article explores the potential of current technology tools, with particular reference to the features of e-learning platforms, and examines proprietary solutions with regard to content delivery modes, user-based monitoring tests, and evaluation techniques, in order to better manage interactive online courses, which make the web user active in the production process. In this direction, what follows concerns so-called lifelong learning.

KEYWORDS

monitoring, platforms, performance, collaborative learning, features.

1 INTRODUCTION

The panorama of FAD (distance learning) platforms has seen a continuous evolution over the years. The term "platform" means the technological infrastructure that allows e-learning activities or online course management, integrating teaching modules and evaluations within learning groups [1], [10], [11], [12]. In order to promote the use of more advanced and interactive platforms, an analysis of proprietary solutions is proposed that would serve as a useful contribution to the development of different forms of collaborative learning, which require new capabilities for the integrated management of the formative components of social networks. The analysis was motivated by the fact that the literature only partially allows an objective evaluation of platforms and of how they support learning processes, considering their peculiarities, needs and problems [1], [10], [11], [14]. From the early 1990s, models for evaluating learning management systems have been developed [4], [7], such as the Commonwealth of Learning model, which examines various features including usability, accessibility, collaborative functionality, manualization, installation, technical support, standard compliance, interoperability, content reusability and tracking [15]. Below are the different sections on the proprietary platforms analyzed, with their respective studies and simulations as well as evaluation sessions, and finally conclusions and future developments.

2 PROPRIETARY PLATFORMS

Some of the major proprietary platforms are listed in different ways, considering the sharing, participation and collaboration of web 2.0, and in particular blog, feedback, chat, forum, podcasting and wiki features (Tab.1) [2], [4], [9], [13]:

Table 1 - Proprietary platforms

In particular, the study was aimed at communication between learning objects, tracking activities and the results obtained: on-line questionnaires and forum interventions to highlight the different polarities of expression [2], [13]. In delivering the courses, the
Centra
Web-based collaborative platform with features such as web conferencing, virtual classroom, web seminars and net meetings [2], [4], [12]. The performance evaluation of 100 students (ages 18 to 30) in the community reported the duration of courses and course management with a 54% average, and quality of lessons, exercises and tests averaging 56% (Fig.2). Communication issues have improved in videoconferencing.

Figure 3. Monitoring Elluminate Live

e/pop
E-learning tool for content sharing, multiplatform (Windows, Mac OS) [3], [4], [12]. The performance evaluation of a contingent of 100 (aged between 20 and 30 years) in the community reported the duration of courses and quality of distributed material with an average of 35%, and quality of lessons, exercises and tests with an average of 45% (Fig.4). The monitoring was balanced in the various tests conducted.

lessons, exercises and tests with an average of 48% (Fig.6). Well balanced test monitoring.

Groove
E-learning tool for collaborative learning, with synchronous and asynchronous solutions [4], [6], [14]. The performance assessment of 100 students (18 to 30 years old) in the community reported 35% for the lifetime of courses, 45% for distributed media, and quality of lessons, exercises and tests with an average of 60% (Fig.5). Monitoring was balanced in the various tests.

LearnLinc
E-learning tool for collaborative learning, with synchronous and asynchronous solutions [4], [8], [12]. The performance evaluation of 100 degree students (between 20 and 25 years old) in the community reported a 60% lifetime of courses, 40% distributed material quality, and quality of lessons, exercises and tests with an average of 50% (Fig.7). Monitoring was balanced in the various tests conducted.

Standalone e-learning platform for synchronous and asynchronous learning [3], [5], [12]. The performance evaluation of 100 students (between 20 and 30 years old) in the community averaged 40% for course time, 50% for distributed material quality, and 43% for quality of lessons, exercises and tests (Fig.12). Quite balanced monitoring in the various tests conducted.

The platforms that reported the best results for the duration of courses are Saba Learning Enterprise and LearnLinc; for distributed material quality, WebCT, Saba Learning Enterprise, NetLearning and HotConference; and for quality of the theoretical lectures transmitted and of the exercises performed, Groove and Lotus Learning Space.

3 CONCLUSIONS and FUTURE DEVELOPMENTS
Figure 13. Monitoring WebConference

REFERENCES
[1] M. Banzato, D. Corcione, Piattaforme per la didattica in rete, TD-Tecnologie Didattiche, n. 33, 2004, pp. 22-31, Edizioni Menabò, Ortona.

[2] P. Campanella, Piattaforme per l'uso integrato di risorse formative nei processi di e-learning, Atti Didamatica 2015 – Studio ergo lavoro – dalla società della conoscenza alla società delle competenze, AICA, 15-17/04/2015, Genova, Italy.

[11] D. Colombo, Formazione a distanza, ambienti e piattaforme telematiche a confronto, 2001.

[12] D. F. Garcia, C. Uria, J. C. Granda, F. J. Suarez, F. Gonzalez, A functional evaluation of the commercial platforms and tools for synchronous distance e-learning, Proc. of the 3rd WSEAS/IASME International Conference on Educational Technologies, Arcachon, France, 2007, pp. 330-335.
Pasquina Campanella
Department of Computer Science
University of Bari “Aldo Moro”
via Orabona, 4 – 70126 Bari – Italy
pasqua13.cp@libero.it
ABSTRACT

Digital media technologies and broadband communications networks are undergoing profound transformations, and new trends in content development for training are emerging. In this scenario, with the rapid introduction of mobile devices, among which the smartphone prevails, it has become possible to meet the needs of users with a paradigm that involves the "learning in mobility" process. This article provides a case study of the mobile Oracle iLearning and Claroline platforms, considering their content delivery, user-based monitoring tests and evaluation techniques, in order to better manage interactive online courses that make the user an active web participant in the production process.

KEYWORDS

platforms, questionnaires, learning, monitoring, interoperability.

1 INTRODUCTION

In the last few years, there has been a large-scale distribution of mobile devices such as cell phones, handhelds, pocket PCs, ebooks, tablet PCs, smartphones, TV phones, iPods, iPads and other portable devices; personal communication devices are becoming suitable for displaying multimedia contents. A new communication tool, a new frontier for e-learning: mobile learning [1], [2], [3], [4], [5], [7], [8], [9], [10], [12], [13], [14], [18], [19], [21], [23], [26], [27], [28]. Users pass from being simple consumers to content creators, with content designed, modified, or simply shared [10], [11], [20], [22]. Today there is a strong demand for immediately usable, easily assimilable applications [16], [17]. In this context, a follow-up case study has been launched, as well as the reported experimental results and conclusions and future developments.

2 CASE STUDY

Within the framework of mobile learning, a study was launched that considers analysis as well as modular integration with Oracle iLearning and Claroline platform plug-ins on four different mobile operating systems, namely Android, iPhone OS, Symbian and Windows Mobile, in order to promote communication by means of services [2], [16], [17]. The mobile learning area is accessed through a specially created application that brings important features such as viewing content and more, tested on emulators and real-life devices. The prototype examined is the mobile Oracle iLearning and Claroline platform, of which the screenshots are shown (Fig.1):

Figure 1. Oracle iLearning - Claroline mobile platforms screening
type of device and produce flexible materials taken from different situations [4], [15], [17]. The critical issues are the small size of the screen, which does not allow for a large amount of content but only essential concepts, the difficulties of interoperability between the various devices, and connectivity that has been somewhat fragmented.

4 CONCLUSIONS and FUTURE DEVELOPMENTS

In conclusion, the rapid large-scale diffusion of mobile devices such as cell phones, handhelds, pocket PCs, ebooks, tablet PCs, smartphones, TV phones, iPods, iPads and other portable devices brings new trends in developing content for training, the so-called "learning in mobility". A new frontier for e-learning: mobile learning. In this context, a study was conducted among students on the mobile Oracle and Claroline platforms, tested on four different mobile operating systems (Android, iPhone OS, Symbian, Windows Mobile), considering their content delivery, user-based monitoring tests and evaluation, in order to better manage interactive online courses, which make the web user active in the production process. The results obtained were positive in terms of satisfaction, acquisition of knowledge and performance variations among the participants. Ultimately, it is crucial that the learner has access to a flexible learning strategy and that all teaching resources are available at any time and on different types of support. The minor criticisms relate to interoperability between the different devices, and further studies are being carried out. Ultimately, we can say that m-learning is aimed at bridging the emerging needs of digital natives and training outcomes.

REFERENCES

[1] B. Alexander, "Going Nomadic: Mobile Learning in Higher Education", Educause Review, vol. 39, n. 5, 2004, pp. 28-35.

[2] M. Alier, J. Casany, P. Casado, "Mobile extension of a Web based Moodle Virtual Classroom", in P. Cunningham, M. Cunningham (ed.), Expanding the Knowledge Economy: Issues, Applications, Case Studies, vol. 4, 2007, pp. 1169-1176, IOS Press.

[3] S. Al-khamayseh, A. Zmijewska, E. Lawrence, G. Culjak, "Mobile Learning Systems for Digital Natives", Proc. of 6th IASTED International Conference on Web Based Education, 2007, pp. 252-257, Chamonix, France.

[4] J. Attewell, "Mobile Technologies and Learning: a Technology Update and M-learning Project Summary", Learning and Skills Development Agency, United Kingdom, 2005.

[5] P. Campanella, "Mobile Learning: New forms of education", Proc. of 10th International Conference on Emerging e-Learning Technologies and Applications, ICETA 2012, IEEE, 08-09/11/2012, pp. 51-56, Stará Lesná, The High Tatras, Slovakia.

[6] P. Campanella, "Mobile learning application for Android using web service", in P. Resta (ed.), Proc. of Society for Information Technology & Teacher Education International Conference, SITE 2012, AACE, 05/03/2012, pp. 1677-1682, Austin, Texas, USA.

[7] Y. Y. Chan, S. C. Chan, C. H. Leung, A. K. W. Wu, "Mobilp: a mobile learning platform for enhancing lifewide learning", Proc. of the 3rd IEEE International Conference on Advanced Learning Technologies, Athens, Greece, pp. 457-457, 09-11/07/2003.

[8] S. J. Geddes, "Mobile learning in the 21st century: benefit for learners", Knowledge Tree e-Journal, vol. 30, n. 3, 2004, pp. 214-228.

[9] G. Guazzaroni, "Fare esperienze di apprendimento con tecnologie di mobile learning", giornata di studio sul mobile learning, Collaborative Knowledge Building Group (CKBG), Genova, 2010.

[10] J. Herrington, A. Herrington, J. Mantei, I. Olney, B. Ferry, "New technologies, new pedagogies: mobile learning in higher education", Faculty of Education, University of Wollongong, Australia, 2009.

[11] J. Herrington, J. Mantei, A. Herrington, I. W. Olney, B. Ferry, "New technologies, new pedagogies: mobile technologies and new ways of teaching and learning", in R. Atkinson & C. McBeath (eds), Proc. Ascilite 2008, pp. 419-427, Melbourne, Australia.

[12] D. Keegan, "Mobile learning - the next generation of learning", Proc. of the 18th Asian Association of Open Universities Annual Conference, Shanghai, China, 28-30/11/2004.

[13] J. Kossen, "Mobile e-learning: when e-learning becomes m-learning", Palmpower Magazine, 2005.
[14] A. Kukulska-Hulme, J. Traxler, "Mobile learning: a handbook for educators and trainers", vol. 8, n. 2, Routledge, London, 2005.

[15] S. Impedovo, IAPR Fellow, IEEE S. M., P. Campanella, "Mobile recommended system on Android platform", Proc. of the 18th International Conference on Distributed Multimedia Systems, DMS 2012, pp. 33-38, 09-11/08/2012, Eden Roc Renaissance Miami Beach, Florida, USA.

[16] S. Impedovo, IAPR Fellow, S. M. IEEE, P. Campanella, "Mobile Computing: sviluppo applicazione Voip su Symbian OS", Atti Didamatica 2012 - Informatica per la didattica, AICA, Taranto, 14-16/05/2012.

[27] K. Yordanova, "Mobile Learning and integration of advanced technologies in education", Proc. of the International Conference on Computer Systems and Technologies, CompSysTech'07, pp. 1-5, ACM, 14/06/2007.

[28] B. Zuga, I. Slaidins, A. Kapenieks, A. K. Strazds, "M-learning and mobile knowledge management: similarities and differences", International Journal of Computing & Information Sciences, vol. 4, n. 2, 2006, pp. 58-62.
The static multi-level images use several metrics of objective quality evaluation. In the field of steganography, these usually measure the distortion between the stego and cover objects. One of the most common is the PSNR, based on the mean squared error (MSE). Given a noise-free M×N monochrome image object I and its noisy approximation K, the MSE is defined as [8]:

MSE = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} [I(i,j) - K(i,j)]^2    (1)

PSNR = 20 \log_{10}(MAX_I) - 10 \log_{10}(MSE) \ [dB]    (2)

PSNR_{CSF} = 10 \log_{10} \frac{255^2}{\frac{1}{N_1 N_2} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} ((I - I_g) W_{CSF})^2} \ [dB]    (3)

The Discrete Wavelet Transform based on the Haar wavelet (HT) is the simplest useful energy compression process which can effectively serve very useful and fast object decomposition. The Haar transform, like all wavelet transforms, decomposes a discrete object into sublevels {a1 | d1} of half its length, where a1 = [a1, a2, ..., aN/2] represents the approximation (average) coefficients and d1 = [d1, d2, ..., dN/2] the detail (difference) coefficients. The first approximation coefficient a1 is computed by taking the average of the first pair of input values and then multiplying it by the square root of 2. Application of the 2D-HT to an object retrieves transformation coefficients that are defined as a decomposition of the input object, also known as the approximation component LL and the detail components LH, HL and HH [9]:
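The one-dimensional Haar step and the metrics of eqs. (1)-(2) can be sketched in a few lines (a stdlib-only illustration under our reading of the reconstructed formulas; the function names are ours, not the paper's):

```python
import math

def haar_step(x):
    # One level of the 1-D Haar transform: pairwise averages a1 and
    # differences d1, each half the input length; as described in the
    # text, the pair average is multiplied by sqrt(2).
    s = math.sqrt(2)
    a1 = [(x[2 * i] + x[2 * i + 1]) / 2 * s for i in range(len(x) // 2)]
    d1 = [(x[2 * i] - x[2 * i + 1]) / 2 * s for i in range(len(x) // 2)]
    return a1, d1

def mse(I, K):
    # Eq. (1): mean squared error between the M x N images I and K.
    M, N = len(I), len(I[0])
    return sum((I[i][j] - K[i][j]) ** 2
               for i in range(M) for j in range(N)) / (M * N)

def psnr(I, K, max_i=255):
    # Eq. (2): PSNR = 20 log10(MAX_I) - 10 log10(MSE), in dB.
    e = mse(I, K)
    return float('inf') if e == 0 else 20 * math.log10(max_i) - 10 * math.log10(e)
```

Applying `haar_step` twice (over rows, then over columns) yields the LL, LH, HL and HH areas of the 2D-HT mentioned above.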
tried to design the algorithm in the way of minimal modification of the transformation coefficients, as the imperceptibility of the secret message would then be better. We build our work on a fascinating feature of the 2D-HT coefficients: there is a high correlation between the transformation coefficients at the same position in different decomposition areas. It means that if the coefficient in LL at a position (i, j) is an even-numbered integer, then the coefficients in LH, HL, and HH at the same location (i, j) are also even-numbered integers. Our next finding during the investigations was that if we change one transform coefficient value in only one decomposition area, the inverse 2D-HT (2D-IHT) changes this value back to the original one. However, when we make the same change in the other three decomposition areas at the same position as well, the change is preserved. Another curious problem is the behavior of the 2D-IHT: sometimes the reconstructed image pixel values are not integer numbers, and the image can then no longer be reconstructed and displayed. From these findings we concluded that there exist only four suitable changes of the transform coefficients. Each modification affects a different set of image pixel values after the 2D-IHT is performed. Table 1 shows all possible coefficient changes.

pixel position   HH             HL             LH
f(2m-1,2n-1)     increase (+)   increase (+)   increase (+)
f(2m-1,2n)       increase (+)   decrease (-)   decrease (-)
f(2m,2n-1)       decrease (-)   increase (+)   decrease (-)
f(2m,2n)         decrease (-)   decrease (-)   increase (+)

Table 1. Suitable changes of the transformation coefficients.

The first column determines a set of specific pixel value positions. The variables m and n are pixel indexes, where m = 1, 2, 3, ..., M and n = 1, 2, 3, ..., N, and the image size is M×N. The "increase" label in Table 1 represents a modification of the value by increasing it according to the value that has been modified by the embedding of the secret message in the LL area at the same position. Contrariwise, the "decrease" label in Table 1 represents a modification of the coefficient value by decreasing it according to the value that has been modified by the embedding of the secret message in the LL area at the same position.

3.1 Process of embedding

We designed the proposed method to be usable for all kinds of secret digital messages, i.e., the message can be any numeric binary data file. Firstly, the elected cover image is decomposed by the 2D-HT. As described in the previous chapter, this transform provides an approximation HH and three detail coefficient areas (horizontal HL, vertical LH, and diagonal LL) on each level of decomposition. It was necessary to accomplish the property of private communication as well as the imperceptibility of the secret message in the cover image. Therefore, as explained, the detail coefficients are the most convenient area for secret message embedding. Another non-negligible requirement is the system capacity. The capacity or payload can be defined as the number of embedded bits per pixel (bit/pixel). The secret message embedding process can be achieved with three embedding approaches, where every approach provides a specific level of robustness. The differences between the approaches lie mainly in the fact that the same data bits are replicated into other coefficients of the detail areas. In the individual strategies, we try to embed the secret data bits from one area of detail coefficients up to all three areas HL, LH, LL respectively. The higher approaches introduce more detail coefficient modifications without contribution to the system payload. However, a higher approach guarantees higher secret message robustness.

We defined a set of applicable transform coefficients C' suitable for the embedding process, which was elected from the detail areas HL, LH, LL (11). It was also needed to prepare and define the set of hidden message bits S before the embedding itself (12).

C' = {c_ij | 1 ≤ i ≤ M_c, 1 ≤ j ≤ N_c}    (11)
S = {s_i | 1 ≤ i ≤ K, s_i ∈ {0, 1}}    (12)

The preparation consists of two parts. The first part deals with the determination of the length of the code word, which determines the size of the secret data stream segmentation. Analogically, the length of the code word {b_i | i = 1, 2, ..., N} directly determines the number of pseudorandom sequences (PNSK). These PNSK are used in the process of spreading, ergo the SSIS approach, shown in Fig. 1.

Figure 1. Secret message spreading before the embedding itself.

Secondly, the secret key can be used in the process of PNSK alignment, with which the secret message is spread. This key can denote the positions of bits during the extraction process. However, one significant problem with this approach is that the secret key needs to be transported to and present on the receiver side. This represents the weakness of the approach, and it can lead to the uncovering of the covert communication.

Subsequently, when the secret message is correctly spread, the bits are embedded in the transformation domain of the 2D-HT coefficients regarding the attributes of steganography. In the embedding process, we first perform modulo calculations and comparisons of the summarized spread secret message bits S_i with the detail coefficients, in the form of (13)-(16).

if S_i = 0:
C^E_ij = C_ij, if mod(|C_ij|, 1) = 0    (13)
C^E_ij = sgn(C_ij) (|C_ij| + 1/2), if mod(|C_ij|, 1) ≠ 0    (14)

if S_i = 1:
C^E_ij = C_ij, if mod(|C_ij|, 1) ≠ 0    (15)
C^E_ij = sgn(C_ij) (|C_ij| + 1/2), if mod(|C_ij|, 1) = 0    (16)

The most unusual behavior here is the case when no changes are needed for the secret message bit to be embedded. In other words, the secret message bit is embedded without any modification of the transform coefficient.

Nevertheless, independently of the minimal modifications, we encountered an unpleasant problem on the receiver side. The issue we found was related to stego object distortion caused by the implementation of multiple transformations: the inverse operations carry decimal numbers into the spatial domain, and rounding them to integers introduces the issue. This distortion is responsible for incorrect extraction of the secret message. It was the reason for the PNS implementation. The autocorrelation attributes of the PNS significantly improve successful extraction. In general, longer PNS account for better autocorrelation characteristics and hence better results. Moreover, an additional approach termed Extraction with Error Correction (EEC) was applied during the extraction to improve error correction. The calculation of all possible states (2^n) at the output of the defined code word can be held as a reference data cell. However, increasing the code-word length requires more computing power, i.e., it is more time-consuming.

4 Results and Ascertainments

The total system capacity of the proposed method is a quarter of the used cover image size in the 1-level decomposition of the 2D-HT. The maximum useful payload is 0.25 bit/pixel in the case of using the total capacity, and it also varies depending on the detail coefficients used during the process of embedding and on the applied code-word length. The total number of coefficients elected for embedding determines the capacity in binary form, and it does not vary with a growing number of PNS. Unfortunately, the system capacity decreases with increasing length of the PNS. Considerable differences can be observed in the amount of energy added to or removed from the cover object. Subsequently, if more PNS are used in the
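Under our reading of the reconstructed rules (13)-(16), a coefficient that already carries the right value of mod(|C|, 1) is left untouched, and otherwise |C| is shifted by 1/2 with its sign preserved. A minimal sketch (the function and its names are our illustration, not the authors' code):

```python
import math

def embed_bit(c, s):
    # Eqs. (13)-(16): s = 0 targets an integer-valued |c|
    # (mod(|c|, 1) = 0), s = 1 targets a non-integer |c|. A coefficient
    # that already matches is returned unchanged; otherwise |c| is
    # increased by 1/2 and the sign is kept.
    frac = math.fmod(abs(c), 1)
    if (s == 0) == (frac == 0):
        return c
    return math.copysign(abs(c) + 0.5, c)
```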
Table 2. Extraction rate results and PSNR values for different cover objects.

In the case of LL and n = 1 (one decomposition area utilized and no code word applied), the reconstruction rate is around 49%. This rate is unacceptable and not applicable for our purposes. However, the stego image visual quality values (PSNR, SSIM) are very satisfying. In the case of using the LL, HL and HH coefficients for embedding, the reconstruction rate increased to around 78%, which is much better, but still not enough for our purposes. This was the reason why we implemented code words. Tab. 2 lists all combinations of reached results. It is evident that if the length of the code word increases, then the extraction rate improves, but at the expense of the amount of payload that can be embedded (the number of secret message bits). Fig. 2 depicts the difference values of PSNR between the applied decomposition areas and the achieved capacity utilization.

In this paper, we prepared a steganography method that uses the properties of the combination of the transformation domain of 2D-HT, 2D-DCT and the direct spreading technique of CDMA. The objectives and the embedding algorithm solved the secret message reconstruction issue formed in research [4], where the transformation domain of DCT proved a wrong place for secret message embedding. As has been shown, the capacity depends on the length, but no longer on the number, of PNS. Moreover, the error correction algorithm increases the success of secret message extraction. Another critically considered benchmark in steganography is the imperceptibility, to a human observer, of the degradation of the cover object. These measured values decrease with the employment of more decomposition areas of the 2D-HT. This handicap is solved by increasing the length of the PNS because of the cross-correlation improvement. The autocorrelation allows proper identification of the PNS, and thus of the secret message bit.
Acknowledgment
REFERENCES
[1] Gowda, S. N. (2016, July). Dual layered secure
algorithm for image steganography. In Applied
and Theoretical Computing and Communication
Technology (iCATccT), 2016 2nd International
Conference on (pp. 22-24). IEEE.
[9] Bugar, G., Banoci, V., Broda, M., Levicky, D., &
Miko, E. (2013, September). Blind steganography
based on 2D Haar transform. In ELMAR, 2013
55th International Symposium (pp. 31-35). IEEE.
1 In recent years, various initiatives have started to elevate the potential of innovation within regional entrepreneurship programs (e.g. REAP - http://reap.mit.edu/).
[Figure omitted: diagram labelled Environment, Network, Entity and Science through Entrepreneurship, with Capacity, Processes and Performance at each level]
Table 1. Analytical grid for the categorisation of individual indicators and aggregated indicators. For illustrative purpose, some example indicators are provided. Rows are levels of innovation; columns are dimensions of innovation (Science, Enabling Innovation, Innovation, Entrepreneurship).

Within entities - Capacity:
  Science: Revealed Scientific Advantage (RSA); scientific personnel
  Enabling Innovation: budget for applied research; science-innovation connectors
  Innovation: Revealed Technological Advantage (RTA); engineers
  Entrepreneurship: potential for spin-off; entrepreneurship programs
Within entities - Processes:
  Science: science-focussed incentives; internal research projects
  Enabling Innovation: market research / identification of exploitation opportunities
  Innovation: innovation-focussed incentives
  Entrepreneurship: transfer of IPR; resource allocation / management; identification of licensees
Within entities - Output:
  Science: publications (basic research)
  Enabling Innovation: publications (applied research)
  Innovation: patents; technologies
  Entrepreneurship: spin-offs; market assessment
Between entities - Capacity:
  Science: basic research collaborations; organisational embedment of institutes
  Enabling Innovation: applied research collaborations
  Innovation: innovation networks
  Entrepreneurship: shared infrastructures
Between entities - Processes:
  Science: interorganisational coordination; basic research partner projects
  Enabling Innovation: applied research partner projects
  Innovation: contract research
  Entrepreneurship: identification of potential clients
Between entities - Output:
  Science: co-publications (basic research)
  Enabling Innovation: co-publications (applied research)
  Innovation: product development
  Entrepreneurship: market niche
Environment:
  Science: systemic / national need for science
  Enabling Innovation: systemic / national need for innovation
  Innovation: funding and transfer instruments
  Entrepreneurship: institutional support for market niches; price of innovation
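For use in model construction, the analytical grid of Table 1 can be represented as a simple data structure, e.g. a mapping from (level, dimension) cells to example indicators (an illustrative sketch with only two cells filled in; the structure and names are ours, not the authors'):

```python
# Two illustrative cells of Table 1's analytical grid, keyed by
# (level of innovation, dimension of innovation) and then by the
# capacity / processes / output stage.
GRID = {
    ("within entities", "science"): {
        "capacity": ["Revealed Scientific Advantage (RSA)", "scientific personnel"],
        "output": ["publications (basic research)"],
    },
    ("between entities", "innovation"): {
        "capacity": ["innovation networks"],
        "output": ["product development"],
    },
}

def indicators(level, dimension, stage):
    # Return the example indicators recorded for one cell of the grid,
    # or an empty list for cells that are not filled in.
    return GRID.get((level, dimension), {}).get(stage, [])
```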
In order for models to explain more complex patterns of innovation but still be coherent and flexible, models could be designed in a modular fashion [41]. In addition to the modular models that are being developed in technology development to overcome measurement challenges and understand complex interactions - for example in biotechnology [42][43] - some attempts have also been made in the context of innovation dynamics [44].2

The construction of modules should depend on several aspects. First, based on the availability of data and the analysis of indicators and scores, modules can be composed of principal components that relate to the applied analytical framework and align with the chosen scope of analysis. Second, the creation of modules is also likely to depend on end-user requirements. Based on articulated needs regarding what output a model should produce - both in terms of intelligence and interface possibilities - innovation phenomena could be translated into either central or peripheral modules.

CONCLUSION

To understand the conditions that enable innovation within public research organisations, a preliminary methodological concept was proposed that can be applied within the construction of innovation models. This concept - rooted in theories of organisational structure, innovation systems and evolutionary economics - can be operationalised to include both antecedents and impacts of innovation, as well as to integrate processes that occur within and between involved agents. For the further development and operationalisation of this concept, some potential methodological features were suggested to improve the simulation of agent-based models.

2 The operational implications of multiple perspectives have also previously been discussed in research; for example, see https://www.imagwiki.nibib.nih.gov/sites/default/files/Ropella,%20Glen05Aug14cah.pdf

REFERENCES

[1] W. M. Cohen, R. R. Nelson, and J. P. Walsh, "Links and Impacts: The Influence of Public Research on Industrial R&D," Manage. Sci., vol. 48, no. 1, pp. 1-23, 2002.
[2] E. von Hippel, The sources of innovation, vol. 53, no. 9. 2013.
[3] M. Gibbons and R. Johnston, "The roles of science in technological innovation," Res. Policy, vol. 3, no. 3, pp. 220-242, 1974.
[4] J. A. Johannessen, B. Olsen, and J. Olaisen, "Aspects of innovation theory based on knowledge-management," Int. J. Inf. Manage., vol. 19, no. 2, pp. 121-139, 1999.
[5] C. E. Helfat et al., Dynamic capabilities: Understanding strategic change in organizations. Oxford, Blackwell Publishing, 2007.
[6] J. J. P. Jansen, F. A. J. van den Bosch, and H. W. Volberda, "Managing Potential and Realised Absorptive Capacity: How do Organisational Antecedents Matter?," Acad. Manag., vol. 48, no. 6, p. 16, 2005.
[7] J. Woodhill, "Capacities for institutional innovation: A complexity perspective," IDS Bull., vol. 41, no. 3, pp. 47-59, 2010.
[8] F. E. García-Muiña and E. Pelechano-Barahona, "The complexity of technological capital and legal protection mechanisms," J. Intellect. Cap., vol. 9, no. 1, pp. 86-104, 2008.
[9] H. M. Grimm, "The diffusion of Bayh-Dole to Germany: Did new public policy facilitate university patenting and commercialisation?," Int. J. Entrep. Small Bus., vol. 12, no. 4, pp. 459-478, 2011.
[10] H. Chesbrough and R. S. Rosenbloom, "The role of the business model in capturing value from innovation: evidence from Xerox Corporation's technology spin-off companies," Ind. Corp. Chang., vol. 11, no. 3, pp. 529-555, 2002.
[11] K. Pavitt, M. Robson, and J. Townsend, "Technological Accumulation, Diversification and Organisation in UK Companies, 1945-1983," Manage. Sci., vol. 35, no. 1, pp. 81-99, 1989.
[12] A. Walter, M. Auer, and T. Ritter, "The impact of network capabilities and entrepreneurial orientation on university spin-off performance," J. Bus. Ventur., vol. 21, no. 4, pp. 541-567, 2006.
[13] M. G. Jacobides and S. G. Winter, "Entrepreneurship and firm boundaries: The theory of a firm," J. Manag. Stud., vol. 44, no. 7, pp. 1213-1241, 2007.
[14] R. G. M. Kemp, M. Folkeringa, J. P. J. de Jong, and E. F. M. Wubben, Innovation and firm performance. 2003.
[15] J. Birkinshaw, G. Hamel, and M. J. Mol, "Management Innovation," Acad. Manag. Rev., vol. 33, no. 4, pp. 825-845, Oct. 2008.
[16] A. H. Gold, A. Malhotra, and A. H. Segars, "Knowledge Management: An Organizational Capabilities Perspective," J. Manag. Inf. Syst., vol. 18, no. 1, pp. 185-214, 2001.
[17] W. M. Cohen and D. A. Levinthal, "Absorptive Capacity: A New Perspective on Learning and Innovation," Adm. Sci. Q., vol. 35, no. 1, pp. 128-152, 1990.
[18] B. Levitt and J. G. March, "Organizational Learning," Annu. Rev. Sociol., vol. 14, no. 1, pp. 319-338, 1988.
[19] T. J. Allen, "Managing the Flow of Technology," MIT Press, Cambridge, MA, p. 320, 1977.
[20] J. E. Ettlie and E. M. Reza, "Organizational Integration and Process Innovation," Acad. Manag., vol. 35, no. 4, pp. 795-827, 1992.
[21] C. Truss, A. Shantz, E. Soane, K. Alfes, and R. Delbridge, "Employee engagement, organisational performance and individual well-being: exploring the evidence, developing the theory," Int. J. Hum. Resour. Manag., vol. 24, no. 14, pp. 2657-2669, 2013.
[22] V. H. Hailey, E. Farndale, and C. Truss, "The HR department's role in organisational performance," Hum. Resour. Manag. J., vol. 15, no. 3, pp. 49-66, 2005.
[23] J.-M. Hiltrop, "The impact of human resource management on organisational performance: Theory and research," Eur. Manag. J., vol. 14, no. 6, pp. 628-637, 1996.
[24] R. E. Miles, C. C. Snow, A. D. Meyer, and H. J. Coleman, "Organizational strategy, structure, and process," Acad. Manag. Rev., vol. 3, no. 3, pp. 546-562, 1978.
[25] B.-Å. Lundvall, National Systems of Innovation: Towards a Theory of Innovation and Interactive Learning. London, 1992.
[26] M. E. Porter, "The Competitive Advantage of Nations," Harv. Bus. Rev., vol. 68, no. 2, pp. 73-93, 1990.
[27] W. Vandekerckhove and N. A. Dentchev, "A Network Perspective on Stakeholder Management: Facilitating Entrepreneurs in the Discovery of Opportunities," J. Bus. Ethics, vol. 60, no. 3, pp. 221-232, Sep. 2005.
[28] H. Choi, S.-H. Kim, and J. Lee, "Role of network structure and network effects in diffusion of innovations," Ind. Mark. Manag., vol. 39, no. 1, pp. 170-177, 2010.
[29] F. W. Geels, "Ontologies, socio-technical transitions (to sustainability), and the multi-level perspective," Res. Policy, vol. 39, no. 4, pp. 495-510, 2010.
[30] M. Steiner, Clusters and regional specialisation: on geography, technology and networks, no. 8. 1998.
[31] N. Gilbert, A. Pyka, and P. Ahrweiler, "Innovation Networks - A Simulation Approach," J. Artif. Soc. Soc. Simul., vol. 4, no. 3, pp. 1-14, 2001.
[32] P. Ahrweiler, A. Pyka, and N. Gilbert, "A New Model for University-Industry Links in Knowledge-Based Economies," Soc. Sci., pp. 218-235, 2011.
[33] P. Ahrweiler, M. Schilperoord, A. Pyka, and N. Gilbert, "Modelling research policy: Ex-ante evaluation of complex policy instruments," JASSS, vol. 18, no. 4, 2015.
[34] R. R. Nelson and S. G. Winter, An evolutionary theory of economic change. 1982.
[35] F. Malerba, R. Nelson, L. Orsenigo, and S. Winter, "History-friendly models: An overview of the case of the computer industry," JASSS, vol. 4, no. 3, 2001.
[36] V. N. Kolokoltsov, Nonlinear Markov processes and kinetic equations. Cambridge Tracts in Mathematics, 2010.
[37] R. Y. Rubinstein and D. P. Kroese, "Simulation and the Monte Carlo Method," Wiley, p. 377, 2008.
[38] S. A. Kauffman and E. D. Weinberger, "The NK model of rugged fitness landscapes and its application to maturation of the immune response," J. Theor. Biol., vol. 141, no. 2, pp. 211-245, 1989.
[39] A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek, "Evolution of the social network of scientific collaborations," Phys. A Stat. Mech. its Appl., vol. 311, no. 3-4, pp. 590-614, 2002.
[40] Helmholtz Gemeinschaft, "BMBF-Förderprojekt Enabling Innovation - Erprobung des Management-Tools," 2014.
[41] D. Scerri, S. Hickmott, A. Drogoul, and L. Padgham, "An Architecture for Modular Distributed Simulation with Agent-Based Models," Proc. 9th Int. Conf. Auton. Agents Multiagent Syst. (AAMAS 2010), pp. 541-548, 2010.
[42] B. K. Petersen, G. E. P. Ropella, and C. A. Hunt, "Toward modular biological models: defining analog modules based on referent physiological mechanisms," BMC Syst. Biol., vol. 8, p. 95, 2014.
[43] G. Sunwoo Park, E. P. Ropella, and C. A. Hu, "PISL: A Large-Scale In Silico Experimental Framework for Agent-Directed Physiological Models," 2005.
[44] S. H. Chen and B. T. Chie, "A functional modularity approach to agent-based modeling of the evolution of technology," in Lecture Notes in Economics and Mathematical Systems, 2006, vol. 567, pp. 165-178.
posts) of text data and treating stock prices and related text documents as data streams divided into time windows, as we suppose that the reasons for stock price changes evolve in time.

To model the behavior of a stock price in relation to the content of text data, we can use classification: we examine the direction of the change of the stock price to create classes. This approach was used, for example, by [6]. The problem can be seen as text classification - given a text, decide its class (the direction of the price movement). However, we must overcome two problems. The first problem is the definition of classes. [8] used a threshold value of 1% price change for the class determination. The second problem lies in choosing correct features. Many studies used just single words, and this simple unigram bag-of-words model provided good results in [8].

There exists a wide range of supervised learning algorithms that can be used for text classification. An interesting approach is described in [9] - it focuses on sentence-level sentiment analysis of movie reviews. They used cosine normalization, Term Presence, and Smoothed Delta IDF as weighting schemes and the Recursive Neural Tensor Network algorithm to achieve an accuracy of 87.60%. [10] used Naïve Bayes and SVM as algorithms and unigrams, bigrams, unigrams with bigrams, and unigrams with POS (parts-of-speech) as features. The bigrams showed a lower accuracy than unigrams - the reason is that the resulting vectors were very sparse. All in all, the type of features used in the bag-of-words model has little (at most 2-3%) impact on the accuracy.

Facebook posts from company pages as the text data, because it has been a very rarely used data source for this area of research, we have lots of available data, and it might bring new interesting insights.

In our research, the values of the S&P 500 Index were used to represent stock prices. The index values reflect the stock prices of selected blue-chip (large and famous) companies on the US stock market. The historical values of the index were downloaded from the website investing.com. For each trading day, we have a closing (end-of-day) numeric value of the S&P 500 Index available.

3.2 Text data

As the text data, posts from Facebook pages of the companies from the S&P 500 Index were used. In total, we examined 431 company pages. A company's Facebook page contains a sequence of documents arranged according to their publication time. These short postings are created by the company representatives. Figure 1 shows an example of a post on Intel's page. A post may be commented on by Facebook users. However, the comments were not used in the analysis.

In total, 138,713 Facebook posts published between 1. 1. 2015 and 15. 10. 2016 were used.
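The 1% threshold rule of [8] for deriving classes from price changes can be sketched as follows (a hypothetical helper of ours; the 'flat' fallback for sub-threshold moves is our assumption, not the paper's):

```python
def label_change(prev_close, close, threshold=0.01):
    # Class from the relative price change, following the threshold idea
    # of [8]: 'up' / 'down' only when the move exceeds the threshold,
    # otherwise 'flat'.
    change = (close - prev_close) / prev_close
    if change > threshold:
        return 'up'
    if change < -threshold:
        return 'down'
    return 'flat'
```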
Figure 2. Classification classes identified in the time series of the stock index values

We decided to perform two types of experiments with different data sets used for the classification. In the first experiment, the documents from all 24 windows were put into one data set. The results for this experiment are presented in section "Batch approach". In the second experiment, we divided the documents into 12 data sets. Each data set consisted of the documents from two neighboring windows: one with an upward movement and one with a downward movement. The windows represented the two classes for the classification. The results for this experiment are presented in Table 6.

Classification

The converted data was split into the training (60%) and testing (40%) set. Each bag-of-words representation was processed by 10 classifiers (with default settings - no parameter optimization was made) in scikit-learn. The classifiers' performance was evaluated by the achieved accuracy (the proportion of the correctly classified instances among all examined instances [10, p. 268]) on the test set.

4 RESULTS AND DISCUSSION

One set of the text data (Facebook posts) together with the S&P 500 Index values was used to prepare the data for classification. The class-labelled data set was processed using the three weighting schemes (TP, TF, TF-IDF).
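The three weighting schemes can be illustrated with a small stdlib-only helper (our own sketch, not the authors' code; scikit-learn provides equivalent vectorizers):

```python
import math
from collections import Counter

def weight_documents(docs, scheme="tfidf"):
    # Bag-of-words weights per document for the three schemes used here:
    # TP (term presence, 0/1), TF (raw term frequency) and TF-IDF
    # (frequency scaled by the log of the inverse document frequency).
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    out = []
    for d in docs:
        tf = Counter(d)
        if scheme == "tp":
            out.append({t: 1.0 for t in tf})
        elif scheme == "tf":
            out.append({t: float(c) for t, c in tf.items()})
        else:  # tf-idf
            out.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return out
```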
Table 6. Facebook posts - neighboring windows: classification results.

Data set no.  Accuracy  Precision  Recall  F1 score
1             0.584     0.602      0.584   0.593
2             0.615     0.598      0.615   0.606
3             0.808     0.653      0.808   0.722
4             0.639     0.733      0.639   0.683
5             0.803     0.807      0.803   0.805
6             0.646     0.653      0.646   0.650
7             0.554     0.553      0.554   0.553
8             0.674     0.669      0.674   0.672
9             0.721     0.727      0.721   0.724
10            0.618     0.614      0.618   0.616
11            0.800     0.782      0.800   0.791
12            0.698     0.702      0.698   0.700
Average       0.680     0.674      0.680   0.676

According to Table 6, the average accuracy (as well as the F1 score) was 68%. The best accuracy (as well as F1 score) was achieved for data sets 3 (72%), 5 (80%), and 11 (79%). The reason for this might be that they have a balance ratio around 4 (with more documents marked with the index value going up).

Table 7. The comparison of average accuracies achieved with different weighting schemes applied to the neighboring windows of the Facebook posts.

Data set no.  TP     TF     TF-IDF
1             0.539  0.534  0.534
2             0.555  0.556  0.570
3             0.744  0.753  0.769
4             0.626  0.625  0.639
5             0.735  0.736  0.765
6             0.610  0.609  0.620
7             0.534  0.534  0.536
8             0.633  0.629  0.646
9             0.663  0.661  0.675
10            0.568  0.562  0.578
11            0.754  0.747  0.766
12            0.637  0.644  0.658
Average       0.633  0.633  0.646

From Table 7 it can be seen that the highest average accuracy was provided by the TF-IDF weighting scheme (+1% in comparison to TP and TF).

Table 8. The comparison of average accuracies achieved by different classifiers applied to the neighboring windows of the Facebook posts.

Data set no.  Classifier            Avg. accuracy
1             NearestCentroid       0.587
2             LogisticRegressionCV  0.578
3             LogisticRegression    0.748
4             LogisticRegressionCV  0.633
5             LogisticRegression    0.784
6             MultinomialNB         0.630
7             ExtraTreesClassifier  0.553
8             SGDClassifier         0.656
9             MultinomialNB         0.701
10            LogisticRegressionCV  0.595
11            ExtraTreesClassifier  0.788
12            SGDClassifier         0.673

Table 8 shows the classifier that achieved the highest accuracy for each data set. We can see that most of the time Logistic Regression (5 times) achieved the best result. Among the other classifiers, the Multinomial Naïve Bayes classifier, Extra Trees Classifier, and Stochastic Gradient Descent (SGD) Classifier were the most successful (twice each), and the Nearest Centroid was the best only once.

5 CONCLUSION

The goal of the work was to examine whether the content of text documents published on the Internet (specifically Facebook posts) has any connection with stock price movements. We used the values of the S&P 500 Index and divided them into 24 time windows with either a growing or a decreasing index value trend. Subsequently, we examined (using the classification accuracy) the connection between the documents' content and the trend of the index value in the time window in which the document was published.

Two types of experiments were performed. In the first one, the documents from all 24 windows were put into one data set and we achieved an accuracy of 62%. The second experiment, in which we divided the documents into 12 data sets formed from two neighboring windows, provided better results - the average accuracy was 68%. Moreover,
provide data to the warehouse, since the objective of data warehouse projects fundamentally relies on supporting the decision-making process of the enterprise in order to facilitate the analysis processes [2], [5], [8], [11].

It is well documented that a prominent reason why many DW projects have failed in the past is not only that they attempted to supply strategic information from operational systems that were never intended to provide strategic information [5], but also that the requirements analysis phase was often overlooked during the design process [1], [2], [3]. For these reasons, [2] and [14] have declared that over 80% of DW projects fail to meet the users' and stakeholders' requirements. The requirements analysis phase can be executed informally, based on simple requirements glossaries instead of formal diagrams, but such an informal (or semi-formal) approach may be inappropriate for a requirements-driven framework that requires more organized and comprehensible techniques [3].

Data warehouse projects are similar in several phases to any software development project and require the definition of different activities which ought to be executed, related to requirements collection, design and implementation within an operational platform, amongst other activities [1], [14]. Despite the similarity to general software development, the effective development of a DW relies upon the quality of its models (design and specification) [15]. However, the success of the system under development may be strongly affected by the process of discovering the involved stakeholders' demands and of sustaining those demands while transforming and documenting them in a form which will be analyzable and communicable. This process is generally known as Requirements Engineering (RE) [11], which is a necessary and vital phase in the software development life cycle (SDLC) [16].

According to [16] and [17], RE is the process which is intended to collect, document, analyze and manage requirements for systems and software products throughout the SDLC. According to [5], RE in the data warehouse arena has acquired increased importance, and it has the goal of identifying the information demands of the decision makers. Researchers are currently attempting to utilize numerous requirements engineering techniques to analyze the specification of data warehouse systems in order to avoid the risk of failure. Several techniques and methods are used in requirements engineering activities, and in this paper we are more interested in formal methods [16], [18], [19] for analyzing system behavior, factors of risk and problems related to its implementation [20] during the design of the system.

The use of Formal Methods (FMs) in the construction of reliable software has been controversial for a number of decades. Advocates of such techniques point to the advantages to be gained in constructing provably correct systems, especially in the arena of mission/safety-critical systems, e.g. nuclear power plants and aviation systems. Critics of FMs object to the steep learning curve involved in mastering the underlying discrete mathematics and formal logic needed for the effective use of the methodology. Yet, the literature suggests that using FMs in the design of data warehouse systems ought to be useful in improving the
reliability of such systems and other functionality [19].

Formal methods are mathematical approaches sustained by tools and techniques for the verification of the desired and necessary properties of software or hardware systems. FMs are necessary for the control of quality parameters such as completeness, correctness and consistency, and for the verification of the requirements of a system [13]; they are based on (often discrete) mathematical notations and logic to express requirements specifications clearly and accurately [21].

In the research work published by [16], formal methods are mostly applied at the design and verification levels of software development. As observed by [13], formal methods come with three associated techniques: formal specification, formal checking (discharging proof obligations) and refinement. Formal specifications aim to provide an unambiguous and coherent complement to natural language descriptions [21], [22], and are rigorously validated and verified, leading to the early detection of specification errors [22]. Among the many formal specification languages, one of the most broadly used is Z, selected in this paper owing to its simplicity and wide use in the formal methods arena [22].

This paper is structured as follows. Following our research questions below, we introduce in Section 2 the fundamental concepts of data warehouse systems design by discussing various design approaches. Section 3 presents a Z specification of a data warehouse star schema and Section 4 addresses related work in this area. In Section 5 we present our methodology and, finally, conclusions and directions for future work are given in Section 6.

1.1 Research questions

In this paper we aim to find answers to the following questions:

RQ1: What are the requirements elicitation approaches for data warehouse development?

RQ2: To what extent may formal specification facilitate data warehousing?

RQ3: How may the two (2) prominent elicitation techniques be combined?

2 DATA WAREHOUSE SYSTEMS DESIGN

Building a DW system is unlike building transactional systems: the data structures are not the only concern, as in those kinds of source systems; cognizance should also be given to the purposes and strategies of the organization [8]. Data warehouse systems have the purpose of supporting the decision-making process of an enterprise. The development of a DW requires that the analytical requirements supporting the decision-making process be captured, and such requirements are usually not easy to extract and specify [11]. A DW is generally defined as the linking of a number of operational databases with the aforementioned intelligence (e.g. decision making) added to the resultant structure. Since a data mart (DM) is viewed as a subset of a DW, we view a data mart as being one of the operational databases in the DW.

Subsequently, we formalize a DW as follows:

Link_{i=1}^{n} DB_i , where
(∀i)(∀j)(1 ≤ i, j ≤ n ⦁ i ≠ j ⇒ DB_i ∩ DB_j = ∅)

The above definition assumes that different databases, when correctly normalized, do not contain common elements, except of course for foreign key matches.

The design of data warehouse systems is unlike the design of the transactional systems that provide data to the warehouse [8]. There are two well-known authors in the world of data warehousing, Bill Inmon and Ralph Kimball, advocating complementary yet different techniques for the design of data warehouses. The technique applied by Bill Inmon is the familiar top-down design, which begins with the Extraction-Transformation-Loading (ETL) process working from external data sources in order to build a data warehouse, whilst Ralph Kimball applies the equally well-established bottom-up technique, which begins with an ETL process for one or more data marts separately. Most proponents of data warehouse design subscribe to one of the two techniques [12].

2.1 Data warehouse systems design approaches

The DW systems design is based on two approaches that are alternatives and inverses of each other, viz. the Data-driven approach, also known as the Supply-driven approach, and the Requirement-driven approach, also known as the Demand-driven approach [1], [3], [7], [10]. The process of development of a DW starts with the identification and collection of requirements. The design of the multidimensional model is next, followed by testing and maintenance [9]. Developing a data warehouse requires a set of steps to be accomplished throughout the process, namely the requirements analysis phase, conceptual phase, logical phase and physical phase [6], [8]. According to [9], the design stage is the most significant operation in the successful construction of a DW.

The following sections elucidate the context of requirements analysis and conceptual design, which are considered the two main phases within the data warehouse systems design process [8].

2.1.1 Requirements Analysis

Requirements analysis has as its aim detecting which knowledge is useful for decision making, by investigating the users' demands and expectations in user-driven and goal-driven approaches, or by verifying the validity of operational data sources in a data-driven approach [8]. Requirements analysis of users plays a crucial role in data warehouse systems design. It has a major influence upon the decisions taken throughout the data warehouse systems implementation [2], [23]. The requirements analysis phase leads the designer to unveil the necessary elements of the multidimensional schema (facts, measures and dimensions), which are claimed to assist future data manipulations and calculations. The multidimensional schema has a significant impact on the success of DW projects [2], [3], [14].

Several research works have been published on the various approaches used during the requirements analysis phase of DW systems design, leaning on the two techniques mentioned above, the top-down technique and the bottom-up technique. Implementations of these are the Data-driven approach, the Goal-driven approach, the User-driven approach and the Mixed-driven approach [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], described below.
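The pairwise-disjointness condition in the DW formalization above can be checked mechanically. A minimal sketch, in which the operational databases and their element names are hypothetical and introduced only for illustration:

```python
from itertools import combinations

def pairwise_disjoint(databases):
    """Check (for all i != j) DB_i ∩ DB_j = ∅ over a list of element sets."""
    return all(a.isdisjoint(b) for a, b in combinations(databases, 2))

# Hypothetical operational databases, modelled as sets of element names.
sales = {"order_id", "order_date", "amount"}
stock = {"item_id", "warehouse", "quantity"}
hr    = {"employee_id", "salary"}

assert pairwise_disjoint([sales, stock, hr])        # the condition holds
assert not pairwise_disjoint([sales, {"amount"}])   # a shared element violates it
```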
- The Data-driven approach, also known as the supply-driven approach, utilizes the bottom-up technique and yields subject-oriented business data schemas by leaning only on the operational data sources, ignoring business goals and stakeholder needs.

- The Goal-driven approach applies the top-down technique rather than the bottom-up technique. It allows for the generation of information such as Key Performance Indicators (KPIs) of the principal business areas by relying only on business objectives and granted business processes, and essentially overlooks data sources and user demands.

- The User-driven approach is similar to the goal-driven approach, and it applies the top-down technique. It permits producing analytical requirements, interpreted by the dimensions and measures of each subject, while ignoring business goals and data sources.

A user-driven approach starts with a detailed agreement on the needs and expectations of the users, and this brings about numerous advantages such as increased productivity, enhanced work quality, reduced support and training costs and improved general user satisfaction [24]. User requirements analysis does not prescribe a standard approach on which designers may rely for designing their data warehousing projects [16]. As declared by [6], the data-driven approach yields a conceptual schema through a re-engineering process of the data sources, ignoring the end users' and stakeholders' contribution, whilst the requirements-driven approach aims at generating the conceptual schema by considering only the needs expressed by end users and stakeholders. The correct elicitation of user requirements remains a fine challenge, and many techniques, e.g. the use of JAD (Joint Application Design) sessions [25], have been put forward.

These three primary approaches have their merits and demerits. In an attempt at overcoming this problem, numerous authors have suggested the mixed-driven approach, which consists of combining two or even all three primary approaches (either user-driven and data-driven approaches, or goal-driven and data-driven approaches, or a combination of user-driven, goal-driven and data-driven approaches), as detailed in [2], all aimed at getting a "best result" that will meet users' and stakeholders' demands and expectations. According to [14] and [26], the requirements-driven approach is also called the analysis-driven approach; the supply-driven approach is also called the source-driven approach; and the requirement/supply-driven approach is known as an analysis/source-driven approach, but also by the name hybrid-driven approach.

The various approaches discussed above are depicted in Figure 1 below.

[Figure 1 groups the approaches under the two techniques: requirements-driven (demand-driven, analysis-driven), user-driven and goal-driven under top-down; supply-driven (source-driven) and data-driven under bottom-up.]

Figure 1: Complementary top-down & bottom-up

Table 1 below presents the advantages and disadvantages of the approaches, grouped by technique. Figure 2 below depicts the analysis-driven approach framework with all the steps considered.
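The grouping of Figure 1 can be restated as data. The dictionary below encodes each approach, the technique it applies and the synonyms reported by [14] and [26]; it is a restatement for illustration, not part of the original paper:

```python
# Elicitation approaches, the technique each applies, and reported synonyms.
approaches = {
    "requirements-driven": {"technique": "top-down",
                            "synonyms": ["demand-driven", "analysis-driven"]},
    "user-driven":         {"technique": "top-down",  "synonyms": []},
    "goal-driven":         {"technique": "top-down",  "synonyms": []},
    "data-driven":         {"technique": "bottom-up",
                            "synonyms": ["supply-driven", "source-driven"]},
}

top_down = {name for name, a in approaches.items() if a["technique"] == "top-down"}
assert top_down == {"requirements-driven", "user-driven", "goal-driven"}
assert "source-driven" in approaches["data-driven"]["synonyms"]
```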
However, the main purpose of this stage is to develop a conceptual diagram that will meet the functional requirements from the requirements analysis stage (requirement-driven approach) and the data model designed from the legacy operational systems [8], in order to fulfill the users' and stakeholders' demands and expectations. This phase is useful for representing the necessary elements in the multidimensional schema after the specification of requirements.

According to [29], a schema is defined by the relation between facts and dimensions. A fact is the subject of analysis or the focus of interest in the process of decision making [27], and dimensions are the different perspectives used for the analysis of facts. A fact contains numerical attributes commonly called measures [29].

Figure 5 below depicts a multidimensional schema for data warehouse systems, following the suggestions in [15].

[Figure 5 shows a fact table (measures) linked to dimension tables (attributes), with multiplicity 1..n on both sides.]

Figure 5: A multidimensional schema

Next we present a formal specification of the star-based structure in Figure 5 in Z.

3 A FORMAL SPECIFICATION

A fact table consists of measures and additional attributes; therefore we have as basic types:

[Measure, Attribute]

A fact table consists of attributes, but given the structure of the star schema [15] we distinguish between measures and ordinary attributes as follows:

Fact_table
  measures : ℙ Measure
  attributes : ℙ Attribute
  measures ∩ attributes = ∅

While measures of fact tables in a star schema are also attributes, we give them a special status to reflect the structure defined in [15].

A Dimension, as per Figure 5, consists of a number of fact tables:

Dimension
  dimension : ℙ Fact_table

As per the established strategy for constructing a Z specification, a specifier has to define an initial state, and a proof obligation (PO) arises, namely to show that such an initial state may be realized (i.e. it exists). Subsequently, the initial state of the star schema is given by:

Init_Dimension
  Dimension′

⊢ ∃ Dimension′ ⦁ Init_Dimension
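The Z schemas above can be animated in ordinary code. The sketch below mirrors Fact_table, Dimension and Init_Dimension in Python, enforcing the invariant measures ∩ attributes = ∅; it is an illustration of the specification under our own modelling choices, not a replacement for it:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FactTable:
    measures: frozenset      # measures : P Measure
    attributes: frozenset    # attributes : P Attribute

    def __post_init__(self):
        # Schema invariant: measures ∩ attributes = ∅
        if self.measures & self.attributes:
            raise ValueError("measures and attributes must be disjoint")

@dataclass
class Dimension:
    dimension: set = field(default_factory=set)   # dimension : P Fact_table

# Init_Dimension: an empty set of fact tables is a witness for the proof
# obligation that an initial state exists.
init = Dimension()
assert init.dimension == set()

sales = FactTable(measures=frozenset({"amount"}),
                  attributes=frozenset({"order_date"}))
init.dimension.add(sales)
assert sales in init.dimension
```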
A fact table thus groups both Measure and Attribute data, hence the set is non-homogeneous. Z, however, is a strongly typed language, containing only "homogeneous" sets. Therefore, specifying the above processes in Z will require us to devise a mechanism to …

4 RELATED WORK

[7] M. Golfarelli, "From User Requirements to Conceptual Design in Data Warehouse Design," in Data Warehousing Design and Advanced Engineering …, 2010, p. 15. https://doi.org/10.4018/978-1-60566-756-0.ch001
… ego-centric aggregate queries. According to [11], an ego-centric query allows a graph node, called the consumer, to aggregate events from other nodes, called producers.

In this paper, we present an approach to optimize ego-centric aggregate queries in GDB by materializing some of their results. The main contribution of our approach is to reduce the cost of assigning materialization decisions to the results of ego-centric aggregate queries. Firstly, we discover the densest subgraph of the underlying graph; dense subgraphs represent the tightly coupled nodes. Then, we materialize all the results of the ego-centric aggregate queries that are implemented on the nodes of the densest subgraph.

The rest of this paper is structured as follows. The next section discusses related works. Section 3 presents our approach; it contains the details of the node classification and the decision assignment, which constitute the two main steps of our approach. Section 4 presents the evaluation of the proposed approach. Section 5 is the conclusion.

2 RELATED WORKS

Ego-centric aggregate queries are a special case of aggregate queries, which are widely treated in relational databases and data warehouses [12], [13], [14], [15], in data streams [16], and in sensor networks and distributed databases [17], [18]. The materialization of query results is the main technique used to optimize aggregate queries in such domains. It allows pre-computing and storing their results, called views, to avoid computing/recomputing them whenever they are asked. The topics of materialization are mainly: What data do we materialize? Where do we materialize it? And when do we update materialized data? Aggregated results are selected based on the following criteria: they serve frequent queries or they are shared by several queries. The works in [12], [13], [14], [15] have proposed approaches to best select materialized views. The approach proposed in [19] optimizes the update load of aggregate data (materialized views). In the context of data streams, [16], [20] and [21] use aggregate data to share work across different queries with different sliding windows.

Aggregate queries over graphs are fundamentally different and have not been widely studied in previous works. [22] has proposed a language for querying and analyzing GDB using aggregates and ranking. [10] and [11] have addressed the management of ego-centric aggregate queries in large graphs. Both [11] and [10] have addressed the issue of when the events should be transmitted from the producers to the consumer. The two possible ways are: either at query time or precomputed on the consumer. The former consists of traversing the producers at each read of the consumer; in [10], this is called a pull task and it corresponds to an on-demand update of aggregate data. The latter consists of pre-computing the aggregate query answer at each new write in the producers; in [10], this is called a push task and it corresponds to an online update. The work in [11] proposes to retrieve events from high-rate producers at query time and to materialize, in aggregation nodes, events that come from low-rate producers. The work in [10] proposes a detailed solution that begins by constructing an aggregation overlay graph and then makes a decision for each node of this graph whether to aggregate events on it (push decision) or not (pull decision). The aggregation overlay graph is constructed to encode the computations to be performed when an update or a query is received. Its main advantage is that it allows sharing partial aggregates across different ego-centric aggregate queries. The decisions of materializing events on nodes are made based on the cost of push and pull tasks. As we have seen, what distinguishes [10] from [11] is the answer to the issue of where to store materialized data; for this reason, the solution of [10] has integrated intermediate aggregation nodes.

These two approaches are most closely related to our work, which consists of deciding whether the result of an ego-centric aggregate query should be materialized (push decision) or not (pull decision). We have not evoked the
issue of where materialized data should be stored; we have simply supposed that materialized results are stored on the ego node, i.e. the consumer. The main contribution of our approach is to reduce the cost of assigning decisions (pull/push) to ego-centric aggregate queries. Instead of continuously measuring the access and update frequencies of nodes to assign decisions, we propose to intervene periodically to identify the most active nodes (producer or consumer), which are the nodes of the densest subgraph. Then, we materialize the result of every ego-centric aggregate query that is hosted on an identified active node, i.e. the producers of this query will receive the push decision. To the best of our knowledge, there is no approach that uses dense subgraphs as a way to classify and optimize ego-centric aggregate queries.

3 OPTIMIZING EGO-CENTRIC AGGREGATE QUERIES

In our approach, each ego-centric aggregate query is executed in one of two ways. The first way consists of querying the inputs from the neighborhood only when the user requests the ego node. In the second way, inputs are pre-computed and kept up-to-date; then, when the user requests it, the ego-centric aggregate query is executed on the precomputed data, which reduces latency. Consequently, our hybrid approach assigns the push decision to the ego-centric aggregate queries whose results should be pre-computed, and the pull decision to the rest of the queries. Therefore, there are two steps in our approach:

1. Classify ego-centric aggregate queries by identifying the nodes whose query results should be pre-computed. This task is performed periodically, i.e. at the expiration of a time interval with a predetermined duration; and

2. Assign and apply the update decision. This task exploits the result of the previous classification to optimize the server load. It applies pull or push decisions for executing ego-centric aggregate queries, as explained above.

These two steps follow an iterative process in order to make it possible to adapt the node classes to the recent changes in the graph database. In the rest of this section, we develop these two steps.

3.1 Classification of graph nodes

We distinguish two types of graph nodes: (i) the most active nodes; and (ii) the less active ones. The first set contains the nodes that participate, as sources/targets, in most of the graph edges (edges represent events in the real world). The second set contains the nodes with low participation in the graph growth. In our approach, the ego-centric aggregate queries to which we assign the push (online) policy are implemented on a part of the first set, since their access/update frequencies will be high. The ego-centric aggregate queries to which we assign the pull (on-demand) policy belong to the second class of nodes, where accesses/updates are less frequent.

To specify the first set of nodes, we have chosen to look for the tightly coupled nodes of the graph in the recent time window. We suppose that these nodes represent the main interests of users, i.e. topics that capture popular attention and in which a group of nodes has participated as sources or targets. In the literature, tightly coupled nodes are called a dense subgraph. Dense subgraphs may correspond to emerging stories in social media, hot topics of discussion in a forum, etc. We use dense subgraphs to discover the most active nodes because:

- Dense subgraphs group the nodes that are frequently updated/accessed and that capture the largest part of the interactions.

- Dense subgraphs indicate the trends/interests of users at the current time and probably in the future.

For example, in Figure 1, the densest subgraph is composed of the vertices {a, b} and the edges between them. These two nodes attract the major part of the graph edges, whereas {c} is less used in the interactions between nodes. So, if the vertex a or b hosts an ego-centric
aggregate query, then its result will be precomputed (push decision); otherwise we apply the pull decision to it.

Figure 1. Example of a graph having a densest subgraph.

Dense subgraphs may be identified from directed or undirected underlying graphs. In this approach we consider the directed case, since recent applications of GDB need directed edges, such as social networks, communication networks, email networks, financial transactions, etc. According to [23], where the underlying graph is directed, the problem of identifying dense subgraphs is formulated as follows:

Let G(V, E) be a directed graph, where V is the set of vertices and E is the set of directed edges between vertices of V. To identify the densest subgraph M(V_m, T_m) of G, we search for the subsets S ⊆ V and T ⊆ V such that:

- All the edges from S to T are included in E, i.e. E(S, T) = {e_{i,j} ∈ E, v_i ∈ S, v_j ∈ T};

- The subgraph composed of E(S, T) has the maximum density among all the subgraphs, i.e. max_{S,T ⊆ V} d(S, T), where d(S, T) represents the density. This density of directed graphs was introduced in [24] as follows: d(S, T) = |E(S, T)| / √(|S| · |T|);

- T_m = E(S, T); V_m = S ∪ T.

In the literature, the algorithms for identifying dense subgraphs work either with or without overlap. The approach without overlap discovers a set of dense subgraphs such that the intersection between each pair of dense subgraphs is empty [23], whereas the approach with overlap authorizes intersections between dense subgraphs [25]. In this paper, since our aim is limited to identifying the most active vertices, the overlap of subgraphs is not of interest. So, we use the greedy approximation algorithm [23] to discover dense subgraphs. The greedy approximation algorithm identifies the densest subgraph through several iterations; in each iteration, it removes the minimum-degree vertex according to a certain rule.

3.2 Assignment of push/pull decisions

In our approach, we intervene periodically to decide whether the result of an ego-centric aggregate query should be precomputed (push decision) or not (pull decision). In other words, every time a pre-specified interval of time (called the intervention period p_i) expires, we search for the densest subgraph to classify the nodes used in the last period p_i. The vertices of the densest subgraph issued from the classification task are added to a set N that regroups the vertices of the densest subgraphs of the previous periods. In other words, if M_k is the set of vertices of the densest subgraph of the period p_k, then N = ⋃_{k=0,…,i} M_k. The decision for an ego-centric query q_z implemented on a vertex v_j is made according to the following rules:

If v_j ∈ N then the decision for q_z is push
Else the decision for q_z is pull

N is incrementally constructed because:

- The identification of the densest subgraph from the whole underlying graph is highly complex. In our approach, only the last increment to the graph is used to search for the dense subgraph;

- New active nodes may arise over time and we should adapt the update policy of the ego-centric aggregate queries they contain.

(a) state at t_0  (b) state at t_1  (c) state at t_2

Figure 2. Example of an underlying graph with increments.
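The greedy peeling idea described above can be sketched in a few lines. For brevity this sketch uses the undirected density |E| / |V| rather than the directed d(S, T) of [24], and the example graph is hypothetical:

```python
def greedy_densest_subgraph(edges):
    """Peel off a minimum-degree vertex at each iteration and keep the densest
    intermediate subgraph (density = |E| / |V|). Simplified, undirected
    variant of the greedy approximation used in the paper."""
    nodes = {u for e in edges for u in e}
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    live, m = set(nodes), len(edges)
    best, best_density = set(live), m / len(live)
    while len(live) > 1:
        u = min(live, key=lambda x: len(adj[x]))  # minimum-degree vertex
        m -= len(adj[u])                          # its edges disappear with it
        for v in adj[u]:
            adj[v].discard(u)
        live.remove(u)
        del adj[u]
        density = m / len(live)
        if density > best_density:
            best, best_density = set(live), density
    return best

# A dense core {a, b, x, y} (a 4-clique) plus a weakly attached vertex c.
edges = [("a", "b"), ("a", "x"), ("a", "y"),
         ("b", "x"), ("b", "y"), ("x", "y"), ("a", "c")]
assert greedy_densest_subgraph(edges) == {"a", "b", "x", "y"}
```

The pendant vertex c is peeled off first, which raises the density of what remains; the surviving core is exactly the kind of active-vertex set that receives the push decision.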
For example, in Figure 2 we have three states of the underlying graph at three different times. The dotted edges and vertices represent the changes (the increment) to the underlying graph from t_i to t_{i+1}, where t_{i+1} = t_i + l and l is a specified time interval, i.e. the duration of p_i. Table 1 presents the densest subgraphs of the periods p_0, p_1, p_2 corresponding to the three states of Figure 2, and the evolution of the content of the set N across time. The last three rows of the table present the decisions for three ego-centric aggregate queries q_1, q_2, q_3, implemented respectively on the vertices a, c and f.

Table 1. Example of densest subgraphs and decisions.

Period                           | p_0    | p_1       | p_2
Vertices of the densest subgraph | {a, b} | {a, d}    | {b, f}
N                                | {a, b} | {a, b, d} | {a, b, d, f}
Decision for q_1                 | push   | push      | push
Decision for q_2                 | pull   | pull      | pull
Decision for q_3                 | pull   | pull      | push

3.3 Experimentation

We evaluate our proposed approach using the following questions:

1. Have the nodes considered the most active, and whose query results have been precomputed, really optimized the server load and the response time of ego-centric aggregate queries?

2. How relevant is the choice of the duration of the period used to reclassify nodes and begin a new iteration?

3. Does the size of the dataset have an impact on the result of our approach?

In order to answer these questions, we have evaluated our approach against two datasets: the CollegeMsg temporal network [http://snap.stanford.edu/data/CollegeMsg.html] and the Super User temporal network [http://snap.stanford.edu/data/sx-superuser.html]. The first dataset is comprised of private messages sent on an online social network at the University of California, Irvine. Users could search the network for others and then initiate conversations based on profile information. An edge (u, v, t) means that user u sent a private message to user v at time t. The graph of this dataset contains 1899 nodes and 59835 temporal edges. The time span of this dataset is 193 days. The second dataset is a temporal network of interactions on the stack exchange web site https://superuser.com. There are three different types of interactions, each represented by a directed edge (u, v, t):

User u answered user v's question at time t;
User u commented on user v's question at time t;
User u commented on user v's answer at time t.

This second dataset is comprised of 194085 nodes and more than 1 million edges (1443339 temporal edges). Its time span is 2773 days.

In this experimentation, each vertex v_i, which corresponds to a user, is considered to have an ego-centric aggregate query that summarizes the reactions of neighbors to the messages (emails, questions, answers, comments) of v_i.

In order to answer the first question, we ran our system on the CollegeMsg dataset described above. We measured the update load, which is the time required to update the results of all the ego-centric aggregate queries, and the average query response time. We split the time span into periods; the duration of each period is 7 days, i.e. one week. Table 2 presents the results of this test, in milliseconds, for the three policies of running queries:

Push decision for all the vertices, called the precomputation policy;
Pull decision for all vertices, called the on-demand policy;
Hybrid policy, according to the principle of our approach.

From Table 2, we can see that our approach optimizes the update load of the precomputation policy by more than 46% and the query response time of the on-demand policy by more than 15%. It is obvious that the on-demand policy produces the minimum update load, since query results are computed only when the query is asked. The
precomputation policy produces the optimal response time, since the results of all queries are precomputed. We conclude from this experiment that our approach gives the best compromise between an acceptable quality of service (QoS) and a low update load, i.e. we optimize the query response time using a low update load.

Table 2. Results of the first experiment.

Policy                                                      | Precomputation | On demand | Hybrid
Update load in ms                                           | 2369445        | 1065968   | 1271694
Average query response time in ms                           | 6.08           | 23.95     | 20.21
Rate of optimization of precomputation policy load by our approach      | - | - | 46.32%
Rate of optimization of on-demand policy response time by our approach  | - | - | 15.60%

We explain in Table 3 why this first experiment has produced good results, through the selectivity of the classification algorithm and the rate of participation of the selected vertices.

Table 3. Selectivity and rate of participation.

                                                       | Of all vertices | Of active vertices
Average number by period                               | 334             | 5
Rate of the active vertices with respect to the total  |       5/334 = 1.5%
Average number of edges (events) by period             | 2209            | 806
Rate of participation of active vertices in the edges  |       806/2209 = 36.49%

In order to answer the second question, we executed our system on the CollegeMsg dataset, varying the period duration from 3 to 150 days. We measured how much our approach is capable of optimizing the total load, i.e. the access and update loads, and calculated the optimization rate. The optimization rate is the percentage of decreased/increased time resulting from applying our approach, compared with the two other policies. For example, the optimization rate of the precomputation policy is calculated as follows:

(total load of the precomputation policy − total load of our approach) / total load of the precomputation policy
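Plugging the Table 2 figures into the rate defined above gives a quick arithmetic check (values copied from the table):

```python
def optimization_rate(baseline, ours):
    """(total load of the baseline policy - total load of our approach)
    divided by the total load of the baseline policy."""
    return (baseline - ours) / baseline

update_rate   = optimization_rate(2369445, 1271694)  # update load vs precomputation
response_rate = optimization_rate(23.95, 20.21)      # response time vs on demand

assert 0.46 < update_rate < 0.47    # ~46.3%, as reported in Table 2
assert 0.15 < response_rate < 0.16  # ~15.6%
```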
For short periods (less than 30 days) the selectivity of our classification algorithm is low; the best rate is obtained for a duration of 3 days (≃ 15%). For long periods, the selectivity of our classification algorithm is high: for example, for period durations of 90 and 150 days, the selectivity was 27.45% and 30%, respectively. In addition, the refreshment of the list of selected vertices will be slow, i.e. it may not follow the changes in user interests. Consequently, some vertices that are expected to be active will consume a lot of update time but will never be accessed. For example, the participation of the expected active vertices, for period durations of 90 and 150 days, was 59% and 47%, respectively. Moreover, the opportunity to select new active vertices to reduce the total load is delayed. This makes the precomputation policy, which precomputes the results of all queries, more profitable than our hybrid approach for long periods. On the other hand, this failure in classifying vertices did not prevent our approach from producing a profit compared with the on-demand policy. The main conclusion of this experiment is that, to optimize the two policies, the period duration should not exceed 30 days.

In order to answer the third question, we executed our system on the Super User dataset described above. We measured the optimization rate of our approach compared with the two other policies. Figure 4 presents the results of this experiment.

What we conclude from Figure 4 is that our hybrid approach never produced a profit compared with the on-demand policy, for all the period durations of the test: the rate of optimization with respect to the on-demand policy was around 0%, and in most cases it was negative. The second conclusion is that the precomputation policy was the worst, and we optimized it by nearly 20% in all cases. The main reason that led to these results is the high variety of users and of their interests in the case of a large dataset. In other words, the correlation between users, which is the basis for selecting active vertices and assigning them the push decision, is low in such large datasets. Moreover, even if there is a correlation between a group of users (in our dataset, users may be correlated because they discuss an issue), it does not last in time. This means that the results of the ego-centric aggregate queries that are precomputed (push decision) are rarely or never requested; that is why the precomputation policy was costly, i.e. it precomputes all the results for nothing. However, in the first dataset (CollegeMsg), where users know each other and are in permanent contact, the results of our hybrid approach were good.

[Figure 4 plots the optimization rates (from -10% to 30%) of the precomputation policy and of the on-demand policy for period durations of 90, 60, 30, 7 and 5 days.]

Figure 4. Optimization rates in the Super User dataset.

Our main conclusion from these experiments is that our approach may be used to optimize ego-centric aggregate queries in graph databases with a limited number of users and highly connected vertices across time. However, our approach is not recommended for large graph databases with less correlated nodes across time.

4 CONCLUSION

In this paper, we have proposed an approach to optimize ego-centric aggregate queries in graph databases. Ego-centric aggregate queries allow a graph node, called the consumer, to aggregate events from other nodes, called producers. The most used technique to optimize such queries is the materialization of their results, either in the consumer node or in the producer ones. We have developed a policy that materializes only the results of the ego-centric aggregate queries that are implemented on active nodes. A node is considered active if it is an element of the densest subgraph. For this reason, we begin by discovering the densest subgraph. We have supposed that the
densest subgraph regroups the nodes whose
access/update frequency is high and the [9] Ben Ammar, Ali, "Query Optimization
correlation between them is strong. The results Techniques In Graph Databases," in CoRR
of our experimentation have demonstrated abs/1609.01893 , 2016.
that, in case of small graphs, our approach [10] J. Mondal and . A. Deshpande, "EAGr:
Supporting Continuous Ego-centric Aggregate
produces a low management load with rapport Queries over Large Dynamic Graphs," CoRR/
to the scenarios of precomputing all query abs/1404.6570, 2014.
results or computing query result at query time. [11] A. Silberstein, J. Terrace, B. . F. Cooper and R.
However, for large graphs, computing query Ramakrishnan, "Feeding Frenzy: Selectively
result at query time is the best scenario. The Materializing Users’ Event Feeds," in SIGMOD,
failure of our approach, in case of large graphs, 2010.
comes from the level of correlation between [12] H. Gupta and . I. S. Mumick, "Selection of
Views to Materialize in a Data Warehouse,"
the selected active nodes, which has not held
IEEE Trans. on Knowl. and Data Eng. 17(1), pp.
strong. Consequently, access frequencies are pages 24-43. , Jan. 2005..
decreased and the expected optimization has [13] I. Mami, Z. Bellahsene and R. Coletta, "A
not been realized. Therefore, our future works Declarative Approach to View Selection
will focus on how improving the way of Modeling.," T. Large-Scale Data- and
selecting active nodes in order to be profitable Knowledge-Centered Systems, pp. 115-145,
across time. 2013.
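The densest-subgraph discovery that drives the active-node selection can be computed, for example, with Charikar's greedy peeling heuristic, a standard 2-approximation for average-degree density. The sketch below is illustrative (the graph representation and function names are ours, not the authors' implementation):

```python
from collections import defaultdict

def densest_subgraph(edges):
    """Greedy peeling 2-approximation of the densest subgraph.

    edges: iterable of (u, v) pairs of an undirected graph.
    Returns the vertex set whose induced subgraph maximizes |E| / |V|.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    vertices = set(adj)
    m = sum(len(n) for n in adj.values()) // 2
    best_density, best_set = m / len(vertices), set(vertices)

    while len(vertices) > 1:
        # Peel the minimum-degree vertex and update edge count.
        v = min(vertices, key=lambda x: len(adj[x]))
        m -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
        vertices.discard(v)
        density = m / len(vertices)
        if density > best_density:
            best_density, best_set = density, set(vertices)

    return best_set

# A 4-clique plus a pendant vertex: the clique (density 6/4) is denser
# than the whole graph (density 7/5).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(densest_subgraph(edges))  # → {1, 2, 3, 4}
```

The members of the returned set would then be the nodes treated as active, i.e., the nodes whose query results are materialized.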
A Secure Method for the Global Medical Information in Cloud Storage based on the
Encryption and Data Embedding
1 Soheil Nezaket, Islamic Azad University, IAU, UAE branch, Dubai, UAE
2 Dr. Mohammad V. Malakooti, Islamic Azad University, IAU, UAE branch, Dubai, UAE
3 Dr. Navid Hashemitaba, Islamic Azad University, IAU, Tehran Central Branch, Tehran, Iran
1 soheilnezakat@yahoo.com, 2malakooti@iau.ae 3 nhtaba@yahoo.com
are interconnected to form an information pool that can be accessed over the internet with minimal management effort. Individual users, corporations, and even enterprises can perform data storage and computing on their own private cloud or through third-party cloud facilities rapidly, efficiently, with high speed and acceptable security. The most important features of cloud computing are resource sharing with easy access, high reliability, minimum cost, high speed, and acceptable security. Cloud storage is a hot topic nowadays, as data storage capacities are increasing manifold every year, and it has become a reality that all data centers and organizations should consider.

Cloud computing and storage are fast-growing technologies that have received great attention from scientists, researchers, enterprises, and industrial communities due to their low cost, easy access, high speed, and reliability, which bring data security and portability. Cloud storage enables real-time data access over the network. Each user, with their pre-specified privileges, can have real-time, on-demand access to the online data pool, which is shared among all cloud users but with different data types, accessibility, and performance. Cloud storage acts as a central data store whose main goals are to avoid data replication and duplication, to provide quick, secure, and convenient data exchange, and to prevent massive paper displacement. Cloud storage speeds up all the basic cloud features: collaboration, agility, scalability, and availability. In some cases, cloud storage brings centralized data management in parallel with data security. The most important issue that needs to be considered regarding cloud computing is the uncertainty that exists in its security while the information is transmitted through the cloud facilities or during the computation or storage process [1]. The security risk of storing vital information on public cloud storage or any online facility is high, because the user has no control over the cloud storage and many people can access the cloud simultaneously and breach its security.

Data movement is a big problem for companies. To reduce this concern, cloud storage providers assure data owners that they can keep monitoring their data with the same security and privacy [2]. Although cloud computing has solved the storage problem of corporations and enterprises, which have saved millions of dollars using cloud storage, the security issues are still under consideration. Security threats are a major challenge for cloud users, and many businesses are not comfortable using a cloud storage facility and prefer to use their own infrastructure and storage systems. We have proposed a new method that can be used to scramble and encrypt the information prior to the transmission process. Once the information is encrypted, a hacker can no longer retrieve the original data during transmission or while the data is stored on the cloud network.

Cloud storage brings scalability, economy, and flexibility together, but security in cloud storage still has its own worries. Cloud storage must solve the security problems that occur in cloud computing, including problems related to compliance, privacy, and legal matters [3]. Cloud storage covers most of the security areas considered for shared data. Studies have shown that most cloud storage providers are aware of the extreme importance of data integrity and security, and they have launched new firewalls and software for their network security as well as for data integrity [4].

Study of the Cryptography:
Cryptography provides the latest and most modern security protocols. Cryptography can be used in a correct way and in a wrong way, and it can also end up protecting the wrong things. Transferring a message in a secure way is the goal of cryptography. Cryptography prevents insecurity by encrypting information with a predefined algorithm, so that only those who have knowledge of the algorithm, as well as the secret keys, can decrypt the information and retrieve the original data from the encrypted message. Once the message is encrypted, its meaning is hidden, and without the key no one can find out what it is; the meaning is revealed only when the correct recipient accesses it. The encrypted message is obtained through a reversible algorithm, in which the original data can be recovered by the inverse operation, the decryption algorithm. If the algorithm is not reversible, then it cannot be used for the encryption and decryption processes.

The tension between the security and cryptography communities is now over 20 years old: people who work on and study security often do not find cryptographic tools very useful. A more complex method of information security is lossless data
ISBN: 978-1-941968-45-1 ©2017 SDIWC 69
Proceedings of the Third International Conference on Computing Technology and Information Management (ICCTIM2017), Thessaloniki, Greece, 2017
hiding, or information embedding, in which vital data, such as social security or credit card numbers, are hidden inside the coded information rather than inside the original message. We can apply an optional transformation, such as the DCT, to convert the original message into a coded message. Once the coded message is obtained, the vital information is inserted into its mid-frequency area. This process is based on a mathematical algorithm that can be used to retrieve the data from the decoded information in a reversible manner [5].

The mathematical function used for the encryption is called the cryptographic algorithm. The method used in such an algorithm works with a key, which can be a number or a phrase. Keys are generally very large numerical values and can be stored in encoded form; for example, PGP stores the keys in two files on the hard disk.

Review on biometrics:
Nowadays, the need to identify the users of facilities and services has grown and has become a significant issue, not only for controlling access to a system and service, but also for determining who has which rights. Biometrics is currently being applied all over the world in various ways. These systems are generally computer-based solutions, where the validation procedure runs on the server side or on workstations. Biometrics relies on two major categories, physical and behavioral, as described below.

Physical Modalities:
Biometric systems that can be used for human identification are based on physical elements such as face shape, fingerprints, retina scans, and iris scans. DNA information can also be used in high-security areas such as data centers, military sites, or aerospace launch control. We have not used DNA testing in our research, due to the lack of available information, and have used only face shape, fingerprints, and eyes (retina scans and iris scans) as the most important elements of the physical modalities.

Face Shape:
Face recognition technology is one of the advanced techniques that can be used to measure and match the unique characteristics of human biometric features for identification and authentication. A digital camera connected to face recognition software can capture the detected image and extract its features, which are then matched against the stored features in the database. In addition, we can obtain extra features by measuring the distance between the eyes and mouth, as well as the distance between the nose and mouth. The biometric features collected from the human face can be used in a wide range of potential applications.

Fingerprint:
The fingerprint is made up of a pattern of ridges, valleys, and furrows on the surface of the fingertip. Fingerprints have been used in criminal investigations by law enforcement officers to identify people for more than 100 years. The finger pattern is created in the first stage of the growth of the fetus in the uterus, and it is unique to each human. Fingerprint information can be obtained with a fingerprint reader or fingerprint scanner, a biometric device that identifies a person based on the acquisition and recognition of those unique patterns. Fingerprint scanners are the most popular type of biometric security and are used with a variety of systems on the market for general and mass-market usage.

Eyes:
In spite of the reduced size of this organ, it provides two reliable modalities: the retina and the iris.

3.1 Retina Scans
The retina is one of the modalities that provide better performance results; however, this technique is not well accepted, due to the invasiveness to the eye during the acquisition process.

3.2 Iris Scans:
Irises are among the most important biometric features that can be used for authentication. The patterns in our irises are unique and can hardly be replicated, which means that iris authentication is safe and secure. Iris authentication has mostly been used in immigration, international policing, airport security, and criminal justice systems. Iris scanners provide high-resolution images that can be used to identify a person with high reliability and accuracy inside airports, highly secured data centers, and even in criminal justice systems. The development of high-resolution cameras and scanners, as well as fast and robust software, makes it possible to retrieve high-resolution iris features from the captured image and quickly compare them with the iris features already inside the database. The high-resolution
iris scanner, along with the underlying feature-extraction software, has made this modality cost-effective for commercial applications.

Behavioral modalities:
Behavioral modalities are based on data derived from an action performed by the user.

1- Voice recognition:
Voice recognition is a combination of a physical and a behavioral characteristic. Voice is based on two factors: first, on language and the way of speaking, which shapes the disposition of the mouth for modulation, and second, on physical traits such as the vocal cords or the mouth itself. Voice recognition is the identification of individuals from the characteristics of their voices and is often referred to as voice biometrics. The voice characteristics can be used for both authentication and identification. The characteristics of the voice, or its features, can be obtained by applying the Discrete Fourier Transform and other feature extraction techniques. These features can be saved for speaker identification as well as for voice recognition. The acoustic features obtained from voice analysis reflect both anatomy (the size and shape of the throat and mouth) and learned behavioral patterns.

2- Signature:
Signature systems can observe only the result of the action, i.e. the signature. These systems do not require that the signature be made at the time of user identification; thus, they can be used in forensic biometrics. Signature recognition is a type of behavioral biometric in which users write their signatures on paper or on a digitizing tablet. When the signature is written on paper, it can be digitized using a scanner or camera for offline recognition. In contrast, when users write their signature on a digitizing tablet, online recognition can be applied to analyze the signature based on features such as pressure, the spatial coordinates x and y, azimuth, inclination, and pen position. The most popular signature recognition techniques are dynamic time warping, hidden Markov models, and vector quantization.

3- Gait:
Gait is a person's manner of walking. It refers to the style of stepping, or the locomotion achieved through the movement of the human limbs. The variety of gait patterns is characterized by differences in limb movement patterns, and gait can be recognized by overall velocity, forces, kinetic and potential energy cycles, and changes in the contact with the surface. Although this trait is not very distinctive among users, it provides an extra advantage: it can be measured at a distance, avoiding contact with the user. This modality, therefore, is interesting for surveillance applications.

Security:
High security is needed when we are talking about biometrics, because the biometric features are bound to the individual's features and data. This security must be provided in two phases: the first phase of protection is in storage, where the features are stored, and the second phase is where the information exchange happens.

Global Medical Cloud Storage:
By using our proposed image embedding algorithm, one can easily store the patient's personal information at a specific location in the image. Several data embedding techniques have been proposed by different authors. The one closest to our proposed algorithm is the act of hiding the secret information in the image data during the encoding process; the encoded image, along with the embedded secret information, is then delivered to the decoder [6]. We propose a new approach that can hide data in general images, including personal information, medical information, and any type of data that can be converted to numbers (e.g. ASCII codes). In contrast to traditional data hiding techniques, the embedding process of lossless data hiding methods must be invertible, so that the user is able to completely restore the original image after extracting the embedded secret information [7]. Our proposed model is based on a novel lossless, secure data embedding algorithm in which the vital information can be embedded into the personal or radiology image while preserving the quality of the cover image and maintaining the security of the data to be embedded [8]. In many embedding algorithms, a huge amount of data is embedded into the cover image with high security and high resolution, while the extraction of the embedded data is required to be lossless and robust [9]. We have applied the Hilbert curve, as well as encryption with the iris code, to the original data, and we hide the data inside the mid-frequency areas of the DCT coefficients. The general equation
is the two-dimensional DCT:

F(u,v) = C(u) C(v) Sum_{i=0..M-1} Sum_{j=0..N-1} f(i,j) cos[(2i+1)u pi / (2M)] cos[(2j+1)v pi / (2N)],

with C(u) = sqrt(1/M) for u = 0 and sqrt(2/M) otherwise (and C(v) defined analogously with N), where f(i,j) is the sampled image point, F(u,v) is the corresponding DCT coefficient, and M and N are the numbers of rows and columns of the image, respectively.

GMCS characteristics:
The GMCS (Global Medical Cloud Storage) algorithm consists of two major security levels that can be used to hide the information of many patients inside their images (personal and radiology). The first security level is applied to the data itself and contains multiple stages (conversion, scrambling, encryption):
All the data is first converted to ASCII codes.
The data is then scrambled (Peano-Hilbert).
The last stage is encryption.
The second security level is at the image processing:
The image is transformed by the Discrete Cosine Transform formula.
The image pixels are merged with the data.
Archiving is the last stage.

Once the personal information or the patient's vital data has been obtained from the client file, it is embedded inside the image file using our highly secured algorithm, without losing even one bit of the information. In this paper we focus in particular on embedding the information into the LSB and modifying the DCT values. There are three main concerns, clearly shown in Figure 1.

Figure 1. Block diagram of the DCT along with scrambling of the patient's information and the iris code: the patient data (medical or personal) is gathered into a 2D array, converted to ASCII codes, and scrambled with the Peano-Hilbert curve; in parallel, a 512-digit iris code is generated from an eye scan, reduced with MD-5 or cellular automata, and used to encrypt the data. Merging then (1) adds the converted, scrambled, and encrypted data to the end of the transformed values and (2) encrypts all data once again for double security.
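The merging into the transformed values relies on the DCT being exactly invertible. The following is a direct, unoptimized sketch of the 2-D DCT and its inverse with a round-trip check; the block size and pixel values are illustrative, not taken from the paper:

```python
import math

def dct2(f):
    """Forward 2-D DCT-II of an M x N block (orthonormal form)."""
    M, N = len(f), len(f[0])
    def c(k, L):
        return math.sqrt(1.0 / L) if k == 0 else math.sqrt(2.0 / L)
    return [[c(u, M) * c(v, N) * sum(
                f[i][j]
                * math.cos(math.pi * (2 * i + 1) * u / (2 * M))
                * math.cos(math.pi * (2 * j + 1) * v / (2 * N))
                for i in range(M) for j in range(N))
             for v in range(N)] for u in range(M)]

def idct2(F):
    """Inverse 2-D DCT (DCT-III), recovering the original block."""
    M, N = len(F), len(F[0])
    def c(k, L):
        return math.sqrt(1.0 / L) if k == 0 else math.sqrt(2.0 / L)
    return [[sum(
                c(u, M) * c(v, N) * F[u][v]
                * math.cos(math.pi * (2 * i + 1) * u / (2 * M))
                * math.cos(math.pi * (2 * j + 1) * v / (2 * N))
                for u in range(M) for v in range(N))
             for j in range(N)] for i in range(M)]

# Round-trip on a small block of pixel values: the inverse restores
# the original block up to floating-point error, which is what makes
# data written into the coefficients recoverable.
block = [[52, 55, 61, 66], [70, 61, 64, 73],
         [63, 59, 55, 90], [67, 61, 68, 104]]
restored = idct2(dct2(block))
assert all(abs(restored[i][j] - block[i][j]) < 1e-9
           for i in range(4) for j in range(4))
```

In the embedding scheme, the hidden payload would be written into mid-frequency coefficients of the transformed block before inversion.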
How can we select the DCT coefficients in which to embed the data?
How can we embed the data in each block of the image by using Hilbert space-filling-curve scrambling and encryption?
How can we embed the data into the image in a secure manner, and what technique do we use to enhance the efficiency?

A Hilbert curve (also known as a Hilbert space-filling curve) is a continuous fractal space-filling curve, first described by the German mathematician David Hilbert in 1891. The Hilbert curve in particular appears to have useful characteristics; Figure 2 shows the cells of the curve after 4 subdivision steps [10].
Hilbert rules:
L -> +RF-LFL-FR+
R -> -LF+RFR+FL-
where (-) means turn right and (+) means turn left; equivalently, at each step n:
L(n+1) = +R(n)F - L(n)F L(n) - F R(n)+
R(n+1) = -L(n)F + R(n)F R(n) + F L(n)-
S O H E I L S H
A H R Y A R N E
Z A K A T 2 2 0
2 1 9 9 0 0 0 9
8 9 1 2 1 2 3 4
5 6 7 T E H R A
N T E H R A N S
H E M I R A N I
Table 3: Patients’ Information
83 79 72 69 73 76 83 72
65 72 82 89 65 82 78 69
90 65 75 65 84 50 50 48
50 49 57 57 48 48 48 57
56 57 49 50 49 50 51 52
53 53 52 53 52 53 53 53
78 84 69 72 82 65 78 83
55 56 54 55 56 54 55 56
Table 4: Patient’s Information Converted to ASCII
83 79 72 65 90 520 49 65
75 57 57 65 89 82 72 69
73 65 82 76 83 72 69 78
50 48 57 48 48 50 84 48
49 52 53 50 51 52 53 53
78 83 56 55 54 65 82 56
55 54 69 72 53 50 49 52
53 58 56 53 78 84 56 55
Table 5: Scrambled ASCII code of Patient’s Information
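The exact cell-visiting order behind Table 5 is not fully specified in the text; the sketch below shows one standard way to scramble a square matrix along a Hilbert curve, using the well-known distance-to-coordinates conversion (names here are ours):

```python
def d2xy(n, d):
    """Convert distance d along a Hilbert curve over an n x n grid to (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:            # rotate the quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def scramble(matrix):
    """Reorder an n x n matrix by reading its cells along the Hilbert curve."""
    n = len(matrix)
    flat = [matrix[y][x] for x, y in (d2xy(n, d) for d in range(n * n))]
    return [flat[i * n:(i + 1) * n] for i in range(n)]

# The scramble is a permutation: every value survives, positions change.
m = [[r * 4 + c for c in range(4)] for r in range(4)]
s = scramble(m)
assert sorted(v for row in s for v in row) == list(range(16))
assert s != m
```

Because the visiting order is fixed, the permutation is invertible, which is what allows the patient data to be recovered later.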
Before merging them, we have to encrypt our scrambled data. As mentioned earlier, we use the 512-digit iris code as the encryption key. We do not store the key, because the key is always carried by the patient (the iris). To reduce the size of the iris code, we can use reversible cellular automata or the MD-5 hash algorithm; with either, the 512-digit iris code is reduced to 32 digits. With this 32-digit code we encrypt the data matrix by XORing each ASCII code with a digit of the code.

Merging
The merging step is the most critical step in the Global Medical Cloud Storage: if the matrices are not merged exactly as specified, they cannot be unmerged later when the file is retrieved. Here we have two matrices. One is the converted, transformed blue pixel layer with its Boolean values; the other is the data, which has been converted, scrambled, and encrypted with the iris code. We merge the encrypted ASCII codes into the decimal part. The picture accuracy discussed above is set by the decision we make here about the values of the transformed pixels: in this paper we have decided to keep 5 digits in the decimal part, whereas keeping more digits increases the quality of the restored picture. We add the values of the second matrix at the end of the diagonal values of the first matrix.
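The key reduction and XOR encryption just described can be sketched as follows. MD5 yields 32 hexadecimal digits, each taken here as a 4-bit value; the iris code below is a made-up stand-in, not real biometric data:

```python
import hashlib
from itertools import cycle

def reduce_iris_code(iris_code):
    """Reduce a 512-digit iris code to a 32-digit key via MD5."""
    return hashlib.md5(iris_code.encode()).hexdigest()  # 32 hex digits

def xor_encrypt(ascii_codes, key32):
    """XOR each ASCII code with the value of successive key digits."""
    return [c ^ int(k, 16) for c, k in zip(ascii_codes, cycle(key32))]

iris_code = "1234567890" * 51 + "12"        # stand-in 512-digit code
key = reduce_iris_code(iris_code)           # 32-digit key
data = [ord(ch) for ch in "SOHEILSH"]       # first row of Table 4
cipher = xor_encrypt(data, key)
# XOR with the same key restores the original codes (reversibility).
assert xor_encrypt(cipher, key) == data
```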
Retrieval reverses the pipeline: from the archived file, extract the values, un-merge the data, reduce the iris code using MD5 or cellular-automata coding, decrypt with the iris code, and extract the vital values.

In this paper we proposed new data encryption, compression, and data embedding algorithms that have not been used before in PACS and HIS. We have examined the feasibility of encryption and data embedding in medical cloud storage, which can be used to provide data portability, compression, and security. We also identified several factors concerning security and compression in the cloud storage around the Hospital Information System. We have shown that our model is robust and secure, and that the predetermined goals and objectives have been achieved. Our main goals are as follows:
Being able to store all the medical records independently but integrated.
Making the health records accessible from all over the world, without any data loss.
Making the Electronic Health Record (EHR) highly confidential by applying data compression, encryption, and biometric feature extraction along with data hiding.
By using biometric information along with the 512-digit iris code, we can achieve the highest level of security even when the patient is unconscious.

This study has opened a wide range of research on the electronic health record with security, encryption, and compression while maintaining portability and accessibility. The Global Medical Cloud Storage can be a worldwide solution, but the model needs to be fitted to different regional requirements, because certain rules and regulations restrict some countries from exchanging patients' information and health records. Health-record integration and exchange with a worldwide EHR solution is the next research step, which will empower the GMCS worldwide.
References:
1. S. Ajoudanian, M.R.A., A Novel Data Security Model for Cloud Computing. IACSIT International Journal of Engineering and Technology, 2012, 4.
2. A. Ross, A.K.J., Human Recognition Using Biometrics. Annals of Telecommunications, Feb. 2007, 62: p. 11-35.
3. A. Bessani, M.C., B. Quaresma, F. André, P. Sousa, DEPSKY: Dependable and Secure Storage in a Cloud-of-Clouds.
4. M. Borgmann, T.H., M. Herfert, T. Kunz, M. Richter, U.V., S. Vowe, On the Security of Cloud Storage Services. Fraunhofer Institute for Secure Information Technology, SIT, 2012.
5. S. Ohyama, M. Niimi, K. Yamawaki, and H. Noda, Lossless Data Hiding Using Bit-Depth Embedding for JPEG2000 Compressed Bit-Stream. 2008: p. 151-154.
6. H.-C. Huang, W.-H. Lai, and F.-C. Chang, Content-Adaptive Multi-level Data Embedding for Lossless Data Hiding. 2011: p. 29-32.
7. Wu, J.-H.L.a.a.M.-Y., An Iterative Method for Lossless Data Embedding in BMP Images.
8. M. V. Malakooti and M. Khederzadeh, A Lossless Secure Data Embedding in Image Using DCT and Randomize Key Generator. DICTAP 2012.
9. T. Naheed, I. Usman, and A. Dar, Lossless Data Hiding Using Optimized Interpolation Error Expansion. 2011: p. 281-286.
10. S. Mishra, An Intuitive Method for Hilbert Curve Coding. International Journal of Computing and Corporate Research, 2011, 1.
ABSTRACT

Virtual local area networks (VLANs) have recently become an integral part of the switched LAN solutions of every major LAN equipment vendor. One of the reasons for the attention placed on VLAN functionality now is the rapid deployment of LAN switching that commenced two decades ago. More and more organizations and companies are moving rapidly to networks featuring private-port LAN switching designs. VLANs represent an alternative to routers for broadcast containment, since VLANs allow switches to contain broadcast traffic as well. With switches deployed together with VLANs, each network segment can contain as few as one user, while broadcast domains can be as large as 1,000 users or possibly even more. This paper presents what exactly a VLAN is and how VLAN memberships are implemented in a switched network. Membership in a VLAN can be based on MAC addresses, switch ports, IP addresses, IP multicast addresses, and/or a combination of these aspects. VLANs are cost effective as well as time effective, can decrease network traffic, and provide extra security. VLANs give improved network security: in a VLAN environment with multiple broadcast domains, the network administrator has control over each port and user. A malicious user can no longer simply connect a station to any switch port and sniff the network traffic with a packet sniffer. The network administrator controls each port and whatever resources it is permitted to use, and VLANs confine sensitive traffic originating within an enterprise department to that department.

KEYWORDS

Local Area Network, Virtual LAN, Security, Segmentation, VXLAN.

1 INTRODUCTION

The world today relies heavily on technology to do daily work. Important information is always being transferred quickly within a company, and the advancement of technology has enhanced the way we transfer information. The use of Virtual Local Area Networks (VLANs) is more popular now than ever. But what is a VLAN? VLANs are simple, yet they offer a wide variety of capabilities and options to improve the network. A VLAN is a technology for dividing a physical network into logical networks at Layer 2. Functionally, VLANs allow a network administrator to partition a local network into separate, independent networks. VLANs are often implemented in large networks as well as small ones. In large networks, VLANs are sometimes implemented to combine physically separate LAN segments or LANs into one logical LAN.

In this paper, we will discuss the segmentation of a VLAN, including why VLANs should be considered in a smaller network. We will go through the segmentation process from the LAN to the VLAN, and the configuration of a switch to form separate LAN segments. It is imperative for a company to secure the network from any attackers who try to steal company information. Network security technologies protect the network against the theft and misuse of confidential data and guard against malicious attacks from viruses and worms. Without a security solution, the company risks unauthorized intrusions, network downtime, service disruption, regulatory noncompliance, and even legal action. Companies use VLANs as a way to connect the networks within their company [1][2].

A wide range of topics related to network security has been discussed in the re-
cent research, and a good summary of the network security issues has been given. Good networks should work smoothly with other networks, be transparent to users, provide remote access, and maintain peak performance. Secure networks, on the other hand, protect private data, keep network performance strong, and emphasize data integrity. The two goals are often at odds [3].

In this paper, we will discuss the security of the VLAN in more depth. We will go further into the discussion of the different types of attacks on the VLAN, such as ARP poisoning and VLAN hopping, and of the different ways of preventing such attacks, such as the use of static ARP entries. In Section 2, we will discuss the related work that focuses on the segmentation and security of the VLAN and on the benefits of the VLAN. In Subsection 2.1, we will examine the benefits of using VLANs in terms of scalability, cost, ease of use, integrity, virtual work groups, and security. In Subsection 2.2, we will go more in depth into the segmentation of the VLAN and see how it differs from the traditional LAN. In Subsection 2.3, we will examine VXLAN, how it differs from VLAN, and its benefits. Then, in Subsection 2.4, we will cover more about the security of the

scope. It carries an extensive and wide range of information resources and services, such as the World Wide Web (WWW), electronic mail, telephony, voice and video over IP [6], and file sharing using peer-to-peer networks [7]. The internet is the network of networks. We then look into the different types of networks. A LAN interconnects computers and devices within a limited area such as a residence, campus, school, laboratory, or office. On the other hand, there is the Wide Area Network (WAN), which covers a larger geographic distance than a LAN. A network larger than a LAN and smaller than a WAN can be considered a Metropolitan Area Network (MAN), covering an area from a few blocks of a city to an entire city [8].

If a LAN uses physical administration to create a network, a VLAN is created by using logical networks to divide a physical switch and separate hosts that are not supposed to have access to each other. A VLAN allows the creation of different networks on a single physical switch at the data link layer [9]. To subdivide a network into VLANs, one configures a network switch or router. This allows different departments in the company to have different networks on a single physical switch. This saves cost, as only a single physical switch is needed, and it
VLAN where we will examine the network at- can absolutely simplify network implementa-
tacks on a VLAN and the existing methods that tion and design, as it can be configured through
exist for protection against them. In section 3, software rather than hardware. VLANs allow
we will examine about the future works that network administrator to group hosts together
are in stored for the improvement of VLAN us- even if the hosts are not on the same network
age such as VXLAN. Lastly, we will conclude switch. Without VLANs, grouping hosts will
about the VLAN segmentation and security. need to relocate the nodes or rewire the data
links [10]. VLAN allows the flexibility for
2 RELATED WORK changes should there be a need to reconfigure
In this section, we will be focusing on the seg- the network [11]. However, VLANs are still
mentation and the security of the Virtual Local vulnerable to network attacks.
Area Network (VLAN)[4][5]. From protecting user data against the grow-
The Internet, the largest network, is the global ing number of threats to ensuring the conti-
system of interconnected computer networks nuity of the business, IT Security is an es-
that use the Internet protocol suite (TCP/IP) to sential element in any organization IT infras-
connect billions of computers and electronics tructure. As IT professionals being able to
devices worldwide. It consists of millions of benchmark against our peers, assess a threat, or
private, public, academic, business, and gov- just having some understanding of why a secu-
ernment sector networks of local to global rity project is important to the business is key
[12]. Many techniques to address this issue has Virtual Work Group Another benefits of
been discovered. However, these techniques VLAN is to create virtual workgroups. For in-
will require improvement over the years as net- stance, co-workers from different departments
work attackers are getting better in attacking a that is working on a part of big project or pos-
VLAN [1][2]. sibly same project can send message from one
another without having to be in the same de-
2.1 Benefits of VLAN partment. This can help in reducing traffic in
The benefits of VLAN will be dsicussed in this the network.
section [13][14].
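The port-to-VLAN grouping described above can be illustrated with a toy model (a hypothetical Python sketch, not part of the paper; the port and VLAN numbers are invented): each access port is mapped to one VLAN ID, and a broadcast is delivered only to the other ports in the same VLAN.

```python
# Toy model of VLAN-based broadcast isolation on a single switch.
# Hypothetical illustration; port and VLAN numbers are arbitrary.

class Switch:
    def __init__(self):
        self.port_vlan = {}  # port number -> VLAN ID

    def assign(self, port, vlan):
        """Assign an access port to exactly one VLAN."""
        self.port_vlan[port] = vlan

    def broadcast(self, src_port):
        """Return the ports that receive a broadcast sent on src_port:
        all other ports in the same VLAN (same broadcast domain)."""
        vlan = self.port_vlan[src_port]
        return sorted(p for p, v in self.port_vlan.items()
                      if v == vlan and p != src_port)

sw = Switch()
for port, vlan in [(1, 10), (2, 10), (3, 20), (4, 20), (5, 10)]:
    sw.assign(port, vlan)

print(sw.broadcast(1))  # only the other VLAN-10 ports
print(sw.broadcast(3))  # only the other VLAN-20 port
```

Hosts on ports 3 and 4 never see the broadcast sent from port 1, even though all five ports share one physical switch, which is exactly the traffic reduction and isolation the section describes.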
ment one. Therefore, the native VLAN should be distinct from any other VLAN if it is used.

Voice VLANs: To support Voice over IP (VoIP), a separate VLAN is required, which can be called a voice VLAN. VoIP traffic requires the following:
• Assured bandwidth to provide acceptable voice quality
• Priority of transmission over other types of network traffic
• Routing capability around congested areas of the network
• Low delay (less than 150 ms) across the network
These requirements must be met to support VoIP. Their configuration is beyond the scope of this paper, but it is useful to briefly discuss how a voice VLAN works between a switch, a computer, and a Cisco IP phone.

2.2.3 VLAN Operation

Figure 1. VLAN Operation.

While each switch port can be associated with a separate VLAN, the ports associated with the same VLAN share broadcasts. Once a device enters the network, it automatically assumes the VLAN membership of the port it is attached to. For a host to be part of a given VLAN, it must be given an IP address that belongs to the appropriate subnet.

2.2.4 Identifying VLANs

A port on a switch can be associated with only one VLAN or with all VLANs. A port can be configured manually as an access or trunk port, or the Dynamic Trunking Protocol (DTP) can operate on a per-port basis to set the switch port mode by negotiating with the port on the other end of the link [17]. There are two different types of links in a switched network:

i) Access Ports: An access port normally carries the traffic of only one VLAN. In this case, traffic is sent and received in native format, without VLAN tagging. Anything arriving on an access port is simply considered to belong to the VLAN assigned to that port. A device connected to an access link is not aware of any VLAN membership; it assumes it is part of a single broadcast domain and does not recognize the physical network topology. Access-link devices cannot send data to or receive data from devices outside their VLAN unless routing is configured. A switch port can be made either an access port or a trunk port, but not both, and an access port can be attached to only one VLAN [17].

ii) Trunk Ports: Trunk ports, on the other hand, can carry multiple VLANs at a time. A trunk link is a 100- or 1000-Mbps point-to-point link between two switches, between a switch and a router, or even between a switch and a server, and it carries the traffic of multiple VLANs (from 1 to 4094) at a time. This is a useful capability, because ports can be set up to place a server in two separate broadcast domains at the same time, so users do not have to cross a network-layer (Layer 3) device to log in and access it. Another benefit is that trunk links can carry varying amounts of VLAN data across the link [17].

2.2.5 VLAN Identification Method

VLAN identification is how switches keep track of frames as they travel through a switched network. It defines how switches can identify which frames belong to
which VLANs when there is more than one trunking method.

i) Inter-Switch Link (ISL): Inter-Switch Link (ISL) is a method of tagging VLAN data onto an Ethernet frame. This tagging permits VLANs to be multiplexed over a trunk through an external encapsulation method (ISL); in effect, it allows a switch to recognize the VLAN membership of a frame over the trunked link. By implementing ISL, multiple switches can be interconnected while VLAN information is maintained as traffic travels between switches on trunk links. ISL operates at Layer 2 by encapsulating a data frame with a new header and Cyclic Redundancy Check (CRC). It is used for Fast Ethernet and Gigabit Ethernet links only. ISL trunking is versatile and can be used on a switch port, a router interface, and server interface cards to trunk a server [17].

ii) IEEE 802.1Q: IEEE 802.1Q is a standard method created by the IEEE for frame tagging; it inserts a field into the frame to identify the VLAN. If trunking is needed between a Cisco switched link and a different brand of switch, 802.1Q must be used for the trunk to work. The basic purpose of both the ISL and 802.1Q frame-tagging methods is to provide inter-switch VLAN communication. It should also be noted that any ISL or 802.1Q frame tag is removed if a frame is forwarded out an access link; tagging is used across trunk links only [17].

2.2.6 Routing between VLANs

Routing between VLANs (inter-VLAN routing) is based on forwarding network traffic from one VLAN to another using a router. It allows devices connected to different VLANs to communicate with each other through the router. Nodes in a VLAN stay in their own broadcast domain and can communicate freely within it. VLANs provide network partitioning and traffic separation at Layer 2, the data link layer; therefore, if hosts or any other IP-addressable devices want to communicate between VLANs, a Layer 3 device is needed to provide routing services. Using virtual technology and protocols to segment a network can be useful for controlling broadcast traffic and implementing security boundaries. However, allowing absolutely no access between VLANs is rarely beneficial. To solve this problem, the implementation of inter-VLAN routing is suggested. At the CCNA level, there are two ways to make this happen:
a) connect a unique router port to each VLAN, or
b) create a router on a stick.

2.3 Virtual Extensible Local Area Network (VXLAN)

Traditionally, network segmentation has been provided by VLANs, standardized under the IEEE 802.1Q group. VLANs provide logical segmentation of Layer 2 boundaries, or broadcast domains. However, due to the inefficient use of available network links, rigid requirements on device placement in the data centre network, and limited scalability (a maximum of 4094 VLANs), VLANs have become a limiting factor for IT departments and cloud providers as they build large multitenant data centres.

In this section, we discuss the VXLAN standard, which Cisco, in partnership with other leading vendors, proposed to the IETF as a solution to the data centre network challenges posed by traditional VLAN technology. The VXLAN standard provides the elastic workload placement and the higher scalability of Layer 2 segmentation required by today's larger application demands.

2.3.1 VXLAN Benefits

VXLAN is proposed to provide the same Ethernet Layer 2 network services as VLAN does today, but with greater flexibility and extensibility. VXLAN offers the following benefits compared with VLAN [18]:
1. Flexible placement of multitenant segments throughout the data centre. It provides a way to extend Layer 2 segments over the underlying shared network infrastructure, so tenant workloads can be placed across physical pods in the data centre.
2. Higher scalability, to address more Layer 2 segments. VLANs are limited to a scalability of only 4094 segments, whereas VXLAN uses a 24-bit segment ID known as the VXLAN Network Identifier (VNID), which enables up to 16 million VXLAN segments to coexist in the same administrative domain.
3. Better utilization of the network paths available in the underlying infrastructure. VLAN uses the Spanning Tree Protocol for loop prevention, which leaves half of the network links in a network unused by blocking redundant paths. In contrast, VXLAN packets are transferred through the underlying network based on their Layer 3 headers and can take complete advantage of Layer 3 routing, Equal-Cost Multi-Path (ECMP) routing, and link-aggregation protocols to use all available paths.

2.4 Security

A broad range of topics related to network security has been discussed, and a good summary of the network security problem has been provided. Good networks should operate smoothly with other networks, be transparent to users, provide remote access, and maintain peak performance. On the other hand, secure networks protect confidential information, provide reliability, and ensure data integrity [19]. The two dimensions are often at odds [3]. Network security issues on the VLAN [21][22][23] are very important and should be considered, discussed, and analyzed. Here we consider some of the more common network attacks on a VLAN.

Address Resolution Protocol (ARP) Spoofing Attacks:
ARP spoofing, ARP cache poisoning, or ARP poison routing is a method in which an attacker sends spoofed Address Resolution Protocol (ARP) [24][25] messages onto a local area network. The main aim of the attacker is to associate its MAC address with the IP address of another node, such as the default gateway, so that any traffic meant for that IP address is sent to the attacker instead [20]. As shown in Figure 3, the attacker sends fake packets to the server carrying the same IP address as the original host, claiming to be the genuine host. When the server receives such a packet, it takes the attacker's MAC address to be the intended destination for the packets, since the attacker is using the same IP address as the original host, and it starts sending data to the attacker instead. The attacker can thus receive data intended for the genuine recipient. ARP spoofing may allow an attacker to intercept data frames on a network, modify the traffic, or stop all traffic [26].
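The attack above, and the static-ARP-entry defence the paper discusses next, can be sketched with a toy ARP cache (a hypothetical Python illustration, not part of the paper; the addresses reuse the paper's own examples). Dynamic entries trust the last ARP reply seen, which is the weakness ARP spoofing exploits, while static entries refuse to be overwritten.

```python
# Toy ARP cache showing why spoofed replies succeed against dynamic
# entries but not against static ones. Addresses are illustrative.

class ArpCache:
    def __init__(self):
        self.table = {}  # ip -> (mac, is_static)

    def add_static(self, ip, mac):
        self.table[ip] = (mac, True)

    def learn(self, ip, mac):
        """Process an (unauthenticated) ARP reply. Dynamic entries are
        overwritten blindly, which is what ARP spoofing exploits;
        static entries ignore the update."""
        entry = self.table.get(ip)
        if entry and entry[1]:           # static: refuse to overwrite
            return False
        self.table[ip] = (mac, False)    # dynamic: last reply wins
        return True

    def lookup(self, ip):
        return self.table[ip][0]

cache = ArpCache()
cache.learn("192.168.1.254", "77-d8-e5-f2-43-6d")      # gateway, dynamic
cache.add_static("192.168.1.17", "6c-fc-03-a3-7f-81")  # pinned peer

# Attacker claims both IPs map to its own MAC.
cache.learn("192.168.1.254", "de-ad-be-ef-00-01")  # succeeds (dynamic)
cache.learn("192.168.1.17", "de-ad-be-ef-00-01")   # rejected (static)

print(cache.lookup("192.168.1.254"))  # now the attacker's MAC
print(cache.lookup("192.168.1.17"))   # still the legitimate MAC
```

The dynamic gateway entry is silently poisoned, while the statically pinned host keeps its legitimate MAC, which is the effect the static-ARP-entry countermeasure relies on.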
Figure 3. ARP attack example with the malicious user as the man-in-the-middle.

Figure 5. MAC address flooding.

After adding the static ARP entry, the ARP cache on the computer may look like the following:

C:\>arp -a

Interface: 192.168.1.137 --- 0x50006
  Internet Address    Physical Address     Type
  192.168.1.17        6c-fc-03-a3-7f-81    static
  192.168.1.254       77-d8-e5-f2-43-6d    dynamic

The connection is up with the node at 192.168.1.17 until the MAC address of the target computer changes, which could be because of a change of network card, or because an operation changes the MAC address. When this happens, the invalid ARP entry needs to be deleted with an arp -d command, such as arp -d 192.168.1.17.

A Cisco router keeps ARP entries in the cache for four hours (240 minutes), while Windows workstations keep them for a maximum of only ten minutes. This is common on routers because they tend to spend most of their time dealing with the same nodes. A router is normally configured as the default gateway for the devices on the network, which is why it sees the same nodes communicating with it for most of the day; as long as those nodes keep sending data through the router, they remain in the ARP cache. For a router connected to large network segments, this results in a rather large ARP listing, or ARP table. A large ARP table consumes more of the router's memory, so the caching time Cisco has chosen is a trade-off between the memory consumed by the ARP cache and ARP's need for fresh MAC information. To create a static ARP entry on a router, enter Global Configuration mode, where the arp command looks like this:

Router(config)#arp 192.168.1.17 6cfc.03a3.7f81 arpa

After entering this command, the ARP cache contains the IP-MAC address pair, which will not age out of the cache; this can be seen from the dash in the Age column. Static ARP entries are not usually associated with an interface the way dynamic entries are.

Router#show arp

Protocol  Address        Age(min)  Hardware Addr    Type
Internet  192.168.1.1    -         0050.43bf.7c82   ARPA
Internet  192.168.1.17   -         6cfc.03a3.7f81   ARPA

If the entry is no longer needed, or if it needs to be changed to something else, the no arp command removes the original entry:

Router(config)#no arp 192.168.1.17

If two nodes communicate with each other constantly throughout the day, static ARP entries are worth adding. By adding static ARP entries for both systems in each other's ARP caches, some network overhead, in the form of ARP requests and replies, is removed. Static entries are also good for preventing flooding attacks in which the ARP cache is flooded with random entries: they help determine which entries are allowed and which should be dropped.

Ingress Filtering:
Ingress filtering is a method of ensuring that incoming packets really come from the networks they claim to originate from. The switch is configured with ingress filtering to accept only the allowed packets. A router that deploys ingress filtering checks the source IP field of the IP packets it receives, and if a packet's source address is not in the expected IP address block, the packet is dropped. However, addresses can be faked, and ingress filtering will still accept packets if the attacker uses an address allowed by the filter [20]. Ingress filtering thus confirms whether inbound packets arriving at a network are from the source they claim to be from before entry (ingress) is granted.

It takes advantage of the Layer 3 IP-address filtering capability of a router at the network's edge: traffic with a high probability of being malicious is blocked. At its simplest, ingress filtering involves establishing an access control list containing the IP addresses of permitted sources; conversely, the access control list may also be used to block prohibited source addresses. The following source IP addresses will be blocked by ingress filtering:

• IP addresses already in use, i.e., IP addresses within the internal network. By blocking
the source IP, an attacker is prevented from spoofing an internal IP address to take advantage of a poorly written firewall rule.
• Private IP addresses. Blocking these addresses prevents malicious traffic coming in from an improperly configured Internet-based host or from an attacker's spoofed address.
• Loopback IP addresses. If the loopback address is spoofed, blocking it prevents this type of traffic.
• Multicast addresses [27]. Blocking multicast source addresses helps prevent undesired multicast traffic such as spam.
• Service or management network IP addresses. Blocking these prevents an attacker from using the public Internet to gain unauthorized access to network services running at the application layer and above.

Traffic from specific regions of the world can be whitelisted by the network administrator, or blacklisted to deny a specific region access to the environment. Some free subscription-based services exist for creating access control lists for network border routers.

3 VXLAN ENHANCEMENTS

VXLAN has higher scalability to address more Layer 2 segments. VLANs use a 12-bit VLAN ID to address Layer 2 segments, which limits scalability to only 4094 VLANs. VXLAN uses a 24-bit segment ID known as the VXLAN Network Identifier (VNID), which enables up to 16 million VXLAN segments to coexist in the same administrative domain.

We now discuss the VXLAN encapsulation and packet format. VXLAN is a Layer 2 overlay scheme over a Layer 3 network. It uses MAC Address-in-User Datagram Protocol (MAC-in-UDP) encapsulation to provide a means of extending Layer 2 segments across the data centre network. VXLAN is a solution for supporting a flexible, large-scale multitenant environment over a shared common physical infrastructure. The transport protocol over the physical data centre network is IP plus UDP. VXLAN defines a MAC-in-UDP encapsulation scheme in which the original Layer 2 frame has a VXLAN header added and is then placed in a UDP-IP packet. With this MAC-in-UDP encapsulation, VXLAN tunnels Layer 2 networks over a Layer 3 network. VXLAN introduces an 8-byte VXLAN header that consists of a 24-bit VNID and a few reserved bits. The VXLAN header, together with the original Ethernet frame, goes in the UDP payload. The 24-bit VNID is used to identify Layer 2 segments and to maintain Layer 2 isolation between the segments. With all 24 bits of the VNID, VXLAN can support 16 million LAN segments.

VXLAN uses VXLAN Tunnel EndPoint (VTEP) devices to map tenants' end devices to VXLAN segments and to perform VXLAN encapsulation and de-encapsulation. Each VTEP has two interfaces: one is a switch interface on the local LAN segment that supports local endpoint communication through bridging, and the other is an IP interface to the transport IP network. The IP interface has a unique IP address that identifies the VTEP device on the transport IP network, known as the infrastructure VLAN. The VTEP device uses this IP address to encapsulate Ethernet frames and transmits the encapsulated packets to the transport network through the IP interface. A VTEP device also discovers the remote VTEPs for its VXLAN segments and learns remote MAC address-to-VTEP mappings through its IP interface. The functional components of VTEPs and the logical topology created for Layer 2 connectivity across the transport IP network are shown in Figure 6.

The VXLAN segments are independent of the underlying network topology; conversely, the underlying IP network between VTEPs is independent of the VXLAN overlay. It routes the encapsulated packets based on the outer IP address header, which has the initiating VTEP as the source IP address and the terminating VTEP as the destination IP address [28][29].
Figure 6. The functional components of VTEPs.

4 CONCLUSION

As technology advances and improves at a high rate on a daily basis, more methods of managing the networks behind these technologies are being developed. Since there are millions of networks all around the globe, one special method of managing them is the creation of logical addressing. One way to manage networks is physical addressing, as used in Local Area Networking (LAN). To address the issue of handling many networks, logical addressing was created, whereby components only need to be in the same subnetwork to interact with each other.

With the wide usage of VLANs there are concerns about network security, as well as scalability and network management, which have been discussed in this paper. As sensitive data is broadcast on a network, there are several risks and threats to the network. VLANs can minimize these threats by placing on a VLAN only those users who need access to its data, which reduces the chances of an intruder gaining access. With the implementation of VLANs we can also control broadcast domains, set up firewalls, restrict access, and alert a network manager in case of an attack by an outsider. We conclude that the utilization of virtual local area networks can greatly simplify network management and also provide networks with improved security.

REFERENCES

[1] Cisco, "What Is Network Security? - Cisco Systems", 2016.

[4] Catalyst 4500 Series Switch Cisco IOS Software Configuration Guide, 12.2(25)EW, "Understanding and Configuring VLANs [Cisco Catalyst 4500 Series Switches]", 2016.

[5] Surabhi Surendra Tambe, "Understanding Virtual Local Area Networks", International Journal of Engineering Trends and Technology (IJETT), Vol. 25, No. 4, 174-176, 2015.

[6] Sun, Lingfen, Is-Haka Mkwawa, Emmanuel Jammeh, and Emmanuel Ifeachor, "Guide to voice and video over IP: for fixed and mobile networks", Springer Science & Business Media, 2013.

[7] Tsimonis, G., & Dimitriadis, S., "Brand strategies in social media", Marketing Intelligence & Planning, 32(3), 328-344, 2014.

[8] Van Heddeghem, W., Lambert, S., Lannoo, B., Colle, D., Pickavet, M., & Demeester, P., "Trends in worldwide ICT electricity consumption from 2007 to 2012", Computer Communications, 50, 64-76, 2014.

[9] Altunbasak, Hayriye, and Henry Owen, "An architectural framework for data link layer security with security inter-layering", Proceedings IEEE SoutheastCon, 2007.

[10] Wilkins, S., "Virtual vs. Physical LANs: Device Functionalities", Pearson IT Certification, CCNA Routing and Switching 200-120 Network Simulator, 1st Edition, 2015.

[11] Nishino, H., Nagatomo, Y., Kagawa, T., & Haramaki, T., "A Mobile AR Assistant for Campus Area Network Management", IEEE 2014 Eighth International Conference on Complex, Intelligent and Software Intensive Systems (CISIS), pp. 643-648, 2014.

[12] Shepard, D., "84 Fascinating & Scary IT Security Statistics", 2015 Cyberthreat Defense Report, 2015.

[13] Haq, Syed Ehtesham Ul, and Suraiyaa Parveen, "Implementation of network architecture, its security and performance analysis of VLAN", International Journal of Advanced Research in Computer Science, 8, no. 7, 2017.

[14] Nguyen, Van-Giang, and Young-Han Kim, "SDN-Based Enterprise and Campus Networks: A Case of VLAN Management", Journal of Information Processing Systems, 12, no. 3, 2016.

[15] Derfler, F. J., Freed, L., Douglas, P., Robbins, L., Adams, S. (Illustrator: Troller, M.), "How networks work", Que Corp, 2000.

[16] Henry, Paul David, "Strategic networking: From LAN and WAN to information superhighways", Coriolis Group, 1996.

[17] Pal, G. Prakash, and Gyan Prakash Pal, "Virtual Local Area Network (VLAN)", International Journal of Scientific Research Engineering & Technology (IJSRET), 1: 006-010, 2013.

[18] Kapadia, Shyam, Puto H. Subagio, Yibin Yang, Nilesh Shah, Vipin Jain, and Ashutosh Agrawal, "Implementation of virtual extensible local area network (VXLAN) in top-of-rack switches in a network environment", U.S. Patent 9,565,105, 2017.

[24] "…col", 19th IEEE Annual Computer Security Applications Conference, 2003.

[25] Cisco, "Configuring the Address Resolution Protocol (ARP)", Cisco Content Services Switch Routing and Bridging Configuration, 2004.

[26] Zargar, S. T., Joshi, J., & Tipper, D., "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks", IEEE Communications Surveys & Tutorials, 15(4), 2046-2069, 2013.

[27] Mehdizadeh, A., Abdullah, R. S. A. R., Hashim, F., Ali, B. M., Othman, M., & Khatun, S., "Reliable key management and data delivery method in multicast over wireless IPv6 networks", Wireless Personal Communications, 73(3), 967-991, 2013.

[28] Cisco, "VXLAN Overview: Cisco Nexus 9000 Series Switches", 2016.

[29] Arista, "VXLAN: Scaling Data Center Capacity", 2016.
Finally, the last study is an ontology-based data mining approach using clustering and association rule mining to identify patterns in schools in India [8]. The approach was composed of ontology development and attribute categorization, data mining, and evaluation. The data from Indian schools consisted of 242 attributes, which made it difficult for the data mining tool to extract useful knowledge through random grouping. The ontology developed in that study was based on the school data attributes; it was used in the application to select the input data of the mining algorithms, and it served as a reference in the evaluation and interpretation of results. This approach served as an inspiration for the data mining approach to the technical tactical analysis of judo in the present study.

…data source for the data mining tool in the technical tactical analysis of judo.

Step 1.1. Defining the ontology of judo fight. The ontology of judo fight is recommended, but it is possible to use another ontology.
Step 1.2. Modelling the database. The database must consist of the entities and attributes found in the ontology.
Step 1.3. Creating the physical database.
Step 1.4. Loading the judo fight notations into the database.
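Steps 1.2-1.4 can be sketched as follows. The proof of concept described later in the paper used PostgreSQL; this hypothetical sketch substitutes Python's built-in sqlite3 to stay self-contained, and the table name, column names, and notation rows are invented for illustration only.

```python
# Hypothetical sketch of Steps 1.2-1.4: derive a table from ontology
# entities/attributes, create the physical database, and load fight
# notations. The study used PostgreSQL; sqlite3 is a stand-in here.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fight_action (      -- entity from the judo ontology
        fight_id  INTEGER,           -- attribute: which fight
        judoka    TEXT,              -- attribute: who acted
        action    TEXT,              -- attribute: notated action
        sequence  INTEGER            -- attribute: order within fight
    )""")

# Step 1.4: load a few invented notations (the action names echo the
# kinds of actions mentioned in the paper).
notations = [
    (1, "judoka A", "Trying to Grasp", 1),
    (1, "judoka A", "Attack + Deashi-harai + Left", 2),
    (1, "judoka B", "Defense", 1),
]
conn.executemany("INSERT INTO fight_action VALUES (?, ?, ?, ?)", notations)

# The loaded notations can now feed the data mining tool.
rows = conn.execute("""SELECT action FROM fight_action
                       WHERE judoka = 'judoka A'
                       ORDER BY sequence""").fetchall()
print([r[0] for r in rows])
```

Querying a judoka's actions in notation order, as above, is exactly the shape of input a sequential pattern mining step needs.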
Step 2.2. Defining the data mining method. The data mining method must obey the mapping between performance analysis requirements and data mining methods (Table 1).
Step 2.3. Defining the data mining technique. The technique must correspond to the values defined by the characteristics of interpretability, precision, and flexibility (Table 1).
Step 2.4. Developing the prototype of the data mining tool. The prototype must use the selected data mining technique to process the judo fight notations and return the discovered patterns that satisfy the performance analysis requirement.

4.3 Simulation of Technical Tactical Analysis of Judo

The simulation of the technical tactical analysis of judo comprises the third and last phase of the proposed approach.
Step 3.1. Discovering patterns from judo fight notations. This step should be performed in simulations using the prototype.
Step 3.2. Performing the technical tactical analysis of judo. The analysis must be based on the patterns discovered by the prototype. Currently, the ontology can be used in interpreting the results.
Step 3.3. Evaluating the characteristics of the data mining technique. The interpretability, precision, and flexibility of the technique used should be compared with the values defined for these characteristics (Table 1).

5 PROOF OF CONCEPT

To validate the approach proposed in this study, a proof of concept was performed that involved carrying out the steps defined in each phase. In this section, the most critical aspects defined in the steps of Phases 1 and 2 are highlighted.

The ontology of judo fight, Step 1.1 in Section 3, was defined for this proof of concept. The model of the database tables required by Step 1.2 was created based on the entities and attributes of this ontology. In Step 1.3, PostgreSQL version 9.6.1 for the x86-64 Windows platform was used to create the database. In Step 1.4, fight notation data were imported directly into the database. These notations came from Professor Emerson Franchini's archive [12] and cover 120 judo matches in official contests organized by the International Judo Federation between the Beijing 2008 and London 2012 Olympic Games, comprising the fighting actions of 23 athletes from among the best of the men's -81 kg category. Table 2 shows the number of fights and actions noted per judoka - the names are preserved. These judo matches were fought under the rules applicable at that time, which did not include the modifications introduced in 2017.

Table 2. Number of fights and actions noted by judoka.

Judo fighter            Total fights   Total actions
Brazilian judoka 1            10             753
German judoka 1               10             527
Canadian judoka 1              9             486
Dutch judoka 1                 8             555
French judoka 1                7             428
Montenegrin judoka 1           7             369
South Korean judoka 1          6             417
American judoka 1              6             390
Azerbaijani judoka 1           6             368
Emirati judoka 1               6             299
Moroccan judoka 1              6             263
Belgian judoka 1               5             345
Italian judoka 1               5             268
Argentinian judoka 1           5             199
Ukrainian judoka 1             4             170
Kazakhstani judoka 1           4             167
Japanese judoka 1              3             210
Dutch judoka 2                 3             153
Slovene judoka 1               3              94
Russian judoka 1               3              88
Croatian judoka 1              2             134
British judoka 1               1              60
South Korean judoka 2          1              54

Following Step 2.1, the performance analysis requirements were aimed at real-time decision-making, because the intention was to extract information from judo matches that supports decision-making on the actions/reactions or strategies a judoka needs to win in a competition. However, the term "real-time"
must be relativized, given that during a fight a judoka would not be able to redefine his strategies from an information system. In Step 2.2, mining rules was the data mining method defined, because the discovery of rules or patterns in the fighting actions of a judoka can fulfill the performance analysis requirement. In Step 2.3, the sequential pattern mining technique [13] was defined, because identifying the most frequent sequences of actions performed by a judoka can accomplish the goal defined in Step 2.1.

Step 2.4 comprised the development of the prototype of the data mining tool based on the sequential pattern mining technique. The CM-SPAM algorithm [14], provided by the SPMF library [15] written in Java, was used in the prototype development. The CM-SPAM algorithm was selected because it has a very satisfactory performance compared to other algorithms [16]. Furthermore, its implementation in SPMF allows the use of optional parameters (such as maximum and minimum pattern size and mandatory items), which are not available in the implementations of other algorithms (such as GSP, PrefixSpan, SPADE, SPAM, and CM-SPADE).

Moreover, to assemble the sequence base that enables execution of the sequential pattern mining algorithm, the following adaptations were considered:

Item. Any fight action performed by the same judoka.

Element. A set of consecutive fighting actions carried out by the same judoka between a referee's start-fight command (hajime or yoshi) and a pause-fight command (mate, sonomama or soremade). If a judo player repeats some action since hajime, that action delimits the set of previous actions and marks the beginning of a new set of actions, which lasts until the next referee's command or until another action is repeated. This break in a set of actions is necessary because it is an algorithm premise that an element does not contain repeated items.

Sequence. The set of all actions (or elements) performed by the same judoka in the same fight.

The constraint imposed by the algorithm's premise of not allowing an element to contain repeated items can have implications for the results obtained with data mining. Another important consideration concerns the attack actions. In the judo fight notation database model, an attack action is characterized by the applied judo technique, the attack direction and, eventually, the score assessed by the referee. To allow the techniques and directions of attacks to be considered in the process of finding sequential patterns, the set "attack + technique + direction" is considered a single action. Similarly, an eventual score is also considered a single action, added to the element following the item related to the attack action.

An example of an element could be represented by the sequence of judo fight actions "Left Anteroposterior Approach, Trying to Grasp, Left Sleeve and Right Sleeve, Attack + Deashi-harai + Left, Waza-Ari."

The prototype built in this study was called JudoDataMining and was developed using Java version JavaSE-1.8 (jdk1.8.0_121).

6 RESULTS ANALYSIS

This section highlights the most critical aspects of the implementation of Steps 3.1, 3.2, and 3.3 of Phase 3.

6.1 Discovering Patterns from Judo Fight Notations

In Step 3.1, the equipment used to perform the simulations was a Dell Vostro 3500 with an Intel Core i7 M 680 processor @ 2.80GHz and 6.00GB RAM, running the Microsoft Windows 10 Pro 64-bit operating system. The simulations were carried out through a script program that runs the data mining tool for each selected judo fighter with the following steps:
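As an illustration of the Item/Element/Sequence adaptations, the splitting of a judoka's fight events into elements can be sketched as follows. This is a Python sketch for clarity only (the actual prototype is the Java-based JudoDataMining tool), and the action names are merely examples:

```python
# Hypothetical sketch of the Item/Element/Sequence adaptation: a new element
# starts after a referee pause command, and also whenever an action would
# repeat inside the current element (an element may not contain repeated items).

START_COMMANDS = {"hajime", "yoshi"}
PAUSE_COMMANDS = {"mate", "sonomama", "soremade"}

def build_sequence(events):
    """Split one judoka's fight events into a sequence of elements (item sets)."""
    sequence, current = [], []
    for event in events:
        if event in START_COMMANDS:
            continue  # a start command only resumes the fight
        if event in PAUSE_COMMANDS:
            if current:
                sequence.append(current)
            current = []
        elif event in current:
            # repeated action: close the previous set and start a new one
            sequence.append(current)
            current = [event]
        else:
            current.append(event)
    if current:
        sequence.append(current)
    return sequence

events = ["hajime", "Left Anteroposterior Approach", "Trying to Grasp",
          "Attack + Deashi-harai + Left", "Trying to Grasp", "mate"]
print(build_sequence(events))
```

Here the repeated "Trying to Grasp" closes the first element and opens a second one, exactly the break described above.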
1. Perform a loop running the tool, initially passing the value 100% in the minimum support parameter. For each repetition, the value of the minimum support parameter is decreased by 10% until it reaches 30% or the execution time exceeds 2 seconds.
2. Perform the previous loop running the tool and passing the value 4 in the minimum pattern size parameter. With this parameter, the tool only returns sequential patterns with at least 4 items.
3. Perform the previous loop again, passing the value corresponding to the IPPON item in the required item parameter instead of passing a value in the minimum pattern size parameter. With this parameter, the tool only returns sequential patterns containing IPPON.

There are some significant numbers in this script program.

The first number is the value 4 defined for the minimum pattern size parameter used in the second loop. This is because a judoka must perform at least 3 actions to obtain a score: an approach action, a grip action, and the attack action. Given that a score is also considered an item in the sequential database, the hypothesis is that any sequential pattern involving the achievement of a score must have at least 4 items.

The degradation is probably due to the number of fights of each judoka, the average size of the items and elements contained in each sequence, and the number of possible items in the system.

The last significant number is the 2-second limit for the execution time of the tool. This is because in test runs lasting close to 2 seconds, the number of sequential patterns found was around 300,000. Thus, the limit was imposed so that huge files with too much information would not be generated, which would cost too much to analyze.

6.2 Technical Tactical Analysis of Judo

In Step 3.2, the task of a tool user (or a technical tactical analyst) is to extract, from the sequential patterns found, useful information for making winning strategic decisions. For this task, the tool offers two features. The first is a CSV file containing the columns: found sequential pattern, support value, number of elements, and number of fighting actions (example in Table 3). The second is also a CSV file, which presents some simple statistics extracted from the sequential patterns file, with the columns: item/element, type, size, and number of occurrences (example in Table 4).

Table 3. Sample records from the sequential patterns file of "Brazilian judoka 1".
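The support sweep of the script can be sketched as a simple loop. In the sketch below, `run_mining_tool` is a hypothetical stand-in for the prototype's CM-SPAM call (the real tool is Java/SPMF); it fabricates pattern counts only so that the loop logic is runnable:

```python
import time

def run_mining_tool(min_support, min_pattern_size=None, required_item=None):
    """Stand-in for the CM-SPAM prototype call; returns (patterns, elapsed_s).
    The pattern count follows a made-up growth curve, for illustration only."""
    start = time.perf_counter()
    n_patterns = int(10 ** ((100 - min_support) / 12))  # dummy growth curve
    return n_patterns, time.perf_counter() - start

def support_sweep(**params):
    """Run the tool from 100% support downward in 10% steps, stopping at 30%
    or when a single run exceeds the 2-second limit, as in the script above."""
    results = []
    support = 100
    while support >= 30:
        n_patterns, elapsed = run_mining_tool(support, **params)
        results.append((support, n_patterns))
        if elapsed > 2.0:
            break
        support -= 10
    return results

print(support_sweep())
```

The second and third loops of the script would simply call `support_sweep(min_pattern_size=4)` and `support_sweep(required_item="IPPON")`.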
Table 6 presents the quantities of patterns found and the execution times from the simulations performed for "Brazilian judoka 1".

The analysis of the results supported the conclusion that the sequential pattern mining tool was able to extract useful information for tactical and strategic decision-making in judo.

Table 4. Sample records from the simple statistics file of the sequential patterns of "Brazilian judoka 1".

Item/element | Type | Size | Occurrences
Right anteroposterior approach | Item | 1 | 21
Left collar and right sleeve | Item | 1 | 9
(Right anteroposterior approach, left collar) | Element | 2 | 8

Table 5. Statistics of the sequential patterns found in the proof of concept simulations.

Number of simulations | Minimum number per simulation | Maximum number per simulation | Average
173 | 0 | 5757309 | 168815.8

Table 6. Quantities of patterns found and execution times from the simulations performed for "Brazilian judoka 1".

Support | Min. pattern size | Required item | Number of patterns discovered | Execution time (s)
100% | - | - | 21 | 0
90% | - | - | 325 | 0.016
80% | - | - | 10626 | 0.125
70% | - | - | 1280831 | 9.656
100% | 4 | - | 1 | 0
90% | 4 | - | 249 | 0
80% | 4 | - | 10475 | 0.109
70% | 4 | - | 1280453 | 9.786
100% | - | Ippon | 0 | 0
90% | - | Ippon | 0 | 0
80% | - | Ippon | 0 | 0
70% | - | Ippon | 0 | 0
60% | - | Ippon | 0 | 0.015
50% | - | Ippon | 0 | 0
40% | - | Ippon | 0 | 0
30% | - | Ippon | 41 | 0.016

6.3 Evaluation of the Characteristics of the Data Mining Technique

In Step 3.3, the evaluation should be made by comparing the values of each characteristic (interpretability, precision, and flexibility) of the technique used by the prototype with the values defined in the model proposed by Ofoghi et al. [2]. As defined in Step 2.3, the sequential pattern mining technique should present the following characteristic values:

Interpretability: Very high – owing to the large amount of response information produced.
Precision: High – owing to the high degree of dependence between the results generated and the important decisions taken.
Flexibility: Very low – owing to the very short time limit to use the results.

6.3.1 Interpretability

In this proof of concept, the results are sequential patterns consisting of a series of consecutive elements (or sets of items) representing judo fight actions. Given that these results are not numerical values and do not require interpretation in reading, their interpretation can be considered an easy task. Moreover, a judo specialist is widely familiar with the fighting action names, which further enhances the interpretability. If there is any difficulty or doubt about some fighting action, the ontology of judo fight can be used as support. Thus, it is possible to justify that the interpretability of sequential pattern mining is very high, because having understood what a sequential pattern represents allows for an interpretation without difficulties.

6.3.2 Precision

In this study, the results are sequential patterns, which represent series of fight action sets that occur with a certain frequency. Typically, a sequential pattern reveals information similar to what a judo coach would find by watching a judo match once or more. Therefore, any combat strategy decision will be made based on the information found by the judo coach.

By regarding the judo fight notations as accurate and reliable, it is possible to consider
[2] B. Ofoghi, J. Zeleznikow, C. MacMahon, and M. Raab. Data Mining in Elite Sports: A Review and a Framework. Measurement in Physical Education and Exercise Science, vol. 17, issue 3, pp. 171-186, 2013.
[3] G. Marcon, E. Franchini, J.R. Jardim, and T.L. Barros Neto, "Structural analysis of action and time in sports: Judo." J Quant Anal Sports, vol. 6, issue 4, article 10, 2010.
[4] B. Miarka. Construção, validação e aplicação de um programa computadorizado para análise de ações técnicas e táticas em atletas de Judô: diferenças entre classes, categorias e níveis competitivos. (MSc Thesis, in Portuguese). Universidade de São Paulo, Brazil, 2010.
[5] M. Haghighat, H. Rastegari, and N. Nourafza, "A review of data mining techniques for result prediction in sports." ACSIJ Advances in Computer Science: an International Journal, vol. 2, issue 5, pp. 7-12, 2013.
[14] P. Fournier-Viger, A. Gomariz, M. Campos, and R. Thomas. Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 18, pp. 40-52, 2014.
[15] P. Fournier-Viger, J.C.W. Lin, A. Gomariz, T. Gueniche, A. Soltani, Z. Deng, and H.T. Lam. The SPMF open-source data mining library version 2. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 23, pp. 36-40, 2016.
[16] P. Fournier-Viger, J.C.W. Lin, R.U. Kiran, Y.S. Koh, and R. Thomas. A Survey of Sequential Pattern Mining. Ubiquitous International - Data Science and Pattern Recognition, vol. 1, issue 1, pp. 54-77, 2017.
Jingchang Pan, Gaoyu Jiang, Yude Bu, Zhenping Yi and Xin Tan
School of Mechanical, Electrical & Information Engineering, Shandong University at Weihai,
Weihai 264209, China
pjc@sdu.edu.cn, jgyxyyxy@gmail.com, buyude001@163.com,
yizhenping@sdu.edu.cn, tanxin_0911@163.com
classes correctly, but also maximize the margin. The former guarantees minimal empirical risk (e.g. the training error is 0). It can be seen later that maximizing the margin minimizes the confidence-interval term of the generalization bound, so that the real risk is minimal. When extended to a high-dimensional space, the optimal classification line becomes the optimal hyperplane.

Here C is a specified constant that controls the degree of punishment for misclassified samples, so that a trade-off between the ratio of misclassified samples and the complexity of the algorithm is achieved.

2.3 Support Vector Machine

The final classification discriminant function (w·x + b = 0) of the optimal and generalized linear classification functions discussed above contains only the inner product (x·x_i) of support vectors for
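The role of the penalty constant C can be illustrated with a minimal soft-margin linear classifier trained by subgradient descent on the hinge loss. This is a toy sketch on made-up 2-D data, not the paper's MATLAB/LIBSVM setup:

```python
# Minimal soft-margin linear classifier: the regularization step shrinks w
# (widening the margin), while C scales the penalty applied to samples that
# are misclassified or fall inside the margin. Toy data, illustrative only.

def train_linear_svm(points, labels, C=1.0, lr=0.01, epochs=2000):
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # weight decay: pulls w toward zero, i.e. maximizes the margin
            w[0] -= lr * w[0] / n
            w[1] -= lr * w[1] / n
            if margin < 1:  # hinge term: sample violates the margin
                w[0] += lr * C * y * x1
                w[1] += lr * C * y * x2
                b += lr * C * y
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

points = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(points, labels)
accuracy = sum(predict(w, b, p) == y for p, y in zip(points, labels)) / len(points)
print(accuracy)
```

A larger C tolerates fewer margin violations at the cost of a narrower margin; a smaller C does the opposite, which is exactly the trade-off described above.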
spectra must contain astronomical knowledge [3]. For astronomers, spectral lines are the most important feature of spectra. Transitions of atoms or molecules between different energy levels in a celestial body absorb or emit spectral lines, and different atoms and molecules have their own specific spectral lines. The distribution of the intensity at different wavelengths can be used to describe the radiation characteristics of celestial bodies.

There are all kinds of celestial bodies in the universe. In the field of astronomy, celestial spectra are first divided into normal celestial spectra and emission-line object spectra [4-5]. The spectra of normal celestial bodies include main-sequence stars and normal galaxies, while the emission-line object spectra include starburst galaxies, narrow-line AGNs, broad-line AGNs and quasars. The spectra classification can be expressed as in Fig. 2.

4 OVERALL DESIGN OF SPECTRAL CLASSIFICATION MODEL

4.1 General Design

The work of this paper is to use SVM to implement classifier 1, classifier 2 and classifier 3, through which the celestial spectrum is roughly classified.

The training data and test data used in this paper are derived from SDSS DR7, and the categories involve stars (STAR), late-type stars (STAR_LATE), galaxies (GALAXY), quasars (QSO), and high-redshift quasars (HIZ_QSO). Among them, galaxies (GALAXY) are divided into normal galaxies and emission-line galaxies [6-8].

4.2 Experimental Environment

The experiments mainly run in MATLAB R2010a, and the computer environment is configured as follows.

Processor: Intel Pentium dual-core T2390 @ 1.86GHz notebook processor
Motherboard: Lenovo 1GT30
Memory: 1GB (Hynix DDR2 667MHz)
Main hard drive: Hitachi HTS542525K9SA00 (250GB)
Operating system: Windows 7 Ultimate 32-bit

The LIBSVM software package, developed by Dr. Chih-Jen Lin of National Taiwan University, was used for the experiments. It is a simple SVM software package in common use.

4.3 Classifier Design

Training data is from DR7, downloaded from the SDSS website. There are GIF and FITS files of five classes of celestial spectra defined by Sloan: STAR, STAR_LATE, GALAXY, QSO and HIZ_QSO.
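One plausible reading of the three-classifier design, given the data splits described in this paper, is a two-level cascade: classifier 1 separates normal from emission-line spectra, classifier 2 refines the normal branch into stars vs. normal galaxies, and classifier 3 refines the emission-line branch into quasars vs. emission-line galaxies. The sketch below assumes that structure; the lambda classifiers are hypothetical stand-ins for the trained SVMs, keyed on made-up features purely to exercise the cascade:

```python
# Sketch of a rough-classification cascade (assumed structure, not the
# paper's exact implementation).

def classify_spectrum(spectrum, clf1, clf2, clf3):
    """Route a spectrum through the two-level rough classification."""
    if clf1(spectrum) == "normal":
        return "star" if clf2(spectrum) == "star" else "normal galaxy"
    return "quasar" if clf3(spectrum) == "quasar" else "emission-line galaxy"

# Dummy stand-ins for the three binary SVMs, using invented features.
clf1 = lambda s: "normal" if s["emission"] < 0.5 else "emission-line"
clf2 = lambda s: "star" if s["pointlike"] else "galaxy"
clf3 = lambda s: "quasar" if s["broad_lines"] else "galaxy"

print(classify_spectrum({"emission": 0.1, "pointlike": True, "broad_lines": False},
                        clf1, clf2, clf3))  # prints "star"
```

Replacing each lambda with an SVM decision function would keep the cascade logic unchanged.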
After manually removing both low-confidence and erroneous spectral files, and dividing GALAXY into normal galaxies and emission-line galaxies, a total of 300 FITS files are selected as training data, 50 files from each class. 150 spectra from STAR, STAR_LATE, and the normal galaxies in GALAXY are used as normal celestial spectra, and data from QSO, HIZ_QSO, and the emission-line spectra in GALAXY are used as emission-line celestial spectra. A total of 100 spectra from STAR and STAR_LATE are used as stellar spectral data, and the normal galaxy spectra in GALAXY are used as normal galaxy spectral data. A total of 100 spectra of QSO and HIZ_QSO are used as quasar spectral data, and the emission-line galaxies in GALAXY are used as emission-line galaxy spectral data.

For the FITS files downloaded from DR7, the abscissa of a spectrum is the wavelength, ranging from 3800Å to 9200Å, and the ordinate is the corresponding flux. Because some values are missing at the front wavelengths of some spectra, we use the wavelength range from 3810Å to 9200Å for training.

The two parameters COEFF0 and COEFF1 can be found in the corresponding FITS file, representing the starting wavelength and the step size, respectively. The dimension n, that is, the number of steps, is obtained from the FITS header, and the wavelength is obtained by the following formula:

wave(n) = 10^(COEFF0 + (n-1)·COEFF1)    (4-1)

The corresponding flux is obtained from the table of the main block of the FITS file.

The specific experimental process is as follows.
1) Read the training data used in the experiment so that it can be processed by computers.
2) Generate class labels.
3) Reduce the dimension of the training data.
4) Use the LIBSVM software package to train, select parameters, generate training templates and record the training time.

5 IMPLEMENTATION OF SPECTRAL CLASSIFICATION MODEL

5.1 Description of Test Data

Similar to the training data, the test data is from DR7, downloaded from the SDSS website. There are GIF and FITS files of five classes of celestial spectra defined by Sloan: STAR, STAR_LATE, GALAXY, QSO and HIZ_QSO. After manually removing both low-confidence and erroneous spectral files, and dividing GALAXY into normal galaxies and emission-line galaxies, a total of 300 FITS files are selected as test data, 50 files from each class. 150 spectra from STAR, STAR_LATE, and the normal galaxies in GALAXY are used as normal celestial spectra, and data from QSO, HIZ_QSO, and the emission-line spectra in GALAXY are used as emission-line celestial spectra. A total of 100 spectra from STAR and STAR_LATE are used as stellar spectral data, and the normal galaxy spectra in GALAXY are used as normal galaxy spectral data. A total of 100 spectra of QSO and HIZ_QSO are used as quasar spectral data, and the emission-line galaxies in GALAXY are used as emission-line galaxy spectral data.

The test data and training data are similar in format but differ in content.

5.2 Testing Process
1) Read the training data used in the experiment so that it can be processed by computers.
2) Generate class labels.
3) Use the LIBSVM package to test and record the final accuracy.

5.3 Test Results
1) The results of classifier 1 are shown
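The wavelength grid of Eq. (4-1) can be recomputed directly from the two header keywords; the formula implies that COEFF0 is the base-10 logarithm of the starting wavelength and COEFF1 the logarithmic step. The keyword values below are illustrative, not taken from a real FITS file:

```python
import math

def wavelength_grid(coeff0, coeff1, n_points):
    """wave(n) = 10 ** (COEFF0 + (n - 1) * COEFF1) for n = 1..n_points,
    i.e. a log10-spaced wavelength grid as in Eq. (4-1)."""
    return [10 ** (coeff0 + (n - 1) * coeff1) for n in range(1, n_points + 1)]

coeff0 = math.log10(3800.0)   # grid starting near 3800 Angstroms (example value)
coeff1 = 1e-4                 # example log10 step
grid = wavelength_grid(coeff0, coeff1, 3)
print(grid)
```

The grid is geometric: each step multiplies the wavelength by 10^COEFF1, which is why the dimension n equals the number of steps in the header.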
2) The results of classifier 2 are shown in Table 2.

In the row of dimensionality reduction, E_normal_total_43d is a matrix E obtained from all 150 training spectra, which are reduced to 43 dimensions through the PCA dimension reduction method. Using E_normal_total_43d to reduce the dimension, the accuracy rate is 97.3333% and the training time is 0.1349s. E_star_43d is a matrix E obtained from 50 star spectra which are reduced to 43 dimensions through the PCA dimension reduction method. According to the same principle, E_star_late_43d is obtained from 50 late-type stellar spectra, and E_normal_galaxy_43d is obtained from a total of 50 training spectra.

3) The results of classifier 3 are shown in Table 3.

In the row of dimensionality reduction, E_star_43d is a matrix E obtained from 50 star spectra which are reduced to 43 dimensions
through the PCA dimension reduction method. Using E_star_43d to reduce the dimension, the accuracy rate is 96% and the training time is 0.1056s. E_celestial_total_43d is a matrix E obtained from all 150 training spectra, which are reduced to 43 dimensions through the PCA dimension reduction method. According to the same principle, E_hiz_qso_43d is obtained from 50 high-redshift quasar spectra, and E_celestial_galaxy_43d is obtained from 50 emission-line galaxy spectra.

6. SPECTRAL CLASSIFICATION MODEL ANALYSIS

6.1 Analysis of Experimental Results

1) Using support vector machines to classify celestial spectra gives good results.

It can be seen from the above experimental results that using support vector machines to classify celestial spectra does an excellent job. The accuracy is basically above 95%, and the training time is short.

2) The method of dimension reduction has an effect on the experimental results.

As can be seen from the above experimental results, the results obtained by using different data for dimensionality reduction are quite different, and the corresponding training times differ as well.

In classifier 1, the result obtained from all normal celestial spectra through the PCA dimension reduction method is the best. In classifier 2, the result obtained from star or late-type star spectra through the PCA dimension reduction method is the best. In classifier 3, the result obtained from quasar spectra is the best. From these results, we can conclude that before dividing into two classes, using the E matrix obtained by the PCA reduction of one of the classes to reduce the dimension of the other data gives the best classification result.

Moreover, it can be seen from the experimental results that the training time is longer without reducing the dimension, and the effect is usually better after reducing the dimension.

6.2 Future Work

1) Analyze the misclassified samples and improve the training templates.

Misclassified samples are extracted and analyzed separately to find the causes of the errors, and the training data are improved to improve the training templates, so as to get better classification results.

2) Further analysis will be made on dimension reduction.

From the above experimental results, we can see that the process of dimension reduction has a great impact on the experimental results, so we can work on the dimension reduction to see if we can further improve the training templates. The experiments in this paper only involve reducing to 43 dimensions, and only use the method of PCA dimension reduction. Spectra can also be reduced to 3 dimensions, 2 dimensions and so on, to see whether the experimental results improve or not. Other dimension reduction methods, such as kernel entropy component analysis (KECA), can also be used to reduce the dimensionality, and the results can be compared to see if the training templates can be further improved.

SUMMARY

The content of this study is to use support vector machines to classify celestial spectra, and the feasibility is verified by experiments. After briefly introducing the LAMOST project and the FITS file format, the principle of SVM and the classification of celestial spectra are introduced. The process and result of the
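The PCA reduction used throughout these experiments can be illustrated in miniature. The sketch below computes the principal axes of 2-D points via the closed-form eigendecomposition of the 2x2 covariance matrix; reducing real spectra to 43 dimensions applies the same idea to larger matrices (the data points are invented for the example):

```python
import math

def pca_2d(points):
    """Eigenvalues and leading principal axis of 2-D data, from the
    closed-form eigendecomposition of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # eigenvalues of [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    lam1, lam2 = tr / 2 + disc, tr / 2 - disc
    # eigenvector belonging to the leading eigenvalue
    if abs(sxy) > 1e-12:
        v = (lam1 - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    return (lam1, lam2), (v[0] / norm, v[1] / norm)

# Data stretched along x = y: the leading axis should be close to (1,1)/sqrt(2).
pts = [(-2.0, -1.9), (-1.0, -1.1), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
(lam1, lam2), axis = pca_2d(pts)
print(lam1, lam2, axis)
```

Projecting the centered data onto the leading eigenvectors is exactly what multiplying by the E matrices above does.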
ACKNOWLEDGEMENT

This work was financially supported by the National Natural Science Foundation of China (U1431102).

REFERENCES

[1] Cui, X., et al. The Large Sky Area Multi-Object Fiber Spectroscopic Telescope. Research in Astronomy and Astrophysics, 2012, 12(9): 1197-1242.
[2] Yude Bu, Fuqiang Chen, Jingchang Pan. Stellar spectral subclasses classification based on Isomap and SVM. New Astronomy, 2014, 28: 35-43.
[3] Daniel Thomas, Claudia Maraston, Jonas Johansson. Flux-calibrated Stellar Population Models of Lick Absorption-line Indices with Variable Element Abundance Ratios. Monthly Notices of the Royal
(5)

where ε represents a fixed price-change parameter, which the administrator uses to lower, and the operator to increase, its offer with respect to the preset limit_price_i of the actual round.

Figure 1 Model scheme
Figure 2 Acceptance probability according to retail price

The following equations are used for the channel retail price calculation:
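The ε-step offer adjustment described above can be sketched as a simple bilateral loop. This is an illustrative sketch, not the model's actual equations: the starting prices, the agreement rule (meeting at the midpoint), and the round limit are all assumed values:

```python
# Sketch of an epsilon-step bilateral negotiation: each round the administrator
# lowers its offer by epsilon (never below its limit price) while the operator
# raises its bid by epsilon, until the offers cross or the round limit is hit.

def negotiate(admin_offer, operator_bid, epsilon, limit_price, max_rounds=100):
    """Return the agreed price, or None if no agreement is reached."""
    for _ in range(max_rounds):
        if operator_bid >= admin_offer:
            return (operator_bid + admin_offer) / 2  # deal struck
        admin_offer = max(admin_offer - epsilon, limit_price)
        operator_bid += epsilon
    return None

price = negotiate(admin_offer=10.0, operator_bid=4.0, epsilon=0.5, limit_price=6.0)
print(price)
```

A larger ε closes the gap in fewer rounds but with coarser prices, which is the trade-off such a fixed price-change parameter introduces.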
from a single simulation during a time interval of 5 days, each consisting of 500 virtual time units, the so-called ticks, in NetLogo. In Figure 3 a), the negotiated wholesale prices between the administrator and one of the investors can be observed. As can be seen, the price reacts, with a slight delay, to the varying end-users' activity that follows a Gauss-like curve. This phenomenon was found to be caused by the equations used in MASCEM.

Figure 3 b) shows the frequency channel usage. Investors tend to rent more during peak hours, as expected, and are very successful on the retail market too. The investors' profit is also affected by the end-users' activity, but from Figure 3 c) it is obvious that the highest profits were gained thanks to the delay on the wholesale market. The retail market price, Figure 3 d), does not change its mean value during the simulated days, but its variance does; it is higher during the non-peak hours.

Numerous simulations were executed to determine the impact of the overall network load generated by the end-users. Figure 4 illustrates the stability of the model with different numbers of end-users. The results show that an increasing number of end-users in the network results in higher wholesale prices, which also have a higher variance, due to MASCEM's characteristics of the negotiation process and the fact that the activity of the end-users changes during the day. The mean retail prices also rise but, unlike the wholesale price, more users result in less volatile, and thus more stable, prices.

4 CONCLUSION

executed to verify and illustrate the system's behavior in different conditions. We can conclude that the proposed model with bilateral negotiation can capture the vital characteristics of the cognitive network and performs well regarding spectrum trading, as the simulation results suggest. However, it features a noticeable price lag, as can be seen in the plots, which affects the incomes of the operators. This phenomenon, while natural in MASCEM, would require additional effort to overcome when deployed in a real environment.

REFERENCES

1. Ofcom, "Simplifying Spectrum Trading: Spectrum leasing and other market enhancements", 2011, p. 8.
2. P. Grønsund, R. MacKenzie, P.H. Lehne, K. Briggs, O. Grøndalen, P.E. Engelstad, and T. Tjelta, "Towards spectrum micro-trading," Future Network & Mobile Summit (FutureNetw), 2012.
3. H. Arslan, ed., "Cognitive radio, software defined radio, and adaptive wireless systems," Vol. 10, Berlin: Springer, 2007, p. 16.
4. N. Zhang, H. Liang, N. Cheng, Y. Tang, J.W. Mark, and X.S. Shen, "Dynamic spectrum access in multi-channel cognitive radio networks," IEEE Journal on Selected Areas in Communications, 32.11, 2014, pp. 2053-2064.
5. P. Cramton and L. Doyle, "Open access wireless markets," Telecommunications Policy, 2017, pp. 379-390.
6. I. Praça, C. Ramos, Z. Vale, and M. Cordeiro, "MASCEM: A multiagent system that simulates competitive electricity markets," IEEE Intelligent Systems, 18.6, 2003, pp. 54-60.
7. Z. Vale, T. Pinto, I. Praça, and H. Morais, "MASCEM: electricity markets simulation with strategic agents," IEEE Intelligent Systems, 26.2, 2011, pp. 9-17.
8. J. Pastirčák, L. Friga, V. Kováč, J. Gazda, and V. Gazda, "An Agent-Based Economy Model of Real-Time Secondary Market for the Cognitive Radio Networks," Journal of Network and Systems Management, 12.10, 2015, pp. 1-17.