
Real-Time Sensor Data Warehouse Architecture Using MySQL Database

Jacob Nikom, MIT Lincoln Laboratory
The MySQL Users Conference 2005, 19 April 2005
MIT Lincoln Laboratory

MySQL Users Conf. 04-19-2005

This work was sponsored by the U.S. Army Space and Missile Defense Command under Air Force Contract # F19628-00-C-0002. Opinions, interpretations, recommendations, and conclusions are those of the author and are not necessarily endorsed by the United States Government.

Outline
Introduction
Corporate Information Factory (CIF) and its Data Management Architecture (DMA)
Designing ROCC DMA using CIF architecture
Summary


Outline
Introduction
    Reagan Test Site (RTS) and its instrumentation
    What is RTS Operations Coordination Center (ROCC)?
    ROCC primary operations
    ROCC logical component block diagram
    ROCC modernization
    New ROCC Data Management Architecture
Corporate Information Factory (CIF) and its Data Management Architecture (DMA)
Designing ROCC DMA based on CIF architecture
Summary


Reagan Test Site (RTS) and its Instrumentation


The Reagan Test Site (RTS) range instrumentation
    Multiple RF sensors collecting data in several regions of the electromagnetic spectrum
    Multiple optical sensors collecting object metrics and spectral characteristics
    Telemetry systems capable of tracking multiple targets
    Mobile and fixed ground safety instrumentation

What is RTS Operations Coordination Center (ROCC)?


RTS instrumentation is controlled by the ROCC
[Figure: current DMA. Blocks: Sensors, Network, Data Analysis Algorithms, Decision Algorithms, Displays, Flat Files]

ROCC primary operations
    Executes the prepared scenario for the acquisition session
    Manages the data flow from multiple sensors
    Processes the acquired data
    Provides operator displays to track and predict the path of space objects
    Stores the acquired data for later analysis and reporting
    Facilitates training and simulation of performed activities

What kind of system is ROCC? Feedback control system block diagram


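[Figure: feedback control system block diagram. The comparator subtracts the feedback signal b(t) from the reference input r(t) to form the error signal e(t). In the forward path, the controller turns e(t) into the actuating signal m(t), which drives the plant and produces the controlled variable c(t). In the feedback path, the feedback processor turns c(t) into b(t).]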

Control is the process of making a system variable adhere to a particular value, called the reference value
A system designed to follow a changing reference is called a tracking control system

ROCC is a tracking control system following the predefined reference input
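
The loop in the block diagram can be sketched numerically. The following is a minimal discrete-time illustration, not ROCC code: the proportional controller, the integrator plant, the unity feedback processor, and the gain and step values are all assumptions chosen for the example. Only the signal names r, e, m, c, and b come from the diagram.

```python
def simulate_tracking(reference, gain=2.0, dt=0.1):
    """Simulate a unity-feedback tracking loop over a sequence of reference values.

    Assumed for illustration: a proportional controller and a pure-integrator plant.
    """
    c = 0.0                      # controlled variable c(t), starts at rest
    history = []
    for r in reference:          # reference input r(t)
        b = c                    # feedback processor: unity feedback, b(t) = c(t)
        e = r - b                # comparator: error signal e(t) = r(t) - b(t)
        m = gain * e             # controller: actuating signal m(t)
        c = c + dt * m           # plant: integrates the actuating signal
        history.append(c)
    return history


# A constant reference of 1.0 held for 100 steps
trace = simulate_tracking([1.0] * 100)
print(f"final controlled variable: {trace[-1]:.6f}")
```

With these assumed values the error shrinks by a factor of 0.8 at every step, so the controlled variable converges to the reference.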



Current ROCC DMA Block Diagram


ROCC controls the data acquisition, analysis, and distribution processes
Maximizes the quality of delivered data over a specified time
[Figure: current ROCC DMA block diagram with a single tactical decision control loop. Blocks: Reference Data (Planning); Data Plant (Sensors, Simulation); Output Data (Report: Data analysis); Manual Processing & Analysis (Displays, Voice, Operators); Automatic Real-Time Processing & Analysis (Tracking, Fusion, Classification, Identification, Trajectory Estimation)]


ROCC Modernization
Obsolete system hardware
    Old central processors and boards are no longer supported
    Not enough computational power to perform new tasks
    Old components and interfaces are incompatible with modern technology
Aging system software
    Centralized monolithic architecture
    Flat files for storing data
    Use of old procedural languages
    Alphanumeric displays
Modernized system
    Industry standard 32/64-bit Xeon or Opteron servers
    Software vendor independence: Linux and Java
    Database-based storage
    Distributed architecture using publish/subscribe paradigm
    Graphical user interface for visualization tools
    Targeted dataflow rates: 5 MB/s (sustained), 10 MB/s (peak)
    Data accumulation rate: 1 TB/year
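
The targeted dataflow rate and the yearly accumulation rate can be related with simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which the slides do not specify, and the function name is illustrative.

```python
MB = 10**6   # assumed decimal megabyte
TB = 10**12  # assumed decimal terabyte

SUSTAINED_RATE = 5 * MB      # bytes per second, from the slide
YEARLY_VOLUME = 1 * TB       # bytes per year, from the slide
SECONDS_PER_YEAR = 365 * 24 * 3600


def acquisition_hours(volume_bytes, rate_bytes_per_s):
    """Hours of acquisition at a given rate needed to accumulate a given volume."""
    return volume_bytes / rate_bytes_per_s / 3600


hours = acquisition_hours(YEARLY_VOLUME, SUSTAINED_RATE)
continuous_tb = SUSTAINED_RATE * SECONDS_PER_YEAR / TB

print(f"hours at sustained rate to reach 1 TB: {hours:.1f}")
print(f"TB per year if the sustained rate ran nonstop: {continuous_tb:.2f}")
```

Under these assumptions, 1 TB/year corresponds to roughly 56 hours of acquisition at the sustained rate, which is consistent with data being collected in discrete acquisition sessions rather than continuously.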

New Data Management Architecture


ROCC data management challenges
    Support powerful high-precision instrumentation with almost real-time response
    Support an intensive and costly data collection process involving many human operators with a high level of reliability
    Support data analysis leading to changes in the data acquisition environment
    Be adequate for a wide range of transaction types, from simple real-time record reads and inserts to complex multidimensional analytical queries
    Manage a combination of streaming data with traditional structures
    Provide request management, configuration management, and data quality management capabilities

Search for new data management architecture
    New system represents a conceptual change from the old architecture
    Instrumentation and control software traditionally concentrates on algorithm development and lacks good data architecture
    Need for a framework supporting the analysis-decision-execution paradigm
    Enterprise software is a leading implementer of distributed architecture and the publish/subscribe paradigm
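
The publish/subscribe paradigm decouples data producers from consumers through named topics. The following in-process sketch only illustrates the pattern: the `Broker` class, the topic name, and the message fields are hypothetical and do not describe the ROCC middleware.

```python
class Broker:
    """Minimal in-process publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic, in subscription order
        for callback in self._subscribers.get(topic, []):
            callback(message)


broker = Broker()
stored = []      # stands in for a data store consumer
displayed = []   # stands in for an operator display consumer

broker.subscribe("tracks", stored.append)
broker.subscribe("tracks", displayed.append)

# A hypothetical sensor publishes one track update; both consumers receive it
broker.publish("tracks", {"sensor": "radar-1", "range_km": 1250.0})
print(len(stored), len(displayed))
```

The publisher never refers to its consumers directly, so a new consumer can be added with one more `subscribe` call and no change to the producer.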

Outline
Introduction
Corporate Information Factory (CIF) for Data Management Architecture
    What is Corporate Information Factory (CIF)?
    CIF data flow diagram
    CIF data
    CIF layers
    CIF logical component block diagram
Designing ROCC data management architecture using CIF architecture
Summary


What is Corporate Information Factory (CIF)? *

An information ecosystem is a model of corporate information processing
    CIF is the physical embodiment of the notion of an information ecosystem
CIF consists of the following components
    External world
    Applications
    An integration and transformation layer (I & T layer)
    An operational data store (ODS)
    A data warehouse (DW) with current and historical detailed data
    Data mart(s)
    An internet and intranet
    A metadata repository
    An exploration and data mining warehouse
    Alternative (secondary) storage
    Decision support system (DSS)
The CIF approach could be used for modeling information processing in any organization (forest vs. trees view)

* W.H. Inmon, Claudia Imhoff, Ryan Sousa. Corporate Information Factory, 2nd edition. Wiley, December 18, 2000.

CIF Data Flow Diagram


[Figure: CIF data flow diagram. Stages: data acquisition, primary storage management, data delivery. Layers: Application layer (eComm, ERP, CRM, and BI transaction applications; enterprise transactions; external data from the external world and the Internet), Integration & Transform layer, Operational layer (ODS, operational reports), Warehouse layer (DW with raw detailed data, alternative storage), Report & Analysis layer (data marts for Finance, Sales, Marketing, and Accounting; exploration warehouse; data mining warehouse; statistical analysis; DSS applications; eComm, CRM, ERP, and BI reporting). Also shown: reference data, historical reference data, metadata management]
CRM = Customer Relationship Management, BI = Business Intelligence


CIF Data

External data
    Data is defined outside of the corporation and could contain erroneous, redundant, or unnecessary items
    Data format is defined outside of the corporation; reformatting could be required
Reference data
    Standardizes commonly used names for important and frequently used information
    Allows consistent interpretation of corporate data across different departments
    Could be aliases for common and often-referenced names
Historical data
    Volume of data: the longer the history, the more data
    Usefulness of data: recent data is more useful than older data
    Granularity of data: older data is likely to be used at the summary level
Corporate timeline
    [Figure: corporate timeline of data, from ancient history through recent history and most current activity to the immediate future, with the DW, ODS, and applications placed along it]

CIF Layers
Application layer
    Applications: eComm (tx), ERP (tx), CRM (tx), BI (tx)
    Interacting directly with the end user
    Gathering detailed transaction data
    Auditing and adjusting data
    Editing data
Integration and transformation layer
    Combining non-integrated data from multiple applications
    Transforming external data into corporate data
    Creating appropriate metadata
    Mathematical transformation
    Reformatting and resequencing

CIF Layers (Continued)


Operational layer
    ODS
        Subject-oriented
        Integrated
        Volatile
        Current-valued
        Detailed
        Normalized
Warehouse layer
    Data Warehouse
        Subject-oriented
        Integrated
        Nonvolatile
        Time-variant
        Comprised of both summary and detailed data
        Summary data optimized for Report & Analysis queries
        Normalized and de-normalized data
Report & Analysis layer
    Statistical analysis
    Exploration reporting
    Data mining reporting
    DSS analysis and reporting
    Reports: eComm (rpt), CRM (rpt), ERP (rpt), BI (rpt)
    Data marts: Finance, Sales, Marketing, Accounting

CIF Logical Component Block Diagram


System controls the corporation's resources using real-time and long-term DSS
Maximizes the expected profit of the corporation over a specified time
[Figure: CIF logical component block diagram with two control loops. Tactical decision control loop: Applications, Operational Data Store, Real-time DSS. Strategic decision control loop: Data Warehouse, Long-term DSS. Other blocks: Reference Data (Corporate Goals), Data Plant, Output Data (Corporate Report)]

Outline
Introduction
Corporate Information Factory (CIF) for Data Management Architecture (DMA)
Designing ROCC DMA using CIF architecture
    ROCC data flow diagram
    ROCC data
    ROCC layers
    ROCC logical component block diagram
    Database selection
    Three dangers of database design
Summary


ROCC Data Flow Diagram


[Figure: ROCC data flow diagram. Layers: External world (sensors), Integration & Transform layer (ROCC Interface Boxes, multicast middleware), Operational layer (ODS; DSS applications: Classifier, Best Choice, Smoother, Data Fusion; sensor control data), Warehouse layer (DW, secondary storage), Report & Analysis layer (data marts, data mining warehouse, bias modeling, short-term and long-term reporting & analysis, Quick Look reports). Data categories shown: reference data, operational data, archived data, Planning. Other labels: BET, Post overview, Impact, Space]
RIB = ROCC Interface Box



ROCC Data
External data
    Data is defined outside of ROCC and could contain erroneous, redundant, or unnecessary items
    Data format is defined outside of ROCC; reformatting or object conversion could be required
Reference data
    Comprises geophysics models and constants necessary for external data interpretation
    Comprises common locations, sensor names, and names of computers and programs
    Comprises the user names, passwords, access rights, and privileges
Historical data
    Operational data migrated to the warehouse become historical data
    Detailed historical data are used to produce summarized historical data
    Historical data are only inserted, never updated
Planning data
    Comprises configuration data for the sensors' acquisition procedures
    Comprises ROCC software components configuration data (XML format)
    Comprises data to plan specific activities to acquire space objects' coordinates

ROCC Layers
External world
    Simultaneous output from multiple sensors of up to 10 MB/s
    Capable of producing data autonomously
    Capable of working under the guidance of DSS applications (feedback from DSS applications)
    Produces data as streams with considerable output rates
Integration and transformation layer (RIB)
    Plays a vitally important role in reconciling the incoming external data content and format with the internal data requirements
    Converts incoming data into appropriate Java objects
    Creates necessary metadata
    Mathematical transformation
    Reformatting and resequencing
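
The integration and transformation steps (object conversion, metadata creation, mathematical transformation, reformatting, resequencing) can be sketched as follows. The record layout, field names, and units here are hypothetical, and Python is used for brevity although the slides state that the ROCC converts incoming data into Java objects.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackPoint:
    """Internal representation of one measurement (hypothetical fields)."""
    sensor: str
    time_s: float
    range_m: float
    az_rad: float
    el_rad: float
    source_format: str  # metadata describing where the record came from


def convert_record(line, source_format="csv-v1"):
    """Parse an assumed 'sensor,time_s,range_km,az_deg,el_deg' text record.

    Converts to common physical units (meters, radians) and attaches metadata.
    """
    sensor, time_s, range_km, az_deg, el_deg = line.strip().split(",")
    return TrackPoint(
        sensor=sensor,
        time_s=float(time_s),
        range_m=float(range_km) * 1000.0,      # reformat: km -> m
        az_rad=math.radians(float(az_deg)),    # mathematical transformation
        el_rad=math.radians(float(el_deg)),
        source_format=source_format,
    )


def resequence(points):
    """Order measurements by time, since records may arrive out of order."""
    return sorted(points, key=lambda p: p.time_s)


raw = ["radar-1,2.0,1250.5,180.0,45.0", "radar-1,1.0,1251.0,179.5,44.8"]
points = resequence(convert_record(r) for r in raw)
print(points[0])
```

Keeping the conversion in one place means downstream consumers only ever see one unit system and one object type, whatever format a sensor emits.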


ROCC Layers (continued)


Operational Layer (ODS)
    Subject-oriented
        Focuses on basic transaction processing: inserts and reads the streams of integrated and transformed sensor data (tracks, IDs, control blocks, etc.)
    Integrated
        Physical unification and cohesiveness
        Uniform key structures
        Table naming conventions
        Common physical units and coordinate systems
        Data layouts and metadata
    Volatile
        ODS data could be updated (replaced) as a normal part of processing. After the acquisition session is done, the data are moved to the DW
    Current-valued
        ODS data values relate to the current event (the current acquisition session). For the next mission the ODS will be updated and its content will be moved to the DW (data migration)
    Detailed
        ODS contains the inserted values of the published sensor objects and is not expected to hold summary data
    Normalized
        ODS contains normalized data
Decision Support System Applications
    DSS applications: Classifier, Best Choice, Smoother, Data Fusion
    Make real-time operational decisions such as ID assignment, sensor allocation, etc.


ROCC ODS Specifics


Data streams of objects
    Streams of measurements usually don't have very complex structures
    Object-relational mapping is straightforward and not computationally intensive
Indices
    High-speed insertion does not allow the use of indices
    The relatively small size of the ODS allows working without indices
    Indices do exist in the DW
Real-time DSS feedback
    Could control the sensors, which in turn influences the input data
    Typical analytical applications assume that the data producer does not change during the query
Fault-tolerance (primary and secondary ODS)
    [Figure: ODS (Primary System), ODS (Secondary System), and DW (Archive System) connected over the network]
Additional benefits
    Necessary operations could be performed during the copying
    Two operational databases could be used in parallel right after the acquisition
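
The split between an unindexed ODS and an indexed DW can be sketched with two tables: one that only receives fast inserts during a session, and one that is filled by migration afterwards. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the database servers

# ODS table: no indices, so inserts during acquisition stay cheap
conn.execute("CREATE TABLE ods_track (session_id INTEGER, time_s REAL, range_m REAL)")

# DW table: indexed for later analytical queries
conn.execute("CREATE TABLE dw_track (session_id INTEGER, time_s REAL, range_m REAL)")
conn.execute("CREATE INDEX dw_track_session ON dw_track (session_id, time_s)")


def record_session(session_id, samples):
    """Insert a stream of (time, range) samples into the ODS."""
    conn.executemany(
        "INSERT INTO ods_track VALUES (?, ?, ?)",
        [(session_id, t, r) for t, r in samples],
    )


def migrate_session(session_id):
    """After a session ends, copy its rows into the DW and clear them from the ODS."""
    with conn:  # one transaction: both statements commit or roll back together
        conn.execute(
            "INSERT INTO dw_track SELECT * FROM ods_track WHERE session_id = ?",
            (session_id,),
        )
        conn.execute("DELETE FROM ods_track WHERE session_id = ?", (session_id,))


record_session(1, [(0.0, 1000.0), (1.0, 990.0), (2.0, 980.0)])
migrate_session(1)

ods_rows = conn.execute("SELECT COUNT(*) FROM ods_track").fetchone()[0]
dw_rows = conn.execute("SELECT COUNT(*) FROM dw_track").fetchone()[0]
print(ods_rows, dw_rows)
```

The index cost is paid once during migration rather than on every insert of the live stream.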


ROCC Layers (continued)


Historical (data warehouse) layer
    Subject-oriented
        Organized like the ODS around major ROCC entities, but focused on the modeling and analysis of data
    Integrated
        Data migrated into the DW from the ODS are integrated with the rest of the DW data
    Time-variant
        Every datum in the data warehouse is identified with a particular time period. All summarized data are correct only for the period with which the corresponding detailed data are identified
    Non-volatile
        There are no updates in the warehouse, only inserts. The past cannot be changed, only expanded
    Comprised of both summary and detailed data
        Once detailed data from the ODS are migrated into the DW, they become a part of history. In addition to the detailed historical data, the DW contains summary data, which are pre-calculated to reduce analytical query times
ROCC DW specifics
    ROCC DW does not use a multidimensional data model yet, only summarized tables
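
A pre-calculated summary table can be sketched as an aggregate computed once and stored next to the detailed rows. The example uses Python's `sqlite3` as a stand-in for MySQL; the schema and the chosen aggregates are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_detail (session_id INTEGER, sensor TEXT, snr_db REAL)")
conn.execute(
    "CREATE TABLE dw_summary ("
    "session_id INTEGER, sensor TEXT, n_points INTEGER, mean_snr_db REAL)"
)

detail_rows = [
    (1, "radar-1", 20.0), (1, "radar-1", 22.0),
    (1, "radar-2", 15.0), (1, "radar-2", 17.0), (1, "radar-2", 19.0),
]
conn.executemany("INSERT INTO dw_detail VALUES (?, ?, ?)", detail_rows)


def summarize_session(session_id):
    """Insert one summary row per sensor; existing rows are never updated."""
    conn.execute(
        "INSERT INTO dw_summary "
        "SELECT session_id, sensor, COUNT(*), AVG(snr_db) "
        "FROM dw_detail WHERE session_id = ? GROUP BY session_id, sensor",
        (session_id,),
    )


summarize_session(1)

# Analytical queries read the small summary table instead of scanning the detail
summary = conn.execute(
    "SELECT sensor, n_points, mean_snr_db FROM dw_summary ORDER BY sensor"
).fetchall()
print(summary)
```

Because the warehouse is insert-only, a summary row stays valid for its session once written and never needs to be recomputed.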


ROCC Layers (continued)


Analysis and Reporting layer
    Continuous automatic monitoring of sensor metric performance
    Example: Angle Bias Modeling using ROCC Data Warehouse
What is Angle Bias Modeling?
    Creation of a mathematical model to describe differences between reported and actual antenna pointing positions
[Figure: angle bias modeling in the ROCC. Sensor data collection passes through the RIB; sensor data streams are stored in the ODS, which serves real-time queries; data migration moves them to the Data Warehouse; the Bias Modeling Application runs analytical queries against the Data Warehouse and produces bias model coefficients; the Sensor Control System receives corrected pointing information]



Angle Bias Modeling using ROCC Data Warehouse


Organization of Sensor-Specific Summary Track Data in the Warehouse
    Observed Data: Source, Time, Range, Az, El, Iono Corr, Tropo Corr, SNR
    Truth Data (time-aligned and in sensor coordinates): Range, Az, El
    Residual Data: Delta Rng, Delta Az, SNR

Bias Modeling Application Data Flow
[Figure: bias modeling application data flow (strategic decision control loop). Observed Data and Truth Data from the Data Warehouse feed Generate Residuals; the resulting Residual Data, together with Atmospheric Data and the Bias Model Analytic Equation, feed Multivariate Regression, which produces the Bias Model Coefficients; the coefficients go to a Report, the Data Warehouse, and the Sensor Control System]
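
The residual generation and regression steps in the data flow can be sketched as below. This is an illustration only: the actual bias model equation is not given in the slides, so the linear form, the choice of predictors (sine and cosine of elevation), and all numeric values are assumptions, and the least-squares fit is a plain normal-equations implementation.

```python
import math


def residuals(observed, truth):
    """Residual = observed minus time-aligned truth, element by element."""
    return [o - t for o, t in zip(observed, truth)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_linear_model(rows, y):
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic data: az residual = 0.02 + 0.01*sin(el) - 0.005*cos(el) (assumed form)
el = [math.radians(d) for d in (10, 25, 40, 55, 70, 85)]
truth_az = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
observed_az = [t + 0.02 + 0.01 * math.sin(e) - 0.005 * math.cos(e)
               for t, e in zip(truth_az, el)]

delta_az = residuals(observed_az, truth_az)
design = [[1.0, math.sin(e), math.cos(e)] for e in el]  # intercept + two predictors
coeffs = fit_linear_model(design, delta_az)
print([round(c, 6) for c in coeffs])
```

On this noise-free synthetic data the fit recovers the coefficients used to generate it. With real sensor data the residuals would be noisy, and a numerically more robust solver would be preferable to the normal equations.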


ROCC Logical Component Block Diagram


ROCC controls the RTS resources using tactical and strategic DSS
Maximizes the quality of collected data over a specified time
[Figure: ROCC logical component block diagram with two control loops. Tactical decision control loop: Data Plant (Sensors, Simulation), Operational Data Store, Tactical real-time DSS (Displays, Voice, Operators; Tracking, Fusion, Classification, Identification, Trajectory Estimation). Strategic decision control loop: Data Warehouse, Strategic long-term DSS (Bias Modeling, Sensor Comparison, Operators). Other blocks: Reference Data (Planning), Output Data (Report, Data Analysis)]


Database Selection
The same server should work adequately for both ODS and DW
Deficiency in sophistication could be mitigated by custom programming

Comparison criteria (qualitative values)

Criterion                  MySQL     Oracle    DB2 (IBM)  SQL Server (Microsoft)  PostgreSQL
Speed                      High      High      High       High                    Low
Sophistication             Moderate  High      High       High                    High
Reliability                High      High      High       Moderate                Low
Administration simplicity  High      Low       Low        Moderate                High
Standardization            High      Moderate  Moderate   Moderate                Moderate
Savings                    High      Low       Low        Low                     High


Three dangers of ROCC DMA design


Balkanization of data
    Different groups of data have different designs
    Attempts to fit data definitions into the requirements of the existing tool
    In the long run, increases the maintenance cost
Dialectism
    Usage of specific database dialects
    Deviation from existing SQL standards
    Locks the user in with a specific vendor
Dirty repository design
    Part of the data is stored in the database, another (closely related) part is stored in the file system
    Duplication of data between database and file system
    Increases the maintenance cost


Outline

Introduction
Corporate Information Factory (CIF) for Data Management Architecture
Designing ROCC data management architecture using CIF Architecture
Summary


Summary

Modernization of the ROCC calls for a new type of data management architecture
    New high-performance hardware
    Significant increase in generated and managed volumes of data
    Introduction of new services
CIF satisfies the requirements
    Designed to support large-scale information systems
    Effectively manages different types of information queries
    Provides flexibility in distributing data between multiple producers and consumers
ODS and DW represent two types of repositories for information requests
    ODS supports near real-time storage requirements and targeted, low-granularity queries
    DW is used for complex queries against summary-level data
ODS and DW are parts of different control loops
    ODS provides information for tactical decisions about near real-time data acquisition
    DW delivers feedback for strategic decisions leading to system improvements
MySQL is a good fit for ODS and DW databases
    Good performance for fast queries in ODS
    Capable of storing large amounts of data in DW
    Simple installation and licensing allow many independent servers to run inside one system, being used as ODS, DW, data marts, etc.
    Excellent Java support allows seamless integration with the rest of the software
