Information Workflow
Information Lifecycle Example
Overview
RF
Corsello Research Foundation
Stages of Information
All data must pass through several stages in its lifecycle
Creation or Collection
Processing or Review (QA/QC)
Use and Re-use
Disposal
The main stage is use and re-use, which may result in data creation
Analysis results are new data creations
Intermediate data may be directly disposed
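The stages above can be sketched as a small state model. This is only an illustration: the slide lists the stages but not the allowed moves between them, so the `TRANSITIONS` table and the `can_move` helper are assumptions.

```python
from enum import Enum


class Stage(Enum):
    CREATION = "creation or collection"
    PROCESSING = "processing or review (QA/QC)"
    USE = "use and re-use"
    DISPOSAL = "disposal"


# Assumed transition set (not given in the slides).
TRANSITIONS = {
    # Intermediate data may be disposed of without review.
    Stage.CREATION: {Stage.PROCESSING, Stage.DISPOSAL},
    # Reviewed data is either used or discarded.
    Stage.PROCESSING: {Stage.USE, Stage.DISPOSAL},
    # Use may produce analysis results, which are new creations.
    Stage.USE: {Stage.CREATION, Stage.DISPOSAL},
    Stage.DISPOSAL: set(),
}


def can_move(current: Stage, target: Stage) -> bool:
    """Return True if data may move from `current` to `target`."""
    return target in TRANSITIONS[current]
```

Under these assumptions, `can_move(Stage.USE, Stage.CREATION)` is true because use can generate new data, while nothing leaves `Stage.DISPOSAL`.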
Information Stores
Data is always stored in some place and format
A data store indicates the place in which data is stored
A relational database (e.g. Oracle, SQL Server) is a type of store
A network share is another type of store
A data format indicates the internal structure or encoding of data within the store
A PDF file is a type of format
A table in a database defines its own format
Information Formats
Once data is within a specific format, that format will govern how it may be used
Images (e.g. JPG) can be displayed, but data within them (e.g. text) is lost
Documents (e.g. PDF) can be read and indexed for searching, but numeric data within them (e.g. tables) is lost for use
Databases allow data to be transformed into other formats as needed
Unless the database contains pre-formatted content (e.g. a PDF file stored in Oracle)
Data format is critical for data exchange and understanding within computer programs
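To illustrate why structured storage keeps data reusable, the sketch below takes rows as they might come back from a database query and writes them out in two exchange formats. The row contents and the helper names `to_csv` and `to_json` are hypothetical; only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical rows, as they might come back from a database query.
rows = [
    {"site": "A1", "parameter": "temperature_c", "value": 14.2},
    {"site": "A1", "parameter": "ph", "value": 7.8},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text, using the first row's keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records to JSON text."""
    return json.dumps(records, indent=2)
```

The same transformation is not possible once the values exist only as pixels in an image or as laid-out text in a PDF.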
Information Flows
Each of the stages of information will tie to a different human workflow
Work Flows
Each work area or topic (e.g. water quality, fish counts) will require its own work flow governing data management
Several work areas may end up using the same work flow, but that should be circumstantial rather than planned
Work Flows
Workflows
Introduction
Planning
For any new project, planning must occur to determine what is to be collected
For each dataset to be collected, there must be a data standard produced for handling that type of data
Data standards should be common across all projects for a given data type
Data stores may need to be created to support each data type
Planning Phase
Standardizing
If data standardization is needed, the process involves several aspects:
Identify existing standards
US Federal / US DoD
Industry / International
Resulting standards and model become the norm for the organization
Should be considered a mostly one-time cost
Standardization Phase
Creation / Collection
Data gets created in several ways:
Field collection
Real-time telemetry (e.g. SCADA)
Analysis results
Report generation
Each form of data creation may need a workflow
Field collection is of primary concern due to two factors:
Human involvement and the potential for mistakes or blunders
Time component (re-collected data is time shifted)
Creation Phase
Processing / QA/QC
Once created, most data must be evaluated for quality and correctness
If data is not acceptable, there must be a rejection capability
Accepted data is processed, transformed and loaded into the final information store(s)
This may be a manual or automated process
COTS tools may be ideal for this (e.g. Aquarius for water quality)
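An automated version of the accept / reject step might look like the sketch below. The range check, the record layout and the names `QcResult` and `review` are assumptions for illustration; real QA/QC rules would come from the data standard for each data type.

```python
from dataclasses import dataclass, field


@dataclass
class QcResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def review(records, low, high):
    """Split records into accepted and rejected using a simple range check."""
    result = QcResult()
    for record in records:
        value = record.get("value")
        if value is None:
            result.rejected.append((record, "missing value"))
        elif not (low <= value <= high):
            result.rejected.append((record, "out of range"))
        else:
            result.accepted.append(record)
    return result
```

Accepted records would then be transformed and loaded into the final store, while rejected records keep a reason so the collector can be told why.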
Processing Phase
Results are then treated as newly created data back in the creation phase
Use Phase
Relations exist
Source - Output
Source - Source
Implementation
Implementing a data strategy is an ongoing process
These cycles will be developed in concert with the data producers and users
Tools will be bought / built as needed to facilitate effective information management
There will be several implementation efforts that will span projects
Sites
Concepts
Overview
All field data is collected at a geographic location
If a given location is well-known and used repeatedly, the management of that location provides value
A site is a name that represents a location where sampling may take place
All data collected at a specific site can be related back to the site at which it was collected
Querying the site will yield the data collected
Location
While sites are intuitively a spatial location, locations do not necessarily need to be stored for the site to be useful
If, however, the site location is stored (e.g. as a GIS point):
Querying by location will yield all sites in that location
Querying by basin (with the basin stored spatially) will return all sites within that basin
In addition to the spatial nature of the site itself, a site boundary can be stored indicating the uncertainty of collections
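A query by basin can be sketched with a plain point-in-polygon test. In practice a GIS or spatial database would do this work; the pure-Python ray-casting routine below is a stand-in, and the function names and the planar (x, y) coordinates are assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sites_in_basin(sites, basin):
    """Return names of sites whose stored point falls inside the basin."""
    return [
        name for name, (x, y) in sites.items()
        if point_in_polygon(x, y, basin)
    ]
```

Points that fall exactly on the basin boundary are not treated specially in this sketch.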
Site
A site will be defined as a named place where some form of collection or sampling may be performed
A site may have a spatial location (GIS shape) associated with it
Support for points, lines (transect) and areas (netting area)
A second spatial location is allowed (area only) for sampling approximation
Sampling Events
Any activity of collecting data is a sampling event
A sampling event that occurs at a defined site may be entered and associated with that site
The organization that performs the sampling is associated with the event (e.g. contractor company)
The project for which the sampling is conducted (the paying project) is associated with the event
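A sampling event record tying these associations together might look like the sketch below. The field names and the `events_for_site` helper are assumptions; the site reference is optional because only events at a defined site are associated with one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SamplingEvent:
    description: str
    organization: str           # who performed the sampling
    project: str                # project the sampling was conducted for
    site: Optional[str] = None  # site name, if at a defined site


def events_for_site(events: List[SamplingEvent], site: str) -> List[SamplingEvent]:
    """Return the events recorded against the named site."""
    return [event for event in events if event.site == site]
```

Querying by site then reduces to filtering events on the site reference.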
Projects
Any organized work effort may be a project
All formal work projects are projects
Projects can be nested (sub-projects)
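Nested projects form a tree. A minimal sketch, assuming each project has at most one parent (the slides do not say either way) and using hypothetical names:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Project:
    name: str
    # Excluded from repr and comparison to avoid parent/child recursion.
    parent: Optional["Project"] = field(default=None, repr=False, compare=False)
    children: List["Project"] = field(default_factory=list)

    def add_subproject(self, name: str) -> "Project":
        """Create a sub-project under this project and return it."""
        child = Project(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Return the names from the top-level project down to this one."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))
```

For example, a sub-project "Monitoring" added under "Restoration" reports its path as "Restoration / Monitoring".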
Organizations
An organization is a group of people working toward a common goal
Any named group is an organization
Organizations will be managed to track project teams (external agencies) and personnel alignments
Contactable Party
Organizations and people can be contacted, and therefore have contact information (email, phone, address)
A contactable party will be defined as any of the below:
A person
An organization
A point of contact
A job role within an organization which may be filled by a person
Point of Contact
A point of contact is a simple abstraction of a job or position
Allows for a front-desk type of entity that is intermittently filled by various people
Each project has a default point of contact
This allows the actual person filling the role to change more easily
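The contactable-party idea maps naturally onto a small class hierarchy. The sketch below is one possible shape, with hypothetical class and field names; the point of contact carries its own contact details, so the person filling it can change without touching anything that refers to the role.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactableParty:
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    address: Optional[str] = None


@dataclass
class Person(ContactableParty):
    pass


@dataclass
class Organization(ContactableParty):
    pass


@dataclass
class PointOfContact(ContactableParty):
    # The role persists; the person filling it may change over time.
    filled_by: Optional[Person] = None
```

A project would reference the `PointOfContact`, not the `Person`, as its default contact.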
Data Catalog
There is a current effort to build a card catalog for data within the district
The previous slides provide data elements that will be used in the data catalog and as a mechanism for mining all data across the organization
The data catalog will become the inventory of data with links to the actual data cataloged
Current Model
Development
The data catalog concept is still notional at this time
A data model for each of the items in the previous slide is being developed
Once modeled, these data elements may be collected, even without a tool in place for the data
Implementation of the tools will be based upon a prioritization
Need for capability
Cost to develop
Time to develop
Dependency on other capability
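One way to combine the four prioritization factors is to order tools so that dependencies come first and, among tools that are ready to build, rank by a simple score. Everything specific here is an assumption for illustration: the example tools, the 1 to 5 ratings and the scoring formula are not from the slides. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical tools rated 1 (low) to 5 (high) on each factor.
tools = {
    "site registry": {"need": 5, "cost": 2, "time": 2, "depends_on": []},
    "data catalog": {"need": 4, "cost": 3, "time": 3, "depends_on": ["site registry"]},
    "qc dashboard": {"need": 2, "cost": 1, "time": 1, "depends_on": []},
}


def score(tool):
    """Assumed score: higher need and lower cost and time rank first."""
    return tool["need"] / (tool["cost"] + tool["time"])


def build_order(tools):
    """Order tools so dependencies come first, best score within each batch."""
    graph = {name: tool["depends_on"] for name, tool in tools.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    order = []
    while sorter.is_active():
        ready = sorted(
            sorter.get_ready(), key=lambda n: score(tools[n]), reverse=True
        )
        for name in ready:
            order.append(name)
            sorter.done(name)
    return order
```

With the sample ratings, the data catalog is built last because it depends on the site registry, even though its need is rated higher than that of the QC dashboard.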
Water Quality
Workflows
Overview
Water quality data is commonly collected across many projects
Collections are commonly performed by contractors
Collections use several types of collection methods
Fixed telemetry
Collections
All collections are performed at some form of site
Instantaneous grab samples may not have well-known sites, but are still sampling events
Sampling events may be continuous such as telemetry and fixed time series
Multi-level samplings occur at a single site (sites have no Z axis)
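Because sites have no Z axis, the depth of a multi-level sampling has to live on the sample rather than on the site. A minimal sketch with made-up readings and hypothetical field names:

```python
# Hypothetical readings: one site, several depths.
samples = [
    {"site": "Reservoir Dock", "depth_m": 10.0, "temperature_c": 11.7},
    {"site": "Reservoir Dock", "depth_m": 0.5, "temperature_c": 18.4},
    {"site": "Reservoir Dock", "depth_m": 5.0, "temperature_c": 15.1},
    {"site": "Upper Creek", "depth_m": 0.2, "temperature_c": 12.9},
]


def profile(samples, site):
    """Return (depth, temperature) pairs for one site, shallowest first."""
    return sorted(
        (s["depth_m"], s["temperature_c"]) for s in samples if s["site"] == site
    )
```

All three reservoir readings share the same site, and the depth column alone distinguishes the levels.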
Workflow
The water quality workflow will incorporate several aspects
Many forms of field collection activities
Many forms of data submission (telemetry)
QA/QC processes for evaluation
Aquarius tool integrated into data process
Multiple database insertions
Aquarius database
CWMS database
Others?
A partial flow for field collection activities (non-telemetry) has been developed
Flowchart
Currently Notional
Questions