
The DHS Ontology Case

Presentation by OntologyStream Inc.
Paul Stephen Prueitt, PhD
19 March 2005

Ontology Tutorial 1, copyright Paul S. Prueitt, 2005

Ontology technology exists in the market


However, practical problems block progress in designing and implementing DHS-wide ontology technology. Business process mythology, like Earned Value program management, does not focus on the right questions, and maturity models for software development have precluded most innovations that come from outside the relational database paradigm.

These practical problems are also partially a consequence of:

1. Some specific institutional behaviors related to traditional program management.
2. Confusion caused by long-term, heavily funded Artificial Intelligence marketing activities.

As a general proposition, throughout the federal government, quality metrics are not guiding management decisions supporting:

1) Quick transitions from database-centered information technology to XML-based Semantic Web technology.
2) Transitions from XML repositories to ontology-mediated Total Information Awareness, with Informational Transparency, in Secure Channels.

DHS Ontology

1) World-wide Trade Data
2) Investigation Targeting: Risks, Threats and Vulnerabilities
3) Policy Enforcement
4) Emerging Semantic Web Standards

Diagram from Prueitt, 2003

Seven-step AIPM on an RDBMS

(Diagram from Prueitt, 2003)

The first two steps are missing: the measurement/instrumentation task is not complete.

First two steps in the AIPM

Measurement is part of the semantic extraction task, and is accomplished with a known set of techniques:

- Latent semantic technologies
- Some form of n-gram measurement, with encoding into hash tables or an internal ontology representation (CCM and NdCore, perhaps AeroText and Convera's process ontology (?), Orbs, Hilbert encoding, CoreTalk/Cubicon); a minimal sketch of the n-gram idea follows this list
- Stochastic and neural/genetic architectures
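The n-gram technique is the simplest of these to illustrate. Below is a minimal sketch in Python of measuring character n-grams into a hash table; it stands in for, and does not reproduce, the encodings used by CCM, NdCore, or the other named products.

```python
from collections import Counter

def ngram_measure(text: str, n: int = 3) -> Counter:
    """Count character n-grams: a simple substructural measurement of text."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

# Findings that share substructure will share keys in the resulting hash table.
profile = ngram_measure("shipment manifest anomaly")
print(profile.most_common(5))
```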

One model for semantic extraction explicitly focuses on the first two aspects of the AIPM, i.e. instrumentation/measurement and data-encoding/interpretation.

The Actionable Intelligence Process Model has an action-perception event cycle. Stratified ontology supports the use of this cycle to produce knowledge of attacks, and anticipatory mechanisms based on the measurement of substructural categorical invariance.
Workflow and process ontology is available as a basis for encoding knowledge of anticipatory response mechanisms.

Categorical invariance is measured, using for example Orbs (Ontology referential bases) or CCM (Contiguous Connection Model) encoding, and organized as a resource for RDF triples using some lite OWL/OIL dialect.
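As an illustration of the encoding step, the sketch below uses the rdflib library to express one measured invariant as RDF triples. The vocabulary (the namespace and property names) is invented for the example; nothing here reflects the actual Orb or CCM formats.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical vocabulary; the slides do not fix one.
EX = Namespace("http://example.org/invariants#")

g = Graph()
g.bind("ex", EX)

# One measured categorical invariant, organized as RDF triples.
g.add((EX.inv_001, RDF.type, EX.CategoricalInvariant))
g.add((EX.inv_001, EX.observedIn, Literal("transaction stream A")))
g.add((EX.inv_001, EX.frequency, Literal(42)))

print(g.serialize(format="turtle"))
```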

Distributed ontology management is already available in some military activities

distributed Ontology Management Architecture (d-OMA)


March 10, 2005
Version 3.0

Part of a series on the nature of machine encoding of sets of concepts

Higher-level Ontologies as part of the d-OMA

[Diagram: benchmark and transition views. In each, higher-level ontologies sit above localized ontologies within a DAML-type agency structure; each localized ontology serves several communities, with links to Web services and internal R&D.]

Ontology users have the following roles:

- Knowledge engineers
- Information mediators (e.g. ontology librarians)
- Community workers (e.g. analysts)
- Intelligence targeting analysts

A reconciliation process is required between ontology services.

Real-time acquisition of new concepts, and modifications to existing concepts, are made via a piece of software called an Ontology Use Agent.

Ontology Use Agents have various types of functions, expressed in accordance with roles. This design draws on mature scholarship from evolutionary psychology research communities.

Human and system interaction with a common Ontology Structure
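The slides give no implementation of an Ontology Use Agent. The class below is a minimal sketch of the role idea only: the role names are invented, and a plain dictionary stands in for the local ontology.

```python
class OntologyUseAgent:
    """Minimal sketch: agent functions vary with the role of the human it serves."""

    def __init__(self, role: str, local_ontology: dict[str, str]):
        self.role = role
        self.ontology = local_ontology          # concept name -> definition
        self.pending: list[str] = []            # concepts awaiting reconciliation

    def propose_concept(self, name: str, definition: str) -> None:
        # Knowledge engineers reify directly; other roles queue for reconciliation.
        if self.role == "knowledge engineer":
            self.ontology[name] = definition
        else:
            self.pending.append(name)

agent = OntologyUseAgent("analyst", {})
agent.propose_concept("dual-use cargo", "cargo with both civilian and military uses")
print(agent.pending)  # ['dual-use cargo']
```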


Ontology Presentation
Part of a series on the nature of machine encoding of sets of concepts

First principles

First, and before all else, a computer-based ontology is a set:

{ concepts }

In the natural, physical sciences, an ontology is the causes of those things that one can interact with in Nature. Physical science informs us that a formative process is involved in the expression of natural ontology in natural settings.

The set of our personal and private concepts is often thought to be the cause of how we reason as humans. This metaphor is operational in many people's understanding of the nature and usefulness of machine-encoded ontology. But this metaphor can also mislead us!

Extensive literatures indicate that the Artificial Intelligence (AI) mythology has led many to believe that the reasoning of an ontology might be the same as the reasoning of a human in all cases.
This inference is not proper, because its truthfulness has not been demonstrated by natural science, and perhaps cannot be demonstrated no matter what the level of government funding for AI. AI is discounted in Tim Berners-Lee's notion of the Semantic Web.

Tim Berners-Lee's Notion of the Semantic Web

The point being made here is that the notion of inference is very different depending on whether one is talking about the human side or the machine side of the Semantic Web.
One consequence of acknowledging this difference is to elevate the work of the authors of the OASIS standards, in particular Topic Maps. In Topic Maps we have an open-world assumption and very little emphasis on computational inference. Human knowledge is represented in a shallow form, and visualization is used to manage this representation. Computationally, Topic Maps AND OWL ontologies work together with XML repositories.

First principles

{ concepts }

Let us use only set theory to consider Tim Berners-Lee's notion of the Semantic Web.

Let C = { concepts } and let B be a subset of C.


[Diagram: some software applies a subsetting function to C; B is a subset of C.]

The subsetting function might be an ask-tell interaction between two ontologies. Human and/or machine computation creates a well-formed query.
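In code, the set-theoretic picture is almost trivial. The sketch below treats the subsetting function as a predicate applied to C; the ask-tell machinery between two real ontologies is of course far richer.

```python
def subsetting(C: set[str], query) -> set[str]:
    """Apply a well-formed query (here just a predicate) to C, returning B."""
    return {concept for concept in C if query(concept)}

C = {"cargo", "manifest", "port", "risk", "tariff"}
B = subsetting(C, lambda concept: concept in ("port", "risk"))
print(B <= C)  # True: B is a subset of C
```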

First principles

{ concepts }

At this point we have various possible consequences:

1) The small(er) B ontology might simply be viewed by a human, with actions taken outside of the information system.
2) The smaller ontology might be used in several different ways:
   a) Imported into a reasoner, to be considered in conjunction with various different data sources.
   b) Used to send messages to other ontologies via a distributed Ontology Management System.

First principles

Situational Ontology

The knowledge repository acts as a perceptual ground in a figure-ground relationship. The ontology subsetting function has pulled part, but not all, of the background into a situational focus. This first principle is consistent with perceptual physics, and thus is informed by natural science.

(The following slides are from OntologyStream's Ontology Presentation VII: General Background)

Extending Ontology Structure over legacy information systems

Part of a series on the nature of machine encoding of sets of concepts

Presentation Contents

- Functional specs
- Ontology Use: Start-up Use Model
- Model: Steady State Ontology System
- Components: Framework for Query Entity
- Data Access: Steady State Ontology System
- Building ontology from data flow
- Using Ontology
- Ontology Generating Framework
- The inverse problem: generating synthetic data
- Finding data regularity in its natural contextualization

Functional specs

1. Human-centric: must be human (individual) centric in design and function.
2. Support data retrieval: must act as a data retrieval mechanism.
3. Event structure measurement: must assist in the definition of data acquisition requirements on an ongoing basis.
4. Interactive: must support multiple interacting ontologies.
5. Real time: must aid in real-time problem solving and in the long-term management of specific sets of concepts.

Note: Ontology-mediated knowledge systems have operational properties that are quite different from those of traditional relational database information systems. These five functional specs have been reviewed by a small community of professional ontologists and deemed correct for knowledge systems.

Ontology Use: Start-up Use Model I

[Diagram: a transaction process sends entity updates to, and receives inferences about, the Knowledge Base; query entities feed one of the Ontology Reasoners.]

Start-up Use Case, Step 1: start-up

At start-up, resources are loaded into the reasoner:
1) Some part of the knowledge base is expressed as reasoner-compliant RDF.
2) Some inference logic is loaded into the reasoner.
3) Query entities are loaded to reflect the interests of analyst(s).
4) Transaction processes are occurring in real time.
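The slides name no particular reasoner. As one concrete stand-in, the sketch below loads RDF with rdflib, applies the OWL RL rule set from the owlrl package as the "inference logic", and runs one query entity as SPARQL. The file name and the vocabulary are illustrative, not from the source.

```python
from rdflib import Graph
from owlrl import DeductiveClosure, OWLRL_Semantics

# Step 1: part of the knowledge base, as reasoner-compliant RDF (file is illustrative).
g = Graph()
g.parse("knowledge_base.ttl", format="turtle")

# Step 2: inference logic, here the OWL RL rule set, expands the graph.
DeductiveClosure(OWLRL_Semantics).expand(g)

# Step 3: a query entity reflecting an analyst's interest (vocabulary invented).
query_entity = """
SELECT ?shipment WHERE { ?shipment a <http://example.org/kb#HighRiskShipment> . }
"""
for row in g.query(query_entity):
    print(row.shipment)
```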

Start-up Use Model II

[Diagram: transaction process, entity updates, inferences, Knowledge Base, query entities, Reasoner, instance data.]

Start-up Use Case, Step 2: start-up

Since instance data is much larger than the knowledge base (by two or more orders of magnitude), the instance data is managed in a separate start-up process.

Model: Steady State Ontology System

[Diagram: a transaction process exchanges entity updates and inferences with the Reasoner, which works with an Inference Manager, a Query Manager, a Data Access Manager, and an Ontology Manager over the Knowledge Base, the query entities, and the instance data.]

Instance data may be remote or local. Local data is on the same network as the knowledge base.

Components: Framework, from the User Visualization point of view

[Diagram: transaction processes send updates through the Query Manager and receive inferences through the Inference Manager; the Reasoner connects data, query entities, the Data Access Manager, the Knowledge Base, and the Ontology Manager.]

Use Case: Steady State Ontology System

Data Access: Steady State Ontology System

[Diagram: instance data flows over pipes and threads, through a data object, to the Data Access Manager and the Knowledge Base.]

The RDF knowledge base, using the OWL standard **, is a set of concepts expressed as a set of triples:

{ < subject, predicate, object > }

The data is either XML or a data structure such as one would have as a C construct. The Data Access Manager must manage the mapping between local data stores (sometimes having millions of elements) and the set of concepts. The remote data may have many persistence forms, and will be accessed via a data object.

** We use RDF and OWL as standards to create minimal and well-formed knowledge inference capabilities.
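The mapping the Data Access Manager maintains can be sketched as follows, again with rdflib and an invented vocabulary: one local record becomes a small cluster of triples, with empty slots skipped.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/kb#")    # hypothetical vocabulary

def map_record(g: Graph, record: dict) -> None:
    """Sketch of the Data Access Manager's mapping: local record -> triples."""
    node = EX[f"shipment_{record['id']}"]
    g.add((node, RDF.type, EX.Shipment))
    for slot, value in record.items():
        if slot != "id" and value is not None:   # empty slots contribute nothing
            g.add((node, EX[slot], Literal(value)))

g = Graph()
map_record(g, {"id": 7, "origin": "Sweden", "port": "Seattle", "weight": None})
```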

Framework from the Query Entity point of view

[Diagram: the same Framework as above, viewed from the query entities.]

A Query Entity is itself a type of light ontology. It develops knowledge about the user(s) and about the query process.
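As a sketch of that idea, the class below accumulates a history of what a user asks about. Everything here (the names, the dictionary standing in for the knowledge base) is invented for illustration.

```python
from collections import Counter

class QueryEntity:
    """A light ontology that develops knowledge about users and the query process."""

    def __init__(self, analyst: str):
        self.analyst = analyst
        self.topic_history = Counter()       # what this analyst keeps asking about

    def ask(self, knowledge_base: dict, topic: str) -> list:
        self.topic_history[topic] += 1       # the entity learns the query process
        return knowledge_base.get(topic, [])

qe = QueryEntity("port analyst")
qe.ask({"risk": ["finding 12", "finding 40"]}, "risk")
print(qe.topic_history)  # Counter({'risk': 1})
```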

Framework from the Knowledge Management point of view

[Diagram: transaction processes, the Query Manager, and the query entities.]

Real-time analysis is supported through the development and use of query entities. These entities have regular structure and are managed within a Framework.

Building ontology from data flow

[Diagram: transaction processes feed the model.]

A model of the "causes" of transaction data: the model is based on, and grounded in, the concept of "occurrence in the real world", or "event". Associated with each event we may have a "measurement". So we have a set of events { e(i) }, where i is a counter. Some of the fields MAY not be used, and later the number of fields in any "findings data flow" may increase or decrease without us caring at all.

Objective: we convert a stream of event measurements into a transaction ontology, and create auxiliary processes that will use a general risk ontology, an ontology about process optimization, and other utility ontologies.
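A minimal event structure consistent with this, in Python, keeps the measurement as an open dictionary, so fields can appear or disappear without code changes:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """One event e(i); some fields MAY be unused, and fields may come and go."""
    i: int                                           # the counter
    measurement: dict = field(default_factory=dict)  # slot name -> observed value

events = [Event(i=1, measurement={"port": "Seattle"}),
          Event(i=2, measurement={"port": "LA", "origin": "Sweden"})]
```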

Using Ontology

[Diagram: transaction processes, Knowledge Base, Query Manager, data, query entities, instance data.]

Consider a set of events { e(i) }, where i is a counter. Each event will have a weakly structured component w (free-form text) and a structured component s. So we use the notation

e = w/s, or e(i) = w(i)/s(i).
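The decomposition e(i) = w(i)/s(i) can be sketched directly. The names of the free-text slots below are assumptions, since the slides do not list them.

```python
def split_event(record: dict) -> tuple[str, dict]:
    """Split one event record into w (free-form text) and s (structured fields)."""
    text_slots = ("comments", "finding")      # assumed names of free-text fields
    w = " ".join(str(record[k]) for k in text_slots if record.get(k))
    s = {k: v for k, v in record.items() if k not in text_slots}
    return w, s

w, s = split_event({"port": "Seattle", "comments": "unusual manifest weight"})
print(w)  # 'unusual manifest weight'
print(s)  # {'port': 'Seattle'}
```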

Ontology Generating Framework

[Diagram: the transaction processes emit { e(i) }; semantic extraction yields { w(i) } and discrete analysis yields { s(i) }. Notation: e(i) = w(i)/s(i).]

An event is measured by filling in slots in a data entry form, and by typing natural language into comments fields in these entry forms. Semantic extraction is performed using one of several tools, or tools in combination with each other.

Observation: given real data, one can categorize the set of events according to the nature of the information filled in. Discrete analysis is mostly the manual development of ontology through the study of natural categories in how the data is understood by humans.

Finding data regularity in its natural contextualization

For each event we may have zero or more free-text fields. Suppose we concatenate these into one text unit, and perhaps also develop some metadata (in some way) that will help contextualize the semantic extraction process. We label this unit w(i). Free-form text is weakly structured. The set { w(i) } is a text corpus that we would like to associate, via semantic extraction, with several ontologies. Each association is made as an exploratory activity with specific goals.

Each s(i) is a record from a single table. Suppose there are 120 columns. Each column has values, sometimes empty. Fix the counter at *, and let s(*, j), j = 1, ..., 120, be the columns; we can also call these columns slots. Now, for each s(*, j), list the values that are observed to be in that column; these values are the possible fillers for the associated slot.

Regularity in data flow is caused by the events occurring in the external world. Thus the instances of specific data in data records provide to the knowledge system a measurement of the events in the external world.
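Listing the observed fillers per slot is a one-pass scan. A minimal sketch over the structured components { s(i) }:

```python
from collections import defaultdict

def slot_fillers(structured: list[dict]) -> dict[str, set]:
    """For each column s(*, j), collect the values observed to fill that slot."""
    fillers: dict[str, set] = defaultdict(set)
    for s in structured:
        for column, value in s.items():
            if value is not None:             # empty cells contribute no filler
                fillers[column].add(value)
    return dict(fillers)

print(slot_fillers([{"port": "Seattle", "origin": "Sweden"},
                    {"port": "LA", "origin": None}]))
# {'port': {'Seattle', 'LA'}, 'origin': {'Sweden'}}
```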

Data regularity in context: patterns and invariance


General framework constructions

- XML and related standards community; open and protected standards (CoreTalk, Rosetta Net, ebXML)
- .NET component management
- J2EE frameworks
- Spring


autonomous extraction (Hilbert encoding of data into keyless hash tables, SLIP Shallow Link analysis Iterated Scatter-Gather an Parcelation) Core, AeroText (etc) semantic extraction using process knowledge and text input

The role of community

1) A community of practice provides a reification process that is human-centric (geographical-community / functional-community).
2) Each community may have a locally instantiated OWL ontology with topic map visualization.
   a) Consistency and completeness are managed locally, as a property of the human individuals acting within a community, and of a locally persisted history of informational transactions with his/her agent.
   b) Individual agents can query for and acquire XML-based Web Services, negotiate with other agents, and create reconciliation processes involving designated agencies.
3) Knowledge engineers act on behalf of policy makers to reify new concepts, delete concepts, and instantiate reconciliation containers.

Establishing coherence in natural knowledge representation

1) Coherence is a physical phenomenon, seen in lasers.
   a) Brain function depends critically on electromagnetic coherence.
   b) Incoherence (e.g. non-rationality) and incompleteness are two separate issues, each contrasting with the issue of coherence.
   c) Mathematics, and therefore computer science and logic, have completeness and consistency issues that are well established and well studied.
2) Logical coherence is sometimes treated as consistency in logic.
   a) One may think that one has logical consistency, and yet this property of the full system was lost at the last transaction within the ontology.
   b) The act of attempting to find a complete representation of information organization is sometimes called reification, and reification efforts work against efforts to retain consistency.
3) Human usability is often a function of a proper balance between logic and agility.

Understanding multiple viewpoints

1) Logical consistency and single-mindedness are operationally linked together in most current-generation decision support systems. Database schema legacy issues: schema servers, FEA (Federal Enterprise Architecture) standards, schema-independent data systems.
2) Observation: human cognitive capabilities have far more agility than current-generation decision support systems.
3) The topic map standard (2001; Newcomb and Biezunski) was specifically developed to address the non-agility of Semantic Web standards based on OWL and RDF. (Ontopia, Steve Pepper)
4) Combining XML repositories, OWL, distributed agent architectures, and Topic Maps is expressed as Stratified Ontology Management.

Detection of novelty

Scenario: a targeting and search analyst at the Port of Seattle is only partially aware of why she feels uncomfortable about some characteristic of a shipment from Sweden. The feeling is expressed in a handwritten finding and fed into a document management repository for findings. A targeting and search analyst at the Port of LA expresses a fact about a similar shipment without knowing of her colleague's sense of discomfort.

1) Conceptual roll-up techniques are used on a minute-by-minute basis to create a viewable topic map over occurrences of concepts expressed in findings.
2) Link analysis connects an alert about uncertainty in the Seattle finding to the fact from LA, producing new information related to a known vulnerability and attack pattern.
3) New knowledge forms are propagated into OWL-instantiated ontology and rules, and viewed using Topic Maps.
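The roll-up and link steps can be caricatured in a few lines. Real conceptual roll-up and link analysis are much richer, and the finding identifiers and concept tags below are invented for the example.

```python
from collections import defaultdict

def roll_up(findings: list[tuple[str, set[str]]]) -> dict[str, list[str]]:
    """Roll findings up by concept, keeping concepts that link 2+ findings."""
    occurrences = defaultdict(list)
    for finding_id, concepts in findings:
        for concept in concepts:
            occurrences[concept].append(finding_id)
    return {c: ids for c, ids in occurrences.items() if len(ids) > 1}

# Two analysts, two ports, one shared concept: the link surfaces automatically.
print(roll_up([("seattle-07", {"sweden", "manifest-anomaly"}),
               ("la-19", {"sweden", "weight-mismatch"})]))
# {'sweden': ['seattle-07', 'la-19']}
```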

Agent architecture

Scenario: human analysts provide situational awareness via tacit knowledge, personally agent-mediated interactions with agent structures, and human-to-human communications. A model of threats and vulnerabilities has evolved, but does not account for a specific new strategy being developed by a smart smuggler. The smuggler games the current practices in order to bring illegal elements into the United States.

1) The model of threats and vulnerabilities is expressed as a reification process from various techniques, and encoded as OWL/Protégé ontology with rules.
2) Global Ontology: the model is maintained via near-real-time agency processes, under the observation and active review of knowledge engineers and program managers working with knowledge of policy and event structure.
3) Local Ontology: information is propagated to individual analysts via alerts and ontology management services controlled by the localized agent (of the person).

New (1/30/2005) tutorial on automated extraction of ontology from free-form text:
http://www.bcngroup.org/beadgames/anticipatoryWeb/23.htm
