Each tool is viewed logically as a client application
It can reside on a separate machine or in a separate process and access the data warehouse
RDBMS or proprietary OLAP engines embed data mining capabilities deeply within the engine to improve efficiency and add extensions
This requires a good foundation in terms of a data warehouse
CS753
Economics
Unprecedented affordability of MIPS and MB
Parallel computing
Enormous amounts of data can be processed
Traditional Analysis
Did sales of product X increase in Nov.?
Do sales of product X decrease when there is a promotion on product Y?
product X?
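The first two questions above are verification-style queries: the analyst already knows what to ask and checks it directly against the data. A minimal sketch in Python, assuming a hypothetical list of monthly sales records (the field names `units` and `promo_on_y` and all figures are invented for illustration):

```python
# Hypothetical monthly records; all names and numbers are illustrative only.
sales = [
    {"month": "Oct", "product": "X", "units": 120, "promo_on_y": False},
    {"month": "Nov", "product": "X", "units": 150, "promo_on_y": False},
    {"month": "Dec", "product": "X", "units": 90, "promo_on_y": True},
    {"month": "Jan", "product": "X", "units": 80, "promo_on_y": True},
]

def units_in(month):
    """Total units of product X sold in the given month."""
    return sum(r["units"] for r in sales
               if r["product"] == "X" and r["month"] == month)

def mean_units(promo_on_y):
    """Average monthly units of product X with or without a promotion on Y."""
    rows = [r["units"] for r in sales
            if r["product"] == "X" and r["promo_on_y"] == promo_on_y]
    return sum(rows) / len(rows)

# Did sales of product X increase in Nov. (compared with Oct.)?
increased_in_nov = units_in("Nov") > units_in("Oct")

# Do sales of product X decrease when there is a promotion on product Y?
drops_with_promo = mean_units(True) < mean_units(False)

print(increased_in_nov, drops_with_promo)
```

Each query answers exactly one question that was posed in advance; nothing is discovered that the analyst did not think to ask.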
Data cleaning
Missing values
Data derivation
Merging data
Supervised: articulating a goal, choosing a dependent variable or output, and specifying data fields
Unsupervised: grouping similar types of data or identifying exceptions
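The preparation steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` and `orders` records, their field names, and the choice of filling a missing age with the mean are all hypothetical.

```python
# Hypothetical customer and order records; field names are illustrative only.
customers = [
    {"id": 1, "age": 34, "region": "east"},
    {"id": 2, "age": None, "region": "west"},   # missing value
    {"id": 3, "age": 52, "region": " East "},   # inconsistent formatting
]
orders = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 60.0},
    {"customer_id": 3, "amount": 25.0},
]

# Data cleaning: normalize the text field.
for c in customers:
    c["region"] = c["region"].strip().lower()

# Missing values: fill with the mean of the known ages
# (one of several possible policies, assumed here for illustration).
known = [c["age"] for c in customers if c["age"] is not None]
mean_age = sum(known) / len(known)
for c in customers:
    if c["age"] is None:
        c["age"] = mean_age

# Merging data and data derivation: attach a derived total spend per customer.
totals = {}
for o in orders:
    totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["amount"]
prepared = [dict(c, total_spend=totals.get(c["id"], 0.0)) for c in customers]

for row in prepared:
    print(row)
```

In a supervised study, a field such as `total_spend` could then serve as the dependent variable; in an unsupervised study, the prepared rows would be grouped by similarity instead.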
CS753 Dr. Mary Ann Robbert
Defining a study
Prediction
Choose the best outcome based on historical data
Models
Genetic Algorithms
Neural Nets
Agents
Statistics
Visualization
Genetic Algorithms
An artificial intelligence system that mimics the evolutionary, survival-of-the-fittest process to generate increasingly better solutions to a problem.
Genetic algorithms produce several generations of solutions, choosing the best of the current set for each new generation.
Examples
Generating human faces based on a few known features
Generating solutions to routing problems
Generating stock portfolios
SELECTION - or survival of the fittest. The key is to give preference to better outcomes.
CROSSOVER - combining portions of good outcomes in the hope of creating an even better outcome.
MUTATION - randomly trying combinations and evaluating the success (or failure) of the outcome.
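The three operators above can be sketched as a small Python program. This is a minimal illustration, not a definitive implementation: the toy objective (count the 1s in a bit string), tournament selection, single-point crossover, keeping the best individual unchanged, and all parameter values are assumptions chosen for brevity.

```python
import random

def fitness(bits):
    """Toy objective: count the 1s; higher is better."""
    return sum(bits)

def select(population, rng):
    """SELECTION: tournament of two, preferring the fitter individual."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2, rng):
    """CROSSOVER: join the head of one parent to the tail of the other."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(bits, rng, rate=0.02):
    """MUTATION: flip each bit with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(length=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        best = max(population, key=fitness)
        best_per_generation.append(fitness(best))
        # Keep the current best unchanged, then breed the rest.
        children = [best]
        while len(children) < pop_size:
            child = crossover(select(population, rng),
                              select(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
    return max(population, key=fitness), best_per_generation

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best solution of each generation is carried over unchanged, the best fitness never decreases from one generation to the next; mutation keeps the search from stalling on the initial gene pool.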
Neural Nets
Mathematical model of the way a brain functions
Machine learning approach by which historical data can be examined for pattern recognition
A neural network simulates the human ability to classify things based on the experience of seeing many examples.
Examples
Distinguishing different chemical compounds
Detecting anomalies in human tissue that may signify disease
Reading handwriting
Detecting fraud in credit card use
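Learning to classify from examples can be illustrated with a single perceptron, the simplest neural unit (a real network combines many such units in layers). The sketch below is an assumption-laden toy: the four training examples, the learning rate, and the epoch limit are invented for illustration.

```python
# Toy, linearly separable training set (illustrative only): each example is
# (features, label), and the label is 1 only when both features are 1.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, features):
    """Fire (return 1) when the weighted sum of the inputs is positive."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

def train(examples, epochs=20, rate=1.0):
    """Perceptron rule: nudge the weights toward each misclassified example."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)
            if error:
                mistakes += 1
                weights = [w + rate * error * x
                           for w, x in zip(weights, features)]
                bias += rate * error
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias

weights, bias = train(examples)
print([predict(weights, bias, f) for f, _ in examples])
```

The unit is never told the rule; it adjusts its weights only from the labeled examples it sees, which is the sense in which a neural network classifies from experience.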
Intelligent Agents
Software entities that carry out some set of operations on behalf of a user or program with some degree of autonomy, and that employ some knowledge or representation of the user's goals and desires.
Some common characteristics
Ability to communicate, cooperate and coordinate with other agents
Ability to act autonomously to achieve the collective goal of the system
Tasks
Automating repetitive tasks
Finding and filtering information
Summarizing complex data
Capability to learn and make recommendations
Black box approach hides complexity and allows for the design of a scalable system
Comparison
AI System            Problem Type                                Based On               Starting Information
Expert Systems       Diagnostic or prescriptive                  Strategies of experts  Experts' know-how
Neural Networks      Identification, classification, prediction  The human brain        Acceptable patterns
Genetic Algorithms
Intelligent Agents
Statistics
SAS, SPSS
Pros - Established technology
Cons - Needs assumptions, nominal
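A small example of why statistical methods need assumptions: an ordinary least-squares line fit presumes the relationship is linear. The sketch below is a hand-written illustration in Python, not SAS or SPSS output, and the data points are invented.

```python
# Illustrative data only: x could be advertising spend, y could be sales.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Assumes a linear relationship between x and y; if that assumption
    fails, the fitted line is still computed but means little.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```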
Visualization
Data visualization refers to technologies that support visualization of information
Includes digital images, GIS, multidimensions, 3-D presentations, animations
http://www.almaden.ibm.com/cs/quest/demo/assoc/general.html
It does not:
Find answers to questions you don't ask
Eliminate the need for domain experience
Remove the need for data analysis skills
http://www.rulequest.com/gritbot-info.html
Conclusion
identifies and formulated
Appropriate data must be selected, cleaned and prepared for queries and business analysis
http://www.rulequest.com/cubistexamples.html#BOSTON
http://www.almaden.ibm.com/cs/quest/