Guide to Intelligent Data Analysis


Table of Contents

1 Introduction

1.1 Motivation
1.1.1 Data and Knowledge
1.1.2 Tycho Brahe and Johannes Kepler
1.1.3 Intelligent Data Analysis
1.2 The Data Analysis Process
1.3 Methods, Tasks, and Tools
1.4 How to Read This Book

2 Practical Data Analysis: An Example

2.1 The Setup
2.2 Data Understanding and Pattern Finding
2.3 Explanation Finding
2.4 Predicting the Future
2.5 Concluding Remarks


3 Project Understanding

3.1 Determine the Project Objective
3.2 Assess the Situation
3.3 Determine Analysis Goals

4 Data Understanding

4.1 Attribute Understanding
4.2 Data Quality
4.3 Data Visualization
4.3.1 Methods for One and Two Attributes
4.3.2 Methods for Higher-Dimensional Data
4.4 Correlation Analysis
4.5 Outlier Detection
4.5.1 Outlier Detection for Single Attributes
4.5.2 Outlier Detection for Multidimensional Data
4.6 Missing Values
4.7 A Checklist for Data Understanding
4.8 Data Understanding in Practice
4.8.1 Data Understanding in KNIME
4.8.2 Data Understanding in R

5 Principles of Modeling

5.1 Model Classes
5.2 Fitting Criteria and Score Functions
5.2.1 Error Functions for Classification Problems
5.2.2 Measures of Interestingness
5.3 Algorithms for Model Fitting
5.3.1 Closed Form Solutions
5.3.2 Gradient Method
5.3.3 Combinatorial Optimization
5.3.4 Random Search, Greedy Strategies, and Other Heuristics
5.4 Types of Errors
5.4.1 Experimental Error
5.4.2 Sample Error
5.4.3 Model Error
5.4.4 Algorithmic Error
5.4.5 Machine Learning Bias and Variance
5.4.6 Learning Without Bias?
5.5 Model Validation
5.5.1 Training and Test Data
5.5.2 Cross-Validation
5.5.3 Bootstrapping
5.5.4 Measures for Model Complexity
5.6 Model Errors and Validation in Practice
5.6.1 Errors and Validation in KNIME
5.6.2 Validation in R

6 Data Preparation

6.1 Select Data
6.1.1 Feature Selection
6.1.2 Dimensionality Reduction
6.1.3 Record Selection
6.2 Clean Data
6.2.1 Improve Data Quality
6.2.2 Missing Values
6.3 Construct Data
6.3.1 Provide Operability
6.3.2 Assure Impartiality
6.3.3 Maximize Efficiency
6.4 Complex Data Types
6.5 Data Integration
6.5.1 Vertical Data Integration
6.5.2 Horizontal Data Integration
6.6 Data Preparation in Practice
6.6.1 Data Preparation in KNIME
6.6.2 Data Preparation in R

7 Finding Patterns

7.1 Hierarchical Clustering
7.1.1 Overview
7.1.2 Construction
7.1.3 Variations and Issues
7.2 Notion of (Dis-) Similarity
7.3 Prototype- and Model-Based Clustering
7.3.1 Overview
7.3.2 Construction
7.3.3 Variations and Issues
7.4 Density-Based Clustering
7.4.1 Overview
7.4.2 Construction
7.4.3 Variations and Issues
7.5 Self-Organizing Maps
7.5.1 Overview
7.5.2 Construction
7.6 Frequent Pattern Mining and Association Rules
7.6.1 Overview
7.6.2 Construction
7.6.3 Variations and Issues
7.7 Deviation Analysis
7.7.1 Overview
7.7.2 Construction
7.7.3 Variations and Issues
7.8 Finding Patterns in Practice
7.8.1 Finding Patterns with KNIME
7.8.2 Finding Patterns in R

8 Finding Explanations

8.1 Decision Trees
8.1.1 Overview
8.1.2 Construction
8.1.3 Variations and Issues
8.2 Bayes Classifiers
8.2.1 Overview
8.2.2 Construction
8.2.3 Variations and Issues
8.3 Regression
8.3.1 Overview
8.3.2 Construction
8.3.3 Variations and Issues
8.3.4 Two Class Problems
8.4 Rule Learning
8.4.1 Propositional Rules
8.4.2 Inductive Logic Programming or First-Order Rules
8.5 Finding Explanations in Practice
8.5.1 Finding Explanations with KNIME
8.5.2 Using Explanations with R

9 Finding Predictors

9.1 Nearest-Neighbor Predictors
9.1.1 Overview
9.1.2 Construction
9.1.3 Variations and Issues
9.2 Artificial Neural Networks
9.2.1 Overview
9.2.2 Construction
9.2.3 Variations and Issues
9.3 Support Vector Machines
9.3.1 Overview
9.3.2 Construction
9.3.3 Variations and Issues
9.4 Ensemble Methods
9.4.1 Overview
9.4.2 Construction
9.4.3 Further Reading
9.5 Finding Predictors in Practice
9.5.1 Finding Predictors with KNIME
9.5.2 Using Predictors in R

10 Evaluation and Deployment

10.1 Evaluation
10.2 Deployment and Monitoring

A Statistics

A.1 Terms and Notation
A.2 Descriptive Statistics
A.2.1 Tabular Representations
A.2.2 Graphical Representations
A.2.3 Characteristic Measures for One-Dimensional Data
A.2.4 Characteristic Measures for Multidimensional Data
A.2.5 Principal Component Analysis
A.3 Probability Theory
A.3.1 Probability
A.3.2 Basic Methods and Theorems
A.3.3 Random Variables
A.3.4 Characteristic Measures of Random Variables
A.3.5 Some Special Distributions
A.4 Inferential Statistics
A.4.1 Random Samples
A.4.2 Parameter Estimation
A.4.3 Hypothesis Testing

B The R Project

B.1 Installation and Overview
B.2 Reading Files and R Objects
B.3 R Functions and Commands
B.4 Libraries/Packages
B.5 R Workspace
B.6 Finding Help

C KNIME

C.1 Installation and Overview
C.2 Building Workflows
C.3 Example Flow
C.4 R Integration