Data Mining
Finding hidden information in a database by fitting the data to a model.
| | Database | Data Mining |
|---|---|---|
| Query | Well defined; SQL | Poorly defined; no precise query language |
| Data | Operational data | Not operational data |
| Output | Precise; subset of database | Fuzzy; not a subset of database |
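
The contrast in the table can be sketched with a toy example. This is only an illustration under stated assumptions: the `purchases` table, its rows, and the "bought together at least twice" threshold are all hypothetical and not part of these notes. The SQL query returns rows that already exist in the database, while the mining-style result is a pattern that is stored nowhere as a row.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical toy data; the schema and rows are illustrative only.
rows = [
    ("ann", "bread", 2.5),
    ("ann", "butter", 3.0),
    ("bob", "bread", 2.5),
    ("bob", "butter", 3.0),
    ("cat", "milk", 1.5),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT, amount REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)

# Database query: well defined in SQL; the output is a subset of the stored rows.
query_result = conn.execute(
    "SELECT customer, item, amount FROM purchases "
    "WHERE item = 'bread' ORDER BY customer"
).fetchall()

# Mining-style question: "which items tend to be bought together?"
# No precise query language defines it; here we count item pairs per customer.
baskets = {}
for customer, item, _amount in rows:
    baskets.setdefault(customer, set()).add(item)

pair_counts = Counter()
for items in baskets.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

# Assumed threshold: a pair is "frequent" if at least two customers bought it.
frequent_pairs = [pair for pair, n in pair_counts.items() if n >= 2]

print(query_result)
print(frequent_pairs)
```

The choice of threshold is exactly the kind of fuzziness the table describes: a different cutoff gives a different, equally defensible answer.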

- Classification: map data into predefined classes
  - Supervised learning
  - Pattern recognition
  - Prediction
- Regression: map data to a real-valued prediction variable
- Clustering: groups similar data together
  - Unsupervised learning
  - Segmentation
  - Partitioning
- Summarization: map data into subsets with associated simple descriptions
  - Characterization
  - Generalization
- Link analysis: uncover relationships among data
  - Affinity analysis
  - Association rules
  - Sequential analysis
- Time series analysis
  - Future value prediction
  - Pattern identification
  - Behavior classification
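
To make the supervised/unsupervised distinction in the list above concrete, here is a minimal sketch on one-dimensional data. The helper names `classify` and `cluster`, the sample values, and the choice of nearest-centroid classification and a simple k-means loop are assumptions made for illustration; the notes do not prescribe any particular algorithm.

```python
from statistics import mean


def classify(value, labeled):
    """Supervised: assign `value` to the predefined class whose mean is nearest."""
    centroids = {label: mean(xs) for label, xs in labeled.items()}
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def cluster(values, k=2, iterations=10):
    """Unsupervised: simple 1-D k-means; groups are discovered, not predefined.

    Assumes k >= 1, iterations >= 1, and at least k values.
    """
    ordered = sorted(values)
    # Start each center at the midpoint of one of k equal slices of the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the previous center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Classification needs labeled examples (the classes exist beforehand).
labeled = {"low": [1.0, 2.0, 3.0], "high": [10.0, 11.0, 12.0]}
print(classify(4.0, labeled))
print(classify(9.0, labeled))

# Clustering sees only the raw values and finds the groups itself.
groups = cluster([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
print(groups)
```

Note that `classify` is told what "low" and "high" mean, while `cluster` returns unnamed groups that still have to be interpreted.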
 

- Metrics
  - Usefulness of results
  - Return on investment (ROI)
  - Accuracy of results
  - Space/time required
- Social implications
  - Privacy
  - Profiling
  - Unauthorized use
- Database perspective
  - Scalability (large data)
  - Real world data (noisy data)
  - Updates (dynamic data)
  - Ease of use
- Visualization techniques
  - Graphical
  - Geometric
  - Icon-based
  - Pixel-based
  - Hierarchical
  - Hybrid
 
Knowledge Discovery in Databases (KDD)
Process of finding useful information and patterns in data. Data mining is the use of algorithms to extract the information and patterns derived by the KDD process.

- Issues
  - Human interaction
  - Overfitting
  - Outliers
  - Interpretation
  - Visualization
  - Large datasets
  - High dimensionality
  - Complex data
  - Missing data
  - Irrelevant data
  - Noisy data
  - Changing data
  - Integration
  - Application
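
Two of the issues above, missing data and outliers, can be illustrated with a short sketch. The `readings` values are made up, and both the median fill and the 1.5 × IQR cutoff are common conventions assumed here for illustration, not methods taken from these notes.

```python
from statistics import median, quantiles

# Hypothetical sensor readings; None marks a missing value.
readings = [12.0, 11.5, None, 12.5, 95.0, 11.0, None, 12.0]

# Missing data: fill gaps with the median of the observed values.
observed = [x for x in readings if x is not None]
fill = median(observed)
filled = [fill if x is None else x for x in readings]

# Outliers: flag observed values more than 1.5 * IQR outside the quartiles.
q1, _q2, q3 = quantiles(observed, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in observed if x < low or x > high]

print(filled)
print(outliers)
```

The median is used for filling because, unlike the mean, it is barely affected by the extreme reading. Whether a flagged value is an error or a genuinely interesting case is the interpretation issue listed above.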