Data Mining
Finding hidden information in a database; fitting data to a model.
| | Database | Data Mining |
|---|---|---|
| Query | Well defined; SQL | Poorly defined; no precise query language |
| Data | Operational data | Not operational data |
| Output | Precise; subset of database | Fuzzy; not a subset of database |
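The contrast in the table can be sketched in a few lines of Python. This is only an illustration: the `sales` table, its columns, the values, and the helper `fit_line` are all hypothetical, and an ordinary least-squares line stands in for "a model". The query returns rows that are already stored, while the fitted model produces a value that appears nowhere in the database.

```python
import sqlite3

# Hypothetical table and values, invented purely for illustration.
rows = [(1, 10.0, 24.0), (2, 20.0, 46.0), (3, 30.0, 64.0), (4, 40.0, 86.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, ad_spend REAL, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Database query: precisely defined, and the answer is a subset of the stored rows.
query_result = conn.execute(
    "SELECT id, ad_spend, revenue FROM sales WHERE revenue > 50 ORDER BY id"
).fetchall()


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data mining: fit the data to a model, then use the model to answer a question
# the stored rows cannot answer directly (revenue at an unseen ad_spend of 50).
xs = [r[1] for r in rows]
ys = [r[2] for r in rows]
slope, intercept = fit_line(xs, ys)
predicted = slope * 50.0 + intercept

print(query_result)
print(slope, intercept, predicted)
```

The prediction is an estimate rather than a stored fact, which is the sense in which the mining output is "fuzzy".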
- Classification: map data into predefined classes
  - Supervised learning
  - Pattern recognition
  - Prediction
- Regression: map data to a real-valued prediction variable
- Clustering: group similar data together
  - Unsupervised learning
  - Segmentation
  - Partitioning
- Summarization: map data into subsets with associated simple descriptions
  - Characterization
  - Generalization
- Link analysis: uncover relationships among data
  - Affinity analysis
  - Association rules
  - Sequential analysis
- Time series analysis
  - Future value prediction
  - Pattern identification
  - Behavior classification
- Metrics
  - Usefulness of results
  - Return on investment (ROI)
  - Accuracy of results
  - Space/time required
- Social implications
  - Privacy
  - Profiling
  - Unauthorized use
- Database perspective
  - Scalability (large data)
  - Real-world data (noisy data)
  - Updates (dynamic data)
  - Ease of use
- Visualization techniques
  - Graphical
  - Geometric
  - Icon-based
  - Pixel-based
  - Hierarchical
  - Hybrid
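To make the classification task and the "accuracy of results" metric from the lists above concrete, here is a minimal sketch. It assumes a one-nearest-neighbour classifier, which is just one possible technique chosen for brevity; the data points, class names, and function names are hypothetical.

```python
import math

# Hypothetical labelled examples: (feature vector, predefined class).
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
]
held_out = [
    ((1.2, 1.1), "low"),
    ((5.5, 5.0), "high"),
    ((3.0, 3.0), "high"),  # lies between the groups; the sketch gets this one wrong
]


def classify(point, labelled):
    """Return the class of the labelled example closest to `point`."""
    _, label = min(labelled, key=lambda ex: math.dist(ex[0], point))
    return label


def accuracy(examples, labelled):
    """Fraction of examples whose predicted class matches the true class."""
    correct = sum(
        1 for features, label in examples if classify(features, labelled) == label
    )
    return correct / len(examples)


print(classify((1.2, 1.1), train))
print(accuracy(held_out, train))
```

Accuracy is measured on examples that were not used as training data; measuring it on the training examples themselves would always give a perfect score for this classifier and say nothing about its usefulness.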
Knowledge Discovery in Databases (KDD)
KDD is the overall process of finding useful information and patterns in data. Data mining is the step within that process that uses algorithms to extract the information and patterns.
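The relationship between the two can be sketched as a small pipeline in which data mining is one function among several. The stage names used here (preprocess, transform, mine, interpret) follow commonly cited descriptions of KDD and are an assumption, since these notes do not list the stages; the records and the choice of item-pair support counting as the mining algorithm are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records; None stands for a record with missing data.
raw = [
    "bread,milk",
    "bread,butter,milk",
    None,
    "butter,milk",
    "bread,milk",
]


def preprocess(records):
    """Cleaning stage (assumed): drop records with missing data."""
    return [r for r in records if r]


def transform(records):
    """Transformation stage (assumed): turn each record into a set of items."""
    return [frozenset(r.split(",")) for r in records]


def mine_pairs(transactions):
    """Data mining stage: compute the support of every item pair."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: n / len(transactions) for pair, n in counts.items()}


def interpret(supports, min_support):
    """Interpretation stage (assumed): keep only pairs frequent enough to report."""
    return {pair: s for pair, s in supports.items() if s >= min_support}


transactions = transform(preprocess(raw))
frequent = interpret(mine_pairs(transactions), min_support=0.5)
print(frequent)
```

Only `mine_pairs` is data mining in the narrow sense; the surrounding functions are the rest of the KDD process that prepares its input and filters its output.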
- Issues
  - Human interaction
  - Overfitting
  - Outliers
  - Interpretation
  - Visualization
  - Large datasets
  - High dimensionality
  - Complex data
  - Missing data
  - Irrelevant data
  - Noisy data
  - Changing data
  - Integration
  - Application
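Two of the issues listed above, missing data and outliers, can be illustrated with a short sketch. It assumes median imputation for missing values and a median-absolute-deviation rule for flagging outliers; both are common choices rather than anything prescribed by these notes, and the readings, the threshold `k=3`, and the function names are hypothetical.

```python
import statistics

# Hypothetical readings; None marks a missing value.
readings = [10, 12, None, 11, 13, 95]


def impute_median(values):
    """Replace missing values with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = statistics.median(known)
    return [fill if v is None else v for v in values]


def flag_outliers(values, k=3):
    """Flag values more than k median-absolute-deviations from the median.

    Assumes the deviations are not all zero; k=3 is an arbitrary threshold.
    """
    centre = statistics.median(values)
    mad = statistics.median(abs(v - centre) for v in values)
    return [v for v in values if abs(v - centre) > k * mad]


filled = impute_median(readings)
outliers = flag_outliers(filled)
print(filled)
print(outliers)
```

The median is used in both steps because a single extreme value such as 95 would pull a mean-based fill value or threshold toward itself.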