Data Mining

Data mining finds hidden information in a database by fitting the data to a model.

Aspect    Database                       Data Mining
Query     Well defined; SQL              Poorly defined; no precise query language
Data      Operational data               Not operational data
Output    Precise; subset of database    Fuzzy; not a subset of database
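
The contrast between a database query and data mining output can be sketched in a few lines of Python. The records, field names, and threshold are invented for illustration, and the least-squares line is only a stand-in for whatever model a real mining task would fit.

```python
# Hypothetical customer records; all names and values are made up.
records = [
    {"customer": "A", "age": 23, "spend": 40.0},
    {"customer": "B", "age": 31, "spend": 55.0},
    {"customer": "C", "age": 47, "spend": 120.0},
    {"customer": "D", "age": 52, "spend": 135.0},
]


def query_over(rows, min_spend):
    """Database-style query: a precise condition whose answer is a subset of the rows."""
    return [r for r in rows if r["spend"] > min_spend]


def fit_line(rows):
    """Mining-style output: least-squares line spend = slope * age + intercept.

    The result is a model (two numbers), not a subset of the stored rows.
    """
    n = len(rows)
    mean_x = sum(r["age"] for r in rows) / n
    mean_y = sum(r["spend"] for r in rows) / n
    sxx = sum((r["age"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["age"] - mean_x) * (r["spend"] - mean_y) for r in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


subset = query_over(records, 100.0)      # rows taken directly from the database
slope, intercept = fit_line(records)     # a fitted description of the data
```

The query returns stored rows unchanged, while the fitted line summarizes a trend that appears in no single row.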

  • Classification: map data into predefined classes
    • Supervised learning
    • Pattern recognition
    • Prediction
  • Regression: map data to real valued prediction variable
  • Clustering: groups similar data together
    • Unsupervised learning
    • Segmentation
    • Partitioning
  • Summarization: map data into subsets with associated simple descriptions
    • Characterization
    • Generalization
  • Link analysis: uncover relationships among data
    • Affinity analysis
    • Association rules
    • Sequential analysis
  • Time series analysis
    • Future value prediction
    • Pattern identification
    • Behavior classification
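
Three of the tasks above can be illustrated with small sketches. The functions, data values, and labels below are hypothetical, and each function uses one simple technique chosen for brevity (nearest neighbor, two-cluster k-means on one-dimensional data, and rule confidence); they are not the only ways to perform these tasks.

```python
def nearest_neighbor_classify(labeled, x):
    """Classification (supervised): give x the class of its closest labeled example."""
    _, label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return label


def two_means(values, iterations=10):
    """Clustering (unsupervised): split 1-D values into two groups (k-means, k = 2)."""
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct values")
    low, high = min(values), max(values)  # initial centers: the two extremes
    for _ in range(iterations):
        left = [v for v in values if abs(v - low) <= abs(v - high)]
        right = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(left) / len(left)       # move each center to its group's mean
        high = sum(right) / len(right)
    return left, right


def rule_confidence(transactions, antecedent, consequent):
    """Association rule antecedent -> consequent.

    Confidence is the fraction of transactions containing the antecedent
    that also contain the consequent.
    """
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    both = [t for t in with_antecedent if consequent <= t]
    return len(both) / len(with_antecedent)


# Classification needs predefined classes attached to the training data.
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
predicted = nearest_neighbor_classify(labeled, 2.5)

# Clustering sees the same numbers without any labels.
left, right = two_means([1.0, 2.0, 8.0, 9.0])

# Association rules look for items that occur together.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
confidence = rule_confidence(baskets, {"bread"}, {"milk"})
```

The classifier depends on labels supplied in advance, whereas the clustering function discovers the two groups from the values alone; this is the supervised versus unsupervised distinction in the list.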

  • Metrics
    • Usefulness of results
    • Return on investment (ROI)
    • Accuracy of results
    • Space/time required
  • Social implications
    • Privacy
    • Profiling
    • Unauthorized use
  • Database perspective
    • Scalability (large data)
    • Real world data (noisy data)
    • Updates (dynamic data)
    • Ease of use
  • Visualization techniques
    • Graphical
    • Geometric
    • Icon-based
    • Pixel-based
    • Hierarchical
    • Hybrid
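
Two of the metrics above, accuracy of results and time required, are straightforward to measure. The following is a minimal sketch; the helper names `accuracy` and `timed` are hypothetical, and accuracy is taken here to mean the fraction of predictions that match the known answers.

```python
import time


def accuracy(predicted, actual):
    """Fraction of predictions that equal the known correct values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


score = accuracy(["a", "b", "a", "a"], ["a", "b", "b", "a"])
total, elapsed = timed(sum, [1, 2, 3])
```

Usefulness and return on investment depend on the application and cannot be computed from the data alone in this way.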

Knowledge Discovery in Databases (KDD)

KDD is the process of finding useful information and patterns in data. Data mining is the step within that process that uses algorithms to extract the information and patterns.

  • Issues
    • Human interaction
    • Overfitting
    • Outliers
    • Interpretation
    • Visualization
    • Large datasets
    • High dimensionality
    • Complex data
    • Missing data
    • Irrelevant data
    • Noisy data
    • Changing data
    • Integration
    • Application
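
Two of the issues above, missing data and outliers, are often handled in a preprocessing step. The sketch below shows one simple option for each: mean imputation and Tukey's interquartile-range rule. Both choices, the function names, and the sample readings are assumptions made for illustration; other strategies may suit a given dataset better.

```python
from statistics import mean, quantiles


def impute_missing(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


filled = impute_missing([1.0, None, 3.0])
readings = [10, 11, 12, 13, 12, 11, 100]
outliers = flag_outliers(readings)
```

Whether a flagged value should be removed is a matter of interpretation: an outlier may be noise, or it may be exactly the pattern of interest.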
