Machine Learning
Gives computers the ability to learn without being explicitly programmed.
Machine Learning vs. Data Mining vs. Artificial Intelligence
- Overlap significantly
- ML: learning properties from data and adapting to new data
- DM: discovering unknown properties in data
- AI: machines performing tasks that are characteristic of human intelligence
- ML is a way of achieving AI
- In the 1980s, AI was all about expert systems
- Expert system = knowledge base + inference engine
- Problems
- Knowledge created by hand
- Things in real world not always true/false
- ML improvement
- Knowledge learned from data
- Probability to represent the real world
Types
Human supervision or not
- Supervised learning
- k-NN
- Linear regression
- Logistic regression
- SVM
- Decision trees & random forests
- Neural networks
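A minimal supervised-learning sketch (scikit-learn assumed; Iris is a stock dataset, not from these notes):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Labeled data: features X plus known targets y -- the "supervision"
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)   # one of the supervised models listed above
clf.fit(X, y)
train_accuracy = clf.score(X, y)          # fraction of correct predictions
```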
- Unsupervised learning
- Clustering
- k-means
- Hierarchical cluster analysis (HCA)
- Expectation maximization
- Visualization & dimensionality reduction
- Principal component analysis (PCA)
- Kernel PCA
- Locally-linear embedding (LLE)
- t-distributed stochastic neighbor embedding (t-SNE)
- Anomaly detection
- Association rule learning
- Apriori
- Eclat
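The clustering idea can be sketched with k-means on synthetic data (scikit-learn and NumPy assumed):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two synthetic blobs; the algorithm sees only X, never any labels
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_   # cluster assignments discovered without supervision
```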
- Semi-supervised learning
- Reinforcement learning
- Observe environment, select & perform action, get reward or penalty, update policy
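The loop above as a toy epsilon-greedy bandit; the environment and its reward probabilities here are made up for illustration:

```python
import random

random.seed(0)
true_means = [0.2, 0.8]   # hidden reward probability of each action (made up)
q = [0.0, 0.0]            # learned value estimate per action
counts = [0, 0]
for step in range(2000):
    # Select & perform an action: explore with probability 0.1,
    # otherwise exploit the current best estimate
    if random.random() < 0.1:
        a = random.randrange(2)
    else:
        a = max(range(2), key=lambda i: q[i])
    reward = 1.0 if random.random() < true_means[a] else 0.0  # get reward or penalty
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]                       # update the policy
best_action = max(range(2), key=lambda i: q[i])
```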
Learning incrementally on the fly or not
- Online
- "Data in motion"
- Model updated as data arrive
- Advantage
- Suitable for streaming data
- Suitable for resource-constrained systems
- Suitable for out-of-core learning (dataset cannot fit memory)
- Disadvantage
- Vulnerable to bad incoming data, so performance must be monitored
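A sketch of online learning with mini-batches (scikit-learn's `SGDClassifier` assumed; the data stream is simulated):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)
classes = np.array([0, 1])
for _ in range(20):   # simulate 20 mini-batches arriving from a stream
    X_batch = rng.normal(size=(32, 2))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)   # incremental update

# Evaluate on fresh data from the same stream
X_new = rng.normal(size=(200, 2))
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)
accuracy = model.score(X_new, y_new)
```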
- Offline (batch)
- "Data at rest"
- Model estimated from all the data at once
- Advantage
- Faster convergence
- Disadvantage
- Inefficient if new data arrive frequently (must retrain on the full dataset)
- Not applicable to reinforcement learning
Generalize by building a model from patterns, or by comparing to stored instances
- Instance-based
- Memorize every training instance
- Generalize to new instances using a similarity measure
- Model-based
- Build a model of the given training data, and use it to make predictions
- Less vulnerable to bad data
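A sketch contrasting the two approaches on the same toy data (scikit-learn assumed): k-NN stores the instances, linear regression fits parameters:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0   # noiseless line y = 2x + 1

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)   # instance-based: stores the data
lin = LinearRegression().fit(X, y)                   # model-based: learns slope & intercept

knn_pred = knn.predict([[4.5]])[0]   # average of the nearest stored instances
lin_pred = lin.predict([[4.5]])[0]   # read off the fitted line
```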
Data
- More data
- Generalize better
- May have sampling bias
- Less data
- Sampling noise (non-representative data by chance)
Data vs. Algorithm
Very different ML algorithms performed almost identically well on a complex problem given enough data.
But small- and medium-sized datasets are still very common.
How to better detect patterns?
- Data cleaning
- Remove outliers
- Ignore/fill in missing values
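The cleaning steps above, sketched with pandas (toy data; the outlier threshold is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 31, 28, 400, None, 29]})   # 400: outlier; None: missing
df["age"] = df["age"].fillna(df["age"].median())          # fill in missing values
df = df[df["age"] < 120]                                  # remove an obvious outlier
```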
- Feature engineering
- Select the most useful features
- Combine features
- Create new features
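A feature-engineering sketch (column names are illustrative, not from the notes):

```python
import pandas as pd

# Two raw features
df = pd.DataFrame({"total_rooms": [600, 1200], "households": [200, 300]})
# Combine them into a new, more informative feature
df["rooms_per_household"] = df["total_rooms"] / df["households"]
```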
- Prevent overfitting (regularization)
- Simplify model
- Fewer parameters
- Reduce features
- Gather more data
- Reduce noise in data
- Tune regularization hyperparameters to constrain the model
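A sketch of a regularization hyperparameter at work: Ridge's `alpha` shrinks the weights, simplifying the model (scikit-learn assumed; toy data):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=30)   # only the first feature matters

weak = Ridge(alpha=0.01).fit(X, y)     # little regularization
strong = Ridge(alpha=100.0).fit(X, y)  # strong regularization shrinks the weights

weak_norm = np.linalg.norm(weak.coef_)
strong_norm = np.linalg.norm(strong.coef_)
```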
- Prevent underfitting
- Build a more powerful model
- More parameters
- Feed better features
- Reduce regularization constraints
Testing & Validating
To see how well the derived model performs, try it on new data instances.
- Split dataset
- Training set
- Test set
- Generalization error (out-of-sample error)
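A sketch of the split and the test-set estimate of generalization error (scikit-learn assumed):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)          # 80% train / 20% test
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)          # 1 - accuracy estimates the
                                                   # generalization (out-of-sample) error
```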
No Free Lunch Theorem
No one model works best for every problem. -> It is common in ML to try multiple models.