Machine Learning
Gives computers the ability to learn without being explicitly programmed.
Machine Learning vs. Data Mining vs. Artificial Intelligence
- Overlap significantly
- ML: learning properties from data and adapting to new data
- DM: discovering unknown properties in data
- AI: machines performing tasks that are characteristic of human intelligence
  - ML is a way of achieving AI
  - In the 1980s, AI was all about expert systems
    - Expert system = knowledge base + inference engine
    - Problems
      - Knowledge is created by hand
      - Things in the real world are not always true/false
    - ML improvement
      - Knowledge is learned from data
      - Probability is used to represent the real world
 
 
 
Types
Human supervision or not
- Supervised learning
  - k-NN
  - Linear regression
  - Logistic regression
  - SVM
  - Decision trees & random forests
  - Neural networks
- Unsupervised learning
  - Clustering
    - k-means
    - Hierarchical cluster analysis (HCA)
    - Expectation maximization
  - Visualization & dimensionality reduction
    - Principal component analysis (PCA)
    - Kernel PCA
    - Locally-linear embedding (LLE)
    - t-distributed stochastic neighbor embedding (t-SNE)
  - Anomaly detection
  - Association rule learning
    - Apriori
    - Eclat
- Semi-supervised learning
- Reinforcement learning
  - Observe the environment, select & perform an action, get a reward or penalty, update the policy
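The reinforcement learning loop can be sketched with a toy example. Everything here is an assumption made for illustration: the two-action environment in `step`, the epsilon-greedy rule, and the running-mean value update. The environment has no state, so the "observe" step is reduced to reading the reward.

```python
import random

def step(action):
    """Hypothetical environment: action 1 earns a reward, action 0 a penalty."""
    return 1.0 if action == 1 else -1.0

def run_agent(n_steps=50, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]  # estimated value of each action
    counts = [0, 0]      # how often each action was tried
    for _ in range(n_steps):
        # Select an action: explore at random with probability epsilon,
        # otherwise exploit the action with the highest estimated value.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: values[a])
        # Perform the action and get a reward or penalty.
        reward = step(action)
        # Update the policy: move the estimate toward the observed reward
        # (incremental running mean).
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values

values = run_agent()
best_action = max(range(2), key=lambda a: values[a])
print(values, best_action)
```

The policy here is simply "pick the action with the highest estimated value"; real problems usually also condition the choice on an observed state.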
 
Learning incrementally on the fly or not
- Online
  - "Data in motion"
  - Model is updated as data arrive
  - Advantages
    - Suitable for streaming data
    - Suitable for resource-constrained systems
    - Suitable for out-of-core learning (dataset cannot fit in memory)
  - Disadvantage
    - Incoming data must be monitored, because bad data degrades the model
- Offline (batch)
  - "Data at rest"
  - Model is estimated from all data at once
  - Advantage
    - Faster convergence
  - Disadvantages
    - Inefficient if new data come in frequently
    - Not applicable to reinforcement learning
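As a minimal sketch of the difference, the snippet below fits the one-parameter model y = w * x in both ways. The data, the learning rate `lr`, and the function names are made up for illustration. The batch fit sees all samples at once and uses the closed-form least-squares solution; the online fit sees one sample at a time and applies a stochastic gradient step.

```python
def fit_batch(samples):
    """Offline: estimate w from all data at once (closed-form least squares)."""
    sxy = sum(x * y for x, y in samples)
    sxx = sum(x * x for x, _ in samples)
    return sxy / sxx

def update_online(w, x, y, lr=0.05):
    """Online: adjust w using a single newly arrived sample (one SGD step)."""
    return w + lr * (y - w * x) * x

# Made-up data following y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w_batch = fit_batch(data)

w_online = 0.0
for x, y in data * 5:  # simulate a stream by replaying the samples
    w_online = update_online(w_online, x, y)

print(w_batch, w_online)
```

The online version never holds more than one sample in memory, which is what makes it usable for streaming and out-of-core settings; the price is that a run of bad samples would pull `w_online` away from the true value.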
 
 
Try to find patterns to build a model or not
- Instance-based
  - Memorizes every training instance
  - Generalizes to new instances using a similarity measure
- Model-based
  - Builds a model of the given training data and uses it to make predictions
  - Less vulnerable to bad data
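The contrast can be illustrated on a tiny made-up dataset. This is only a sketch: 1-nearest-neighbor stands in for the instance-based approach (it keeps the training set and compares by distance), and a least-squares line stands in for the model-based approach (it keeps only a slope and an intercept).

```python
# Made-up training data following y = 2x.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

def predict_1nn(train, x_new):
    """Instance-based: return the label of the most similar stored instance."""
    nearest = min(train, key=lambda row: abs(row[0] - x_new))
    return nearest[1]

def fit_line(train):
    """Model-based: summarize the data as a slope and an intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

x_new = 2.4
pred_instance = predict_1nn(train, x_new)  # label of the closest instance (x = 2.0)
pred_model = slope * x_new + intercept     # value on the fitted line
print(pred_instance, pred_model)
```

The instance-based prediction snaps to a stored example, while the model-based prediction interpolates along the learned line.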
 
Data
- More data
  - Generalizes better
  - May still have sampling bias if the sampling method is flawed
- Less data
  - Sampling noise (data is non-representative by chance)
 
Data vs. Algorithm
Very different ML algorithms performed almost identically well on a complex problem given enough data.
But small- and medium-sized datasets are still very common.
How to better detect patterns?
- Data cleaning
  - Remove outliers
  - Ignore/fill in missing values
- Feature engineering
  - Select the most useful features
  - Combine features
  - Create new features
- Prevent overfitting (regularization)
  - Simplify model
    - Fewer parameters
    - Reduce features
  - Gather more data
  - Reduce noise in data
  - Tune regularization hyperparameters to constrain the model
- Prevent underfitting
  - Build more powerful model
    - More parameters
  - Feed better features
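One way to see how a hyperparameter constrains a model is ridge (L2) regularization. The sketch below is an illustration chosen here, not a method named in these notes: it fits y = w * x on made-up data, and the hypothetical hyperparameter `lam` penalizes large weights. The closed form minimizes sum((y - w*x)^2) + lam * w^2.

```python
def ridge_weight(samples, lam):
    """Closed-form minimizer of sum((y - w*x)**2) + lam * w**2 for y = w * x."""
    sxy = sum(x * y for x, y in samples)
    sxx = sum(x * x for x, _ in samples)
    return sxy / (sxx + lam)

# Made-up data following y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w_unregularized = ridge_weight(data, lam=0.0)   # plain least squares
w_regularized = ridge_weight(data, lam=14.0)    # strong penalty shrinks the weight
print(w_unregularized, w_regularized)
```

A larger `lam` pushes the weight toward zero, giving a simpler model that is less likely to overfit but, if pushed too far, will underfit.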
Testing & Validating
To see how well the derived model performs, try it on new data instances.
- Split dataset
  - Training set
  - Test set
    - Generalization error (out-of-sample error)
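A minimal sketch of the split, using only the standard library. The 80/20 ratio, the seed, the made-up data, and the helper names are all assumptions for illustration. The model is fitted on the training set only, and its error on the held-out test set serves as the estimate of the generalization error.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of rows and hold out a test_ratio share as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(rows):
    """Least-squares line fitted on the given rows; returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, rows):
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in rows) / len(rows)

# Made-up data: a line plus a small alternating disturbance.
data = [(float(x), 3.0 * x + 1.0 + (0.5 if x % 2 else -0.5)) for x in range(20)]

train_set, test_set = train_test_split(data)
model = fit_line(train_set)  # the test set is never used for fitting
training_error = mean_squared_error(model, train_set)
generalization_error = mean_squared_error(model, test_set)
print(training_error, generalization_error)
```

If the training error is low but the error on the test set is high, the model is overfitting the training data.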
 
 
No Free Lunch Theorem
No one model works best for every problem. -> It is common in ML to try multiple models.