Artificial Data Synthesis
From Scratch
Introducing Distortions
Distortions should be representative of the type of noise in the test set; purely random noise may not help.
- Images
- Audio
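A minimal sketch of synthesizing extra image examples by distortion, assuming grayscale images stored as NumPy arrays with values in [0, 1]. The function names (`shift_image`, `distort_image`, `synthesize`) and the chosen distortions (small translations and contrast changes) are illustrative assumptions, not part of the original notes. Pick distortions that match what actually appears in your test set.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels, filling exposed borders with zeros."""
    h, w = image.shape
    out = np.zeros_like(image)
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def distort_image(image, rng, max_shift=2, contrast_range=(0.8, 1.2)):
    """Apply a small random translation and contrast change (assumed distortions)."""
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    scale = rng.uniform(*contrast_range)
    return np.clip(shift_image(image, dy, dx) * scale, 0.0, 1.0)

def synthesize(images, labels, copies=3, seed=0):
    """Create `copies` distorted variants of each image; each keeps its original label."""
    rng = np.random.default_rng(seed)
    new_images, new_labels = [], []
    for image, label in zip(images, labels):
        for _ in range(copies):
            new_images.append(distort_image(image, rng))
            new_labels.append(label)
    return np.stack(new_images), np.array(new_labels)
```

Deliberately, no per-pixel random noise is added here, in line with the note above that purely random noise may not help.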
Getting More Data
- Make sure you have a low-bias classifier before spending effort on collecting more data
- How much work would it take to get more data? Options include:
- Artificial data synthesis
- Collect/label it yourself
- Crowdsource
Ceiling Analysis
Photo OCR as Example
Machine learning pipeline:
Image -> Text Detection -> Character Segmentation -> Character Recognition
How much overall performance could be gained, as an upper limit, by improving a given component?
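A sketch of the bookkeeping behind ceiling analysis. It assumes you have measured overall system accuracy once for the unmodified pipeline, and then again each time one more component's output is replaced with ground truth, in pipeline order. The function name `ceiling_analysis` and all accuracy numbers below are hypothetical, chosen only for illustration.

```python
def ceiling_analysis(baseline, staged):
    """Return the upper-bound gain attributable to each component.

    baseline: overall accuracy of the unmodified pipeline.
    staged:   list of (component, overall accuracy) pairs, in pipeline order,
              where the accuracy is measured with that component and all
              earlier ones replaced by ground-truth output.
    """
    gains = {}
    previous = baseline
    for component, accuracy in staged:
        # Gain from making this component perfect, given earlier ones already are.
        gains[component] = round(accuracy - previous, 4)
        previous = accuracy
    return gains

# Hypothetical accuracies, for illustration only.
gains = ceiling_analysis(
    0.72,
    [
        ("Text Detection", 0.89),
        ("Character Segmentation", 0.90),
        ("Character Recognition", 1.00),
    ],
)
best = max(gains, key=gains.get)
```

With these made-up numbers, `best` is the component with the largest potential gain, which is where further effort is most likely to pay off. A component with a small gain is a poor investment even if it could be made perfect.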