Artificial Data Synthesis

From Scratch

Introducing Distortions

Distortions should be representative of the type of noise in the test set; purely random noise may not help.

  1. Images
  2. Audio
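
As a rough illustration of synthesizing audio by introducing a distortion, the sketch below mixes a clean signal with a background signal at a chosen signal-to-noise ratio. The function name `mix_with_background`, the 8 kHz sample rate, and the 10 dB target are assumptions made for this example. The Gaussian background is only a placeholder so the snippet runs on its own; in practice the background should be a recording that resembles the noise expected in the test set.

```python
import math
import random


def mix_with_background(clean, background, snr_db):
    """Return clean + scaled background so the mix has the requested SNR in dB."""
    if len(clean) != len(background):
        raise ValueError("signals must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_bg = sum(x * x for x in background) / len(background)
    if p_bg == 0:
        return list(clean)  # silent background: nothing to add
    # Choose scale so that p_clean / (scale**2 * p_bg) == 10 ** (snr_db / 10).
    scale = math.sqrt(p_clean / (p_bg * 10 ** (snr_db / 10)))
    return [c + scale * b for c, b in zip(clean, background)]


# Hypothetical inputs: one second of a 440 Hz tone sampled at 8 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
# Placeholder background; substitute a real recording (crowd, traffic, bad line).
background = [rng.gauss(0, 1) for _ in range(8000)]
noisy = mix_with_background(clean, background, snr_db=10)
```

Each clean example can be mixed with several different backgrounds to produce several new training examples that share the original label.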

Getting More Data

  1. Make sure you have a low-bias classifier before expending the effort
  2. How much work would it take to get more data? Options:
    1. Artificial data synthesis
    2. Collect/label it yourself
    3. Crowd source
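
The "how much work" question can often be answered with a back-of-the-envelope estimate. The sketch below is one way to do it; the function name `labeling_hours` and the figures used (9,000 extra examples at 10 seconds each) are illustrative assumptions, not measurements.

```python
def labeling_hours(num_examples, seconds_per_example):
    """Estimate the person-hours needed to collect or label num_examples by hand."""
    return num_examples * seconds_per_example / 3600


# Hypothetical figures: 9,000 additional examples at 10 seconds per example.
hours = labeling_hours(9000, 10)
print(f"Estimated effort: {hours:.1f} person-hours")
```

If the estimate comes out to a few days of work, collecting the data yourself may be cheaper than further tuning of the algorithm.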

Ceiling Analysis

Photo OCR as Example

Machine learning pipeline:

Image -> Text Detection -> Character Segmentation -> Character Recognition

How much could overall performance improve if a given component were made perfect? This is the upper limit on what can be gained by working on that component.
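
A minimal sketch of the bookkeeping follows. It assumes the overall accuracy is re-measured after each stage, in pipeline order, is replaced with ground-truth output while all earlier stages stay perfect. The function name `ceiling_analysis` and the accuracy figures are hypothetical, chosen only to illustrate the calculation.

```python
def ceiling_analysis(baseline, stages):
    """Return (stage name, gain) pairs.

    baseline: overall accuracy of the unmodified pipeline.
    stages: list of (name, accuracy) in pipeline order, where accuracy is the
            overall accuracy once this stage and every earlier stage are
            given perfect (ground-truth) output.
    """
    gains = []
    previous = baseline
    for name, accuracy in stages:
        gains.append((name, accuracy - previous))
        previous = accuracy
    return gains


# Hypothetical overall accuracies, in percent.
baseline = 72
stages = [
    ("Text Detection", 89),
    ("Character Segmentation", 90),
    ("Character Recognition", 100),
]
gains = ceiling_analysis(baseline, stages)
for name, gain in gains:
    print(f"{name}: up to +{gain} points")
best_stage = max(gains, key=lambda pair: pair[1])[0]
```

With these made-up numbers, a perfect text detector would add the most to overall accuracy, so that component would be the most promising place to spend effort, while a perfect character segmenter would add very little.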
