Correction of Noisy Labels via Mutual Consistency Check

Sahely Bhadra (a,b) and Matthias Hein (b)
(a) Max-Planck Institute for Informatics, Saarbrucken, Germany
(b) Saarland University, Saarbrucken, Germany



LND: Label Noise Detection Method

Label noise can have severe negative effects on the performance of a classifier. Such noise can arise either from adversarial manipulation of the training data or from unskilled annotators, who are frequently encountered in crowdsourcing (e.g. Amazon Mechanical Turk). Based on the assumption that an expert has provided some fraction of the training data, for which the labels can be assumed to be true, we propose a new preprocessing method to identify and correct noisy labels via a mutual consistency check using a Parzen window classifier. While the resulting optimization problem turns out to be combinatorial, we design an efficient algorithm for which we provide approximation guarantees. Extensive experimental evaluation shows that our method performs similarly to, and often much better than, existing methods for the detection of noisy labels, thus leading to a boost in the performance of the resulting classifiers.
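To make the idea of a consistency check with a Parzen window classifier more concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it is a simple single-pass heuristic that flags an untrusted label when it disagrees with the leave-one-out Parzen window prediction from all other points, and it comes with no approximation guarantee. The function names, the Gaussian kernel, the bandwidth value, and the toy data are all assumptions made for illustration.

```python
import math


def gaussian_kernel(x, z, bandwidth):
    """Gaussian kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def parzen_score(x, points, labels, bandwidth, skip=None):
    """Signed Parzen window score sum_j y_j * k(x, x_j), labels in {-1, +1}.

    The point with index `skip` is left out (leave-one-out).
    """
    return sum(
        y * gaussian_kernel(x, p, bandwidth)
        for j, (p, y) in enumerate(zip(points, labels))
        if j != skip
    )


def flag_inconsistent(points, labels, trusted, bandwidth=1.0):
    """Return indices of untrusted points whose label disagrees with the
    leave-one-out Parzen window prediction (hypothetical heuristic)."""
    flagged = []
    for i, (x, y) in enumerate(zip(points, labels)):
        if i in trusted:
            continue  # expert-provided labels are assumed to be true
        if y * parzen_score(x, points, labels, bandwidth, skip=i) < 0:
            flagged.append(i)
    return flagged


# Toy data (assumed): two well-separated clusters, label of index 3 flipped.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (3.0, 3.0), (3.1, 3.0), (3.0, 3.1), (3.1, 3.1)]
labels = [1, 1, 1, -1, -1, -1, -1, -1]
trusted = {0, 4}  # indices of labels assumed to come from an expert

flagged = flag_inconsistent(points, labels, trusted)
# Correct the flagged labels by flipping them.
corrected = [-y if i in flagged else y for i, y in enumerate(labels)]
```

In this toy setting only the flipped label is flagged. With heavier noise a one-shot check like this can be misled by noisy neighbours, which is one reason the actual method treats the problem as a joint optimization over all labels.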

Paper:

The corresponding paper (PDF) appears in a special issue of Neurocomputing. If you use the code below, you must cite the paper.

Software:

To use this script, one needs to install 'libsvm-3.14' and add it to the Matlab path.

Dataset:

The archive (dataLND.tar.gz) contains the 8 datasets used in our experiments, each corrupted with different kinds and various amounts of label noise. They are in Matlab data file format.
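For readers who want to inspect the archive outside Matlab, the following Python sketch lists the Matlab data files it contains, using only the standard library. The helper name is hypothetical, and the internal layout of the archive and the variable names stored in each file are not documented here, so nothing about them is assumed.

```python
import tarfile
from pathlib import Path


def list_mat_members(archive_path):
    """Return the sorted names of all .mat files inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".mat")
        )


if __name__ == "__main__":
    archive_path = Path("dataLND.tar.gz")  # assumed to be in the working directory
    if archive_path.exists():
        for name in list_mat_members(archive_path):
            print(name)
```

Once extracted, each file could probably be read with scipy.io.loadmat, provided it was not saved in the Matlab v7.3 format, which that function does not support.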

Contact:

sahely.bhadra@aalto.fi