Label noise can have severe negative effects on the performance of a classifier. Such noise can arise either from adversarial manipulation of the training data or from unskilled annotators, as frequently encountered in crowdsourcing (e.g. Amazon Mechanical Turk). Assuming that an expert has provided some fraction of the training data, whose labels can be taken to be correct, we propose a new preprocessing method to identify and correct noisy labels via a mutual consistency check using a Parzen window classifier. While the resulting optimization problem turns out to be combinatorial, we design an efficient algorithm for which we provide approximation guarantees. Extensive experimental evaluation shows that our method performs similarly to, and often much better than, existing methods for the detection of noisy labels, thus boosting the performance of the resulting classifiers.
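For readers unfamiliar with the Parzen window classifier mentioned above, the following is a minimal Matlab sketch of one, not the LND procedure itself. It assumes binary labels in {-1, +1} and a Gaussian kernel. The toy data, the bandwidth `h`, and all variable names are illustrative assumptions and do not come from the LND code.

```matlab
% Toy training set: two 2-D clusters, one point per row (illustrative data).
X = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
y = [-1; -1; -1; 1; 1; 1];        % labels in {-1, +1}
h = 1.0;                          % kernel bandwidth (assumed value)

% Query points to classify.
Xq = [0.5 0.5; 5.5 5.5];

% Squared Euclidean distance between every query and every training point.
D2 = bsxfun(@plus, sum(Xq.^2, 2), sum(X.^2, 2)') - 2 * (Xq * X');

% Gaussian kernel weights; each training point votes with its label.
K = exp(-D2 / (2 * h^2));
scores = K * y;

% Predicted label is the sign of the kernel-weighted vote.
pred = sign(scores);
disp(pred');
```

Each training point votes for its own label with a weight that decays with distance, so a point whose label disagrees with the vote of its neighbours stands out as a candidate for inspection.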
LND (LND.tar.gz) is a Matlab implementation of the LND method. Please see 'readme.txt' inside 'LND.tar.gz' for usage. The archive contains 3 solvers:
LND with the Spannogram framework: LND_spanno.m
LND solved by sequential linearization: LND_SLP.m
LND solved by sequential linearization, with the initial point set by the level-1 approximation of LND_spanno: LND_SLP1.m
CVscript (cvLND.tar.gz) is a set of Matlab scripts for running LND with cross-validation. Please see 'readme.txt' inside 'cvLND.tar.gz' for usage. The archive contains 3 cross-validation scripts, one for each LND solver:
LND with the Spannogram framework: CV_LND_spanno.m
LND solved by sequential linearization: CV_LND_SLP.m
LND solved by sequential linearization, with the initial point set by the level-1 approximation of LND_spanno: CV_LND_SLP1.m
To use these scripts, one needs to install 'libsvm-3.14' and add it to the Matlab path.
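Adding libsvm to the Matlab path might look like the sketch below. The install location is a placeholder. The sketch also assumes that the scripts need the compiled Matlab interface, which the libsvm distribution ships in its 'matlab' subfolder and which is built with the 'make.m' script there. Consult 'readme.txt' for the exact requirements.

```matlab
% Placeholder location: replace with the folder where libsvm-3.14 was unpacked.
libsvmDir = fullfile('path', 'to', 'libsvm-3.14');

% The Matlab interface of libsvm lives in its 'matlab' subfolder.
addpath(fullfile(libsvmDir, 'matlab'));

% exist(..., 'file') returns 3 when the name resolves to a MEX-file on the path.
if exist('svmtrain', 'file') ~= 3
    warning('libsvm MEX files not found; run make.m in %s first.', ...
            fullfile(libsvmDir, 'matlab'));
end
```

Calling `savepath` afterwards keeps the entry across Matlab sessions.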
Dataset (dataLND.tar.gz) contains the 8 datasets used in our experiments, each corrupted with different kinds and amounts of label noise. They are in Matlab data file format.
sahely.bhadra@aalto.fi