Wednesday, February 4, 2015

My ICML 2014 reading list


1. Discriminative Features via Generalized Eigenvectors
The right representation of the data largely dictates the success of the algorithm applied to it. Deep learning and dictionary learning methods address this problem in domains such as image classification and audio. This paper describes computationally simple methods for extracting discriminative features from large data sets. The authors claim that
In this work, we explore conceptually and computationally simple ways to create discriminative features that can scale to a large number of examples, even when data is distributed across many machines. Our techniques are not a panacea. They are exploiting simple second order structure in the data and it is very easy to come up with sufficient conditions under which they will not give any advantage over learning using the raw signal. Nevertheless, they empirically work remarkably well. 
Abstract
 Representing examples in a way that is compatible with the underlying classifier can greatly enhance the performance of a learning system. In this paper we investigate scalable techniques for inducing discriminative features by taking advantage of simple second order structure in the data. We focus on multiclass classification and show that features extracted from the generalized eigenvectors of the class conditional second moments lead to classifiers with excellent empirical performance. Moreover, these features have attractive theoretical properties, such as inducing representations that are invariant to linear transformations of the input. We evaluate classifiers built from these features on three different tasks, obtaining state of the art results.
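To make the "simple second order structure" idea a bit more concrete, here is a rough sketch of my own (not the authors' code): it builds class-conditional second moments for a toy two-class problem and takes the leading generalized eigenvectors as discriminative directions. The use of numpy/scipy, the ridge term, the number of directions, and the squared-projection features are all my illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of extracting discriminative
# directions from the generalized eigenvectors of class-conditional
# second moments, on a toy two-class dataset.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Toy data: two classes with different second-order structure.
X_a = rng.normal(size=(500, 10)) @ np.diag(np.linspace(1.0, 3.0, 10))
X_b = rng.normal(size=(500, 10))

def second_moment(X, reg=1e-3):
    """(Uncentered) class-conditional second moment with a small ridge (my choice)."""
    d = X.shape[1]
    return X.T @ X / len(X) + reg * np.eye(d)

C_a, C_b = second_moment(X_a), second_moment(X_b)

# Generalized eigenproblem C_a v = lambda * C_b v; the top eigenvectors
# point in directions where class a carries much more energy than class b.
eigvals, eigvecs = eigh(C_a, C_b)
top_dirs = eigvecs[:, np.argsort(eigvals)[::-1][:3]]  # a few leading directions

# Derived features: squared projections onto the discriminative directions,
# which a linear classifier can consume alongside the raw signal.
features = (X_a @ top_dirs) ** 2
print(features.shape)  # (500, 3)
```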

2. Coding for Random Projections 
Abstract
 The method of random projections has become popular for large-scale applications in statistical learning, information retrieval, bio-informatics and other applications. Using a well-designed coding scheme for the projected data, which determines the number of bits needed for each projected value and how to allocate these bits, can significantly improve the effectiveness of the algorithm, in storage cost as well as computational speed. In this paper, we study a number of simple coding schemes, focusing on the task of similarity estimation and on an application to training linear classifiers. We demonstrate that uniform quantization outperforms the standard and influential method (Datar et al., 2004), which used a window-and-random offset scheme. Indeed, we argue that in many cases coding with just a small number of bits suffices. Furthermore, we also develop a non-uniform 2-bit coding scheme that generally performs well in practice, as confirmed by our experiments on training linear support vector machines (SVM). Proofs and additional experiments are available at arXiv:1308.2218. In the context of using coded random projections for approximate near neighbor search by building hash tables (arXiv:1403.8144) (Li et al., 2014), we show that the step of random offset in (Datar et al., 2004) is again not needed and may hurt the performance. Furthermore, we show that, unless the target similarity level is high, it usually suffices to use only 1 or 2 bits to code each hashed value for this task. Section 7 presents some experimental results for LSH.
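To illustrate the uniform-quantization idea, here is a small sketch of my own (not the paper's code or its exact estimator): it projects the data with a Gaussian random matrix and codes each projected value with a few bits and no random offset. The bin width w, the bit count, the clipping range, and the collision-based similarity proxy are all made-up illustrative choices.

```python
# A minimal sketch of coding random projections with plain uniform
# quantization and a small number of bits (no random offset).
import numpy as np

rng = np.random.default_rng(1)

def code_random_projections(X, k=64, w=1.0, bits=2, seed=0):
    """Project to k dimensions and uniformly quantize each value to `bits` bits,
    omitting the random offset used in Datar et al. (2004)."""
    d = X.shape[1]
    R = np.random.default_rng(seed).normal(size=(d, k))  # shared projection matrix
    Z = X @ R / np.sqrt(k)
    levels = 2 ** bits
    # Map each projection to an integer code via fixed-width bins, then clip.
    codes = np.clip(np.floor(Z / w) + levels // 2, 0, levels - 1)
    return codes.astype(np.int8)

X = rng.normal(size=(5, 100))
C = code_random_projections(X)

# Crude similarity proxy between two points: fraction of coordinates
# whose small-bit codes collide.
sim = np.mean(C[0] == C[1])
print(C.shape, sim)
```

The design point the paper stresses is that, for similarity estimation and for feeding linear classifiers, a handful of bits per projected value is usually enough, so the coded representation can be both compact and fast to compare.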