Research

My research focuses on developing and analyzing novel, efficient algorithms for learning and inference, and on applying these algorithms in challenging real-world domains. My research interests lie in statistical machine learning, and more specifically in the fields of graphical models and deep learning. Deep learning methods model high-level abstractions in data using a deep graph with multiple processing layers, composed of linear and non-linear transformations. Since the same machine learning models and algorithms can be applied across many research areas, I concentrate on developing and analyzing these algorithms in the context of classical machine learning tasks (classification, regression, clustering, dimensionality reduction, etc.) and on applying them to a wide variety of real-world applications (computer vision, language processing, audio processing, medical imaging, etc.).


Selected Projects


Deep learning classification with noisy labels. The availability of large data sets has enabled neural networks to achieve impressive recognition results. However, the presence of inaccurate or inconsistent class labels is known to deteriorate the performance of even the best classifiers in a broad range of classification problems. Noisy labels also tend to be more harmful than noisy features. In a line of projects, we defined neural network architectures and training procedures that explicitly account for the presence of training data with unreliable labels.
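One common way to make a network noise-aware, sketched below, is to append an extra stochastic layer that maps the clean-label posterior through a label-noise transition matrix learned jointly with the base network. This is an illustrative sketch of the general idea, not necessarily the exact architecture from these projects; all names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseAdaptation(nn.Module):
    """Wraps a base classifier with a label-noise layer (illustrative sketch).

    p(noisy label j | x) = sum_i p(clean label i | x) * T[i, j],
    where T is a row-stochastic noise transition matrix learned jointly
    with the base network.
    """
    def __init__(self, base_net: nn.Module, num_classes: int):
        super().__init__()
        self.base_net = base_net
        # Parametrize T via per-row logits so rows stay stochastic;
        # initialize close to the identity (little assumed noise).
        self.noise_logits = nn.Parameter(torch.eye(num_classes) * 5.0)

    def forward(self, x):
        clean_probs = F.softmax(self.base_net(x), dim=-1)   # p(clean | x)
        T = F.softmax(self.noise_logits, dim=-1)            # row-stochastic matrix
        return clean_probs @ T                              # p(noisy | x)

# Training minimizes the NLL of the *observed* noisy labels, e.g.
#   loss = F.nll_loss(torch.log(model(x) + 1e-12), noisy_targets)
# At test time, the base network alone predicts the clean labels.
```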

Confidence calibration. Performance alone is not sufficient. Accurately assessing the confidence of a prediction is important in a wide range of applications, including medical diagnosis, weather forecasting and autonomous driving. Hence, there is a need to calibrate the confidence information reported by the network. We addressed this problem in the contexts of classification, regression and prediction sets.
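A standard baseline for classifier calibration (not necessarily the method used in these projects) is temperature scaling: a single scalar T is fit on held-out data to soften or sharpen the network's logits. A minimal sketch:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(temperature, logits, labels):
    """Negative log-likelihood of labels under temperature-scaled softmax."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(val_logits, val_labels):
    """Fit a single scalar T on a held-out validation set."""
    res = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded",
                          args=(val_logits, val_labels))
    return res.x

# Usage: T = fit_temperature(val_logits, val_labels);
# calibrated confidences come from softmax(test_logits / T).
```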

Domain adaptation. The application of deep learning systems to real-world problems is hindered by the drop in performance when a network trained on data from one domain is applied to data from a different domain. The issue of domain shift is particularly critical in medical imaging, where the accuracy of a model trained on data from one medical facility decreases when applied to data from a different site.

Deep clustering. Clustering is one of the most fundamental techniques in unsupervised machine learning. However, applying k-means-style methods directly to high-dimensional data is problematic. Deep learning methods are expected to automatically discover non-linear representations that are better suited to the clustering task.
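A common recipe in this area (a generic sketch, not a specific method from these projects) is to learn a low-dimensional embedding with an autoencoder and then cluster in the latent space:

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AutoEncoder(nn.Module):
    """Maps inputs to a low-dimensional code and back."""
    def __init__(self, in_dim=784, code_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def deep_cluster(X, n_clusters=10, epochs=50, lr=1e-3):
    """Train a reconstruction objective, then run k-means on the codes."""
    model = AutoEncoder(in_dim=X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    Xt = torch.as_tensor(X, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(Xt), Xt)
        loss.backward()
        opt.step()
    with torch.no_grad():
        codes = model.encoder(Xt).numpy()
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(codes)
```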


Multi-document summarization. Common information needs are most often satisfied by multiple texts rather than by a single one. Accordingly, there is rising interest in Multi-Document Summarization (MDS): generating a summary for a set of topically related documents. Inherently, MDS needs to address, either explicitly or implicitly, several subtasks embedded in this summarization setting, including salience detection, redundancy removal, and text generation.


Message-Passing Algorithms for MIMO Wireless Communication. The detection problem for MIMO communication systems is known to be NP-hard. The factor graph that corresponds to this problem is very loopy; in fact, it is a complete graph. Hence, a straightforward application of the Belief Propagation (BP) algorithm yields very poor results. We have developed several methods that either modify the cost function we want to optimize or modify the BP messages in such a way that the BP algorithm yields improved performance. One approach is based on an optimal tree approximation of the Gaussian density of the unconstrained linear system. The finite-set constraint is then applied to obtain a cycle-free discrete distribution. Another approach is based on two-dimensional Gaussian projections and a third method is based on imposing priors on the BP messages.
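For intuition, the posterior p(x | y) for y = Hx + noise is proportional to exp(-||y - Hx||^2 / (2 sigma^2)), and expanding the quadratic yields singleton potentials from H^T y and pairwise potentials from H^T H, so every pair of symbols interacts. The sketch below (illustrative only, with BPSK symbols and hypothetical names) runs plain loopy BP on this complete pairwise graph; as noted above, plain BP here performs poorly, which is precisely what the tree-approximation and projection-based variants aim to fix.

```python
import numpy as np

def mimo_bp(H, y, sigma2, alphabet=(-1.0, 1.0), n_iters=20):
    """Loopy BP detection of x in alphabet^n from y = Hx + Gaussian noise."""
    A, b = H.T @ H, H.T @ y
    n, s = len(b), np.array(alphabet)
    # log singleton potentials: (b_i * x - 0.5 * A_ii * x^2) / sigma2
    phi = (np.outer(b, s) - 0.5 * np.outer(np.diag(A), s ** 2)) / sigma2
    # m[i, j, :] = log-message from symbol i to symbol j
    m = np.zeros((n, n, len(s)))
    for _ in range(n_iters):
        new_m = np.zeros_like(m)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # log pairwise potential: -A_ij * x_i * x_j / sigma2
                psi = -A[i, j] * np.outer(s, s) / sigma2
                inc = phi[i] + m[:, i, :].sum(axis=0) - m[j, i, :]
                inc = inc - inc.max()              # guard against overflow
                new_m[i, j] = np.log(np.exp(psi.T) @ np.exp(inc) + 1e-300)
        m = new_m - new_m.max(axis=2, keepdims=True)   # normalize messages
    beliefs = phi + m.sum(axis=0)
    return s[np.argmax(beliefs, axis=1)]           # symbol-wise MAP guess
```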

Non-parametric Differential Entropy Estimation. Estimating the differential entropy given only sampled points, without any prior knowledge of the distribution, is a difficult task. We proposed the Mean-NN estimator for the main information-theoretic measures, such as differential entropy, mutual information and KL-divergence. The Mean-NN estimator is related to classical k-NN entropy estimators. However, unlike the k-NN based estimator, the Mean-NN entropy estimator is a smooth function of the given data points and is not sensitive to small perturbations in their values. Hence, it can be used within optimization procedures that are based on computing the derivatives of the cost function being optimized. We demonstrated the usefulness of the Mean-NN entropy estimation technique on the ICA problem, on clustering, and on supervised and unsupervised dimensionality reduction.
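Up to additive constants that do not matter for optimization, the Mean-NN estimator reduces to an average of log pairwise distances, which is what makes it smooth in the sample values. A minimal sketch (treat the omitted constants as an assumption; this is a differentiable score, not an absolute entropy value):

```python
import numpy as np

def mean_nn_entropy(X, eps=1e-12):
    """Smooth entropy score: average log pairwise distance (up to constants).

    Averaging the classical k-NN entropy estimator over all k yields a
    quantity proportional to the mean of log ||x_i - x_j|| over all pairs;
    additive constants are omitted here, so use this only as an objective.
    """
    n, d = X.shape
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    iu = np.triu_indices(n, k=1)                # each unordered pair once
    return d * np.log(dists[iu] + eps).mean()

# Because the score is smooth in X, gradient-based ICA or dimensionality
# reduction can optimize a projection W by differentiating
# mean_nn_entropy(X @ W) with respect to W (e.g., via autodiff).
```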

LDPC Serial Scheduling. LDPC codes are decoded by running an iterative belief-propagation algorithm over the factor graph of the code. In the traditional message-passing schedule, in each iteration all the variable nodes, and subsequently all the factor nodes, pass new messages to their neighbors. We showed that serial scheduling, in which messages are generated using the latest available information, significantly improves the convergence speed in terms of the number of iterations. It was observed experimentally in several studies that the serial schedule converges in exactly half the number of iterations compared to the standard parallel schedule. We provided a theoretical motivation for this observation by proving it for single-path graphs. Joint work with Haggai Kfir, Eran Sharon and Simon Litsyn.
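The difference between the two schedules is easiest to see in code. The sketch below is a simplified LLR-domain decoder with hypothetical names; the serial variant shown is the check-node-serial (layered) flavor, in which each check update immediately sees the freshest posteriors:

```python
import numpy as np

def check_update(msgs):
    """Leave-one-out check-node rule: 2*atanh(prod of tanh(m/2))."""
    t = np.tanh(np.clip(msgs, -25, 25) / 2.0)
    out = np.empty_like(t)
    for i in range(len(t)):
        p = np.prod(np.delete(t, i))
        out[i] = 2.0 * np.arctanh(np.clip(p, -0.999999, 0.999999))
    return out

def bp_decode(H, channel_llr, n_iters=50, schedule="serial"):
    """BP decoding of a binary LDPC code with parity-check matrix H."""
    m, _ = H.shape
    nbrs = [np.flatnonzero(H[c]) for c in range(m)]   # variables in each check
    mcv = [np.zeros(len(nbrs[c])) for c in range(m)]  # check->variable messages
    post = channel_llr.astype(float).copy()           # per-variable posterior LLR
    for _ in range(n_iters):
        if schedule == "parallel":                    # flooding: all messages
            old = [msg.copy() for msg in mcv]         # based on the previous pass
            for c in range(m):
                mcv[c] = check_update(post[nbrs[c]] - old[c])
            post = channel_llr.astype(float).copy()
            for c in range(m):
                post[nbrs[c]] += mcv[c]
        else:                                         # serial: each check sees the
            for c in range(m):                        # freshest available posteriors
                post[nbrs[c]] -= mcv[c]               # strip c's old contribution
                mcv[c] = check_update(post[nbrs[c]])  # extrinsic LLRs into check c
                post[nbrs[c]] += mcv[c]               # fold the new message back in
    return (post < 0).astype(int)                     # hard decision per bit
```

Empirically, and as proved for single-path graphs in this work, the serial schedule needs about half the iterations of the flooding schedule.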

Neighbourhood Components Analysis (NCA) is a method for learning a Mahalanobis distance measure for k-nearest neighbors (kNN). In particular, it finds a distance metric that maximizes the leave-one-out classification accuracy on the training set for a stochastic variant of kNN. NCA can also learn a low-dimensional linear embedding of labeled data for data visualization and for improved kNN classification speed. Unlike other methods, this classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. Joint work with Sam Roweis, Geoffrey Hinton and Ruslan Salakhutdinov.

Neighbourhood Components Analysis. NIPS, 2004. matlab code C code
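For readers who want to try NCA without the original code, scikit-learn ships an implementation; a minimal usage sketch on a standard dataset:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn a 2-D linear embedding with NCA, then classify with plain kNN.
model = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(n_components=2, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```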

Mixture of Gaussians, Distance and Simplification. The Mixture of Gaussians (MoG) is one of the simplest graphical models. It is a flexible and powerful parametric framework for unsupervised data grouping. We suggested several approaches for computing meaningful distances between two MoGs and for MoG model simplification. The Unscented Transform is traditionally used in generalized Kalman filtering for non-linear dynamical systems; we proposed using it to compute the KL divergence between two MoGs and to simplify MoG models (a small numerical sketch follows the references below).

Simplifying mixture models using the unscented transform. IEEE PAMI, 2008.
A distance measure between GMMs based on the unscented transform and its application to speaker recognition. Eurospeech, 2005.
Hierarchical clustering of a mixture model. NIPS, 2004.
An efficient similarity measure based on approximations of KL-divergence between two Gaussian mixtures. ICCV, 2003.
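The core sigma-point idea can be illustrated in a few lines. The sketch below is an illustrative reading of the unscented approximation (treat the exact sigma-point weighting as an assumption): KL(f||g) between two MoGs is approximated by evaluating the log-ratio at the 2d sigma points of each component of f.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mog_logpdf(x, weights, means, covs):
    """Log-density of a mixture of Gaussians at point x."""
    comps = [np.log(w) + multivariate_normal.logpdf(x, m, c)
             for w, m, c in zip(weights, means, covs)]
    return np.logaddexp.reduce(comps)

def unscented_kl(f, g):
    """Approximate KL(f || g) for two MoGs f, g = (weights, means, covs).

    For each component N(mu_k, S_k) of f, the expectation of
    log f(x) - log g(x) is replaced by the average over the 2d sigma
    points mu_k +/- columns of sqrt(d * S_k) (unscented transform).
    """
    weights, means, covs = f
    kl = 0.0
    for w, mu, S in zip(weights, means, covs):
        d = len(mu)
        L = np.linalg.cholesky(d * S)                      # sqrt(d * S)
        sigma_pts = np.concatenate([mu + L.T, mu - L.T])   # 2d points
        vals = [mog_logpdf(x, *f) - mog_logpdf(x, *g) for x in sigma_pts]
        kl += w * np.mean(vals)
    return kl
```

Unlike Monte Carlo estimates, this approximation is deterministic and cheap: it touches only 2d points per component, which is what makes it practical inside model-simplification loops.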