My research deals with developing and analyzing novel efficient algorithms for learning and inference, and applying these algorithms in challenging real world domains. My research interests are mainly related to statistical machine learning and more specifically to the fields of graphical models and deep learning. What is perhaps the most distinctive about the graphical model approach is its naturalness in formulating probabilistic models of complex phenomena, while maintaining control over the computational cost associated with these models. Deep learning methods attempt to model high level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations. As often pointed out, the same machine learning models and algorithms can be applied in many different research areas. In my research I concentrate on developing and analyzing those algorithms in the context of classical machine learning tasks (classification, clustering, dimensionality reduction etc.) and applying them to a large variety of real world applications.

Selected Projects


Message-Passing Algorithms for MIMO Wireless Communication. The detection problem for MIMO communication systems is known to be NP-hard. The factor graph that corresponds to this problem is very loopy; in fact, it is a complete graph. Hence, a straightforward application of the Belief Propagation (BP) algorithm yields very poor results. We have developed several methods that either modify the cost function we want to optimize or modify the BP messages in such a way that the BP algorithm yields improved performance. One approach is based on an optimal tree approximation of the Gaussian density of the unconstrained linear system. The finite-set constraint is then applied to obtain a cycle-free discrete distribution. Another approach is based on two-dimensional Gaussian projections and a third method is based on imposing priors on the BP messages. Joint work with Amir Leshem.

Improved MIMO detection based on successive tree approximations . IEEE Int. Symposium on Information Theory (ISIT), 2013. C code
Iterative tomographic solution of integer least squares problems with applications to MIMO detection. IEEE Journal of Selected Topics in Signal Processing, 2011.
MIMO detection for high-order QAM based on a Gaussian tree approximation. IEEE Trans. Information Theory, 2011.
Pseudo prior belief propagation for densely connected discrete graphs. IEEE Information Theory Workshop (ITW), 2010.
A Gaussian tree approximation for integer least-squares. NIPS, 2009.


Non-parametric Differential Entropy Estimation. Estimating the differential entropy given only sampled points without any prior knowledge on the distribution is a difficult task. We proposed the Meann-NN estimator for the main information theoretic measures such as differential entropy, mutual information and KL-divergence. The Mean-NN estimator is related to classical k-NN entropy estimations. However, Unlike the k-NN based estimator, the Mean-NN entropy estimator is a smooth function of the given data points and is not sensitive to small perturbations in the values of the data. Hence, it can be used within optimization procedures that are based on computing the derivatives of the cost function we optimize. We demonstrated the usefulness of the Mean-NN entropy estimation technique on the ICA problem, on clustering and on supervised and unsupervised dimensionality reduction. Joint work with Lev Faivishevsky.

Dimensionality reduction based on non-parametric mutual information. Neurocomputing 2012.
Unsupervised Feature Selection based on Non-Parametric Mutual Information. IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2012.
A nonparametric information theoretic clustering algorithm.  ICML, 2010.
ICA based on a smooth estimation of the differential entropy. NIPS 2008.


Textual Entailment Graphs. The objective of Textual Entailment is to automatically recognize whether a target textual hypothesis can be inferred from an input text. We can view this problem as the problems of learning a global semantic graph, where each node is a predicate and the graph describes all entailment rules between them. This allows us to search for semantic graphs that satisfy certain structural properties. For example, if 'cause an increase' entails 'raise' and 'raise' entails 'affect', then 'cause an increase' must entail 'affect'. This is known as the property of transitivity. We have developed algorithms that exploit this property to obtain improved semantic rules. We have also utilized other structural properties of semantic graphs such as sparseness (i.e., most predicates do not entail one another) and hierarchy (i.e., entailment relations usually form a hierarchy of predicates) to improve rule learning. The general learning problem can be formed as an Integer Linear Programming which is NP hard. A crucial challenge in utilizing structural information is to develop efficient algorithms that can be computed in reasonable time even on very large graphs. Joint work with Jonathan Berant and Ido Dagan.

Efficient tree-based approximation for entailment graph learning. ACL 2012.
Learning entailment relations by global graph structure optimization. Computational Linguistics 2012.
Global learning of typed entailment rules.  ACL 2011.
Global learning of focused entailment graphs. ACL 2010.


Medical Image Retrieval in Large Datasets. Retrieving similar cases from a large archive is a very challenging task and is one of the key issues in the rapidly expanding domain of content-based medical image retrieval. We developed an efficient image categorization and retrieval system applied to medical image databases, in particular large radiograph archives. The methodology is based on local patch representation of the image content, using the visual words approach. We explore the effects of various parameters on system performance, and show best results using dense sampling of simple features with spatial content, and a non-linear kernel-based Support Vector Machine (SVM) classifier. Our system was ranked first in the ImageCLEF 2009 medical annotation task. In a later work we developed a localized image lassification system. The system discriminates between healthy and pathological cases and indicates the subregion in the image that is automatically found to be most relevant for the decision. Joint work with Uri Avni and Hayit Greenspan.

X-ray categorization and spatial localization of chest pathologies. MICCAI, 2011.
X-ray categorization and retrieval on the organ and pathology level. IEEE Trans. Medical Imaging, 2011.
Addressing the ImageClef 2009 Challenge using a patch-based visual words.  CLEF, 2009.


Medical Image Segmentation and Lesion detection. Automatic segmentation of brain MRI images to the three main tissue types is a topic of great importance and much research. A realted task is detection and segmentation of Multiple Sclerosis (MS) lesions in brain images. To capture the complex tissue spatial layout, we used probabilistic model that is based on a mixture of multiple spatially oriented Gaussians per tissue. Another medical imaging challenge is lesion detection and segmentation in uterine cervix images. The image is modeled using an MRF in which watershed regions correspond to binary random variables indicating whether the region is part of the lesion tissue or not. The local pairwise factors on the arcs of the watershed map indicate whether the arc is part of the object boundary. The factors are based on supervised learning of a visual word distribution. The final lesion region segmentation is obtained using a loopy belief propagation applied to the watershed arc-level MRF. Joint work with Amit Ruf, Oren Freifled, Amir Alush and Hayit Greenspan.

Automated and interactive lesion detection and segmentation in uterine cervix Images.
IEEE Trans. on Medical Imaging, 2010.
Multiple sclerosis lesion detection using constrained GMM and curve evolution. Int. Journal of Biomedical Imaging, 2009.
Constrained Gaussian mixture model framework for automatic segmentation of MR brain images. IEEE Trans. on Medical Imaging, 2006.


LDPC Serial Scheduling. LDPC is decoded by running an iterative belief-propagation algorithm over the factor graph of the code. In the traditional message passing schedule, in each iteration all the variable nodes, and subsequently all the factor nodes, pass new messages to their neighbors. We showed that serial scheduling, in which messages are generated using the latest available information, significantly improves the convergence speed in terms of number of iterations. It was observed experimentally in several studies that the serial schedule converges in exactly half the number of iterations compared to the standard parallel schedule. We provided a theoretical motivation for this observation by proving it for single-path graphs. Joint work with Haggai Kfir, Eran Sharon and Simon Litsyn.

Serial schedules for belief-propagation: analysis of convergence time. IEEE Trans. on Information Theory, 2008.
Efficient serial message-passing schedules for LDPC decoding. IEEE Trans. on Information Theory, 2007.
An efficient message-passing schedule for LDPC decoding. Proc. Electrical and Electronic Engineers in Israel, 2004.


Neighbourhood Components Analysis (NCA) is a method for learning a Mahalnobis distance measure for k-nearest neighbours (kNN). In particular, it finds a distance metric that maximises the leave one out error on the training set for a stochastic variant of kNN. NCA can also learn a low-dimensional linear embedding of labelled data for data visualisation and for improved kNN classification speed. Unlike other methods, this classification model is non-parametric without any assumption on the shape of the class distributions or the boundaries between them. Joint work with Sam Roweis, Geoffrey Hinton  and Ruslan Salakhutdinov.

 Neighbourhood Component Analysis. NIPS 2004. matlab code C code


Mixture of Gaussians, Distance and Simplification. The Mixture of Gaussians (MoG) is one of the simplest graphical models. It is a flexible and powerful parametric framework for unsupervised data grouping. We suggested several approaches for computing a meaningful distances between two MoGs and for MoG model simplification. The Unscented Transform is traditionally used for generalized Kalman filter in non-linear dynamical systems. We proposed the usage of the Unscented Transform for computing the KL divergence between two MoGs and for model simplification.

Simplifying mixture models using the unscented transform. IEEE PAMI, 2008.
A distance measure between GMMs based on the unscented transform and its
application to speaker recognition.
 Eurospeech, 2005.
Hierarchical clustering of a mixture model. NIPS, 2004.
An efficient similarity measure based on approximations of KL-divergence between two Gaussian mixtures. ICCV, 2003.