Publications

Bayesian Nonparametric Hawkes Processes

Published in the Bayesian Nonparametrics Workshop at NeurIPS, 2018

Bayesian nonparametric priors (BNPs) have risen in popularity to capture the latent structure behind streams of data whose temporal nature has been modeled with Hawkes processes (HPs). However, a closer look into the literature reveals that many of these works do not actually rely on a valid BNP model. This is due to the fact that their BNP construction leads to patterns that “vanish” over time, i.e., that are assigned zero probability. In this work, we formalize this problem and develop a general and modular methodology to avoid the vanishing prior issue while, at the same time, allowing us to place a valid BNP over event data of an HP. The proposed methodology is general enough to model users’ activity and interactions, as well as to incorporate any valid BNP prior (e.g., the Chinese Restaurant Process and its hierarchical and nested variants).
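
As a rough illustration of the kind of event data being modeled (not of the paper's BNP construction), the sketch below simulates a univariate Hawkes process with an exponential excitation kernel via Ogata's thinning algorithm; the kernel choice and the parameters mu, alpha, beta are assumptions made purely for illustration.

```python
# Minimal sketch: simulate a univariate Hawkes process with an exponential
# kernel using Ogata's thinning algorithm. Illustrative only; the paper's
# BNP prior over latent patterns is not shown here.
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    rng = np.random.default_rng(seed)
    events, t = np.array([]), 0.0
    while t < horizon:
        # The intensity only decays between events, so its current value plus
        # one extra jump of size alpha is a valid upper bound for thinning.
        lam_bar = hawkes_intensity(t, events, mu, alpha, beta) + alpha
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            break
        # Accept the candidate time with probability lambda(t) / lam_bar.
        if rng.uniform() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events = np.append(events, t)
    return events

print(simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0))
```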

Download here

Miscellaneous Reports

A Stochastic Extension of Hamiltonian Descent Methods

Written:

We analyse the recently released work of Maddison et al. (2018), which employs Hamiltonian dynamics to achieve faster convergence rates on a wider class of objectives. The proposed framework allows for linear rates of convergence on certain classes of non-strongly convex functions and generalizes the momentum method to non-classical kinetic energies. We propose a stochastic variant of one such method and sketch its convergence proof. We also implement the deterministic methods as well as our stochastic counterpart, and compare various facets of these methods against baselines such as Gradient Descent and Momentum.
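
For intuition, here is a simplified, illustrative sketch of a Hamiltonian/momentum-style update driven by stochastic gradients, in the spirit of the methods discussed above; the exact discretizations and kinetic energies of Maddison et al. (2018), and the stochastic variant analysed in the report, differ in detail.

```python
# Illustrative sketch only: a dissipative momentum update with minibatch
# (stochastic) gradients and a pluggable kinetic-energy gradient. Not the
# report's exact scheme; parameter names are assumptions.
import numpy as np

def stochastic_hamiltonian_descent(grad_fn, x0, steps=1000, eps=0.1, gamma=0.9,
                                   kinetic_grad=lambda p: p, seed=0):
    """grad_fn(x, rng) returns a stochastic estimate of the gradient of f at x.
    kinetic_grad is the gradient of the kinetic energy k(p); the classical
    choice k(p) = ||p||^2 / 2 gives kinetic_grad(p) = p (heavy-ball momentum)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    p = np.zeros_like(x)
    for _ in range(steps):
        p = gamma * p - eps * grad_fn(x, rng)   # dissipate momentum, kick with the gradient
        x = x + eps * kinetic_grad(p)           # drift the position along dk/dp
    return x

# Example: noisy quadratic f(x) = ||x||^2 / 2 with Gaussian gradient noise.
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
print(stochastic_hamiltonian_descent(noisy_grad, x0=np.ones(5)))
```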

Download here

Incremental Training of a 2 Layer Network

Written:

Gradient boosting for convex objectives has a rich history and literature, with provable guarantees established many years ago. The same cannot be said for neural networks, even though they form an incredibly powerful class of models capable of approximating complex function mappings. In this project, we attempt to combine the two approaches, using a boosted model as a warm start for a single-hidden-layer neural network, with provable convergence guarantees. We show how gradient boosting with single-node, single-hidden-layer networks as weak learners corresponds to training the hidden-layer nodes sequentially, and can therefore serve as a starting point for backpropagation to obtain better results. We also review the convergence analysis of functional gradient descent, which is used to train the weak learners (the nodes, in our case), and present the empirical results obtained.
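
The sketch below illustrates the core idea under simplifying assumptions (squared loss, randomly proposed tanh units as weak learners): hidden units are fitted one at a time to the current residual, i.e. the negative functional gradient, and the resulting weights can then seed ordinary backpropagation. It is not the project's exact algorithm.

```python
# Minimal sketch of a boosted warm start for a one-hidden-layer tanh network.
# Each boosting round adds one hidden unit fitted to the residual; the
# weak-learner search and hyperparameters are illustrative assumptions.
import numpy as np

def fit_unit(X, r, rng, n_candidates=200):
    """Pick the random-feature unit tanh(X @ w + b) that best fits the
    residual r, together with its least-squares output weight."""
    best = None
    for _ in range(n_candidates):
        w, b = rng.standard_normal(X.shape[1]), rng.standard_normal()
        h = np.tanh(X @ w + b)
        c = (h @ r) / (h @ h)                   # least-squares output coefficient
        loss = np.sum((r - c * h) ** 2)
        if best is None or loss < best[0]:
            best = (loss, w, b, c)
    return best[1:]

def boosted_warm_start(X, y, n_units=10, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W, B, C = [], [], []
    f = np.zeros_like(y)
    for _ in range(n_units):
        r = y - f                               # negative functional gradient of squared loss
        w, b, c = fit_unit(X, r, rng)
        W.append(w); B.append(b); C.append(lr * c)
        f = f + lr * c * np.tanh(X @ w + b)
    return np.array(W), np.array(B), np.array(C)   # initial hidden/output weights

X = np.random.default_rng(1).standard_normal((200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
W, B, C = boosted_warm_start(X, y)
print(np.mean((y - np.tanh(X @ W.T + B) @ C) ** 2))   # training MSE of the warm start
```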

Download here

Convergence of Gradient Descent and its Variants

Written:

In this text, we survey prominent gradient descent techniques for optimization. Both deterministic and stochastic methods are reviewed, including SGD, Momentum, AdaGrad, Adam and NAG. Convergence analyses of these algorithms are given under various assumptions on the objective, such as convexity, smoothness and strong convexity. For Adam in particular, we review a recent work showing that the algorithm does not always converge, and restate the rigorous proof of the counterexample. The text is intended as a reference for the convergence analyses of the above methods, along with comments on their empirical performance.
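
For reference, the update rules of the surveyed methods are written out below on a generic stochastic gradient g; the hyperparameter names (lr, beta, beta1, beta2, eps) follow common conventions rather than any particular source.

```python
# Standard update rules for the surveyed first-order methods, one step each.
import numpy as np

def sgd(x, g, lr=0.01):
    return x - lr * g

def momentum(x, v, g, lr=0.01, beta=0.9):
    v = beta * v + g
    return x - lr * v, v

def nesterov(x, v, g_lookahead, lr=0.01, beta=0.9):
    # g_lookahead is the gradient evaluated at the look-ahead point x - lr * beta * v.
    v = beta * v + g_lookahead
    return x - lr * v, v

def adagrad(x, s, g, lr=0.1, eps=1e-8):
    s = s + g ** 2                                  # accumulated squared gradients
    return x - lr * g / (np.sqrt(s) + eps), s

def adam(x, m, v, g, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g                 # first-moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2            # second-moment estimate
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)   # bias correction
    return x - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Quick check on f(x) = x^2 / 2, whose gradient is g = x (no noise here).
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam(x, m, v, g=x, t=t, lr=0.1)
print(x)
```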

Download here

Variational Inference Using Transformations on Distributions

Written:

Variational inference methods often focus on efficient model optimization, with little emphasis on the choice of the approximating posterior. In this paper, we review and implement methods that yield a rich family of approximating posteriors. We show that one particular method, which applies a sequence of transformations to a simple base distribution, produces very rich and complex posterior approximations. We analyze its performance on the MNIST dataset by implementing it within a Variational Autoencoder and demonstrate its effectiveness in learning better posterior distributions.
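
As a small illustration of such a transformation, the sketch below applies a planar flow, f(z) = z + u * tanh(w^T z + b), to samples from a standard Gaussian base distribution and tracks the change-of-variables correction to the log-density; the parameter values are arbitrary and this is not the report's implementation.

```python
# Minimal planar-flow sketch: transform base samples and update their
# log-density with the log|det Jacobian| term. Illustrative parameters only.
import numpy as np

def planar_flow(z, u, w, b):
    """Transform samples z of shape (n, d); return f(z) and log|det df/dz| per sample."""
    a = z @ w + b                                   # (n,)
    f = z + np.outer(np.tanh(a), u)                 # f(z) = z + u * tanh(w.z + b)
    psi = np.outer(1.0 - np.tanh(a) ** 2, w)        # psi(z) = h'(a) * w
    log_det = np.log(np.abs(1.0 + psi @ u) + 1e-12) # |det| = |1 + u^T psi(z)|
    return f, log_det

rng = np.random.default_rng(0)
d = 2
z0 = rng.standard_normal((1000, d))                 # samples from the N(0, I) base posterior
log_q = -0.5 * np.sum(z0 ** 2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
u, w, b = np.array([1.5, 0.0]), np.array([2.0, 1.0]), 0.0   # satisfies w.u >= -1 (invertible)
z1, log_det = planar_flow(z0, u, w, b)
log_q = log_q - log_det                             # density of the transformed samples
print(z1.mean(axis=0), log_q.mean())
```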

Download here