Calcium imaging promises to transform the field of neuroscience by making it possible to record from large populations of neurons simultaneously. However, determining the exact moment in time at which a neuron spikes, on the basis of a calcium imaging data set, amounts to a non-trivial deconvolution problem that is of critical importance for downstream analyses. While a number of formulations have been proposed for this task in the recent literature, in this article we focus on a formulation recently proposed in Jewell and Witten (2018. Exact spike train inference via ℓ0 optimization. The Annals of Applied Statistics 12(4), 2457–2482) that can accurately estimate not just the spike rate, but also the specific times at which the neuron spikes. We develop a much faster algorithm that can deconvolve a fluorescence trace of 100,000 timesteps in less than a second. Furthermore, we present a modification to this algorithm that precludes the possibility of a “negative spike”. We demonstrate the performance of this algorithm for spike deconvolution on calcium imaging datasets recently released as part of the spikefinder challenge. The algorithm presented in this article was used in the Allen Institute for Brain Science’s “platform paper” to decode neural activity from the Allen Brain Observatory; this is the main scientific paper in which their data resource is presented.
We wish to congratulate the authors on their extension of model-X knockoffs beyond the Gaussian setting considered in Candès et al. (2018) to the well-studied class of Markov chain and hidden Markov models. This contribution builds upon a beautiful body of work that allows us to rethink false discovery rate control in the context of large and complex datasets (Barber & Candès, 2015, 2018; Weinstein et al., 2017; Barber et al., 2018; Candès et al., 2018; Katsevich & Sabatti, 2018). We believe that this innovation will help model-X knockoffs see use in practical applications, such as the genome-wide association studies explored in the paper, as well as other biological problems where Markov chain or hidden Markov models are common; see Sesia et al. (2019, §1.3) for some examples. In this discussion, we comment on two directions that merit further investigation.
Somatic mutations are a primary contributor to malignancy in human cells. Accurate detection of mutations is needed to define the clonal composition of tumours, in which clones may have distinct phenotypic properties. Although analysis of mutations over multiple tumour samples from the same patient has the potential to enhance identification of clones, few analytic methods exploit the correlation structure across samples. We posited that incorporating clonal information into joint analysis over multiple samples would improve mutation detection, particularly of mutations with low prevalence. In this paper, we develop a new procedure, called MuClone, for detecting mutations across multiple tumour samples of a patient from whole-genome or exome sequencing data. In addition to mutation detection, MuClone classifies mutations into biologically meaningful groups and allows us to study clonal dynamics. We show that, on lung and ovarian cancer datasets, MuClone improves somatic mutation detection sensitivity over competing approaches without compromising specificity.
In recent years, new technologies in neuroscience have made it possible to measure the activities of large numbers of neurons in behaving animals. For each neuron, a fluorescence trace is measured; this can be seen as a first-order approximation of the neuron's activity over time. Determining the exact times at which a neuron spikes on the basis of its fluorescence trace is an important open problem in computational neuroscience. Recently, a convex optimization problem involving an ℓ1 penalty was proposed for this task. In this paper, we slightly modify that proposal by replacing the ℓ1 penalty with an ℓ0 penalty. In stark contrast to the conventional wisdom that ℓ0 optimization problems are computationally intractable, we show that the resulting optimization problem can be solved to global optimality using an extremely simple and efficient dynamic programming algorithm.
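For intuition, an ℓ0-penalized deconvolution of this kind can be attacked with a classical "optimal partitioning" dynamic program over candidate spike times: between consecutive spikes the calcium estimate decays exponentially at a rate γ, and each new segment (spike) incurs a penalty λ. The sketch below is a simple quadratic-time illustration of that idea, with assumed inputs `y` (the trace), `gamma`, and `lam`; it is not the exact recursion or the fast pruned algorithm developed in the papers.

```python
import numpy as np

def l0_deconvolve(y, gamma, lam):
    """Illustrative optimal-partitioning dynamic program for l0 spike
    deconvolution: within each segment the fit decays as a*gamma^k, and
    each segment boundary (a spike) costs lam. Returns estimated spike
    times (segment boundaries)."""
    T = len(y)
    F = np.full(T + 1, np.inf)   # F[t] = best cost of y[:t]
    F[0] = 0.0
    prev = np.zeros(T + 1, dtype=int)

    def seg_cost(s, t):
        # closed-form least-squares fit of a * gamma^(i-s) to y[s:t]
        x = gamma ** np.arange(t - s)
        ys = y[s:t]
        a = (x @ ys) / (x @ x)
        return float(((ys - a * x) ** 2).sum())

    for t in range(1, T + 1):
        for s in range(t):
            c = F[s] + seg_cost(s, t) + lam
            if c < F[t]:
                F[t] = c
                prev[t] = s

    # backtrack through the optimal segmentation to read off spike times
    spikes, t = [], T
    while t > 0:
        s = prev[t]
        if s > 0:
            spikes.append(s)
        t = s
    return sorted(spikes)
```

On a noiseless trace built from two exponentially decaying segments, the recursion recovers the single segment boundary exactly; the contribution of the fast algorithm above is to obtain the same global optimum far more efficiently.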
The emergence of compact GPS systems and the establishment of open data initiatives have resulted in widespread availability of spatial data for many urban centres. These data can be leveraged to develop data-driven intelligent resource allocation systems for urban issues such as policing, sanitation, and transportation. We employ techniques from Bayesian non-parametric statistics to develop a process that captures a common characteristic of urban spatial datasets. Specifically, our new spatial process framework models events that occur repeatedly at discrete spatial points, the number and locations of which are unknown a priori. We develop a representation of our spatial process that facilitates posterior simulation, resulting in an interpretable and computationally tractable model. The framework's superiority over both empirical grid-based models and Dirichlet process mixture models is demonstrated by fitting, interpreting, and comparing models of graffiti prevalence for both downtown Vancouver and Manhattan.
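As a concrete, purely hypothetical illustration of this kind of process (events recurring at a discrete set of spatial points whose number and locations are unknown a priori), one can simulate a Chinese-restaurant-process draw over latent locations with a Gaussian base measure. The function name, the Gaussian base measure, and all parameters below are invented for illustration; this is not the paper's exact construction.

```python
import numpy as np

def simulate_repeat_events(n_events, alpha, base_scale=1.0, rng=None):
    """Generative sketch: each event either repeats at an existing discrete
    location (with probability proportional to that location's past count)
    or opens a new location drawn from a 2-D Gaussian base measure, with
    probability proportional to the concentration parameter alpha."""
    rng = np.random.default_rng(rng)
    locations, counts, events = [], [], []
    for _ in range(n_events):
        total = sum(counts)
        probs = np.array(counts + [alpha], dtype=float) / (total + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(locations):        # new discrete location
            locations.append(rng.normal(0.0, base_scale, size=2))
            counts.append(1)
        else:                          # repeat event at an existing location
            counts[k] += 1
        events.append(locations[k].copy())
    return np.array(events), np.array(locations), counts
```

Draws from this sketch exhibit the key qualitative feature the abstract describes: many events pile up on a relatively small, random set of discrete points, unlike a smooth point-process intensity.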
To examine the variance reduction from portfolios containing both primary and derivative assets, we develop a mean–variance Markowitz portfolio management problem. By invoking the delta–gamma approximation, we reduce the problem to a well-posed quadratic programming problem. From a practitioner’s perspective, the primary goal is to understand the benefits of adding derivative securities to portfolios of primary assets. Our numerical experiments quantify the variance reduction achieved in moving from sample equity portfolios to mixed portfolios containing both equities and equity derivatives.
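To see why the delta–gamma approximation yields a quadratic program, consider a single Gaussian risk factor ΔS ~ N(0, σ²): instrument i's P&L is approximately δᵢΔS + ½γᵢΔS², so the portfolio P&L variance is (w·δ)²σ² + ½(w·γ)²σ⁴, a quadratic form in the weights w, and minimum variance subject to linear constraints reduces to a linear KKT system. The sketch below (all inputs and names hypothetical, single underlying, two equality constraints) illustrates this reduction; the paper's formulation is richer.

```python
import numpy as np

def delta_gamma_min_variance(delta, gamma_, sigma, mu, target_mean):
    """Minimize delta-gamma P&L variance over weights w subject to
    w @ mu == target_mean and sum(w) == 1, by solving the KKT system of
    the resulting equality-constrained quadratic program."""
    delta, gamma_, mu = map(np.asarray, (delta, gamma_, mu))
    n = len(delta)
    # P&L variance = w^T Q w, with Q = sigma^2 dd^T + 0.5 sigma^4 gg^T
    Q = sigma**2 * np.outer(delta, delta) + 0.5 * sigma**4 * np.outer(gamma_, gamma_)
    # linear equality constraints: target expected P&L, full investment
    A = np.vstack([mu, np.ones(n)])
    b = np.array([target_mean, 1.0])
    # KKT system: [2Q A^T; A 0] [w; lambda] = [0; b]
    kkt = np.block([[2.0 * Q, A.T], [A, np.zeros((2, 2))]])
    rhs = np.concatenate([np.zeros(n), b])
    w = np.linalg.solve(kkt, rhs)[:n]
    return w, float(w @ Q @ w)
```

With one equity (δ=1, γ=0) and two options whose deltas and gammas can offset it, the solver can drive the net delta and gamma, and hence the approximate variance, to essentially zero, which is the variance-reduction effect the experiments quantify.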
Reconstructing evolutionary histories has recently become computationally challenging, owing to the increased availability of genetic sequencing data and to relaxations of classical modelling assumptions. This thesis specializes a divide-and-conquer sequential Monte Carlo (DCSMC) inference algorithm to phylogenetics in order to address these challenges. In phylogenetics, the tree structure used to represent evolutionary histories provides the model decomposition used by DCSMC. In particular, speciation events are used to recursively decompose the model into subproblems. Each subproblem is approximated by an independent population of weighted particles, which are merged and propagated to form an ancestral population. This approach provides the flexibility to relax classical assumptions on large trees by parallelizing these recursions.
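The merge-and-propagate recursion can be sketched on a toy model: each leaf carries its own parameter and observation, sibling particle populations are resampled, paired, and reweighted by a coupling potential to form the ancestral population. Every modelling choice below (Gaussian prior proposals, the quadratic coupling potential, the tree encoding) is invented for illustration and differs from the thesis.

```python
import numpy as np

def dc_smc(tree, n_particles, rng=None):
    """Toy divide-and-conquer SMC recursion on a binary tree.
    tree: ('leaf', y) or ('node', left_subtree, right_subtree).
    Returns (particles, weights); particles has one column per leaf."""
    rng = np.random.default_rng(rng)
    if tree[0] == 'leaf':
        # leaf subproblem: propose from a N(0,1) prior, weight by likelihood
        theta = rng.normal(0.0, 1.0, size=(n_particles, 1))
        logw = -0.5 * (tree[1] - theta[:, 0]) ** 2
    else:
        pl, wl = dc_smc(tree[1], n_particles, rng)
        pr, wr = dc_smc(tree[2], n_particles, rng)
        # resample each child population, then pair particles across children
        il = rng.choice(n_particles, size=n_particles, p=wl)
        ir = rng.choice(n_particles, size=n_particles, p=wr)
        theta = np.hstack([pl[il], pr[ir]])
        # merge weight: a coupling potential tying the siblings' parameters
        logw = -0.5 * (theta[:, 0] - theta[:, -1]) ** 2
    w = np.exp(logw - logw.max())
    return theta, w / w.sum()
```

Because each subtree's population depends only on its own data, the two recursive calls at every internal node are independent, which is what makes the recursion straightforward to parallelize across large trees.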
Phylogenetic Inference by Divide and Conquer Sequential Monte Carlo (DCSMC), Statistical Society of Canada Annual Meeting, Dalhousie University (June 2015)
Bayesian Non-Parametric Model for a Class of Spatial Point Processes, Statistical Society of Canada Annual Meeting, University of Toronto (May 2014)
Stochastic pairs trading through cointegration, The First 3-C Risk Forum & 2011 International Conference on Engineering and Risk Management (ERM), Fields Institute (October 2011)