Statistics Technical Reports

Title:Structural Equation Models: A Critical Review
Author(s):Freedman, David A.; 
Date issued:Nov 2003 (PDF)
Abstract:We review the basis for inferring causation by structural equation modeling. Parameters should be stable under interventions, and so should error distributions. There are also statistical conditions that must be satisfied. Stability is difficult to establish a priori, and the statistical conditions are equally problematic. Therefore, causal relationships are seldom to be inferred from a data set by running regressions, unless there is substantial prior knowledge about the mechanisms that generated the data. Regression models are often used to infer causation from association. For instance, Yule (1899) showed, or tried to show, that welfare was a cause of poverty. Path models and structural equation models are later refinements of the technique. Besides Yule, examples to be discussed here include Blau and Duncan (1967) on stratification, as well as Gibson (1988) on the causes of McCarthyism. Strong assumptions are required to infer causation from association by modeling. The assumptions are of two kinds: (i) causal, and (ii) statistical. These assumptions will be formulated explicitly, with the help of response schedules in hypothetical experiments. In particular, parameters and error distributions must be stable under intervention. That will be hard to demonstrate in observational settings. Statistical conditions (like independence) are also problematic, and latent variables create further complexities.
The article ends with a review of the literature and a summary. Causal modeling with path diagrams will be the primary topic. The issues are not simple, so examining them from several perspectives may be helpful.
Keyword note:Freedman__David
Report ID:651
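
The warning about running regressions can be made concrete with a toy simulation. The sketch below is a hypothetical illustration, not an example from the report: it assumes a linear model in which an unobserved variable z drives both x and y, while x has no causal effect on y. Regressing y on x still gives a clearly nonzero slope, so the coefficient cannot be read causally without the kind of assumptions the abstract describes.

```python
import numpy as np

def confounded_slope(n=100_000, seed=0):
    """OLS slope of y on x when a hidden z causes both (hypothetical model)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)          # unobserved common cause
    x = z + rng.normal(size=n)      # x depends on z only
    y = z + rng.normal(size=n)      # y depends on z only; x has no effect on y
    # Least-squares slope of y on x: cov(x, y) / var(x)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# The population slope is var(z) / (var(z) + var(noise)) = 0.5, although the causal effect is 0.
print(round(confounded_slope(), 2))
```

Adding z as a second regressor would remove the bias in this simulation, but only because the simulation knows z; in observational data, that step corresponds to the substantive prior knowledge about mechanisms that the abstract calls for.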

Title:Learning Graphical Models for Stationary Time Series
Author(s):Bach, Francis R.; Jordan, Michael I.; 
Date issued:Sep 2003 (PDF) (PostScript)
Abstract:Probabilistic graphical models can be extended to time series by considering probabilistic dependencies between entire time series. For stationary Gaussian time series, the graphical model semantics can be expressed naturally in the frequency domain, leading to interesting families of structured time series models that are complementary to families defined in the time domain. In this paper, we present an algorithm to learn the structure of directed graphical models for stationary Gaussian time series from data. We describe an algorithm for efficient forecasting for stationary Gaussian time series whose spectral densities factorize in a graphical model. We also explore the relationships between graphical model structure and sparsity, comparing and contrasting the notions of sparsity in the time domain and the frequency domain. Finally, we show how to make use of Mercer kernels in this setting, allowing our ideas to be extended to nonlinear models.
Keyword note:Bach__Francis_R Jordan__Michael_I
Report ID:650
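
One standard way the frequency domain carries graphical structure: for stationary Gaussian series, series i and j are conditionally independent given all the others exactly when the (i, j) entry of the inverse spectral density matrix vanishes at every frequency. The sketch below checks this for a hypothetical three-variable VAR(1) chain x1 -> x2 -> x3 with identity innovation covariance; the coefficients are assumptions chosen for illustration, and this is not the report's structure-learning algorithm.

```python
import numpy as np

# Hypothetical VAR(1): x_t = A x_{t-1} + e_t with Cov(e_t) = Sigma.
# x1 feeds x2 and x2 feeds x3, so x1 and x3 interact only through x2.
A = np.array([[0.5, 0.0, 0.0],
              [0.4, 0.5, 0.0],
              [0.0, 0.4, 0.5]])
Sigma = np.eye(3)

def spectral_density(omega):
    """Spectral density matrix S(omega) of the VAR(1) process."""
    z = np.exp(-1j * omega)
    H = np.linalg.inv(np.eye(3) - A * z)      # transfer function (I - A z)^{-1}
    return H @ Sigma @ H.conj().T / (2 * np.pi)

for omega in np.linspace(0.0, np.pi, 5):
    K = np.linalg.inv(spectral_density(omega))   # inverse spectral density
    # |K13| is zero up to rounding; |K12| is not, since x1 and x2 are directly linked.
    print(f"omega={omega:.2f}  |K13|={abs(K[0, 2]):.1e}  |K12|={abs(K[0, 1]):.3f}")
```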

Title:Graphical models, exponential families, and variational inference
Author(s):Wainwright, Martin J.; Jordan, Michael I.; 
Date issued:Sep 2003 (PDF) (PostScript)
Abstract:The formalism of probabilistic graphical models provides a unifying framework for the development of large-scale multivariate statistical models. Graphical models have become a focus of research in many applied statistical and computational fields, including bioinformatics, information theory, signal and image processing, information retrieval and machine learning. Many problems that arise in specific instances, including the key problems of computing marginals and modes of probability distributions, are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant generating function and the entropy for exponential families, we develop general variational representations of the problems of computing marginal probabilities and modes. We describe how a wide variety of known computational algorithms, including mean field methods and cluster variational techniques, can be understood in terms of approximations of these variational representations. We also present novel convex relaxations based on the variational framework. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
Keyword note:Wainwright__Martin Jordan__Michael_I
Report ID:649
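
Naive mean field, one of the methods the abstract places inside the variational framework, is easy to sketch for a binary pairwise (Ising) model p(x) proportional to exp(sum_i theta_i x_i + sum_{i<j} theta_ij x_i x_j) with x_i in {0, 1}. The coordinate-ascent updates below and the small four-node chain are illustrative assumptions; exact marginals by brute-force enumeration are computed alongside for comparison.

```python
import itertools
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def naive_mean_field(theta, W, iters=200):
    """Coordinate-ascent naive mean field for p(x) ∝ exp(theta·x + x^T W x / 2), x in {0,1}^n.

    W is symmetric with zero diagonal; returns approximate marginals P(x_i = 1).
    """
    mu = np.full(len(theta), 0.5)
    for _ in range(iters):
        for i in range(len(theta)):
            mu[i] = sigmoid(theta[i] + W[i] @ mu)   # fixed-point update for node i
    return mu

def exact_marginals(theta, W):
    """Exact marginals P(x_i = 1) by enumerating all 2^n configurations."""
    n = len(theta)
    configs = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    log_w = configs @ theta + 0.5 * np.einsum("ki,ij,kj->k", configs, W, configs)
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    return p @ configs

theta = np.array([0.5, -0.3, 0.2, 0.0])      # hypothetical node parameters
W = np.zeros((4, 4))
for i, j, w in [(0, 1, 0.8), (1, 2, -0.6), (2, 3, 0.4)]:   # hypothetical chain couplings
    W[i, j] = W[j, i] = w

print(np.round(naive_mean_field(theta, W), 3))
print(np.round(exact_marginals(theta, W), 3))
```

With all couplings set to zero the two functions agree exactly (both return sigmoid(theta)); with nonzero couplings, mean field is only an approximation, which is the kind of gap the variational view helps to analyze.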

Title:[Title unavailable]
Author(s):Pitman, Jim; 
Date issued:September 2003
Keyword note:Pitman__Jim
Report ID:648

Title:Multiple-sequence functional annotation and the generalized hidden Markov phylogeny
Author(s):McAuliffe, Jon D.; Pachter, Lior; Jordan, Michael I.; 
Date issued:Aug 2003 (PDF) (PostScript)
Abstract:Phylogenetic shadowing is a new comparative genomics principle which allows for the discovery of conserved regions in sequences from multiple closely-related organisms. We develop a formal probabilistic framework for combining phylogenetic shadowing with feature-based functional annotation methods. The resulting model, a generalized hidden Markov phylogeny (GHMP), applies to a variety of situations where functional regions are to be inferred from evolutionary constraints. In particular, we show how GHMPs can be used to predict complete shared gene structures in multiple primate sequences. We also describe SHADOWER, our implementation of such a prediction system. We find that SHADOWER outperforms previously reported ab initio gene finders, including comparative human-mouse approaches, on a small sample of diverse exonic regions. Finally, we report on an empirical analysis of SHADOWER's performance which reveals that as few as five well-chosen species may suffice to attain maximal sensitivity and specificity in exon demarcation.
Keyword note:McAuliffe__Jon Pachter__Lior Jordan__Michael_I
Report ID:647

Title:Kernel-based Data Fusion and its Application to Protein Function Prediction in Yeast
Author(s):Lanckriet, G. R. G.; Deng, M.; Cristianini, N.; Jordan, M. I.; Noble, W. S.; 
Date issued:Aug 2003 (PDF) (PostScript)
Abstract:Kernel methods provide a principled framework in which to represent many types of data, including vectors, strings, trees and graphs. As such, these methods are useful for drawing inferences about biological phenomena. We describe a method for combining multiple kernel representations in an optimal fashion, by formulating the problem as a convex optimization problem that can be solved using semidefinite programming techniques. The method is applied to the problem of predicting yeast protein functional classifications using a support vector machine (SVM) trained on five types of data. For this problem, the new method performs better than a previously described Markov random field method, and better than the SVM trained on any single type of data.
Keyword note:Lanckriet__G_R_G Deng__M Cristianini__N Jordan__Michael_I Noble__W_S
Report ID:646
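
The object being optimized in this approach is a weighted combination of kernel matrices, one per data type. The sketch below shows only that combination step, under assumptions of ours: the weights are fixed by hand rather than learned by the report's semidefinite program, each kernel is scaled to unit trace, and the feature matrices are random stand-ins. Restricting to nonnegative weights keeps the combined matrix positive semidefinite, so it remains a valid kernel for an SVM.

```python
import numpy as np

def normalize_trace(K):
    """Scale a kernel matrix to unit trace so different data types are comparable."""
    return K / np.trace(K)

def combine_kernels(kernels, weights):
    """Return sum_i weights[i] * kernels[i]; nonnegative weights keep the result PSD."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("this sketch only accepts nonnegative weights")
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
n = 30                                   # hypothetical number of proteins
X_a = rng.normal(size=(n, 5))            # random stand-in for one data type
X_b = rng.normal(size=(n, 8))            # random stand-in for another data type

K_linear = normalize_trace(X_a @ X_a.T)                      # linear kernel
sq_dists = ((X_b[:, None, :] - X_b[None, :, :]) ** 2).sum(-1)
K_rbf = normalize_trace(np.exp(-sq_dists / 8.0))             # Gaussian kernel

K = combine_kernels([K_linear, K_rbf], [0.3, 0.7])           # hand-picked weights
print(np.trace(K), np.linalg.eigvalsh(K).min())
```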

Title:Efficient Independent Component Analysis (II)
Author(s):Chen, Aiyou; Bickel, Peter J.; 
Date issued:Jul 2003 (PDF)
Abstract:Independent component analysis (ICA) has been widely used to separate hidden sources from observed linear mixtures in many fields, such as brain imaging analysis, signal processing and telecommunication. Many statistical techniques based on M-estimates have been proposed for estimating the mixing matrix. Recently, a few methods based on nonparametric tools have also become available. However, an in-depth analysis of the convergence rate and asymptotic efficiency has not been available. In this paper, we analyze ICA under the framework of semiparametric theory [see Bickel, Klaassen, Ritov and Wellner (1993)] and propose a straightforward estimate based on the efficient score function, using B-spline approximations. This estimate performs better than standard ICA methods in a variety of simulations. We prove that this estimator is Fisher efficient under moderate conditions.
Pub info:PDF
Keyword note:Chen__Aiyou Bickel__Peter_John
Report ID:645
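
The basic ICA setup in the abstract (observe x = A s, recover s) can be illustrated with a two-source toy problem. The sketch below is not the report's B-spline efficient-score estimator; it swaps in a simple textbook approach under stated assumptions: two independent uniform sources, a hypothetical mixing matrix, whitening of the mixtures, and a grid search for the rotation that makes the outputs most non-Gaussian, measured by the sum of absolute excess kurtoses.

```python
import numpy as np

def excess_kurtosis(u):
    u = (u - u.mean()) / u.std()
    return np.mean(u ** 4) - 3.0

def rotation(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def ica_2d(X, n_angles=900):
    """Estimate a 2 x 2 unmixing matrix by whitening plus a grid search over rotations."""
    Xc = X - X.mean(axis=1, keepdims=True)
    # Symmetric whitening: V = E D^{-1/2} E^T gives identity sample covariance.
    evals, evecs = np.linalg.eigh(np.cov(Xc))
    V = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = V @ Xc
    # Angles in [0, pi/2) suffice: other angles only permute or flip the outputs.
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    scores = [sum(abs(excess_kurtosis(y)) for y in rotation(a) @ Z) for a in angles]
    return rotation(angles[int(np.argmax(scores))]) @ V

rng = np.random.default_rng(1)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20_000))   # unit-variance sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])                        # hypothetical mixing matrix
W = ica_2d(A @ S)
print(np.round(W @ A, 2))   # close to a signed permutation matrix if separation worked
```

Sources are recoverable only up to order and sign, which is why the check is against a signed permutation rather than the identity.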

Title:Regenerative composition structures
Author(s):Gnedin, Alexander; Pitman, Jim; 
Date issued:Jul 2003
Date modified:revised December 8, 2003 (PDF) (PostScript)
Abstract:A new class of random composition structures (the ordered analog of Kingman's partition structures) is defined by a regenerative description of component sizes. Each regenerative composition structure is represented by a process of random sampling of points from an exponential distribution on the positive half-line, and separating the points into clusters by an independent regenerative random set. Examples are composition structures derived from residual allocation models, including one associated with the Ewens sampling formula, and composition structures derived from the zero set of a Brownian motion or Bessel process. We provide characterisation results and formulas relating the distribution of the regenerative composition to the Lévy parameters of a subordinator whose range is the corresponding regenerative set. In particular, the only reversible regenerative composition structures are those associated with the interval partition of [0,1] generated by excursions of a standard Bessel bridge of dimension 2 - 2a for some a in [0,1].
Keyword note:Gnedin__Alexander Pitman__Jim
Report ID:644
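
The representation in the abstract translates directly into a sampler. The sketch below assumes a particularly simple regenerative set: the range of a compound Poisson subordinator with exponential jumps of a hypothetical rate `jump_rate`, which is just the sequence of partial sums of the jumps. General regenerative sets (for example, the zero set of a Brownian motion) are not discrete and would need a different sampler. Points that fall in the same gap of the set form one block, and blocks are listed from left to right.

```python
import bisect
import random

def regenerative_composition(n, jump_rate=1.0, seed=None):
    """Sample a composition of n by cutting exponential sample points with a regenerative set.

    Assumption for this sketch: the regenerative set is {0, S_1, S_2, ...}, where
    S_k is a sum of k i.i.d. Exponential(jump_rate) jumps (the range of a compound
    Poisson subordinator), drawn independently of the sample points.
    """
    if n == 0:
        return []
    rng = random.Random(seed)
    points = [rng.expovariate(1.0) for _ in range(n)]   # n points from Exponential(1)
    largest = max(points)
    # Generate partial sums until the set extends past the largest point.
    cut_points, s = [], 0.0
    while s <= largest:
        s += rng.expovariate(jump_rate)
        cut_points.append(s)
    # Points in the same gap between consecutive partial sums form one block.
    counts = {}
    for x in points:
        k = bisect.bisect_left(cut_points, x)
        counts[k] = counts.get(k, 0) + 1
    # Nonempty blocks, ordered from left to right, give the composition.
    return [counts[k] for k in sorted(counts)]

print(regenerative_composition(10, jump_rate=0.5, seed=42))
```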

Title:Path transformations of first passage bridges
Author(s):Bertoin, Jean; Chaumont, Loic; Pitman, Jim; 
Date issued:Jul 2003 (PDF) (PostScript)
Abstract:We define the first passage bridge from 0 to x as the Brownian motion on the time interval [0,1] conditioned to first hit x at time 1. We show that this process may be related to the Brownian bridge, the Bessel bridge or the Brownian excursion via some path transformations, the main one being an extension of Vervaat's transformation. We also provide an extension of these results to certain bridges with cyclically exchangeable increments.
Keyword note:Bertoin__Jean Chaumont__Loic Pitman__Jim
Report ID:643
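
Vervaat's classical transformation, which the report extends, has a simple discrete analogue: cyclically shift a bridge so that it starts at the time of its minimum, and the shifted path stays nonnegative and returns to 0, like an excursion. The sketch below applies this to a simple random-walk bridge of hypothetical length 2n; it illustrates only the classical transformation, not the report's extension to first passage bridges.

```python
import random

def random_walk_bridge(n, seed=None):
    """Path of a simple random walk with n up-steps and n down-steps, from 0 back to 0."""
    steps = [1] * n + [-1] * n
    random.Random(seed).shuffle(steps)
    path = [0]
    for step in steps:
        path.append(path[-1] + step)
    return path

def vervaat(path):
    """Cyclically shift the increments so the path starts at the first time of its minimum."""
    increments = [b - a for a, b in zip(path, path[1:])]
    tau = path.index(min(path))            # first time the minimum is attained
    shifted = increments[tau:] + increments[:tau]
    out = [0]
    for step in shifted:
        out.append(out[-1] + step)
    return out

bridge = random_walk_bridge(8, seed=3)
excursion = vervaat(bridge)
print(bridge)
print(excursion)   # stays >= 0 and ends at 0
```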

Title:Dutch Book against some "Objective" Priors
Author(s):Eaton, Morris L.; Freedman, David A.; 
Date issued:Jun 2003 (PDF) (PostScript)
Abstract:"Dutch book," "incoherence," and "strong inconsistency" are generally equivalent: there is a system of bets which is expected to make money for the gambler, whatever the state of nature may be. As de Finetti showed, a bookie who is not a Bayesian is subject to a Dutch book, under certain highly stylized rules of play. This fact is often used as an argument against frequentists. So-called "objective" or "uninformative" priors may also be subject to a Dutch book. This note explains, in a relatively simple and self-contained way, how to make Dutch book against a frequently recommended uninformative prior for covariance matrices.
Keyword note:Eaton__Morris_L Freedman__David
Report ID:642
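
The idea of a Dutch book is easiest to see in the finite case. The sketch below uses hypothetical prices: a bookie quotes prices for unit bets on an event A and on its complement that add up to less than 1, so buying both bets wins money whatever happens (if the prices added up to more than 1, the gambler would take the other side of both bets instead). This is the elementary finite construction with a sure win, not the report's argument about priors for covariance matrices, where the gain is in expectation.

```python
def gambler_payoffs(price_A, price_not_A, stake=1.0):
    """Gambler's net gain in each state after buying a bet on A and a bet on not-A.

    Each bet costs price * stake up front and pays the stake if it wins.
    """
    cost = stake * (price_A + price_not_A)   # paid up front for both bets
    payoff_if_A = stake                      # the bet on A pays; the bet on not-A loses
    payoff_if_not_A = stake                  # the bet on not-A pays; the bet on A loses
    return {"A occurs": payoff_if_A - cost, "A fails": payoff_if_not_A - cost}

# Hypothetical incoherent prices: 0.45 + 0.50 < 1, so the gambler gains in both states.
print(gambler_payoffs(0.45, 0.50, stake=100.0))
```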

Title:Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces
Author(s):Fukumizu, Kenji; Bach, Francis R.; Jordan, Michael I.; 
Date issued:May 2003 (PDF) (PostScript)
Abstract:We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable $Y$ from an explanatory variable $X$, we treat the problem of dimensionality reduction as that of finding a low-dimensional "effective subspace" of $X$ which retains the statistical relationship between $X$ and $Y$. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem we establish a general nonparametric characterization of conditional independence using covariance operators on a reproducing kernel Hilbert space. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of $X$, nor a parametric model of the conditional distribution of $Y$. We present experiments that compare the performance of the method with conventional methods.
Keyword note:Fukumizu__Kenji Bach__Francis_R Jordan__Michael_I
Report ID:641
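
A kernel contrast of the kind the abstract describes can be sketched with centered Gram matrices. Assuming Gaussian kernels with a hypothetical width `sigma` and regularization `eps`, the sketch below evaluates Tr[G_Y (G_Z + n eps I)^{-1}] for a candidate projection Z = X B, where G denotes a centered Gram matrix, and compares two candidate directions on simulated data in which Y depends on X only through its first coordinate. Smaller values suggest that Z retains more of the information about Y. The kernel, parameters and data are illustrative assumptions, the exact contrast used in the report may differ, and the optimization over B is not shown.

```python
import numpy as np

def centered_gram(Z, sigma=1.0):
    """Centered Gaussian Gram matrix H K H for the rows of Z."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    n = len(Z)
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    return H @ K @ H

def kdr_contrast(X, Y, B, sigma=1.0, eps=0.1):
    """Trace contrast Tr[G_Y (G_Z + n*eps*I)^{-1}] for the projection Z = X B."""
    n = len(X)
    G_Y = centered_gram(Y, sigma)
    G_Z = centered_gram(X @ B, sigma)
    return np.trace(np.linalg.solve(G_Z + n * eps * np.eye(n), G_Y))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Simulated response that depends on X only through the first coordinate.
Y = (np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)).reshape(-1, 1)

for name, B in [("first coordinate", np.array([[1.0], [0.0]])),
                ("second coordinate", np.array([[0.0], [1.0]]))]:
    print(name, round(kdr_contrast(X, Y, B), 3))
```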

Title:[Title unavailable]
Author(s):Aldous, D. J.; 
Date issued:May 2003
Keyword note:Aldous__David_J
Report ID:640

Title:Convexity, classification, and risk bounds
Author(s):Bartlett, Peter L.; Jordan, Michael I.; McAuliffe, Jon D.; 
Date issued:Apr 2003 (PDF) (PostScript)
Abstract:Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we present applications of our results to the estimation of convergence rates in the general setting of function classes that are scaled convex hulls of a finite-dimensional base class, with a variety of commonly used loss functions.
Keyword note:Bartlett__Peter Jordan__Michael_I McAuliffe__Jon
Report ID:638
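
The variational transformation mentioned in the abstract can be computed numerically. In the sketch below (a standard form of the construction for margin losses phi, written in our own notation), H(eta) = inf over alpha of [eta phi(alpha) + (1 - eta) phi(-alpha)] is the optimal conditional phi-risk, H^-(eta) is the same infimum restricted to predictions with the wrong sign, alpha(2 eta - 1) <= 0, and psi~(theta) = H^-((1 + theta)/2) - H((1 + theta)/2). The transform used in the bound is the convex hull of psi~; for the three losses below psi~ is already convex, and the numbers can be compared with the closed forms |theta| (hinge), 1 - sqrt(1 - theta^2) (exponential) and theta^2 (squared). The grid minimization and its range are simplifying assumptions.

```python
import numpy as np

ALPHAS = np.linspace(-5.0, 5.0, 200_001)   # grid of real-valued predictions alpha

LOSSES = {
    "hinge": lambda a: np.maximum(0.0, 1.0 - a),
    "exponential": lambda a: np.exp(-a),
    "squared": lambda a: (1.0 - a) ** 2,
}

closed_forms = {
    "hinge": lambda t: abs(t),
    "exponential": lambda t: 1.0 - np.sqrt(1.0 - t ** 2),
    "squared": lambda t: t ** 2,
}

def psi_tilde(phi, theta):
    """Numerical psi~(theta) = H^-(eta) - H(eta), with eta = (1 + theta) / 2."""
    eta = (1.0 + theta) / 2.0
    cond_risk = eta * phi(ALPHAS) + (1.0 - eta) * phi(-ALPHAS)
    H = cond_risk.min()                                 # best conditional risk
    wrong_sign = ALPHAS * (2.0 * eta - 1.0) <= 0.0      # predictions with the wrong sign
    H_minus = cond_risk[wrong_sign].min()               # best risk among those
    return H_minus - H

for name, phi in LOSSES.items():
    for theta in (0.2, 0.5, 0.9):
        print(name, theta, round(psi_tilde(phi, theta), 4), round(closed_forms[name](theta), 4))
```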

Title:Quasistationary distributions for one-dimensional diffusions with killing
Author(s):Steinsaltz, David; Evans, Steven N.; 
Date issued:Mar 2003
Date modified:revised May 2004 (PDF) (PostScript)
Abstract:We extend some results on the convergence of one-dimensional diffusions killed at the boundary, conditioned on extended survival, to the case of general killing on the interior. We show, under fairly general conditions, that a diffusion conditioned on long survival either runs off to infinity almost surely, or almost surely converges to a quasistationary distribution given by the lowest eigenfunction of the generator. In the absence of internal killing, only a sufficiently strong inward drift can keep the process close to the origin, to allow convergence in distribution. An alternative, which arises when general killing is allowed, is that the conditioned process is held near the origin by a high rate of killing near $\infty$. We also extend, to the case of general killing, the standard result on convergence to a quasistationary distribution of a diffusion on a compact interval.
Keyword note:Steinsaltz__David Evans__Steven_N
Report ID:637
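
The link between quasistationary distributions and the lowest eigenfunction of the generator can be checked numerically in the simplest compact case. The sketch below assumes Brownian motion on (0, 1) killed at both endpoints (no drift and no interior killing, i.e. the classical setting the report extends), discretizes the generator (1/2) d^2/dx^2 by finite differences, and compares the eigenvector for the eigenvalue closest to zero with the known quasistationary density, proportional to sin(pi x), with decay rate pi^2/2. The grid size is an arbitrary choice.

```python
import numpy as np

N = 200                               # number of grid cells (arbitrary choice)
h = 1.0 / N
x = np.arange(1, N) * h               # interior grid points; the endpoints kill the process

# Finite-difference generator (1/2) f'' with f = 0 at the killing boundary.
Q = (np.diag(-2.0 * np.ones(N - 1)) + np.diag(np.ones(N - 2), 1)
     + np.diag(np.ones(N - 2), -1)) / (2.0 * h ** 2)

# Q is symmetric here, so left and right eigenvectors coincide and eigh applies.
evals, evecs = np.linalg.eigh(Q)
lam = evals[-1]                       # eigenvalue closest to zero (least negative)
qsd = np.abs(evecs[:, -1])
qsd /= qsd.sum()                      # normalize to a probability vector on the grid

target = np.sin(np.pi * x)
target /= target.sum()

print(lam, -np.pi ** 2 / 2)           # exponential decay rate of the survival probability
print(np.max(np.abs(qsd - target)))   # discrepancy from the sin(pi x) shape
```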

Title:Markov mortality models: Implications of quasistationarity and varying initial distributions
Author(s):Steinsaltz, David; Evans, Steven N.; 
Date issued:Mar 2003
Date modified:revised June 2003 (PDF) (PostScript)
Abstract:This paper explains some implications of Markov-process theory for models of mortality. We show, on the one hand, that an important qualitative feature which has been found in certain models, the convergence to a "mortality plateau", is a generic consequence of the convergence to a "quasistationary distribution", which has been explored extensively in the mathematical literature. This serves not merely to free these results from some irrelevant specifics of the models, but also to offer a new explanation of the convergence to constant mortality. At the same time that we show that the late behavior of these models (convergence to a finite asymptote) is almost logically immutable, we also show that the early behavior of the mortality rates can be more flexible than has been generally acknowledged. We show, in particular, that an appropriate choice of initial conditions enables one popular model to approximate any reasonable hazard-rate data. This suggests how precarious it might be to judge the appropriateness of mortality models by a perceived consilience with a favored hazard-rate function, such as the Gompertz exponential.
Keyword note:Steinsaltz__David Evans__Steven_N
Report ID:636
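
The plateau mechanism can be seen in a small discrete-time example. The sketch below uses a hypothetical "vitality" chain, not one of the models examined in the report: an individual moves among ten health states, death probabilities rise with the state index, and the chain among living states is irreducible. Starting from the healthiest state, the hazard rate rises and then levels off at 1 - rho, where rho is the Perron (largest) eigenvalue of the substochastic transition matrix, because the distribution of survivors converges to the quasistationary distribution. All rates are illustrative assumptions.

```python
import numpy as np

K = 10                                     # number of health states (hypothetical)
p_up, p_down = 0.10, 0.05                  # hypothetical moves to worse / better health
death = 0.01 * 1.5 ** np.arange(K)         # hypothetical death probabilities, rising with state

# Substochastic transition matrix among living states; row i loses mass death[i].
P = np.zeros((K, K))
for i in range(K):
    up, down = min(i + 1, K - 1), max(i - 1, 0)
    P[i, up] += p_up
    P[i, down] += p_down
    P[i, i] += 1.0 - p_up - p_down
    P[i] *= 1.0 - death[i]

rho = np.max(np.abs(np.linalg.eigvals(P)))   # Perron root of the irreducible matrix

dist = np.zeros(K)
dist[0] = 1.0                              # everyone starts in the healthiest state
hazard = []
for t in range(2000):
    nxt = dist @ P
    survive = nxt.sum()
    hazard.append(1.0 - survive)           # P(die at t+1 | alive at t)
    dist = nxt / survive                   # condition on survival

print([round(h, 4) for h in hazard[:5]], round(hazard[-1], 6), round(1 - rho, 6))
```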

Title:Boosting with early stopping: convergence and consistency
Author(s):Yu, Bin; Zhang, Tong; 
Date issued:February 2003
Keyword note:Yu__Bin Zhang__Tong
Report ID:635

Title:Efficient independent component analysis
Author(s):Chen, Aiyou; Bickel, Peter; 
Date issued:Jan 2003 (PDF)
Abstract:We propose a Fisher-efficient estimator for the independent component analysis (ICA) model. First we provide a $\sqrt{n}$-consistent estimator using the empirical characteristic function, and then show that by directly estimating the efficient influence function we can construct a one-step MLE which reaches asymptotic Fisher efficiency (EFFICA). We compare a variant of EFFICA to standard and state-of-the-art algorithms, such as the kernel ICA method (Bach & Jordan, 2002), using benchmark simulations, and exhibit its excellent performance.
Keyword note:Chen__Aiyou Bickel__Peter_John
Report ID:634

Title:Resampling-based multiple testing for microarray data analysis
Author(s):Ge, Yongchao; Dudoit, Sandrine; Speed, Terence P.; 
Date issued:Jan 2003 (PDF)
Abstract:The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. In their 1993 book, Westfall & Young propose resampling-based $p$-value adjustment procedures which are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based multiple testing, including (a) the family-wise error rate of Westfall & Young's 1993 book and (b) the false discovery rate developed in Benjamini & Hochberg's 1995 paper, both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey's 2002 paper in J.R.S.S.B, which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control the family-wise error rate. Adjusted $p$-values for the different approaches are applied to gene expression data from two recently published microarray studies, and the properties of these multiple testing procedures are compared.
Keyword note:Ge__Yongchao Dudoit__Sandrine Speed__Terry_P
Report ID:633
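
Of the error criteria listed in the abstract, the Benjamini-Hochberg false discovery rate procedure has the simplest adjusted p-values: sort the m raw p-values, multiply the i-th smallest by m/i, and enforce monotonicity by taking running minima from the largest p-value down, capping at 1. The sketch below implements that step-up adjustment; it is not the report's resampling-based minP algorithm, and the example p-values are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by increasing p
    adjusted = [0.0] * m
    running_min = 1.0                                    # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]     # hypothetical raw p-values
print(bh_adjust(raw))
```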

Title:[Title unavailable]
Author(s):Bickel, Peter; 
Date issued:January 2003
Keyword note:Bickel__Peter_John
Report ID:632

Title:The Markov Moment Problem and de Finetti's Theorem: Parts I and II
Author(s):Diaconis, Persi; Freedman, David A.; 
Date issued:Jan 2003 (PDF)
Abstract:Part I. The Markov moment problem is to characterize the moment sequences of densities on the unit interval that are bounded by a given positive constant c. There are well-known characterizations through complex systems of non-linear inequalities. Hausdorff found simpler necessary and sufficient linear conditions. This paper gives a new proof, with some ancillary results, for example, characterizing moment sequences of bounded densities with respect to an arbitrary measure. A connection with de Finetti's theorem is then described, and moments of monotone densities are characterized. Part II. There is an abstract version of de Finetti's theorem that characterizes mixing measures with bounded or L_p densities. The general setting is reviewed; after the theorem is proved, it is specialized to coin tossing and to exponential random variables. Laplace transforms of bounded densities are characterized, simplifying a well-known theorem.
Keyword note:Diaconis__Persi Freedman__David
Report ID:631
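
Hausdorff's linear conditions can be checked with exact arithmetic. As we understand the classical result, a sequence m_0, m_1, m_2, ... is the moment sequence of a density f on [0, 1] with 0 <= f <= c if and only if 0 <= (n + 1) C(n, k) (-Delta)^(n-k) m_k <= c for all 0 <= k <= n, where Delta is the forward difference; the middle quantity equals (n + 1) C(n, k) times the integral of x^k (1 - x)^(n-k) f(x) dx. The sketch below checks these inequalities only up to a cutoff n <= N, so a pass is evidence rather than proof, while a failure is conclusive. The two test sequences, the moments of f(x) = 1 and of f(x) = 2x, are our own examples.

```python
from fractions import Fraction
from math import comb

def hausdorff_quantity(m, n, k):
    """(n+1) C(n,k) (-Delta)^(n-k) m_k, i.e. (n+1) C(n,k) * integral of x^k (1-x)^(n-k) f."""
    j = n - k
    diff = sum((-1) ** i * comb(j, i) * m[k + i] for i in range(j + 1))
    return (n + 1) * comb(n, k) * diff

def satisfies_hausdorff(m, c, N):
    """Check 0 <= quantity <= c for all 0 <= k <= n <= N (requires len(m) > N)."""
    return all(0 <= hausdorff_quantity(m, n, k) <= c
               for n in range(N + 1) for k in range(n + 1))

N = 20                                                   # cutoff for the check
uniform = [Fraction(1, k + 1) for k in range(N + 1)]     # moments of f(x) = 1
linear = [Fraction(2, k + 2) for k in range(N + 1)]      # moments of f(x) = 2x

print(satisfies_hausdorff(uniform, 1, N))   # True: f = 1 is bounded by 1
print(satisfies_hausdorff(linear, 2, N))    # True: f = 2x is bounded by 2
print(satisfies_hausdorff(linear, 1, N))    # False: f = 2x exceeds 1 near x = 1
```

For f(x) = 2x the quantity works out to 2(k + 1)/(n + 2), which exceeds 1 already at n = k = 1; that is why the last check fails.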