**Statistics Technical Reports**

**Term(s):**2003**Results:**20

**Title:**Structural Equation Models: A Critical Review**Author(s):**Freedman, David A.; **Date issued:**Nov 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2q78 (PDF) **Abstract:**We review the basis for inferring causation by structural equation modeling. Parameters should be stable under interventions,
and so should error distributions. There are also statistical conditions that must be satisfied. Stability is difficult to
establish a priori, and the statistical conditions are equally problematic. Therefore, causal relationships are seldom to
be inferred from a data set by running regressions, unless there is substantial prior knowledge about the mechanisms that
generated the data. Regression models are often used to infer causation from association. For instance, Yule (1899) showed--
or tried to show-- that welfare was a cause of poverty. Path models and structural equation models are later refinements of
the technique. Besides Yule, examples to be discussed here include Blau and Duncan (1967) on stratification, as well as Gibson
(1988) on the causes of McCarthyism. Strong assumptions are required to infer causation from association by modeling. The
assumptions are of two kinds: (i) causal, and (ii) statistical. These assumptions will be formulated explicitly, with the
help of response schedules in hypothetical experiments. In particular, parameters and error distributions must be stable under
intervention. That will be hard to demonstrate in observational settings. Statistical conditions (like independence) are also
problematic, and latent variables create further complexities. The article ends with a review of the literature and a summary.
Causal modeling with path diagrams will be the primary topic. The issues are not simple, so examining them from several
perspectives may be helpful.**Keyword note:**Freedman__David**Report ID:**651**Relevance:**100

**Title:**Learning Graphical Models for Stationary Time Series**Author(s):**Bach, Francis R.; Jordan, Michael I.; **Date issued:**Sep 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2q4m (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2q55 (PostScript) **Abstract:**Probabilistic graphical models can be extended to time series by considering probabilistic dependencies between entire time
series. For stationary Gaussian time series, the graphical model semantics can be expressed naturally in the frequency domain,
leading to interesting families of structured time series models that are complementary to families defined in the time domain.
In this paper, we present an algorithm for learning the structure of directed graphical models for stationary Gaussian
time series from data. We describe an algorithm for efficient forecasting of stationary Gaussian time series whose spectral densities
factorize in a graphical model. We also explore the relationships between graphical model structure and sparsity, comparing
and contrasting the notions of sparsity in the time domain and the frequency domain. Finally, we show how to make use of
Mercer kernels in this setting, allowing our ideas to be extended to nonlinear models.**Keyword note:**Bach__Francis_R Jordan__Michael_I**Report ID:**650**Relevance:**100
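The frequency-domain semantics above can be made concrete in the simplest scalar case. As a minimal illustrative sketch (the parameter values are invented, not from the paper), the following evaluates the spectral density of a stationary AR(1) process, the one-dimensional object whose multivariate analogue carries the graphical-model structure at each frequency:

```python
import math

# For a stationary AR(1) series x_t = phi * x_{t-1} + e_t with noise
# variance sigma2, the spectral density is
#   f(w) = sigma2 / (2*pi * |1 - phi * e^{-iw}|^2),
# where |1 - phi * e^{-iw}|^2 = 1 - 2*phi*cos(w) + phi^2.
def ar1_spectral_density(phi, sigma2, w):
    denom = 1.0 - 2.0 * phi * math.cos(w) + phi * phi
    return sigma2 / (2.0 * math.pi * denom)
```

In the multivariate setting discussed in the abstract, conditional independence between component series constrains the inverse of the spectral density matrix at every frequency; this scalar example only evaluates the density itself.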

**Title:**Graphical models, exponential families, and variational inference**Author(s):**Wainwright, Martin J.; Jordan, Michael I.; **Date issued:**Sep 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2q1z (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2q2h (PostScript) **Abstract:**The formalism of probabilistic graphical models provides a unifying framework for the development of large-scale multivariate
statistical models. Graphical models have become a focus of research in many applied statistical and computational fields,
including bioinformatics, information theory, signal and image processing, information retrieval and machine learning. Many
problems that arise in specific instances---including the key problems of computing marginals and modes of probability distributions---are
best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality
between the cumulant generating function and the entropy for exponential families, we develop general variational representations
of the problems of computing marginal probabilities and modes. We describe how a wide variety of known computational algorithms---including
mean field methods and cluster variational techniques---can be understood in terms of approximations of these variational
representations. We also present novel convex relaxations based on the variational framework. The variational approach provides
a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale
statistical models.**Keyword note:**Wainwright__Martin Jordan__Michael_I**Report ID:**649**Relevance:**100
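The conjugate duality mentioned in the abstract can be checked by hand in one dimension. As an illustrative sketch (not the paper's general machinery), take a Bernoulli variable in exponential-family form: the cumulant generating function is A(theta) = log(1 + e^theta), the mean parameter is mu = sigmoid(theta), and the conjugate dual A*(mu) is the negative entropy, so A(theta) = theta*mu - A*(mu) at the matching mu:

```python
import math

def A(theta):
    """Cumulant generating (log-partition) function of a Bernoulli family."""
    return math.log(1.0 + math.exp(theta))

def mean_param(theta):
    """Mean parameter mu = E[X] = sigmoid(theta), the gradient of A."""
    return 1.0 / (1.0 + math.exp(-theta))

def neg_entropy(mu):
    """Conjugate dual A*(mu): the negative Shannon entropy of Bernoulli(mu)."""
    return mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)
```

The variational representations in the paper generalize exactly this identity: the log-partition function is a supremum of linear functions of the mean parameters minus an entropy term.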

**Title:**[Title unavailable]**Author(s):**Pitman, Jim; **Date issued:**September 2003**Keyword note:**Pitman__Jim**Report ID:**648**Relevance:**100

**Title:**Multiple-sequence functional annotation and the generalized hidden Markov phylogeny**Author(s):**McAuliffe, Jon D.; Pachter, Lior; Jordan, Michael I.; **Date issued:**Aug 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2p89 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2p9v (PostScript) **Abstract:**Phylogenetic shadowing is a new comparative genomics principle which allows for the discovery of conserved regions in sequences
from multiple closely-related organisms. We develop a formal probabilistic framework for combining phylogenetic shadowing
with feature-based functional annotation methods. The resulting model, a generalized hidden Markov phylogeny (GHMP), applies
to a variety of situations where functional regions are to be inferred from evolutionary constraints. In particular, we show
how GHMPs can be used to predict complete shared gene structures in multiple primate sequences. We also describe SHADOWER,
our implementation of such a prediction system. We find that SHADOWER outperforms previously reported ab initio gene finders,
including comparative human-mouse approaches, on a small sample of diverse exonic regions. Finally, we report on an empirical
analysis of SHADOWER's performance which reveals that as few as five well-chosen species may suffice to attain maximal sensitivity
and specificity in exon demarcation.**Keyword note:**McAuliffe__Jon Pachter__Lior Jordan__Michael_I**Report ID:**647**Relevance:**100
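A generalized hidden Markov phylogeny couples a hidden Markov model over functional labels with phylogenetic emission models. As a sketch of just the HMM half (toy states, probabilities, and observations invented for illustration; this is not the SHADOWER parameterization), the forward algorithm computes the total probability of an observation sequence:

```python
# Toy two-state HMM: "c" (coding-like) vs "n" (noncoding-like) regions
# emitting symbols "x" and "y". All numbers are made up.
def forward(obs, states, start, trans, emit):
    """Total probability of obs under a discrete HMM (forward algorithm)."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

states = ["c", "n"]
start = {"c": 0.5, "n": 0.5}
trans = {"c": {"c": 0.9, "n": 0.1}, "n": {"c": 0.1, "n": 0.9}}
emit = {"c": {"x": 0.8, "y": 0.2}, "n": {"x": 0.3, "y": 0.7}}
```

In the GHMP, the scalar emission probabilities above are replaced by likelihoods computed over a phylogenetic tree relating the multiple aligned sequences.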

**Title:**Kernel-based Data Fusion and its Application to Protein Function Prediction in Yeast**Author(s):**Lanckriet, G. R. G.; Deng, M.; Cristianini, N.; Jordan, M. I.; Noble, W. S.; **Date issued:**Aug 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2p5n (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2p66 (PostScript) **Abstract:**Kernel methods provide a principled framework in which to represent many types of data, including vectors, strings, trees
and graphs. As such, these methods are useful for drawing inferences about biological phenomena. We describe a method for
combining multiple kernel representations in an optimal fashion, by formulating the problem as a convex optimization problem
that can be solved using semidefinite programming techniques. The method is applied to the problem of predicting yeast protein
functional classifications using a support vector machine (SVM) trained on five types of data. For this problem, the new
method performs better than a previously-described Markov random field method, and better than the SVM trained on any single
type of data.**Keyword note:**Lanckriet__G_R_G Deng__M Cristianini__N Jordan__Michael_I Noble__W_S**Report ID:**646**Relevance:**100
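The paper chooses the kernel weights by semidefinite programming; as a much simpler sketch of the underlying idea, the snippet below just forms a fixed convex combination of Gram matrices, which is itself a valid kernel (the matrices and weights are illustrative, not from the paper):

```python
def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices; weights nonnegative and summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-12 and all(w >= 0.0 for w in weights)
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

# Two toy 2x2 Gram matrices, e.g. from two different data types.
K1 = [[1.0, 0.0], [0.0, 1.0]]
K2 = [[2.0, 1.0], [1.0, 2.0]]
```

The SDP in the paper replaces the fixed weights with ones chosen to maximize an SVM-based criterion, subject to the combined matrix remaining positive semidefinite.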

**Title:**Efficient Independent Component Analysis (II)**Author(s):**Chen, Aiyou; Bickel, Peter J.; **Date issued:**Jul 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2p3j (PDF) **Abstract:**Independent component analysis (ICA) has been widely used to separate hidden sources from observed linear mixtures in many
fields, such as brain imaging analysis, signal processing, and telecommunications. Many statistical techniques based on M-estimates
have been proposed for estimating the mixing matrix. Recently, a few methods based on nonparametric tools have also become available.
However, an in-depth analysis of the convergence rate and asymptotic efficiency has been lacking. In this paper, we analyze
ICA within the framework of semiparametric theory [see Bickel, Klaassen, Ritov and Wellner (1993)] and propose a straightforward
estimate based on the efficient score function, using B-spline approximations. This estimate exhibits better performance
than standard ICA methods in a variety of simulations. It is proved that this estimator is Fisher efficient under moderate
conditions.**Pub info:**PDF**Keyword note:**Chen__Aiyou Bickel__Peter_John**Report ID:**645**Relevance:**100

**Title:**Regenerative composition structures.**Author(s):**Gnedin, Alexander; Pitman, Jim; **Date issued:**Jul 2003**Date modified:**revised December 8, 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2g86 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2g9r (PostScript) **Abstract:**A new class of random composition structures (the ordered analog of Kingman's partition structures) is defined by a regenerative
description of component sizes. Each regenerative composition structure is represented by a process of random sampling of points
from an exponential distribution on the positive halfline, and separating the points into clusters by an independent regenerative
random set. Examples are composition structures derived from residual allocation models, including one associated with the
Ewens sampling formula, and composition structures derived from the zero set of a Brownian motion or Bessel process. We provide
characterisation results and formulas relating the distribution of the regenerative composition to the Lévy parameters of
a subordinator whose range is the corresponding regenerative set. In particular, the only reversible regenerative composition
structures are those associated with the interval partition of [0,1] generated by excursions of a standard Bessel bridge of
dimension 2 - 2a for some a in [0,1].**Keyword note:**Gnedin__Alexander Pitman__Jim**Report ID:**644**Relevance:**100

**Title:**Path transformations of first passage bridges**Author(s):**Bertoin, Jean; Chaumont, Loic; Pitman, Jim; **Date issued:**Jul 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2n77 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2n8s (PostScript) **Abstract:**We define the first passage bridge from 0 to x as the Brownian motion on the time interval [0,1] conditioned to first hit
x at time 1. We show that this process may be related to the Brownian bridge, the Bessel bridge or the Brownian excursion
via some path transformations, the main one being an extension of Vervaat's transformation. We also provide an extension of
these results to certain bridges with cyclically exchangeable increments.**Keyword note:**Bertoin__Jean Chaumont__Loic Pitman__Jim**Report ID:**643**Relevance:**100

**Title:**Dutch Book against some "Objective" Priors**Author(s):**Eaton, Morris L.; Freedman, David A.; **Date issued:**Jun 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2n4k (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2n54 (PostScript) **Abstract:**"Dutch book," "incoherence," and "strong inconsistency" are generally equivalent: there is a system of bets which is expected
to make money for the gambler, whatever the state of nature may be. As de Finetti showed, a bookie who is not a Bayesian is
subject to a dutch book, under certain highly stylized rules of play-- a fact often used as an argument against frequentists.
So-called "objective" or "uninformative" priors may also be subject to a dutch book. This note explains, in a relatively
simple and self-contained way, how to make dutch book against a frequently-recommended uninformative prior for covariance
matrices.**Keyword note:**Eaton__Morris_L Freedman__David**Report ID:**642**Relevance:**100
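The arithmetic of a dutch book is simple to illustrate. The toy example below (invented for illustration; it is not the covariance-matrix construction in the paper) has a bookie post prices for the three outcomes of an event that sum to less than 1, so buying a $1 claim on every outcome costs less than the sure $1 payoff and the gambler profits whatever happens:

```python
# Incoherent prices: they sum to 0.90, not 1. A coherent bookie's prices
# would form a probability distribution over the outcomes.
prices = {"A": 0.30, "B": 0.30, "C": 0.30}

def guaranteed_profit(prices):
    """Sure profit from buying one $1 claim on each mutually exclusive,
    exhaustive outcome: exactly one claim pays off."""
    cost = sum(prices.values())
    payoff = 1.0
    return payoff - cost
```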

**Title:**Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces**Author(s):**Fukumizu, Kenji; Bach, Francis R.; Jordan, Michael I.; **Date issued:**May 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2n1x (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2n2g (PostScript) **Abstract:**We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification
problem in which we wish to predict a response variable $Y$ from an explanatory variable $X$, we treat the problem of dimensionality
reduction as that of finding a low-dimensional "effective subspace" of $X$ which retains the statistical relationship between
$X$ and $Y$. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into
an optimization problem we establish a general nonparametric characterization of conditional independence using covariance
operators on a reproducing kernel Hilbert space. This characterization allows us to derive a contrast function for estimation
of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed
method requires neither assumptions on the marginal distribution of $X$, nor a parametric model of the conditional distribution
of $Y$. We present experiments that compare the performance of the method with conventional methods.**Keyword note:**Fukumizu__Kenji Bach__Francis_R Jordan__Michael_I**Report ID:**641**Relevance:**100

**Title:**[Title unavailable]**Author(s):**Aldous, D. J.; **Date issued:**May 2003**Keyword note:**Aldous__David_J**Report ID:**640**Relevance:**100

**Title:**Convexity, classification, and risk bounds**Author(s):**Bartlett, Peter L.; Jordan, Michael I.; McAuliffe, Jon D.; **Date issued:**Apr 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2m65 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2m7q (PostScript) **Abstract:**Many of the classification algorithms developed in the machine learning literature, including the support vector machine and
boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity
makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must
be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship
between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We
show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss
function: that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a simple
variational transformation of the loss function that is easy to compute in many applications. We also present a refined version
of this result in the case of low noise. Finally, we present applications of our results to the estimation of convergence
rates in the general setting of function classes that are scaled convex hulls of a finite-dimensional base class, with a variety
of commonly used loss functions.**Keyword note:**Bartlett__Peter Jordan__Michael_I McAuliffe__Jon**Report ID:**638**Relevance:**100
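The surrogate idea above is concrete in terms of margins y*f(x). As an illustrative sketch (toy margin values, not from the paper), the snippet compares the empirical 0-1 risk with two common convex surrogates, the hinge loss used by the SVM and the exponential loss used by boosting; both upper-bound the 0-1 loss pointwise:

```python
import math

def empirical_risks(margins):
    """Empirical 0-1, hinge, and exponential risks over margin values y*f(x)."""
    n = len(margins)
    zero_one = sum(1.0 for m in margins if m <= 0) / n
    hinge = sum(max(0.0, 1.0 - m) for m in margins) / n     # SVM surrogate
    expo = sum(math.exp(-m) for m in margins) / n           # boosting surrogate
    return zero_one, hinge, expo
```

The paper's quantitative relationship bounds the excess 0-1 risk in terms of the excess surrogate risk, via a variational transform of the loss.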

**Title:**Quasistationary distributions for one-dimensional diffusions with killing**Author(s):**Steinsaltz, David; Evans, Steven N.; **Date issued:**Mar 2003**Date modified:**revised May 2004

http://nma.berkeley.edu/ark:/28722/bk0000n2m3h (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2m42 (PostScript) **Abstract:**We extend some results on the convergence of one-dimensional diffusions killed at the boundary, conditioned on extended survival,
to the case of general killing on the interior. We show, under fairly general conditions, that a diffusion conditioned on
long survival either runs off to infinity almost surely, or almost surely converges to a quasistationary distribution given
by the lowest eigenfunction of the generator. In the absence of internal killing, only a sufficiently strong inward drift
can keep the process close to the origin, to allow convergence in distribution. An alternative, that arises when general
killing is allowed, is that the conditioned process is held near the origin by a high rate of killing near $\infty$. We also
extend, to the case of general killing, the standard result on convergence to a quasistationary distribution of a diffusion
on a compact interval.**Keyword note:**Steinsaltz__David Evans__Steven_N**Report ID:**637**Relevance:**100

**Title:**Markov mortality models: Implications of quasistationarity and varying initial distributions**Author(s):**Steinsaltz, David; Evans, Steven N.; **Date issued:**Mar 2003**Date modified:**revised June 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2m0v (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2m1d (PostScript) **Abstract:**This paper explains some implications of Markov-process theory for models of mortality. We show, on the one hand, that an important
qualitative feature which has been found in certain models --- the convergence to a "mortality plateau" --- is a generic
consequence of the convergence to a "quasistationary distribution", which has been explored extensively in the mathematical
literature. This serves not merely to free these results from some irrelevant specifics of the models, but also to offer a
new explanation of the convergence to constant mortality. At the same time that we show that the late behavior --- convergence
to a finite asymptote --- of these models is almost logically immutable, we also show that the early behavior of the mortality
rates can be more flexible than has been generally acknowledged. We show, in particular, that an appropriate choice of initial
conditions enables one popular model to approximate any reasonable hazard-rate data. This suggests how precarious it might
be to judge the appropriateness of mortality models by a perceived consilience with a favored hazard-rate function, such as
the Gompertz exponential.**Keyword note:**Steinsaltz__David Evans__Steven_N**Report ID:**636**Relevance:**100
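A discrete toy version of the quasistationarity argument (the substochastic transition matrix below is invented for illustration; the papers treat diffusions): conditioning a killed Markov chain on survival drives the state distribution to the dominant left eigenvector of the substochastic matrix, whatever the initial distribution, which is why the hazard rate approaches a constant plateau:

```python
def conditioned_distribution(P, init, steps):
    """Propagate init through substochastic P, renormalizing each step
    (i.e. conditioning on survival); returns the conditional distribution."""
    dist = list(init)
    n = len(P)
    for _ in range(steps):
        new = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(new)          # survival probability for this step
        dist = [x / total for x in new]
    return dist

# Each row sums to 0.9; the 0.1 deficit is the per-step death probability.
P = [[0.70, 0.20],
     [0.10, 0.80]]
```

With this P the quasistationary distribution is (1/3, 2/3), and starting from either pure state gives the same long-run conditional distribution, echoing the insensitivity to initial conditions in the late regime.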

**Title:**Boosting with early stopping: convergence and consistency**Author(s):**Yu, Bin; Zhang, Tong; **Date issued:**February 2003**Keyword note:**Yu__Bin Zhang__Tong**Report ID:**635**Relevance:**100

**Title:**Efficient independent component analysis**Author(s):**Chen, Aiyou; Bickel, Peter; **Date issued:**Jan 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2k8r (PDF) **Abstract:**We propose a Fisher-efficient estimator for the ICA (independent component analysis) model. First we provide a $\sqrt{n}$-consistent
estimator using the empirical characteristic function, and then show that by directly estimating the efficient influence function
we can construct a one-step MLE which attains asymptotic Fisher efficiency (EFFICA). We compare a variant of EFFICA
to standard and state-of-the-art algorithms such as the kernel ICA method (Bach & Jordan, 2002) using benchmark simulations and
exhibit its excellent performance.**Keyword note:**Chen__Aiyou Bickel__Peter_John**Report ID:**634**Relevance:**100

**Title:**Resampling-based multiple testing for microarray data analysis**Author(s):**Ge, Yongchao; Dudoit, Sandrine; Speed, Terence P.; **Date issued:**Jan 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2k4j (PDF) **Abstract:**The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational
challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are
tested simultaneously. In their 1993 book, Westfall & Young propose resampling-based $p$-value adjustment procedures which
are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based
multiple testing, including (a) the family-wise error rate of Westfall & Young's 1993 book and (b) the false discovery rate
developed by Benjamini & Hochberg's 1995 paper, both from a frequentist viewpoint; and (c) the positive false discovery rate
of Storey's 2002 paper published in J.R.S.S.B, which has a Bayesian motivation. We also introduce our recently developed
fast algorithm for implementing the minP adjustment to control family-wise error rate. Adjusted $p$-values for different approaches
are applied to gene expression data from two recently published microarray studies. The properties of these procedures for
multiple testing are compared.**Keyword note:**Ge__Yongchao Dudoit__Sandrine Speed__Terry_P**Report ID:**633**Relevance:**100
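Of the criteria discussed above, the Benjamini & Hochberg false discovery rate procedure is the easiest to sketch; the step-up adjustment below is a minimal illustration (not the resampling-based minP algorithm the authors introduce, which estimates the null distribution by permutation):

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (monotone step-up procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the
    # adjusted values p_(k) * m / k.
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted
```

Rejecting hypotheses whose adjusted p-value is at most q then controls the false discovery rate at level q (under the procedure's independence-type conditions).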

**Title:**[Title unavailable]**Author(s):**Bickel, Peter; **Date issued:**January 2003**Keyword note:**Bickel__Peter_John**Report ID:**632**Relevance:**100

**Title:**The Markov Moment Problem and de Finetti's Theorem: Parts I and II**Author(s):**Diaconis, Persi; Freedman, David A.; **Date issued:**Jan 2003

http://nma.berkeley.edu/ark:/28722/bk0000n2k6n (PDF) **Abstract:**Part I. The Markov moment problem is to characterize the moment sequences of densities on the unit interval that are bounded
by a given positive constant c. There are well-known characterizations through complex systems of non-linear inequalities.
Hausdorff found simpler necessary and sufficient linear conditions. This paper gives a new proof, with some ancillary results,
for example, characterizing moment sequences of bounded densities with respect to an arbitrary measure. A connection with
de Finetti's theorem is then described, and moments of monotone densities are characterized. Part II. There is an abstract
version of de Finetti's theorem that characterizes mixing measures with bounded or L_p densities. The general setting is reviewed;
after the theorem is proved, it is specialized to coin tossing and to exponential random variables. Laplace transforms of
bounded densities are characterized, simplifying a well-known theorem.**Keyword note:**Diaconis__Persi Freedman__David**Report ID:**631**Relevance:**100