**Statistics Technical Reports:**Search | Browse by year

**Term(s):**2008**Results:**23**Sorted by:****Page: 1 2 Next**

**Title:**Commuting birth-and-death processes**Author(s):**Evans, Steven N.; Sturmfels, Bernd; Uhler, Caroline; **Date issued:**December 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0g5h (PDF) **Abstract:**We use methods from combinatorics and algebraic statistics to study analogues of birth-and-death processes that have as their
state space a finite subset of the m-dimensional lattice and for which the m matrices that record the transition probabilities
in each of the lattice directions commute pairwise. One reason such processes are of interest is that the transition matrix
is straightforward to diagonalize, and hence it is easy to compute n step transition probabilities. The set of commuting birth-and-death
processes decomposes as a union of toric varieties, with the main component being the closure of all processes whose nearest
neighbor transition probabilities are positive. We exhibit an explicit monomial parametrization for this main component, and
we explore the boundary components using primary decomposition.**Keyword note:**Evans__Steven_N Sturmfels__Bernd Uhler__Caroline**Report ID:**769**Relevance:**100

**Title:**Dynamics of the time to the most recent common ancestor in a large branching population**Author(s):**Evans, Steven N.; Ralph, Peter L.; **Date issued:**December 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0g9q (PDF) **Abstract:**If we follow an asexually reproducing population through time, then the amount of time that has passed since the most recent
common ancestor (MRCA) of all current individuals lived will change as time progresses. The resulting stochastic process has
been studied previously when the population has a constant large size and evolves via the diffusion limit of standard Wright-Fisher
dynamics. We investigate cases in which the population varies in size and evolves according to a class of models that includes
suitably conditioned $(1+\beta)$-stable continuous state branching processes (in particular, it includes the conditioned Feller
continuous state branching process). We also consider the discrete time Markov chain that tracks the MRCA age just before
and after its successive jumps. We find transition probabilities for both the continuous and discrete time processes, determine
when these processes are transient and recurrent, and compute stationary distributions when they exist. We also introduce
a new family of Markov processes that stand in a relation with respect to the $(1+\beta)$-stable continuous state branching
process that is similar to the one between the Bessel-squared diffusions and the Feller continuous state branching process.**Keyword note:**Evans__Steven_N Ralph__Peter**Report ID:**768**Relevance:**100

**Title:**High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence**Author(s):**Ravikumar, Pradeep; Wainwright, Martin J.; Raskutti, Garvesh; Yu, Bin; **Date issued:**November 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0h3x (PDF) **Abstract:**Given i.i.d. observations of a random vector $X \in \mathbb{R}^p$, we study the problem of estimating both its covariance
matrix $\Sigma^*$, and its inverse covariance or concentration matrix \mbox{$\Theta^* = (\Sigma^*)^{-1}$.} We estimate $\Theta^*$
by minimizing an $\ell_1$-penalized log-determinant Bregman divergence; in the multivariate Gaussian case, this approach corresponds
to $\ell_1$-penalized maximum likelihood, and the structure of $\Theta^*$ is specified by the graph of an associated Gaussian
Markov random field. We analyze the performance of this estimator under high-dimensional scaling, in which the number of
nodes in the graph $p$, the number of edges $s$ and the maximum node degree $d$, are allowed to grow as a function of the
sample size $n$. In addition to the parameters $(p,s,d)$, our analysis identifies other key quantities covariance matrix
$\Sigma^*$; and (b) the $\ell_\infty$ operator norm of the sub-matrix $\Gamma^*_{S S}$, where $S$ indexes the graph edges,
and $\Gamma^* = (\Theta^*)^{-1} \otimes (\Theta^*)^{-1}$; and (c) a mutual incoherence or irrepresentability measure on the
matrix $\Gamma^*$ and (d) the rate of decay $1/\sctail(\numobs,\scdelta)$ on the probabilities $ \{|\widehat{\Sigma}^n_{ij}-
\Sigma^*_{ij}| > \delta \}$, where $\widehat{\Sigma}^n$ is the sample covariance based on $n$ samples. Our first result establishes
consistency of our estimate $\widehat{\Theta}$ in the elementwise maximum-norm. This in turn allows us to derive convergence
rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degrees $\degmax
= o(\sqrt{\spindex})$. In our second result, we show that with probability converging to one, the estimate $\widehat{\Theta}$
correctly specifies the zero pattern of the concentration matrix $\Theta^*$. We illustrate our theoretical results via simulations
for various graphs and problem parameters, showing good correspondences between the theoretical predictions and behavior in
simulations.**Keyword note:**Ravikumar__Pradeep Wainwright__Martin Raskutti__Garvesh Yu__Bin**Report ID:**767**Relevance:**100

**Title:**Analyzing Data with Graphs: Metagenomic Data and the Phylogenetic Tree**Author(s):**Purdom, Elizabeth; **Date issued:**November 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0h74 (PDF) **Abstract:**In biological experiments, researchers often have information in the form of a graph that supplements observed numerical data.
Incorporating the knowledge contained in these graphs into an analysis of the numerical data is an important and non trivial
task. We look at the example of metagenomic data -- data from a genomic survey of the abundence of different species of bacteria
in a sample. Here, the graph of interest is a phylogenetic tree depicting the interspecies relationships among the bacteria
species. We demonstrate that analysis of the data in a non-standard inner-product space effectively uses this additional graphical
information and produces more meaningful results.**Keyword note:**Purdom__Elizabeth**Report ID:**766**Relevance:**100

**Title:**Message-passing for graph-structured linear programs: Proximal methods and rounding schemes**Author(s):**Ravikumar, Pradeep; Agarwal, Alekh; Wainwright, Martin J.; **Date issued:**October 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0j1b (PDF) **Abstract:**The problem of computing a maximum a posteriori (MAP) configuration is a central computational challenge associated with Markov
random fields. A line of work has focused on ``tree-based'' linear programming (LP) relaxations for the MAP problem. This
paper develops a family of super-linearly convergent algorithms for solving these LPs, based on proximal minimization schemes
using Bregman divergences. As with standard message-passing on graphs, the algorithms are distributed and exploit the underlying
graphical structure, and so scale well to large problems. Our algorithms have a double-loop character, with the outer loop
corresponding to the proximal sequence, and an inner loop of cyclic Bregman divergences used to compute each proximal update.
Different choices of the Bregman divergence lead to conceptually related but distinct LP-solving algorithms. We establish
convergence guarantees for our algorithms, and illustrate their performance via some simulations. We also develop two classes
of graph-structured rounding schemes, randomized and deterministic, for obtaining integral configurations from the LP solutions.
Our deterministic rounding schemes use a ``re-parameterization'' property of our algorithms so that when the LP solution is
integral, the MAP solution can be obtained even before the LP-solver converges to the optimum. We also propose a graph-structured
randomized rounding scheme that applies to iterative LP solving algorithms in general. We analyze the performance of our
rounding schemes, giving bounds on the number of iterations required, when the LP is integral, for the rounding schemes to
obtain the MAP solution. These bounds are expressed in terms of the strength of the potential functions, and the energy gap,
which measures how well the integral MAP solution is separated from other integral configurations. We also report simulations
comparing these rounding schemes.**Keyword note:**Ravikumar__Pradeep Agarwal__Alekh Wainwright__Martin**Report ID:**765**Relevance:**100

**Title:**Estimating divergence functionals and the likelihood ratio by convex risk minimization**Author(s):**Nguyen, XuanLong; Wainwright, Martin J.; Jordan, Michael I.; **Date issued:**October 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0j5j (PDF) **Abstract:**We develop and analyze M-estimation methods for the divergence functionals and the likelihood ratios of two probability distributions.
Our method is based on a non-asymptotic variational characterization of $f$-divergences, which allows the problem of estimating
divergences to be tackled via convex risk optimization. The resulting estimators are simple to implement, requiring only the
solution of standard convex programs. We present an analysis of consistency and convergence for these estimators. Given conditions
only on the ratios of densities, we show that our estimators can achieve optimal minimax rates for the likelihood ratio in
certain regimes. Finally, we derive an efficient optimization algorithm for computing our estimates, and illustrate their
convergence behavior and practical viability by simulations.**Keyword note:**Nguyen__XuanLong Wainwright__Martin Jordan__Michael_I**Report ID:**764**Relevance:**100

**Title:**Multiway Spectral Clustering: A Margin-based Perspective**Author(s):**Zhang, Zhihua; Jordan, Michael I.; **Date issued:**September 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0j7n (PDF) **Abstract:**Spectral clustering is a broad class of clustering procedures in which an intractable combinatorial optimization formulation
of clustering is "relaxed" into a tractable eigenvector problem, and in which the relaxed solution is subsequently "rounded"
into an approximate discrete solution to the original problem. In this paper we present a novel margin-based perspective on
multiway spectral clustering. We show that the margin-based perspective illuminates both the relaxation and rounding aspects
of spectral clustering, providing a unified analysis of existing algorithms and guiding the design of new algorithms. We
also present connections between spectral clustering and several other topics in statistics, specifically minimum-variance
clustering, Procrustes analysis and Gaussian intrinsic autoregression.**Keyword note:**Zhang__Zhihua Jordan__Michael_I**Report ID:**763**Relevance:**100

**Title:**Vital rates from the action of mutation accumulation**Author(s):**Wachter, Kenneth W.; Steinsaltz, David R.; Evans, Steven N.; **Date issued:**August 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0q1x (PDF) **Abstract:**New models for evolutionary processes of mutation accumulation allow hypotheses about the age-specificity of mutational effects
to be translated into predictions of heterogeneous population hazard functions. We apply these models to questions in the
biodemography of longevity, including proposed explanations of Gompertz hazards and mortality plateaus, and use them to explore
the possibility of melding evolutionary and functional models of aging.**Keyword note:**Wachter__Kenneth Steinsaltz__David Evans__Steven_N**Report ID:**762**Relevance:**100

**Title:**Union support recovery in high-dimensional multivariate regression**Author(s):**Obozinski, Guillaume; Wainwright, Martin J.; Jordan, Michael I.; **Date issued:**August 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0q54 (PDF) **Abstract:**In the problem of multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates,
with a matrix B* of regression coefficients. We study the behavior of the group Lasso using l1/l2 regularization for the union
support problem, meaning that the set of s rows for which B* is non-zero is recovered exactly. Studying this problem under
high-dimensional scaling, we show that group Lasso recovers the exact row pattern with high probability over the random design
and noise for scalings of (n; p; s) such that the sample complexity parameter given by theta(n,p,s)=n/(2 psi(B*) log(p-s))
exceeds a critical threshold. Here n is the sample size, p is the ambient dimension of the regression model, s is the number
of non-zero rows, and psi(B*) is a sparsity-overlap function that measures a combination of the sparsities and overlaps of
the K-regression coefficient vectors that constitute the model. This sparsity-overlap function reveals that, if the design
is uncorrelated on the active rows, block l1/l2 regularization for multivariate regression never harms performance relative
to an ordinary Lasso approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the
regression vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform
the group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even
for relatively small problems.**Keyword note:**Obozinski__Guillaume Wainwright__Martin Jordan__Michael_I**Report ID:**761**Relevance:**100

**Title:**Data Spectroscopy: Eigenspace Of Convolution Operators And Clustering**Author(s):**Shi, Tao; Belkin, Mikhail; Yu, Bin; **Date issued:**July 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0q9b (PDF) **Abstract:**This paper focuses on obtaining clustering information in a distribution when iid data are given. First, we develop theoretical
results for understanding and using clustering information contained in the eigenvectors of data adjacency matrices based
on a radial kernel function (with a sufficiently fast tail decay). We provide population analyses to give insights into which
eigenvectors should be used and when the clustering information for the distribution can be recovered from the data. In particular,
we learned that top eigenvectors do not contain all the clustering information. Second, we use heuristics from these analyses
to design the Data Spectroscopic clustering (DaSpec) algorithm that uses properly selected top eigenvectors, determines the
number of clusters, gives data labels, and provides a classification rule for future data, all based on only one eigen decomposition.
Our findings not only extend and go beyond the intuitions underlying existing spectral techniques (e.g. spectral clustering
and Kernel Principal Components Analysis), but also provide insights about their usability and modes of failure. Simulation
studies and experiments on real world data are conducted to show the promise of our proposed data spectroscopy clustering
algorithm relative to k-means and one spectral method. In particular, DaSpec seems to be able to handle unbalanced groups
and recover clusters of different shapes better than competing methods.**Keyword note:**Shi__Tao Belkin__Mikhail Yu__Bin**Report ID:**760**Relevance:**100

**Title:**A path following algorithm for Sparse Pseudo-Likelihood Inverse Covariance Estimation (SPLICE)**Author(s):**Rocha, Guilherme V.; Zhao, Peng; Yu, Bin; **Date issued:**July 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0549 (PDF) **Abstract:**Given n observations of a p-dimensional random vector, the covariance matrix and its inverse (precision matrix) are needed
in a wide range of applications. Sample covariance (e.g. its eigenstructure) can misbehave when p is comparable to the sample
size n. Regularization is often used to mitigate the problem. In this paper, we proposed an l1-norm penalized pseudo-likelihood
estimate for the inverse covariance matrix. This estimate is sparse due to the l1-norm penalty, and we term this method
SPLICE. Its regularization path can be computed via an algorithm based on the homotopy/LARS-Lasso algorithm. Simulation
studies are carried out for various inverse covariance structures for p=15 and n=20,1000. We compare SPLICE with the l1-norm
penalized likelihood estimate and a l1-norm penalized Cholesky decomposition based method. SPLICE gives the best overall
performance in terms of three metrics on the precision matrix and ROC curve for model selection. Moreover, our simulation
results demonstrate that the SPLICE estimates are positive-definite for most of the regularization path even though the restriction
is not enforced.**Keyword note:**Rocha__Guilherme_V Zhao__Peng Yu__Bin**Report ID:**759**Relevance:**100

**Title:**Information In The Non-Stationary Case**Author(s):**Vu, Vincent Q.; Yu, Bin; Kass, Robert E.; **Date issued:**June 2008

http://nma.berkeley.edu/ark:/28722/bk0005d058h (PDF) **Abstract:**Information estimates such as the "direct method" of Strong et al. (1998) sidestep the difficult problem of estimating the
joint distribution of response and stimulus by instead estimating the difference between the marginal and conditional entropies
of the response. While this is an effective estimation strategy, it tempts the practitioner to ignore the role of the stimulus
and the meaning of mutual information. We show here that, as the number of trials increases indefinitely, the direct (or "plug-in")
estimate of marginal entropy converges (with probability 1) to the entropy of the time-averaged conditional distribution of
the response, and the direct estimate of the conditional entropy converges to the time-averaged entropy of the conditional
distribution of the response. Under joint stationarity and ergodicity of the response and stimulus, the difference of these
quantities converges to the mutual information. When the stimulus is deterministic or non-stationary the direct estimate of
information no longer estimates mutual information, which is no longer meaningful, but it remains a measure of variability
of the response distribution across time.**Keyword note:**Vu__Vincent_Q Yu__Bin Kass_Robert_E**Report ID:**758**Relevance:**100

**Title:**The Age-Specific Force of Natural Selection and Walls of Death**Author(s):**Wachter, Kenneth W.; Evans, Steven N.; Steinsaltz, David R.; **Date issued:**July 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0615 (PDF) **Abstract:**W. D. Hamilton's celebrated formula for the age-specific force of natural selection furnishes predictions for senescent mortality
due to mutation accumulation, at the price of reliance on a linear approximation. Applying to Hamilton's setting the full
non-linear demographic model for mutation accumulation of Evans et al. (2007), we find surprising differences. Non-linear
interactions cause the collapse of Hamilton-style predictions in the most commonly studied case, refine predictions in other
cases, and allow Walls of Death at ages before the end of reproduction. Haldane's Principle for genetic load has an exact
but unfamiliar generalization.**Keyword note:**Wachter__Kenneth Evans__Steven_N Steinsaltz__David**Report ID:**757**Relevance:**100

**Title:**On Model Selection Consistency of the Elastic net When p>>n**Author(s):**Jia, Jinzhu; Yu, Bin; **Date issued:**May 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0638 (PDF) **Abstract:**In this paper, we study the model selection property of the Elastic net. In the classical settings when $p$ (the number of
predictors) and $q$ (the number of predictors with non-zero coefficients in the true linear model) are fixed, Yuan and Lin
(2007) give a necessary and sufficient condition for the Elastic net to consistently select the true model, which is called
the Elastic Irrepresentable Condition (EIC) in this paper. Here we study the general case when $p,q$ and $n$ all go to infinity.
For general scalings of $p,q$ and $n$, when Gaussian noise is assumed, sufficient conditions on $p$, $q$ and $n$ are given
in this paper such that EIC guarantees the Elastic net's model selection consistency. We show that to make these conditions
hold, $n$ should grow at a rate faster than $q*\log(p-q)$. For the classical case, when $p$ and $q$ are fixed, we also study
the relationship between EIC and the Irrepresentable Condition (IC) which is necessary and sufficient for the Lasso to select
the true model. Through theoretical results and simulation studies, we provide insights into when and why EIC is weaker than
IC and when the Elastic net can consistently select the true model even when the Lasso can not.**Keyword note:**Jia__Jinzhu Yu__Bin**Report ID:**756**Relevance:**100

**Title:**Maximum Likelihood Estimation of Cloud Height from Multi-Angle Satellite Imagery.**Author(s):**Anderes, E.; Yu, B.; Jovanovic, V.; Moroney, C.; Garay, M.; Braverman, A.; Clothiaux, E.; **Date issued:**May 2008

http://nma.berkeley.edu/ark:/28722/bk0005d065c (PDF) **Abstract:**We develop a new estimation technique for recovering depth-of-field from multiple stereo images. Depth-of-field is estimated
by determining the shift in image location resulting from different camera viewpoints. When this shift is not divisible by
pixel width, the multiple stereo images can be combined to form a super-resolution image. By modeling this super-resolution
image as a realization of a random field, one can view the recovery of depth as a likelihood estimation problem. We apply
these modeling techniques to the recovery of cloud height from multiple viewing angles provided by the MISR instrument on
the Terra Satellite. Our efforts are focused on a two layer cloud ensemble where both layers are relatively planer, the bottom
layer is optically thick and textured, and the top layer is optically thin. Our results demonstrate that with relative ease,
we get comparable estimates to the M2 stereo matcher which is the same algorithm used in the current MISR standard product.
Moreover, our techniques provide the possibility of modeling all of the MISR data in a unified way for cloud height estimation.
Future research is underway to extend this framework for fast, quality global estimates of cloud height.**Keyword note:**Anderes__Ethan Yu__Bin Jovanovic__V Moroney__C Garay__M Braverman__Amy_J Clothiaux__Eugene_E**Report ID:**755**Relevance:**100

**Title:**Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices**Author(s):**Wang, Wei; Wainwright, Martin J.; Ramchandran, Kannan; **Date issued:**May 2008

http://nma.berkeley.edu/ark:/28722/bk0005d067g (PDF) **Abstract:**We study the information-theoretic limits of exactly recovering the support of a sparse signal using noisy projections defined
by various classes of measurement matrices. Our analysis is high-dimensional in nature, in which the number of observations
$\numobs$, the ambient signal dimension $\pdim$, and the signal sparsity $\kdim$ are all allowed to tend to infinity in a
general manner. This paper makes two novel contributions. First, we provide sharper necessary conditions for exact support
recovery using general (non-Gaussian) dense measurement matrices. Combined with previously known sufficient conditions, this
result yields sharp characterizations of when the optimal decoder can recover a signal for various scalings of the sparsity
$\kdim$ and sample size $\numobs$, including the important special case of linear sparsity ($\kdim = \Theta(\pdim)$) using
a linear scaling of observations ($\numobs = \Theta(\pdim)$). Our second contribution is to prove necessary conditions on
the number of observations $\numobs$ required for asymptotically reliable recovery using a class of $\spgam$-sparsified measurement
matrices, where the measurement sparsity $\spgam(\numobs, \pdim, \kdim) \in (0,1]$ corresponds to the fraction of non-zero
entries per row. Our analysis allows general scaling of the quadruplet $(\numobs, \pdim, \kdim, \spgam)$, and reveals three
different regimes, corresponding to whether measurement sparsity has no effect, a minor effect, or a dramatic effect on the
information-theoretic limits of the subset recovery problem.**Keyword note:**Wang__Wei Wainwright__Martin Ramchandran__K**Report ID:**754**Relevance:**100

**Title:**High-dimensional subset recovery in noise: Sparsified measurements without loss of statistical efficiency**Author(s):**Omidiran, Dapo; Wainwright, Martin J.; **Date issued:**May 2008

http://nma.berkeley.edu/ark:/28722/bk0005d069k (PDF) **Abstract:**We consider the problem of estimating the support of a vector $\beta^* \in \mathbb{R}^{p}$ based on observations contaminated
by noise. A significant body of work has studied behavior of $\ell_1$-relaxations when applied to measurement matrices drawn
from standard dense ensembles (e.g., Gaussian, Bernoulli). In this paper, we analyze \emph{sparsified} measurement ensembles,
and consider the trade-off between measurement sparsity, as measured by the fraction $\gamma$ of non-zero entries, and the
statistical efficiency, as measured by the minimal number of observations $n$ required for exact support recovery with probability
converging to one. Our main result is to prove that it is possible to let $\gamma \rightarrow 0$ at some rate, yielding measurement
matrices with a vanishing fraction of non-zeros per row while retaining the same statistical efficiency as dense ensembles.
A variety of simulation results confirm the sharpness of our theoretical predictions.**Keyword note:**Omidiran__Dapo Wainwright__Martin**Report ID:**753**Relevance:**100

**Title:**To what extent does genealogical ancestry imply genetic ancestry?**Author(s):**Matsen, Frederick A.; Evans, Steven N.; **Date issued:**May 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0770 (PDF) **Abstract:**Recent statistical and computational analyses have shown that a genealogical most recent common ancestor may have lived in
the recent past. However, coalescent-based approaches show that genetic most recent common ancestors for a given non-recombining
locus are typically much more ancient. It is not immediately clear how these two perspectives interact. This paper investigates
relationships between the number of descendant alleles of an ancestor allele and the number of genealogical descendants of
the individual who possessed that allele for a simple diploid genetic model extending the genealogical model of J.T. Chang.**Keyword note:**Matsen__Frederick_A Evans__Steven_N**Report ID:**752**Relevance:**100

**Title:**Network-based consensus averaging with general noisy channels**Author(s):**Rajagopal, Ram; Wainwright, Martin J.; **Date issued:**May 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0793 (PDF) **Abstract:**This paper focuses on the consensus averaging problem on graphs under general noisy channels. We study a particular class
of distributed consensus algorithms based on damped updates, and using the ordinary differential equation method, we prove
that the updates converge almost surely to exact consensus for finite variance noise. Our analysis applies to various types
of stochastic disturbances, including errors in parameters, transmission noise, and quantization noise. Under a suitable
stability condition, we prove that the error is asymptotically Gaussian, and we show how the asymptotic covariance is specified
by the graph Laplacian. For additive parameter noise, we show how the scaling of the asymptotic MSE is controlled by the
spectral gap of the Laplacian.**Keyword note:**Rajagopal__Ram Wainwright__Martin**Report ID:**751**Relevance:**100

**Title:**High-Dimensional Graphical Model Selection Using $\ell_1$-Regularized Logistic Regression**Author(s):**Ravikumar, Pradeep; Wainwright, Martin J.; Lafferty, John D.; **Date issued:**April 2008

http://nma.berkeley.edu/ark:/28722/bk0005d0816 (PDF) **Abstract:**We consider the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method
based on $\ell_1$-regularized logistic regression, in which the neighborhood of any given node is estimated by performing
logistic regression subject to an $\ell_1$-constraint. Our framework applies to the high-dimensional setting, in which both
the number of nodes $p$ and maximum neighborhood sizes $d$ are allowed to grow as a function of the number of observations
$n$. Our main results provide sufficient conditions on the triple $(n, p, d)$ for the method to succeed in consistently estimating
the neighborhood of every node in the graph simultaneously. Under certain assumptions on the population Fisher information
matrix, we prove that consistent neighborhood selection can be obtained for sample sizes $n = \Omega(d^3 \log p)$, with the
error decaying as $\order(\exp(-C n/d^3))$ for some constant $C$. If these same assumptions are imposed directly on the sample
matrices, we show that $n = \Omega(d^2 \log p)$ samples are sufficient.**Keyword note:**Ravikumar__Pradeep Wainwright__Martin Lafferty__John**Report ID:**750**Relevance:**100