**Statistics Technical Reports**

**Term(s):** 2005 **Results:** 9

**Title:** Structured Prediction, Dual Extragradient and Bregman Projections **Author(s):** Taskar, Ben; Lacoste-Julien, Simon; Jordan, Michael I. **Date issued:** November 2005

http://nma.berkeley.edu/ark:/28722/bk0000n2x12 (PDF)

**Abstract:** We present a simple and scalable algorithm for maximum-margin estimation of structured prediction models, including an important
class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem
that allows us to use simple projection methods based on the dual extragradient algorithm (Nesterov, 2003). The projection
step can be solved using dynamic programming or combinatorial algorithms for min-cost convex flow, depending on the structure
of the problem. We show that this approach provides a memory-efficient alternative to formulations based on reductions to
a quadratic program (QP). We analyze the convergence of the method and present experiments on two very different structured
prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm. **Report ID:** 697
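
As background on the projection-based updates the abstract describes, here is a minimal sketch of the classical extragradient method (Korpelevich), the building block that Nesterov's dual extragradient refines; the bilinear toy objective, unit-ball constraints, and step size are our illustrative assumptions, not the report's model.

```python
import numpy as np

# Toy convex-concave saddle point: min_w max_z  w^T A z  over unit balls.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))

def project_ball(v):
    """Euclidean projection onto the unit ball (stand-in for the
    structured projections computed by dynamic programming or min-cost flow)."""
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

w, z = np.zeros(5), np.zeros(5)
eta = 0.1  # step size; theory ties it to the gradient's Lipschitz constant

for _ in range(500):
    # Prediction step: plain gradient step from the current point.
    w_half = project_ball(w - eta * (A @ z))    # grad_w of w^T A z is A z
    z_half = project_ball(z + eta * (A.T @ w))  # grad_z of w^T A z is A^T w
    # Correction step: gradients evaluated at the predicted point.
    w = project_ball(w - eta * (A @ z_half))
    z = project_ball(z + eta * (A.T @ w_half))

print("saddle value w^T A z =", w @ A @ z)  # approaches 0 for this toy problem
```

In the structured prediction setting of the report, the Euclidean projection would be replaced by the dynamic programming or min-cost convex flow projections the abstract mentions.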

**Title:** Curse-of-dimensionality revisited: Collapse of importance sampling in very high-dimensional systems **Author(s):** Li, Bo; Bengtsson, Thomas; Bickel, Peter **Date issued:** November 2005

http://nma.berkeley.edu/ark:/28722/bk0000n2h1v (PDF)

**Abstract:** It has been widely realized that Monte Carlo methods (approximation via a sample ensemble) may fail in large scale systems.
This work offers some theoretical insight into this phenomenon. In the context of a particle filter (as well as in general
importance samplers), we demonstrate that the maximum of the weights associated with the sample ensemble members converges
to one as both sample size and system dimension tend to infinity. Under fairly weak assumptions, this convergence is shown
to hold for both a Gaussian case and for a more general case with iid kernels. Similar singularity behavior is also shown
to hold for non-Gaussian, spherically symmetric kernels (e.g. multivariate Cauchy distribution). In addition, in certain large
scale settings, we show that the estimator of an expectation based on importance sampling converges weakly to a law, rather
than the target constant. Our work is presented and discussed in the context of atmospheric data assimilation for numerical
weather prediction. **Report ID:** 696
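
The collapse phenomenon is easy to reproduce numerically. The sketch below (our toy Gaussian setup, not the paper's experiments) importance-weights draws from a proposal $N(0, I_d)$ against a shifted target $N(\mu, I_d)$ and tracks the largest normalized weight as the dimension grows with the sample size held fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000  # ensemble size, held fixed as the dimension grows

for d in (1, 10, 100, 1000):
    x = rng.normal(size=(n, d))           # draws from the proposal N(0, I_d)
    mu = 0.5 * np.ones(d)                 # target N(mu, I_d): a modest shift
    log_w = x @ mu - 0.5 * (mu @ mu)      # log density ratio target/proposal
    log_w -= log_w.max()                  # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    print(f"d = {d:4d}   max normalized weight = {w.max():.3f}")
```

As $d$ grows, a single ensemble member carries essentially all the weight, which is the singularity behavior the abstract describes.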

**Title:** On divergences, surrogate loss functions and decentralized detection **Author(s):** Nguyen, X.; Wainwright, M. J.; Jordan, M. I. **Date issued:** October 2005

http://nma.berkeley.edu/ark:/28722/bk0000n2m9t (PDF)

**Abstract:** We develop a general correspondence between a family of loss functions that act as surrogates to 0-1 loss, and the class of
Ali-Silvey or $f$-divergence functionals. This correspondence provides the basis for choosing and evaluating various surrogate
losses frequently used in statistical learning (e.g., hinge loss, exponential loss, logistic loss); conversely, it provides
a decision-theoretic framework for the choice of divergences in signal processing and quantization theory. We exploit this
correspondence to characterize the statistical behavior of nonparametric decentralized hypothesis testing algorithms that
operate by minimizing convex surrogate loss functions. In particular, we specify the family of loss functions that are equivalent
to 0-1 loss in the sense of producing the same quantization rules and discriminant functions. **Report ID:** 695
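
For reference, the Ali-Silvey ($f$-divergence) family mentioned in the abstract takes the standard form below; the named special cases are classical examples, included as orientation rather than quoted from the report.

```latex
% Ali-Silvey / f-divergence between probability measures P and Q,
% for a convex f with f(1) = 0:
D_f(P, Q) \;=\; \int f\!\left(\frac{dP}{dQ}\right) dQ .
% Classical members: f(t) = \tfrac{1}{2}\,|t - 1| gives total variation,
% f(t) = (\sqrt{t} - 1)^2 gives squared Hellinger distance,
% f(t) = t \log t gives the Kullback--Leibler divergence.
```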

**Title:** Brownian motion on time scales, basic hypergeometric functions, and some continued fractions of Ramanujan **Author(s):** Bhamidi, Shankar; Evans, Steven N.; Peled, Ron; Ralph, Peter **Date issued:** September 2005

http://nma.berkeley.edu/ark:/28722/bk0000n3g1t (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n3g2c (PostScript)

**Abstract:** Motivated by L\'evy's characterization of Brownian motion on the line, we propose an analogue of Brownian motion that has
as its state space an arbitrary unbounded closed subset of the line: such a process is a martingale, has the identity function
as its quadratic variation process, and is "continuous" in the sense that its sample paths don't skip over points. We show
that there is a unique such process, which turns out to be automatically a Feller-Dynkin Markov process. We find its generator,
which is a natural generalization of the operator $f \mapsto \frac{1}{2} f''$. We then consider the special case where the
state space is the self-similar set $\{\pm q^k : k \in \mathbb{Z}\} \cup \{0\}$ for some $q>1$. Using the scaling properties of the
process, we represent the Laplace transforms of various hitting times as certain continued fractions that appear in Ramanujan's
"lost" notebook and evaluate these continued fractions in terms of basic hypergeometric functions (that is, $q$-analogues
of classical hypergeometric functions). The process has $0$ as a regular instantaneous point, and hence its sample paths can
be decomposed into a Poisson process of excursions from $0$ using the associated continuous local time. Using the reversibility
of the process with respect to the natural measure on the state space, we find the entrance laws of the corresponding It\^o
excursion measure and the Laplace exponent of the inverse local time -- both again in terms of basic hypergeometric functions.
By combining these ingredients, we obtain explicit formulae for the resolvent of the process. We also compute the moments
of the process in closed form. Some of our results involve $q$-analogues of classical distributions such as the Poisson distribution
that have appeared elsewhere in the literature. **Report ID:** 694
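
To make "a natural generalization of the operator $f \mapsto \frac{1}{2} f''$" concrete at an isolated point of the state space, the standard divided-difference analogue of the second derivative looks as follows; this is an illustration of the type of operator involved, not the report's exact generator.

```latex
% At an isolated point x of the state space with nearest neighbours
% x_- < x < x_+, replace f'' by a divided difference:
(Gf)(x) \;=\; \frac{1}{x_+ - x_-}
  \left( \frac{f(x_+) - f(x)}{x_+ - x} \;-\; \frac{f(x) - f(x_-)}{x - x_-} \right),
% which recovers \tfrac{1}{2} f''(x) in the limit x_\pm \to x.
```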

**Title:** A genotype calling algorithm for Affymetrix SNP arrays **Author(s):** Rabbee, Nusrat; Speed, Terence P. **Date issued:** August 2005

http://nma.berkeley.edu/ark:/28722/bk0000n3c2t (PDF)

**Abstract:** Motivation: A classification algorithm, based on a multi-chip, multi-SNP approach, is proposed for Affymetrix SNP arrays. Current
procedures for calling genotypes on SNP arrays process all the features associated with one chip and one SNP at a time. Using
a large training sample where the genotype labels are known, we develop a supervised learning algorithm to obtain more accurate
classification results on new data. The method we propose, RLMM, is based on a robustly fitted linear model and uses the
Mahalanobis distance for classification. The chip-to-chip non-biological variance is reduced through normalization. This model-based
algorithm captures the similarities across genotype groups and probes, as well as across thousands of SNPs for accurate classification.
In this paper, we apply RLMM to Affymetrix 100K SNP array data, present classification results and compare them to genotype
calls obtained from the Affymetrix procedure DM, as well as to the publicly available genotype calls from the HapMap project.
Availability: The RLMM software is implemented in R and is available from the first author at nrabbee@stat.berkeley.edu. **Report ID:** 693
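
A minimal sketch of genotype calling by Mahalanobis distance, in the spirit of the RLMM description above; the two-dimensional allele-intensity summaries, group sizes, and pooled-covariance choice are our illustrative assumptions (RLMM itself is the R implementation cited in the abstract).

```python
import numpy as np

# Toy setup: each array yields a 2-D summary per SNP (say, log A- and
# B-allele intensities); training arrays carry known labels AA, AB, BB.
rng = np.random.default_rng(2)
centers = {"AA": (2.0, 0.0), "AB": (1.0, 1.0), "BB": (0.0, 2.0)}
train = {g: rng.normal(c, 0.2, size=(50, 2)) for g, c in centers.items()}

# Per-genotype means plus a covariance pooled across the three groups.
means = {g: s.mean(axis=0) for g, s in train.items()}
scatter = sum(np.cov(s.T) * (len(s) - 1) for s in train.values())
pooled = scatter / (sum(len(s) for s in train.values()) - len(train))
prec = np.linalg.inv(pooled)

def call_genotype(summary):
    """Call the genotype whose center is nearest in Mahalanobis distance."""
    dist = {g: (summary - mu) @ prec @ (summary - mu) for g, mu in means.items()}
    return min(dist, key=dist.get)

print(call_genotype(np.array([1.9, 0.1])))  # expected call: AA
```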

**Title:** Bivariate variable selection for classification problem **Author(s):** Ng, Vivian; Breiman, Leo **Date issued:** August 2005

http://nma.berkeley.edu/ark:/28722/bk0000n2z7w (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2z8f (PostScript)

**Abstract:** In recent years, a large amount of attention has been paid to variable or feature selection in various domains. A variety
of variable selection methods have been proposed in the literature. However, most of them focus on univariate variable
selection -- methods that select relevant variables one by one. There is currently little emphasis on selecting pairs of
variables, even though this is a reasonable goal: researchers in industry are often asked to identify pairs of variables
that are jointly relevant. Univariate variable selection works well for identifying independently significant variables, but
pairs of independently important variables are not the same as pairs of variables that have a joint effect. Therefore, univariate
variable selection methods are not suited to selecting pairs of linked variables. To overcome this obstacle, Professor
Breiman and I propose a bivariate variable selection method that detects linked pairs of variables. It is equally important
to learn the relationship of each linked pair with the response variable. To this end, a graphical tool is designed
for visualizing the relationships uncovered by the proposed bivariate variable selection method. **Report ID:** 692
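
To illustrate why pairs with a joint effect escape univariate screening, the sketch below scores each candidate pair by how much its joint cross-validated accuracy beats its best single member; the XOR-style toy data and the scoring rule are our stand-ins, not the method proposed in the report.

```python
import numpy as np
from itertools import combinations
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy data: y depends on (x0, x1) jointly (XOR-like); x2 is pure noise,
# so no single variable predicts y on its own.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 3))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

def cv_acc(features):
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    return cross_val_score(clf, X[:, list(features)], y, cv=5).mean()

# Score each pair by how much its joint accuracy beats its best single member.
for pair in combinations(range(3), 2):
    gain = cv_acc(pair) - max(cv_acc((j,)) for j in pair)
    print(f"pair {pair}: joint-over-marginal gain = {gain:.3f}")
# Only the (0, 1) pair shows a large gain, flagging it as a linked pair.
```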

**Title:** Binning in Gaussian Kernel Regularization **Author(s):** Shi, Tao; Yu, Bin **Date issued:** April 2005

http://nma.berkeley.edu/ark:/28722/bk0000n2z5s (PDF)

**Abstract:** Gaussian kernel regularization is widely used in the machine learning literature and has proven successful in many empirical experiments.
The periodic version of the Gaussian kernel regularization has been shown to be minimax rate optimal in estimating functions
in any finite-order Sobolev space. However, for a data set with $n$ points, the computational complexity of the Gaussian kernel
regularization method is of order $O(n^3)$. In this paper we propose to use binning to reduce the computation of Gaussian
kernel regularization in both regression and classification. For the periodic Gaussian kernel regression, we show that the
binned estimator achieves the same minimax rates as the unbinned estimator, but the computation is reduced to $O(m^3)$ with
$m$ as the number of bins. To achieve the minimax rate in the $k$-th order Sobolev space, $m$ needs to be of order
$O(k n^{1/(2k+1)})$, which makes the computation of the binned estimator of order $O(n)$ for $k=1$ and even less for larger $k$. Our
simulations show that the binned estimator (binning 120 data points into 20 bins in our simulation) provides almost the same
accuracy with only 0.4\% of the computation time. For classification, binning with the $L_2$-loss Gaussian kernel regularization
and the Gaussian kernel Support Vector Machines is tested in a polar cloud detection problem. With basically the same computation
time, the $L_2$-loss Gaussian kernel regularization on 966 bins achieves a better test classification rate (79.22\%) than that
(71.40\%) on 966 randomly sampled data points. Using the OSU-SVM Matlab package, the SVM trained on 966 bins has a comparable test
classification rate as the SVM trained on 27,179 samples, but reduces the training time from 5.99 hours to 2.56 minutes. The
SVM trained on 966 randomly selected samples has a training time similar to, and a test classification rate slightly worse than,
the SVM on 966 bins, but has 67\% more support vectors and so takes 67\% longer to predict on a new data point. The SVM trained
on 512 cluster centers from the $k$-means algorithm reports almost the same test classification rate and a similar number of
support vectors as the SVM on 512 bins, but the $k$-means clustering itself takes 375 times more computation time than binning. **Report ID:** 689
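
A minimal sketch of the binning idea for Gaussian kernel (ridge) regression: fit on $m$ bin averages instead of all $n$ points, replacing the $O(n^3)$ linear solve with an $O(m^3)$ one. The bandwidth, penalty, and bin count below are arbitrary illustration choices, not the paper's settings.

```python
import numpy as np

# Gaussian kernel ridge regression on all n points vs. on m bin averages;
# the O(m^3) solve on bins stands in for the O(n^3) solve on the full data.
rng = np.random.default_rng(4)
n, m, lam, h = 1200, 20, 1e-3, 0.5
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

def krr_fit_predict(xt, yt, xq):
    """Fit kernel ridge regression on (xt, yt); predict at xq."""
    K = np.exp(-((xt[:, None] - xt[None, :]) ** 2) / (2 * h ** 2))
    alpha = np.linalg.solve(K + lam * len(xt) * np.eye(len(xt)), yt)
    Kq = np.exp(-((xq[:, None] - xt[None, :]) ** 2) / (2 * h ** 2))
    return Kq @ alpha

# Bin: average x and y within m equal-width bins, then fit on the bin means.
edges = np.linspace(0, 1, m + 1)
idx = np.clip(np.digitize(x, edges) - 1, 0, m - 1)
xb = np.array([x[idx == j].mean() for j in range(m)])
yb = np.array([y[idx == j].mean() for j in range(m)])

xq = np.linspace(0, 1, 200)
full = krr_fit_predict(x, y, xq)      # O(n^3) linear solve
binned = krr_fit_predict(xb, yb, xq)  # O(m^3) linear solve
print("max |full - binned| =", float(np.abs(full - binned).max()))
```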

**Title:** A Probabilistic Interpretation of Canonical Correlation Analysis **Author(s):** Bach, Francis R.; Jordan, Michael I. **Date issued:** April 2005

http://nma.berkeley.edu/ark:/28722/bk0000n2z24 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2z3p (PostScript)

**Abstract:** We give a probabilistic interpretation of canonical correlation analysis (CCA) as a latent variable model for two Gaussian
random vectors. Our interpretation is similar to the probabilistic interpretation of principal component analysis (Tipping
and Bishop, 1999; Roweis, 1998). In addition, we cast Fisher linear discriminant analysis (LDA) within the CCA framework. **Report ID:** 688
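
As orientation, the standard probabilistic CCA latent variable model has the form below, with a shared Gaussian latent variable driving both observed vectors; this follows the usual formulation in the literature and may differ in detail from the report.

```latex
% Shared latent variable z drives both observed Gaussian vectors:
z \sim \mathcal{N}(0, I_d), \qquad
x_1 \mid z \sim \mathcal{N}(W_1 z + \mu_1, \Psi_1), \qquad
x_2 \mid z \sim \mathcal{N}(W_2 z + \mu_2, \Psi_2).
% Maximum-likelihood estimates of W_1 and W_2 span the canonical
% correlation directions, in analogy with probabilistic PCA.
```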

**Title:** Subtree prune and re-graft: a reversible real tree valued Markov process **Author(s):** Evans, Steven N.; Winter, Anita **Date issued:** February 2005

http://nma.berkeley.edu/ark:/28722/bk0000n2x9g (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2z01 (PostScript)

**Abstract:** We use Dirichlet form methods to construct and analyze a reversible Markov process, the stationary distribution of which is
the Brownian continuum random tree. This process is inspired by the subtree prune and re-graft (SPR) Markov chains that appear
in phylogenetic analysis. A key technical ingredient in this work is the use of a novel Gromov--Hausdorff type distance to
metrize the space whose elements are compact real trees equipped with a probability measure. Also, the investigation of the
Dirichlet form hinges on a new path decomposition of the Brownian excursion. **Report ID:** 685
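
For readers unfamiliar with the phylogenetic operation the abstract alludes to, here is a toy sketch of a single subtree-prune-and-regraft step on a finite rooted tree; the child-to-parent encoding and uniform choices are our simplifications of the discrete SPR move, not the continuum-tree construction itself.

```python
import random

def subtree(parent, v):
    """Return v together with all of its descendants."""
    nodes = {v}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in nodes and child not in nodes:
                nodes.add(child)
                changed = True
    return nodes

def spr_move(parent, rng):
    """One subtree-prune-and-regraft step: detach a random subtree,
    then regraft its root below a random node outside that subtree."""
    v = rng.choice(sorted(parent))                 # any non-root node
    pruned = subtree(parent, v)
    hosts = sorted((set(parent) | {"root"}) - pruned)
    parent[v] = rng.choice(hosts)                  # reattach
    return parent

# Tiny example: a rooted tree on five labelled nodes (root is implicit).
tree = {"a": "root", "b": "root", "c": "a", "d": "a", "e": "c"}
print(spr_move(dict(tree), random.Random(5)))
```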