Statistics Technical Reports

Title:On the choice of m for the m out of n bootstrap in hypothesis testing
Author(s):Bickel, Peter J.; Ren, Jian-Jian; 
Date issued:May 1996 (PDF)
Keyword note:Bickel__Peter_John Ren__Jian-Jian
Report ID:476

Title:[Title unavailable]
Author(s):Yu, Bin; 
Date issued:November 1996
Keyword note:Yu__Bin
Report ID:475

Title:Smoothing Spline Models for the Analysis of Nested and Crossed Samples of Curves
Author(s):Brumback, Babette A.; Rice, John A.; 
Date issued:Nov 1996 (PDF) (PostScript)
Abstract:We introduce a class of models for an additive decomposition of groups of curves stratified by crossed and nested factors, generalizing smoothing splines to such samples by associating them with a corresponding mixed effects model. The models are also useful for imputation of missing data and exploratory analysis of variance. We prove that the best linear unbiased predictors (BLUP) from the extended mixed effects model correspond to solutions of a generalized penalized regression where smoothing parameters are directly related to variance components, and we show that these solutions are natural cubic splines. The model parameters are estimated using a highly efficient implementation of the EM algorithm for restricted maximum likelihood (REML) estimation based on a preliminary eigenvector decomposition. Variability of computed estimates can be assessed with asymptotic techniques or with a novel hierarchical bootstrap resampling scheme for nested mixed effects models. Our methods are applied to menstrual cycle data from studies of reproductive function that measure daily urinary progesterone; the sample of progesterone curves is stratified by cycles nested within subjects nested within conceptive and non-conceptive groups.
Keyword note:Brumback__Babette_Anne Rice__John_Andrew
Report ID:474

Title:The $L_2$ Rate of Convergence for Event History Regression with Time-Dependent Covariates
Author(s):Huang, Jianhua; Stone, Charles J.; 
Date issued:September 1996
Keyword note:Huang__Jianhua Stone__Charles
Report ID:473

Title:Accurate estimation of travel times from single-loop detectors
Author(s):Petty, Karl; Bickel, Peter; Jiang, Jiming; Ostland, Michael; Rice, John; Ritov, Ya'cov; Schoenberg, Frederic; 
Date issued:Aug 1996 (PDF) (PostScript)
Abstract:As advanced traveler information systems become increasingly prevalent, the importance of accurately estimating link travel times grows. Unfortunately, the predominant source of highway traffic information is single-trap loop detectors, which do not directly measure vehicle speed. The conventional method of estimating speed, and hence travel time, from the single-trap data is to make a common vehicle length assumption and to use a resulting identity relating density, flow, and speed. Hall and Persaud (1989) and Pushkar, Hall, and Acha-Daza (1994) show that these speed estimates are flawed. In this paper we present a methodology to estimate link travel times directly from the single-trap loop detector flow and occupancy data without heavy reliance on the flawed speed calculations. Our methods arise naturally from an intuitive stochastic model of traffic flow. We demonstrate by example on data collected on I-880 that when the loop detector data has a fine resolution (about one second), the single-trap estimates of travel time can accurately track the true travel time through many degrees of congestion. Probe vehicle data and double-trap travel time estimates corroborate the accuracy of our methods in our examples.
Keyword note:Petty__Karl_F Bickel__Peter_John Jiang__Jiming Ostland__Michael_Anthony Rice__John_Andrew Ritov__Yaacov Schoenberg__Frederic_R
Report ID:472

Title:The Feynman-Kac formula and decomposition of Brownian paths
Author(s):Jeanblanc, M.; Pitman, J.; Yor, M.; 
Date issued:Sep 1996 (PDF) (PostScript)
Abstract:This paper describes connections between the Feynman-Kac formula, related Sturm-Liouville equations, and various decompositions of Brownian paths into independent components.
Pub info:Comput. Appl. Math. 16, 27-52, 1997
Keyword note:Jeanblanc__Monique Pitman__Jim Yor__Marc
Report ID:471

Title:On the lengths of excursions of some Markov processes
Author(s):Pitman, Jim; Yor, Marc; 
Date issued:Aug 1996 (PDF) (PostScript)
Abstract:Results are obtained regarding the distribution of the ranked lengths of component intervals in the complement of the random set of times when a recurrent Markov process returns to its starting point. Various martingales are described in terms of the Lévy measure of the Poisson point process of interval lengths on the local time scale. The martingales derived from the zero set of a one-dimensional diffusion are related to martingales studied by Azéma and Rainer. Formulae are obtained which show how the distribution of interval lengths is affected when the underlying process is subjected to a Girsanov transformation. In particular, results for the zero set of an Ornstein-Uhlenbeck process or a Cox-Ingersoll-Ross process are derived from results for a Brownian motion or recurrent Bessel process, when the zero set is the range of a stable subordinator.
Pub info:Séminaire de Probabilités XXXI, 272-286, Lecture Notes in Math. 1655, Springer, 1997
Keyword note:Pitman__Jim Yor__Marc
Report ID:470

Title:On the relative lengths of excursions derived from a stable subordinator
Author(s):Pitman, Jim; Yor, Marc; 
Date issued:Aug 1996 (PDF) (PostScript)
Abstract:Results are obtained concerning the distribution of ranked relative lengths of excursions of a recurrent Markov process from a point in its state space whose inverse local time process is a stable subordinator. It is shown that for a large class of random times $T$ the distribution of relative excursion lengths prior to $T$ is the same as if $T$ were a fixed time. It follows that the generalized arc-sine laws of Lamperti extend to such random times $T$. For some other random times $T$, absolute continuity relations are obtained which relate the law of the relative lengths at time $T$ to the law at a fixed time.
Pub info:Séminaire de Probabilités XXXI, 287-305, Lecture Notes in Math. 1655, Springer, 1997
Keyword note:Pitman__Jim Yor__Marc
Report ID:469

Title:Some extensions of Knight's identity for Brownian motion
Author(s):Pitman, Jim; Yor, Marc; 
Date issued:August 1996
Keyword note:Pitman__Jim Yor__Marc
Report ID:468

Title:Laplace Transforms Related to Excursions of a One-dimensional Diffusion
Author(s):Pitman, Jim; Yor, Marc; 
Date issued:Aug 1996 (PDF) (PostScript)
Abstract:Various known expressions in terms of hyperbolic functions for the Laplace transforms of random times related to one-dimensional Brownian motion are derived in a unified way by excursion theory and extended to one-dimensional diffusions.
Pub info:Bernoulli 5, 249-255, 1999
Keyword note:Pitman__Jim Yor__Marc
Report ID:467

Title:More on Recurrence and Waiting Times
Author(s):Wyner, Abraham J.; 
Date issued:August 1996
Keyword note:Wyner__Abraham_J
Report ID:466

Title:Construction of Markovian Coalescents
Author(s):Evans, Steven N.; Pitman, Jim; 
Date issued:Jul 1996 (PDF) (PostScript)
Abstract:Partition-valued and measure-valued coalescent Markov processes are constructed whose state describes the decomposition of a finite total mass $m$ into a finite or countably infinite number of masses with sum $m$, and whose evolution is determined by the following intuitive prescription: each pair of masses of magnitudes $x$ and $y$ runs the risk of a binary collision to form a single mass of magnitude $x+y$ at rate $\kappa(x,y)$, for some nonnegative, symmetric collision rate kernel $\kappa(x,y)$. Such processes with finitely many masses have been used to model polymerization, coagulation, condensation, and the evolution of galactic clusters by gravitational attraction. With a suitable metric on the state space, and under appropriate restrictions on $\kappa$ and the initial distribution of mass, it is shown that such processes can be constructed as Feller or Feller-like processes. A number of further results are obtained for the {\em additive coalescent} with collision kernel $\kappa(x,y) = x + y$. This process, which arises from the evolution of tree components in a random graph process, has asymptotic properties related to the stable subordinator of index $1/2$.
Pub info:Ann. Inst. Henri Poincaré 34, 339-383, 1998
Keyword note:Evans__Steven_N Pitman__Jim
Report ID:465
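
The collision prescription in the abstract above, restricted to finitely many masses, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_coalescent`, the event-by-event (Gillespie-style) simulation scheme, and the returned `history` format are illustrative choices, not taken from the report. The default kernel is the additive kernel $\kappa(x,y) = x + y$.

```python
import random

def simulate_coalescent(masses, kernel=lambda x, y: x + y, rng=None):
    """Simulate binary collisions among finitely many masses.

    Each pair (x, y) merges into x + y at rate kernel(x, y).
    Returns a list of (time, masses sorted in decreasing order).
    All names here are hypothetical, chosen for illustration.
    """
    rng = rng or random.Random()
    masses = list(masses)
    t = 0.0
    history = [(t, sorted(masses, reverse=True))]
    while len(masses) > 1:
        # Enumerate all unordered pairs and their collision rates.
        pairs = [(i, j) for i in range(len(masses))
                 for j in range(i + 1, len(masses))]
        rates = [kernel(masses[i], masses[j]) for i, j in pairs]
        total = sum(rates)
        if total <= 0:
            break  # no further collisions are possible
        # Waiting time to the next collision is exponential with the total rate.
        t += rng.expovariate(total)
        # The colliding pair is chosen with probability proportional to its rate.
        i, j = rng.choices(pairs, weights=rates, k=1)[0]
        merged = masses[i] + masses[j]
        masses = [m for idx, m in enumerate(masses) if idx not in (i, j)]
        masses.append(merged)
        history.append((t, sorted(masses, reverse=True)))
    return history
```

For example, `simulate_coalescent([1.0] * 8)` starts from eight unit masses and records seven collisions; total mass is conserved at every step.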

Title:Confidence intervals with more power to determine the sign: two ends constrain the means
Author(s):Benjamini, Y.; Hochberg, Y.; Stark, P. B.; 
Date issued:Jul 1996
Abstract:We present two families of two-sided non-equivariant confidence intervals for the mean $\theta$ of a continuous, unimodal, symmetric random variable that, compared with the conventional, symmetric, equivariant confidence interval, are shorter when the observation is small, and restrict the sign of $\theta$ for smaller observations. One of the families, a modification of Pratt's (1961) construction of intervals with minimal expected length when $\theta=0$, is longer than the conventional symmetric interval when $|X|$ is large, and has longer expected length when $|\theta|$ is large. The other family gives the conventional symmetric interval when $|X|$ is large, with a change to the smaller endpoint when $|X|$ is small. Its expected length is less than that of the conventional symmetric interval when $|\theta|$ is small, larger for an intermediate range of $|\theta|$, then approaches that of the conventional interval for large $|\theta|$. This slight modification of the conventional two-sided interval has most of the power advantage of a one-sided interval, but short length.
Keyword note:Benjamini__Yoav Hochberg__Y Stark__Philip_B
Report ID:464

Title:Empirical Modeling of Extreme Events from Return-Volume Time Series in Stock Market
Author(s):Bühlmann, Peter; 
Date issued:Jun 1996 (PDF) (PostScript)
Abstract:We propose the discretization of real-valued financial time series into few ordinal values and use non-linear likelihood modeling for sparse Markov chains within the framework of generalized linear models for categorical time series. We analyze daily return and volume data and estimate the probability structure of the process of extreme lower, extreme upper and the complementary usual events. Knowing the whole probability law of such ordinal-valued vector processes of extreme events of return and volume allows us to quantify non-linear associations. In particular, we find a (new kind of) asymmetry in the return-volume relationship which is a partial answer to a research issue given by Karpoff (1987). We also propose a simple prediction algorithm which is based on an empirically selected model.
Keyword note:Buhlmann__Peter
Report ID:463

Title:Closure of linear processes
Author(s):Bickel, Peter J.; Bühlmann, Peter; 
Date issued:Sep 1996 (PDF) (PostScript)
Abstract:We consider the sets of moving-average and autoregressive processes and study their closures under the Mallows metric and the total variation convergence on finite dimensional distributions. These closures are unexpectedly large, containing non-ergodic processes which are Poisson sums of i.i.d. copies from a stationary process. The presence of these non-ergodic Poisson sum processes has immediate implications. In particular, identifiability of the hypothesis of linearity of a process is in question. A discussion of some of these issues for the set of moving-average processes has already been given without proof in Bickel and Bühlmann (1996). We establish here the precise mathematical arguments and present some additional extensions: results about the closure of autoregressive processes and natural sub-sets of moving-average and autoregressive processes which are closed.
Keyword note:Bickel__Peter_John Buhlmann__Peter
Report ID:462

Title:On Average Derivative Quantile Regression
Author(s):Chaudhuri, Probal; Doksum, Kjell; Samarov, Alexander; 
Date issued:Apr 1996 (PDF) (PostScript)
Abstract:Keywords: average derivative estimate, transformation model, projection pursuit model, index model, survival analysis, heteroscedasticity, reduction of dimensionality, quantile specific regression coefficients. For fixed $\alpha \in (0,1)$, the quantile regression function gives the $\alpha$th quantile $\theta_{\alpha}({\bf x})$ in the conditional distribution of a response variable $Y$ given the value ${\bf X} = {\bf x}$ of a vector of covariates. It can be used to measure the effect of covariates not only in the center of a population, but also in the upper and lower tails. A functional that summarizes key features of the quantile specific relationship between ${\bf X}$ and $Y$ is the vector $\boldsymbol{\beta}_{\alpha}$ of weighted expected values of the vector of partial derivatives of the quantile function $\theta_{\alpha}({\bf x})$. In a nonparametric setting, $\boldsymbol{\beta}_{\alpha}$ can be regarded as a vector of quantile specific nonparametric regression coefficients. In survival analysis models (e.g. Cox's proportional hazard model, proportional odds rate model, accelerated failure time model) and in monotone transformation models used in regression analysis, $\boldsymbol{\beta}_{\alpha}$ gives the direction of the parameter vector in the parametric part of the model. $\boldsymbol{\beta}_{\alpha}$ can also be used to estimate the direction of the parameter vector in semiparametric single index models popular in econometrics. We show that, under suitable regularity conditions, the estimate of $\boldsymbol{\beta}_{\alpha}$ obtained by using the locally polynomial quantile estimate of Chaudhuri (1991, {\it Annals of Statistics}) is $n^{1/2}$-consistent and asymptotically normal with asymptotic variance equal to the variance of the influence function of the functional $\boldsymbol{\beta}_{\alpha}$. We discuss how the estimate of $\boldsymbol{\beta}_{\alpha}$ can be used for model diagnostics and in the construction of a link function estimate in general single index models.
Keyword note:Chaudhuri__Probal Doksum__Kjell_Andreas Samarov__Alexander
Report ID:461

Title:[Bias, Variance, and] Arcing Classifiers
Author(s):Breiman, Leo; 
Date issued:Feb 1996
Date modified:revised July, 1996 (PDF) (PostScript)
Abstract:Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. One of the more effective methods is bagging (Breiman [1996a]). Here, modified training sets are formed by resampling from the original training set, classifiers are constructed using these training sets, and the classifiers are then combined by voting. Freund and Schapire [1995,1996] propose an algorithm the basis of which is to adaptively resample and combine (hence the acronym "arcing") so that the weights in the resampling are increased for those cases most often misclassified and the combining is done by weighted voting. Arcing is more successful than bagging in test set error reduction. We explore two arcing algorithms, compare them to each other and to bagging, and try to understand how arcing works. We introduce the definitions of bias and variance for a classifier as components of the test set error. Unstable classifiers can have low bias on a large range of data sets. Their problem is high variance. Combining multiple versions either through bagging or arcing reduces variance significantly.
Keyword note:Breiman__Leo
Report ID:460
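
The bagging procedure summarized in the abstract above (resample the training set, fit a classifier to each resample, combine by voting) can be sketched as follows. This is a minimal illustration, not the report's implementation: the one-dimensional threshold classifier `fit_stump`, the function `bagged_classifier`, and the default of 25 resamples are all assumptions made for the example. The adaptive reweighting used in arcing is not shown.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a 1-D threshold rule: predict `left` if x <= t, else `right`.

    `points` is a list of (x, label) pairs. This hypothetical base
    classifier stands in for an unstable learner such as a tree.
    """
    best = None
    labels = sorted({y for _, y in points})
    for t in sorted({x for x, _ in points}):
        for left in labels:
            for right in labels:
                errors = sum((left if x <= t else right) != y
                             for x, y in points)
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagged_classifier(points, n_models=25, rng=None):
    """Fit one stump per bootstrap resample and combine them by voting."""
    rng = rng or random.Random(0)
    models = []
    for _ in range(n_models):
        # Bootstrap resample: draw len(points) cases with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))

    def predict(x):
        # Unweighted plurality vote over the fitted classifiers.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

A call such as `bagged_classifier([(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)])` returns a function that classifies a new value of `x` by the vote of the fitted stumps.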

Title:Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data
Author(s):Hoover, Don; Rice, John; Wu, Colin; Yang, Li-Ping; 
Date issued:Apr 1996 (PDF) (PostScript)
Abstract:This paper considers estimation of nonparametric components in a time-varying coefficient model with repeated measurements of responses and covariates. The responses are modeled as depending linearly on the covariates, with coefficients that are functions of time. The measurements are assumed to be independent for different subjects but can be correlated in an unspecified way at different time points within each subject. Three nonparametric estimates, namely kernel, smoothing spline and locally weighted polynomial, of the time-varying coefficients are derived for such repeatedly measured data. A cross-validation criterion is proposed for the selection of the corresponding smoothing parameters. Asymptotic properties, such as consistency, rates of convergence and asymptotic mean squared errors, are established for the kernel estimates. An example of predicting the growth of children born to HIV infected mothers based on gender, HIV status and maternal vitamin A levels shows that this model and the corresponding nonparametric estimates are useful in epidemiological studies.
Keyword note:Hoover__Don Rice__John_Andrew Wu__Chien-Fu Yang__Li-Ping
Report ID:459

Title:Functional ANOVA Models for Generalized Regression
Author(s):Huang, Jianhua; 
Date issued:Apr 1996 (PDF) (PostScript)
Abstract:Functional ANOVA models are considered in the context of generalized regression, which includes logistic regression, probit regression and Poisson regression as special cases. The multivariate predictor function is modeled as a specified sum of a constant term, main effects and interaction terms. Maximum likelihood estimates are used, where the maximizations are taken over suitably chosen approximating spaces. We allow general linear spaces and their tensor products as building blocks for the approximating spaces. It is shown that the $L_2$ rates of convergence of the maximum likelihood estimates and their ANOVA components are determined by the approximation power and dimension of the approximating spaces. When the approximating spaces are appropriately chosen, the optimal rates of convergence can be achieved.
Keyword note:Huang__Jianhua
Report ID:458

Title:Coalescent random forests
Author(s):Pitman, Jim; 
Date issued:Sep 1996 (PDF)
Abstract:Various enumerations of labeled trees and forests, due to Cayley, Moon, and other authors, are consequences of the following {\em coalescent algorithm} for construction of a sequence of random forests $(R_n, R_{n-1}, \cdots, R_1)$ such that $R_k$ has uniform distribution over the set of all forests of $k$ rooted trees labeled by $\mathbb{N}_n := \{1, \cdots, n\}$. Let $R_n$ be the trivial forest with $n$ root vertices and no edges. For $n \ge k \ge 2$, given that $R_n, \cdots, R_k$ have been defined so that $R_k$ is a rooted forest of $k$ trees, define $R_{k-1}$ by addition to $R_k$ of a single directed edge picked uniformly at random from the set of $n(k-1)$ directed edges which when added to $R_k$ yield a rooted forest of $k-1$ trees labeled by $\mathbb{N}_n$. Variations of this coalescent algorithm are described, and related to the literature of physical processes of clustering and polymerization.
Keyword note:Pitman__Jim
Report ID:457
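
The coalescent algorithm stated in the abstract above can be sketched in a few lines. The sketch assumes one particular reading of the $n(k-1)$ admissible edges: an edge runs from any of the $n$ vertices to the root of one of the $k-1$ trees that do not contain that vertex, so that edges point away from the roots. That reading, the parent-map representation, and the names `find_root` and `coalescent_forests` are assumptions made for illustration.

```python
import random

def find_root(parent, v):
    """Follow parent pointers from v up to the root of its tree."""
    while parent[v] is not None:
        v = parent[v]
    return v

def coalescent_forests(n, rng=None):
    """Build forests R_n, R_{n-1}, ..., R_1 on vertices 1..n.

    Each forest is a dict mapping a vertex to its parent (None for a root).
    Returns a dict {k: forest with k trees}.
    """
    rng = rng or random.Random()
    parent = {v: None for v in range(1, n + 1)}  # R_n: n roots, no edges
    forests = {n: dict(parent)}
    for k in range(n, 1, -1):
        # Pick a vertex v uniformly from all n vertices ...
        v = rng.randint(1, n)
        own_root = find_root(parent, v)
        # ... and a root r uniformly from the k - 1 trees not containing v.
        other_roots = [r for r in parent
                       if parent[r] is None and r != own_root]
        r = rng.choice(other_roots)
        # Adding the edge v -> r hangs r's tree below v: n * (k - 1) choices.
        parent[r] = v
        forests[k - 1] = dict(parent)
    return forests
```

Calling `coalescent_forests(5)[1]` gives a parent map for a single rooted tree on the vertices 1 to 5; intermediate entries such as `coalescent_forests(5)[3]` are forests with three trees.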
