Statistics Technical Reports

Term(s):1999
Results:28

Title:The fifth cell
Author(s):Wachter, K. W.; Freedman, D. A.; 
Date issued:Dec 1999
http://nma.berkeley.edu/ark:/28722/bk0000n284w (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n285f (PostScript)
Abstract:One form of error that can affect census adjustments is correlation bias, reflecting people who are doubly missing: from the census and from the adjusted counts as well. This paper presents a method for estimating the total national number of doubly-missing people and their distribution by race and sex. Application to the 1990 U.S. census adjustment leads to an estimate of 3 million doubly-missing people. Correlation bias is likely to be a serious problem for census adjustment in 2000. The methods of this paper are well suited for measuring its magnitude.
Keyword note:Wachter__Kenneth Freedman__David
Report ID:570
Relevance:100

Title:Probability laws related to the Jacobi theta and Riemann zeta functions, and Brownian excursions
Author(s):Biane, Philippe; Pitman, Jim; Yor, Marc; 
Date issued:Oct 1999
http://nma.berkeley.edu/ark:/28722/bk0000n2817 (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n282s (PostScript)
Abstract:This paper reviews known results which connect Riemann's integral representations of his zeta function, involving Jacobi's theta function and its derivatives, to some particular probability laws governing sums of independent exponential variables. These laws are related to one-dimensional Brownian motion and to higher dimensional Bessel processes. We present some characterizations of these probability laws, and some approximations of Riemann's zeta function which are related to these laws.
Keyword note:Biane__Philippe Pitman__Jim Yor__Marc
Report ID:569
Relevance:100

Title:A necessary and sufficient condition for the $\Lambda$-coalescent to come down from infinity.
Author(s):Schweinsberg, Jason; 
Date issued:Sep 1999
Abstract:Let $\Pi_{\infty}$ be the standard $\Lambda$-coalescent of Pitman, which is defined so that $\Pi_{\infty}(0)$ is the partition of the positive integers into singletons, and, if $\Pi_n$ denotes the restriction of $\Pi_{\infty}$ to $\{ 1, \ldots, n \}$, then whenever $\Pi_n(t)$ has $b$ blocks, each $k$-tuple of blocks is merging to form a single block at the rate $\lambda_{b,k}$, where \begin{displaymath} \lambda_{b,k} = \int_0^1 x^{k-2} (1-x)^{b-k} \: \Lambda(dx) \end{displaymath} for some finite measure $\Lambda$. We give a necessary and sufficient condition for the $\Lambda$-coalescent to ``come down from infinity'', which means that the partition $\Pi_{\infty}(t)$ almost surely consists of only finitely many blocks for all $t > 0$. We then show how this result applies to some particular families of $\Lambda$-coalescents.
Pub info:ECP Vol 5 (2000) Paper 1
Keyword note:Schweinsberg__Jason
Report ID:568
Relevance:100
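
Note on Report 568: the merger rate $\lambda_{b,k}$ in the abstract can be evaluated in closed form for simple choices of $\Lambda$. The sketch below assumes, purely for illustration, that $\Lambda$ is Lebesgue measure on $[0,1]$ (the abstract does not single out this case), so the integral reduces to a Beta integral. The function names are hypothetical. It also checks the identity $\lambda_{b,k} = \lambda_{b+1,k} + \lambda_{b+1,k+1}$, which follows from writing $1 = (1-x) + x$ inside the integrand.

```python
from fractions import Fraction
from math import factorial


def lambda_uniform(b: int, k: int) -> Fraction:
    """Rate lambda_{b,k} when Lambda is Lebesgue measure on [0,1].

    This choice of Lambda is an assumption made for illustration only.
    The integral of x^(k-2) (1-x)^(b-k) over [0,1] is the Beta integral
    B(k-1, b-k+1) = (k-2)! (b-k)! / (b-1)!.
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))


def is_consistent(b: int, k: int) -> bool:
    """Check lambda_{b,k} == lambda_{b+1,k} + lambda_{b+1,k+1} exactly."""
    return lambda_uniform(b, k) == (
        lambda_uniform(b + 1, k) + lambda_uniform(b + 1, k + 1)
    )


if __name__ == "__main__":
    # Print the rates for a small number of blocks.
    for b in range(2, 6):
        print(b, [str(lambda_uniform(b, k)) for k in range(2, b + 1)])
```

Exact rational arithmetic is used so that the consistency check is an equality rather than a floating-point comparison.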

Title:Random Forests--Random Features
Author(s):Breiman, Leo; 
Date issued:Sep 1999
http://nma.berkeley.edu/ark:/28722/bk0000n271q (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n2728 (PostScript)
Abstract:Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost, but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. These ideas are also applicable to regression.
Keyword note:Breiman__Leo
Report ID:567
Relevance:100

Title:Two coalescents derived from the ranges of stable subordinators
Author(s):Bertoin, Jean; Pitman, Jim; 
Date issued:Sep 1999
Abstract:Let $M_\alpha$ be the closure of the range of a stable subordinator of index $\alpha\in ]0,1[$. There are two natural constructions of the $M_{\alpha}$'s simultaneously for all $\alpha\in ]0,1[$, so that $M_{\alpha}\subseteq M_{\beta}$ for $0< \alpha < \beta < 1$: one based on the intersection of independent regenerative sets and one based on Bochner's subordination. We compare the corresponding two coalescent processes defined by the lengths of complementary intervals of $[0,1]\backslash M_{1-\rho}$ for $0 < \rho < 1$. In particular, we identify the coalescent based on the subordination scheme with the coalescent recently introduced by Bolthausen and Sznitman.
Pub info:Electronic Journal of Probability, Vol. 5 (2000) Paper no. 7, pages 1-17
Keyword note:Bertoin__Jean Pitman__Jim
Report ID:566
Relevance:100

Title:Right inverses of L\'evy processes and stationary stopped local times
Author(s):Evans, Steven N.; 
Date issued:Aug 1999
http://nma.berkeley.edu/ark:/28722/bk0000n4259 (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n426v (PostScript)
Abstract:If $X$ is a L\'evy process on the line, then there exists a non--decreasing, c\`adl\`ag process $H$ such that $X(H(x)) = x$ for all $x \ge 0$ if and only if $X$ is recurrent and has a non--trivial Gaussian component. The minimal such $H$ is a subordinator $K$. The law of $K$ is identified and shown to be the same as that of a multiple of the inverse local time at $0$ of $X$. When $X$ is Brownian motion, $K$ is just the usual ladder times process and this result extends the classical result of L\'evy that the maximum process has the same law as the local time at $0$. Write $G_t$ for the last point in the range of $K$ prior to $t$. In a parallel with classical fluctuation theory, the process $Z := (X_t - X_{G_t})_{t \ge 0}$ is Markov with local time at $0$ given by $(X_{G_t})_{t \ge 0}$. The transition kernel and excursion measure of $Z$ are identified. A similar programme is carried out for L\'evy processes on the circle. This leads to the construction of a stopping time such that the stopped local times constitute a stationary process indexed by the circle.
Keyword note:Evans__Steven_N
Report ID:565
Relevance:100

Title:Where Did The Brownian Particle Go?
Author(s):Pemantle, Robin; Peres, Yuval; Pitman, Jim; Yor, Marc; 
Date issued:Aug 1999
Abstract:Consider the radial projection onto the unit sphere of the path of a $d$-dimensional Brownian motion $W$, started at the center of the sphere and run for unit time. Given the occupation measure $\mu$ of this projected path, what can be said about the terminal point $W(1)$, or about the range of the original path? In any dimension, for each Borel set $A \subseteq S^{d-1}$, the conditional probability that the projection of $W(1)$ is in $A$ given $\mu(A)$ is just $\mu (A)$. Nevertheless, in dimension $d \ge 3$, both the range and the terminal point of $W$ can be recovered with probability 1 from $\mu$. In particular, for $d \ge 3$ the conditional law of the projection of $W(1)$ given $\mu$ is not $\mu$. In dimension 2 we conjecture that the projection of $W(1)$ cannot be recovered almost surely from $\mu$, and show that the conditional law of the projection of $W(1)$ given $\mu$ is not $\mu$.
Pub info:Electronic Journal of Probability, Vol. 6 (2001) Paper no. 10, pages 1-22
Keyword note:Pemantle__Robin Peres__Yuval Pitman__Jim Yor__Marc
Report ID:564
Relevance:100

Title:On the distribution of ranked heights of excursions of a Brownian bridge
Author(s):Pitman, Jim; Yor, Marc; 
Date issued:Aug 1999
http://nma.berkeley.edu/ark:/28722/bk0000n418f (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n4190 (PostScript)
Abstract:The distribution of the sequence of ranked maximum and minimum values attained during excursions of a standard Brownian bridge is described. The height of the $j$th highest maximum $M_j$ over a positive excursion of the bridge has the same distribution as $M_1/j$, where the distribution of $M_1$ is given by L\'evy's formula $P( M_1 > x ) = e^{-2x^2}$. The probability density of the height of the $j$th highest maximum of excursions of the reflecting Brownian bridge is given by a modification of the known $\theta$-function series for the density of the maximum absolute value of the bridge. These results are obtained from a more general description of the distribution of ranked values of a homogeneous functional of excursions of the standardized bridge of a self-similar recurrent Markov process.
Pub info:Annals of Probability vol. 29, pages 362-384 (2001)
Keyword note:Pitman__Jim Yor__Marc
Report ID:563
Relevance:100
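
Note on Report 563: the two facts stated in the abstract combine into an explicit tail formula for every $M_j$. This is a one-line worked consequence written out for the reader, not a formula quoted from the report: since $M_j$ has the same law as $M_1/j$, L\'evy's formula applied at the point $jx$ gives

```latex
\[
  P(M_j > x) \;=\; P(M_1 > jx) \;=\; e^{-2 j^2 x^2}, \qquad x \ge 0 .
\]
```

For $j = 1$ this reduces to L\'evy's formula itself.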

Title:Nonparametric estimation of a periodic function
Author(s):Hall, Peter; Reimann, James; Rice, John; 
Date issued:Jul 1999
http://nma.berkeley.edu/ark:/28722/bk0000n415s (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n416b (PostScript)
Abstract:Motivated by applications to light curves of periodic variable stars, we study nonparametric methods for estimating both the period and the amplitude function from noisy observations of a periodic function made at irregularly spaced times. It is shown that nonparametric estimators of period converge at parametric rates and attain a semiparametric lower bound which is the same if the shape of the periodic function is unknown as if it were known. Also, first-order properties of nonparametric estimators of the amplitude function are identical to those that would obtain if the period were known. Numerical simulations and applications to real data show the method to work well in practice.
Keyword note:Hall__Peter Reimann__James_Dennis Rice__John_Andrew
Report ID:562
Relevance:100

Title:Stochastic billiards on general tables
Author(s):Evans, Steven N.; 
Date issued:Jul 1999
http://nma.berkeley.edu/ark:/28722/bk0000n4124 (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n413p (PostScript)
Abstract:We consider stochastic analogues of classical billiard systems. A particle moves at unit speed with constant direction in the interior of a bounded, $d$--dimensional region with continuously differentiable boundary. The boundary need not be connected; that is, the ``table'' may have interior ``obstacles''. When the particle strikes the boundary, a new direction is chosen uniformly at random from the directions that point back into the interior of the region and the motion continues. Such chains are closely related to those that appear in shake--and--bake simulation algorithms. For the discrete time Markov chain that records the locations of successive hits on the boundary, we show that, uniformly in the starting point, there is exponentially fast total variation convergence to an invariant distribution. By analysing an associated non--linear, first--order PDE, we investigate which regions are such that this chain is reversible with respect to surface measure on the boundary. We also establish a result on uniform total variation Ces\`aro convergence to equilibrium for the continuous time Markov process that tracks the position and direction of the particle. A key ingredient in our proof is a result on the geometry of $C^1$ regions that can be described loosely as follows: associated with any bounded $C^1$ region is an integer $N$ such that it is always possible to pass a message between any two locations in the region using a relay of exactly $N$ locations with the property that every location in the relay is directly visible from its predecessor. Moreover, the locations of the intermediaries can be chosen from a fixed, finite subset of positions on the boundary of the region. We also consider corresponding results for polygonal regions in the plane.
Keyword note:Evans__Steven_N
Report ID:561
Relevance:100

Title:A polytope related to empirical distributions, plane trees, parking functions, and the associahedron
Author(s):Pitman, Jim; Stanley, Richard; 
Date issued:Jun 1999
http://nma.berkeley.edu/ark:/28722/bk0000n3b3v (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n3b4d (PostScript)
Abstract:The volume of the n-dimensional polytope of all (y_1, ... , y_n) with y_i > 0 and y_1 + ... + y_i < x_1 + ... + x_i for all i for arbitrary (x_1, ... , x_n) with x_i > 0 for all i defines a polynomial in variables x_i which admits a number of interpretations, in terms of empirical distributions, plane partitions, and parking functions. We interpret the terms of this polynomial as the volumes of chambers in two different polytopal subdivisions. The first of these subdivisions generalizes to a class of polytopes called sections of order cones. In the second subdivision, the chambers are indexed in a natural way by rooted binary trees with n+1 vertices, and the configuration of these chambers provides a representation of another polytope with many applications, the associahedron.
Pub info:Discrete and Computational Geometry 27: 603-634 (2002)
Keyword note:Pitman__Jim Stanley__Richard
Report ID:560
Relevance:100

Title:A new methodology for evaluating incident detection algorithms
Author(s):Ostland, M.; Petty, K. F.; Bickel, P. J.; Kwon, J.; Rice, J. A.; 
Date issued:Jun 1999
Keyword note:Ostland__Michael_Anthony Petty__Karl_F Bickel__Peter_John Kwon__Jaimyoung Rice__John_Andrew
Report ID:559
Relevance:100

Title:Some properties of the arc sine law related to its invariance under a family of rational maps
Author(s):Pitman, Jim; Yor, Marc; 
Date issued:Apr 1999
http://nma.berkeley.edu/ark:/28722/bk0000n397j (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n3983 (PostScript)
Abstract:This paper shows how the invariance of the arc sine distribution on $(0,1)$ under a family of rational maps is related on the one hand to various integral identities with probabilistic interpretations involving random variables derived from Brownian motion with arc sine, Gaussian, Cauchy and other distributions, and on the other hand to results in the analytic theory of iterated rational maps.
Keyword note:Pitman__Jim Yor__Marc
Report ID:558
Relevance:100

Title:A score test for the linkage analysis of qualitative and quantitative traits based on identity by descent data on sib-pairs
Author(s):Dudoit, Sandrine; Speed, Terence P.; 
Date issued:Apr 1999
Abstract:We propose a general likelihood-based approach to the linkage analysis of qualitative and quantitative traits using identity by descent (IBD) data from sib-pairs. We consider the likelihood of IBD data conditional on phenotypes (discrete or continuous) and test the null hypothesis of no linkage between a marker locus and a gene influencing the trait using a score test in the recombination fraction $\theta$ between the two loci. This method unifies the linkage analysis of qualitative and quantitative traits into a single inferential framework, yielding a simple and intuitive test statistic. The score statistic readily extends to accommodate incomplete IBD data at the test locus, by using the hidden Markov model implemented in the programs MAPMAKER/SIBS and GENEHUNTER to obtain the multipoint inheritance distribution for each sib-pair (Kruglyak and Lander (1995) and Kruglyak et al. (1996)). The linkage score test is derived under general genetic models, which may include multiple unlinked genes. Population genetic assumptions, such as random mating or linkage equilibrium between the trait loci, are not required. This score test is thus particularly promising for the analysis of complex human traits. Conditioning on phenotypes avoids unrealistic random sampling assumptions and allows sib-pairs from differing ascertainment mechanisms to be incorporated into a single likelihood analysis. It allows in particular the selection of sib-pairs based on their trait values and the analysis of only those pairs having the most informative phenotypes. A further advantage of the score test is that it is based on the full likelihood, i.e. the likelihood based on all phenotype data rather than just differences of sib-pair phenotypes. Considering only phenotype differences, as in Haseman and Elston (1972) and Kruglyak and Lander (1995), may result in important losses in power. 
Simulation studies indicate that the linkage score test generally matches or outperforms the Haseman-Elston test, the largest gains in power being for selected samples of sib-pairs with extreme phenotypes.
Keyword note:Dudoit__Sandrine Speed__Terry_P
Report ID:556
Relevance:100

Title:Snakes and spiders: Brownian motion on R-trees
Author(s):Evans, Steven N.; 
Date issued:Apr 1999
http://nma.berkeley.edu/ark:/28722/bk0000n384c (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n385x (PostScript)
Abstract:We consider diffusion processes on a class of $\bR$--trees. The processes are defined in a manner similar to that of Le Gall's Brownian snake. Each point in the tree has a real--valued ``height'' or ``generation'', and the height of the diffusion process evolves as a Brownian motion. When the height process decreases the diffusion retreats back along a lineage, whereas when the height process increases the diffusion chooses among branching lineages according to relative weights given by a possibly infinite measure on the family of lineages. The class of $\bR$--trees we consider can have branch points with countably infinite branching and lineages along which the branch points have points of accumulation. We give a rigorous construction of the diffusion process, identify its Dirichlet form, and obtain a necessary and sufficient condition for it to be transient. We show that the tail $\sigma$--field of the diffusion is always trivial and draw the usual conclusion that bounded space--time harmonic functions are constant. In the transient case, we identify the Martin compactification and obtain the corresponding integral representations of excessive and harmonic functions. Using Ray--Knight methods, we show that the only entrance laws for the diffusion are the trivial ones that arise from starting the process inside the state--space. Finally, we use the Dirichlet form stochastic calculus to obtain a semimartingale description of the diffusion that involves local time additive functionals associated with each branch point of the tree.
Keyword note:Evans__Steven_N
Report ID:555
Relevance:100

Title:Algebraic evaluations of some Euler integrals, duplication formulae for Appell's hypergeometric function $F_1$, and Brownian variations
Author(s):Ismail, Mourad E. H.; Pitman, Jim; 
Date issued:Apr 1999
http://nma.berkeley.edu/ark:/28722/bk0000n3917 (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n392s (PostScript)
Abstract:Explicit evaluations of the symmetric Euler integral $\int_0^1 u^{\alpha} (1-u)^{\alpha} f(u) du$ are obtained for some particular functions $f$. These evaluations are related to duplication formulae for Appell's hypergeometric function $F_1$ which give reductions of $F_1 ( \alpha, \beta, \beta, 2 \alpha, y, z)$ in terms of more elementary functions for arbitrary $\beta$ with $z = y/(y-1)$ and for $\beta = \alpha + \frac{1}{2}$ with arbitrary $y,z$. These duplication formulae generalize the evaluations of some symmetric Euler integrals implied by the following result: if a standard Brownian bridge is sampled at time $0$, time $1$, and at $n$ independent random times with uniform distribution on $[0,1]$, then the broken line approximation to the bridge obtained from these $n+2$ values has a total variation whose mean square is $n(n+1)/(2n+1)$. Key words and phrases: Brownian bridge, Gauss's hypergeometric function, Lauricella's multiple hypergeometric series, uniform order statistics, Appell functions.
Keyword note:Ismail__Mourad_E_H Pitman__Jim
Report ID:554
Relevance:100
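
Note on Report 554: the mean-square formula $n(n+1)/(2n+1)$ quoted in the abstract can be checked by hand in the smallest nontrivial case. The following is a sanity check for $n = 1$, written out as an illustration and not taken from the report. Let $B$ be the standard Brownian bridge and $U$ an independent uniform time on $[0,1]$. The broken line through $(0,0)$, $(U, B_U)$, $(1,0)$ has total variation $2|B_U|$, and $B_U$ has conditional variance $U(1-U)$ given $U$, so

```latex
\[
  E\big[(2|B_U|)^2\big]
  \;=\; 4\,E\big[U(1-U)\big]
  \;=\; 4\left(\frac{1}{2} - \frac{1}{3}\right)
  \;=\; \frac{2}{3}
  \;=\; \left.\frac{n(n+1)}{2n+1}\right|_{n=1} .
\]
```

The case $n = 0$ is also consistent: the broken line is then the zero function, with total variation $0$.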

Title:Constructions of a Brownian path with a given minimum
Author(s):Bertoin, Jean; Pitman, Jim; de Chavez, Juan Ruiz; 
Date issued:Apr 1999
Abstract:We construct a Brownian path conditioned on its minimum value over a fixed time interval by simple transformations of a Brownian bridge.
Pub info:Electronic Communications in Probability, Vol. 4 (1999) Paper no. 5, pages 31-37
Keyword note:Bertoin__Jean Pitman__Jim Chavez__Juan_Ruiz_de
Report ID:553
Relevance:100

Title:Inverse Problems as Statistics
Author(s):Stark, P. B.; 
Date issued:Apr 1999
Abstract:What mathematicians, scientists, engineers, and statisticians mean by ``inverse problem'' differs. For a statistician, an inverse problem is an inference or estimation problem. The data are finite in number and contain errors, as they do in classical estimation or inference problems, and the unknown typically is infinite-dimensional, as it is in nonparametric regression. The additional complication in an inverse problem is that the data are only indirectly related to the unknown. Standard statistical concepts, questions, and considerations such as bias, variance, mean-squared error, identifiability, consistency, efficiency, and various forms of optimality apply to inverse problems. This article discusses inverse problems as statistical estimation and inference problems, and points to the literature for a variety of techniques and results.
Pub info:in Surveys on Solution Methods for Inverse Problems, Colton, D., H.W. Engl, A.K. Louis, J.R. McLaughlin and W. Rundell, eds., Springer-Verlag, New York, pp. 253-275 (invited).
Keyword note:Stark__Philip_B
Report ID:552
Relevance:100

Title:The 1990 and 2000 Census Adjustment Plans
Author(s):Stark, Philip B.; 
Date issued:Mar 1999
Date modified:revised 16 May 2000
http://nma.berkeley.edu/ark:/28722/bk0000n408x (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n409g (PostScript)
Abstract:A revised plan for the 2000 Decennial Census was announced in a 24 February 1999 Bureau of the Census publication and a press statement by K. Prewitt, Director of the Bureau of the Census. Census 2000 will include counts and ``adjusted'' counts. The adjustments involve complicated procedures and calculations on data from a sample of blocks, extrapolated throughout the country to demographic groups called ``post-strata.'' The 2000 adjustment plan is called Accuracy and Coverage Evaluation (ACE). ACE is quite similar to the 1990 adjustment plan, called the Post-Enumeration Survey (PES). The 1990 PES fails some plausibility checks and probably would have reduced the accuracy of counts and state shares. ACE and PES differ in sample size, data capture, timing, record matching, post-stratification, methods to compensate for missing data, the treatment of movers, and details of the data analysis. ACE improves on PES in a number of ways, including using a larger sample, using a simpler model to assign ``match probabilities'' to records with insufficient data, and incorporating mail-back return rates into some post-strata. Nonetheless, ACE shares the most serious problems of PES. The ``Be Counted'' program, census response submission over the internet, computer unduplication of records, the treatment of movers, a new definition of ``correct address,'' more limited search for matching records, the use of optical character recognition (OCR) to capture data, the data collection schedule, and the assignment of ``residence probabilities'' to some sample records, are likely to make ACE less accurate than the 1990 PES.
Keyword note:Stark__Philip_B
Report ID:550
Relevance:100

Title:Ecological inference and the ecological fallacy
Author(s):Freedman, David A.; 
Date issued:Mar 1999
http://nma.berkeley.edu/ark:/28722/bk0000n4058 (PDF)
http://nma.berkeley.edu/ark:/28722/bk0000n406t (PostScript)
Abstract:This paper reviews several methods for making ecological inferences, that is, inferring the behavior of individuals from aggregate data. Also considered is the ecological fallacy, which is the idea that relationships observed for groups necessarily hold for individuals.
Keyword note:Freedman__David
Report ID:549
Relevance:100
