**Statistics Technical Reports**

**Term(s):** 2013 **Results:** 4

**Title:** Unseparated pairs and fixed points in random permutations **Author(s):** Diaconis, Persi; Evans, Steven N.; Graham, Ron **Date issued:** August 2013

http://nma.berkeley.edu/ark:/28722/bk00153298w (PDF) **Abstract:** In a uniform random permutation $\Pi$ of $[n] := \{1,2,\ldots,n\}$, the set of elements $k \in [n-1]$ such that $\Pi(k+1) = \Pi(k) + 1$
has the same distribution as the set of fixed points of $\Pi$ that lie in $[n-1]$. We give three different proofs of this fact
using, respectively, an enumeration relying on the inclusion-exclusion principle, the introduction of two different Markov
chains to generate uniform random permutations, and the construction of a combinatorial bijection. We also obtain the distribution
of the analogous set for circular permutations that consists of those $k \in [n]$ such that $\Pi(k+1 \bmod n) = \Pi(k) + 1 \bmod n$.
This latter random set is just the set of fixed points of the commutator $[\rho, \Pi]$, where $\rho$ is the $n$-cycle $(1,2,\ldots,n)$.
We show for a general permutation $\eta$ that, under weak conditions on the number of fixed points and 2-cycles of $\eta$, the
total variation distance between the distribution of the number of fixed points of $[\eta,\Pi]$ and a Poisson distribution with
expected value 1 is small when $n$ is large. **Keyword note:** Diaconis__Persi Evans__Steven_N Graham__Ron **Report ID:** 822
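
The equality in distribution stated in this abstract can be checked by brute force for small n. The following is a minimal sketch, not taken from the report: the function names (`unseparated_pairs`, `fixed_points_below_n`, `set_distributions`) are hypothetical, and the check simply enumerates all n! permutations, so it is only practical for small n.

```python
from collections import Counter
from itertools import permutations


def unseparated_pairs(perm):
    """Set of k in [n-1] with Pi(k+1) == Pi(k) + 1.

    perm is a tuple holding (Pi(1), ..., Pi(n)), so Pi(k) is perm[k - 1].
    """
    n = len(perm)
    return frozenset(k for k in range(1, n) if perm[k] == perm[k - 1] + 1)


def fixed_points_below_n(perm):
    """Set of fixed points k of Pi that lie in [n-1]."""
    n = len(perm)
    return frozenset(k for k in range(1, n) if perm[k - 1] == k)


def set_distributions(n):
    """Count how often each subset of [n-1] occurs as each random set.

    Dividing the counts by n! gives the two distributions under a
    uniform random permutation of [n].
    """
    unsep = Counter()
    fixed = Counter()
    for perm in permutations(range(1, n + 1)):
        unsep[unseparated_pairs(perm)] += 1
        fixed[fixed_points_below_n(perm)] += 1
    return unsep, fixed


if __name__ == "__main__":
    for n in range(2, 8):
        unsep, fixed = set_distributions(n)
        print(n, unsep == fixed)
```

Comparing the two counters as whole objects tests equality of the set-valued distributions, which is stronger than comparing only the number of elements in each set.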

**Title:** Analysis and rejection sampling of Wright-Fisher diffusion bridges **Author(s):** Schraiber, Joshua G.; Griffiths, Robert C.; Evans, Steven N. **Date issued:** June 2013

http://nma.berkeley.edu/ark:/28722/bk001532b00 (PDF) **Abstract:** We investigate the properties of a Wright–Fisher diffusion process starting at frequency x at time 0 and conditioned to be
at frequency y at time T. Such a process is called a bridge. Bridges arise naturally in the analysis of selection acting on
standing variation and in the inference of selection from allele frequency time series. We establish a number of results about
the distribution of neutral Wright–Fisher bridges and develop a novel rejection-sampling scheme for bridges under selection
that we use to study their behavior. **Keyword note:** Schraiber__Joshua_G Griffiths__Robert_C Evans__Steven_N **Report ID:** 821

**Title:** Removing unwanted variation from high dimensional data with negative controls **Author(s):** Gagnon-Bartsch, Johann A.; Jacob, Laurent; Speed, Terence P. **Date issued:** December 2013

http://nma.berkeley.edu/ark:/28722/bk0014f2j28 (PDF) **Abstract:** High dimensional data suffer from unwanted variation, such as the batch effects common in microarray data. Unwanted variation
complicates the analysis of high dimensional data, leading to high rates of false discoveries, high rates of missed discoveries,
or both. In many cases the factors causing the unwanted variation are unknown and must be inferred from the data. In such
cases, negative controls may be used to identify the unwanted variation and separate it from the wanted variation. We present
a new method, RUV-4, to adjust for unwanted variation in high dimensional data with negative controls. RUV-4 may be used when
the goal of the analysis is to determine which of the features are truly associated with a given factor of interest. One nice
property of RUV-4 is that it is relatively insensitive to the number of unwanted factors included in the model; this makes
estimating the number of factors less critical. We also present a novel method for estimating the features' variances that
may be used even when a large number of unwanted factors are included in the model and the design matrix is full rank. We
name this the "inverse method for estimating variances." By combining RUV-4 with the inverse method, it is no longer necessary
to estimate the number of unwanted factors at all. Using both real and simulated data we compare the performance of RUV-4
with that of other adjustment methods such as SVA, LEAPP, ICE, and RUV-2. We find that RUV-4 and its variants perform as well
as or better than other methods. **Keyword note:** Gagnon-Bartsch__Johann_A Jacob__Laurent Speed__Terry_P **Report ID:** 820

**Title:** Comparing somatic mutation-callers **Author(s):** Kim, Su Yeon; Speed, Terence P. **Date issued:** February 2013

http://nma.berkeley.edu/ark:/28722/bk0012h7s4g (PDF) **Abstract:** Background: Somatic mutation-calling based on DNA from matched tumor-normal patient samples is one of the key tasks carried
out by many cancer genome projects. One such large-scale project is The Cancer Genome Atlas (TCGA), which is now routinely compiling
catalogs of somatic mutations from hundreds of paired tumor-normal DNA exome-sequence data. Nonetheless, mutation calling
is still very challenging. TCGA benchmark studies revealed that even relatively recent mutation callers from major centers
showed substantial discrepancies. Evaluating the mutation callers or understanding the sources of discrepancies is not
straightforward, since most tumor studies lack validation data based on independent whole-exome DNA sequencing and have
only partial validation data for a selected (ascertained) subset of sites. Results: We have analyzed two sets of mutation-calling
data from multiple centers and their partial validation data. Various aspects of the mutation-calling outputs were explored
to characterize the discrepancies in detail. To assess the performance of multiple callers, we introduce four approaches
that use external sequence data to varying degrees: independent DNA-seq pairs, RNA-seq for tumor
samples only, the original exome-seq pairs only, or none of those. Conclusions: Our analyses provide guidelines for visualizing
and understanding the discrepancies among the outputs from multiple callers. Furthermore, applying the four evaluation approaches
to the whole exome data, we illustrate the challenges and highlight the various circumstances that require extra caution in
assessing the performance of multiple callers. **Keyword note:** Kim__Su_Yeon Speed__Terry_P **Report ID:** 819