**Statistics Technical Reports**

**Term(s):** 2009 **Results:** 18

**Title:** Coalescing systems of Brownian particles on the Sierpinski gasket and stable particles on the line or circle **Author(s):** Evans, Steven N.; Morris, Ben; Sen, Arnab **Date issued:** November 2009

http://nma.berkeley.edu/ark:/28722/bk000641c58 (PDF) **Abstract:** A well-known result of Arratia shows that one can make rigorous the notion of starting an independent Brownian motion at every
point of an arbitrary closed subset of the real line and then building a set-valued process by requiring particles to coalesce
when they collide. Arratia noted that the value of this process will be almost surely a locally finite set at all positive
times, and a finite set almost surely if the initial value is compact: the key to both of these facts is the observation that,
because of the topology of the real line and the continuity of Brownian sample paths, at the time when two particles collide
one or the other of them must have already collided with each particle that was initially between them. We investigate whether
such instantaneous coalescence still occurs for coalescing systems of particles where either the state space of the individual
particles is not locally homeomorphic to an interval or the sample paths of the individual particles are discontinuous. We
show that Arratia's conclusion is valid for Brownian motions on the Sierpinski gasket and for stable processes on the real
line with stable index greater than one. **Keyword note:** Evans__Steven_N Morris__Ben Sen__Arnab **Report ID:** 785 **Relevance:** 100

**Title:** On the realized risk of high-dimensional Markowitz portfolios **Author(s):** El Karoui, Noureddine **Date issued:** October 2009

http://nma.berkeley.edu/ark:/28722/bk000641c7c (PDF) **Abstract:** We study the realized risk of Markowitz portfolios computed using parameters estimated from data, and generalizations to similar
questions involving the out-of-sample risk in quadratic programs with linear equality constraints. We do so under the assumption
that the data is generated according to an elliptical model, which allows us to study models with heavy tails, tail
dependence, and leptokurtic marginals. We place ourselves in the setting of high-dimensional inference where
the number of assets in the portfolio, $p$, is large and comparable to the number of samples, $n$, we use to estimate the
parameters. Our approach is based on random matrix theory. We consider both the impact of the estimation of the mean and of
the covariance. Our work shows that risk is underestimated in this setting, and further, that in the class of elliptical
distributions, the Gaussian case yields the least amount of risk underestimation. The problem is more pronounced for genuinely
elliptical distributions and Gaussian computations give an overoptimistic view of the situation. We also propose a robust
estimator of realized risk and investigate its performance in simulations. **Keyword note:** El__Karoui__Noureddine **Report ID:** 784 **Relevance:** 100

**Title:** The Lasso under Heteroscedasticity **Author(s):** Jia, Jinzhu; Rohe, Karl; Yu, Bin **Date issued:** November 2009

http://nma.berkeley.edu/ark:/28722/bk000641c9g (PDF) **Abstract:** The Lasso is a popular method for variable selection in regression. Much theoretical understanding has recently been obtained
of its model selection, or sparsity recovery, properties under sparse and homoscedastic linear regression models. Since these
standard model assumptions are often not met in practice, it is important to understand how Lasso behaves under nonstandard
model assumptions. In this paper, we study the sign consistency of the Lasso under one such model where the variance of the
noise scales linearly with the expectation of the observation. This sparse Poisson-like model is motivated by medical imaging.
In addition to studying the sign consistency, we also give sufficient conditions for $\ell_\infty$ consistency. With theoretical
and simulation studies, we provide conditions for when the Lasso should not be expected to be sign consistent. One interesting
finding is that $\truebeta$ cannot be spread out. Precisely, for both deterministic and random Gaussian designs, the
sufficient conditions for the Lasso to be sign consistent require $\|\truebeta\|_2 / [\minbeta]^2$ to be not too large, where
$\minbeta$ is the smallest nonzero element of $|\truebeta|$. By special designs of $\X$, we show that $\|\truebeta\|_2 / [\minbeta]^2 = o(n)$
is almost necessary. For Positron Emission Tomography (PET), this suggests that when there are dense areas of the positron-emitting
substance, less dense areas are not well detected by the Lasso. This is of particular concern when imaging tumors:
the periphery of a tumor will produce a much weaker signal than the center, leading to a large $\|\truebeta\|_2 / [\minbeta]^2$.
We compare the sign consistency of the Lasso under the Poisson-like model to its sign consistency on the standard model which
assumes the noise is homoscedastic. The comparison shows that when $\truebeta$ is spread out, the Lasso performs worse for
data from the Poisson-like model than for data from the standard model, confirming our theoretical findings. **Keyword note:** Jia__Jinzhu Rohe__Karl Yu__Bin **Report ID:** 783 **Relevance:** 100
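
The quantity governing sign consistency above is easy to compute for a candidate coefficient vector. A minimal sketch (the helper name is ours, not the paper's), contrasting a concentrated signal with a spread-out one:

```python
import numpy as np

def sign_consistency_quantity(beta):
    """||beta||_2 / beta_min^2, where beta_min is the smallest nonzero
    entry of |beta|; the sufficient conditions require this to be o(n)."""
    nz = np.abs(beta[beta != 0])
    return np.linalg.norm(nz) / nz.min() ** 2

# Concentrated signal: a few large coefficients -> small quantity.
concentrated = np.array([5.0, 5.0, 0.0, 0.0])
# Spread-out signal: many small coefficients -> large quantity.
spread = np.full(50, 0.2)

print(sign_consistency_quantity(concentrated))  # sqrt(50)/25 ≈ 0.28
print(sign_consistency_quantity(spread))        # 0.2*sqrt(50)/0.04 ≈ 35.4
```

A spread-out vector inflates the ratio because the norm shrinks slowly while the squared minimum shrinks fast, matching the PET discussion above.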

**Title:** Confidence Intervals for Negative Binomial Random Variables of High Dispersion **Author(s):** Shilane, David; Evans, Steven N.; Hubbard, Alan E. **Date issued:** September 2009

http://nma.berkeley.edu/ark:/28722/bk000641d1k (PDF) **Abstract:** This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable
based upon sampled data. When the sample size is large, it is common practice to rely upon a Normal distribution approximation
to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials exhibits
a slow convergence in distribution to the Normal as a function of the sample size. As a result, standard techniques (such
as the Normal approximation and bootstrap) will construct confidence intervals for the mean that are typically too narrow
and significantly undercover in the case of high dispersion. To address this problem, we propose methods based upon Bernstein's
inequality along with the Gamma and Chi Square distributions as alternatives to the standard methods when the sample size
is small and the dispersion is high. A confidence interval based upon Bernstein's inequality relies upon less stringent assumptions
than those required of parametric models. Moreover, we prove a limit theorem demonstrating that the sample mean of Negative
Binomials converges in distribution to a Gamma random variable under suitable hypotheses, and we use this observation to construct
approximate confidence intervals. Furthermore, we investigate the applicability of the Chi Square distribution as a special
case of the Gamma model. We then undertake a variety of simulation experiments to compare the proposed methods to standard
techniques in terms of empirical coverage and provide concrete recommendations for the settings in which particular intervals
are preferred. We also apply the proposed methods to examples arising in the serial analysis of gene expression and traffic
flow in a communications network to illustrate both the strengths and weaknesses of these procedures along with those of standard
techniques. **Keyword note:** Shilane__David Evans__Steven_N Hubbard__Alan_E **Report ID:** 782 **Relevance:** 100
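
The slow-convergence claim is easy to see in simulation. A hedged illustration (the parameter choices are ours): for a highly dispersed Negative Binomial, the sampling distribution of the mean stays visibly skewed at moderate sample sizes, which is why symmetric Normal-approximation intervals undercover.

```python
import numpy as np

rng = np.random.default_rng(0)

# Negative Binomial with small size parameter r and small success
# probability: high dispersion (variance = mu + mu^2/r >> mu).
r, p_succ = 0.5, 0.001           # mean = r*(1 - p)/p ≈ 499.5
n = 50                           # moderate sample size
reps = 20000

means = rng.negative_binomial(r, p_succ, size=(reps, n)).mean(axis=1)

# Skewness of the sampling distribution of the mean: a Normal limit
# would give ~0, but high dispersion keeps it well away from 0.
z = (means - means.mean()) / means.std()
skew = np.mean(z ** 3)
print(skew)
```

The residual skewness is what the Gamma-based intervals in the paper are designed to accommodate.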

**Title:** High-dimensionality effects in the Markowitz problem and other quadratic programs with linear equality constraints: risk underestimation **Author(s):** El Karoui, Noureddine **Date issued:** August 2009

http://nma.berkeley.edu/ark:/28722/bk000641d3p (PDF) **Abstract:** We study the properties of solutions of quadratic programs with linear equality constraints whose parameters are estimated
from data in the high-dimensional setting where $p$, the number of variables in the problem, is of the same order of magnitude
as $n$, the number of observations used to estimate the parameters. The Markowitz problem in finance is a special case of our study.
Assuming normality and independence of the observations, we relate the efficient frontier computed empirically to the ``true''
efficient frontier. Our computations show that there is a separation of the errors induced by estimating the mean of the observations
and estimating the covariance matrix. In particular, the price paid for estimating the covariance matrix is an underestimation
of the variance by a factor roughly equal to $1 - p/n$. Therefore the risk of the optimal population solution is underestimated
when we estimate it by solving a similar quadratic program with estimated parameters. We also characterize the statistical
behavior of linear functionals of the empirical optimal vector and show that they are biased estimators of the corresponding
population quantities. We investigate the robustness of our Gaussian results by extending the study to certain elliptical
models and models where our $n$ observations are correlated (in ``time"). We show a lack of robustness of the Gaussian results,
but are still able to get results concerning first order properties of the quantities of interest, even in the case of relatively
heavy-tailed data (we require two moments). Risk underestimation is still present in the elliptical case and is more pronounced
than in the Gaussian case. We discuss properties of the non-parametric and parametric bootstrap in this context. We show
several results, including the interesting fact that standard applications of the bootstrap generally yield inconsistent
estimates of bias. Finally, we propose some strategies to correct these problems and practically validate them in some simulations.
Throughout the paper, we assume that $p$, $n$, and $n - p$ tend to infinity, with $p < n$. **Keyword note:** El__Karoui__Noureddine **Report ID:** 781 **Relevance:** 100
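
The $1 - p/n$ factor can be checked numerically in the simplest sub-case: the global minimum-variance problem with Gaussian data and identity covariance. This is a sketch under those simplifying assumptions, not the paper's general setting:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 100, 400                       # p/n = 0.25, both "large"
X = rng.standard_normal((n, p))       # i.i.d. Gaussian returns, Sigma = I

ones = np.ones(p)
Sigma_hat = np.cov(X, rowvar=False)

# Plug-in minimum-variance portfolio w = Sigma_hat^{-1} 1 / (1' Sigma_hat^{-1} 1)
# has apparent (in-sample) risk 1 / (1' Sigma_hat^{-1} 1).
in_sample_risk = 1.0 / (ones @ np.linalg.solve(Sigma_hat, ones))
true_optimal_risk = 1.0 / p           # with Sigma = I, 1' Sigma^{-1} 1 = p

ratio = in_sample_risk / true_optimal_risk
print(ratio, 1 - p / n)               # ratio concentrates near 1 - p/n
```

With these dimensions the apparent risk sits at roughly three quarters of the true optimal risk, in line with the $1 - p/n$ underestimation described above.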

**Title:** A zero-one law for linear transformations of Lévy noise **Author(s):** Evans, Steven N. **Date issued:** August 2009

http://nma.berkeley.edu/ark:/28722/bk000641d5s (PDF) **Abstract:** A L\'evy noise on $\mathbb{R}^d$ assigns a random real ``mass'' $\Pi(B)$ to each Borel subset $B$ of $\mathbb{R}^d$ with
finite Lebesgue measure. The distribution of $\Pi(B)$ only depends on the Lebesgue measure of $B$, and if $B_1, \ldots, B_n$
is a finite collection of pairwise disjoint sets, then the random variables $\Pi(B_1), \ldots, \Pi(B_n)$ are independent
with $\Pi(B_1 \cup \cdots \cup B_n) = \Pi(B_1) + \cdots + \Pi(B_n)$ almost surely. In particular, the distribution of $\Pi
\circ g$ is the same as that of $\Pi$ when $g$ is a bijective transformation of $\mathbb{R}^d$ that preserves Lebesgue measure.
It follows from the Hewitt--Savage zero--one law that any event which is almost surely invariant under the mappings $\Pi
\mapsto \Pi \circ g$ for every Lebesgue measure preserving bijection $g$ of $\mathbb{R}^d$ must have probability $0$ or $1$.
We investigate whether certain smaller groups of Lebesgue measure preserving bijections also possess this property. We show
that if $d \ge 2$, the L\'evy noise is not purely deterministic, and the group consists of linear transformations and is
closed, then the invariant events all have probability $0$ or $1$ if and only if the group is not compact. **Keyword note:** Evans__Steven_N **Report ID:** 780 **Relevance:** 100

**Title:** Ensemble Filtering for High Dimensional Non-linear State Space Models **Author(s):** Lei, Jing; Bickel, Peter **Date issued:** August 2009

http://nma.berkeley.edu/ark:/28722/bk000641d90 (PDF) **Abstract:** We consider non-linear state space models in high-dimensional situations, where the two common tools for state space models
both have difficulties. The Kalman filter variants are seriously biased due to non-Gaussianity and the particle filter suffers
from the ``curse of dimensionality''. Inspired by a regression perspective on the Kalman filter, a novel approach is developed
by combining the Kalman filter and particle filter, retaining the stability of the Kalman filter in large systems as well
as the accuracy of particle filters in highly non-linear systems. Its theoretical properties are justified under the Gaussian
linear models as an extension of the Kalman filter. Its performance is tested and compared with other methods on a simulated
chaotic system which is widely used in numerical weather forecasting. **Keyword note:** Lei__Jing Bickel__Peter_John **Report ID:** 779 **Relevance:** 100

**Title:** On information plus noise kernel random matrices **Author(s):** El Karoui, Noureddine **Date issued:** August 2009

http://nma.berkeley.edu/ark:/28722/bk000641f13 (PDF) **Abstract:** Kernel random matrices have attracted a lot of interest in recent years, from both practical and theoretical standpoints.
Most of the theoretical work so far has focused on the case where the data is sampled from a low-dimensional structure. Very
recently, the first results concerning kernel random matrices with high-dimensional input data were obtained, in a setting
where the data was sampled from a genuinely high-dimensional structure. **Keyword note:** El__Karoui__Noureddine **Report ID:** 778 **Relevance:** 100

**Title:** Soccer/World Football **Author(s):** Brillinger, David R. **Date issued:** July 2009

http://nma.berkeley.edu/ark:/28722/bk000641d7w (PDF) **Abstract:** This paper provides a review of statistical and related methods and models that have been employed in studies of soccer, also
known as association football. The sections are: 1. Data collection and descriptive analyses, 2. Stochastic modelling, 3. Ranking,
4. Tournaments and scheduling, 5. Game theory, 6. Economics and management, 7. Some remaining topics, 8. Extended reference
list. **Keyword note:** Brillinger__David_R **Report ID:** 777 **Relevance:** 100

**Title:** Modelling spatial trajectories **Author(s):** Brillinger, David R. **Date issued:** June 2009

http://nma.berkeley.edu/ark:/28722/bk000641f36 (PDF) **Abstract:** The study of trajectories has been basic to science for many centuries. One can mention the motion of the planets, the meanderings
of animals and the routes of ships. More recently there has been considerable modeling and statistical analysis of biological
and ecological processes of moving particles. The models may be motivated formally by difference and differential equations
and by potential functions. Initially, following Leibniz and Newton, such models were described by deterministic differential
equations, but variability around observed paths has led to the introduction of random variables and to the development of
stochastic calculi. The results obtained from the fitting of such models are highly useful. They may be employed for: simple
description, summary, comparison, simulation, prediction, model appraisal, bootstrapping, and also employed for the estimation
of derived quantities of interest. The potential function approach, to be presented in Section 1.3.4, will be found to have
the advantage that an equation of motion is set down quite directly and that explanatories, including attractors, repellors,
and time-varying fields, may be included conveniently. **Keyword note:** Brillinger__David_R **Report ID:** 776 **Relevance:** 100

**Title:** An asymptotic sampling formula for the coalescent with recombination **Author(s):** Jenkins, Paul A.; Song, Yun S. **Date issued:** June 2009

http://nma.berkeley.edu/ark:/28722/bk000641f59 (PDF) **Abstract:** The Ewens sampling formula (ESF) is a one-parameter family of probability distributions with a number of intriguing combinatorial
connections. This elegant closed-form formula first arose in biology as the stationary probability distribution of a sample
configuration at one locus under the infinite-alleles model of mutation. Since its discovery in the early 1970s, the ESF
has been used in various biological applications, and has sparked several interesting mathematical generalizations. In the
population genetics community, extending the underlying random-mating model to include recombination has received much attention
in the past, but no general closed-form sampling formula is currently known even for the simplest extension, that is, a model
with two loci. In this paper, we show that it is possible to obtain useful closed-form results in the case where the population-scaled
recombination rate $\rho$ is large but not necessarily infinite. Specifically, we consider an asymptotic expansion of the two-locus
sampling formula in inverse powers of $\rho$ and obtain closed-form expressions for the first few terms in the expansion. Our
asymptotic sampling formula applies to arbitrary sample sizes and configurations. **Keyword note:** Jenkins__Paul_A Song__Yun_S **Report ID:** 775 **Relevance:** 100
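
Schematically, the expansion described above writes the two-locus sampling probability of a configuration $\mathbf{n}$ as a series in $1/\rho$, whose leading term corresponds to the two loci behaving independently. The notation here is generic; the paper gives the exact coefficients:

```latex
q(\mathbf{n} \mid \rho) \;=\; q_0(\mathbf{n}) \;+\; \frac{q_1(\mathbf{n})}{\rho}
  \;+\; \frac{q_2(\mathbf{n})}{\rho^2} \;+\; O\!\left(\rho^{-3}\right),
  \qquad \rho \to \infty .
```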

**Title:** Simultaneous support recovery in high dimensions: Benefits and perils of block $\ell_1/\ell_\infty$-regularization **Author(s):** Negahban, S.; Wainwright, M. J. **Date issued:** May 2009

http://nma.berkeley.edu/ark:/28722/bk0005d0b17 (PDF) **Abstract:** Given a collection of $r \geq 2$ linear regression problems in $\pdim$ dimensions, suppose that the regression coefficients
share partially common supports. This set-up suggests the use of $\ell_{1}/\ell_{\infty}$-regularized regression for joint
estimation of the $\pdim \times \numreg$ matrix of regression coefficients. We analyze the high-dimensional scaling of $\ell_1/\ell_\infty$-regularized
quadratic programming, considering both consistency rates in $\ell_\infty$-norm, and also how the minimal sample size $n$
required for performing variable selection grows as a function of the model dimension, sparsity, and overlap between the supports.
We begin by establishing bounds on the $\ell_\infty$-error as well as sufficient conditions for exact variable selection, both for
fixed design matrices and for designs drawn randomly from general Gaussian matrices. Our second set of results applies
to $\numreg = 2$ linear regression problems with standard Gaussian designs whose supports overlap in a fraction $\alpha \in
[0,1]$ of their entries: for this problem class, we prove that the $\ell_{1}/\ell_{\infty}$-regularized method undergoes a
phase transition---that is, a sharp change from failure to success---characterized by the rescaled sample size $\theta_{1,\infty}(n,
p, s, \alpha) = n/\{(4 - 3 \alpha) s \log(p-(2-\alpha) \, s)\}$. More precisely, given sequences of problems specified by
$(n, p, s, \alpha)$, for any $\delta > 0$, the probability of successfully recovering both supports converges to $1$ if $\theta_{1,
\infty}(n, p, s, \alpha) > 1+\delta$, and converges to $0$ for problem sequences for which $\theta_{1,\infty}(n,p,s, \alpha)
< 1 - \delta$. An implication of this threshold is that use of $\ell_1 / \ell_{\infty}$-regularization yields improved statistical
efficiency if the overlap parameter is large enough ($\alpha > 2/3$), but has \emph{worse} statistical efficiency than a naive
Lasso-based approach for moderate to small overlap ($\alpha < 2/3$). Empirical simulations illustrate the close agreement
between these theoretical predictions and the actual behavior in practice. These results indicate that some caution needs
to be exercised in the application of $\ell_1/\ell_\infty$ block regularization: if the data does not match its structure
closely enough, it can impair statistical performance relative to computationally less expensive schemes. **Keyword note:** Negahban__Sahand Wainwright__Martin **Report ID:** 774 **Relevance:** 100
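
The phase-transition boundary is explicit enough to evaluate directly. A small sketch (the function names are ours) comparing the sample size at which $\theta_{1,\infty} = 1$ for fully shared versus disjoint supports:

```python
import math

def rescaled_sample_size(n, p, s, alpha):
    """theta_{1,infty}(n, p, s, alpha) = n / ((4 - 3a) s log(p - (2 - a) s)).
    Recovery succeeds when theta > 1 and fails when theta < 1."""
    return n / ((4 - 3 * alpha) * s * math.log(p - (2 - alpha) * s))

def sample_size_at_threshold(p, s, alpha):
    """The n (up to lower-order terms) at which theta hits 1."""
    return (4 - 3 * alpha) * s * math.log(p - (2 - alpha) * s)

p, s = 1000, 10
# Full overlap (alpha = 1) needs ~ s log(p - s) samples, while disjoint
# supports (alpha = 0) need ~ 4 s log(p - 2s): joint regularization pays
# off only when the overlap is large enough (alpha > 2/3).
print(sample_size_at_threshold(p, s, 1.0))
print(sample_size_at_threshold(p, s, 0.0))
```

With these illustrative values, the disjoint-support threshold is roughly four times the full-overlap one, mirroring the $\alpha > 2/3$ comparison with the Lasso above.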

**Title:** Bayesian Functional ANOVA Modeling Using Gaussian Process Prior Distributions **Author(s):** Kaufman, Cari G.; Sain, Stephan R. **Date issued:** April 2009

http://nma.berkeley.edu/ark:/28722/bk0005d0b5f (PDF) **Abstract:** Functional analysis of variance (ANOVA) models partition a functional response according to the main effects and interactions
of various factors. This article develops a general framework for functional ANOVA modeling from a Bayesian viewpoint, assigning
Gaussian process prior distributions to each batch of functional effects. We discuss the choices to be made in specifying
such a model, advocating the treatment of levels within a given factor as dependent but exchangeable quantities, and we suggest
weakly informative prior distributions for higher level parameters that may be appropriate in many situations. We discuss
computationally efficient strategies for posterior sampling using Markov Chain Monte Carlo algorithms, and we emphasize useful
graphical summaries based on the posterior distribution of model-based analogues of traditional ANOVA decompositions of variance.
We illustrate this process of model specification, posterior sampling, and graphical posterior summaries in two examples.
The first considers the effect of geographic region on the temperature profiles at weather stations in Canada. The second
example examines sources of variability in the output of regional climate models from a designed experiment. **Keyword note:** Kaufman__Cari Sain__Stephan_R **Report ID:** 773 **Relevance:** 100

**Title:** Fast Approximate Spectral Clustering **Author(s):** Yan, Donghui; Huang, Ling; Jordan, Michael I. **Date issued:** March 2009

http://nma.berkeley.edu/ark:/28722/bk0005d0c95 (PDF) **Abstract:** Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small
data sets but that have limited applicability to large-scale problems due to their computational complexity of $O(n^3)$, with
$n$ the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate
spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is
based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering
rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one
based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups
with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms
of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable
accuracy and significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to
spectral-cluster data sets with a million observations within several minutes. **Keyword note:** Yan__Donghui Huang__Ling Jordan__Michael_I **Report ID:** 772 **Relevance:** 100
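
The KASP recipe described above (local k-means first, spectral clustering on the representatives only, then label propagation) can be sketched with plain NumPy on a toy two-cluster problem. The data, bandwidth, and choice of $k$ are illustrative assumptions, not the paper's tuned settings:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: two well-separated Gaussian blobs of 200 points each.
X = np.vstack([rng.normal(0.0, 0.5, (200, 2)),
               rng.normal(4.0, 0.5, (200, 2))])

# Step 1 (the distortion-minimizing local transformation): k-means.
k = 20
centers = X[rng.choice(len(X), k, replace=False)]
for _ in range(25):                                   # Lloyd's iterations
    d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
    assign = d2.argmin(1)
    for j in range(k):
        if np.any(assign == j):
            centers[j] = X[assign == j].mean(0)

# Step 2: spectral clustering on the k representatives only (O(k^3)).
c2 = ((centers[:, None, :] - centers[None]) ** 2).sum(-1)
W = np.exp(-c2 / 2.0)                                 # Gaussian affinity
deg = W.sum(1)
L_sym = np.eye(k) - W / np.sqrt(np.outer(deg, deg))   # normalized Laplacian
vals, vecs = np.linalg.eigh(L_sym)                    # ascending eigenvalues
rep_labels = (vecs[:, 1] > 0).astype(int)             # 2-way sign split

# Step 3: each point inherits the label of its nearest representative.
labels = rep_labels[assign]
```

The expensive eigendecomposition runs on a $k \times k$ matrix rather than $n \times n$, which is the source of the speedups reported above.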

**Title:** Spectra of large random trees **Author(s):** Bhamidi, Shankar; Evans, Steven N.; Sen, Arnab **Date issued:** March 2009

http://nma.berkeley.edu/ark:/28722/bk0005d0d84 (PDF) **Abstract:** We analyze the eigenvalues of the adjacency matrices of a wide variety of random trees. Using general, broadly applicable
arguments based on the interlacing inequalities for the eigenvalues of a principal submatrix of a Hermitian matrix and a suitable
notion of local weak convergence for an ensemble of random trees that we call probability fringe convergence, we show that
the empirical spectral distributions for each of a number of random tree models converge to a deterministic (model dependent)
limit as the number of vertices goes to infinity. Moreover, the masses assigned by the empirical spectral distributions
to individual points also converge in distribution to constants. We conclude for ensembles such as the linear preferential
attachment models, random recursive trees, and the uniform random trees that the limiting spectral distribution has a set
of atoms that is dense in the real line. We obtain precise asymptotics on the mass assigned to zero by the empirical spectral
measures via the connection between the number of zero eigenvalues of the adjacency matrix of a tree and the cardinality of
a maximal matching on the tree. In particular, we employ a simplified version of an algorithm due to Karp and Sipser to construct
maximal matchings and understand their properties. Moreover, we show that the total weight of a weighted matching is asymptotically
equivalent to a constant multiple of the number of vertices when the edge weights are independent, identically distributed,
non-negative random variables with finite expected value. We greatly extend a celebrated result obtained by Schwenk for the
uniform random trees by showing that if any ensemble converges in the probability fringe sense and a very mild further condition
holds, then, with probability converging to one, the spectrum of a realization is shared by at least one other (non-isomorphic)
tree. For the linear preferential attachment model with parameter $a > -1$, we show that for any fixed $k$ the $k$ largest
eigenvalues jointly converge in distribution to a non-trivial limit when rescaled by $n^{1/2\gamma_a}$, where $\gamma_a =
a+2$ is the Malthusian rate of growth parameter for an associated continuous-time branching process. **Keyword note:** Bhamidi__Shankar Evans__Steven_N Sen__Arnab **Report ID:** 771 **Relevance:** 100

**Title:** Hierarchical Bayesian nonparametric models with applications **Author(s):** Teh, Yee Whye; Jordan, Michael I. **Date issued:** January 2009

http://nma.berkeley.edu/ark:/28722/bk0005d0f73 (PDF) **Abstract:** Hierarchical modeling is a fundamental concept in Bayesian statistics. The basic idea is that parameters are endowed with
distributions which may themselves introduce new parameters, and this construction recurses. A common motif in hierarchical
modeling is that of the conditionally independent hierarchy, in which a set of parameters are coupled by making their distributions
depend on a shared underlying parameter. These distributions are often taken to be identical, based on an assertion of exchangeability
and an appeal to de Finetti's theorem. In this review we discuss a thoroughgoing exploitation of hierarchical modeling ideas
in Bayesian nonparametric statistics. The basic idea is that rather than treating distributional parameters parametrically,
we treat them nonparametrically. In particular, the base measure $G_0$ in the Dirichlet process can itself be viewed as
a random draw from some distribution on measures---specifically it can be viewed as a draw from the Dirichlet process. This
yields a natural recursion that we refer to as a hierarchical Dirichlet process. Our focus in this chapter is on nonparametric
hierarchies of this kind, where the tools of Bayesian nonparametric modeling are used recursively. **Keyword note:** Teh__Yee_Whye Jordan__Michael_I **Report ID:** 770 **Relevance:** 100
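
The recursion in the last sentences can be made concrete with a truncated stick-breaking sketch: draw a discrete $G_0$ from a Dirichlet process, then draw group-level measures whose base measure is $G_0$, so atoms are shared across groups. The truncation level and concentration parameters are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def stick_breaking_dp(alpha, base_draw, T=500):
    """Truncated stick-breaking draw G = sum_k w_k * delta(atom_k),
    with T atoms sampled i.i.d. from the base measure."""
    b = rng.beta(1.0, alpha, size=T)
    w = b * np.concatenate(([1.0], np.cumprod(1.0 - b)[:-1]))
    return w / w.sum(), base_draw(T)       # renormalize the truncation

# Top level: G0 ~ DP(gamma, H), with H = N(0, 1) as the base measure.
w0, atoms0 = stick_breaking_dp(5.0, lambda T: rng.standard_normal(T))

# Group level: G_j ~ DP(alpha, G0).  Because G0 is discrete, each group's
# atoms are resampled from G0's atoms, so atoms are shared across groups.
draw_from_G0 = lambda T: rng.choice(atoms0, size=T, p=w0)
w1, atoms1 = stick_breaking_dp(1.0, draw_from_G0)
w2, atoms2 = stick_breaking_dp(1.0, draw_from_G0)

shared = np.intersect1d(atoms1, atoms2)    # atoms common to both groups
```

The nonempty intersection is exactly the sharing of mixture components across groups that makes the hierarchical Dirichlet process useful.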

**Title:** Hyperdeterminantal point processes **Author(s):** Evans, Steven N.; Gottlieb, Alex **Date issued:** February 2009

http://nma.berkeley.edu/ark:/28722/bk0005d0839 (PDF) **Abstract:** As well as arising naturally in the study of non-intersecting random paths, random spanning trees, and eigenvalues of random
matrices, determinantal point processes (sometimes also called fermionic point processes) are relatively easy to simulate
and provide a quite broad class of models that exhibit repulsion between points. The fundamental ingredient used to construct
a determinantal point process is a kernel giving the pairwise interactions between points: the joint distribution of any number
of points then has a simple expression in terms of determinants of certain matrices defined from this kernel. In this paper
we initiate the study of an analogous class of point processes that are defined in terms of a kernel giving the interaction
between 2M points for some integer M. The role of matrices is now played by 2M-dimensional ``hypercubic'' arrays, and the determinant
is replaced by a suitable generalization of it to such arrays: Cayley's first hyperdeterminant. We show that some of the
desirable features of determinantal point processes continue to be exhibited by this generalization. **Keyword note:** Evans__Steven_N Gottlieb__Alex **Report ID:** 749 **Relevance:** 100

**Title:** Mutation-selection balance with recombination: convergence to equilibrium for polynomial selection costs **Author(s):** Clayton, Aubrey; Evans, Steven N. **Date issued:** February 2009

http://nma.berkeley.edu/ark:/28722/bk0005d0g2v (PDF) **Abstract:** We study a continuous-time dynamical system that models the evolving distribution of genotypes in an infinite population where
genomes may have infinitely many or even a continuum of loci, mutations accumulate along lineages without back-mutation, added
mutations reduce fitness, and recombination occurs on a faster time scale than mutation and selection. Some features of the
model, such as existence and uniqueness of solutions and convergence to the dynamical system of an approximating sequence
of discrete time models, were presented in earlier work by Evans, Steinsaltz, and Wachter for quite general selective costs.
Here we study a special case where the selective cost of a genotype with a given accumulation of ancestral mutations from
a wild type ancestor is a sum of costs attributable to each individual mutation plus successive interaction contributions
from each $k$-tuple of mutations for $k$ up to some finite ``degree''. Using ideas from complex chemical reaction networks and
a novel Lyapunov function, we establish that the phenomenon of mutation-selection balance occurs for such selection costs
under mild conditions. That is, we show that the dynamical system has a unique equilibrium and that it converges to this
equilibrium from all initial conditions. **Keyword note:** Clayton__Aubrey Evans__Steven_N **Report ID:** 738 **Relevance:** 100