Publications

Rejoinder to discussion of the paper “Human life is unlimited – but short”

##### Extremes, Vol. 21, No. 2, pp. - to appear. | 2018

##### Authors: Rootzén, H. and Zholud, D.

## Summary

What can be learned from data about human survival at extreme age? In this rejoinder we give our views on some of the issues raised in the discussion of our paper Rootzén and Zholud (2017).

Human life is unlimited – but short

##### Extremes, Vol. 20, No. 4, pp. 713-728. | 2017

##### Authors: Rootzén, H. and Zholud, D.

## Summary

Does the human lifespan have an impenetrable biological upper limit which ultimately will stop further increase in life lengths? This question is important for understanding aging, and for society, and has led to intense controversies. Demographic data for humans has been interpreted as showing existence of a limit, or even as an indication of a decreasing limit, but also as evidence that a limit does not exist. This paper studies what can be inferred from data about human mortality at extreme age. We show that in western countries and Japan and after age 110 the probability of dying is about 47% per year. Hence there is no finite upper limit to the human lifespan. Still, given the present stage of biotechnology, it is unlikely that during the next 25 years anyone will live longer than 128 years in these countries. Data, remarkably, shows no difference in mortality after age 110 between sexes, between ages, or between different lifestyles or genetic backgrounds. These results, and the analysis methods developed in this paper, can help testing biological theories of ageing and aid confirmation of success of efforts to find a cure for ageing.
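As a rough illustration of the arithmetic behind this summary (a sketch, not the paper's own analysis), suppose the yearly death probability after age 110 is exactly 0.47 and the same in every year. The function name `survival_probability` and the constant `ANNUAL_DEATH_PROB` are hypothetical names introduced here for illustration only.

```python
# Back-of-the-envelope sketch: assumes a constant yearly death
# probability after age 110 (0.47, the rounded figure quoted above).
ANNUAL_DEATH_PROB = 0.47


def survival_probability(target_age: int,
                         start_age: int = 110,
                         annual_death_prob: float = ANNUAL_DEATH_PROB) -> float:
    """Probability that a person alive at start_age reaches target_age,
    assuming the same death probability applies in every year."""
    if target_age < start_age:
        raise ValueError("target_age must be at least start_age")
    years = target_age - start_age
    # Each year is survived with probability (1 - annual_death_prob).
    return (1.0 - annual_death_prob) ** years


if __name__ == "__main__":
    for age in (111, 115, 120, 128):
        print(age, survival_probability(age))
```

Under this assumption the chance of reaching any given age shrinks geometrically but never reaches zero, which is the sense in which life is "unlimited but short": a person aged 110 has roughly a 1 in 100,000 chance of reaching 128.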

Tail Estimation for Window Censored Processes

##### Technometrics, Vol. 58, No. 1, pp. 95-103. | 2016

##### Authors: Rootzén, H. and Zholud, D.

## Summary

This paper develops methods to estimate the tail and full distribution of the lengths of the 0-intervals in a continuous time stationary ergodic stochastic process which takes the values 0 and 1 in alternating intervals. The setting is that each of many such 0-1 processes has been observed during a short time window. Thus the observed 0-intervals could be non-censored, right censored, left censored or doubly censored, and the lengths of 0-intervals which are ongoing at the beginning of the observation window have a length-biased distribution. We exhibit parametric conditional maximum likelihood estimators for the full distribution, develop maximum likelihood tail estimation methods based on a semi-parametric generalized Pareto model, and propose goodness of fit plots. Finite sample properties are studied by simulation, and asymptotic normality is established for the most important case. The methods are applied to estimation of the length of off-road glances in the 100-car study, a big naturalistic driving experiment.

Efficient estimation of the number of false positives in high-throughput screening

##### Biometrika, Vol. 102, No. 3, pp. 695-704. | 2015

##### Authors: Rootzén, H. and Zholud, D.

## Summary

This paper develops new methods to handle false positives in High-Throughput Screening experiments. The setting is very highly multiple testing problems where testing is done at extreme significance levels and with low degrees of freedom, and where the true null distribution may differ from the theoretical one. We answer the question 'How many of the positive test results are false?' by showing that the conditional distribution of the number of false positives, given that there are in all r positives, approximately has a binomial distribution, and find efficient estimators for its success probability parameter. Furthermore we provide efficient methods for estimation of the true null distribution resulting from a preprocessing method, and techniques to compare it with the theoretical null distribution. Analysis is based on a simple polynomial model for the tail of the distribution of p-values. We provide asymptotics which motivate this model, exhibit properties of estimators of the parameters of the model, and point to model checking tools. The methods are tried out on two large genomic studies and on an fMRI brain scan experiment.

Tail approximations for the Student t-, F-, and Welch statistics for non-normal and not necessarily i.i.d. random variables

##### Bernoulli, Vol. 20, No. 4, pp. 2102-2130. | 2014

##### Authors: Zholud, D.

## Summary

We present a detailed study of the asymptotic behavior of the distribution of the tails of these, perhaps, most commonly used statistical tests under non-standard conditions, that is, releasing the underlying assumptions of normality, independence and identical distribution and considering a more general case where one only assumes that the vector of data has a continuous joint density. We determine asymptotic expressions for P(T > u) as u tends to infinity for this case. The approximations are particularly accurate for small sample sizes and may be used, for example, in the analysis of High-Throughput Screening experiments, where the number of replicates can be as low as two to five and often extremely high significance levels are used. We give numerous examples and complement our results by a thorough investigation of the convergence speed - both theoretically, by deriving exact bounds for absolute and relative errors of the approximations, and by means of a simulation study.

Extreme Value Analysis of Huge Datasets: Tail Estimation Methods in High-Throughput Screening and Bioinformatics

##### PhD Thesis, University of Gothenburg. ISBN: 978-91-628-8354-6. | 2011

##### Authors: Zholud, D.

## Summary

The thesis presents results in Extreme Value Theory with applications to High-Throughput Screening and Bioinformatics. The methods described in the thesis, however, are applicable to statistical analysis of huge datasets in general. The main results are covered in four papers.

The first paper develops novel methods to handle false rejections in High-Throughput Screening experiments where testing is done at extreme significance levels, with low degrees of freedom, and when the true null distribution may differ from the theoretical one. We introduce efficient and accurate estimators of False Discovery Rate and related quantities, and provide methods of estimation of the true null distribution resulting from data preprocessing, as well as techniques to compare it with the theoretical null distribution. Extreme Value Statistics provides a natural analysis tool: a simple polynomial model for the tail of the distribution of p-values. We exhibit the properties of the estimators of the parameters of the model, and point to model checking tools, both for independent and dependent data. The methods are tried out on two large scale genomic studies and on an fMRI brain scan experiment.

The second paper gives a strict mathematical basis for the above methods. We present asymptotic formulas for the distribution tails of, probably, the most commonly used statistical tests, under non-normality, dependence, and non-homogeneity, and derive bounds for the absolute and relative errors of the approximations.

In papers three and four we study high-level excursions of the Shepp statistic for the Wiener process and for a Gaussian random walk. The application areas include finance and insurance, and sequence alignment scoring and database searches in Bioinformatics.

Extremes of Shepp statistics for Gaussian random walk

##### Extremes, Vol. 12, No. 1, pp. 1-17. | 2009

##### Authors: Zholud, D.

## Summary

We derive asymptotic behavior of the probability of high-level excursion for the maximal increment of a Gaussian random walk. The motivation for writing this paper comes from the problem of finding similarities between long biological sequences in Bioinformatics; however, the result may also have applications in other areas, such as finance and insurance.

Extremes of Shepp statistics for the Wiener process

##### Extremes, Vol. 11, No. 4, pp. 339-351. | 2008

##### Authors: Zholud, D.

## Summary

We derive asymptotic behavior of the probability of high-level excursion for the maximal increment of the Wiener process. The result is essential for deriving the corresponding asymptotic formula for maximal increments of a Gaussian random walk, and also has potential applications in finance and insurance.

On the limit distribution of multiscale test statistics for nonparametric curve estimation

##### Mathematical Methods of Statistics, Vol. 15, No. 1, pp. 20-25. | 2006

##### Authors: Dümbgen, L., Piterbarg, V.I. and Zholud, D.

## Summary

We prove continuity of the limit distribution function of certain multiscale test statistics which are used in nonparametric curve estimation, e.g. in testing qualitative hypotheses (about an unknown regression function) such as nonpositivity, monotonicity or concavity.

© Dmitrii Zholud 2014