About this book

We believe that many foundational ideas of Probability and Statistics are best understood when their natural connection is emphasised. We feel that the interested student should learn, in an integrated manner, the mathematical rigour of Probability, the motivating examples and techniques of Statistics, and a technology for performing the computations that both require. These were our main motivations for writing this book. We have chosen the R software environment as the computational tool.

The book is intended as an undergraduate text for a course on Probability Theory. We had in mind courses such as the one-year (two-semester) Probability course at institutions in India such as the Indian Statistical Institute or Chennai Mathematical Institute, or a one-semester (or two-quarter) Probability course of the kind commonly offered as an upper-division, post-calculus elective at many North American universities. The Statistics material and the package R are introduced to emphasise the motivations for and applications of the probabilistic material. We assume that our readers are well-versed in calculus, have a basic understanding of the theory of sets and functions, combinatorics, and proof techniques, and have at least a passing awareness of the distinction between countable and uncountable infinities. We do not assume any particular experience of Linear Algebra or Real Analysis.

Using this book

The book is a work in progress. We are making draft chapters available for comments and feedback, which you may send by email to any of the authors below. You are free to use this book for educational purposes.

Suggested BibTeX citation:

@misc{AST-2016,
      AUTHOR = {Siva Athreya and Deepayan Sarkar and Steve Tanner},
      TITLE = {Probability and Statistics with Examples using R},
      YEAR = {2016},
      NOTE = {Unfinished Book, Last Compilation April 25th 2016, available 
      at \url{http://www.isibang.ac.in/~athreya/psweur/index.html}}}

Chapters

  • Table of contents

  • Preface

  • Chapter 1: Basic Concepts

    • 1.1 Definitions and Properties
      • 1.1.1 Definitions
      • 1.1.2 Basic Properties
    • 1.2 Equally Likely Outcomes
    • 1.3 Conditional Probability and Bayes' Theorem
    • 1.4 Bayes' Theorem
    • 1.5 Independence
    • 1.6 Using R for computation
  • Chapter 2: Sampling and Repeated Trials

    • 2.1 Bernoulli Trials
      • 2.1.1 Using R to compute probabilities
    • 2.2 Poisson Approximation
    • 2.3 Sampling With and Without Replacement
      • 2.3.1 The Hypergeometric Distribution
      • 2.3.2 Hypergeometric Distributions as a Series of Dependent Trials
      • 2.3.3 Binomial Approximation to the Hypergeometric Distribution
  • Chapter 3: Discrete Random Variables

    • 3.1 Random Variables as Functions
      • 3.1.1 Common Distributions
    • 3.2 Independent and Dependent Variables
      • 3.2.1 Independent Variables
      • 3.2.2 Conditional, Joint, and Marginal Distributions
      • 3.2.3 Memoryless Property of the Geometric Random Variable
      • 3.2.4 Multinomial Distributions
    • 3.3 Functions of Random Variables
      • 3.3.1 Distribution of $f(X)$ and $f(X_1, X_2, \dots , X_n)$
      • 3.3.2 Functions and Independence
  • Chapter 4: Summarizing Discrete Random Variables

    • 4.1 Expected Value
      • 4.1.1 Properties of the Expected Value
      • 4.1.2 Expected Value of a Product
      • 4.1.3 Expected Values of Common Distributions
      • 4.1.4 Expected Value of $f(X_1, X_2, \dots , X_n)$
    • 4.2 Variance and Standard Deviation
      • 4.2.1 Properties of Variance and Standard Deviation
      • 4.2.2 Variances of Common Distributions
      • 4.2.3 Standardized Variables
    • 4.3 Standard Units
      • 4.3.1 Markov and Chebyshev Inequalities
    • 4.4 Conditional Expectation and Conditional Variance
    • 4.5 Covariance and Correlation
      • 4.5.1 Covariance
      • 4.5.2 Correlation
    • 4.6 Exchangeable Random Variables
  • Chapter 5: Continuous Probabilities and Random Variables

    • 5.1 Uncountable Sample Spaces and Densities
      • 5.1.1 Probability Densities on $\mathbb R$
    • 5.2 Continuous Random Variables
      • 5.2.1 Common Distributions
      • 5.2.2 A word about individual outcomes
    • 5.3 Transformation of Continuous Random Variables
    • 5.4 Multiple Continuous Random Variables
      • 5.4.1 Marginal Distributions
      • 5.4.2 Independence
      • 5.4.3 Conditional Density
    • 5.5 Functions of Independent Random Variables
      • 5.5.1 Distributions of Sums of Independent Random Variables
      • 5.5.2 Distributions of Quotients of Independent Random Variables
  • Chapter 6: Summarising Continuous Random Variables

    • 6.1 Expectation and Variance
    • 6.2 Covariance, Correlation, Conditional Expectation and Conditional Variance
    • 6.3 Moment Generating Functions
    • 6.4 Bivariate Normals
  • Chapter 7: Sampling and Descriptive Statistics

    • 7.1 The empirical distribution
    • 7.2 Descriptive Statistics
      • 7.2.1 Sample Mean
      • 7.2.2 Sample Variance
      • 7.2.3 Sample proportion
    • 7.3 Simulation
    • 7.4 Plots
      • 7.4.1 Empirical Distribution Plot for Discrete Distributions
      • 7.4.2 Histograms for Continuous Distributions
      • 7.4.3 Hanging Rootograms for Comparing with Theoretical Distributions
      • 7.4.4 Q-Q Plots for Continuous Distributions
  • Chapter 8: Sampling Distributions and Limit Theorems

    • 8.1 Multi-dimensional continuous random variables
      • 8.1.1 Order Statistics and their Distributions
    • 8.2 Distribution of Sampling Statistics from a Normal population
    • 8.3 Weak Law of Large Numbers
    • 8.4 Convergence in Distribution
    • 8.5 Central Limit Theorem
    • 8.6 Normal Approximation and Continuity Correction
  • Chapter 9: Estimation and Hypothesis Testing

    • 9.1 Notations and Terminology for Estimators
    • 9.2 Method of Moments
    • 9.3 Maximum Likelihood Estimate
    • 9.4 Confidence Intervals
      • 9.4.1 Confidence Intervals when the standard deviation $\sigma $ is known
      • 9.4.2 Confidence Intervals when the standard deviation $\sigma $ is unknown
    • 9.5 Hypothesis Testing
      • 9.5.1 The z-test: Test for sample mean when $\sigma $ is known
      • 9.5.2 The t-test: Test for sample mean when $\sigma $ is unknown
      • 9.5.3 A critical value approach
      • 9.5.4 The $\chi ^2$-test : Test for sample variance
      • 9.5.5 The two-sample z-test: Test to compare sample means
      • 9.5.6 The $F$-test: Test to compare sample variances
      • 9.5.7 A $\chi^2$-test for goodness of fit
  • Chapter 10: Linear Regression

    • 10.1 Sample Covariance and Correlation
    • 10.2 Simple Linear Model
    • 10.3 The Least Squares Line
    • 10.4 $a$ and $b$ as Random Variables
    • 10.5 Predicting New Data When $\sigma ^2$ is Known
    • 10.6 Hypothesis Testing and Regression
    • 10.7 Estimating an Unknown $\sigma ^2$

Appendix

This section is being written, and we shall update it soon.

  • A Working with Data in R
    • A.1 Datasets in R
    • A.2 Plotting data
  • B Some mathematical details
    • B.1 Linear Algebra
    • B.2 Jacobian Method
    • B.3 $\chi^2$-goodness of fit test
  • C Strong Law of Large Numbers
  • D Tables

Contact

Siva Athreya
Indian Statistical Institute
8th Mile Mysore Road
Bangalore, 560059
Email: athreya@isibang.ac.in

Deepayan Sarkar
Indian Statistical Institute
7 SJSS Marg
New Delhi, 110016
Email: deepayan@isid.ac.in

Steve Tanner
Eastern Oregon University
One University Boulevard
La Grande, OR 97850-2807
Email: stanner@eou.edu