Statistics 1

Basic Information

Instructor : Siva Athreya
Email : athreya@isibang.ac.in
Classes : Tuesday and Thursday: 11:10am - 1:10pm
Scoring : Final Exam: 50%, Midterm Exam: 20%, Quizzes: 15%, In-class Worksheets: 10%, and Homework: 5%.

References

Mathematical Statistics and Data Analysis by John A. Rice
Stat Labs: Mathematical Statistics Through Applications by Deborah Nolan and Terry P. Speed [Stat Labs Website] [ Stat Labs Data]
Statistics: The Art and Science of Learning from Data by Alan Agresti, Christine A. Franklin, Bernhard Klingenberg
Using R for Introductory Statistics by John Verzani
simpleR – Using R for Introductory Statistics by John Verzani

Weekly Schedule

Week 1 : July 25th and July 30th

- R- Basics, Built-in functions, Slicing, Logical Operators, Creating sequences of vectors
- Working with Ubuntu
- Data: Categorical, Discrete Numeric, Continuous Numeric.
- Plotting Bar Charts in R.
- Dropbox usage and working with Rnw Files.
Articles:-
- Ross Ihaka and Robert Gentleman on making R from S: R: A Language for Data Analysis and Graphics.
- Oft referenced: the first classification of data, On the Theory of Scales of Measurement by S. S. Stevens
From the past, Writing of Mathematics 2018, $\LaTeX$ Information:

Week 2 : August 1 and August 6

Quiz 1 and Homework 2

- Key Features of Numeric Data: Centre, Spread and Shape; Five number Summary, IQR, Outliers.
- R- Scan function, Mean/trimmed Mean
- Plotting Histogram (along with options), Stem-Leaf plot.
- Installing Packages (external) and using Datasets in R
- Scatter Plot in R.
- Dice Experiment: Simulated uniform {1,2,3,4,5,6}; generated 300 samples of sums of 5 uniform random variables; plotted Histogram.
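
The dice experiment above can be sketched in a few lines. The class worked in R; the version below is a Python illustration only, and the function name, the seed, and the text histogram are assumptions made for this sketch rather than the code used in class.

```python
import random
from collections import Counter

def dice_sum_samples(n_samples=300, n_dice=5, seed=0):
    """Return n_samples sums, each the total of n_dice fair die rolls."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [sum(rng.randint(1, 6) for _ in range(n_dice))
            for _ in range(n_samples)]

sums = dice_sum_samples()
counts = Counter(sums)

# Text histogram: one '*' per observed sum
for value in range(5, 31):
    print(f"{value:2d} {'*' * counts[value]}")
```

The sums range from 5 to 30, and the histogram is typically already roughly bell-shaped around 17.5, which anticipates the Central Limit Theorem discussion in Week 5.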

Week 3 : August 8 and August 13

Quiz 2 and Homework 3

- Density Plot : how R does density estimates, choice of bandwidths.
- Box plot along with options.
- Transformation of data.
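
As a rough illustration of the density estimate and bandwidth choice mentioned above: R's `density()` defaults to a Gaussian kernel with the `bw.nrd0` rule of thumb, 0.9 times the smaller of the standard deviation and IQR/1.34, times n^(-1/5). The Python sketch below mirrors that rule under simplifying assumptions (it omits R's fallback for zero spread and evaluates the estimate point by point); the function names and sample values are hypothetical.

```python
import math
import statistics

def bw_nrd0(data):
    """Rule-of-thumb bandwidth in the style of R's bw.nrd0 (simplified)."""
    n = len(data)
    sd = statistics.stdev(data)
    # method="inclusive" matches R's default quantile definition (type 7)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

def gaussian_kde(data, x, bandwidth):
    """Kernel density estimate at x: average of Gaussian bumps centred at the data."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return total / (n * bandwidth * math.sqrt(2 * math.pi))

sample = [1.2, 2.3, 2.9, 3.1, 4.8, 5.0, 6.7]  # made-up values
h = bw_nrd0(sample)
for x in (1.0, 3.0, 5.0, 7.0):
    print(x, round(gaussian_kde(sample, x, h), 4))
```

A smaller bandwidth gives a bumpier estimate that follows individual points; a larger one gives a smoother curve that can hide features such as a second mode.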

Week 4 : August 15th (holiday) and August 20th

Quiz 3 and Homework 4

- Discussed Study on Maternal Smoking and Infant Mortality.
- Normal Distribution: calculating pdf, cdf (Tail probabilities) and generating samples in R
- Checks to see if Data is Normal: 68-95-99.7 rule; Kurtosis and Skewness; Normal Q-Q plot.
- Understanding Probabilistic Statements and their implications for an experiment.
Reading In Class Worksheet: chapter one from
Stat Labs: Mathematical Statistics Through Applications by Deborah Nolan and Terry P. Speed [Stat Labs Website] [ Stat Labs Data]
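
The 68-95-99.7 check listed for this week can be illustrated as follows. This is a Python sketch (the class used R), and the helper names are assumptions: one function gives the exact normal probabilities, the other the fraction of a data set lying within k sample standard deviations of its mean.

```python
import statistics
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """P(|Z| <= k) for a standard normal Z."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def fraction_within(data, k):
    """Fraction of observations within k sample sds of the sample mean."""
    centre = statistics.fmean(data)
    sd = statistics.stdev(data)
    return sum(abs(x - centre) <= k * sd for x in data) / len(data)

simulated = std_normal.samples(1000, seed=42)  # seed is arbitrary
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4), round(fraction_within(simulated, k), 4))
```

If the empirical fractions for real data are far from 0.68, 0.95 and 0.997, that is one informal sign that the data are not well described by a normal distribution.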

Week 5: August 22nd and August 27th

August 22nd, 2019 - slides.

- Skewed Left, Symmetric and Skewed Right distributions.
- Using Normal Q-Q plot to understand Skewness
- Generating samples from Beta and understanding Skewness.
- Empirical Distribution, sample mean as consistent and unbiased estimator of true mean.
- Sampling Distributions, Law of Large Numbers, Central Limit Theorem.
- Confidence Intervals.
Reading for Homework 5: chapter one from
Stat Labs: Mathematical Statistics Through Applications by Deborah Nolan and Terry P. Speed [Stat Labs Website] [ Stat Labs Data]
August 27th, 2019 Class Board Photos.
Stirling's Formula and DeMoivre-Laplace Central Limit Theorem by Marton Balazs and Balint Toth
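
A small simulation can make the sampling-distribution ideas of this week concrete. The Python sketch below (the course itself used R) draws many samples of size n from Uniform(0,1) and compares the spread of their means with the value sqrt(1/(12n)) predicted by theory; the sample size, number of repetitions, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

def sample_means(n, reps, seed=1):
    """Means of `reps` independent samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.random() for _ in range(n)])
            for _ in range(reps)]

n = 30
means = sample_means(n=n, reps=2000)

# Uniform(0,1) has mean 1/2 and variance 1/12
theory_sd = math.sqrt(1 / 12 / n)
print("mean of sample means:", round(statistics.fmean(means), 4))
print("sd of sample means:  ", round(statistics.stdev(means), 4))
print("theoretical sd:      ", round(theory_sd, 4))
```

Plotting a histogram of `means` would show the approximately normal shape that the Central Limit Theorem predicts.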

Week 6 : August 29th and September 3rd

Homework 6

- Topic wise (notes + slides):
- Bivariate Data
- Simple Linear Regression
- In R : Bar charts of Bivariate Data, lm function for simple linear Regression.
- In R : Simulating Normal data and observation of linear relationship for the conditional mean.
Video: Central Limit Theorem and the Normal Distribution
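
For the simple linear regression topic of this week, the least-squares line has a closed form: slope = Sxy / Sxx and intercept = ȳ − slope · x̄. In R this is what `lm(y ~ x)` reports; the Python sketch below computes the same two numbers directly. The function name and the data are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations
intercept, slope = fit_line(x, y)
print(intercept, slope)
```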

Week 7: September 5th and September 17th

Midterm Sep 6th- Solution

- In R : Simulating Law of Large Numbers
- Using optim function in R to calculate absolute deviation line.
- Working with Temperature Data and Belgium calls data.
- Comparing with robust linear regression line using rlm function.
- Simple Linear Regression: Estimator for error variance, Distribution of slope estimator.
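
The absolute deviation line mentioned above minimises the sum of |y − (a + b x)| instead of the sum of squares. In class this was done numerically with R's `optim`; the Python sketch below swaps in a different, simpler technique: since some minimiser always passes through at least two data points, it tries every line through a pair of points and keeps the best. The function names and data are hypothetical, and the approach is only practical for small data sets.

```python
from itertools import combinations

def total_abs_dev(intercept, slope, x, y):
    """Sum of absolute residuals for the line intercept + slope * x."""
    return sum(abs(yi - (intercept + slope * xi)) for xi, yi in zip(x, y))

def lad_line(x, y):
    """Return (loss, intercept, slope) of a least absolute deviation line."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # vertical line, not of the form a + b x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = total_abs_dev(intercept, slope, x, y)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best

# Four points on y = 1 + 2x plus one gross outlier (made-up data)
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 30]
loss, intercept, slope = lad_line(x, y)
print(loss, intercept, slope)
```

On this data the absolute deviation line ignores the outlier and recovers y = 1 + 2x, whereas a least-squares fit would be pulled towards the last point. That contrast is the motivation for robust alternatives such as `rlm`.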
Sep 17th Lecture by Rajesh Sundaresan
Slides
Rajesh Sundaresan is from the ECE department and the Robert Bosch Centre for Cyber-Physical Systems at the Indian Institute of Science. He works in the areas of communication, computation, and control over networks.

Perceptual Distance and Visual Search: We will discuss a visual neuroscience experiment designed to quantify the similarity between two objects as perceived by human subjects. As an example, a chair with an arm rest and a chair without an arm rest are objects that are more similar to each other than a chair with an arm rest and a table. The quantification involves attaching numbers to the similarities. We will then discuss two models for the similarity and will study how to compare them. In the process we will learn about the gamma distribution, its parameters, an equality of means test for the gamma distribution, a suitable statistic for this test, and how this statistic can be used to compare the two models.

Week 8: September 19th and September 24th

Homework 7

- Sampling distributions : $\chi^2$, $F$, and $t$.
- Confidence intervals under Normality assumption: Variance known and unknown cases. [Notes]
- Hypothesis Testing: Null, Alternate, Level of Significance and p-value. [Notes]
- Comparing means from two populations
- Testing for group means, Analysis of Variance (introduction)
- Notes
Sep 24th Lecture by Kalyani Ramachandran
Slides
Kalyani Ramachandran is an entrepreneur running a startup in diagnostics. She has a PhD in Molecular Biology and Human Genetics from Johns Hopkins University, and has obtained a grant from the Government of India for the startup's R&D.
- Title: Null hypothesis rejection: experimental data analysis
- Abstract: Hypothesis testing, specifically rejection of the null hypothesis, is a commonly used statistical inference methodology for scientific experimental data. In healthcare, it is used in many applications, including drug clinical trials. A p-value below the chosen significance level gives us grounds to reject the null hypothesis. One should check both the statistical and the experimental assumptions to interpret the data correctly.

Week 9 : September 26th and October 1st

- Testing in R: understanding $t$ distribution versus Normal distribution, prop.test in R, performing z-test in R by explicitly writing the function, and numerically performing $t$ test.
- One way Analysis of Variance: testing group means are equal or not for scaled search times from Rajesh Sundaresan Page 17 Slides
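
The explicitly written z-test mentioned above might look like the following. This is a Python sketch rather than the R function from class; it assumes a two-sided test of the mean with known population standard deviation, and the function name, data, and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: mean == mu0, with known sd sigma.

    Returns the z statistic and the p-value.
    """
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

measurements = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5, 5.3, 5.9]  # made-up data
z, p_value = z_test(measurements, mu0=5.0, sigma=0.6)
print(round(z, 3), round(p_value, 4))
```

When sigma is unknown and replaced by the sample standard deviation, the statistic follows a $t$ distribution with n − 1 degrees of freedom instead, which has heavier tails than the Normal for small n.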
Sep 27th Lecture by Rajeeva L. Karandikar
Slides
Rajeeva L. Karandikar is currently director of Chennai Mathematical Institute (for the last 9 years). He is a Probabilist, who has been active in the application of statistical ideas to real life problems. In particular he has been involved with opinion polls for Indian parliamentary and state assembly elections over the last 2 decades. Earlier, he had a long stint at ISI, as a student (M Stat & Ph D) at ISI Kolkata and as faculty at ISI Delhi for over two decades.
- Title: What determines accuracy of inference based on Sampling: Sampling fraction or Sample size
- The talk will focus on sampling and inference based on sample data. One important issue to be addressed is : how to arrive at a suitable sample size given the objective. Most people think of sample size as a proportion of the population and we will illustrate as to why that is not correct and what is relevant is the sample size and not sampling fraction.
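
The point of the talk can be checked with the textbook standard error of a sample proportion under simple random sampling without replacement, sqrt(p(1−p)/n · (N−n)/(N−1)). The Python sketch below is not taken from the slides; the function name and the population sizes are chosen only for illustration.

```python
import math

def se_proportion(p, n, population):
    """Standard error of a sample proportion, simple random sampling
    without replacement from a finite population of the given size."""
    fpc = (population - n) / (population - 1)  # finite population correction
    return math.sqrt(p * (1 - p) / n * fpc)

# Same sample size, very different populations: nearly the same accuracy
for population in (10**6, 10**8, 10**9):
    print(population, round(se_proportion(0.5, 1000, population), 5))

# Same sampling fraction (0.1%), very different accuracy
for population in (10**4, 10**6, 10**8):
    n = population // 1000
    print(population, n, round(se_proportion(0.5, n, population), 5))
```

With n = 1000 the standard error stays close to 0.0158 whatever the population size, while a fixed sampling fraction gives a poor estimate for a small population and an unnecessarily precise one for a large population.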
Oct 1st Lecture by Rituparna Sen
Slides
Rituparna Sen obtained her PhD in statistics from University of Chicago after completing BStat and MStat from ISI. She taught at University of California Davis before joining ISI.
- Title: Stylized facts of the Indian Stock Market
- Abstract: Stylized facts are properties that are common across various markets and time domains. These properties offer a way to generalize stock price behavior irrespective of the instruments used. Lists of several such stylized facts are available in the literature for the developed western markets. In this talk, we'll present the analysis of historical daily data for eleven years of the fifty constituent stocks of the NIFTY index traded on the National Stock Exchange to check for the stylized facts. It is observed that while some stylized facts of other markets are also true in Indian markets, there are some significant deviations.
- Related articles:
  - Stylized Facts of the Indian Stock Market by Rituparna Sen and Manavathi Subramaniam and
  - Empirical properties of asset returns: stylized facts and statistical issues by Rama Cont

Week 10: October 3rd and October 8th (holiday)

Oct 3rd- Board Photos.

Banana Muffin Challenge notes
- Method of Moments Estimator
- Maximum Likelihood Estimator
- Chapter 9, Section 9-9.3
  from Probability and Statistics with Examples using R
  Siva Athreya, Deepayan Sarkar, and Steve Tanner Version: April 25th, 2016
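
To contrast the two estimators listed above, here is a standard textbook case that is not necessarily the one used in class: for a sample from Uniform(0, θ), the method of moments estimator is twice the sample mean (because E[X] = θ/2), while the maximum likelihood estimator is the sample maximum. The Python sketch below, with hypothetical names and an arbitrary seed, computes both on simulated data.

```python
import random
import statistics

def mom_uniform_upper(sample):
    """Method of moments estimate of theta for Uniform(0, theta)."""
    return 2 * statistics.fmean(sample)

def mle_uniform_upper(sample):
    """Maximum likelihood estimate of theta for Uniform(0, theta)."""
    return max(sample)

rng = random.Random(7)   # arbitrary seed
true_theta = 10.0        # assumed value for the simulation
sample = [rng.uniform(0, true_theta) for _ in range(500)]

print("MoM:", round(mom_uniform_upper(sample), 3))
print("MLE:", round(mle_uniform_upper(sample), 3))
```

The MLE can never exceed the true θ, so it is biased downwards, while the method of moments estimate is unbiased but can fall below the largest observation.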

Week 11: October 10th and October 15th

- Erdős-Rényi Graphs.
- M.L.E. for connection Probabilities
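
In the Erdős-Rényi model G(n, p), each of the n(n−1)/2 possible edges is present independently with probability p, so the likelihood is that of independent Bernoulli trials and the M.L.E. of p is the observed number of edges divided by n(n−1)/2. The Python sketch below simulates a graph and computes this estimate; the function names, graph size, and seed are assumptions for illustration.

```python
import random
from itertools import combinations
from math import comb

def simulate_er_graph(n, p, seed=0):
    """Edge list of an Erdos-Renyi graph on vertices 0..n-1."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

def mle_connection_probability(n, edges):
    """MLE of p: observed edges over the number of possible edges."""
    return len(edges) / comb(n, 2)

n_vertices = 100
edges = simulate_er_graph(n_vertices, p=0.3)
p_hat = mle_connection_probability(n_vertices, edges)
print(len(edges), round(p_hat, 4))
```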
October 10th Lecture by Dootika Vats
Slides
Dootika did her undergraduate in Mathematics from Lady Shri Ram College, Delhi University, Masters in Statistics from Rutgers University, and PhD in Statistics from the University of Minnesota, Twin-Cities. She then did a two year postdoc at the University of Warwick in England, before joining as an Assistant Prof in the Department of Mathematics and Statistics at IIT Kanpur in July 2019. She works in the area of Bayesian computation and specifically on Markov chain Monte Carlo algorithms.

We will introduce the concept of Monte Carlo simulations and Monte Carlo integration from both a statistical perspective, and from its utility in practically relevant problems. As with most things in statistics, the driving question in many Monte Carlo integration systems is the problem of quantifying variability. We will discuss the challenges in quantifying variability and constructing confidence intervals for multidimensional Monte Carlo problems, discussing the problem of multiple comparisons.
- $\chi^2$-goodness of fit test.
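
The $\chi^2$ goodness of fit statistic compares observed counts O_i with the counts E_i expected under the hypothesised distribution: the statistic is the sum of (O_i − E_i)² / E_i, with k − 1 degrees of freedom when no parameters are estimated. The Python sketch below computes it for a made-up die-rolling example; the function name and counts are hypothetical.

```python
def chi_square_gof(observed, expected_probs):
    """Return the chi-square goodness of fit statistic and its degrees of freedom.

    observed: list of category counts
    expected_probs: hypothesised category probabilities (fully specified)
    """
    total = sum(observed)
    statistic = sum((o - total * p) ** 2 / (total * p)
                    for o, p in zip(observed, expected_probs))
    return statistic, len(observed) - 1

# 60 rolls of a die, testing the hypothesis that it is fair (made-up counts)
observed = [8, 12, 9, 11, 10, 10]
statistic, df = chi_square_gof(observed, [1 / 6] * 6)
print(statistic, df)
```

At the 5% level the critical value for 5 degrees of freedom is about 11.07, so a statistic this small gives no evidence against fairness.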

Week 12 : October 17th and October 22nd

Quiz 10 [ Solutions]
Homework 11

- $\chi^2$-test for independence.
- Testing for slope in Simple Linear Regression.
- Back to Basics: understanding Correlation
- Linear Regression

Week 13: October 24th and October 29th

Class Board Photos :
(October 24th) Set 1 Set 2
Class Board Photos :
(October 29th) Photos

Quiz 11 [ Solutions]
Homework 12

- October 24th, 2019: Simulating Random Variables from Uniform $(0,1)$
- October 29th, 2019: Bootstrap and Jackknife methods.
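
Both resampling methods listed above estimate the standard error of a statistic from the data alone. The bootstrap recomputes the statistic on samples drawn with replacement; the jackknife recomputes it leaving out one observation at a time. The Python sketch below (the class used R) applies both to the sample mean; the function names, data, number of replicates, and seed are illustrative assumptions.

```python
import math
import random
import statistics

def bootstrap_se(data, stat, reps=2000, seed=0):
    """Bootstrap standard error: sd of stat over resamples with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [stat(rng.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(estimates)

def jackknife_se(data, stat):
    """Jackknife standard error from the n leave-one-out estimates."""
    n = len(data)
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    centre = statistics.fmean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out))

data = [2.3, 4.1, 3.8, 5.0, 2.9, 4.4, 3.3, 6.1, 3.9, 4.7]  # made-up sample
print("bootstrap SE:", round(bootstrap_se(data, statistics.fmean), 4))
print("jackknife SE:", round(jackknife_se(data, statistics.fmean), 4))
print("s / sqrt(n): ", round(statistics.stdev(data) / math.sqrt(len(data)), 4))
```

For the sample mean the jackknife reproduces the usual s/√n exactly, which makes it a convenient sanity check before applying either method to statistics without a simple formula, such as the median.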
October 24th Lecture by Arjun Gopalaswamy
Slides
After completing his B.E in Industrial Engineering and Management from BMS College of Engineering, Bangalore University, he changed tracks and went on to do his Master's in Wildlife Ecology and Conservation at the University of Florida, Gainesville with a Minor in Statistics. He went on to do his D.Phil(PhD) at the University of Oxford. He is currently the Science Advisor, Global Programs, Wildlife Conservation Society (USA) and carries out research projects on iconic wildlife, such as tigers, lions, cheetahs and elephants in Asia and Africa. He specializes in the field of statistical ecology and is involved in developing innovative statistical models to help understand wildlife populations better.

The fields of ecology and the environmental sciences restrict our use of hypothesis testing for drawing meaningful inference. A primary reason for this is that we rarely conduct "experiments" in these disciplines, because we simply cannot due to the primary issue of scale. In this lecture, I will talk about how this fundamental issue of scale called for a change in how ecologists practice their craft and how statisticians (and "statistical ecologists") helped them do so. In the process we will discuss how the practice forced a shift from hypothesis testing paradigms to hypothesis discrimination paradigms using model selection and how hierarchical models replaced conventional models for experiments with real data. As I work on issues related to wildlife, I will draw some examples from wildlife ecology and conservation and will discuss one such likelihood (called occupancy models) that hugely benefited wildlife applications and extended to a range of other disciplines.
October 29th Lecture by V. Venugopal
Slides
Trained as a civil engineer (specialising in Hydrology), Venu is at the Centre for Atmospheric and Oceanic Sciences, IISc. His present interests are to quantify and understand the space-time characteristics of tropical rain and its extremes.

We will try to quantify/understand patterns of tropical rain using satellite-retrievals. We will begin with the question: At any given instant in time, what fraction of the tropics receives rain? We will then try to build a story around that to identify a few "intuitively appealing" statistical attributes of rain. Time permitting, we will see how the (probability) distribution of rain behaves as we aggregate in space and/or time.

Week 14: October 31st and November 5th

- Using Random Number Table.
- Simulating Geometric and Binomial data.
- Worksheet on:
  - $\chi^2$ goodness of fit,
  - Maximum Likelihood Estimator, and
  - Bootstrap
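
Simulating Geometric and Binomial data from Uniform $(0,1)$ values (whether read from a random number table or generated by software) can be sketched as follows. This Python illustration uses the inverse transform for the Geometric distribution, counting trials up to and including the first success, and builds a Binomial value by counting how many uniforms fall below p. The function names and parameter values are assumptions.

```python
import math
import random

def geometric_from_uniform(u, p):
    """Geometric(p) value (number of trials until first success) from u in [0, 1)."""
    # P(X > k) = (1 - p)^k, so invert using 1 - u, which lies in (0, 1]
    return math.floor(math.log(1 - u) / math.log(1 - p)) + 1

def binomial_from_uniforms(uniforms, p):
    """Binomial(len(uniforms), p) value: count of uniforms below p."""
    return sum(1 for u in uniforms if u < p)

rng = random.Random(3)  # arbitrary seed
geo_draws = [geometric_from_uniform(rng.random(), 0.25) for _ in range(5000)]
bin_draws = [binomial_from_uniforms([rng.random() for _ in range(10)], 0.4)
             for _ in range(5000)]

print("geometric mean (theory 4):", sum(geo_draws) / len(geo_draws))
print("binomial mean (theory 4): ", sum(bin_draws) / len(bin_draws))
```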

Final Exam week

Final Exam- Solution

Last Modified: October 14th, 2019.
