Three Months of Staring at Data

Abhiti, Nitya, and Siva

Plan of Talk

  • PART I: Website

  • PART II: Karnataka in Focus

  • PART III: Effective Basic Reproduction Number and Dispersion


  • Tracking the COVID-19 infections

    1. Union Ministry of Health and Family Welfare.
    2. State of Karnataka : Daily Media Bulletins.

  • There are six aspects to the portal

    1. State Timelines

    2. Doubling time

    3. Exponential Growth

    4. Karnataka: Understanding from detailed Media Briefs

      • Trace History of Karnataka (Paused Yesterday)

      • Data Analysis: Age distribution, ICU utilisation, Testing

    5. Data Repository

State Time Line : Four Plots

  • Infected-Recovered-Deceased time line.

  • Infection time line in Log Scale.

  • Doubling time.

  • Tracking Exponential Growth.

State: Infected-Recovered-Deceased time line.

  • Stacked Bar plot
    • The red portion corresponds to the number of deceased cases,
    • the green corresponds to the recoveries and
    • the orange corresponds to the number of active cases.

State: Infection time line in Log Scale

  • Graph: Infection timeline of Karnataka on the log scale.
  • Reading: On 27th June, the graph is around 4, which means that there are around $10^4$ total infections

State : Tracking Exponential Growth (Karnataka).

  • Exponential growth will follow a straight line with slope 1 on this graph.

Exponential Growth :

  • Graph:
    - $y$-axis: A 7-day moving average of the daily increase in cases, centred on a given day
    - $x$-axis: Total number of cases
  • The rate of exponential growth shows in the spacing of the points: points spaced further apart imply a larger rate of growth, and points closer together imply a smaller rate of growth.

  • Detailed Notes are available on our website.
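
A minimal sketch of how the points on this graph could be computed is given below. The function name `growth_plot_points`, the centred window, and the synthetic series are assumptions for illustration, not necessarily what the website does. When the total grows exponentially, the averaged daily increase is a fixed multiple of the total, which is why such points fall on a line of slope 1 when both axes are on a log scale.

```python
def growth_plot_points(cumulative, window=7):
    """Return (total cases, centred moving average of daily increase) pairs.

    `cumulative` is a list of cumulative case counts, one per day.
    """
    # daily[i] is the increase from day i to day i + 1
    daily = [cumulative[i + 1] - cumulative[i] for i in range(len(cumulative) - 1)]
    half = window // 2
    points = []
    for i in range(half, len(daily) - half):
        average = sum(daily[i - half:i + half + 1]) / window
        points.append((cumulative[i + 1], average))
    return points


# Synthetic series growing by 10% per day (illustrative data only)
cumulative = [100 * 1.1 ** t for t in range(30)]
points = growth_plot_points(cumulative)
ratios = [average / total for total, average in points]
print(min(ratios), max(ratios))  # constant ratio under exponential growth
```
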


State : Doubling time.

  • Doubling time:

    How long ago the infection count was half of the current count.

  • Reading:

    In Lockdown Phase 2, doubling time increases but at the end of Lockdown Phase 3 and 4, doubling time dips sharply.
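
The definition above can be sketched directly in code. This is an illustration under stated assumptions: the function name `doubling_time` is hypothetical, the series is one cumulative count per day, and "half" is taken as the most recent day on which the count was at or below half of the current count.

```python
def doubling_time(cumulative, day):
    """Days since the cumulative count was last at or below half its value on `day`.

    Returns None when the recorded series never goes that low.
    """
    half = cumulative[day] / 2
    for past in range(day, -1, -1):
        if cumulative[past] <= half:
            return day - past
    return None


# Illustrative series, not real data
series = [10, 12, 15, 20, 24, 30, 40]
print(doubling_time(series, 6))  # count was 20 (half of 40) three days earlier
```

A longer doubling time means slower growth, which is how the lockdown-phase reading above should be interpreted.
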

Doubling time

Data Sources

  • All India, States and Union Territories Timeline : Union Ministry of Health and Family Welfare.

  • Karnataka : State Union Ministry of Health and Family Welfare, Media Briefs.

  • Tamil Nadu and Kerala : State Union Ministry of Health and Family Welfare, Media Briefs.

  • Maharashtra : National Institute of Disaster Management.

  • All are provided in html or pdf format

Data Repository:

  • All files are in csv format

    • Summary for All India

    • Karnataka Trace History

    • Karnataka Hospitalization information

    • Karnataka Hospitalization information - Consolidated

    • Karnataka Testing information

    • District-Wise Information (of 5 states)

      -- Managed entirely by Ms. Kusuma, N.R. and Ms. Asha Latha at the Indian Statistical Institute, Bangalore centre.

Publicly available, with an open invitation to use it.


Karnataka in Focus

Media Bulletins of Karnataka Government

Contain the following:

  • Testing and Screening: Total tests and positives; screening at air and sea ports.

  • Test-positive Cases: Date, age, sex, district, and either the reason for contracting the disease or the reason for being tested.

  • Discharge Details: Recovered and deceased

  • Hospital Information: Consolidated information on number of patients in ICU

Age Distribution

  • Orange indicates the age distribution of the coronavirus patients in Karnataka.
  • The navy outline represents the age distribution of the entire population of Karnataka.

Testing Data

  • The graph shows the total number of positive tests up to each day as a percentage of the cumulative tests done up to that day.
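
As a small sketch of this calculation (the function name `cumulative_positivity` and the sample numbers are illustrative assumptions, and the first day is assumed to have at least one test):

```python
from itertools import accumulate


def cumulative_positivity(daily_tests, daily_positives):
    """Cumulative positives as a percentage of cumulative tests, one value per day."""
    total_tests = accumulate(daily_tests)
    total_positives = accumulate(daily_positives)
    return [100.0 * p / t for p, t in zip(total_positives, total_tests)]


# Illustrative numbers only
percentages = cumulative_positivity([100, 100, 200], [5, 15, 20])
print(percentages)
```
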

I.C.U. utilisation

  • This is a timeline of the number of patients in I.C.U.

Trace History Clusters (Media Bulletin Reason)

  • TJ Congregation in Delhi
  • From USA
  • From South America
  • From Rest of Europe
  • From Middle East
  • From United Kingdom
  • From Rajasthan
  • From Maharashtra
  • Influenza like illness
  • Severe Acute Respiratory Infections
  • Unknown
  • Others
  • Containment Zones
  • Pharmaceutical company at Nanjangud
  • From Southern States
  • From Gujarat

Age versus Days to Cure: Scatter Plot

  • $x$-axis: Days between a patient being confirmed positive with COVID-19 and their recovery
  • $y$-axis: Age.
  • The colors represent each patient's cluster of origin.

Age versus Days to Deceased

  • $x$-axis: Days taken to succumb to the disease
  • $y$-axis: Age.
  • The colors represent their cluster of origin.
  • Deceased on Day 0: Tested positive posthumously.

Trace History

We created a graphical representation of the Karnataka trace history as a tree diagram.

  • The first-generation nodes are called the parents of the cluster.

  • The children are the people who contracted the disease from the parents; they are placed at depth two in the trace history chart.

  • Similarly, grandchildren and great-grandchildren have depth three and four respectively.
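
The depth convention above can be sketched as follows. The mapping `infected_by`, the patient labels, and the function name `generation_depths` are hypothetical and chosen only for illustration; the actual trace history files may be organised differently.

```python
def generation_depths(infected_by):
    """Map each patient to a depth: parents are 1, children 2, grandchildren 3, ...

    `infected_by[p]` is the patient who infected p, or None for a cluster parent.
    """
    depths = {}

    def depth(patient):
        if patient not in depths:
            source = infected_by[patient]
            depths[patient] = 1 if source is None else depth(source) + 1
        return depths[patient]

    for patient in infected_by:
        depth(patient)
    return depths


# Hypothetical trace history: P1 is a parent, P5 a great-grandchild
infected_by = {"P1": None, "P2": "P1", "P3": "P1", "P4": "P2", "P5": "P4"}
depths = generation_depths(infected_by)
print(depths)
```
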


Parent-Children-Grandchildren: From Maharashtra


Active-Cured-Deceased: From Maharashtra



Effective Basic Reproduction Number and Dispersion

  • Work in Progress:
      • 3 months of Contact Tracing data available.
      • Clusters can be seen to grow and die out.
      • Contact tracing, Testing and Quarantine Benefits are seen clearly.

20-80 Rule

  • For most infectious diseases:

    • most of the new infections are attributed to very few individuals, and

    • most infected individuals cause almost no infections.

  • Karnataka data: the 3rd May cases along with their descendants

    • $864$ patients observed by us here,

    • $683$ of them (about 79%) caused $0$ new infections.

  • 20% of cases cause 80% of transmission

20-80 Rule: 3-May along with their descendants.


  • We order these individuals by their individual infectiousness, from most to least infectious.

  • Consider the top fraction $x$ of these individuals; say they cause a fraction $y$ of the total infections.

  • $(x,y)$ has been plotted here.

The above calculations are preliminary and are being verified.
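
A minimal sketch of how such $(x, y)$ points could be computed from the number of children of each patient is shown below. The function name `concentration_curve` and the ten sample counts are illustrative assumptions (not the Karnataka data), and the total number of infections is assumed to be positive.

```python
def concentration_curve(children_counts):
    """Return (x, y) pairs: the top fraction x of individuals causes fraction y of infections."""
    ranked = sorted(children_counts, reverse=True)  # most infectious first
    total = sum(ranked)
    n = len(ranked)
    curve = []
    running = 0
    for rank, count in enumerate(ranked, start=1):
        running += count
        curve.append((rank / n, running / total))
    return curve


# Illustrative counts: ten individuals, eight of whom infect nobody
curve = concentration_curve([0, 0, 8, 0, 0, 2, 0, 0, 0, 0])
print(curve[:2])
```
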

Basic Reproduction Number

  • Central to the understanding of epidemic spread is the basic reproductive number, $R_0$, defined as the mean number of infections caused by an infected individual in a susceptible population.
  • We will analyse cluster data

    • For each cluster we have precise information of how the infection was assigned.

    • Compute the distribution of the number of children per infected person.

    • Compute the mean of this distribution; this is $R_0$.

  • Is $R_0$ alone a good measure of the spread of the disease?

    • Recall 20-80 Rule

    • Must take note of Variance.
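
The steps above can be sketched as follows. The mapping `infected_by` and the patient labels are hypothetical, and the population variance is used here as an assumption; the slides do not say which variance estimator was used.

```python
from collections import Counter
from statistics import mean, pvariance


def children_counts(infected_by):
    """Number of direct infections attributed to each patient, zeros included.

    `infected_by[p]` is the patient who infected p, or None for a cluster parent.
    """
    counts = Counter(source for source in infected_by.values() if source is not None)
    return [counts.get(patient, 0) for patient in infected_by]


# Hypothetical cluster of five patients
infected_by = {"P1": None, "P2": "P1", "P3": "P1", "P4": "P2", "P5": "P4"}
counts = children_counts(infected_by)
r0_estimate = mean(counts)        # mean number of children
spread = pvariance(counts)        # population variance of the same distribution
print(counts, r0_estimate, spread)
```

Reporting the variance next to the mean is what lets two clusters with a similar $R_0$ be told apart when one of them is driven by a few highly infectious individuals.
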

Summary Table

| Cluster | Size | Zeros | Maximum | $R_0$ | Variance |
|---|---|---|---|---|---|
| Unknown | 1195 | 1080 | 27 | 0.2686 | 1.729 |
| Pharmaceutical Company | 73 | 53 | 24 | 0.726 | 8.757 |
| From the Southern States | 271 | 243 | 7 | 0.214 | 0.68 |
| Others | 670 | 564 | 51 | 0.5299 | 6.817 |
| TJ Congregation | 97 | 70 | 15 | 0.7732 | 3.823 |
| SARI | 717 | 599 | 45 | 0.6862 | 8.822 |
| ILI | 1211 | 1072 | 30 | 0.3543 | 2.553 |
| Containment Zones | 290 | 252 | 7 | 0.3414 | 1.201 |

The above calculations are preliminary and are being verified.

Children Histogram for Containment Zones Cluster

  • $R_0= 0.3414$, Variance = $1.201$


  • To account for stochastic effects in transmission, one considers:

    • a Poisson distribution whose rate follows a Gamma distribution (a Poisson-Gamma mixture),

    • equivalently, a Negative Binomial with mean $R_0$ and dispersion $k$.

The above calculations are preliminary and are being verified.
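
For reference, a sketch of the standard Negative Binomial parametrisation with mean $R_0$ and dispersion $k$ follows. The symbol $Z$ for the number of children of one infected person is introduced here for illustration and does not appear elsewhere in the talk.

$$
P(Z = z) = \frac{\Gamma(z + k)}{z!\,\Gamma(k)} \left(\frac{k}{k + R_0}\right)^{k} \left(\frac{R_0}{k + R_0}\right)^{z}, \qquad z = 0, 1, 2, \dots
$$

$$
\mathbb{E}[Z] = R_0, \qquad \operatorname{Var}(Z) = R_0 + \frac{R_0^2}{k}
$$

For a fixed mean, a smaller $k$ gives a larger variance, and as $k \to \infty$ the distribution approaches a Poisson with mean $R_0$.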

Maximum Likelihood Method.

  • Use log likelihood function of Negative Binomial given the data

  • Find the most likely value for $R_0$ and $k$ given the data

  • Calculus and Numerical approximation.
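
A minimal sketch of such a fit is given below, using only the standard library. It relies on the known fact that, in the mean and dispersion parametrisation, the maximum likelihood estimate of the mean is the sample mean, so only $k$ needs a numerical search. The function names, the golden-section search on $\log k$, the search bounds, and the sample counts are all assumptions for illustration, not the method actually used for the talk; the search also assumes the data are overdispersed (variance larger than the mean) and the mean is positive.

```python
import math


def nb_log_likelihood(counts, mean, k):
    """Log-likelihood of `counts` under a Negative Binomial with the given mean and dispersion k."""
    log_p = math.log(k / (k + mean))
    log_q = math.log(mean / (k + mean))
    return sum(
        math.lgamma(c + k) - math.lgamma(k) - math.lgamma(c + 1) + k * log_p + c * log_q
        for c in counts
    )


def fit_negative_binomial(counts, log_k_low=-10.0, log_k_high=10.0, tol=1e-8):
    """Return (mean, k): sample mean and the k maximising the likelihood on the search interval."""
    mean = sum(counts) / len(counts)

    def objective(log_k):
        return nb_log_likelihood(counts, mean, math.exp(log_k))

    # Golden-section search for the maximum over log k
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log_k_low, log_k_high
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if objective(c) > objective(d):
            b = d
        else:
            a = c
    return mean, math.exp((a + b) / 2.0)


# Illustrative counts of children per patient (not the Karnataka data)
sample = [0] * 80 + [1] * 10 + [2] * 5 + [5] * 3 + [10] * 2
mean_hat, k_hat = fit_negative_binomial(sample)
print(mean_hat, k_hat)
```
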

Containment Zone Cluster Estimates

  • $R_0= 0.3414$

  • $k = 0.09345$

The above calculations are preliminary and are being verified.

$\chi^2$ - Goodness of Fit


  • For the Containment Zones cluster, we performed the $\chi^2$ goodness-of-fit test on the fitted Negative Binomial and found $p$-value = $0.5966$.

The above calculations are preliminary and are being verified.
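
A sketch of the Pearson statistic behind such a test is shown below. The binning (one bin per count below `last_bin`, plus one pooled tail bin), the function names, and the sample counts are assumptions for illustration; the talk does not state how the bins were chosen. Turning the statistic into a $p$-value needs the $\chi^2$ survival function (for example `scipy.stats.chi2.sf`) with degrees of freedom equal to the number of bins minus one, minus the two estimated parameters; that step is left out here to keep the example dependency-free.

```python
import math


def nb_pmf(z, mean, k):
    """Negative Binomial probability of z children, with the given mean and dispersion k."""
    log_prob = (
        math.lgamma(z + k) - math.lgamma(k) - math.lgamma(z + 1)
        + k * math.log(k / (k + mean)) + z * math.log(mean / (k + mean))
    )
    return math.exp(log_prob)


def chi_square_statistic(counts, mean, k, last_bin=3):
    """Pearson statistic with bins 0, 1, ..., last_bin - 1 and a pooled bin for >= last_bin."""
    observed = [0] * (last_bin + 1)
    for c in counts:
        observed[min(c, last_bin)] += 1
    probabilities = [nb_pmf(z, mean, k) for z in range(last_bin)]
    probabilities.append(1.0 - sum(probabilities))  # probability of the pooled tail
    expected = [len(counts) * p for p in probabilities]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative counts, compared against the Containment Zones estimates quoted above
sample = [0] * 87 + [1] * 7 + [2] * 3 + [4] * 2 + [7]
statistic = chi_square_statistic(sample, mean=0.3414, k=0.09345)
print(statistic)
```
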

All Clusters Summary


The above calculations are preliminary and are being verified.

Inference Summary Table

| Cluster | Size | Maximum | $R_0$ | Variance | $k$ | $p$-value |
|---|---|---|---|---|---|---|
| Containment Zones | 293 | 7 | 0.3447 | 1.199 | 0.09345 | 0.5966 |
| ILI | 1398 | 30 | 0.3369 | 2.355 | 0.06428 | 0.736 |
| SARI | 746 | 45 | 0.6743 | 8.521 | 0.08023 | 0.4698 |
| TJ Congregation | 97 | 15 | 0.7732 | 3.823 | 0.2138 | 0.1138 |
| Others | 707 | 51 | 0.5191 | 6.502 | 0.09214 | 0.8318 |
| From the Southern States | 286 | 7 | 0.2168 | 0.6897 | 0.08424 | 0.5409 |
| Pharmaceutical Company | 73 | 24 | 0.726 | 8.757 | 0.1839 | 0.002671 |
| Unknown | 1295 | 27 | 0.2625 | 1.637 | 0.05792 | 0.1225 |
| Total | 4895 | 51 | 0.4029 | 3.682 | 0.07344 | 0.02011 |

The above calculations are preliminary and are being verified.

Final Comments

Thanks to Karnataka Government Staff

  • For providing detailed media briefs from March 9th, 2020.
  • The June 27th media brief had 918 new cases and did not include contact tracing information.

  • Hopefully the contact tracing details will resume.

Positives of above effort

  • When the effective $R_0 < 1$, the cluster will die out.

  • The data clearly shows the effect of contact tracing and quarantine measures.

  • Dispersion $k$ can be used to understand super spreading events.

Thank you

Thanks for attending the talk

Visit the website and please use the data

Give us Feedback