[DEMO] COVID-19 Data Analysis
Tracking pandemic trends through data visualization and statistical analysis
COVID-19 Analysis Project
This project examines various aspects of the COVID-19 pandemic through data analysis and visualization. Our goal is to understand the spread patterns, mortality rates, and vaccination progress.
Case Analysis
The following analysis examines simulated COVID-19 case data: it tracks cumulative and daily new cases and looks at the correlation between cases and deaths.
Jupyter Notebook
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Set the aesthetic style of the plots
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
# Sample COVID-19 data (simplified)
dates = pd.date_range(start='2022-01-01', periods=90)
cases = np.cumsum(np.random.poisson(lam=[30] * 30 + [20] * 30 + [10] * 30))
deaths = np.cumsum(np.random.poisson(lam=[2] * 30 + [1.5] * 30 + [0.8] * 30))
recovered = np.cumsum(np.random.poisson(lam=[10] * 20 + [25] * 40 + [35] * 30))
# Create a DataFrame
covid_data = pd.DataFrame({
'Date': dates,
'Cases': cases,
'Deaths': deaths,
'Recovered': recovered
})
# Display the first few rows
print("COVID-19 Data Sample:")
display(covid_data.head())
COVID-19 Data Sample:
| | Date | Cases | Deaths | Recovered |
|---|---|---|---|---|
| 0 | 2022-01-01 | 27 | 1 | 11 |
| 1 | 2022-01-02 | 55 | 4 | 23 |
| 2 | 2022-01-03 | 85 | 7 | 37 |
| 3 | 2022-01-04 | 117 | 9 | 44 |
| 4 | 2022-01-05 | 142 | 9 | 55 |
# Plot cumulative cases
plt.figure(figsize=(12, 6))
plt.plot(covid_data['Date'], covid_data['Cases'], color='#FF9999', linewidth=3, label='Cumulative Cases')
plt.title('Cumulative COVID-19 Cases', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Number of Cases', fontsize=12)
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.savefig('covid_cases.png')
plt.show()
# Plot daily new cases
daily_cases = covid_data['Cases'].diff().fillna(covid_data['Cases'].iloc[0])
plt.figure(figsize=(12, 6))
plt.bar(covid_data['Date'], daily_cases, color='#FF6666', alpha=0.7, label='Daily New Cases')
plt.title('Daily New COVID-19 Cases', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Number of New Cases', fontsize=12)
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.savefig('covid_daily_cases.png')
plt.show()
# Analyze correlation between cases and deaths
plt.figure(figsize=(8, 8))
sns.scatterplot(x='Cases', y='Deaths', data=covid_data, s=100, alpha=0.7)
plt.title('Correlation between Cases and Deaths', fontsize=16)
plt.xlabel('Cumulative Cases', fontsize=12)
plt.ylabel('Cumulative Deaths', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('covid_correlation.png')
plt.show()
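The scatter plot shows the relationship visually; a correlation coefficient puts a number on it. The sketch below is a hedged illustration, not part of the original notebook: it rebuilds a stand-in data set with a fixed seed (the seed and the name `demo` are assumptions) and compares the Pearson correlation of the cumulative series with that of the daily increments.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for covid_data, simulated with a fixed seed for reproducibility
rng = np.random.default_rng(0)
new_cases = rng.poisson(lam=[30] * 30 + [20] * 30 + [10] * 30)
new_deaths = rng.poisson(lam=[2] * 30 + [1.5] * 30 + [0.8] * 30)

demo = pd.DataFrame({
    "Cases": np.cumsum(new_cases),
    "Deaths": np.cumsum(new_deaths),
})

# Pearson correlation of the cumulative series
# (both series only ever increase, so this is high by construction)
r_cumulative = demo["Cases"].corr(demo["Deaths"])

# Pearson correlation of the daily increments (NaN in the first row is ignored)
r_daily = demo["Cases"].diff().corr(demo["Deaths"].diff())

print(f"Cumulative series: r = {r_cumulative:.3f}")
print(f"Daily increments:  r = {r_daily:.3f}")
```

Because cumulative totals are monotone, their correlation says little on its own; the correlation of daily increments is usually the more informative figure.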
# Basic R0 calculation (simplified model)
# Assume serial interval of 5 days
serial_interval = 5
# Ratio of new cases one serial interval apart (denominator floored at 1 to avoid division by zero)
growth_rate = np.mean([daily_cases[i+serial_interval]/max(daily_cases[i], 1)
for i in range(len(daily_cases)-serial_interval)])
# With a fixed serial interval, this ratio approximates the number of new cases
# generated per earlier case, so it serves directly as the reproduction-number estimate
r0 = growth_rate
print(f"Estimated basic reproduction number (R0): {r0:.2f}")
Estimated basic reproduction number (R0): 1.05
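As a cross-check on the simplified estimate above, one common textbook approach fits an exponential growth rate to daily counts and converts it to a reproduction number. The sketch below assumes a fixed serial interval T, in which case R is approximately exp(r * T). The function name `estimate_r_from_growth` and the synthetic input are hypothetical and not part of the original notebook.

```python
import numpy as np

def estimate_r_from_growth(daily_counts, serial_interval=5.0):
    """Fit log(counts) ~ intercept + rate * day, then return (R, rate).

    Assumes exponential growth and a fixed serial interval, so R = exp(rate * T).
    """
    counts = np.asarray(daily_counts, dtype=float)
    days = np.arange(len(counts))
    # Days with zero counts are dropped because log(0) is undefined
    mask = counts > 0
    rate, _intercept = np.polyfit(days[mask], np.log(counts[mask]), deg=1)
    return float(np.exp(rate * serial_interval)), float(rate)

# Synthetic input: exact exponential growth at 5% per day (illustrative only)
true_rate = 0.05
synthetic_counts = 100 * np.exp(true_rate * np.arange(30))

r_est, rate_est = estimate_r_from_growth(synthetic_counts, serial_interval=5)
print(f"Fitted daily growth rate: {rate_est:.3f}")
print(f"Reproduction number under the fixed-interval assumption: {r_est:.2f}")
```

A different assumed shape for the serial-interval distribution gives a different conversion formula, so the result should be read as a rough figure.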
# Summary statistics
summary = covid_data[['Cases', 'Deaths', 'Recovered']].describe()
display(summary)
| | Cases | Deaths | Recovered |
|---|---|---|---|
| count | 90.000000 | 90.000000 | 90.000000 |
| mean | 1083.366667 | 77.622222 | 931.100000 |
| std | 521.758122 | 41.954116 | 696.491078 |
| min | 27.000000 | 1.000000 | 11.000000 |
| 25% | 680.750000 | 45.250000 | 263.000000 |
| 50% | 1181.000000 | 82.500000 | 845.500000 |
| 75% | 1547.000000 | 115.750000 | 1493.000000 |
| max | 1763.000000 | 137.000000 | 2279.000000 |
# Calculate and plot Case Fatality Rate over time
covid_data['CFR'] = (covid_data['Deaths'] / covid_data['Cases']) * 100
plt.figure(figsize=(12, 6))
plt.plot(covid_data['Date'], covid_data['CFR'], color='#6666FF', linewidth=3)
plt.title('COVID-19 Case Fatality Rate Over Time', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Case Fatality Rate (%)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('covid_cfr.png')
plt.show()
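The CFR plotted above is cumulative, so each point averages over every day up to that date and reacts slowly to recent changes. A windowed variant, sketched below with small made-up numbers, divides recent deaths by recent cases instead. The column names `CFR_cumulative` and `CFR_window` and the 3-day window are illustrative assumptions.

```python
import pandas as pd

# Hypothetical cumulative totals, chosen so the arithmetic is easy to follow
demo = pd.DataFrame({
    "Cases":  [10, 30, 60, 100, 150, 210],
    "Deaths": [1, 2, 3, 4, 5, 6],
})

# Cumulative CFR: all deaths so far divided by all cases so far
demo["CFR_cumulative"] = demo["Deaths"] / demo["Cases"] * 100

# Recover daily increments from the cumulative totals
new_cases = demo["Cases"].diff().fillna(demo["Cases"].iloc[0])
new_deaths = demo["Deaths"].diff().fillna(demo["Deaths"].iloc[0])

# Windowed CFR: deaths in the last `window` days divided by cases in the same days
window = 3
demo["CFR_window"] = (
    new_deaths.rolling(window).sum() / new_cases.rolling(window).sum() * 100
)

print(demo)
```

Neither version accounts for the delay between a case being reported and a death occurring, so both remain rough indicators.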
Vaccination Progress
Next, we analyze vaccination progress, examining daily vaccination rates, cumulative totals, and weekly patterns that emerged during the vaccination campaign.
R Markdown
COVID-19 Vaccination Analysis
Data Analyst
2023-06-01
COVID-19 Vaccination Progress
In this analysis, we’ll examine simulated COVID-19 vaccination data to understand patterns and progress.
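# Load the packages used below (dplyr for %>% and mutate, ggplot2 for plotting)
library(dplyr)
library(ggplot2)
# Generate sample data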
set.seed(123)
dates <- seq(as.Date("2022-01-01"), as.Date("2022-03-31"), by = "day")
n_days <- length(dates)
# Simulate daily vaccination data
vaccination_data <- data.frame(
Date = dates,
DailyVaccinations = c(
rpois(30, lambda = 1000),
rpois(30, lambda = 2000),
rpois(n_days - 60, lambda = 1500)
)
)
# Calculate cumulative vaccinations
vaccination_data <- vaccination_data %>%
mutate(CumulativeVaccinations = cumsum(DailyVaccinations))
# View the first few rows
head(vaccination_data)
## Date DailyVaccinations CumulativeVaccinations
## 1 2022-01-01 982 982
## 2 2022-01-02 1037 2019
## 3 2022-01-03 946 2965
## 4 2022-01-04 1004 3969
## 5 2022-01-05 1054 5023
## 6 2022-01-06 1014 6037
Vaccination Trends
Let’s visualize the daily vaccination numbers:
ggplot(vaccination_data, aes(x = Date, y = DailyVaccinations)) +
geom_bar(stat = "identity", fill = "#4CAF50", alpha = 0.7) +
geom_smooth(method = "loess", color = "#2196F3", se = FALSE) +
labs(
title = "Daily COVID-19 Vaccinations",
x = "Date",
y = "Number of Vaccinations"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10)
)
Now, let’s look at the cumulative vaccination progress:
ggplot(vaccination_data, aes(x = Date, y = CumulativeVaccinations)) +
geom_line(color = "#E91E63", linewidth = 1.5) +
labs(
title = "Cumulative COVID-19 Vaccinations",
x = "Date",
y = "Total Vaccinations"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10)
)
Vaccination Rate Analysis
Let’s calculate and visualize the 7-day rolling average of daily vaccinations:
vaccination_data <- vaccination_data %>%
mutate(
RollingAverage = zoo::rollmean(DailyVaccinations, k = 7, fill = NA)
)
ggplot(vaccination_data, aes(x = Date)) +
geom_bar(aes(y = DailyVaccinations), stat = "identity",
fill = "#4CAF50", alpha = 0.4) +
geom_line(aes(y = RollingAverage), color = "#FF5722", linewidth = 1.5) +
labs(
title = "Daily Vaccinations with 7-Day Rolling Average",
x = "Date",
y = "Number of Vaccinations"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10)
)
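The R chunk above calls `zoo::rollmean` without an `align` argument, which centers the window by default, so the first and last three days have no average. For readers following the Python half of this project, a rough pandas equivalent might look like the sketch below. The seed and variable names are assumptions, and the simulated numbers will not match the R output.

```python
import numpy as np
import pandas as pd

# Hypothetical re-creation of the simulated daily vaccinations (numbers differ from the R run)
rng = np.random.default_rng(123)
dates = pd.date_range("2022-01-01", "2022-03-31", freq="D")
daily = pd.Series(
    np.concatenate([
        rng.poisson(1000, 30),
        rng.poisson(2000, 30),
        rng.poisson(1500, len(dates) - 60),
    ]),
    index=dates,
    name="DailyVaccinations",
)

# Centered 7-day mean: each value averages 3 days before, the day itself, and 3 days after
centered = daily.rolling(window=7, center=True).mean()

# Trailing 7-day mean: each value averages the day itself and the 6 days before it
trailing = daily.rolling(window=7).mean()

print(centered.dropna().head())
print(trailing.dropna().head())
```

The centered version lines up visually with the underlying bars, while the trailing version is the one that can be computed as each new day arrives.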
Weekly Patterns
Let’s examine if there are patterns in vaccination rates by day of the week:
vaccination_data <- vaccination_data %>%
mutate(
Weekday = weekdays(Date)
)
# Reorder weekdays (weekdays() returns locale-dependent names; these levels assume an English locale)
vaccination_data$Weekday <- factor(
vaccination_data$Weekday,
levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
)
ggplot(vaccination_data, aes(x = Weekday, y = DailyVaccinations)) +
geom_boxplot(fill = "#9C27B0", alpha = 0.7) +
labs(
title = "Vaccination Rates by Day of Week",
x = "Day of Week",
y = "Number of Vaccinations"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
axis.text.x = element_text(angle = 45, hjust = 1)
)
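The same weekday grouping can be sketched in pandas. The example below uses a deliberately artificial data set with a built-in weekend dip (1,000 doses on weekdays, 600 on weekends, both hypothetical values) so that the grouping logic is easy to verify. It is an illustration of the technique, not a re-run of the analysis above.

```python
import pandas as pd

# Four full weeks starting on Monday 2022-01-03 (hypothetical data)
demo = pd.DataFrame({"Date": pd.date_range("2022-01-03", periods=28, freq="D")})

# Artificial pattern: 1000 doses on weekdays, 600 on Saturday and Sunday
is_weekend = demo["Date"].dt.dayofweek >= 5
demo["DailyVaccinations"] = 1000
demo.loc[is_weekend, "DailyVaccinations"] = 600

# Ordered categorical so the groups sort Monday..Sunday instead of alphabetically
order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
demo["Weekday"] = pd.Categorical(
    demo["Date"].dt.day_name(), categories=order, ordered=True
)

# Median doses per weekday
weekday_median = demo.groupby("Weekday", observed=True)["DailyVaccinations"].median()
print(weekday_median)
```

With real or simulated data, comparing these medians against the spread within each weekday helps distinguish a genuine weekly pattern from noise.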
Conclusion
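This analysis demonstrates the vaccination trends over the first quarter of 2022. We can observe that:
- Daily vaccinations roughly doubled from about 1,000 per day in January to about 2,000 per day in February, then eased to about 1,500 per day in March
- The simulated data contain no built-in day-of-week effect, so any weekday differences in the box plot reflect random variation rather than lower weekend rates
- By the end of March, the cumulative total reached 135,148 vaccinations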
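The reported total of 135,148 can be sanity-checked against the simulation parameters: three 30-day phases with Poisson means of 1,000, 2,000, and 1,500 doses per day. The short sketch below works through that arithmetic; the variable names are illustrative.

```python
# Simulation phases from the R chunk: (number of days, Poisson mean per day)
phases = [(30, 1000), (30, 2000), (30, 1500)]

# Expected total is the sum of the daily means
expected_total = sum(days * rate for days, rate in phases)

# A sum of independent Poisson draws is Poisson, so its variance equals its mean
std_total = expected_total ** 0.5

# Total reported in the conclusion above
reported_total = 135_148
z_score = (reported_total - expected_total) / std_total

print(f"Expected total: {expected_total}")
print(f"Standard deviation: {std_total:.1f}")
print(f"Reported total is {z_score:.2f} standard deviations from the expectation")
```

The reported figure sits well within one standard deviation of the expected 135,000, which is consistent with the simulation setup.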
Key Findings
Based on our analysis, we can draw several important conclusions:
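- The cumulative case fatality rate ended the period near 7.8% (137 deaths among 1,763 cases). Because the case data are simulated, its trend cannot be attributed to changes in treatment protocols.
- Daily vaccinations roughly doubled from January to February. The simulation includes no day-of-week effect, so apparent weekday differences reflect random variation.
- The estimated basic reproduction number (R0) is a rough, simplified figure that depends heavily on the assumed 5-day serial interval.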
Next Steps
Future analysis will incorporate demographic data to understand risk factors and vaccine equity issues. We also plan to analyze long-term trends as more data becomes available.