Test Questions and Analysis
# Group the data by this new column and calculate the mean cholesterol level
cholesterol_comparison = cardio_data.groupby('age_over_50')['cholesterol'].mean()
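The column `age_over_50` is created before this excerpt. Here is a minimal sketch of the whole step. It assumes that the `age` column stores age in days and uses a small made-up DataFrame in place of `cardio_data`. The 365.25-day year and the way `age_over_50` is built are assumptions, not taken from the original analysis.

```python
import pandas as pd

# Hypothetical stand-in for cardio_data; values are invented for illustration.
cardio_data = pd.DataFrame({
    "age": [40 * 365, 55 * 365, 62 * 365, 45 * 365],  # assumed: age in days
    "cholesterol": [1, 2, 3, 1],
})

# Assumption: convert days to years and flag people older than 50
cardio_data["age_over_50"] = (cardio_data["age"] / 365.25) > 50

# Mean cholesterol per group (False = 50 or younger, True = over 50)
cholesterol_comparison = cardio_data.groupby("age_over_50")["cholesterol"].mean()
print(cholesterol_comparison)
```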
# Display results
print(f"Percentage of smokers among men: {smoking_comparison.get(1, 0):.2f}%")
print(f"Percentage of smokers among women: {smoking_comparison.get(2, 0):.2f}%")
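The `smoking_comparison` Series printed above is not defined in this excerpt. One plausible way to build it is to average the 0/1 smoking flag within each gender code and scale the result to a percentage. This sketch assumes that `smoke` is a 0/1 flag and that `gender` uses the codes 1 and 2. The toy data is invented. Check the dataset's documentation to confirm which code corresponds to men.

```python
import pandas as pd

# Hypothetical toy data: smoke is a 0/1 flag, gender uses codes 1 and 2.
cardio_data = pd.DataFrame({
    "gender": [1, 1, 1, 2, 2, 2, 2],
    "smoke":  [1, 0, 0, 1, 1, 0, 0],
})

# The mean of a 0/1 flag is the share of smokers; multiply by 100 for percent.
smoking_comparison = cardio_data.groupby("gender")["smoke"].mean() * 100

print(f"Gender code 1: {smoking_comparison.get(1, 0):.2f}% smokers")
print(f"Gender code 2: {smoking_comparison.get(2, 0):.2f}% smokers")
```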
# Stack the correlation matrix to find pairs of features and their correlation values
correlation_pairs = spearman_corr.stack()
# Remove self-correlations
correlation_pairs = correlation_pairs[correlation_pairs.index.get_level_values(0) != correlation_pairs.index.get_level_values(1)]
The two features with the highest Spearman rank correlation are: ap_hi and ap_lo with a correlation of 0.74
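The stacking steps above can be sketched end to end on toy data. This sketch assumes that `spearman_corr` was produced by `DataFrame.corr(method="spearman")`. The matrix is symmetric, so every pair appears twice after stacking. The sketch therefore keeps one ordering (`first < second`), which also drops self-correlations, before taking the maximum. The three columns and their values are invented, so the result will not match the 0.74 reported for ap_hi and ap_lo.

```python
import pandas as pd

# Hypothetical numeric features standing in for the real columns
df = pd.DataFrame({
    "ap_hi":  [110, 120, 130, 140, 150],
    "ap_lo":  [70, 80, 85, 90, 100],
    "weight": [80, 60, 75, 65, 70],
})

# Rank-based (Spearman) correlation matrix
spearman_corr = df.corr(method="spearman")

# Turn the matrix into a Series indexed by (feature_a, feature_b)
correlation_pairs = spearman_corr.stack()

# Keep each unordered pair once; this also drops self-correlations (a == b)
first = correlation_pairs.index.get_level_values(0)
second = correlation_pairs.index.get_level_values(1)
correlation_pairs = correlation_pairs[first < second]

# Pair with the largest correlation (rank by .abs() to use magnitude instead)
best_pair = correlation_pairs.idxmax()
print(best_pair, round(correlation_pairs.max(), 2))
```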
# Calculate the thresholds for more than 2 standard deviations from the mean
lower_threshold = mean_height - 2 * std_dev_height
upper_threshold = mean_height + 2 * std_dev_height
Percentage of people more than 2 standard deviations away from the average height: 3.34%
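Here is a self-contained sketch of the height calculation. It assumes that `mean_height` and `std_dev_height` are the sample mean and standard deviation of a height column. By default, pandas' `.std()` uses `ddof=1`. The heights are invented, so the percentage differs from the 3.34% above.

```python
import pandas as pd

# Hypothetical heights in cm
heights = pd.Series([150, 160, 162, 165, 168, 170, 172, 175, 178, 210])

mean_height = heights.mean()
std_dev_height = heights.std()  # sample standard deviation (ddof=1)

lower_threshold = mean_height - 2 * std_dev_height
upper_threshold = mean_height + 2 * std_dev_height

# Share of values strictly outside the +/- 2 SD band, as a percentage
outside = (heights < lower_threshold) | (heights > upper_threshold)
percentage_outside = outside.mean() * 100
print(f"Percentage more than 2 SD from the mean: {percentage_outside:.2f}%")
```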
# Display the first few rows of each dataset to confirm both loaded correctly
print("Cardio Data Head:")
print(cardio_data.head())
print("Cardio Alco Data Head:")
print(cardio_alco_data.head())
To determine which of the statements is true with 95% confidence, I will conduct statistical tests (such as t-tests or ANOVA) on the relevant data, comparing the groups named in each statement. Below is an outline of how I will verify each statement statistically:
1. Hypothesis: Null hypothesis (H0): There is no difference in cholesterol levels. Alternative hypothesis (H1): Smokers have higher cholesterol levels than non-smokers.
   Confidence interval: Calculate the 95% confidence interval for the difference in means.
2. Hypothesis: H0: There is no difference in weight. H1: Smokers weigh less than non-smokers.
   Confidence interval: Calculate the 95% confidence interval for the difference in means.
3. Hypothesis: H0: There is no difference in blood pressure. H1: Men have higher blood pressure than women.
   Confidence interval: Calculate the 95% confidence interval for the difference in means.
4. Hypothesis: H0: There is no difference in blood pressure. H1: Smokers have higher blood pressure than non-smokers.
   Confidence interval: Calculate the 95% confidence interval for the difference in means.
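This sketch shows one such comparison: smokers versus non-smokers on a continuous measure. It runs a one-sided two-sample t-test from SciPy and builds a 95% confidence interval for the difference in means by hand. The choice of Welch's t-test, which does not assume equal variances, is made here and is not specified by the outline. The samples are simulated, and the `alternative=` argument requires SciPy 1.6 or newer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated samples standing in for, e.g., systolic pressure of each group
smokers = rng.normal(loc=130, scale=15, size=200)
non_smokers = rng.normal(loc=126, scale=15, size=300)

# One-sided Welch t-test: H1 says smokers have the higher mean
t_stat, p_value = stats.ttest_ind(smokers, non_smokers,
                                  equal_var=False, alternative="greater")

# 95% CI for the difference in means (Welch-Satterthwaite degrees of freedom)
diff = smokers.mean() - non_smokers.mean()
v1 = smokers.var(ddof=1) / smokers.size
v2 = non_smokers.var(ddof=1) / non_smokers.size
se = np.sqrt(v1 + v2)
dof = (v1 + v2) ** 2 / (v1**2 / (smokers.size - 1) + v2**2 / (non_smokers.size - 1))
t_crit = stats.t.ppf(0.975, dof)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4f}")
print(f"95% CI for mean difference: ({ci_low:.2f}, {ci_high:.2f})")
# Reject H0 at the 5% level if p_value < 0.05
```

A two-sided 95% interval corresponds to a two-sided test at the 5% level. For a one-sided H1, the consistent counterpart is the one-sided lower bound `diff - stats.t.ppf(0.95, dof) * se`.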
The difference in the total number of confirmed cases between Italy and Germany became more than 10,000 on: 2020-03-12
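Here is a sketch of how such a date can be found. It assumes that each country's data has a `date` column and a cumulative `confirmed` column (hypothetical names), and that "difference" means the absolute gap between the two countries. The case counts are made-up round numbers, so the resulting date is not the 2020-03-12 reported above.

```python
import pandas as pd

dates = pd.date_range("2020-03-08", periods=5, freq="D")
# Hypothetical cumulative confirmed cases
italy = pd.DataFrame({"date": dates, "confirmed": [5000, 8000, 11000, 14000, 18000]})
germany = pd.DataFrame({"date": dates, "confirmed": [1000, 1500, 2000, 3000, 4000]})

# Align the two countries on date, then compute the daily gap
merged = italy.merge(germany, on="date", suffixes=("_italy", "_germany"))
merged["gap"] = (merged["confirmed_italy"] - merged["confirmed_germany"]).abs()

# First date on which the gap exceeds 10,000 (min() returns NaT if none does)
first_date = merged.loc[merged["gap"] > 10_000, "date"].min()
print(first_date.date())
```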
# Calculate the predicted cumulative cases for 2020-03-20 (x = days from start)
days_from_start = (pd.to_datetime('2020-03-20') - italy_data['date'].min()).days
predicted_cases = exponential_func(days_from_start, *params)
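The prediction step depends on `exponential_func` and `params`, which are defined outside this excerpt. This minimal sketch assumes the model is `a * exp(b * x)`, fitted with SciPy's `curve_fit` on days since the first date. The functional form and the initial guess `p0` are both assumptions. The series is synthetic and grows exactly exponentially, so the fit recovers its parameters.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def exponential_func(x, a, b):
    """Assumed model: cumulative cases = a * exp(b * x)."""
    return a * np.exp(b * x)

# Synthetic daily series (hypothetical values)
italy_data = pd.DataFrame({"date": pd.date_range("2020-02-25", periods=15, freq="D")})
x = (italy_data["date"] - italy_data["date"].min()).dt.days.to_numpy(dtype=float)
italy_data["confirmed"] = 300 * np.exp(0.2 * x)

# Fit a and b; p0 is a rough starting guess to help convergence
params, _ = curve_fit(exponential_func, x, italy_data["confirmed"].to_numpy(), p0=(100, 0.1))

# Predict for 2020-03-20 using the same x definition (days from start)
days_from_start = (pd.to_datetime("2020-03-20") - italy_data["date"].min()).days
predicted_cases = exponential_func(days_from_start, *params)
print(f"Predicted cumulative cases on 2020-03-20: {predicted_cases:,.0f}")
```

The `x` used for fitting and the `days_from_start` used for prediction must share the same origin (the first date). Otherwise, the fitted parameters are applied to a shifted time axis.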