
Morán-Pérez - Tarea 4 BStat - 22-02-24

The document discusses several statistical exercises involving the analysis and visualization of different datasets. Bar plots, histograms, and scatter plots are used to analyze trends in variables like physician rates, fluoride levels, and monetary supply over time. Correlations between variables are explored.

22/02/24 “Homework 4, Biostatistics” Mariano Morán

Exercise 3.3
> year <- c(1980, 1990, 1995, 1998, 1999, 2000, 2001)
> family_practice <- c(47.8, 57.6, 59.9, 64.6, 66.2, 67.5, 70.0)
> total_office_based <- c(271.3, 359.9, 427.3, 468.8, 473.2, 490.4, 514.0)
> # year is passed only as the bar labels; as a second positional argument it would set the bar widths
> barplot(family_practice, names.arg = year, xlab = "Year", ylab = "Number of Physicians",
col = "blue", main = "Increase in Family Practice Physicians (1980-2001)")

> percentage_family_practice <- (family_practice / total_office_based) * 100
> percentage_family_practice
[1] 17.61887 16.00445 14.01825 13.77986 13.98986 13.76427 13.61868
> barplot(percentage_family_practice, names.arg = year, xlab = "Year", ylab = "Percentage (%)",
col = "green", main = "Percentage of Office-Based Physicians who are Family Practice Physicians")

In absolute terms, the number of family practice physicians rises steadily; however, their share of all office-based physicians falls overall, from about 17.6% in 1980 to about 13.6% in 2001.
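
The contrast between the two trends can also be checked numerically. The following is a minimal sketch that repeats the two vectors so it runs on its own; the names growth_family and growth_total are introduced here for illustration and are not part of the original exercise.

```r
family_practice <- c(47.8, 57.6, 59.9, 64.6, 66.2, 67.5, 70.0)
total_office_based <- c(271.3, 359.9, 427.3, 468.8, 473.2, 490.4, 514.0)

# Growth factor from the first year (1980) to the last year (2001)
n <- length(family_practice)
growth_family <- family_practice[n] / family_practice[1]
growth_total <- total_office_based[n] / total_office_based[1]

# Family practice grew more slowly than the total, which is why its share fell
growth_family < growth_total
```

Family practice grew by a factor of roughly 1.46 while the total grew by roughly 1.89, so a rising count and a falling share are consistent.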

Exercise 3.4
> fluoride <- c(0.75, 0.86, 0.84, 0.85, 0.97, 0.94, 0.89, 0.84, 0.83, 0.89, 0.88, 0.78, 0.77, 0.76,
0.82, 0.72, 0.92, 1.05, 0.94, 0.83, 0.81, 0.85, 0.97, 0.93, 0.79)
> range_measurement <- max(fluoride) - min(fluoride)
> range_measurement
[1] 0.33
> class_interval_width <- 0.05
> lower_limit <- 0.705
> hist(fluoride, breaks = seq(lower_limit, max(fluoride) + class_interval_width, by = class_interval_width),
col = "lightblue", main = "Frequency Histogram of Fluoride Levels",
xlab = "Fluoride Levels (ppm)", ylab = "Frequency")

> freq <- table(cut(fluoride, breaks = seq(lower_limit, max(fluoride) + class_interval_width,
by = class_interval_width)))
> freq
(0.705,0.755] (0.755,0.805] (0.805,0.855] (0.855,0.905] (0.905,0.955]
            2             4             8             4             4
    (0.955,1]      (1,1.05]
            2             1
> relative_freq <- prop.table(freq)
> relative_freq
(0.705,0.755] (0.755,0.805] (0.805,0.855] (0.855,0.905] (0.905,0.955]
         0.08          0.16          0.32          0.16          0.16
    (0.955,1]      (1,1.05]
         0.08          0.04
> barplot(relative_freq, main = "Relative Frequency Histogram of Fluoride Levels",
xlab = "Fluoride Levels (ppm)", ylab = "Relative Frequency", col = "lightgreen")

> count_greater_than_90 <- sum(fluoride > 0.90)
> probability_greater_than_90 <- count_greater_than_90 / length(fluoride)
> probability_greater_than_90
[1] 0.28
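
The same empirical proportion can be obtained in one step, because the mean of a logical vector is the fraction of TRUE values. This is only an equivalent sketch of the calculation above; the name prop_above_090 is hypothetical.

```r
fluoride <- c(0.75, 0.86, 0.84, 0.85, 0.97, 0.94, 0.89, 0.84, 0.83, 0.89,
              0.88, 0.78, 0.77, 0.76, 0.82, 0.72, 0.92, 1.05, 0.94, 0.83,
              0.81, 0.85, 0.97, 0.93, 0.79)

# Proportion of the 25 readings strictly above 0.90 ppm
prop_above_090 <- mean(fluoride > 0.90)
prop_above_090
```

The result should agree with the 0.28 printed above, since 7 of the 25 readings exceed 0.90 ppm.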

Exercise 3.7
> standard_therapy <- c(4, 15, 24, 10, 1, 27, 31, 14, 2, 16, 32, 7, 13, 36, 29, 6, 12, 18, 14, 15,
18, 6, 13, 21, 20, 8, 3, 24)
> new_therapy <- c(5, 20, 29, 15, 7, 32, 36, 17, 15, 19, 35, 10, 16, 39, 27, 14, 10, 16, 12, 13,
16, 9, 18, 33, 30, 29, 31, 27)
> num_bins <- 10
> par(mfrow = c(1, 2))
> # freq = FALSE scales the bars as densities (relative frequency / class width)
> hist(standard_therapy, breaks = num_bins, freq = FALSE, main = "Standard Therapy",
xlab = "Survival Time (months)", ylab = "Density", col = "lightblue")
> hist(new_therapy, breaks = num_bins, freq = FALSE, main = "New Therapy",
xlab = "Survival Time (months)", ylab = "Density", col = "lightgreen")

The modal class for the standard therapy is the interval from 10 to 15 months, whereas the modal class for the new therapy lies between 15 and 20 months. The new therapy also shows higher relative frequencies for survival times above 25 months.
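
The reading of the histograms can be cross-checked by tabulating the class intervals directly. The sketch below assumes 5-month classes from 0 to 40, matching the intervals quoted in the interpretation; breaks5 and modal_class are hypothetical names introduced only for this check.

```r
standard_therapy <- c(4, 15, 24, 10, 1, 27, 31, 14, 2, 16, 32, 7, 13, 36,
                      29, 6, 12, 18, 14, 15, 18, 6, 13, 21, 20, 8, 3, 24)
new_therapy <- c(5, 20, 29, 15, 7, 32, 36, 17, 15, 19, 35, 10, 16, 39,
                 27, 14, 10, 16, 12, 13, 16, 9, 18, 33, 30, 29, 31, 27)

# Assumed class limits: right-closed 5-month intervals, (0,5], (5,10], ..., (35,40]
breaks5 <- seq(0, 40, by = 5)

# Return the label of the class interval with the highest count
modal_class <- function(x) {
  counts <- table(cut(x, breaks = breaks5))
  names(counts)[which.max(counts)]
}

modal_class(standard_therapy)
modal_class(new_therapy)

# Proportion of patients surviving more than 25 months under each therapy
mean(standard_therapy > 25)
mean(new_therapy > 25)
```

Under these assumed class limits, 5 of 28 patients on the standard therapy and 11 of 28 on the new therapy survive beyond 25 months.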
Exercise 3.10
> ownership_data <- data.frame(State = c("Alabama", "Alaska", "Arizona", "Arkansas",
"California", "Colorado", "Connecticut", "Delaware", "Dist. of Columbia", "Florida",
"Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky",
"Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
"Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire",
"New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma",
"Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee",
"Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin",
"Wyoming"), `1985` = c(70.4, 61.2, 64.7, 66.6, 54.2, 63.6, 69.0, 70.3, 37.4, 67.2, 62.7, 51.0,
71.0, 60.6, 67.6, 69.9, 68.3, 68.5, 70.2, 73.7, 65.6, 60.5, 70.7, 70.0, 69.6, 69.2, 66.5, 68.5,
57.0, 65.5, 62.3, 68.2, 50.3, 68.0, 69.9, 67.9, 70.5, 61.5, 71.6, 61.4, 72.0, 67.6, 67.6, 60.5,
71.5, 69.5, 68.5, 66.8, 75.9, 63.8, 73.2), `1996` = c(71.0, 62.9, 62.0, 66.6, 55.0, 64.5, 69.0,
71.5, 40.4, 67.1, 69.3, 50.6, 71.4, 68.2, 74.2, 72.8, 67.5, 73.2, 64.9, 76.5, 66.9, 61.7, 73.3,
75.4, 73.0, 70.2, 68.6, 66.8, 61.1, 65.0, 64.6, 67.1, 52.7, 70.4, 68.2, 69.2, 68.4, 63.1, 71.7,
56.6, 72.9, 67.8, 68.8, 61.8, 72.7, 70.3, 68.5, 63.1, 74.3, 68.2, 68.0), `2002` = c(73.5, 67.3,
65.9, 70.2, 58.0, 69.1, 71.6, 75.6, 44.1, 68.7, 71.7, 57.4, 73.0, 70.2, 75.0, 73.9, 70.2, 73.5,
67.1, 73.9, 72.0, 62.7, 76.0, 77.3, 74.8, 74.6, 69.3, 68.4, 65.5, 69.5, 67.2, 70.3, 55.0, 70.0,
69.5, 72.0, 69.4, 66.2, 74.0, 59.6, 77.3, 71.5, 70.1, 63.8, 72.7, 70.2, 74.3, 67.0, 77.0, 72.0,
72.8))
> par(mfrow = c(3, 1))
> # data.frame() renames the columns `1985`, `1996`, `2002` to X1985, X1996, X2002 (check.names = TRUE)
> # freq = FALSE scales the bars as densities (relative frequency / class width)
> hist(ownership_data$X1985, main = "Relative Frequency Histogram for 1985",
xlab = "Homeownership Rate (%)", ylab = "Density", col = "lightblue", freq = FALSE)
> hist(ownership_data$X1996, main = "Relative Frequency Histogram for 1996",
xlab = "Homeownership Rate (%)", ylab = "Density", col = "lightgreen", freq = FALSE)
> hist(ownership_data$X2002, main = "Relative Frequency Histogram for 2002",
xlab = "Homeownership Rate (%)", ylab = "Density", col = "orange", freq = FALSE)
The histograms show that state homeownership rates shifted toward higher values over the years. The distributions probably changed over these 17 years because of several factors, such as economic conditions, demographic change, changes in housing policy, and cultural influences. For example, economic booms or recessions can affect homeownership rates, and demographic shifts or changes in government policy on housing affordability can also have an impact.
Congress could use the information in these plots when drafting tax laws that grant substantial tax deductions for homeownership, because the plots show how homeownership rates are distributed across the states and over time.
Exercise 3.33
> datos <- c(33, 31, 19, 25, 23, 27, 11, 9, 29, 3, 17, 9, 2, 5, 8, 2, 9, 1, 3)

Exercise 3.39
> subsistence <- c("Shifting Cultivators", "Settled Agriculturists", "Town Dwellers")
> illiterate <- c(114, 76, 93)
> primary_schooling <- c(10, 2, 13)
> at_least_middle_school <- c(45, 53, 208)
> data <- data.frame(subsistence, illiterate, primary_schooling, at_least_middle_school)
> library(tidyr)
> data_long <- pivot_longer(data, -subsistence, names_to = "Literacy Level", values_to = "Count")
> library(ggplot2)
> ggplot(data_long, aes(x = subsistence, y = Count, fill = `Literacy Level`)) +
geom_bar(stat = "identity") +
labs(title = "Literacy Level by Subsistence Group in Manipur, India", x = "Subsistence Group", y = "Count") +
theme_minimal() +
scale_fill_manual(values = c("#66c2a5", "#fc8d62", "#8da0cb"))

> rownames(data) <- data$subsistence
> data$subsistence <- NULL
> rowtotals <- rowSums(data)
> datarowpct <- sweep(data, 1, rowtotals, FUN = "/") * 100
> columntotals <- colSums(data)
> datacolpct <- sweep(data, 2, columntotals, FUN = "/") * 100
> datarowpct
                       illiterate primary_schooling at_least_middle_school
Shifting Cultivators     67.45562          5.917160               26.62722
Settled Agriculturists   58.01527          1.526718               40.45802
Town Dwellers            29.61783          4.140127               66.24204

The largest group of illiterate people are shifting cultivators (114 of 283, about 40%), whereas most people with at least middle school education live in towns (208 of 306, about 68%).
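
This interpretation rests on column percentages, which the transcript computes as datacolpct but never prints. As a self-contained sketch, the same row and column percentages can be obtained with prop.table() on a matrix of counts; the object names counts, row_pct and col_pct are hypothetical.

```r
# Counts from the exercise; matrix() fills column by column
counts <- matrix(
  c(114, 76, 93,   # illiterate
    10, 2, 13,     # primary schooling
    45, 53, 208),  # at least middle school
  nrow = 3,
  dimnames = list(
    c("Shifting Cultivators", "Settled Agriculturists", "Town Dwellers"),
    c("illiterate", "primary_schooling", "at_least_middle_school")
  )
)

# Row percentages: literacy distribution within each subsistence group
row_pct <- prop.table(counts, margin = 1) * 100

# Column percentages: subsistence distribution within each literacy level
col_pct <- prop.table(counts, margin = 2) * 100

round(row_pct, 1)
round(col_pct, 1)
```

Each row of row_pct and each column of col_pct sums to 100, which is a quick sanity check on percentages computed this way.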

Exercise 3.41
> months <- 1:20
> M2 <- c(2.25, 2.27, 2.28, 2.29, 2.31, 2.32, 2.35, 2.37, 2.40, 2.42, 2.43, 2.42, 2.44, 2.47, 2.49,
2.51, 2.53, 2.53, 2.54, 2.55)
> M3 <- c(2.81, 2.84, 2.86, 2.88, 2.90, 2.92, 2.96, 2.99, 3.02, 3.04, 3.05, 3.05, 3.08, 3.10, 3.10,
3.13, 3.17, 3.18, 3.19, 3.20)
> plot(M2, M3, main = "Scatterplot of M2 vs M3", xlab = "M2 (trillions of dollars)", ylab =
"M3 (trillions of dollars)", pch = 16, col = "blue")

A scatterplot does not adequately describe the correlation between the M2 and M3 money supplies, even though some linearity is visible. A better way to represent the data would be to plot both variables against time, to see how they change over the 20 months.
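
One possible way to draw that time plot is with base R's matplot(), which plots each column of a matrix against a common x-axis. This is a sketch under the assumption that both series share one vertical axis in trillions of dollars; the colors and legend position are arbitrary choices, and the correlation coefficient is printed only as a numerical companion to the plot.

```r
months <- 1:20
M2 <- c(2.25, 2.27, 2.28, 2.29, 2.31, 2.32, 2.35, 2.37, 2.40, 2.42,
        2.43, 2.42, 2.44, 2.47, 2.49, 2.51, 2.53, 2.53, 2.54, 2.55)
M3 <- c(2.81, 2.84, 2.86, 2.88, 2.90, 2.92, 2.96, 2.99, 3.02, 3.04,
        3.05, 3.05, 3.08, 3.10, 3.10, 3.13, 3.17, 3.18, 3.19, 3.20)

# Pearson correlation between the two money supply measures
cor(M2, M3)

# Both series against time on a shared axis
matplot(months, cbind(M2, M3), type = "l", lty = 1, col = c("blue", "red"),
        xlab = "Month", ylab = "Money supply (trillions of dollars)",
        main = "M2 and M3 over 20 months")
legend("topleft", legend = c("M2", "M3"), col = c("blue", "red"), lty = 1)
```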
