The data set PimaIndiansDiabetes2 contains a corrected version of the original data set. While the UCI repository index claims that there are no missing values, closer inspection of the data shows several physical impossibilities, e.g., a blood pressure or body mass index of 0. In PimaIndiansDiabetes2, all zero values of glucose, pressure, triceps, insulin and mass have been set to NA; see also Wahba et al. (1995) and Ripley (1996).
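
The zero-to-NA recoding described above can be sketched in base R. The data frame `toy` and its values are made up for illustration, and this is not the code mlbench used to build PimaIndiansDiabetes2; it only shows the idea of treating impossible zeros as missing.

```r
## Made-up data: a zero in glucose, pressure or mass is physically impossible.
toy <- data.frame(
  pregnant = c(6, 0, 8),
  glucose  = c(148, 0, 183),
  pressure = c(72, 66, 0),
  mass     = c(33.6, 26.6, 0)
)

## Columns where a zero stands in for a missing measurement.
## `pregnant` is left alone because zero pregnancies is a valid value.
zero_is_missing <- c("glucose", "pressure", "mass")

## Recode zeros to NA in those columns only.
toy[zero_is_missing] <- lapply(toy[zero_is_missing], function(x) {
  x[x == 0] <- NA
  x
})

## Number of NAs per column after recoding.
colSums(is.na(toy))
```

Restricting the recoding to a named set of columns keeps legitimate zeros, such as zero pregnancies, intact.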

PimaIndiansDiabetes_long

Format

A data frame with 724 observations of 6 numeric variables and the target factor diabetes.

  • pregnant, Number of times pregnant

  • glucose, Plasma glucose concentration (glucose tolerance test)

  • pressure, Diastolic blood pressure (mm Hg)

  • mass, Body mass index (weight in kg/(height in m)^2)

  • pedigree, Diabetes pedigree function

  • age, Age (years)

  • diabetes, Class variable (test for diabetes), either "pos" or "neg"

Source

J.W. Smith, et al. 1988. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261--265). IEEE Computer Society Press.

mlbench, R package. F. Leisch & E. Dimitriadou, 2021. mlbench: Machine Learning Benchmark Problems https://CRAN.R-project.org/package=mlbench

Details

This is a cleaned subset of mlbench's PimaIndiansDiabetes2. See help(PimaIndiansDiabetes2, package = "mlbench").

Replicating this dataset:

require("mlbench")
data(PimaIndiansDiabetes2)

d <- PimaIndiansDiabetes2
d <- d[, c(1:3, 6:9)] ## Remove the 2 columns with the most NAs (triceps, insulin)
d <- d[complete.cases(d), ] ## Remove the 44 rows that still contain an NA (768 -> 724)
PimaIndiansDiabetes_long <- d
## save(PimaIndiansDiabetes_long, file = "./data/PimaIndiansDiabetes_long.rda")
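
The two cleaning steps above select columns by hard-coded position. As a minimal sketch, the same logic can be written to pick the sparsest columns by their NA counts instead. The helper `drop_sparse_then_incomplete()` is hypothetical (it is not part of mlbench or spinifex), and the data frame `toy` is made up to stand in for PimaIndiansDiabetes2.

```r
## Hypothetical helper, not part of mlbench or spinifex:
## drop the `n_drop` columns with the most NAs, then drop incomplete rows.
drop_sparse_then_incomplete <- function(df, n_drop = 2) {
  na_counts <- colSums(is.na(df))
  ## Names of the `n_drop` columns with the highest NA counts.
  sparse <- names(sort(na_counts, decreasing = TRUE))[seq_len(n_drop)]
  df <- df[, !(names(df) %in% sparse), drop = FALSE]
  ## Keep only rows with no NA in the remaining columns.
  df[complete.cases(df), , drop = FALSE]
}

## Made-up data standing in for PimaIndiansDiabetes2.
toy <- data.frame(
  glucose = c(148, NA, 183, 89),
  triceps = c(35, NA, NA, 23),
  insulin = c(NA, NA, NA, 94),
  mass    = c(33.6, 26.6, 23.3, 28.1)
)

cleaned <- drop_sparse_then_incomplete(toy, n_drop = 2)
dim(cleaned)
```

Dropping the sparsest columns before calling complete.cases() matters: filtering rows first would discard every row with an NA in those columns and leave far fewer observations.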

Examples

library(spinifex)
str(PimaIndiansDiabetes_long)
#> 'data.frame':	724 obs. of  7 variables:
#>  $ pregnant: num  6 1 8 1 0 5 3 2 4 10 ...
#>  $ glucose : num  148 85 183 89 137 116 78 197 110 168 ...
#>  $ pressure: num  72 66 64 66 40 74 50 70 92 74 ...
#>  $ mass    : num  33.6 26.6 23.3 28.1 43.1 25.6 31 30.5 37.6 38 ...
#>  $ pedigree: num  0.627 0.351 0.672 0.167 2.288 ...
#>  $ age     : num  50 31 32 21 33 30 26 53 30 34 ...
#>  $ diabetes: Factor w/ 2 levels "neg","pos": 2 1 2 1 2 1 2 2 1 2 ...
dat  <- scale_sd(PimaIndiansDiabetes_long[, 1:6])
clas <- PimaIndiansDiabetes_long$diabetes

bas <- basis_pca(dat)
mv  <- manip_var_of(bas)
mt  <- manual_tour(bas, mv)

ggt <- ggtour(mt, dat, angle = .2) +
  proto_default(aes_args = list(color = clas, shape = clas))
animate_plotly(ggt)