Wisconsin Breast Cancer Database — BreastCancer

The objective is to classify each sample as benign or malignant. Samples arrive periodically as Dr. Wolberg reports his clinical cases. The database therefore reflects this chronological grouping of the data. This grouping information appears immediately below, having been removed from the data itself. Each variable except for the first was converted into 11 primitive numerical attributes with values ranging from 0 through 10. Rows with missing attribute values and duplicate rows have been removed.

BreastCancer_na.rm

Format

A data frame with 675 observations of 8 numeric variables and target factor Class.

Id, Sample code number (original data only; dropped in this subset)
Cl.thickness, Clump thickness (original data only; dropped in this subset)
Cell.size, Uniformity of cell size
Cell.shape, Uniformity of cell shape
Marg.adhesion, Marginal adhesion
Epith.c.size, Single epithelial cell size
Bare.nuclei, Bare nuclei
Bl.cromatin, Bland chromatin
Normal.nucleoli, Normal Nucleoli
Mitoses, Mitoses
Class, Class of cancer, either "benign" or "malignant"

Source

J.W. Smith, et al. 1988. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.

mlbench, R package. F. Leisch & E. Dimitriadou, 2021. mlbench: Machine Learning Benchmark Problems https://CRAN.R-project.org/package=mlbench

Details

This is a cleaned subset of mlbench's BreastCancer. See help(BreastCancer, package = "mlbench") for the original.

Replicating this dataset:

library(mlbench)
data(BreastCancer)

raw <- BreastCancer
## Logical index that drops the 8 duplicated and 16 incomplete rows
idx <- !duplicated(raw) & complete.cases(raw)
d <- raw[idx, 3:10]
d <- apply(d, 2L, as.integer)
d <- data.frame(d, Class = as.factor(raw$Class[idx]))
BreastCancer_na.rm <- d
## save(BreastCancer_na.rm, file = "./data/BreastCancer_na.rm.rda")
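
The filtering and conversion steps above can be illustrated on a toy data frame. This is a minimal sketch rather than the package's own code: `toy` and its values are made up for illustration, and only base R functions are used. It also shows why the conversion goes through character values: calling as.integer() directly on a factor returns the internal level codes, not the labels.

```r
## Hypothetical toy data standing in for the raw mlbench data frame
toy <- data.frame(
  a = factor(c("10", "2", "2", NA, "10")),
  b = factor(c("1", "3", "3", "5", "7"))
)

## Keep rows that are neither repeats of an earlier row nor incomplete
idx  <- !duplicated(toy) & complete.cases(toy)
kept <- toy[idx, ]  # rows 1, 2, and 5 remain

## apply() coerces the data frame to a character matrix first,
## so as.integer() sees the labels ("10", "2"), not the level codes
m <- apply(kept, 2L, as.integer)
m[, "a"]  # 10 2 10

## Direct conversion of a factor returns the level codes instead
codes <- as.integer(kept$a)
codes     # 1 2 1, because the levels sort as "10", "2"
```

When the factor levels are already the integers in increasing order, both routes give the same numbers; the character route is simply the safer default.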

Examples

library(spinifex)
str(BreastCancer_na.rm)
#> 'data.frame':	675 obs. of  9 variables:
#>  $ Cell.size      : int  1 4 1 8 1 10 1 1 1 2 ...
#>  $ Cell.shape     : int  1 4 1 8 1 10 1 2 1 1 ...
#>  $ Marg.adhesion  : int  1 5 1 1 3 8 1 1 1 1 ...
#>  $ Epith.c.size   : int  2 7 2 3 2 7 2 2 2 2 ...
#>  $ Bare.nuclei    : int  1 10 2 4 1 10 10 1 1 1 ...
#>  $ Bl.cromatin    : int  3 3 3 3 3 9 3 3 1 2 ...
#>  $ Normal.nucleoli: int  1 2 1 7 1 7 1 1 1 1 ...
#>  $ Mitoses        : int  1 1 1 1 1 1 1 1 5 1 ...
#>  $ Class          : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...
dat  <- scale_sd(BreastCancer_na.rm[, 1:8])
clas <- BreastCancer_na.rm$Class

bas <- basis_pca(dat)
mv  <- manip_var_of(bas)
mt  <- manual_tour(bas, mv)

ggt <- ggtour(mt, dat, angle = .2) +
  proto_default(aes_args = list(color = clas, shape = clas))
animate_plotly(ggt)
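
The example standardizes the numeric columns with scale_sd() before computing the PCA basis. As a rough base R illustration of that kind of preprocessing, the sketch below assumes that scale_sd() centers each column on its mean and divides by its standard deviation; that is an assumption about its behavior, so consult ?scale_sd for the authoritative definition. The matrix `x` is hypothetical toy data.

```r
## Hypothetical toy data; not the BreastCancer_na.rm values
x <- cbind(v1 = c(1, 4, 1, 8, 1), v2 = c(2, 7, 2, 3, 2))

## Center each column on its mean, then divide by its standard deviation
z <- sweep(x, 2L, colMeans(x), "-")
z <- sweep(z, 2L, apply(x, 2L, sd), "/")

## Each standardized column should now have mean 0 and standard deviation 1
## (up to floating point error)
colMeans(z)
apply(z, 2L, sd)
```

Standardizing first keeps variables with larger spread from dominating the principal components.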