One year of daily weather observations collected from the Canberra airport in Australia was obtained from the Australian Commonwealth Bureau of Meteorology and processed to create this sample dataset for illustrating data mining using R and Rattle.

weather_na.rm

Format

A data frame with 354 observations of 20 variables: daily weather observations at Canberra airport, Australia, from November 1, 2007 to October 31, 2008, with incomplete rows removed.

  • Date, The date of observation (Date class).

  • MinTemp, The minimum temperature in degrees Celsius.

  • MaxTemp, The maximum temperature in degrees Celsius.

  • Rainfall, The amount of rainfall recorded for the day in mm.

  • Evaporation, The "Class A pan evaporation" (mm) in the 24 hours to 9am.

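  • Sunshine, The number of hours of bright sunshine in the day.

  • WindGustSpeed, The speed (km/hr) of the strongest wind gust in the 24 hours to midnight.

  • WindSpeed9am, Wind speed (km/hr) averaged over 10 minutes prior to 9am.

  • WindSpeed3pm, Wind speed (km/hr) averaged over 10 minutes prior to 3pm.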

  • Humidity9am, Relative humidity (percent) at 9am.

  • Humidity3pm, Relative humidity (percent) at 3pm.

  • Pressure9am, Atmospheric pressure (hPa) reduced to mean sea level at 9am.

  • Pressure3pm, Atmospheric pressure (hPa) reduced to mean sea level at 3pm.

  • Cloud9am, Fraction of sky obscured by cloud at 9am, measured in "oktas" (eighths): the number of eighths of the sky obscured by cloud. A value of 0 indicates a completely clear sky, whilst 8 indicates that it is completely overcast.

  • Cloud3pm, Fraction of sky obscured by cloud (in "oktas": eighths) at 3pm. See Cloud9am for a description of the values.

  • Temp9am, Temperature (degrees C) at 9am.

  • Temp3pm, Temperature (degrees C) at 3pm.

  • RISK_MM, The amount of rain (mm) recorded on the following day; a measure of the "risk" associated with the target.

  • RainToday, Factor: "Yes" if precipitation (mm) in the 24 hours to 9am exceeds 1 mm, otherwise "No".

  • RainTomorrow, Factor: "Yes" if it rained (more than 1 mm) on the following day, otherwise "No". This is the target variable.
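The two factor variables are threshold codings of rainfall amounts: RainToday of Rainfall, and RainTomorrow of RISK_MM. A minimal sketch of that coding, assuming a strict "more than 1 mm" rule; the helper rain_flag() is hypothetical, and the input values are the first ten rows printed by str() in the Examples section, so the rule is only checked against that excerpt.

```r
# First ten rows of Rainfall and RISK_MM, copied from the str() output
# in the Examples section of this page.
rainfall <- c(0, 3.6, 3.6, 39.8, 2.8, 0, 0.2, 0, 0, 16.2)
risk_mm  <- c(3.6, 3.6, 39.8, 2.8, 0, 0.2, 0, 0, 16.2, 0)

# Hypothetical helper (assumed rule): "Yes" when the amount strictly
# exceeds the threshold in mm, otherwise "No"; NA inputs stay NA.
rain_flag <- function(mm, threshold = 1) {
  factor(ifelse(mm > threshold, "Yes", "No"), levels = c("No", "Yes"))
}

rain_today    <- rain_flag(rainfall)
rain_tomorrow <- rain_flag(risk_mm)

# Integer codes, comparable with the 1/2 codes printed by str()
as.integer(rain_today)     # 1 2 2 2 2 1 1 1 1 2
as.integer(rain_tomorrow)  # 2 2 2 2 1 1 1 1 2 1
```

Fixing the levels to c("No", "Yes") keeps the factor coding identical to the dataset's even when an excerpt happens to contain only one of the two values.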

Copyright Commonwealth of Australia 2010, Bureau of Meteorology. Definitions adapted from http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml

Source

Bureau of Meteorology, Commonwealth of Australia http://www.bom.gov.au/climate/data/

Williams, G. (2020). rattle: Graphical User Interface for Data Science in R. R package. https://CRAN.R-project.org/package=rattle

Details

The data has been processed to provide a target variable, RainTomorrow (whether it rains on the following day: No/Yes), and a risk variable, RISK_MM (the amount of rain, in millimeters, recorded on the following day). Various transformations were applied to the source data. The dataset is small and is intended only for repeatable demonstrations of data science operations.
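Because RISK_MM is the following day's rainfall, it behaves like Rainfall shifted one day forward. A minimal sketch of that relationship, assuming the rows compared are consecutive days; lead1() is a hypothetical helper and the values are the first ten rows printed by str() in the Examples section. On the full data frame the shift would only line up where diff(Date) == 1, because dropping incomplete rows can leave gaps in the dates.

```r
# First ten rows of Rainfall and RISK_MM, copied from the str() output
# in the Examples section of this page.
rainfall <- c(0, 3.6, 3.6, 39.8, 2.8, 0, 0.2, 0, 0, 16.2)
risk_mm  <- c(3.6, 3.6, 39.8, 2.8, 0, 0.2, 0, 0, 16.2, 0)

# Hypothetical helper: shift a series one step forward ("lead"), padding
# the last position with NA because the next day is not in the excerpt.
lead1 <- function(x) c(x[-1], NA)

next_day_rain <- lead1(rainfall)

# Compare wherever the next day is available (all but the last row)
all.equal(head(next_day_rain, -1), head(risk_mm, -1))  # TRUE
```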

This is a cleaned subset of rattle::weather: 20 of its columns, keeping only the rows with no missing values.

Replicating this dataset:

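require("rattle")
## Keep Date, MinTemp:Sunshine, WindGustSpeed and WindSpeed9am:Temp3pm, then
## RISK_MM, RainToday, RainTomorrow (RISK_MM moved ahead of RainToday).
## This drops Location and the three wind-direction columns.
d <- rattle::weather[, c(1, 3:7, 9, 12:21, 23, 22, 24)]
d <- d[complete.cases(d), ] ## Drop the 12 rows that contain any missing value
d <- as.data.frame(d)       ## Convert the tibble to a plain data.frame
weather_na.rm <- d
## save(weather_na.rm, file = "./data/weather_na.rm.rda")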
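The same two steps (select and reorder columns, then drop incomplete rows) can also be written with column names instead of positions, which makes the intended order explicit. A minimal sketch on a hypothetical toy data frame standing in for rattle::weather: it uses only a few of its column names, its values are partly copied from the str() output in the Examples section, and one NA is injected purely for illustration.

```r
# Hypothetical toy stand-in for rattle::weather (illustration only)
toy <- data.frame(
  Date         = as.Date(c("2007-11-01", "2007-11-02", "2007-11-03", "2007-11-04")),
  Location     = "Canberra",
  MinTemp      = c(8, 14, NA, 13.3),  # one injected missing value
  RainToday    = factor(c("No", "Yes", "Yes", "Yes"), levels = c("No", "Yes")),
  RISK_MM      = c(3.6, 3.6, 39.8, 2.8),
  RainTomorrow = factor(c("Yes", "Yes", "Yes", "Yes"), levels = c("No", "Yes"))
)

# Select by name, placing RISK_MM before RainToday as in weather_na.rm
keep <- c("Date", "MinTemp", "RISK_MM", "RainToday", "RainTomorrow")
d <- toy[, keep]

# Drop every row with a missing value in any kept column
d <- d[complete.cases(d), ]
nrow(d)  # 3
```

Note that complete.cases() is applied after the selection, so missing values in discarded columns (here, Location) never cause a row to be dropped.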

Examples

library(spinifex)
str(weather_na.rm)
#> 'data.frame':	354 obs. of  20 variables:
#>  $ Date         : Date, format: "2007-11-01" "2007-11-02" ...
#>  $ MinTemp      : num  8 14 13.7 13.3 7.6 6.2 6.1 8.3 8.8 8.4 ...
#>  $ MaxTemp      : num  24.3 26.9 23.4 15.5 16.1 16.9 18.2 17 19.5 22.8 ...
#>  $ Rainfall     : num  0 3.6 3.6 39.8 2.8 0 0.2 0 0 16.2 ...
#>  $ Evaporation  : num  3.4 4.4 5.8 7.2 5.6 5.8 4.2 5.6 4 5.4 ...
#>  $ Sunshine     : num  6.3 9.7 3.3 9.1 10.6 8.2 8.4 4.6 4.1 7.7 ...
#>  $ WindGustSpeed: num  30 39 85 54 50 44 43 41 48 31 ...
#>  $ WindSpeed9am : num  6 4 6 30 20 20 19 11 19 7 ...
#>  $ WindSpeed3pm : num  20 17 6 24 28 24 26 24 17 6 ...
#>  $ Humidity9am  : int  68 80 82 62 68 70 63 65 70 82 ...
#>  $ Humidity3pm  : int  29 36 69 56 49 57 47 57 48 32 ...
#>  $ Pressure9am  : num  1020 1012 1010 1006 1018 ...
#>  $ Pressure3pm  : num  1015 1008 1007 1007 1018 ...
#>  $ Cloud9am     : int  7 5 8 2 7 7 4 6 7 7 ...
#>  $ Cloud3pm     : int  7 3 7 7 7 5 6 7 7 1 ...
#>  $ Temp9am      : num  14.4 17.5 15.4 13.5 11.1 10.9 12.4 12.1 14.1 13.3 ...
#>  $ Temp3pm      : num  23.6 25.7 20.2 14.1 15.4 14.8 17.3 15.5 18.9 21.7 ...
#>  $ RISK_MM      : num  3.6 3.6 39.8 2.8 0 0.2 0 0 16.2 0 ...
#>  $ RainToday    : Factor w/ 2 levels "No","Yes": 1 2 2 2 2 1 1 1 1 2 ...
#>  $ RainTomorrow : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 1 1 1 2 1 ...
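## Standardize the 17 numeric columns, MinTemp to RISK_MM
dat  <- scale_sd(weather_na.rm[, 2:18])
## Use the target variable for point colour and shape
clas <- weather_na.rm$RainTomorrow

## Start from a PCA basis, pick a manipulation variable, and build a manual tour
bas <- basis_pca(dat)
mv  <- manip_var_of(bas)
mt  <- manual_tour(bas, mv)

## Build the animation frames; angle sets the step between interpolated frames
ggt <- ggtour(mt, dat, angle = .2) +
  proto_default(aes_args = list(color = clas, shape = clas))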
# \donttest{
animate_plotly(ggt)
# }
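To see what a single frame of such a tour shows, here is a static base-R analogue: standardize the columns, take the first two principal components as an orthonormal 2-D basis, and project the observations onto it. This is a sketch under assumptions: scale() is used as a stand-in for scale_sd(), prcomp() as a stand-in for basis_pca() (check their help pages for the exact rules), and only five columns from the first ten rows printed by str() above are used.

```r
# Excerpt: first ten rows of five numeric columns, copied from the
# str() output above.
x <- data.frame(
  MinTemp     = c(8, 14, 13.7, 13.3, 7.6, 6.2, 6.1, 8.3, 8.8, 8.4),
  MaxTemp     = c(24.3, 26.9, 23.4, 15.5, 16.1, 16.9, 18.2, 17, 19.5, 22.8),
  Rainfall    = c(0, 3.6, 3.6, 39.8, 2.8, 0, 0.2, 0, 0, 16.2),
  Evaporation = c(3.4, 4.4, 5.8, 7.2, 5.6, 5.8, 4.2, 5.6, 4, 5.4),
  Sunshine    = c(6.3, 9.7, 3.3, 9.1, 10.6, 8.2, 8.4, 4.6, 4.1, 7.7)
)

# Center each column and scale it to standard deviation 1
# (assumed to approximate scale_sd()).
z <- scale(x, center = TRUE, scale = TRUE)

# Orthonormal p x 2 basis from the first two principal components
# (assumed to approximate basis_pca()).
pca   <- prcomp(z)
basis <- pca$rotation[, 1:2]

# Project the n x p data onto the basis: an n x 2 matrix of
# coordinates, i.e. one static frame of a tour.
proj <- z %*% basis
dim(proj)  # 10 2
```

A manual tour then varies this basis frame by frame, rotating the contribution of the manipulation variable while keeping the basis orthonormal.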