to_kmeans {opm}

R Documentation

Work with k-means results

Description

Calculate or plot the Calinski-Harabasz statistics from kmeans results. The result of plot is a simple scatter plot which can be modified with arguments passed to plot from the graphics package. Alternatively, determine the borders between clusters of one-dimensional data, create a histogram in which these borders are plotted, or convert an object to one of class kmeans.

Usage

  to_kmeans(x, ...)

  ## S3 method for class 'kmeans'
  to_kmeans(x, ...)

  ## S3 method for class 'kmeanss'
  to_kmeans(x, y, ...)

  ## S3 method for class 'Ckmeans.1d.dp'
  to_kmeans(x, y, ...)

  calinski(x, ...)

  ## S3 method for class 'kmeans'
  calinski(x, ...)

  ## S3 method for class 'Ckmeans.1d.dp'
  calinski(x, y, ...)

  ## S3 method for class 'kmeanss'
  calinski(x, ...)

  ## S3 method for class 'kmeanss'
  plot(x, xlab = "Number of clusters",
    ylab = "Calinski-Harabasz statistics", ...)

  borders(x, ...)

  ## S3 method for class 'kmeans'
  borders(x, y, ...)

  ## S3 method for class 'Ckmeans.1d.dp'
  borders(x, y, ...)

  ## S3 method for class 'kmeanss'
  borders(x, ...)

  ## S3 method for class 'kmeans'
  hist(x, y, col = "black", lwd = 1L,
    lty = 1L, main = NULL, xlab = "Clustered values", ...)

  ## S3 method for class 'Ckmeans.1d.dp'
  hist(x, y, ...)

  ## S3 method for class 'kmeanss'
  hist(x, k = NULL, col = "black",
    lwd = 1L, lty = 1L, main = NULL,
    xlab = "Clustered values", ...)

Arguments

`x`	Object of class `kmeans`, `Ckmeans.1d.dp` or `kmeanss`. For `plot`, only the latter.
`y`	Vector of original data subjected to clustering. Automatically determined for the `kmeanss` methods. For `to_kmeans`, original numeric vector that was used to create a `Ckmeans.1d.dp` object, or index of an element of a `kmeanss` object.
`k`	Numeric vector or `NULL`. If non-empty, it indicates the number of groups (previously used as input for `kmeans`) for which vertical lines should be drawn in the plot that represent the cluster borders. If empty, the smallest non-trivial number of clusters is chosen.
`col`	Graphical parameter passed to `abline`. If several values of `k` are given, `col` is recycled as necessary.
`lwd`	Like `col`.
`lty`	Like `col`.
`main`	Passed to `hist.default`.
`xlab`	Character scalar passed to `hist.default` or to `plot` from the graphics package.
`ylab`	Character scalar passed to `plot` from the graphics package.
`...`	Optional arguments passed to and from other methods. For the `hist` method, optional arguments passed to `hist.default`.

Details

The borders are calculated as the mean of the maximum of the cluster with the lower values and the minimum of the neighbouring cluster with the higher values. The hist method plots a histogram of one-dimensional data subjected to k-means partitioning in which these borders can be drawn.

y must be in the same order as when it was subjected to clustering, but this is not checked. Using kmeanss objects might thus be preferable in most cases because they contain a copy of the input data.
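
The border rule described above can be sketched in base R. This is a minimal illustration only, not the code used by opm: the helper name sketch_borders is hypothetical, and the sketch assumes one-dimensional data whose clusters do not overlap, with y in the same order as when it was clustered.

```r
# Hypothetical helper, not part of opm: borders between 1-D clusters.
sketch_borders <- function(km, y) {
  # split the original values by their cluster assignment
  groups <- split(y, km$cluster)
  # sort the per-cluster minima and maxima so that neighbours are adjacent
  mins <- sort(vapply(groups, min, numeric(1L)))
  maxs <- sort(vapply(groups, max, numeric(1L)))
  # mean of each cluster's maximum and the next cluster's minimum
  (maxs[-length(maxs)] + mins[-1L]) / 2
}

y <- c(1, 2, 4, 5, 7, 8)
# fixed starting centres make the partition reproducible
km <- kmeans(y, centers = c(1.5, 4.5, 7.5))
unname(sketch_borders(km, y))
## [1] 3 6
```

The clusters here are {1, 2}, {4, 5} and {7, 8}, so the borders are (2 + 4) / 2 and (5 + 7) / 2. For a single cluster the result is an empty numeric vector.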

Value

to_kmeans creates an object of class kmeans.

borders creates a numeric vector or list of such vectors.

The hist method returns the same kind of object as hist.default; see its documentation for details.

calinski returns a numeric vector with one element per kmeans object. plot returns it invisibly. Its `names` attribute indicates the original numbers of clusters requested.
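
For orientation, the textbook form of the Calinski-Harabasz statistic can be computed directly from the components of a kmeans object. The following is a sketch under that assumption; sketch_calinski is a hypothetical name, and the implementation in opm may differ in detail, particularly for a single cluster, where this ratio is not defined.

```r
# Hypothetical helper, not part of opm: textbook Calinski-Harabasz index.
sketch_calinski <- function(km) {
  n <- length(km$cluster)  # number of observations
  k <- length(km$size)     # number of clusters
  # between-cluster variance divided by within-cluster variance
  (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
}

y <- c(1, 2, 4, 5, 7, 8)
# fixed starting centres make the partition reproducible
km <- kmeans(y, centers = c(1.5, 4.5, 7.5))
sketch_calinski(km)
## [1] 36
```

Here the between-cluster sum of squares is 36 and the total within-cluster sum of squares is 1.5, giving (36 / 2) / (1.5 / 3) = 36. Larger values indicate a better separated partition.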

Examples

x <- as.vector(extract(vaas_4, as.labels = NULL, subset = "A"))
x.km <- run_kmeans(x, k = 1:10)

# plot() method
# the usual arguments of plot() are available
show(y <- plot(x.km, col = "blue", pch = 19))

##        1        2        3        4        5        6        7        8 
##     -Inf 3507.297 3765.857 3879.576 4438.684 4779.887 5473.626 6056.848 
##        9       10 
## 6632.574 7411.174

stopifnot(is.numeric(y), names(y) == 1:10)

# borders() method
(x.b <- borders(x.km)) # => list of numeric vectors

## $`1`
## numeric(0)
## 
## $`2`
## [1] 171.4392
## 
## $`3`
## [1] 111.8538 230.9125
## 
## $`4`
## [1] 100.0824 204.8658 283.0464
## 
## $`5`
## [1]  70.74111 143.81037 230.91248 295.18021
## 
## $`6`
## [1]  65.17332 127.05117 204.86583 261.09713 301.35835
## 
## $`7`
## [1]  48.31749  89.73480 150.11714 223.42479 274.91820 306.03570
## 
## $`8`
## [1]  40.65184  69.02271 111.85380 171.43923 230.91248 274.91820 306.03570
## 
## $`9`
## [1]  40.65184  69.02271 111.85380 166.05988 223.42479 272.72290 301.35835
## [8] 335.07573
## 
## $`10`
## [1]  40.65184  69.02271 111.85380 162.09539 217.67328 257.53834 283.04643
## [8] 305.18844 335.07573

stopifnot(is.list(x.b), length(x.b) == 10, sapply(x.b, is.numeric))
stopifnot(sapply(x.b, length) == as.numeric(names(x.b)) - 1)

# hist() methods
y <- hist(x.km[[2]], x, col = "blue", lwd = 2)

stopifnot(inherits(y, "histogram"))
y <- hist(x.km, 3:4, col = c("blue", "red"), lwd = 2)

stopifnot(inherits(y, "histogram"))

# to_kmeans() methods
x <- c(1, 2, 4, 5, 7, 8)
summary(y <- kmeans(x, 3))

##              Length Class  Mode   
## cluster      6      -none- numeric
## centers      3      -none- numeric
## totss        1      -none- numeric
## withinss     3      -none- numeric
## tot.withinss 1      -none- numeric
## betweenss    1      -none- numeric
## size         3      -none- numeric
## iter         1      -none- numeric
## ifault       1      -none- numeric

stopifnot(identical(y, to_kmeans(y)))
# see particularly run_kmeans() which uses this internally if clustering is
# done with Ckmeans.1d.dp::Ckmeans.1d.dp()

[Package opm version 1.3.63 Index]
