to_kmeans {opm}    R Documentation

Work with k-means results

Description

Calculate or plot the Calinski-Harabasz statistics from kmeans results. The result of plot is a simple scatter plot which can be modified with arguments passed to plot from the graphics package. Alternatively, determine the borders between clusters of one-dimensional data, create a histogram in which these borders are plotted, or convert an object to one of class kmeans.

Usage

  to_kmeans(x, ...)

  ## S3 method for class 'kmeans'
 to_kmeans(x, ...)

  ## S3 method for class 'kmeanss'
 to_kmeans(x, y, ...)

  ## S3 method for class 'Ckmeans.1d.dp'
 to_kmeans(x, y, ...)

  calinski(x, ...)

  ## S3 method for class 'kmeans'
 calinski(x, ...)

  ## S3 method for class 'Ckmeans.1d.dp'
 calinski(x, y, ...)

  ## S3 method for class 'kmeanss'
 calinski(x, ...)

  ## S3 method for class 'kmeanss'
 plot(x, xlab = "Number of clusters",
    ylab = "Calinski-Harabasz statistics", ...)

  borders(x, ...)

  ## S3 method for class 'kmeans'
 borders(x, y, ...)

  ## S3 method for class 'Ckmeans.1d.dp'
 borders(x, y, ...)

  ## S3 method for class 'kmeanss'
 borders(x, ...)

  ## S3 method for class 'kmeans'
 hist(x, y, col = "black", lwd = 1L,
    lty = 1L, main = NULL, xlab = "Clustered values", ...)

  ## S3 method for class 'Ckmeans.1d.dp'
 hist(x, y, ...)

  ## S3 method for class 'kmeanss'
 hist(x, k = NULL, col = "black",
    lwd = 1L, lty = 1L, main = NULL,
    xlab = "Clustered values", ...)

Arguments

x

Object of class kmeans, ‘Ckmeans.1d.dp’ or kmeanss. For plot, only the latter.

y

Vector of original data subjected to clustering. Automatically determined for the kmeanss methods. For to_kmeans, original numeric vector that was used to create a ‘Ckmeans.1d.dp’ object, or index of an element of a kmeanss object.

k

Numeric vector or NULL. If non-empty, it gives the numbers of clusters (previously used as input for kmeans) whose borders are drawn as vertical lines in the plot. If empty, the smallest non-trivial number of clusters is chosen.

col

Graphical parameter passed to abline. If several values of k are given, col is recycled as necessary.

lwd

Like col.

lty

Like col.

main

Passed to hist.default.

xlab

Character scalar passed to hist.default or to plot from the graphics package.

ylab

Character scalar passed to plot from the graphics package.

...

Optional arguments passed to and from other methods. For the hist method, optional arguments passed to hist.default.

Details

The borders are calculated as the mean of the maximum of the cluster with the lower values and the minimum of the neighbouring cluster with the higher values. The hist method plots a histogram of one-dimensional data subjected to k-means partitioning in which these borders can be drawn.
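The border rule described above can be sketched in base R as follows. This is a minimal illustration, not opm's implementation: cluster_borders is a hypothetical helper, and the explicit starting centres passed to kmeans are an assumption made only to keep the result reproducible.

```r
# Hypothetical helper (not part of opm): borders between the clusters of
# one-dimensional data, following the rule stated in the Details section.
cluster_borders <- function(values, cluster) {
  # smallest and largest value within each cluster
  lo <- tapply(values, cluster, min)
  hi <- tapply(values, cluster, max)
  # arrange the clusters by their position on the number line
  ord <- order(lo)
  lo <- as.vector(lo[ord])
  hi <- as.vector(hi[ord])
  k <- length(lo)
  # a single cluster has no borders
  if (k < 2L)
    return(numeric(0))
  # mean of the maximum of each cluster and the minimum of its upper neighbour
  (hi[-k] + lo[-1L]) / 2
}

x <- c(1, 2, 4, 5, 7, 8)
# explicit starting centres (an assumption) make the partitioning deterministic
km <- kmeans(x, centers = c(1.5, 4.5, 7.5))
cluster_borders(x, km$cluster)
## [1] 3 6
```

For k clusters the result has k - 1 elements, which matches the lengths checked in the Examples section.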

y must be in the same order as when it was subjected to clustering, but this is not checked. Using kmeanss objects might thus be preferable in most cases because they contain a copy of the input data.
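Because the order of y is not checked, a caller may want to run a sanity check before computing borders. The following sketch assumes that a correct one-dimensional partitioning yields clusters whose value ranges do not overlap; ranges_overlap is a hypothetical helper and not a function of opm.

```r
# Hypothetical check (not part of opm): TRUE if the value ranges of the
# clusters overlap, which suggests that 'values' is not in the order used
# for clustering.
ranges_overlap <- function(values, cluster) {
  lo <- tapply(values, cluster, min)
  hi <- tapply(values, cluster, max)
  ord <- order(lo)
  lo <- as.vector(lo[ord])
  hi <- as.vector(hi[ord])
  k <- length(lo)
  if (k < 2L)
    return(FALSE)
  # each cluster must end before its upper neighbour starts
  any(hi[-k] > lo[-1L])
}

x <- c(8, 1, 5, 2, 7, 4)
# explicit starting centres (an assumption) make the partitioning deterministic
km <- kmeans(x, centers = c(1.5, 4.5, 7.5))
ranges_overlap(x, km$cluster)        # data in the clustering order
ranges_overlap(sort(x), km$cluster)  # data reordered after clustering
```

The check can only detect some mismatches. A reordering that happens to keep the ranges separate, such as reversing sorted input, passes unnoticed.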

Value

to_kmeans creates an object of class kmeans.

borders creates a numeric vector or list of such vectors.

The return value of the hist method is like hist.default; see there for details.
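The effect of the hist methods can be approximated with base graphics alone. The sketch below is an illustration under stated assumptions, not opm's code: the border values are worked out by hand for the clusters {1, 2}, {4, 5} and {7, 8}, and only graphics::hist and graphics::abline are used.

```r
x <- c(1, 2, 4, 5, 7, 8)
# borders between the clusters {1, 2}, {4, 5} and {7, 8}, computed by hand
b <- c(3, 6)
# histogram of the clustered values; the return value has class 'histogram'
h <- hist(x, main = NULL, xlab = "Clustered values")
# vertical lines marking the cluster borders
abline(v = b, col = "blue", lwd = 2, lty = 1)
stopifnot(inherits(h, "histogram"))
```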

calinski returns a numeric vector with one element per kmeans object. plot returns it invisibly. Its ‘names’ attribute indicates the original numbers of clusters requested.
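For orientation, the textbook form of the Calinski-Harabasz statistic can be computed from the components of a single kmeans object. This is a sketch of the standard definition, (betweenss / (k - 1)) / (tot.withinss / (n - k)), and it is an assumption that calinski follows exactly this form; ch_index is a hypothetical helper.

```r
# Hypothetical helper (not opm's calinski()): textbook Calinski-Harabasz
# statistic for one object of class 'kmeans'.
ch_index <- function(km) {
  n <- length(km$cluster)  # number of clustered observations
  k <- length(km$size)     # number of clusters
  # between-cluster variance divided by within-cluster variance
  (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
}

x <- c(1, 2, 4, 5, 7, 8)
# explicit starting centres (an assumption) make the partitioning deterministic
km <- kmeans(x, centers = c(1.5, 4.5, 7.5))
ch_index(km)
## [1] 36
```

With a single cluster the divisor k - 1 is zero, so the statistic is not finite. This is consistent with the -Inf reported for one cluster in the Examples section.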

See Also

graphics::hist, graphics::abline, Ckmeans.1d.dp::Ckmeans.1d.dp

Other kmeans-functions: run_kmeans

Examples

x <- as.vector(extract(vaas_4, as.labels = NULL, subset = "A"))
x.km <- run_kmeans(x, k = 1:10)

# plot() method
# the usual arguments of plot() are available
show(y <- plot(x.km, col = "blue", pch = 19))

##        1        2        3        4        5        6        7        8 
##     -Inf 3507.297 3765.857 3879.576 4438.684 4779.887 5473.626 6056.848 
##        9       10 
## 6632.574 7411.174
stopifnot(is.numeric(y), names(y) == 1:10)

# borders() method
(x.b <- borders(x.km)) # => list of numeric vectors
## $`1`
## numeric(0)
## 
## $`2`
## [1] 171.4392
## 
## $`3`
## [1] 111.8538 230.9125
## 
## $`4`
## [1] 100.0824 204.8658 283.0464
## 
## $`5`
## [1]  70.74111 143.81037 230.91248 295.18021
## 
## $`6`
## [1]  65.17332 127.05117 204.86583 261.09713 301.35835
## 
## $`7`
## [1]  48.31749  89.73480 150.11714 223.42479 274.91820 306.03570
## 
## $`8`
## [1]  40.65184  69.02271 111.85380 171.43923 230.91248 274.91820 306.03570
## 
## $`9`
## [1]  40.65184  69.02271 111.85380 166.05988 223.42479 272.72290 301.35835
## [8] 335.07573
## 
## $`10`
## [1]  40.65184  69.02271 111.85380 162.09539 217.67328 257.53834 283.04643
## [8] 305.18844 335.07573
stopifnot(is.list(x.b), length(x.b) == 10, sapply(x.b, is.numeric))
stopifnot(sapply(x.b, length) == as.numeric(names(x.b)) - 1)

# hist() methods
y <- hist(x.km[[2]], x, col = "blue", lwd = 2)

stopifnot(inherits(y, "histogram"))
y <- hist(x.km, 3:4, col = c("blue", "red"), lwd = 2)

stopifnot(inherits(y, "histogram"))

# to_kmeans() methods
x <- c(1, 2, 4, 5, 7, 8)
summary(y <- kmeans(x, 3))
##              Length Class  Mode   
## cluster      6      -none- numeric
## centers      3      -none- numeric
## totss        1      -none- numeric
## withinss     3      -none- numeric
## tot.withinss 1      -none- numeric
## betweenss    1      -none- numeric
## size         3      -none- numeric
## iter         1      -none- numeric
## ifault       1      -none- numeric
stopifnot(identical(y, to_kmeans(y)))
# see particularly run_kmeans() which uses this internally if clustering is
# done with Ckmeans.1d.dp::Ckmeans.1d.dp()

[Package opm version 1.3.63 Index]