substrate_info {opm}

R Documentation

Provide information on substrates

Description

Return information on substrates such as their CAS number or other database ID or convert substrate names.

Usage

## S4 method for signature 'MOPMX'
substrate_info(object, ...)

## S4 method for signature 'OPM'
substrate_info(object, ...)

## S4 method for signature 'OPMS'
substrate_info(object, ...)

## S4 method for signature 'character'
substrate_info(object,
    what = c("cas", "kegg", "drug", "metacyc", "chebi", "mesh", "seed",
      "downcase", "greek", "concentration", "html", "peptide", "peptide2",
      "all"), browse = 0L, download = FALSE, ...)

## S4 method for signature 'factor'
substrate_info(object, ...)

## S4 method for signature 'list'
substrate_info(object, ...)

## S4 method for signature 'substrate_match'
substrate_info(object, ...)

Arguments

`object`	Query character vector, factor or list, S3 object of class ‘substrate_match’, `OPM`, `OPMS` or `MOPMX` object.
`what`	Character scalar indicating which kind of information to output. See the references for information on the databases. Possible values:
    `all`	Create object of S3 class ‘substrate_data’ containing all available information and useful for display.
    `cas`	CAS registry number, optionally expanded to a URL.
    `chebi`	ChEBI database ID, optionally expanded to a URL.
    `concentration`	Attempt to extract concentration information (as used in opm substrate names) from `object`. Return `NA` wherever this fails.
    `downcase`	Substrate name converted to lower case, protecting one-letter specifiers, acronyms and chemical symbols, and translating relevant characters from the Greek alphabet.
    `drug`	KEGG drug database ID, optionally expanded to a URL.
    `greek`	Substrate name after translation of relevant characters to Greek letters.
    `html`	Like `greek`, but using HTML tags, and also converting other parts of compound names that require special formatting.
    `kegg`	KEGG compound database ID, optionally expanded to a URL.
    `mesh`	MeSH database name (useful for conducting PubMed searches), optionally expanded to a URL.
    `metacyc`	MetaCyc database ID, optionally expanded to a URL.
    `peptide`	List of character vectors representing amino acids in three-letter code, in order, contained in the substrate if it is a peptide. Empty character vectors are returned for non-peptide substrates. Amino acids without ‘L’ or ‘D’ annotation are assumed to be in ‘L’ conformation, i.e. ‘L-’ is removed from the beginning of the amino acid codes.
    `peptide2`	Like `peptide`, but without removal of ‘L-’ from the beginning of the amino acid codes.
    `seed`	SEED compound database ID, optionally expanded to a URL.
`browse`	Numeric scalar. If non-zero, a URL is generated from each ID. If positive, this number of URLs (counted from the beginning) is also opened in the default web browser; if negative, the URLs are only returned. It is an error to try this with those values of `what` that do not yield an ID.
`download`	Logical scalar indicating whether, using the available IDs, substrate information should be queried from the corresponding web services and returned in customised objects. Note that this is unavailable for most values of `what`. At the moment only `kegg` and `drug` can be queried, and only if the KEGGREST package is available. This yields S3 objects of class `kegg_compounds`.
`...`	Optional other arguments passed between the methods.
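
The effect of `browse` on the returned values can be pictured with a small base-R sketch. This is not the opm implementation: `expand_ids` is a hypothetical helper, the URL prefix is simply copied from the example output in the Examples section, and skipping `NA` entries when counting the URLs to open is an assumption.

```r
# URL prefix copied from the example output on this page; illustrative only.
cas_prefix <- "http://chem.sis.nlm.nih.gov/chemidplus/direct.jsp?regno="

# Hypothetical helper mimicking the documented meaning of 'browse'.
expand_ids <- function(ids, browse = 0L) {
  if (browse == 0L)
    return(ids)  # zero: plain IDs are returned unchanged
  # non-zero: every non-missing ID becomes a URL, NA stays NA
  urls <- ifelse(is.na(ids), NA_character_, paste0(cas_prefix, ids))
  names(urls) <- names(ids)
  if (browse > 0L) {
    # positive: the first 'browse' URLs would additionally be opened;
    # here they are only reported (assumption: NA entries are skipped)
    to_open <- head(urls[!is.na(urls)], browse)
    message("Would open: ", paste(to_open, collapse = ", "))
  }
  urls
}

ids <- c("D-Glucose" = "50-99-7", "D-Gloucose" = NA)
expand_ids(ids, browse = -1)  # URLs are returned, nothing is opened
```

With `browse = 0` the IDs come back untouched, which matches the default behaviour described for the character method.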

Details

The query names must be written exactly as used in the stored plate annotations. To determine their spelling, use `find_substrate`. Each spelling might include a concentration indicator, but the same underlying substrate name yields the same ID irrespective of the concentration.
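
As a rough illustration of why names that differ only in their concentration indicator map to the same ID, the following base-R sketch separates the two parts. It assumes that the indicator is a trailing ` #N` suffix, as in the substrate names shown in the Examples section; the regular expressions are illustrative and are not taken from opm.

```r
# Substrate names with and without a concentration indicator
# (suffix format assumed from the example output on this page).
full_names <- c("D-Glucose", "D-Glucose #1", "D-Glucose #12")

# Drop a trailing " #N" to obtain the underlying substrate name.
base_names <- sub("\\s*#\\s*\\d+$", "", full_names, perl = TRUE)

# Extract N as an integer; names without an indicator yield NA.
concentration <- suppressWarnings(
  as.integer(sub("^.*#\\s*(\\d+)$", "\\1", full_names, perl = TRUE))
)

print(base_names)
print(concentration)
```

Within opm itself, `substrate_info(object, "concentration")` is the documented way to obtain this information.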

Note that the information is only partially complete, depending on the well and the database. While it is possible to link almost all substrates to, say, CAS numbers, they are not necessarily contained in the other databases. Thanks to the work of the ChEBI staff, which is gratefully acknowledged, ChEBI information is complete as far as possible (large molecules such as proteins or other polymers are not covered by ChEBI).

For some wells, even a main substrate cannot be identified, causing all its IDs to be missing. This holds for all control wells, for all wells that contain a mixture of (usually two) substrates, and for all wells that are only specified by a certain pH.

The generated URLs should provide plenty of information on the respective substrate. In the case of ChEBI, KEGG and MetaCyc, much information is directly displayed on the page itself, whereas the chosen CAS site contains a number of links providing additional chemical details. The MeSH web pages link directly to the corresponding PubMed searches.

Value

The character method returns a character vector with `object` used as names and either a matched entry or `NA` as value. A named list is returned instead only if `what` is set to ‘peptide’. The factor method works like the character method, whereas the list method traverses a list and calls `substrate_info` on suitable elements, leaving others unchanged. The `OPM` and `OPMS` methods work like the character method, using their own substrates.
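
The traversal performed by the list method can be sketched in base R with `rapply`. This is only an analogy, not the opm code: `lookup_cas` is a hypothetical stand-in for the character method, backed by a one-entry table whose CAS number is taken from the Examples section, and treating only character vectors as "suitable" elements is an assumption.

```r
# Hypothetical stand-in for the character method: a tiny lookup table.
lookup_cas <- function(x) {
  known <- c("D-Glucose" = "50-99-7")
  result <- unname(known[x])  # unknown names yield NA
  names(result) <- x          # the query is used as names
  result
}

# Apply the lookup to character elements only; all other
# elements of the list are returned unchanged.
traverse <- function(x) {
  rapply(x, lookup_cas, classes = "character", how = "replace")
}

query <- list(a = c("D-Glucose", "D-Gloucose"), b = 1:3)
result <- traverse(query)
str(result)
```

Element `a` is replaced by a named character vector containing one match and one `NA`, whereas the integer vector `b` passes through untouched.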

Depending on the `browse` argument, the returned IDs might have been converted to URLs, and as a side effect tabs in the default web browser might have been opened. For suitable values of `what`, setting `download` to `TRUE` yields special objects as described above.

The `MOPMX` method yields a list with one element of one of the kinds described above per element of `object`.

References

Bochner, B. R., pers. comm.

http://www.cas.org/content/chemical-substances/faqs

http://www.genome.jp/kegg/

Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M., and Hirakawa, M. 2010 KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Research 38: D355–D360.

http://metacyc.org/

Caspi, R., Altman, T., Dreher, K., Fulcher, C.A., Subhraveti, P., Keseler, I.M., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L.A., Ong, Q., Paley, S., Pujar, A., Shearer, A.G., Travers, M., Weerasinghe, D., Zhang, P., Karp, P.D. 2012 The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Research 40: D742–D753.

http://www.ncbi.nlm.nih.gov/mesh

Coletti, M.H., Bleich, H.L. 2001 Medical subject headings used to search the biomedical literature. Journal of the American Medical Informatics Association 8: 317–323.

http://www.ebi.ac.uk/chebi/

Hastings, J., de Matos, P., Dekker, A., Ennis, M., Harsha, B., Kale, N., Muthukrishnan, V., Owen, G., Turner, S., Williams, M., Steinbeck, C. 2013 The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Research 41: D456–D463.

Overbeek, R., Begley, T., Butler, R., Choudhuri, J., Chuang, H., Cohoon, M., de Crecy-Lagard, V., Diaz, N., Disz, T., Edwards, R., Fonstein, M., Frank, E., Gerdes, S., Glass, E., Goesmann, A., Hanson, A., Iwata-Reuyl, D., Jensen, R., Jamshidi, N., Krause, L., Kubal, M., Larsen, N., Linke, B., McHardy, A., Meyer, F., Neuweger, H., Olsen, G., Olson, R., Osterman, A., Portnoy, V., Pusch, G., Rodionov, D., Rueckert, C., Steiner, J., Stevens, R., Thiele, I., Vassieva, O., Ye, Y., Zagnitko, O., Vonstein, V. 2005 The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Research 33: 5691–5702.

Examples

# Character method; compare correct and misspelled substrate name
(x <- substrate_info(c("D-Glucose", "D-Gloucose")))

##  D-Glucose D-Gloucose 
##  "50-99-7"         NA

stopifnot(anyNA(x), !all(is.na(x)))
stopifnot(identical(x, # Factor method yields same result
  substrate_info(as.factor(c("D-Glucose", "D-Gloucose")))))

# Now with generation of URLs
(y <- substrate_info(c("D-Glucose", "D-Gloucose"), browse = -1))

##                                                         D-Glucose 
## "http://chem.sis.nlm.nih.gov/chemidplus/direct.jsp?regno=50-99-7" 
##                                                        D-Gloucose 
##                                                                NA

stopifnot(is.na(y) | nchar(y) > nchar(x))
# NA remains NA (and the function would not try to open it in the browser)

# Character method, safe conversion to lower case
(x <- substrate_info(c("a-D-Glucose", "a-D-Gloucose"), "downcase"))

##        a-D-Glucose       a-D-Gloucose 
##  "alpha-D-glucose" "alpha-D-gloucose"

stopifnot(nchar(x) > nchar(c("a-D-Glucose", "a-D-Gloucose")))
# note the protection of 'D' and the conversion of 'a'
# whether or not substrate names are known does not matter here

# Peptide extraction (note treatment of non-standard amino acids)
(x <- substrate_info(c("Ala-b-Ala-D-Glu", "Glucose", "Trp-Val"), "peptide"))

## $`Ala-b-Ala-D-Glu`
## [1] "Ala"   "b-Ala" "D-Glu"
## 
## $Glucose
## character(0)
## 
## $`Trp-Val`
## [1] "Trp" "Val"

stopifnot(is.list(x), sapply(x, length) == c(3, 0, 2))

# List method
(x <- substrate_info(find_substrate(c("D-Glucose", "D-Gloucose"))))

## D-Glucose:
##   1-Thio-b-D-Glucose: 10593-29-0
##   '2-Deoxy-D-Glucose #1': 154-17-6
##   '2-Deoxy-D-Glucose #2': 154-17-6
##   '2-Deoxy-D-Glucose #3': 154-17-6
##   '2-Deoxy-D-Glucose #4': 154-17-6
##   2-Deoxy-D-Glucose-6-Phosphate: 33068-19-8
##   3-O-Methyl-D-Glucose: 146-72-5
##   D-Glucose: 50-99-7
##   'D-Glucose #1': 50-99-7
##   'D-Glucose #10': 50-99-7
##   'D-Glucose #11': 50-99-7
##   'D-Glucose #12': 50-99-7
##   'D-Glucose #2': 50-99-7
##   'D-Glucose #3': 50-99-7
##   'D-Glucose #4': 50-99-7
##   'D-Glucose #5': 50-99-7
##   'D-Glucose #6': 50-99-7
##   'D-Glucose #7': 50-99-7
##   'D-Glucose #8': 50-99-7
##   'D-Glucose #9': 50-99-7
##   D-Glucose-6-Phosphate: 3671-99-6
##   a-D-Glucose-1-Phosphate: 56401-20-8
##   'a-D-Glucose-1-Phosphate #1': 56401-20-8
##   'a-D-Glucose-1-Phosphate #10': 56401-20-8
##   'a-D-Glucose-1-Phosphate #11': 56401-20-8
##   'a-D-Glucose-1-Phosphate #12': 56401-20-8
##   'a-D-Glucose-1-Phosphate #2': 56401-20-8
##   'a-D-Glucose-1-Phosphate #3': 56401-20-8
##   'a-D-Glucose-1-Phosphate #4': 56401-20-8
##   'a-D-Glucose-1-Phosphate #5': 56401-20-8
##   'a-D-Glucose-1-Phosphate #6': 56401-20-8
##   'a-D-Glucose-1-Phosphate #7': 56401-20-8
##   'a-D-Glucose-1-Phosphate #8': 56401-20-8
##   'a-D-Glucose-1-Phosphate #9': 56401-20-8
## D-Gloucose: {}

stopifnot(length(x[[1]]) > length(x[[2]]))

# OPM and OPMS methods
(x <- substrate_info(vaas_1[, 1:3], "all"))

## Negative Control: []
## Dextrin:
##   CAS: 9004-53-9
##   ChEBI: '28675'
##   KEGG compound: C00721
##   KEGG drug: D00084
##   MetaCyc: Dextrins
##   SEED: cpd11594
##   MeSH: Quaternary Ammonium Compounds
## D-Maltose:
##   CAS: 6363-53-7
##   ChEBI: '17306'
##   KEGG compound: C00208
##   KEGG drug: D00044
##   MetaCyc: MALTOSE
##   SEED: cpd00179
##   MeSH: Maltose

stopifnot(inherits(x, "substrate_data"))
stopifnot(identical(x, substrate_info(vaas_4[, , 1:3], "all")))
## Not run: 
##D 
##D   # this would open up to 96 tabs in your browser...
##D   substrate_info(vaas_4, "kegg", browse = 100)
## End(Not run)

[Package opm version 1.3.63]