Introduction
MeSH (Medical Subject Headings) is the NLM (U.S. National Library of Medicine) controlled vocabulary used to manually index articles for MEDLINE/PubMed. MeSH is a comprehensive life science vocabulary. It has 19 categories, and MeSH.db contains the following 16 of them:
Abbreviation | Category |
---|---|
A | Anatomy |
B | Organisms |
C | Diseases |
D | Chemicals and Drugs |
E | Analytical, Diagnostic and Therapeutic Techniques and Equipment |
F | Psychiatry and Psychology |
G | Phenomena and Processes |
H | Disciplines and Occupations |
I | Anthropology, Education, Sociology and Social Phenomena |
J | Technology and Food and Beverages |
K | Humanities |
L | Information Science |
M | Persons |
N | Health Care |
V | Publication Type |
Z | Geographical Locations |
MeSH terms were associated with Entrez Gene IDs by three methods: gendoo, gene2pubmed and RBBH (Reciprocal Blast Best Hit).
Method | How Entrez Gene IDs are mapped to MeSH IDs |
---|---|
Gendoo | Text-mining |
gene2pubmed | Manual curation by NCBI teams |
RBBH | Sequence homology with BLASTP search (E-value < 10^-50) |
Enrichment Analysis
meshes supports enrichment analysis (over-representation analysis and gene set enrichment analysis) of a gene list or a whole expression profile using MeSH annotation. Data from gendoo, gene2pubmed and RBBH are all supported. Users can select the category of interest to test; all 16 categories are supported. The analysis supports more than 70 species, listed in the MeSHDb BiocView.
For algorithm details, please refer to the vignettes of the DOSE package (Yu et al. 2015).
library(meshes)
data(geneList, package="DOSE")
de <- names(geneList)[1:100]
## minGSSize = 200 is used only to speed up compilation of the vignette
x <- enrichMeSH(de, MeSHDb = "MeSH.Hsa.eg.db", database='gendoo', category = 'C', minGSSize=200)
head(x)
## ID Description GeneRatio BgRatio pvalue
## D000782 D000782 Aneuploidy 17/96 320/16528 3.866830e-12
## D042822 D042822 Genomic Instability 16/96 312/16528 3.007419e-11
## D012595 D012595 Scleroderma, Systemic 11/96 279/16528 6.449334e-07
## D009303 D009303 Nasopharyngeal Neoplasms 11/96 314/16528 2.049315e-06
## D019698 D019698 Hepatitis C, Chronic 11/96 317/16528 2.246856e-06
## D001471 D001471 Barrett Esophagus 9/96 213/16528 4.070611e-06
## p.adjust qvalue
## D000782 6.457606e-10 3.744719e-10
## D042822 2.511195e-09 1.456224e-09
## D012595 3.590129e-05 2.081890e-05
## D009303 7.504499e-05 4.351805e-05
## D019698 7.504499e-05 4.351805e-05
## D001471 1.132987e-04 6.570108e-05
## geneID
## D000782 4312/55143/991/1062/7153/4751/79019/55839/890/983/4085/332/7272/9212/8208/1111/6790
## D042822 55143/991/1062/4605/7153/1381/9787/4751/10635/890/4085/81620/332/9212/1111/6790
## D012595 4312/6280/1062/4605/7153/3627/4283/6362/7850/3002/4321
## D009303 4312/7153/3627/6241/983/4085/5918/332/3002/4321/6790
## D019698 4312/3627/10563/6373/4283/983/6362/7850/332/3002/3620
## D001471 6280/7153/10563/890/4085/332/2146/4321/6790
## Count
## D000782 17
## D042822 16
## D012595 11
## D009303 11
## D019698 11
## D001471 9
The over-representation analysis above uses gendoo as the data source and tests category C (Diseases).
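The GeneRatio and BgRatio columns in the output above carry the four counts that an over-representation test needs. The following is a minimal sketch, assuming the p-value is the upper tail of a hypergeometric distribution as described in the DOSE vignette. The variable names are illustrative and are not part of the meshes API.

```r
# Counts read from the first row of the enrichMeSH output (D000782, Aneuploidy)
k <- 17      # input genes annotated with the term (GeneRatio numerator)
n <- 96      # input genes annotated in the tested category (GeneRatio denominator)
M <- 320     # background genes annotated with the term (BgRatio numerator)
N <- 16528   # all background genes (BgRatio denominator)

# P(X >= k): chance of drawing at least k annotated genes in a sample of n
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Fold enrichment: observed proportion relative to the background proportion
fold <- (k / n) / (M / N)

print(p_value)
print(fold)
```

Subtracting 1 from k is what turns the cumulative probability into "at least k" rather than "more than k".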
The following example uses gene2pubmed as the data source and tests category G (Phenomena and Processes) using GSEA.
## minGSSize = 200 is used only to speed up compilation of the vignette
y <- gseMeSH(geneList, MeSHDb = "MeSH.Hsa.eg.db", database = 'gene2pubmed', category = "G", minGSSize=200)
head(y)
## ID Description setSize enrichmentScore NES
## D050156 D050156 Adipogenesis 447 -0.3380419 -1.504629
## D009043 D009043 Motor Activity 441 -0.3294623 -1.462880
## D006339 D006339 Heart Rate 331 -0.3686389 -1.593097
## D001846 D001846 Bone Development 321 -0.3706355 -1.593751
## D049629 D049629 Waist-Hip Ratio 316 -0.3509084 -1.508383
## D015430 D015430 Weight Gain 299 -0.3605430 -1.539487
## pvalue p.adjust qvalues rank
## D050156 0.001215067 0.02003082 0.01054254 2508
## D009043 0.001219512 0.02003082 0.01054254 2176
## D006339 0.001278772 0.02003082 0.01054254 2405
## D001846 0.001291990 0.02003082 0.01054254 2100
## D049629 0.001295337 0.02003082 0.01054254 2176
## D015430 0.001307190 0.02003082 0.01054254 1998
## leading_edge
## D050156 tags=28%, list=20%, signal=23%
## D009043 tags=23%, list=17%, signal=20%
## D006339 tags=29%, list=19%, signal=24%
## D001846 tags=27%, list=17%, signal=23%
## D049629 tags=27%, list=17%, signal=23%
## D015430 tags=23%, list=16%, signal=20%
## core_enrichment
## D050156 5562/8626/8434/7070/5071/10499/2067/2874/9611/6716/5925/65989/5595/8609/9563/27332/1499/79738/4837/7157/79960/5729/408/2908/4088/23741/6500/8038/4057/6649/5564/860/8648/10365/10253/54884/4602/8452/7474/6776/8743/79875/596/25956/8644/80781/79923/1490/50486/7840/84162/6041/4692/2246/4208/11075/63924/5919/284119/2308/9411/10216/54795/5950/79365/1293/2247/5468/373/50507/6876/6469/8553/4023/2530/594/7350/81029/3952/1675/79068/5733/4313/10468/10628/6720/9052/2099/3480/11213/857/55893/290/6678/63895/4035/633/23414/8639/2162/165/3551/10788/185/3357/367/4982/3667/1634/4128/23024/3479/6424/9370/2167/652/8839/5346/54829/2625/79689/10974
## D009043 10550/23405/1499/6453/8945/7157/627/408/2908/22881/27445/11132/2752/9445/2571/23621/3082/1291/2915/1543/7466/3240/3350/947/55304/181/3632/2169/27306/1621/80169/9627/196/8678/8863/23284/81627/4692/5799/2259/3087/1278/283/1277/3953/4747/2247/6414/210/4744/5468/8835/89795/4023/8522/3485/3952/79068/8864/4313/2944/2273/2099/3480/8528/4908/56892/3339/5138/57161/4741/4306/6571/79750/4915/5744/2487/58503/347/6863/2952/5327/367/4982/4128/4059/3572/150/7060/9358/7166/3479/9254/5348/4129/9370/3708/1311/5105/4137/1408/5241
## D006339 4985/7139/8929/3784/10681/3375/154/1760/9781/5139/118/2702/6532/6416/2869/270/7157/627/2908/7138/5563/3643/1129/7779/947/1901/2034/4179/4804/64388/1621/4881/8863/5021/844/4212/11030/5797/6403/4803/84059/79789/5176/3953/5243/5468/1012/2868/5793/4023/7056/3952/5577/126/2946/3778/477/5733/4313/2944/9201/3075/9499/2273/2099/1471/857/775/5138/4306/4487/213/5350/5744/23245/2152/2697/2791/185/6863/2952/5327/80206/2200/9607/3572/150/8490/3479/2006/55259/9370/125/652/55351
## D001846 8945/7157/57798/79048/627/6500/8038/860/2752/4882/3371/2915/5745/63971/54455/3791/819/57045/596/2034/54808/80781/1280/64388/2261/4054/11059/3483/9900/26234/4734/9452/4208/4322/253461/1278/7048/51280/10903/30008/7869/1277/3953/10516/10411/8835/79776/11167/2317/3485/3952/5274/54681/4488/10486/1009/2202/91851/2099/5764/23327/3339/8817/83716/6678/4915/633/658/54361/5744/165/5654/10631/3487/367/4982/3667/79971/1634/3479/114899/9370/652/8614/4969
## D049629 8609/9563/23405/10206/23314/4776/25970/627/2908/490/4057/268/3567/23429/283450/1543/3240/3174/81490/23047/55304/5099/54808/4179/2169/948/8082/4018/54465/4256/3087/5919/253461/26470/10903/1581/56172/3953/5950/2638/5468/1012/8835/4023/594/4214/7350/3952/79068/51232/2202/6444/9369/2099/6833/3991/4016/2690/57161/79750/4915/5125/5167/8639/11188/10631/3551/2487/2697/6935/3487/367/3667/4059/150/9358/3479/6424/9370/4629/652/5346/7021/4239
## D015430 627/2908/5563/108/1387/2752/2571/5914/12/2915/4153/2863/1129/7466/3350/596/181/2746/3067/1621/9627/590/3087/6785/5176/3953/5950/2166/1293/5243/5468/54551/4023/7350/3952/5577/3176/79068/3625/9369/6720/2099/3991/857/2690/6571/4915/32/9135/5654/347/2697/3357/2891/367/25802/4128/9607/3572/150/7166/3479/6505/4129/9370/2167/5346/5241
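The enrichmentScore column can be read as the peak of a running sum over the ranked gene list. The sketch below is a simplified base-R version of the weighted running-sum statistic from the original GSEA method. It is not the code that DOSE runs, and it omits the permutation step that produces NES and p-values. The function name `enrichment_score` and the toy gene names are hypothetical.

```r
# ranked: named numeric vector sorted in decreasing order (like geneList)
# gene_set: character vector of gene IDs annotated with one MeSH term
enrichment_score <- function(ranked, gene_set, exponent = 1) {
  in_set <- names(ranked) %in% gene_set
  n_hit  <- sum(in_set)
  n_miss <- length(ranked) - n_hit
  stopifnot(n_hit > 0, n_miss > 0)

  # Hits step up in proportion to |score|^exponent; misses step down evenly
  hit_weight <- abs(ranked)^exponent * in_set
  step <- ifelse(in_set, hit_weight / sum(hit_weight), -1 / n_miss)

  # The score is the running-sum value with the largest absolute deviation
  running <- cumsum(step)
  running[[which.max(abs(running))]]
}

# Toy ranked list: g1 is the most up-regulated, g6 the most down-regulated
toy <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2, g6 = -3)
es_top    <- enrichment_score(toy, c("g1", "g2"))
es_bottom <- enrichment_score(toy, c("g5", "g6"))
```

A set concentrated at the top of the list gets a positive score and one concentrated at the bottom gets a negative score, which matches the negative enrichmentScore values in the output above.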
Users can visualize these enrichment results with the methods implemented in enrichplot (i.e. barplot, dotplot, cnetplot, emapplot and gseaplot). These visualizations make the enrichment results much easier to interpret.
Semantic Similarity
meshes implements four IC-based methods (i.e. Resnik (Resnik 1999), Jiang (Jiang and Conrath 1997), Lin (Lin 1998) and Schlicker (Schlicker et al. 2006)) and one graph-structure based method (i.e. Wang (Wang et al. 2007)). For algorithm details, please refer to the vignette of the GOSemSim package (Yu et al. 2010).
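To make the IC-based measures concrete, here is a small sketch of their textbook formulas on toy information content (IC) values. The numbers are invented, the Schlicker measure is evaluated only at the most informative common ancestor (MICA), and the conversion of the Jiang and Conrath distance into a similarity is one common choice. The normalisation used by meshes and GOSemSim may differ, so this should not be expected to reproduce their output.

```r
# Toy IC values (IC = -log of a term's annotation probability); not real data
ic_a    <- 2.0   # IC of term a
ic_b    <- 3.0   # IC of term b
ic_mica <- 1.5   # IC of the most informative common ancestor of a and b

# Resnik: the IC of the shared ancestor itself
sim_resnik <- ic_mica

# Lin: shared IC relative to the IC of the two terms
sim_lin <- 2 * ic_mica / (ic_a + ic_b)

# Schlicker (Rel): Lin down-weighted when the ancestor is very general
sim_rel <- sim_lin * (1 - exp(-ic_mica))

# Jiang and Conrath define a distance; 1 / (1 + d) is an assumed conversion
jc_dist   <- ic_a + ic_b - 2 * ic_mica
sim_jiang <- 1 / (1 + jc_dist)

print(c(Resnik = sim_resnik, Lin = sim_lin, Rel = sim_rel, Jiang = sim_jiang))
```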
The meshSim function is designed to measure semantic similarity between two MeSH term vectors.
library(meshes)
## hsamd <- meshdata("MeSH.Hsa.eg.db", category='A', computeIC=T, database="gendoo")
data(hsamd)
meshSim("D000009", "D009130", semData=hsamd, measure="Resnik")
## [1] 0.2910261
## [1] 0.521396
## [1] 0.4914785
## [1] 0.5557103
## D017629 D002890 D008928
## D001369 0.2886598 0.1923711 0.2193326
## D002462 0.6521739 0.2381925 0.2809552
The geneSim function is designed to measure semantic similarity between two gene vectors.
## [1] 0.487
## 835 5261 241 994
## 241 0.732 0.337 1.000 0.438
## 251 0.526 0.588 0.487 0.597
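A gene is usually annotated with several MeSH terms, so a gene-level score has to combine a matrix of term-level scores. The sketch below shows one common way of doing so, the best-match average (called BMA in GOSemSim). Whether the output above was produced with this strategy is an assumption, and the matrix values, term labels and the function name `best_match_average` are hypothetical.

```r
# Toy term-by-term similarities: rows are the MeSH terms of one hypothetical
# gene, columns are the terms of another (values are illustrative only)
term_sim <- matrix(c(0.29, 0.19, 0.22,
                     0.65, 0.24, 0.28),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("t1", "t2"), c("u1", "u2", "u3")))

# Best-match average: every term is paired with its best partner in the
# other gene, and the best scores from both directions are averaged
best_match_average <- function(m) {
  row_best <- apply(m, 1, max)
  col_best <- apply(m, 2, max)
  (sum(row_best) + sum(col_best)) / (nrow(m) + ncol(m))
}

gene_sim <- best_match_average(term_sim)
print(gene_sim)
```

The result lies between the plain mean of the matrix and its maximum, so a single strong term pair neither dominates the score nor gets diluted by unrelated pairs.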
Need help?
If you have questions or issues, please visit the meshes homepage first; most common problems are documented there. If you think you have found a bug, please follow the guide and provide a reproducible example to be posted on the GitHub issue tracker. For questions, please post to the Bioconductor support site and tag your post with meshes.
Chinese users can follow me on WeChat (微信).
Session Information
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] meshes_1.6.1 DOSE_3.6.0 MeSH.db_1.10.0
## [4] MeSH.Hsa.eg.db_1.10.0 MeSHDbi_1.16.0 BiocGenerics_0.26.0
##
## loaded via a namespace (and not attached):
## [1] viridis_0.5.1 Biobase_2.40.0 viridisLite_0.3.0
## [4] bit64_0.9-7 splines_3.5.0 ggraph_1.0.1
## [7] prettydoc_0.2.1 assertthat_0.2.0 DO.db_2.9
## [10] rvcheck_0.1.0 stats4_3.5.0 blob_1.1.1
## [13] yaml_2.1.19 ggrepel_0.8.0 pillar_1.2.3
## [16] RSQLite_2.1.1 backports_1.1.2 lattice_0.20-35
## [19] glue_1.2.0 digest_0.6.15 qvalue_2.12.0
## [22] colorspace_1.3-2 cowplot_0.9.2 htmltools_0.3.6
## [25] Matrix_1.2-14 plyr_1.8.4 pkgconfig_2.0.1
## [28] purrr_0.2.4 GO.db_3.6.0 scales_0.5.0
## [31] tweenr_0.1.5 enrichplot_1.0.1 BiocParallel_1.14.1
## [34] ggforce_0.1.1 tibble_1.4.2 IRanges_2.14.10
## [37] ggplot2_2.2.1 UpSetR_1.3.3 lazyeval_0.2.1
## [40] magrittr_1.5 memoise_1.1.0 evaluate_0.10.1
## [43] MASS_7.3-50 tools_3.5.0 data.table_1.11.2
## [46] stringr_1.3.1 S4Vectors_0.18.2 munsell_0.4.3
## [49] AnnotationDbi_1.42.1 bindrcpp_0.2.2 compiler_3.5.0
## [52] rlang_0.2.0 ggridges_0.5.0 units_0.5-1
## [55] grid_3.5.0 igraph_1.2.1 labeling_0.3
## [58] rmarkdown_1.9 gtable_0.2.0 DBI_1.0.0
## [61] reshape2_1.4.3 R6_2.2.2 gridExtra_2.3
## [64] knitr_1.20 dplyr_0.7.5 bit_1.1-13
## [67] udunits2_0.13 bindr_0.1.1 fastmatch_1.1-0
## [70] fgsea_1.6.0 rprojroot_1.3-2 GOSemSim_2.6.0
## [73] stringi_1.2.2 Rcpp_0.12.17 tidyselect_0.2.4
References
Jiang, Jay J., and David W. Conrath. 1997. “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy.” Proceedings of 10th International Conference on Research in Computational Linguistics. http://www.citebase.org/abstract?id=oai:arXiv.org:cmp-lg/9709008.
Lin, Dekang. 1998. “An Information-Theoretic Definition of Similarity.” In Proceedings of the 15th International Conference on Machine Learning, 296–304. https://doi.org/10.1.1.55.1832.
Resnik, Philip. 1999. “Semantic Similarity in a Taxonomy: An Information-Based Measure and Its Application to Problems of Ambiguity in Natural Language.” Journal of Artificial Intelligence Research 11:95–130.
Schlicker, Andreas, Francisco S Domingues, Jorg Rahnenfuhrer, and Thomas Lengauer. 2006. “A New Measure for Functional Similarity of Gene Products Based on Gene Ontology.” BMC Bioinformatics 7:302. https://doi.org/10.1186/1471-2105-7-302.
Wang, James Z, Zhidian Du, Rapeeporn Payattakool, Philip S Yu, and Chin-Fu Chen. 2007. “A New Method to Measure the Semantic Similarity of GO Terms.” Bioinformatics (Oxford, England) 23 (May):1274–81. https://doi.org/10.1093/bioinformatics/btm087.
Yu, Guangchuang, and Qing-Yu He. 2016. “ReactomePA: An R/Bioconductor Package for Reactome Pathway Analysis and Visualization.” Molecular BioSystems 12 (2):477–79. https://doi.org/10.1039/C5MB00663E.
Yu, Guangchuang, Fei Li, Yide Qin, Xiaochen Bo, Yibo Wu, and Shengqi Wang. 2010. “GOSemSim: An R Package for Measuring Semantic Similarity Among GO Terms and Gene Products.” Bioinformatics 26 (April):976–78. https://doi.org/10.1093/bioinformatics/btq064.
Yu, Guangchuang, Li-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology 16 (5):284–87. https://doi.org/10.1089/omi.2011.0118.
Yu, Guangchuang, Li-Gen Wang, Guang-Rong Yan, and Qing-Yu He. 2015. “DOSE: An R/Bioconductor Package for Disease Ontology Semantic and Enrichment Analysis.” Bioinformatics 31 (4):608–9. https://doi.org/10.1093/bioinformatics/btu684.