This is a set of reproducible examples for the R1 Hmisc
package2, put together in an rmarkdown
notebook using RStudio
and knitr
. When viewing the resulting html file you can see all the code, and can download the entire rmarkdown
script, which is especially helpful for seeing how knitr
chunks are specified. Graphics that have a plotly method for them are rendered using plotly
instead of using defaults such as base graphics, lattice
, or ggplot2
. That way the plots are somewhat interactive, e.g., allow for drill-down for more information without the viewer needing to install R
.
Much of the tabular output produced here was created using html
methods, which are especially easy to implement with rmarkdown
and make for output that can be directly opened using word processors. Jump to Computing Environment for a list of packages used here, and their version numbers.
require(Hmisc)
knitrSet(lang='markdown', h=4.5, fig.path='png/', fig.align='left') # + sometimes ,cache=TRUE
# knitrSet redirects all messages to messages.txt
options(grType='plotly') # for certain graphics functions
mu <- markupSpecs$html # markupSpecs is in Hmisc
cap <- mu$cap # function to output html caption
lcap <- mu$lcap # for continuation for long caption
# These last 2 functions are used by the putHfig function in Hmisc
The following (r mu$styles()
) defines HTML styles and javascript functions, such as ffig, smg
and the function for expanding and collapsing text as done by the expcoll
function.
Here is an example using expcoll
. Click on the down arrow to expand; it turns to an up arrow which can be clicked to contract.
Here is an example using a knitr
hook function markupSpec$html$uncover
which is communicated to knitr
by knitrSet
.
The following (r mu$widescreen()
) causes the html notebook to use an entire wide screen.
The getHdata
function is used to fetch a dataset from the Vanderbilt DataSets
web site. upData
is used to
units
attributed used by Hmisc
and rms
functions for table making and graphicscontents
is used to print a data dictionary, run through an html
method for nicer output. Information about the data source may be found here.
getHdata(pbc)
# Have upData move units from labels to separate attribute
pbc <- upData(pbc,
fu.yrs = fu.days / 365.25,
labels = c(fu.yrs = 'Follow-up Time',
status = 'Death or Liver Transplantation'),
units = c(fu.yrs = 'year'),
drop = 'fu.days',
moveUnits=TRUE, html=TRUE)
Input object size: 76592 bytes; 19 variables 418 observations Label for bili changed from Serum Bilirubin (mg/dl) to Serum Bilirubin units set to mg/dl Label for albumin changed from Albumin (gm/dl) to Albumin units set to gm/dl Label for protime changed from Prothrombin Time (sec.) to Prothrombin Time units set to sec. Label for alk.phos changed from Alkaline Phosphatase (U/liter) to Alkaline Phosphatase units set to U/liter Label for sgot changed from SGOT (U/ml) to SGOT units set to U/ml Label for chol changed from Cholesterol (mg/dl) to Cholesterol units set to mg/dl Label for trig changed from Triglycerides (mg/dl) to Triglycerides units set to mg/dl Label for platelet changed from Platelets (per cm^3/1000) to Platelets units set to per cm^3/1000 Label for copper changed from Urine Copper (ug/day) to Urine Copper units set to ug/day Added variable fu.yrs Dropped variable fu.days New object size: 80224 bytes; 19 variables 418 observations
# The following can also be done by running this command
# to put the results in a new browser tab:
# getHdata(pbc, 'contents')
html(contents(pbc), maxlevels=10, levelType='table')
Name | Labels | Units | Levels | Storage | NAs |
---|---|---|---|---|---|
bili | Serum Bilirubin | mg/dl | double | 0 | |
albumin | Albumin | gm/dl | double | 0 | |
stage | Histologic Stage, Ludwig Criteria | integer | 6 | ||
protime | Prothrombin Time | sec. | double | 2 | |
sex | 2 | integer | 0 | ||
age | Age | double | 0 | ||
spiders | 2 | integer | 106 | ||
hepatom | 2 | integer | 106 | ||
ascites | 2 | integer | 106 | ||
alk.phos | Alkaline Phosphatase | U/liter | double | 106 | |
sgot | SGOT | U/ml | double | 106 | |
chol | Cholesterol | mg/dl | integer | 134 | |
trig | Triglycerides | mg/dl | integer | 136 | |
platelet | Platelets | per cm^3/1000 | integer | 110 | |
drug | 3 | integer | 0 | ||
status | Death or Liver Transplantation | integer | 0 | ||
edema | 3 | integer | 0 | ||
copper | Urine Copper | ug/day | integer | 108 | |
fu.yrs | Follow-up Time | year | double | 0 |
Variable | Levels |
---|---|
sex | male |
female | |
spiders, hepatom | absent |
ascites | present |
drug | D-penicillamine |
placebo | |
not randomized | |
edema | no edema |
edema, no diuretic therapy | |
edema despite diuretic therapy |
The html method is used for the describe
function, and the output is put in a scrollable box. Other than for the overall title and variable names and labels, the output size used here is 80 (0.8 × the usual font size1). But the graphical display of the descriptives statistics that follows this is probably better.
# did have results='asis' above
d <- describe(pbc)
html(d, size=80, scroll=TRUE)
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
418 | 0 | 98 | 0.998 | 3.221 | 3.742 | 0.50 | 0.60 | 0.80 | 1.40 | 3.40 | 8.03 | 14.00 |
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
418 | 0 | 154 | 1 | 3.497 | 0.473 | 2.750 | 2.967 | 3.243 | 3.530 | 3.770 | 4.010 | 4.141 |
n | missing | distinct | Info | Mean | Gmd |
---|---|---|---|---|---|
412 | 6 | 4 | 0.893 | 3.024 | 0.9519 |
Value 1 2 3 4 Frequency 21 92 155 144 Proportion 0.051 0.223 0.376 0.350
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
416 | 2 | 48 | 0.998 | 10.73 | 1.029 | 9.60 | 9.80 | 10.00 | 10.60 | 11.10 | 12.00 | 12.45 |
n | missing | distinct |
---|---|---|
418 | 0 | 2 |
Value male female Frequency 44 374 Proportion 0.105 0.895
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
418 | 0 | 345 | 1 | 50.74 | 11.96 | 33.84 | 36.37 | 42.83 | 51.00 | 58.24 | 64.30 | 67.92 |
n | missing | distinct |
---|---|---|
312 | 106 | 2 |
Value absent present Frequency 222 90 Proportion 0.712 0.288
n | missing | distinct |
---|---|---|
312 | 106 | 2 |
Value absent present Frequency 152 160 Proportion 0.487 0.513
n | missing | distinct |
---|---|---|
312 | 106 | 2 |
Value absent present Frequency 288 24 Proportion 0.923 0.077
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
312 | 106 | 295 | 1 | 1983 | 1760 | 599.6 | 663.0 | 871.5 | 1259.0 | 1980.0 | 3826.4 | 6669.9 |
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
312 | 106 | 179 | 1 | 122.6 | 60.45 | 54.25 | 60.45 | 80.60 | 114.70 | 151.90 | 196.47 | 219.25 |
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
284 | 134 | 201 | 1 | 369.5 | 194.5 | 188.4 | 213.6 | 249.5 | 309.5 | 400.0 | 560.8 | 674.0 |
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
282 | 136 | 146 | 1 | 124.7 | 64.07 | 56.00 | 63.10 | 84.25 | 108.00 | 151.00 | 195.00 | 230.95 |
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
308 | 110 | 210 | 1 | 261.9 | 107.8 | 117.7 | 139.7 | 199.8 | 257.0 | 322.5 | 386.5 | 430.6 |
n | missing | distinct |
---|---|---|
418 | 0 | 3 |
Value D-penicillamine placebo not randomized Frequency 154 158 106 Proportion 0.368 0.378 0.254
n | missing | distinct | Info | Sum | Mean | Gmd |
---|---|---|---|---|---|---|
418 | 0 | 2 | 0.71 | 161 | 0.3852 | 0.4748 |
n | missing | distinct |
---|---|---|
418 | 0 | 3 |
Value no edema edema, no diuretic therapy Frequency 354 44 Proportion 0.847 0.105 Value edema despite diuretic therapy Frequency 20 Proportion 0.048
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
310 | 108 | 158 | 1 | 97.65 | 83.16 | 17.45 | 24.00 | 41.25 | 73.00 | 123.00 | 208.10 | 249.20 |
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
418 | 0 | 399 | 1 | 5.251 | 3.429 | 0.671 | 1.661 | 2.992 | 4.736 | 7.155 | 9.649 | 11.063 |
lowest : | 0.1122519 | 0.1177276 | 0.1396304 | 0.1943874 | 0.2108145 |
highest: | 12.3203285 | 12.3449692 | 12.3832991 | 12.4736482 | 13.1279945 |
# prList is in Hmisc; useful for plotting or printing a list of objects
# Can just use plot(d) if don't care about the mess
# If using html output these 2 images would not be rendered no matter what
p <- plot(d)
# The option htmlfig=2 causes markupSpecs$html$cap() to be used to
# HTML-typeset as a figure caption and to put the sub-sub section
# marker ### in front of the caption. htmlfig is the only reason
# results='asis' was needed in the chunk header
# We define a long caption for one of the plots, which does not appear
# in the table of contents
# prList works for html notebooks but not html documents
# prList(p, lcap=c('', 'These are spike histograms'), htmlfig=2)
You can also re-form multiple plotly
graphs into a single HTML object. Here we use the putHfig
function to render the result, with a caption and a short caption for the table of contents. This approach requires you to put results=‘asis’
in the chunk header. By default putHfig
puts a horizontal line before the caption and figure.
putHcap('This used the <code>htmltools tagList</code> function.',
scap='Two <code>plotly</code> graphics combined into one')
plotly
graphics combined into oneThis used the htmltools tagList
function.
htmltools::tagList(p) # lapply(p, plotly::as.widget)
You can also create figure captions outside of R code by using the smg, fcap
HTML tags defined in markupSpecs
. The long caption not appearing in the table of contents will be in a separate line without ###.
Produce stratified quantiles, means/SD, and proportions by treatment group. Plot the results before rendering as an advanced html table:
s <- summaryM(bili + albumin + stage + protime + sex + age + spiders +
alk.phos + sgot + chol ~ drug, data=pbc,
overall=FALSE, test=TRUE)
putHcap('Proportions and', mu$chisq(), 'tests for categorical variables')
plot(s, which='categorical')
To construct the caption outside of the code chunk use e.g. ### r cap(‘Proportions and’, mu$chisq(), ‘tests for categorical variables’) where a backtick is placed before r and after the last ).
putHcap('Extended box plots for the first 4 continuous variables')
plot(s, which='continuous', vars=1 : 4)
putHcap('Extended box plots for the remaining continuous variables')
plot(s, which='continuous', vars=5 : 7)
html(s, caption='Baseline characteristics by randomized treatment',
exclude1=TRUE, npct='both', digits=3,
prmsd=TRUE, brmsd=TRUE, msdsize=mu$smaller2)
Baseline characteristics by randomized treatment. | |||||
N |
D-penicillamine N=154 |
placebo N=158 |
not randomized N=106 |
Test Statistic |
|
---|---|---|---|---|---|
Serum Bilirubin mg/dl |
418 | 0.725 1.300 3.600 3.649 ± 5.282 |
0.800 1.400 3.200 2.873 ± 3.629 |
0.725 1.400 3.075 3.117 ± 4.043 |
F2 415=0.03, P=0.9721 |
Albumin gm/dl |
418 | 3.342 3.545 3.777 3.524 ± 0.396 |
3.212 3.565 3.830 3.516 ± 0.443 |
3.125 3.470 3.720 3.431 ± 0.435 |
F2 415=2.13, P=0.121 |
Histologic Stage, Ludwig Criteria : 1 | 412 | 0.03 4⁄154 | 0.08 12⁄158 | 0.05 5⁄100 | χ26=5.33, P=0.5022 |
2 | 0.21 32⁄154 | 0.22 35⁄158 | 0.25 25⁄100 | ||
3 | 0.42 64⁄154 | 0.35 56⁄158 | 0.35 35⁄100 | ||
4 | 0.35 54⁄154 | 0.35 55⁄158 | 0.35 35⁄100 | ||
Prothrombin Time sec. |
416 | 10.000 10.600 11.400 10.800 ± 1.138 |
10.025 10.600 11.000 10.653 ± 0.851 |
10.100 10.600 11.000 10.750 ± 1.078 |
F2 413=0.23, P=0.7951 |
sex : female | 418 | 0.90 139⁄154 | 0.87 137⁄158 | 0.92 98⁄106 | χ22=2.38, P=0.3042 |
Age | 418 | 41.43 48.11 55.80 48.58 ± 9.96 |
42.98 51.93 58.90 51.42 ± 11.01 |
46.00 53.00 61.00 52.87 ± 9.78 |
F2 415=6.1, P=0.0021 |
spiders | 312 | 0.29 45⁄154 | 0.28 45⁄158 | χ21=0.02, P=0.8852 | |
Alkaline Phosphatase U/liter |
312 | 922 1283 1950 1943 ± 2102 |
841 1214 2028 2021 ± 2183 |
F1 310=0.06, P=0.8123 | |
SGOT U/ml |
312 | 83.8 117.4 151.9 125.0 ± 58.9 |
76.7 111.6 151.5 120.2 ± 54.5 |
F1 310=0.55, P=0.463 | |
Cholesterol mg/dl |
284 | 254 304 377 374 ± 252 |
248 316 417 365 ± 210 |
F1 282=0.37, P=0.5453 | |
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values. Tests used: 1Kruskal-Wallis test; 2Pearson test; 3Wilcoxon test . |
putHcap("Spike histograms, means, quantiles, Gini's mean difference, and SD stratified by treatment",
scap="Stratified spike histograms and quantiles")
Spike histograms, means, quantiles, Gini’s mean difference, and SD stratified by treatment
with(pbc, histboxp(x=sgot, group=drug, sd=TRUE))
The following is a better way to display proportions, for categorical variables. If computing marginal statistics by running the dataset through the Hmisc
addMarginal
function, the plot
method with options(grType='plotly')
is especially useful.
pbcm <- addMarginal(pbc, drug)
s <- summaryP(stage + sex + spiders ~ drug, data=pbc)
putHcap('Proportions stratified by treatment')
plot(s, groups='drug')
s <- summaryP(stage + sex + spiders ~ drug, data=pbcm)
putHcap('Proportions (large symbols) and proportions stratified by treatment (small symbols)',
scap='Proportions with and without stratification by treatment')
Proportions (large symbols) and proportions stratified by treatment (small symbols)
plot(s, marginVal='All', marginLabel='All Treatment Groups')
getHdata(support2)
putHcap("Spike histograms, means, quantiles, Gini's mean difference, and SD for MAP stratified by diagnosis",
scap="Stratified spike histograms and quantiles for MAP")
Spike histograms, means, quantiles, Gini’s mean difference, and SD for MAP stratified by diagnosis
with(support2, histboxp(x=meanbp, group=dzgroup, sd=TRUE, bins=200))
R version 3.4.3 (2017-11-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 17.10 Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] bindrcpp_0.2 Hmisc_4.1-1 ggplot2_2.2.1 Formula_1.2-2 [5] survival_2.41-3 lattice_0.20-35To cite R in publication use:
R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
.csl
Reference Style Files# Note: mu was defined in an earlier code chunk
# Only need to install .csl file once.
mu$installcsl(rec=TRUE) # get list of recommended styles
mu$installcsl() # web search of styles meeting your criteria
# Install a .csl file to your project directory:
mu$installcsl('american-medical-association')
citeulike
Bibliographic DatabasesIn the following code chunk, adding cache=TRUE
to the chunk header would result in the code (including the file download) not having to be run each time the report is run. eval=FALSE
is set in the code chunk so the code actually isn’t run.
# Show full reference information for selected BibTeX keys on
# your citeulike database
cu <- mu$citeulikeShow
cu('harrelfe', c('cox58reg', 'cox72'))
# Automatically extract all BibTeX keys referenced in this document and
# show full reference information for them
# The function invisibly returns the vector of keys found
cu('harrelfe', file='examples.Rmd')
# Show and optionally export all articles with a given tag
cu('harrelfe', tags='missing-data')
# Copy and rename downloaded references to current project directory
file.copy('~/Downloads/harrelfe-missing-data.bib', 'missing-data.bib',
overwrite=TRUE)
1. Team RDC. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; www.r-project.org; 2016. http://www.R-project.org.
2. Harrell FE. Hmisc
: A package of miscellaneous R functions. 2015. http://biostat.mc.vanderbilt.edu/Hmisc.