require(Hmisc)
require(plotly)
require(htmlTable)
source('~/R/hreport/R/Misc.r')
source('~/R/hreport/R/accrualReport.r')
source('~/R/hreport/R/exReport.r')
source('~/R/hreport/R/eReport.r')
mu <- markupSpecs$html   # in Hmisc - HTML markups
frac <- mu$frac
mu$styles()              # define HTML styles, functions
## Generate test data
set.seed(1)
n <- 500
d <- data.frame(country=sample(c('US', 'Canada', 'Spain', 'France',
                  'Germany'), n, TRUE),
                site=sample(1:10, n, TRUE))
d$site   <- paste(substring(d$country, 1, 2), d$site, sep='')
d$region <- factor(ifelse(d$country %in% c('US', 'Canada'),
                          'North America', 'Europe'))
d <- upData(d, edate = as.Date('2005-01-01') +
            round(rgamma(n, 2, .01)) - 600 * (country == 'US'),
            rdate = edate + round(runif(n, 1, 30)), print=FALSE)
d$rdate[runif(nrow(d)) < 0.5] <- NA  # non-randomized subjects )
# with(d, table(region, country))
# For US manually compute # randomized per month
us   <- subset(d, country == 'US')
site <- us$site
ed   <- us$edate
rd   <- us$rdate
months <- difftime(as.Date('2007-12-31'), ed, units='days') /
  (365.25 / 12)
m <- max(months)
a <- sum(!is.na(rd)) / as.numeric(m)   # .8545774 (agrees with chart)
# Compute maximum months elapsed for each site then sum over sites
maxpersite <- tapply(months, site, max)
b <- sum(!is.na(rd)) / sum(maxpersite)
## 0.0864429 = 47 / 543.6715 chart: .08645 (rounded)
## Suppose there are more subjects enrolled and randomized than really
## made their way into the dataset
denom <- c(enrolled=nrow(d) * 1.1,
           randomized=sum(!is.na(d$rdate)) + 10)
sethreportOption(tx.var='treat', denom=denom)
## Initialize file to hold appendix information such as subject IDs
## so all later writing to this file can use append=TRUE
appfile <- gethreportOption('appfile')
cat('', file=appfile)

Introduction

Interactive Graphs

Most of the graphs produced here are semi-interactive. One can hover over elements of graphs with the mouse to have detailed information pop up.

Figure Captions

Needles represent the fraction of observations used in the current analysis. The first needle (red) shows the fraction of enrolled patients used. If randomization was taken into account, a second needle (green) represents the fraction of randomized subjects included in the analysis. When the analyses consider treatment assignment, two more needles may be added to the display, showing, respectively, the fraction of subjects randomized to treatment A used in the analysis and the fraction of subjects on treatment B who were analyzed. The colors of these last two needles are the colors used for the two treatments throughout the report. The following table shows some examples. dNeedle uses colors in sethreportOption(tx.col=, er.col=).

# Store using short variable names so Rmarkdown table column
# width will not be wider than actually needed
d1 <- dNeedle(1)
d2 <- dNeedle((3:4)/4)
d3 <- dNeedle((1:2)/4)
d4 <- dNeedle(c(1,2,3,1)/4)
Signpost Interpretation
image All enrolled subjects analyzed, randomization not considered
image Analysis uses 34 of enrolled subjects, and all randomized subjects
image Analysis uses 14 of enrolled subjects, and 12 of randomized subjects
image Same as previous example, and in addition the analysis utilized treatment assignment, analyzing 34 of those randomized to A and 14 of those randomized to B

Extended Box Plots

For depicting distributions of continuous variables, many of the following displays use extended box plots, also called box–percentile plots. A prototype, with explanations, is below.

bpplt()

Dot Charts

Dot charts are used to present stratified proportions. In these charts the area of the symbols is proportional to the square root of the denominator. The legend shows representative denominators and their corresponding symbol areas, using denominators that actually occurred in the data and extended from the minimum observed to the maximum observed sample size.???

Survival Curves

Graphs containing pairs of Kaplan-Meier survival curves show a shaded region centered at the midpoint of the two survival estimates and having a height equal to the half-width of the approximate 0.95 pointwise confidence interval for the difference of the two survival probabilities. Time points at which the two survival estimates do not touch the shaded region denote approximately significantly different survival estimates, without any multiplicity correction.

Accrual

accrualReport(enroll(edate) + randomize(rdate) ~
              region(region) + country(country) + site(site),
              data=d,
              dateRange=c('2005-01-01', '2007-12-31'),
              targetN=
                data.frame(edate=c(500, 1000), rdate=c(250, 500)),
              targetDate=c('2006-01-01', '2007-12-31'),
              closeDate='2007-12-31')
Study Numbers
Number Category
5 Countries
50 Sites
500 Participants enrolled
260 Participants randomized
5.2 Participants per site
50 Sites randomizing
5.2 Subjects randomized per randomizing site
55.1 Months from first subject randomized (2003-05-29) to 2007-12-31
1873.8 Site-months for sites randomizing
37.5 Average months since a site first randomized
0.14 Participants randomized per site per month
15 Mean days from enrollment to randomization
15 Median days from enrollment to randomization

Participants enrolled over time

The blue line depicts the cumulative frequency. The thick grayscale line represent targets.
Category N Used
Enrolled 550 500
image

Participants randomized over time

The blue line depicts the cumulative frequency. The thick grayscale line represent targets.
Category N Used
Enrolled 550 260
Randomized 270 260
image

Days from enrollment to randomization

Quartiles and mean number of days by region and country
Category N Used
Enrolled 550 260
Randomized 270 260
image

Number of sites × number of participants

Number of sites having the given number of participants
Category N Used
Enrolled 550 260
Randomized 270 260
image

Participants enrolled by region and country


Participants randomized by region and country


Sites that enrolled by region and country


Sites that randomized by region and country


Fraction of enrolled participants randomized by region and country


Participants randomized per month by region and country


Partipants randomized per site per month by region and country

Exclusions

d <- upData(d,
            subjid = 1 : n,
            pend   = rbinom(n, 1, .1),
            e1     = rbinom(n, 1, .02),
            e2     = rbinom(n, 1, .02),
            e3     = rbinom(n, 1, .02),
            e4     = ifelse(runif(n) < 0.25, NA, rbinom(n, 1, .10)),
            tested = rbinom(n, 1, .75),
            e5     = ifelse(tested, rbinom(n, 1, .04), NA),
            e6     = rbinom(n, 1, .02),
            e7     = rbinom(n, 1, .02),
            rndz   = rbinom(n, 1, .75),
            labels=c(e1='Prior MI', e2='History of Asthma',
              e3='History of Upper GI Bleeding',
              e4='No Significant CAD', e5='Inadequate Renal Function',
              e6='Pneumonia within 6 weeks', e7='Prior cardiac surgery'),
            print=FALSE)
erd <- data.frame(subjid = 1 : 50,
                  loc   = sample(c('gastric', 'lung', 'trachea'), 50, TRUE))
# To check warning messages, greportOption denom does not match pend, e1-e7
exReport(~ pending(pend) + e1 + e2 + e3 + e4 + e5 + e6 + e7 +
         randomized(rndz) + id(subjid) + cond(e5, 'Tested', tested),
         erdata = erd,
         whenapp= c(e4='CCTA done'), data=d) #, hc=3.75, h=4)

Cumulative exclusions

Cumulative number of exclusions (\(y\)-axis) and number of additional exclusions after exclusions placed higher, for participants not actually randomized. Exclusions are sorted by descending number of incremental exclusions. 550 participants were enrolled, 12 non-excluded participants are pending randomization, and 20 participants were excluded. 371 participants were randomized. Note: Number of observations (500) does not equal number officially enrolled (550). Note: Number of enrolled (488) minus number excluded (20) does not match official number randomized (270).

Exclusions

Incremental exclusions are those in addition to exclusions in earlier rows. Marginal exclusions are numbers of participants excluded for the indicated reason whether or not she was excluded for other reasons. The three Fractions are based on incremental exclusions.
Exclusions Incremental
Exclusions
Marginal
Exclusions
Fraction of
Enrolled
Fraction of
Exclusions
Fraction
Remaining
No Significant CAD (CCTA done, n=386) 6 6 0.012 0.30 0.988
Prior Cardiac Surgery 4 4 0.008 0.20 0.980
Inadequate Renal Function / 373 3 3 0.006 0.15 0.973
3367 = 0.008 of Tested
Prior MI 2 3 0.004 0.10 0.969
History of Upper GI Bleeding 2 2 0.004 0.10 0.965
Pneumonia within 6 Weeks 2 2 0.004 0.10 0.961
History of Asthma 1 1 0.002 0.05 0.959
Total 20 0.041 1.00 0.959

Exclusions in randomized participants

Frequency of exclusions for participants marked as randomized
Exclusion Frequency
Prior MI 8
History of Asthma 7
History of Upper GI Bleeding 7
No Significant CAD 42
Inadequate Renal Function 15
Pneumonia within 6 Weeks 7
Prior Cardiac Surgery 8
Total Partcipants with Any Exclusion 84

Click arrow at left to show participant IDs:
Click arrow at left to see more information about those participants:

Adverse Events

For this example, the denominators for the two treatments in the pop-up needles will be incorrect because the dataset did not have subject IDs.

# Original source of aeanonym: HH package
# aeanonym <- read.table(hh("datasets/aedotplot.dat"), header=TRUE, sep=",")
# Modified to remove denominators from data and to generate raw data
# (one record per event per subject)
ae <-
structure(list(RAND = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("a", 
"b"), class = "factor"), PREF = structure(c(12L, 12L, 
18L, 18L, 26L, 26L, 33L, 33L, 5L, 5L, 27L, 27L, 6L, 6L, 15L, 
15L, 22L, 22L, 23L, 23L, 31L, 31L, 17L, 17L, 2L, 2L, 3L, 3L, 
13L, 13L, 25L, 25L, 28L, 28L, 14L, 14L, 4L, 4L, 8L, 8L, 19L, 
19L, 21L, 21L, 29L, 29L, 10L, 10L, 20L, 20L, 16L, 16L, 32L, 32L, 
11L, 11L, 1L, 1L, 30L, 30L, 24L, 24L, 9L, 9L, 7L, 7L),
  .Label = tolower(c("ABDOMINAL PAIN", 
"ANOREXIA", "ARTHRALGIA", "BACK PAIN", "BRONCHITIS", "CHEST PAIN", 
"CHRONIC OBSTRUCTIVE AIRWAY", "COUGHING", "DIARRHEA", "DIZZINESS", 
"DYSPEPSIA", "DYSPNEA", "FATIGUE", "FLATULENCE", "GASTROESOPHAGEAL REFLUX", 
"HEADACHE", "HEMATURIA", "HYPERKALEMIA", "INFECTION VIRAL", "INJURY", 
"INSOMNIA", "MELENA", "MYALGIA", "NAUSEA", "PAIN", "RASH", "RESPIRATORY DISORDER", 
"RHINITIS", "SINUSITIS", "UPPER RESP TRACT INFECTION", "URINARY TRACT INFECTION", 
"VOMITING", "WEIGHT DECREASE")), class = "factor"), SAE = c(15L, 
9L, 4L, 9L, 4L, 9L, 2L, 9L, 8L, 11L, 4L, 11L, 9L, 12L, 5L, 12L, 
7L, 12L, 6L, 12L, 6L, 12L, 2L, 14L, 2L, 15L, 1L, 15L, 4L, 16L, 
4L, 17L, 11L, 17L, 6L, 20L, 10L, 23L, 13L, 26L, 12L, 26L, 4L, 
26L, 13L, 28L, 9L, 29L, 12L, 30L, 14L, 36L, 6L, 37L, 8L, 42L, 
20L, 61L, 33L, 68L, 10L, 82L, 23L, 90L, 76L, 95L)), .Names = c("RAND", 
"PREF", "SAE"), class = "data.frame", row.names = c(NA, 
-66L))
subs <- rep(1 : nrow(ae), ae$SAE)
ae <- ae[subs, c('RAND', 'PREF')]
names(ae) <- c('treat', 'event')
label(ae$treat) <- 'Treatment'
denom <- c(enrolled=1000,
           randomized=400,
                     a=212, b=188)
sethreportOption(tx.var='treat', denom=denom)
eReport(event ~ treat, data=ae, minincidence=.05)

Proportion of adverse events by Treatment

Proportion of adverse events by Treatment sorted by descending risk difference. 3 events with less than 0.05 incidence in all groups are not shown (Hyperkalemia, Rash, Weight Decrease).
Category N Used
Enrolled 1000 400
Randomized 400 400
A 212 212
B 188 188
image
---
title: "DSMB Report for EXAMPLE Trial"
author: "FE Harrell"
date: '`r Sys.Date()`'
output:
  html_notebook:
    code_folding: hide
    toc: yes
    toc_float:
      collapsed: yes
  htmul_document:
    toc: yes
    toc_depth: 3
    toc_float:
      collapsed: yes
---
```{r setup,results='hide'}
require(Hmisc)
require(plotly)
require(htmlTable)
source('~/R/hreport/R/Misc.r')
source('~/R/hreport/R/accrualReport.r')
source('~/R/hreport/R/exReport.r')
source('~/R/hreport/R/eReport.r')
mu <- markupSpecs$html   # in Hmisc - HTML markups
frac <- mu$frac
```
```{r setup2}
mu$styles()              # define HTML styles, functions
```
<!-- May need to run the following system commands one time:
cd /usr/local/bin
sudo ln -s /usr/lib/rstudio/bin/pandoc/pandoc-citeproc
-->

```{r gendata}
## Generate test data
set.seed(1)
n <- 500
d <- data.frame(country=sample(c('US', 'Canada', 'Spain', 'France',
                  'Germany'), n, TRUE),
                site=sample(1:10, n, TRUE))
d$site   <- paste(substring(d$country, 1, 2), d$site, sep='')
d$region <- factor(ifelse(d$country %in% c('US', 'Canada'),
                          'North America', 'Europe'))

d <- upData(d, edate = as.Date('2005-01-01') +
            round(rgamma(n, 2, .01)) - 600 * (country == 'US'),
            rdate = edate + round(runif(n, 1, 30)), print=FALSE)
d$rdate[runif(nrow(d)) < 0.5] <- NA  # non-randomized subjects )

# with(d, table(region, country))

# For US manually compute # randomized per month
us   <- subset(d, country == 'US')
site <- us$site
ed   <- us$edate
rd   <- us$rdate
months <- difftime(as.Date('2007-12-31'), ed, units='days') /
  (365.25 / 12)
m <- max(months)
a <- sum(!is.na(rd)) / as.numeric(m)   # .8545774 (agrees with chart)
# Compute maximum months elapsed for each site then sum over sites
maxpersite <- tapply(months, site, max)
b <- sum(!is.na(rd)) / sum(maxpersite)
## 0.0864429 = 47 / 543.6715 chart: .08645 (rounded)

## Suppose there are more subjects enrolled and randomized than really
## made their way into the dataset
denom <- c(enrolled=nrow(d) * 1.1,
           randomized=sum(!is.na(d$rdate)) + 10)

sethreportOption(tx.var='treat', denom=denom)
## Initialize file to hold appendix information such as subject IDs
## so all later writing to this file can use append=TRUE
appfile <- gethreportOption('appfile')
cat('', file=appfile)
``` 

# Introduction
## Interactive Graphs
Most of the graphs produced here are semi-interactive.  One can hover over elements of graphs with the mouse to have detailed information pop up.

## Figure Captions
Needles represent the fraction of observations used in the current
analysis.  The first needle (red) shows the fraction of enrolled
patients used.  If randomization was taken into account, a second
needle (green) represents the fraction of randomized subjects included
in the analysis.  When the analyses consider treatment assignment, two
more needles may be added to the display, showing, respectively, the
fraction of subjects randomized to treatment A used in the analysis
and the fraction of subjects on treatment B who were analyzed.  The
colors of these last two needles are the colors used for the two
treatments throughout the report.  The following table shows some
examples.  `dNeedle` uses colors in `sethreportOption(tx.col=, er.col=)`.

```{r needledefs}
# Store using short variable names so Rmarkdown table column
# width will not be wider than actually needed
d1 <- dNeedle(1)
d2 <- dNeedle((3:4)/4)
d3 <- dNeedle((1:2)/4)
d4 <- dNeedle(c(1,2,3,1)/4)
```

|Signpost   | Interpretation |
|------- | -------------------------------------------------|
| `r d1` | All enrolled subjects analyzed, randomization not considered|
| `r d2` | Analysis uses `r frac(3,4)` of enrolled subjects, and all randomized subjects|
| `r d3` | Analysis uses `r frac(1,4)` of enrolled subjects, and `r frac(1,2)` of randomized subjects|
| `r d4` | Same as previous example, and in addition the analysis utilized treatment assignment, analyzing `r frac(3,4)` of those randomized to A and `r frac(1,4)` of those randomized to B|

# Extended Box Plots
For depicting distributions of continuous variables, many of the
following displays use extended box plots, also called
box--percentile plots.  A prototype, with explanations, is below.
```{r bpplt}
bpplt()
```

## Dot Charts
Dot charts are used to present stratified proportions.  In these
charts the area of the symbols is proportional to the square root of
the denominator.  The legend shows representative denominators and
their corresponding symbol areas, using denominators that actually
occurred in the data and extended from the minimum observed to the
maximum observed sample size.???


## Survival Curves
Graphs containing pairs of Kaplan-Meier survival curves show a shaded
region centered at the midpoint of the two survival estimates and
having a height equal to the half-width of the approximate 0.95 pointwise
confidence interval for the difference of the two survival
probabilities.  Time points at which the two survival estimates do not
touch the shaded region denote approximately significantly different
survival estimates, without any multiplicity correction.

# Accrual

```{r accrual,results='asis'}
accrualReport(enroll(edate) + randomize(rdate) ~
              region(region) + country(country) + site(site),
              data=d,
              dateRange=c('2005-01-01', '2007-12-31'),
              targetN=
                data.frame(edate=c(500, 1000), rdate=c(250, 500)),
              targetDate=c('2006-01-01', '2007-12-31'),
              closeDate='2007-12-31')
```

# Exclusions
```{r excl,results='asis'}
d <- upData(d,
            subjid = 1 : n,
            pend   = rbinom(n, 1, .1),
            e1     = rbinom(n, 1, .02),
            e2     = rbinom(n, 1, .02),
            e3     = rbinom(n, 1, .02),
            e4     = ifelse(runif(n) < 0.25, NA, rbinom(n, 1, .10)),
            tested = rbinom(n, 1, .75),
            e5     = ifelse(tested, rbinom(n, 1, .04), NA),
            e6     = rbinom(n, 1, .02),
            e7     = rbinom(n, 1, .02),
            rndz   = rbinom(n, 1, .75),
            labels=c(e1='Prior MI', e2='History of Asthma',
              e3='History of Upper GI Bleeding',
              e4='No Significant CAD', e5='Inadequate Renal Function',
              e6='Pneumonia within 6 weeks', e7='Prior cardiac surgery'),
            print=FALSE)

erd <- data.frame(subjid = 1 : 50,
                  loc   = sample(c('gastric', 'lung', 'trachea'), 50, TRUE))

# To check warning messages, greportOption denom does not match pend, e1-e7
exReport(~ pending(pend) + e1 + e2 + e3 + e4 + e5 + e6 + e7 +
         randomized(rndz) + id(subjid) + cond(e5, 'Tested', tested),
         erdata = erd,
         whenapp= c(e4='CCTA done'), data=d) #, hc=3.75, h=4)

# Show exclusions in original variable order
if(FALSE) exReport(~ pending(pend) + e1 + e2 + e3 + e4 + e5 + e6 + e7 +
         randomized(rndz) + id(subjid) + cond(e5, 'Tested', tested),
         erdata=erd,
         whenapp=c(e4='CCTA done'), data=d, #hc=3.75, h=4,
         sort=FALSE, app=FALSE)
```

# Adverse Events
For this example, the denominators for the two treatments in the
pop-up needles will be incorrect because the dataset did not have
subject IDs.
```{r aes,results='asis'}
# Original source of aeanonym: HH package
# aeanonym <- read.table(hh("datasets/aedotplot.dat"), header=TRUE, sep=",")
# Modified to remove denominators from data and to generate raw data
# (one record per event per subject)

ae <-
structure(list(RAND = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("a", 
"b"), class = "factor"), PREF = structure(c(12L, 12L, 
18L, 18L, 26L, 26L, 33L, 33L, 5L, 5L, 27L, 27L, 6L, 6L, 15L, 
15L, 22L, 22L, 23L, 23L, 31L, 31L, 17L, 17L, 2L, 2L, 3L, 3L, 
13L, 13L, 25L, 25L, 28L, 28L, 14L, 14L, 4L, 4L, 8L, 8L, 19L, 
19L, 21L, 21L, 29L, 29L, 10L, 10L, 20L, 20L, 16L, 16L, 32L, 32L, 
11L, 11L, 1L, 1L, 30L, 30L, 24L, 24L, 9L, 9L, 7L, 7L),
  .Label = tolower(c("ABDOMINAL PAIN", 
"ANOREXIA", "ARTHRALGIA", "BACK PAIN", "BRONCHITIS", "CHEST PAIN", 
"CHRONIC OBSTRUCTIVE AIRWAY", "COUGHING", "DIARRHEA", "DIZZINESS", 
"DYSPEPSIA", "DYSPNEA", "FATIGUE", "FLATULENCE", "GASTROESOPHAGEAL REFLUX", 
"HEADACHE", "HEMATURIA", "HYPERKALEMIA", "INFECTION VIRAL", "INJURY", 
"INSOMNIA", "MELENA", "MYALGIA", "NAUSEA", "PAIN", "RASH", "RESPIRATORY DISORDER", 
"RHINITIS", "SINUSITIS", "UPPER RESP TRACT INFECTION", "URINARY TRACT INFECTION", 
"VOMITING", "WEIGHT DECREASE")), class = "factor"), SAE = c(15L, 
9L, 4L, 9L, 4L, 9L, 2L, 9L, 8L, 11L, 4L, 11L, 9L, 12L, 5L, 12L, 
7L, 12L, 6L, 12L, 6L, 12L, 2L, 14L, 2L, 15L, 1L, 15L, 4L, 16L, 
4L, 17L, 11L, 17L, 6L, 20L, 10L, 23L, 13L, 26L, 12L, 26L, 4L, 
26L, 13L, 28L, 9L, 29L, 12L, 30L, 14L, 36L, 6L, 37L, 8L, 42L, 
20L, 61L, 33L, 68L, 10L, 82L, 23L, 90L, 76L, 95L)), .Names = c("RAND", 
"PREF", "SAE"), class = "data.frame", row.names = c(NA, 
-66L))

subs <- rep(1 : nrow(ae), ae$SAE)
ae <- ae[subs, c('RAND', 'PREF')]
names(ae) <- c('treat', 'event')
label(ae$treat) <- 'Treatment'

denom <- c(enrolled=1000,
           randomized=400,
					 a=212, b=188)

sethreportOption(tx.var='treat', denom=denom)

eReport(event ~ treat, data=ae, minincidence=.05)
```
