R library validation using ATDD

Brian Repko

Why this talk?

R/Pharma 2023 in Chicago - R Validation Hub work
How to write / share tests of the pharmaverse packages
Limited discussion on solutions - lots of JSON

Why am I talking?

Background is software / data engineer / bioinformatician
Last 15 years in biomedical research, clinical trials, and RWD
Last 10 years primarily in R
I was part of the JBehave team back in the day
Lots of presentations and actual training curriculum on this

What are we talking about?

System-level vs Component-level (Unit Testing)
- Here “system” can mean a set of R packages / scripts
Focus on non-code / plain text to enable collaboration
Lots of names for this practice (requirements and/or tests)
- “Customer Tests” (from XP)
- Behavior-Driven Development (BDD)
- Acceptance-Test-Driven Development (ATDD)
- Storytest-Driven Development (SDD)
- Example-Driven Development (EDD)
- Specification By Example (SBE)

What does this look like?

There are lots of solutions / flavors
Fit / FitNesse (Ward Cunningham) - table-based
Concordion - enhanced markdown-based
Robot Framework - tab-delimited keyword based
Given-When-Then (Gherkin language)
- JBehave (Dan North, Liz Keogh)
- Cucumber (Aslak Hellesøy) / SpecFlow (Gáspár Nagy)
Gauge (ThoughtWorks) - enhanced markdown-based

How does this work?

Features are plain-text files
- Either Gherkin, Markdown, or other
Steps are functions together with a text-pattern
- Parameters as placeholders in pattern - like {glue} string
- May be placed into categories - Given/When/Then
- Steps are either registered or discovered (via annotation)
Parse features into step calls via pattern matching
Step function call returns are collected / handled / reported
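The mechanics above can be sketched in a few lines of base R. This is only an illustration of the idea, not how any of the frameworks listed here is implemented: `registry`, `register_step`, and `run_step` are hypothetical names, and only an `{int}` placeholder is supported.

```r
# Hypothetical mini-registry in base R (illustration only).
registry <- new.env()
registry$steps <- list()

# Register a step: turn the "{int}" placeholder into a regex capture group.
# Assumes the rest of the pattern contains no regex metacharacters.
register_step <- function(pattern, fn) {
  regex <- paste0("^", gsub("{int}", "(-?[0-9]+)", pattern, fixed = TRUE), "$")
  registry$steps[[length(registry$steps) + 1]] <- list(regex = regex, fn = fn)
  invisible(NULL)
}

# Find the first registered step whose pattern matches the text and call it,
# passing captured values first and the shared context last.
run_step <- function(text, context) {
  for (step in registry$steps) {
    match <- regmatches(text, regexec(step$regex, text))[[1]]
    if (length(match) > 0) {
      args <- lapply(match[-1], as.integer)
      return(do.call(step$fn, c(args, list(context))))
    }
  }
  stop("No step matches: ", text)
}

register_step("I have {int}", function(int, context) {
  context$numbers <- c(context$numbers, int)
})
register_step("I add them", function(context) {
  context$result <- sum(context$numbers)
})

# Feed step texts through the registry; state lives in the context environment.
context <- new.env()
for (line in c("I have 1", "I have 2", "I add them")) run_step(line, context)
context$result
```

A real framework would also parse the feature file, handle the Given/When/Then keywords, and collect results for reporting instead of stopping at the first unmatched step.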

How does this work?

Note

This example is from the {cucumber} R package, introduced on the next slide!

Features

# tests/acceptance/sum.feature
Feature: Sum
  Scenario: Sum should work for 2 numbers
    Given I have 1
    And I have 2
    When I add them
    Then I get 3

  Scenario: Sum should work for 3 numbers
    Given I have 1
    And I have 2
    But I have 3
    When I add them
    Then I get 6

Steps

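# Step definitions for sum.feature; given/when/then come from {cucumber}
library(cucumber)
library(testthat)

# Given: collect each number in the context
given("I have {int}", function(int, context) {
  context$numbers <- c(context$numbers, int)
})

# When: add up the collected numbers
when("I add them", function(context) {
  context$result <- sum(context$numbers)
})

# Then: check the stored result against the expected value
then("I get {int}", function(int, context) {
  expect_equal(context$result, int)
})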

What about for R?

{cucumber} package from Jakub Sobolewski
- Last major release (version 2.0.0) was Apr 4, 2025
- Current version is 2.1.1
Gherkin-based .feature files
Steps are registered via given, when, then functions
Execution is converted to testthat code, so results are reported through testthat
Features and steps are typically under tests/acceptance
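To make "converted to testthat code" concrete, here is a hand-written testthat test that is roughly equivalent to the first scenario of the Sum feature. It is an assumption about the shape of the result, not the code {cucumber} actually generates; the `context` environment here is created by hand for the illustration.

```r
library(testthat)

# Hand-written stand-in for one scenario; each line mirrors one Gherkin step.
context <- new.env()  # fresh state for this scenario

test_that("Scenario: Sum should work for 2 numbers", {
  context$numbers <- c(context$numbers, 1L)  # Given I have 1
  context$numbers <- c(context$numbers, 2L)  # And I have 2
  context$result <- sum(context$numbers)     # When I add them
  expect_equal(context$result, 3L)           # Then I get 3
})
```

Because the scenario ends up as an ordinary testthat test, failures surface in the same reporters that teams already use for unit tests.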

Opportunity

Ability to collect and run tests on all of pharmaverse
- Potential to simulate clinical trial events
Potentially extend to R-multiverse (rOpenSci initiative)
Shiny app, web app, or API testing
- See Jakub’s website for details on testing Shiny apps with {cucumber}
- Step functions could use web driver like {chromote}
There exists experience with this practice (though less in pharma)
For Gherkin - tooling already exists
- Markdown code block styling (e.g. ```gherkin)
- Plugins for various IDEs

Challenges

How best to integrate Quarto?
- Reporting format? Execution engine? Both?
Do something like Gauge or leverage Cucumber?
Potentially extend {cucumber}
- If we wanted to go to function annotation…
  - {plumber2} has added @then annotation
  - Would need to add function annotation during .onLoad
  - Could store text-pattern in Rmd or other file
- Quarto as reporting output
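One way to picture discovery via annotation, as opposed to explicit registration, is to tag step functions with their text-pattern and scan an environment for tagged functions. The sketch below uses a plain attribute as a stand-in for an annotation; `as_step`, `discover_steps`, and the `step_pattern` attribute are hypothetical names, not part of {cucumber} or {plumber2}.

```r
# Hypothetical helper: tag a function with a step pattern
# (a stand-in for a real annotation mechanism).
as_step <- function(pattern, fn) {
  attr(fn, "step_pattern") <- pattern
  fn
}

# Collect every tagged function in an environment, keyed by its pattern.
discover_steps <- function(env) {
  found <- list()
  for (name in ls(env)) {
    obj <- get(name, envir = env)
    pattern <- attr(obj, "step_pattern")
    if (is.function(obj) && !is.null(pattern)) found[[pattern]] <- obj
  }
  found
}

# An environment standing in for a package namespace with step definitions.
pkg <- new.env()
pkg$have_number <- as_step("I have {int}", function(int, context) {
  context$numbers <- c(context$numbers, int)
})
pkg$add_numbers <- as_step("I add them", function(context) {
  context$result <- sum(context$numbers)
})
pkg$helper <- function(x) x  # untagged, so it is not discovered

steps <- discover_steps(pkg)
names(steps)
```

In a package, a scan like this could plausibly run from `.onLoad()` over the package namespace; whether the patterns live next to the functions or in a separate file is one of the open design questions.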
Potentially extend {knitr} or Quarto
Dealing with packages used by framework vs system under test ({callr}?)
Needs data storage / comparison ({diffdf}?)

Next Steps

Discuss with R Validation Hub
Work through various design decisions / options
ISC Proposal for Apr 2026
Get in touch!