
Since 2023, BristolMyersSquibb, the Y company and cynkra have teamed up to develop a novel no-code solution for R.
library(blockr)
Attaching package: 'blockr'
The following object is masked from 'package:graphics':
    layout
library(pracma)
library(shiny)
Introduction
blockr is an R package designed to democratize data analysis by providing a flexible, intuitive, and code-free approach to building data pipelines. It targets 2 main audiences:
- On the one hand, it empowers non-technical users to create insightful data workflows using pre-built blocks that can be easily connected, all without writing a single line of code.
- On the other hand, it provides developers with a set of tools to seamlessly create new blocks, thereby enhancing the entire framework and fostering collaboration within an organization's teams.
blockr is data agnostic, meaning it can work with any kind of dataset, be it pharmaceutical data or sports analytics data. It builds on top of shiny to ensure real-time feedback on any data change. Finally, it lets you export code to create reproducible data analyses.
Getting started
As a simple user
As a simple user, you're not expected to write a single line of code to use blockr. You can use the kitchen sink below to get started. This example is based on the palmer penguins data and runs a single stack with 3 blocks: a first block to select the data, another one to create the plot, and a last one to add the points to it.
blockr has its own validation system. For instance, using the example below, you can try to press return on the select box of the first block (penguins is the selected default). You'll notice an immediate feedback message. A global message is displayed in the upper middle part of the block: "1 error(s) found in this block". You get more detailed messages next to the faulty input(s): "selected value(s) not among provided choices". You can repeat the same experiment with the last plot layer block, by emptying the color and shape select inputs. Error messages can accumulate.
You can dynamically add blocks to the current stack, which gathers a
set of related blocks. You can think of a stack as a data analysis
recipe, as in cooking, where blocks are the instructions. To add a new
block, click on the + icon in the top right corner of the stack. This
opens a sidebar on the left side, where one may search for blocks that
are compatible with the current state of the pipeline. With an empty
stack, only entry point blocks are suggested, so you can import data.
Then, after clicking on the block, the suggestion list changes so you
can, for instance, filter data or select only a subset of columns, and
much more.
library(blockr)
library(palmerpenguins)
library(ggplot2)
new_ggplot_block <- function(col_x = character(), col_y = character(), ...) {
  data_cols <- function(data) colnames(data)
  new_block(
    fields = list(
      x = new_select_field(col_x, data_cols, type = "name"),
      y = new_select_field(col_y, data_cols, type = "name")
    ),
    expr = quote(
      ggplot(mapping = aes(x = .(x), y = .(y)))
    ),
    class = c("ggplot_block", "plot_block"),
    ...
  )
}
new_geompoint_block <- function(color = character(), shape = character(), ...) {
  data_cols <- function(data) colnames(data$data)
  new_block(
    fields = list(
      color = new_select_field(color, data_cols, type = "name"),
      shape = new_select_field(shape, data_cols, type = "name")
    ),
    expr = quote(
      geom_point(aes(color = .(color), shape = .(shape)), size = 2)
    ),
    class = c("geompoint_block", "plot_layer_block", "plot_block"),
    ...
  )
}
stack <- new_stack(
  data_block = new_dataset_block("penguins", "palmerpenguins"),
  plot_block = new_ggplot_block("flipper_length_mm", "body_mass_g"),
  layer_block = new_geompoint_block("species", "species")
)
serve_stack(stack)
Toward more complex analysis
Let's consider this dataset, which contains 120 years of Olympic athletes data, up to Rio in 2016. In the kitchen sink below, we first add an upload block:
- Download the dataset file locally.
- Click on Add stack.
- Click on the stack + button and search for browser, then select the new_filesbrowser_block.
- Uncollapse the stack by clicking on the top right arrow icon. This makes the file input of the upload block visible.
- Click on File select and select the file downloaded in the first step (athlete_events.csv).
- As we obtain a csv file, we must parse it with a new_csv_block. Repeat the third step to add the new_csv_block. The table has 271116 rows and 15 columns.
- Add a new_filter_block and select Sex as column and then F in the values input. We leave the comparison to == and click on the Run button. Notice we now have 74522 rows.
- Add a new_mutate_block with the following expression: birth_year = Year - Age (this gives us an approximate birth year). Click on submit.
From now on, we leave the first stack as is and reuse it in other stacks. We want to display the height distribution of female athletes. Let's do it below.
- Create a new stack by clicking on Add stack.
- Add a new_result_block to it. This imports the data from the first stack (and potentially any stack from the dashboard). If you don't see any data, select another stack name from the dropdown menu.
- Add a new_ggplot_block, leave x as default function and select Height as variable in the columns input.
- Add a new_geomhistogram_block. Now we have our distribution plot.
Alternatively, you could remove the 2 plot blocks and add a
new_summarize_block using mean as function and Height as column
(result: 168 cm).
In the following, we create a look-up table to be able to retrieve the
athlete names based on their ID.
- Create a new stack.
- Add a result block to import data from the very first stack.
- Add a new_select_block and only select ID, Name, birth_year, Team and Sport as columns.
Our goal is now to find which athletes competed in 2 or more different sports.
- Create a new stack.
- Add a result block to import data from the very first stack.
- Add a new_filter_block, select Medal as column, != as comparison operator and leave the value empty. Click on run, which only keeps athletes with medals.
- Add a new_group_by_block, grouping by ID (as some athletes have the same name).
- Add a new_summarize_block by choosing the function n_distinct applied on the Sport column.
- Add a new_filter_block, select N_DISTINCT as column, >= as comparison operator and set the value to 2. Click on run. This gives us the athletes that competed in 2 sports or more.
- Add a new_join_block. Select left_join as join function, select the third stack (lookup table) as join table and ID as column.
- Add a new_arrange_block for the birth_year column.
As a conclusion, Hjördis Viktoria Töpel (1904) was the first recorded athlete to compete in 2 different sports, swimming and diving for Sweden. Lauryn Chenet Williams (1984) is the latest for the US, with Athletics and Bobsleigh. It's actually quite amazing to see people competing in two quite unrelated sports, like swimming and handball in the case of Roswitha Krause.
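For readers who wonder what these stacks amount to in code, here is a minimal base R sketch of the same steps. The athletes data frame is a tiny made-up sample, not the real dataset, the column name n_sports is our own, and we assume that missing medals are stored as NA. The code exported by blockr may well look different.

```r
# Made-up sample with the same column names as the Olympic dataset.
athletes <- data.frame(
  ID = c(1, 1, 2, 2, 3, 4),
  Name = c("A", "A", "B", "B", "C", "D"),
  Sex = c("F", "F", "F", "F", "M", "F"),
  Age = c(20, 24, 22, 26, 30, 19),
  Year = c(1924, 1928, 2004, 2008, 2000, 2012),
  Sport = c("Swimming", "Diving", "Athletics", "Athletics", "Judo", "Rowing"),
  Medal = c("Gold", "Bronze", "Silver", "Gold", NA, NA)
)

# First stack: keep female athletes and derive an approximate birth year.
females <- athletes[athletes$Sex == "F", ]
females$birth_year <- females$Year - females$Age

# Lookup stack: one row per athlete and sport.
lookup <- unique(females[, c("ID", "Name", "birth_year", "Sport")])

# Last stack: medalists only (assumption: no medal is stored as NA),
# then count the distinct sports per athlete ID.
medalists <- females[!is.na(females$Medal), ]
counts <- aggregate(
  Sport ~ ID,
  data = medalists,
  FUN = function(s) length(unique(s))
)
names(counts)[2] <- "n_sports"

# Keep athletes with 2 sports or more, join the lookup table, sort by birth year.
multi <- counts[counts$n_sports >= 2, ]
result <- merge(multi, lookup, by = "ID", all.x = TRUE)
result <- result[order(result$birth_year), ]
result
```

In this sample, only athlete A remains, with one row per sport coming from the lookup table.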
library(blockr)
library(blockr.ggplot2)
options(shiny.maxRequestSize = 100 * 1024^2)
do.call(set_workspace, args = list(title = "My workspace"))
serve_workspace(clear = FALSE)
As an end user, you are not supposed to write code. If you think anything is missing, you can open an issue here, or ask any developer you work with to create new blocks. This leads us to the second part of this blog post: how to use blockr as a developer?
As a developer
How to install it:
pak::pak("BristolMyersSquibb/blockr")
blockr can't provide every possible data manipulation or visualization block. That's the reason why we made it easily extensible. You can get an introduction to blockr for developers here.
In the following, we create an ordinary differential equations solver block using the pracma package. We choose the Lorenz attractor. With R, equations may be written as:
lorenz <- function(t, y, parms) {
  c(
    X = parms[1] * y[1] + y[2] * y[3],
    Y = parms[2] * (y[2] - y[3]),
    Z = -y[1] * y[2] + parms[3] * y[2] - y[3]
  )
}
where t is the time, y the vector of state variables and parms the
vector of parameters. If you are familiar with
deSolve,
equations are defined with similar functions. For this blog post, we
selected pracma because deSolve does not run in shinylive, so you could not
see the embedded demonstration otherwise.
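Before wrapping the system in a block, it can help to solve it once in a plain R session. The sketch below calls ode45 from pracma with the same arguments as the block expression shown later in this post, except for a shorter time span (tfinal = 10, an arbitrary choice to keep the run short).

```r
library(pracma)

lorenz <- function(t, y, parms) {
  c(
    X = parms[1] * y[1] + y[2] * y[3],
    Y = parms[2] * (y[2] - y[3]),
    Z = -y[1] * y[2] + parms[3] * y[2] - y[3]
  )
}

# Extra named arguments such as `parms` are forwarded to `lorenz`.
sol <- ode45(
  lorenz,
  y0 = c(X = 1, Y = 1, Z = 1),
  t0 = 0,
  tfinal = 10,
  parms = c(-8 / 3, -10, 28)
)

# `sol` is a list holding the time points and the matrix of solutions;
# converting it gives one row per time point.
res <- as.data.frame(sol)
dim(res)
head(res)
```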
Add interactivity with the fields
We want to add interactivity on the 3 different parameters. Hence, we
create our new block function with 3 fields inside a list. Since the
expected values are numbers, we leverage the new_numeric_field.
Argument names are spelled out only for the first field:
new_ode_block <- function(...) {
  fields <- list(
    a = new_numeric_field(value = -8 / 3, min = -10, max = 20),
    b = new_numeric_field(-10, -50, 100),
    c = new_numeric_field(28, 1, 100)
  )
  # TBD
  # ...
}
As you may imagine, these fields are subsequently translated into shiny
inputs, that is numericInput in our example. If you face a situation
where you need to implement a custom field, not included in blockr, you
can read this
vignette.
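To make the mapping concrete, here is roughly what the three fields correspond to in plain shiny. This is only a hand-written illustration with shiny::numericInput. The way blockr builds its inputs internally may differ, and the input ids and labels below are assumptions.

```r
library(shiny)

# Hand-written equivalents of the 3 numeric fields (ids and labels are made up).
inputs <- list(
  a = numericInput("a", "a", value = -8 / 3, min = -10, max = 20),
  b = numericInput("b", "b", value = -10, min = -50, max = 100),
  c = numericInput("c", "c", value = 28, min = 1, max = 100)
)

# Each element is an HTML tag object that a shiny UI can render.
class(inputs$a)
```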
Create the block expression
As a next step, we instantiate our block with the new_block blockr
constructor:
new_block(
  fields = fields,
  expr = quote(<EXPR>),
  ...,
  class = <CLASSES>,
  submit = FALSE
)
A block is composed of fields, a quoted expression that involves the
fields (to delay the evaluation), some classes that control the
block behavior, and extra parameters passed with .... Finally,
submit delays the block evaluation by requiring the user to
click on a submit button (FALSE by default). This prevents
unwanted intensive computations from being triggered.
In our example, the expression calls the ode45 function. Notice the
usage of substitute to inject the lorenz function within the
expression. This is necessary since lorenz is defined outside of the
expression, and using quote would fail. Fields are invoked with
.(field_name), a rather unusual notation required by bquote to
process the expression. It is not mandatory to understand this
underlying technical detail, but the convention must be respected. You may also
notice that some parameters, like the initial conditions y0 or the time
values, are hardcoded. We leave transforming them into fields
as an exercise for the reader:
new_block(
  fields = fields,
  expr = substitute(
    as.data.frame(
      ode45(
        fun,
        y0 = c(X = 1, Y = 1, Z = 1),
        t0 = 0,
        tfinal = 100,
        parms = c(.(a), .(b), .(c))
      )
    ),
    list(fun = lorenz)
  )
  # TBD
)
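The interplay between substitute and the .() notation can be reproduced in base R, outside of blockr. In the minimal sketch below, the helper product and the values 2 and 21 are made up for illustration. Filling the template through do.call(bquote, ...) is only our assumption about how such an expression can be processed, not a description of blockr internals.

```r
# A helper defined outside of the expression, like `lorenz` in this post.
product <- function(x, y) x * y

# substitute() swaps the symbol `fun` for the function object itself,
# while the .() placeholders are left untouched.
template <- substitute(fun(.(a), .(b)), list(fun = product))

# bquote() replaces each .(name) by the value found in `where`.
filled <- do.call(bquote, list(template, where = list(a = 2, b = 21)))

# The filled call carries the function with it, so it evaluates even in
# an environment that cannot see `product`.
isolated <- new.env(parent = emptyenv())
eval(filled, isolated) # returns 42

# With quote(), `product` stays a symbol that must be looked up at
# evaluation time, which fails in the isolated environment.
quoted <- quote(product(2, 21))
tryCatch(eval(quoted, isolated), error = function(e) conditionMessage(e))
```

The isolated environment is deliberately extreme. It only serves to show that the substituted call no longer depends on where the helper was defined.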
Add the right classes
We give our block 2 classes, namely ode_block and data_block:
new_ode_block <- function(...) {
  fields <- list(
    a = new_numeric_field(-8 / 3, -10, 20),
    b = new_numeric_field(-10, -50, 100),
    c = new_numeric_field(28, 1, 100)
  )
  new_block(
    fields = fields,
    expr = substitute(
      as.data.frame(
        ode45(
          fun,
          y0 = c(X = 1, Y = 1, Z = 1),
          t0 = 0,
          tfinal = 100,
          parms = c(.(a), .(b), .(c))
        )
      ),
      list(fun = lorenz)
    ),
    ...,
    class = c("ode_block", "data_block")
  )
}
As explained earlier, they are required to control the block behavior,
as blockr is built with S3. For
instance, data blocks have a specific evaluation method to
calculate the expression:
evaluate_block.data_block <- function (x, ...)
{
  stopifnot(...length() == 0L)
  eval(generate_code(x), new.env())
}
where generate_code processes the block code. Data blocks are
considered entry point blocks, as opposed to transformation
blocks, which operate on data. Therefore, the evaluation method for
a transform block must receive the data from the previous block,
which it pipes into the expression with %>%:
evaluate_block.block <- function (x, data, ...)
{
  stopifnot(...length() == 0L)
  eval(substitute(data %>% expr, list(expr = generate_code(x))), list(data = data))
}
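The dispatch mechanism behind these two methods can be tried out with a toy example in base R. Everything below (the generic evaluate_toy, the constructor new_toy_block and the two toy blocks) is hypothetical and much simpler than blockr: the expression is stored as is and evaluated directly, without generate_code or the pipe.

```r
# Toy generic: S3 picks the method from the class vector, left to right.
evaluate_toy <- function(x, ...) UseMethod("evaluate_toy")

# Entry point blocks produce data from nothing.
evaluate_toy.data_block <- function(x, ...) {
  eval(x$expr, new.env())
}

# Fallback for any other block: it needs the data of the previous block.
evaluate_toy.block <- function(x, data, ...) {
  eval(x$expr, list(data = data))
}

# Every toy block ends with the common "block" class.
new_toy_block <- function(expr, class) {
  structure(list(expr = expr), class = c(class, "block"))
}

source_block <- new_toy_block(
  quote(data.frame(x = 1:5)),
  c("toy_source_block", "data_block")
)
head_block <- new_toy_block(
  quote(head(data, 2)),
  c("toy_head_block", "transform_block")
)

first <- evaluate_toy(source_block)              # uses the data_block method
second <- evaluate_toy(head_block, data = first) # falls back to the block method
nrow(second) # returns 2
```

Since no method exists for toy_head_block or transform_block, the call falls through to the block method, which is the same fallback logic the classes in this post rely on.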
If you want to build a plot block and plot layer blocks, you have
to design a specific evaluate method that accounts for the + operator
required by ggplot2. To learn more about how to create a plot block, you
can read this
article.
Demo
library(blockr)
library(pracma)
library(blockr.ggplot2)
lorenz <- function(t, y, parms) {
  c(
    X = parms[1] * y[1] + y[2] * y[3],
    Y = parms[2] * (y[2] - y[3]),
    Z = -y[1] * y[2] + parms[3] * y[2] - y[3]
  )
}
new_ode_block <- function(...) {
  fields <- list(
    a = new_numeric_field(-8 / 3, -10, 20),
    b = new_numeric_field(-10, -50, 100),
    c = new_numeric_field(28, 1, 100)
  )
  new_block(
    fields = fields,
    expr = substitute(
      as.data.frame(
        ode45(
          fun,
          y0 = c(X = 1, Y = 1, Z = 1),
          t0 = 0,
          tfinal = 100,
          parms = c(.(a), .(b), .(c))
        )
      ),
      list(fun = lorenz)
    ),
    ...,
    class = c("ode_block", "data_block")
  )
}
stack <- new_stack(
  new_ode_block,
  new_ggplot_block(
    func = c("x", "y"),
    default_columns = c("y.1", "y.2")
  ),
  new_geompoint_block
)
serve_stack(stack)
Packaging new blocks: the registry
In the above example, we defined the block on the fly. However, another outstanding feature of blockr is the registry, which you can see as a supermarket of blocks. On the R side, the registry is an environment that can be extended by developers who bring their own block packages.
To get an overview of all available blocks within the blockr core
package, we call get_registry:
get_registry()
                 ctor                                  description  category
1       arrange_block                              Arrange columns transform
2           csv_block                           Read a csv dataset    parser
3       dataset_block              Choose a dataset from a package      data
4  filesbrowser_block       Select files on the server file system      data
5        filter_block                       filter rows in a table transform
6      group_by_block                             Group by columns transform
7          head_block               Select n first rows of dataset transform
8          join_block                              Join 2 datasets transform
9          json_block                          Read a json dataset    parser
10       mutate_block                                 Mutate block transform
11          rds_block                           Read a rds dataset    parser
12       result_block Shows result of another stack as data source      data
13       select_block                    select columns in a table transform
14    summarize_block                        summarize data groups transform
15       upload_block                   Upload files from location      data
16          xpt_block                           Read a xpt dataset    parser
                                            classes      input     output
1             arrange_block, transform_block, block data.frame data.frame
2   csv_block, parser_block, transform_block, block     string data.frame
3                  dataset_block, data_block, block       <NA> data.frame
4             filesbrowser_block, data_block, block       <NA>     string
5              filter_block, transform_block, block data.frame data.frame
6            group_by_block, transform_block, block data.frame data.frame
7                head_block, transform_block, block data.frame data.frame
8                join_block, transform_block, block data.frame data.frame
9  json_block, parser_block, transform_block, block     string data.frame
10             mutate_block, transform_block, block data.frame data.frame
11  rds_block, parser_block, transform_block, block     string data.frame
12                  result_block, data_block, block       <NA> data.frame
13             select_block, transform_block, block data.frame data.frame
14          summarize_block, transform_block, block data.frame data.frame
15                  upload_block, data_block, block       <NA>     string
16  xpt_block, parser_block, transform_block, block     string data.frame
   package
1   blockr
2   blockr
3   blockr
4   blockr
5   blockr
6   blockr
7   blockr
8   blockr
9   blockr
10  blockr
11  blockr
12  blockr
13  blockr
14  blockr
15  blockr
16  blockr
This function returns a data frame containing information about blocks
such as their constructors, like new_ode_block, the description, the
category (data, transform, plot, ...; this is user defined), classes,
accepted input, returned output and package.
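Since the registry is described above as an environment that developers can extend, the idea can be illustrated with a toy version in base R. All names in this sketch (toy_registry, register_toy_block, list_toy_blocks, pkgA, pkgB) are hypothetical. The real blockr registry stores more metadata and is accessed through its own functions.

```r
# A toy registry: an environment that stores one entry per block.
toy_registry <- new.env()

# Store the constructor and its metadata under the block name.
register_toy_block <- function(name, constructor, description, package) {
  entry <- list(
    constructor = constructor,
    description = description,
    package = package
  )
  assign(name, entry, envir = toy_registry)
  invisible(name)
}

# Summarize the registry content as a data frame, one row per block.
list_toy_blocks <- function() {
  ids <- ls(toy_registry)
  data.frame(
    ctor = ids,
    description = vapply(
      ids, function(id) toy_registry[[id]]$description, character(1)
    ),
    package = vapply(
      ids, function(id) toy_registry[[id]]$package, character(1)
    ),
    row.names = NULL
  )
}

# Two made-up packages contribute one block each.
register_toy_block("head_block", function(n = 6) n, "Select n first rows", "pkgA")
register_toy_block("ode_block", function() NULL, "Solve an ODE system", "pkgB")

list_toy_blocks()
```

Because environments are modified in place, any package can add entries at load time without the registry having to be passed around.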
To register a block we call register_block (or register_blocks for
multiple blocks):
register_my_blocks <- function() {
  register_block(
    constructor = new_ode_block,
    name = "ode block",
    description = "Computed the Lorent attractor solutions",
    classes = c("ode_block", "data_block"),
    input = NA_character_,
    output = "data.frame",
    package = "<YOUR_PACKAGE>",
    category = "data"
  )
  # You can register any other blocks here ...
}
where <YOUR_PACKAGE> must be replaced by your real package name.
Within a zzz.R script, you can make sure every block is registered when the
package loads, thanks to a hook:
.onLoad <- function(libname, pkgname) {
  register_my_blocks()
  invisible(NULL)
}
After the registration, you can check whether the registry is updated, by looking at the ode block:
register_my_blocks()
reg <- get_registry()
reg[reg$package == "<YOUR_PACKAGE>", ]
        ctor                             description category
11 ode_block Computed the Lorent attractor solutions     data
                        classes input     output        package
11 ode_block, data_block, block  <NA> data.frame <YOUR_PACKAGE>
 
                     
                  