cynkra


Three small dots for more readable code

Package development
R

Authors

Maëlle Salmon

David Schoch

Kirill Müller

Introduction

Have you ever skimmed through the tidyverse’s design guide by Hadley Wickham? Even though that book hasn’t been completed yet, it already has useful content such as the pattern “Put … after required arguments”, which we have applied in several of our packages and would like to focus on in this post!

Why put … after required arguments?

In a function with both

  • Required arguments, i.e. arguments with no default value (e.g. the graph argument of many igraph functions such as hits_score())
  • Optional arguments, i.e. arguments with a default value (e.g. the weights argument that defaults to NULL in many igraph functions such as hits_score())

You’d hope for

  • Users to name optional arguments to make the code easier to read for outsiders. Compare hits_score(x, TRUE, NA) to hits_score(x, scales = TRUE, weights = NA): the latter is obvious even to the casual reader; this may be you in a few months.
  • Yourself to be able to change the order of optional arguments without wreaking havoc.

Well, the pattern described in the tidy design guide allows you to do just that.

Here is a mocked-up definition of igraph’s make_wheel() function, which we call make_wheel0(). The function creates a wheel graph (a center connected to peripheral nodes, which looks like a bike wheel when plotted). Its arguments are the number of nodes n, the optional mode, which describes the direction of edges, and the optional center, which indicates the ID of the central node.

library(igraph)
make_wheel0 <- function(
  n,
  ...,
  mode = c("in", "out", "mutual", "undirected"),
  center = 1
) {
  res <- wheel_impl(
    n = n,
    mode = mode,
    center = center - 1
  )
  if (igraph_opt("add.params")) {
    res$name <- switch(
      igraph_match_arg(mode),
      "in" = "In-wheel",
      "out" = "Out-wheel",
      "Wheel"
    )
    res$mode <- mode
    res$center <- center
  }
  res
}

With that make_wheel0() definition, passing mode without naming it does not work: the value is captured by ... and silently ignored! make_wheel0(10) and make_wheel0(10, "mutual") return the same wheel graph, with edges pointing towards the center. Only make_wheel0(10, mode = "mutual") returns a wheel graph with mutual (bidirectional) edges.

(default <- make_wheel0(10))
IGRAPH b632ad1 D--- 10 18 -- In-wheel
+ attr: name (g/c), mode (g/c), center (g/n)
+ edges from b632ad1:
 [1]  2-> 1  3-> 1  4-> 1  5-> 1  6-> 1  7-> 1  8-> 1  9-> 1 10-> 1  2-> 3
[11]  3-> 4  4-> 5  5-> 6  6-> 7  7-> 8  8-> 9  9->10 10-> 2
plot(default, vertex.size = 35, vertex.label.cex = 2)

(mutual <- make_wheel0(10, mode = "mutual"))
IGRAPH 94442f5 D--- 10 36 -- Wheel
+ attr: name (g/c), mode (g/c), center (g/n)
+ edges from 94442f5:
 [1]  1-> 2  2-> 1  1-> 3  3-> 1  1-> 4  4-> 1  1-> 5  5-> 1  1-> 6  6-> 1
[11]  1-> 7  7-> 1  1-> 8  8-> 1  1-> 9  9-> 1  1->10 10-> 1  2-> 3  3-> 4
[21]  4-> 5  5-> 6  6-> 7  7-> 8  8-> 9  9->10 10-> 2  2->10 10-> 9  9-> 8
[31]  8-> 7  7-> 6  6-> 5  5-> 4  4-> 3  3-> 2
plot(mutual, vertex.size = 35, vertex.label.cex = 2)

(surprise <- make_wheel0(10, "mutual"))
IGRAPH 1ce383c D--- 10 18 -- In-wheel
+ attr: name (g/c), mode (g/c), center (g/n)
+ edges from 1ce383c:
 [1]  2-> 1  3-> 1  4-> 1  5-> 1  6-> 1  7-> 1  8-> 1  9-> 1 10-> 1  2-> 3
[11]  3-> 4  4-> 5  5-> 6  6-> 7  7-> 8  8-> 9  9->10 10-> 2
plot(surprise, vertex.size = 35, vertex.label.cex = 2)

This is not exactly the ideal behavior since it can be confusing for users.
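
The reason lies in how R matches arguments: parameters that come after ... in a signature can only be matched by name, so any extra unnamed value lands in ... instead. Here is a minimal sketch in base R, using a hypothetical toy function describe() that is not part of igraph:

```r
# Hypothetical toy function: `mode` comes after `...`,
# so it can only be matched by name.
describe <- function(n, ..., mode = "in") {
  list(n = n, mode = mode, n_dots = ...length())
}

positional <- describe(10, "mutual")   # "mutual" is captured by `...`
named <- describe(10, mode = "mutual") # "mutual" is matched to `mode`

positional$mode   # still the default, "in"
positional$n_dots # 1: the unnamed value ended up in `...`
named$mode        # "mutual"
```

make_wheel0() ignores its unnamed second argument for the same reason.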

The actual definition of make_wheel(), however, is the following. It calls rlang::check_dots_empty() to ensure that nothing has been passed through ..., so unnamed extra arguments trigger an error.

make_wheel <- function(
  n,
  ...,
  mode = c("in", "out", "mutual", "undirected"),
  center = 1
) {
  rlang::check_dots_empty()
  res <- wheel_impl(
    n = n,
    mode = mode,
    center = center - 1
  )
  if (igraph_opt("add.params")) {
    res$name <- switch(
      igraph_match_arg(mode),
      "in" = "In-wheel",
      "out" = "Out-wheel",
      "Wheel"
    )
    res$mode <- mode
    res$center <- center
  }
  res
}

As a consequence, users have to name the second argument:

no_surprise <- make_wheel(10, "mutual")
Error in `make_wheel()`:
! `...` must be empty.
✖ Problematic argument:
• ..1 = "mutual"
ℹ Did you forget to name an argument?

Adding the dots in later function life

In the dm and igraph packages, we decided to add the ellipsis, together with rlang’s empty-dots check, to functions that had been exported for a while. This meant we were looking at the possibility of breaking users’ code. 😱 While breaking changes might sometimes be unavoidable, in this case we resorted to a gentler solution that we will sketch out here.

Let’s compare different versions of a fictional function.

In this first implementation, my_message() lets the user pass the optional arguments pretty and verbose by position, without naming them.

my_message <- function(x, pretty = FALSE, verbose = TRUE) {
  if (!is.character(x)) {
    stop("x must be a character")
  }

  if (verbose) {
    message("Hello!")
  }

  if (pretty) {
    cat(sprintf("-- %s --", x))
  } else {
    cat(x)
  }
}

my_message("a")
Hello!

a
my_message("a", TRUE)
Hello!

-- a --
my_message("a", TRUE, FALSE)
-- a --

This new version of my_message() is stricter but breaks existing code!

my_message <- function(x, ..., pretty = FALSE, verbose = TRUE) {
  rlang::check_dots_empty()
  if (!is.character(x)) {
    stop("x must be a character")
  }

  if (verbose) {
    message("Hello!")
  }

  if (pretty) {
    cat(sprintf("-- %s --", x))
  } else {
    cat(x)
  }
}
my_message("a")
Hello!

a
my_message("a", TRUE)
Error in `my_message()`:
! `...` must be empty.
✖ Problematic argument:
• ..1 = TRUE
ℹ Did you forget to name an argument?
my_message("a", TRUE, FALSE)
Error in `my_message()`:
! `...` must be empty.
✖ Problematic arguments:
• ..1 = TRUE
• ..2 = FALSE
ℹ Did you forget to name an argument?
my_message("a", pretty = TRUE, verbose = FALSE)
-- a --

The gentler version of that new check recovers arguments by their position, and nudges the users towards adapting their code by emitting a lifecycle warning. A bit more work for us, less pain for users.

my_message <- function(x, ..., pretty = FALSE, verbose = TRUE) {
  if (!is.character(x)) {
    stop("x must be a character")
  }

  if (...length() > 0) {
    user_env <- rlang::caller_env()
    recovered_attrs <- recover_positional_attrs(
      dots = list(...),
      current = list(pretty = pretty, verbose = verbose),
      user_env = user_env
    )
    pretty <- recovered_attrs$pretty
    verbose <- recovered_attrs$verbose
  }

  if (verbose) {
    message("Hello!")
  }

  if (pretty) {
    cat(sprintf("-- %s --", x))
  } else {
    cat(x)
  }
}

Definition of recover_positional_attrs():
recover_positional_attrs <- function(
  dots,
  current,
  user_env = rlang::caller_env()
) {
  nms <- rlang::names2(dots)
  unnamed <- dots[!nzchar(nms)]

  # One unnamed scalar logical: treat it as `pretty`, keep the current `verbose`.
  if (
    length(dots) == 1L &&
      length(unnamed) == 1L &&
      is.logical(unnamed[[1L]]) &&
      length(unnamed[[1L]]) == 1L
  ) {
    lifecycle::deprecate_soft(
      "3.0.0",
      "my_message(pretty = 'must be named')",
      user_env = user_env
    )
    return(list(pretty = unnamed[[1L]], verbose = current$verbose))
  }

  # Two unnamed scalar logicals: treat them as `pretty` and `verbose`, in that order.
  if (
    length(dots) == 2L &&
      length(unnamed) == 2L &&
      all(vapply(unnamed, is.logical, logical(1L))) &&
      all(lengths(unnamed) == 1L)
  ) {
    lifecycle::deprecate_soft(
      "3.0.0",
      "my_message(verbose = 'must be named')",
      details = "Both pretty and verbose must be named",
      user_env = user_env
    )

    return(list(pretty = unnamed[[1L]], verbose = unnamed[[2L]]))
  }

  # Anything else cannot be recovered by position, so fail explicitly
  # rather than hand NULL values back to the caller.
  rlang::abort("`...` must be empty.", call = rlang::caller_env())
}
my_message("a", TRUE, FALSE)
Warning: The `verbose` argument of `my_message()` must be named as of <NA> 3.0.0.
ℹ Both pretty and verbose must be named

-- a --

Note that the lifecycle message displays <NA> here because my_message() is not defined in a package; when emitted from a package, it displays the name of that package.

In igraph, we went for a similar gentle solution. Since igraph is a large package, we made the approach scalable from the get-go:

  • In a function where we want to transition towards using the ellipsis pattern, we add marker comments # BEGIN GENERATED ARG_HANDLE: <fn> and # END GENERATED ARG_HANDLE.
  • A script saved in tools/ contains the “configuration”: for each migrated function, the names and positions of arguments, from which package version the migration started, etc.
  • Another script saved in tools/ generates, between the marker comments and based on the configuration, the call to a helper function called migrate_recover_args().
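
The generation step could look roughly like the sketch below. This is not igraph’s actual tooling: the function name regenerate_between_markers(), the shape of the generated call, and the arguments passed to migrate_recover_args() are assumptions made for illustration. Only the marker comments come from the description above.

```r
# Replace whatever sits between the marker comments for `fn`
# with freshly generated lines (hypothetical helper, base R only).
regenerate_between_markers <- function(lines, fn, generated) {
  begin <- which(trimws(lines) == paste("# BEGIN GENERATED ARG_HANDLE:", fn))
  ends <- which(trimws(lines) == "# END GENERATED ARG_HANDLE")
  stopifnot(length(begin) == 1L)
  end <- ends[ends > begin][1L] # first end marker after the begin marker
  stopifnot(!is.na(end))
  c(lines[seq_len(begin)], generated, lines[end:length(lines)])
}

source_lines <- c(
  "make_wheel <- function(n, ..., mode = \"in\", center = 1) {",
  "  # BEGIN GENERATED ARG_HANDLE: make_wheel",
  "  stale_code()",
  "  # END GENERATED ARG_HANDLE",
  "  # rest of the function body",
  "}"
)

# The arguments of migrate_recover_args() shown here are made up.
generated <- "  args <- migrate_recover_args(list(...), c(\"mode\", \"center\"))"

updated <- regenerate_between_markers(source_lines, "make_wheel", generated)
writeLines(updated)
```

Because the markers themselves are kept, running the generator again on its own output leaves the file unchanged, so the script can be re-run whenever the configuration changes.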

Conclusion

The ellipsis between required and optional arguments makes for more readable code. Erroring when unnamed arguments are passed after the required arguments, thanks to rlang::check_dots_empty(), avoids surprising users. However, for existing functions where passing unnamed optional arguments used to be accepted, it is gentler to warn than to error, and to recover unnamed arguments based on their position.

The ellipsis pattern is fascinating because it means a package developer can improve the experience of people who’ll read the code that package users write!