tsbox 0.1: class-agnostic time series

Christoph Sax


The R ecosystem knows a vast number of time series classes: ts, xts, zoo, tsibble, tibbletime or timeSeries. The plethora of standards causes confusion. As different packages rely on different classes, it is hard to use them in the same analysis. tsbox provides a set of tools that make it easy to switch between these classes. It also allows the user to treat time series as plain data frames, facilitating the use with tools that assume rectangular data.


comic by xkcd


The tsbox package is built around a set of functions that convert time series of different classes to each other. They are frequency-agnostic, and allow the user to combine multiple non-standard and irregular frequencies. Because coercion works reliably, it is easy to write functions that work identically for all classes. So whether we want to smooth, scale, differentiate, chain-link, forecast, regularize or seasonally adjust a time series, we can use the same tsbox-command for any time series class.

This blog gives a short overview of the changes introduced in 0.1. A detailed overview of the package functionality is given in the documentation page (or in a previous blog-post).

Keeping explicit missing values

Version 0.1, now on CRAN, brings a large number of bug fixes and improvements. A substantial change involves the treatment of NA values in data frames. Previously, all NAs in data frames were treated as implicit, and were only made explicit by a call to ts_regular.

This has changed now. If you convert a ts object to a data frame, all NA values will be preserved. To replicate previous behavior, apply the ts_na_omit function:

library(tsbox)
x.ts <- ts_c(mdeaths, austres)
x.ts
ts_df(x.ts)
ts_na_omit(ts_df(x.ts))

ts_span extends outside of series span

This lays the groundwork for ts_span to be extensible. With extend = TRUE, ts_span extends a regular series with NA values, up to the specified limits, similar to base window. Like all functions in tsbox, this is frequency-agnostic. For example, in the following, the monthly series mdeaths is extended by monthly NA values, while the quarterly series austres is extended by quarterly NA values.

x.df <- ts_df(ts_c(mdeaths, austres))
ts_span(x.df, end = "1999-12-01", extend = TRUE)

ts_default standardizes column names in a data frame

In rectangular data structures, i.e., in a data.frame, a data.table, or a tibble, tsbox stores one or multiple time series in the ‘long’ format. By default, tsbox detects a value, a time and zero, one or several id columns. Alternatively, the time column and the value column can be explicitly named time and value. If explicit names are used, the column order will be ignored.

While automatic column name detection is useful in interactive mode, it produces unnecessary overhead in longer workflows. The helper function ts_default detects and renames the time and the value column, so that auto-detection will be turned off in subsequent steps (note that the names of the id columns are not affected):

x.df <- ts_df(ts_c(mdeaths, austres))
names(x.df) <- c("a fancy id name", "date", "count")
ts_plot(x.df)  # tsbox is fine with that
ts_default(x.df)

ts_summary summarizes time series

ts_summary provides a frequency agnostic summary of a ts-boxable object:

ts_summary(ts_c(mdeaths, austres))
#>        id obs    diff freq      start        end
#> 1 mdeaths  72 1 month   12 1974-01-01 1979-12-01
#> 2 austres  89 3 month    4 1971-04-01 1993-04-01

ts_summary returns a plain data frame that can be used for any purpose. It is also recommended for the extraction of various time series properties, such as start, freq or id:

ts_summary(austres)$id
#> [1] "austres"
ts_summary(austres)$start
#> [1] "1971-04-01"

And a cheatsheet!

Finally, we fabricated a tsbox cheat sheet that summarizes most functionality. Print and enjoy working with time series.





Data Scientist/Engineer (40-100%)

cynkra team


We are hiring a data scientist/engineer! You are enthusiastic about R and know a bit about Git – everything else is negotiable. We offer interesting projects around the R ecosystem and lots of freedom, with option to work remotely.

Photo by Scott Webb

What we look for

  • Very good knowledge of R

  • Basic knowledge of Git

  • Experience in Machine Learning, Statistics, Programming, Experience with Databases, Teaching, shiny, tidyverse, etc., is a plus.

  • 40%-100% commitment, can be adapted throughout the year

  • Ability and desire to learn and improve on the job

  • Good working knowledge of written English

  • Very good command of spoken English or German

What we offer

  • Interesting projects around consulting and open source software development

  • Nice private office in Zurich close to ETH Hönggerberg, free coffee during office hours

  • Remote work possible

  • Competitive compensation

How to apply

Please submit your application via mail@cynkra.com, or share a private GitHub repository with us. Get in touch with us if you have further questions.

Who we are

cynkra is a Zurich-based data consulting company with a strong focus on R. We use R and the tidyverse in the vast majority of our projects. We support businessess and organizations by helping them picking the right tools, implementing solutions, training and code review. We are enthusiastic about open source software and contribute to it, too. Learn more at www.cynkra.com.





Time series of the world, unite!

Christoph Sax


The R ecosystem knows a ridiculous number of time series classes. So, I decided to create a new universal standard that finally covers everyone’s use case…

Ok, just kidding!

tsbox, now freshly on CRAN, provides a set of tools that are agnostic towards existing time series classes. It is built around a set of converters, which convert time series stored as ts, xts, data.frame, data.table, tibble, zoo, tsibble or timeSeries to each other.

To install the stable version from CRAN:

install.packages("tsbox")

To get an idea how easy it is to switch from one class to another, consider this:

library(tsbox)
x.ts <- ts_c(mdeaths, fdeaths)
x.xts <- ts_xts(x.ts)
x.df <- ts_df(x.xts)
x.tbl <- ts_tbl(x.df)
x.dt <- ts_tbl(x.tbl)
x.zoo <- ts_zoo(x.dt)
x.tsibble <- ts_tsibble(x.zoo)
x.timeSeries <- ts_timeSeries(x.tsibble)

We jump form good old ts objects toxts, store our time series in various data frames and convert them to some highly specialized time series formats.

tsbox is class-agnostic

Because these converters work nicely, we can use them to make functions class-agnostic. If a class-agnostic function works for one class, it works for all:

ts_scale(x.ts)
ts_scale(x.xts)
ts_scale(x.df)
ts_scale(x.dt)
ts_scale(x.tbl)

ts_scale normalizes one or multiple series, by subtracting the mean and dividing by the standard deviation. It works like a ‘generic’ function: You can apply it on any time series object, and it will return an object of the same class as its input.

So, whether we want to smooth, scale, differentiate, chain-link, forecast, regularize or seasonally adjust a series, we can use the same commands to whatever time series at hand. tsbox offers a comprehensive toolkit for the basics of time series manipulation. Here are some additional operations:

ts_pc(x.ts)                 # percentage change rates
ts_forecast(x.xts)          # forecast, by exponential smoothing
ts_seas(x.df)               # seasonal adjustment, by X-13
ts_frequency(x.dt, "year")  # convert to annual frequency
ts_span(x.tbl, "-1 year")   # limit time span to final year

tsbox is frequency-agnostic

There are many more. Because they all start with ts_, you can use auto-complete to see what’s around. Most conveniently, there is a time series plot function that works for all classes and frequencies:

ts_plot(
  `Airline Passengers` = AirPassengers,
  `Lynx trappings` = ts_df(lynx),
  `Deaths from Lung Diseases` = ts_xts(fdeaths),
  title = "Airlines, trappings, and deaths",
  subtitle = "Monthly passengers, annual trappings, monthly deaths"
)

www.dataseries.org

There is also a version that uses ggplot2 and has the same syntax.

Time series in data frames

You may have wondered why we treated data frames as a time series class. The spread of dplyr and data.table has given data frames a boost and made them one of the most popular data structures in R. So, storing time series in a data frame is an obvious consequence. And even if you don’t intend to keep time series in data frames, this is still the format in which you import and export your data. tsbox makes it easy to switch from data frames to time series and back.

Make existing functions class-agnostic

tsbox includes tools to make existing functions class-agnostic. To do so, the ts_ function can be used to wrap any function that works with time series. For a function that works on "ts" objects, this is as simple as that:

ts_rowsums <- ts_(rowSums)
ts_rowsums(ts_c(mdeaths, fdeaths))

Note that ts_ returns a function, which can be used with or without a name.

In case you are wondering, tsbox uses data.table as a backend, and makes use of its incredibly efficient reshaping facilities, its joins and rolling joins. And thanks to anytime, tsbox will be able to recongnize almost any date format without manual intervention.

So, enjoy some relieve in R’s time series class struggle.

Website: www.tsbox.help









More posts

tsbox 0.1: class-agnostic time series

Christoph Sax

Data Scientist/Engineer (40-100%)

cynkra team

Time series of the world, unite!

Christoph Sax

Done “Establishing DBI”!?

Kirill Müller


Other blogs

R-bloggers