Get started with Time series data

data science for everyone

Author

Bongani Ncube

Published

2 November 2025

Install these packages in RStudio for a smooth workflow:

packages: ['forecast','tseries','vrtest','fpp3','TSstudio','UKgrid','tibble']

\[\color{orange}{\text{Neat Elite Research And Data Analytics}}\]

Examples of time series data

Google stock prices

World economy

Time series data
  • Time series data is a sequence of values, each associated with a unique point in time. It falls into one of the following two groups:

  • Regular time series: a sequence of observations captured at equally spaced time intervals (e.g., every month, week, day, or hour).

  • Irregular (or unevenly spaced) time series: a sequence of observations not captured at equally spaced time intervals (for example rainy days, earthquakes, or clinical trials).

  • This exercise will help you get started with time series data.

We’ll be using the built-in AirPassengers dataset, a classic toy time series: the monthly totals of international airline passengers from 1949 to 1960. Note that this dataset is a ts object, a special base R class for handling time series data.

Time series objects

There are multiple classes in R for time series data. The most common are:

  • The ts class for regular time series data, and the mts class for multiple time series objects. ts is the most common class for time series data
  • The xts and zoo classes for both regular and irregular time series data, mainly popular in the financial field
  • The tsibble class, a tidy format for time series data that supports both regular and irregular time series

The attributes of a time series object

A typical time series object should have the following attributes:

  • A vector or matrix object with sequential observations
  • Index or timestamp
  • Frequency units
  • Cycle units

Here, the frequency of the series is the number of frequency units in one cycle. For example, for a monthly series the frequency units are the months of the year and the cycle units are the years, so the frequency is 12. Similarly, for a daily series the frequency units could be the days of the year, with years again as the cycle units.

The stats package provides a set of functions for handling and extracting information from a ts object. As their names imply, frequency() returns the frequency of the series, and cycle() returns the position of each observation within its cycle.
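As a quick illustration, here is a minimal sketch of both functions on the built-in AirPassengers series, using base R only:

```r
# Number of observations per cycle (12 for a monthly series)
frequency(AirPassengers)

# Position of each observation within its cycle (1 = January, ..., 12 = December);
# only the first year is shown here
head(cycle(AirPassengers), 12)
```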

Look at the data

Verify that it is a ts object

Check whether it is a ts object
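The three steps above can be sketched in base R as follows. The choice of class() and is.ts() for the two checks is an assumption, since the original code cells are not shown:

```r
# Print the series as a year-by-month table
AirPassengers

# The class of the object should be "ts"
class(AirPassengers)

# is.ts() returns TRUE for ts objects
is.ts(AirPassengers)
```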

The start and end functions return the starting and ending time of the series, respectively:
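A minimal sketch on AirPassengers; each call returns a pair of the form (cycle, position within the cycle):

```r
# First observation: year 1949, month 1
start(AirPassengers)

# Last observation: year 1960, month 12
end(AirPassengers)
```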

The tsp function returns both the start and end of the series and its frequency:
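For example, on AirPassengers (a sketch in base R; the start and end are reported in decimal years rather than as year/month pairs):

```r
# Returns three numbers: start time, end time, and frequency
ap_tsp <- tsp(AirPassengers)
ap_tsp
```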

The ts_info function from the TSstudio package returns a concise summary of the series:
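A minimal sketch, assuming the TSstudio package is installed:

```r
library(TSstudio)

# Prints a short summary of the series (class, frequency, start, end, ...)
ts_info(AirPassengers)
```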

Creating a ts object

The ts function creates a ts object from a single vector, or an mts object from multiple vectors (or a matrix). From the start (or end) and the frequency of the series, the function generates the object's index.

Use the ts() function to convert data to a time series; run ?ts for more help.

How to define the start and frequency arguments?

| Series Type | Cycle Units | Frequency Units | Frequency | Example |
|---|---|---|---|---|
| Quarterly | Years | Quarter of the year | 4 | ts(x, start = c(2019, 2), frequency = 4) |
| Monthly | Years | Month of the year | 12 | ts(x, start = c(2019, 1), frequency = 12) |
| Weekly | Years | Week of the year | 52 | ts(x, start = c(2019, 13), frequency = 52) |
| Daily | Years | Day of the year | 365 | ts(x, start = c(2019, 290), frequency = 365) |
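
To make the start and frequency arguments concrete, here is a small sketch in base R. The vectors x and y are made-up illustrative data, not part of the tutorial:

```r
# Hypothetical data: 24 made-up monthly values
set.seed(123)
x <- round(rnorm(24, mean = 100, sd = 10))
y <- round(rnorm(24, mean = 50, sd = 5))

# A single vector gives a ts object: monthly data starting in January 2019
monthly_ts <- ts(x, start = c(2019, 1), frequency = 12)

# Several vectors bound into a matrix give an mts object
monthly_mts <- ts(cbind(x, y), start = c(2019, 1), frequency = 12)

monthly_ts
class(monthly_mts)
```

With 24 monthly values starting in January 2019, the generated index runs through December 2020.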
The tsibble class

“The tsibble package provides a data infrastructure for tidy temporal data with wrangling tools…”

In other words, a tsibble object lets you work with a data-frame-like object (i.e., a tbl object) that is time aware. The key characteristics of this class:

  • It has a date/time object as an index
  • It uses a key to store multiple time series
  • It is a tbl object, so you can apply the usual tools, such as dplyr functions, to reformat, clean, or modify it

tsibble objects

  • A tsibble allows storage and manipulation of multiple time series in R.

  • It contains:

    • An index: time information about the observation
    • Measured variable(s): numbers of interest
    • Key variable(s): optional unique identifiers for each series
  • It works with tidyverse functions.

The tsibble index

For observations more frequent than once per year, we need to use a time class function on the index.
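
As a sketch of what this looks like for monthly data, the example below converts a text column with yearmonth() and declares it as the index. The table z and its values are made up for illustration, and the tsibble, dplyr, and tibble packages are assumed to be installed:

```r
library(tsibble)
library(dplyr)

# Made-up monthly observations, with the month stored as text
z <- tibble::tibble(
  Month = c("2019 Jan", "2019 Feb", "2019 Mar", "2019 Apr", "2019 May"),
  Observation = c(50, 23, 34, 30, 25)
)

# Convert the text column to a yearmonth index, then build the tsibble
z_tsbl <- z %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(index = Month)

z_tsbl
```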

Note
  • The above gives a time series data frame that can be manipulated with the tidyverse packages.

Notice how the above is the same as:
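
The original code for this comparison is not shown. One equivalent construction, assumed here, is to call the tsibble() constructor directly with a yearmonth index. The values are made up for illustration:

```r
library(tsibble)

# Build the tsibble in one step, declaring Month as the index
z_direct <- tsibble(
  Month = yearmonth(c("2019 Jan", "2019 Feb", "2019 Mar", "2019 Apr", "2019 May")),
  Observation = c(50, 23, 34, 30, 25),
  index = Month
)

z_direct
```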

Common time index variables can be created with these functions:

| Frequency | Function |
|---|---|
| Annual | start:end |
| Quarterly | yearquarter() |
| Monthly | yearmonth() |
| Weekly | yearweek() |
| Daily | as_date(), ymd() |
| Sub-daily | as_datetime() |
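
A short sketch of each function in the table, assuming the tsibble and lubridate packages are installed. The input strings are arbitrary examples:

```r
library(tsibble)

yrs <- 2015:2019                                      # annual: a plain integer sequence
qtr <- yearquarter("2019 Q2")                         # quarterly
mth <- yearmonth("2019 Jan")                          # monthly
wk  <- yearweek("2019 W13")                           # weekly
d1  <- lubridate::as_date("2019-10-17")               # daily
d2  <- lubridate::ymd("2019-10-17")                   # daily, same result as d1
dt  <- lubridate::as_datetime("2019-10-17 08:30:00")  # sub-daily

class(qtr)
class(mth)
class(wk)
class(d1)
class(dt)
```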

Convert AirPassengers to a data frame
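
One way to do this in base R is sketched below. The column names month and passengers are chosen here for illustration:

```r
# Rebuild the monthly dates from the known start of the series (January 1949)
ap_df <- data.frame(
  month = seq(as.Date("1949-01-01"), by = "month",
              length.out = length(AirPassengers)),
  passengers = as.numeric(AirPassengers)
)

head(ap_df)
```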

Warning

For some reason, this code does not work in web-R, but it will work if you paste it into RStudio or R. The same applies to several later code cells in this tutorial.

Turn it into a tsibble
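
A minimal sketch, assuming the tsibble and dplyr packages are installed. The data frame is rebuilt here so that the block runs on its own, and its column names are illustrative:

```r
library(tsibble)
library(dplyr)

# Data frame version of AirPassengers (column names are illustrative)
ap_df <- data.frame(
  month = seq(as.Date("1949-01-01"), by = "month",
              length.out = length(AirPassengers)),
  passengers = as.numeric(AirPassengers)
)

# Convert the dates to a monthly index and declare it as the tsibble index
ap_tsbl <- ap_df %>%
  mutate(month = yearmonth(month)) %>%
  as_tsibble(index = month)

ap_tsbl
```

Calling as_tsibble(AirPassengers) directly on the ts object is a shorter alternative.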

Plot the series
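
A minimal sketch, assuming the forecast package is installed. The smooth = TRUE argument, which overlays a smoothed trend line on the raw series, is an assumption about how the original plot was drawn:

```r
library(forecast)

# Raw series on top, ACF and PACF underneath
ggtsdisplay(AirPassengers, smooth = TRUE)
```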

Note
  • ggtsdisplay() from the forecast package plots the raw series alongside its ACF and PACF plots, with a few useful built-in options.
  • You can also add ggplot layers to this figure outside of the ggtsdisplay function.
Warning
  • The code above will not run in web-R because the forecast package does not run there; copy the code into RStudio or R instead.
Note

Another useful exploratory plot to inspect visually is the decomposition plot, which shows the raw series, the trend line (the same as the blue line in the plot above), the seasonal component of the series, if one exists, and a random/remainder component.

There are two decomposition functions in the stats package: decompose() and stl().
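
Both can be sketched in base R as follows. The default additive decomposition and s.window = "periodic" are choices made here for illustration, not necessarily the settings the tutorial used:

```r
# Classical decomposition by moving averages (additive by default)
ap_dec <- decompose(AirPassengers)
plot(ap_dec)

# Loess-based decomposition; s.window = "periodic" keeps the seasonal
# pattern identical across years
ap_stl <- stl(AirPassengers, s.window = "periodic")
plot(ap_stl)
```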

Note
  • stl() has a more sophisticated set of parameters than decompose(), but both can broadly accomplish the same task.
  • Armed with this information, we can now make a series of informed guesses about which initial ARIMA(p,d,q) model is most appropriate. The secondary peaks in the ACF are evidence of a seasonal component, which we confirmed in the decomposed series.
  • The pattern in the AirPassengers dataset seems to closely resemble the ARMA(1,1) model.
  • In addition, the PACF plot has significant autocorrelations in opposite directions, followed by oscillating, mostly insignificant correlations until we hit 12 months out, which is, again, the seasonal component.

Stationarity and Unit Root Tests

If \(\{y_t\}\) is a stationary time series, then for all \(s\), the distribution of \((y_t,\dots,y_{t+s})\) does not depend on \(t\).

A stationary series is:

  • roughly horizontal
  • constant variance
  • no patterns predictable in the long-term

Note
  • Next, we want to check whether the time series violates the requirement that the data be stationary. It appears to: the number of international air travel passengers increases over time without returning to a stable equilibrium, even though the patterns in the data are themselves consistent.

  • If we have a non-stationary time series, we need to transform it into a stationary series in order for our models to work properly.

Unit root tests

  1. Augmented Dickey-Fuller (ADF) test: null hypothesis is that the data are non-stationary and non-seasonal.
  2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal.
  3. Other tests available for seasonal data.

First, we run the augmented Dickey-Fuller test:
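
A minimal sketch, assuming the adf.test() function from the tseries package (one of the packages listed at the top) is the implementation intended here:

```r
library(tseries)

# Null hypothesis: the series has a unit root (is non-stationary)
adf_result <- adf.test(AirPassengers)
adf_result
```

The p-value is available as adf_result$p.value for comparison against a chosen significance level.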

Tip

The null hypothesis is that the series is non-stationary, and we have a p-value that is low enough to reject the null, which suggests that the series is stationary.

The KPSS test, on the other hand, has a null hypothesis that the series is stationary.
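
A minimal sketch, assuming kpss.test() from the tseries package. The null = "Level" setting (stationarity around a constant level) is the function's default and is an assumption about what the tutorial ran:

```r
library(tseries)

# Null hypothesis: the series is level-stationary
kpss_result <- kpss.test(AirPassengers, null = "Level")
kpss_result
```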

  • We are getting conflicting results from our first two unit root tests. The ADF test says the series is stationary, while the KPSS test rejects its null hypothesis, indicating that the series is non-stationary.

  • Here we will run the variance ratio test mentioned in the slides.
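
The exact call used in the tutorial is not shown. As a sketch only, the automatic variance ratio test Auto.VR() from the vrtest package could be run as below. Both the choice of this function and applying it to the raw series (the package documentation describes the input as a return series) are assumptions:

```r
library(vrtest)

# Automatic variance ratio test; the input is passed as a plain numeric vector
vr_result <- Auto.VR(as.numeric(AirPassengers))
vr_result
```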

Again, we get additional evidence that our series is stationary, since the variance ratio deviates from 1, the value expected under a random walk.

Building and Testing an ARIMA Model

ARIMA models

So we have two tests that provide us evidence of stationarity, and one test that shows us evidence of non-stationarity.

Based on the patterns we see in the ACF and PACF plots, the decomposition plots, and the unit root tests, it would be reasonable to initially guess that this series is an ARMA(1,1) process or an ARIMA(1,1,1) process. The forecast package gives us several easy-to-use functions to build and test each of these models.
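
The two candidate models can be sketched with Arima() from the forecast package. The object names are chosen here for illustration:

```r
library(forecast)

# ARMA(1,1): no differencing (d = 0)
fit_arma <- Arima(AirPassengers, order = c(1, 0, 1))

# ARIMA(1,1,1): one round of differencing (d = 1)
fit_arima <- Arima(AirPassengers, order = c(1, 1, 1))

# Information criteria for each fit
c(AIC = fit_arma$aic, BIC = fit_arma$bic)
c(AIC = fit_arima$aic, BIC = fit_arima$bic)
```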

Here we have created an ARIMA model with and without differencing the series since we have gotten conflicting unit root test results. Based on the AIC and BIC diagnostics for each of the models, an ARIMA model with a unit root actually models the data better.

Now, we should take the residuals from the best-fitting model and plot the ACF again to see if we have any leftover serial correlation to explain.
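
A minimal sketch, assuming the forecast package and taking the ARIMA(1,1,1) fit as the best-fitting model. The model is refit here so that the block runs on its own, and lag.max = 36 is an arbitrary choice that shows three years of lags:

```r
library(forecast)

# Refit the differenced model and extract its residuals
fit_arima <- Arima(AirPassengers, order = c(1, 1, 1))
res <- residuals(fit_arima)

# ACF of the residuals; spikes at multiples of 12 point to leftover seasonality
ggAcf(res, lag.max = 36)
```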

Based on the ACF, it looks like we have some leftover seasonality to model out of the data. Notice that in the initial ARIMA model we provided the number of periods but did not specify a seasonal order. Seasonal components may have their own ARIMA(p,d,q) process, though they tend to be less complicated than the full ARIMA. Since this seasonality peaks once every 12 months and slowly declines, it looks like it includes an AR(1) process.
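
A seasonal AR(1) term can be added through the seasonal argument of Arima(). The sketch below assumes the forecast package and keeps the ARIMA(1,1,1) non-seasonal part; the exact specification is a guess based on the reasoning above:

```r
library(forecast)

# ARIMA(1,1,1) with a seasonal AR(1) term; the seasonal period defaults
# to frequency(AirPassengers), i.e. 12
fit_sarima <- Arima(AirPassengers, order = c(1, 1, 1), seasonal = c(1, 0, 0))
fit_sarima

# Check whether the seasonal spikes in the residual ACF have shrunk
ggAcf(residuals(fit_sarima), lag.max = 36)
```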