Frequency | Function |
---|---|
Annual | start:end |
Quarterly | yearquarter() |
Monthly | yearmonth() |
Weekly | yearweek() |
Daily | as_date(), ymd() |
Sub-daily | as_datetime() |
Get started with Time series data
data science for everyone
Packages used: forecast, tseries, vrtest, fpp3, TSstudio, UKgrid, tibble
Neat Elite Research And Data Analytics
Examples of time series data
Google stock prices
World economy
Time series data is a sequence of values, each associated with a unique point in time. Time series can be divided into the following two groups:
Regular time series - a sequence of observations captured at equally spaced time intervals (e.g., every month, week, day, hour, etc.)
Irregular time series - or unevenly spaced time series, a sequence of observations that were not captured at equally spaced time intervals (for example rainy days, earthquakes, clinical trials, etc.)
This exercise will help you get started with time series data.
We'll be using the built-in AirPassengers dataset, a classic toy time series of the monthly totals of international airline passengers from 1949 to 1960. Note that this dataset is a ts object, a special base R class for handling time series data.
Time series objects
There are multiple classes in R for time-series data; the most common types are:
- The ts class for regular time-series data, and the mts class for multiple time series objects; these are the most common classes for time series data
- The xts and zoo classes for both regular and irregular time series data, mainly popular in the financial field
- The tsibble class, a tidy format for time series data, which supports both regular and irregular time-series data
The attributes of a time series object
A typical time series object should have the following attributes:
- A vector or matrix object with sequential observations
- An index or timestamp
- Frequency units
- Cycle units
Here, the frequency units subdivide the cycle units. For example, for a monthly series the frequency units are the months of the year and the cycle units are years. Similarly, for a daily series the frequency units could be the days of the year, and the cycle units are again years.
The stats package provides a set of functions for handling and extracting information from a ts object. The frequency function returns the number of observations per cycle, and the cycle function returns the position of each observation within its cycle.
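A minimal sketch of both functions on the built-in AirPassengers series (any monthly ts object would behave the same way):

```r
# AirPassengers is a monthly ts object shipped with base R (datasets package)
freq <- frequency(AirPassengers)
print(freq)                     # 12 observations per cycle (year)

# cycle() labels each observation with its position in the cycle:
# 1 = January, 2 = February, ..., 12 = December
pos <- cycle(AirPassengers)
print(pos[1:14])                # 1 to 12, then 1, 2 again for the next year
```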
Look at the data, then verify that it is a ts object.
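A short sketch of these checks in base R; using window() to print only the first two years is a choice made for this sketch:

```r
# Look at the first two years of the series
print(window(AirPassengers, end = c(1950, 12)))

# Verify that it is a ts object
print(class(AirPassengers))     # "ts"
print(is.ts(AirPassengers))     # TRUE
```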
The start and end functions return the starting and ending time of the series, respectively:
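For AirPassengers, which runs from January 1949 to December 1960, a minimal sketch:

```r
# start() and end() return c(year, period) pairs
s <- start(AirPassengers)
e <- end(AirPassengers)
print(s)                        # 1949 1  (January 1949)
print(e)                        # 1960 12 (December 1960)
```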
The tsp function returns both the start and end of the series and its frequency:
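A sketch showing the three values tsp() packs together; note that the start and end are stored as decimal years rather than c(year, period) pairs:

```r
# tsp() returns c(start, end, frequency); start and end are in decimal years
p <- tsp(AirPassengers)
print(p)

# December 1960 is stored as 1960 + 11/12
print(all.equal(p[2], 1960 + 11 / 12))   # TRUE
```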
The ts_info function from the TSstudio package returns a concise summary of the series:
Creating a ts object
The ts function creates a ts object from a single vector and an mts object from multiple vectors (or a matrix). Given the start (or end) and the frequency of the series, the function generates the object's index.
Use the ts() function to convert data to a time series object; run ?ts to get more help.
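A minimal sketch of creating ts and mts objects. The input values are made up for illustration (a hypothetical sine-shaped vector), not data from this tutorial:

```r
# Hypothetical vector of 24 monthly observations (illustrative values only)
x <- round(100 + 10 * sin(2 * pi * (1:24) / 12), 1)

# Monthly series starting January 2019: start = c(year, month), frequency = 12
monthly <- ts(x, start = c(2019, 1), frequency = 12)
print(monthly)

# A matrix with several columns becomes a multiple time series ("mts") object
two_series <- ts(cbind(a = x, b = rev(x)), start = c(2019, 1), frequency = 12)
print(class(two_series))
```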
How to define the start and frequency arguments?
Series Type | Cycle Units | Frequency Units | Frequency | Example |
---|---|---|---|---|
Quarterly | Years | Quarter of the year | 4 | ts(x, start = c(2019, 2), frequency = 4) |
Monthly | Years | Month of the year | 12 | ts(x, start = c(2019, 1), frequency = 12) |
Weekly | Years | Week of the year | 52 | ts(x, start = c(2019, 13), frequency = 52) |
Daily | Years | Day of the year | 365 | ts(x, start = c(2019, 290), frequency = 365) |
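To see how the start = c(cycle, period) pair maps onto the time index, here is a sketch using hypothetical values for the quarterly and daily rows of the table:

```r
# Hypothetical data: 6 quarterly values starting in the 2nd quarter of 2019
q <- ts(c(5, 7, 6, 8, 9, 11), start = c(2019, 2), frequency = 4)
print(q)             # printed as a Qtr1..Qtr4 grid, first value under 2019 Qtr2
print(time(q))       # decimal time index: 2019.25, 2019.50, ...

# Hypothetical daily series starting on day 290 of 2019
d <- ts(1:10, start = c(2019, 290), frequency = 365)
print(start(d))      # 2019 290
```

Because frequency = 365 ignores leap days, a long daily ts object drifts slightly against the calendar; this is a limitation of the ts class itself.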
"The tsibble package provides a data infrastructure for tidy temporal data with wrangling tools…"
In other words, the tsibble object lets you work with a data-frame-like object (i.e., a tbl object) that has a time-aware index. The key characteristics of this class:
- It has a date/time object as an index
- It uses a key to store multiple time series objects
- It is a tbl object, so any of the normal tools to reformat, clean or modify a tbl object, such as dplyr functions, can be applied
tsibble objects
A tsibble allows storage and manipulation of multiple time series in R.
It contains:
- An index: time information about the observation
- Measured variable(s): numbers of interest
- Key variable(s): optional unique identifiers for each series
It works with tidyverse functions.
The tsibble index
For observations more frequent than once per year, we need to use a time class function on the index.
- The above gives a time series data frame that can be manipulated with tidyverse packages.
Notice how the above is the same as:
Common time index variables can be created with these functions:
Change AirPassengers to a data frame:
Note: this code does not run in webR, but it works if you paste it into RStudio or an R console.
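A base-R data frame can also be built by hand, without tsibble. This sketch assumes the series starts in January 1949 (true for AirPassengers) and generates one Date per month; the column names month and passengers are arbitrary choices:

```r
# One Date per month, aligned with the ts index (January 1949 onwards)
dates <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))

# Ordinary data frame with a date column and a value column
ap_df <- data.frame(month = dates, passengers = as.numeric(AirPassengers))
print(head(ap_df))
```

A monthly tsibble index could then be built from the month column with yearmonth(), one of the index functions listed in the table at the top.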
Turn it into a tsibble:
Plot
- ggtsdisplay() from the forecast package plots the raw series alongside its ACF and PACF plots, with a few useful built-in bells and whistles.
- You can also add ggplot layers to this figure outside of the ggtsdisplay() function.
- The above will not run in webR because the forecast package does not work there, so copy the code into RStudio or R.
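As a rough base-R stand-in (not the same function as ggtsdisplay()), the sketch below stacks the raw series, its ACF and its PACF using plot(), acf() and pacf() from stats. The 36-lag window and the panel layout are arbitrary choices:

```r
# Three stacked panels: raw series, ACF, PACF
op <- par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))
plot(AirPassengers, main = "AirPassengers", ylab = "Passengers (thousands)")
acf(AirPassengers, lag.max = 36, main = "ACF")
pacf(AirPassengers, lag.max = 36, main = "PACF")
par(op)  # restore the previous graphics settings

# The same autocorrelations without plotting; element 1 is lag 0
ap_acf <- acf(AirPassengers, lag.max = 36, plot = FALSE)
print(round(ap_acf$acf[c(2, 13, 25)], 2))  # lags of 1, 12 and 24 months
```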
Another useful exploratory plot to inspect visually is the decomposition plot, which shows the raw series, the trend line (same as the blue line in the above plot), the seasonal component of the series (if one exists), and a random/remainder component.
There are two decomposition functions in the stats package: decompose() and stl().
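A minimal sketch of both decomposition functions on AirPassengers. Using type = "multiplicative" in decompose() and a log transform before stl() are assumptions of this sketch, chosen because the seasonal swings grow with the level of the series; s.window = "periodic" forces a fixed seasonal pattern:

```r
# Classical decomposition with multiplicative seasonality
dec <- decompose(AirPassengers, type = "multiplicative")
plot(dec)

# stl() is additive, so apply it to the log series to mimic a multiplicative fit
stl_fit <- stl(log(AirPassengers), s.window = "periodic")
plot(stl_fit)

# Seasonal factors for January to December (they average to 1)
print(round(dec$figure, 3))
```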
- stl() has a more sophisticated set of parameters than decompose(), but both can broadly accomplish the same task.
- Armed with this information, we can now make a series of informed guesses about which initial ARIMA(p,d,q) model is most appropriate. The secondary peaks in the ACF are evidence of a seasonal component, which we confirmed in the decomposed series.
- The pattern in the AirPassengers dataset seems to closely resemble an ARMA(1,1) model.
- In addition, the PACF plot has significant autocorrelations in opposite directions, followed by oscillating, mostly insignificant correlations until we hit 12 months out, which is, again, the seasonal component.
Stationarity and Unit Root Tests
If \(\{y_t\}\) is a stationary time series, then for all \(s\), the distribution of \((y_t,\dots,y_{t+s})\) does not depend on \(t\).
A stationary series is:
- roughly horizontal
- constant variance
- no patterns predictable in the long-term
Next, we want to check whether the time series violates the requirement that the data be stationary. Here, the number of international air travel passengers increases over time without returning to a stable equilibrium, even though the patterns in the data themselves are consistent.
If we have a non-stationary time series, we need to transform it into a stationary series for our models to work properly.
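One common way to make this series stationary, sketched below as an assumption rather than this tutorial's own choice, is to take logs (to stabilise the growing variance), a first difference (to remove the trend) and a lag-12 seasonal difference (to remove the yearly pattern):

```r
log_ap <- log(AirPassengers)
diff_log_ap <- diff(log_ap)              # month-over-month change in log passengers
seas_diff <- diff(diff_log_ap, lag = 12) # also remove the repeating yearly pattern

op <- par(mfrow = c(2, 1))
plot(diff_log_ap, main = "First difference of log(AirPassengers)")
plot(seas_diff, main = "Seasonal (lag 12) + first difference")
par(op)
```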
Unit root tests
- Augmented Dickey-Fuller (ADF) test: null hypothesis is that the data are non-stationary and non-seasonal.
- Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal.
- Other tests are available for seasonal data.
First, we run the augmented Dickey-Fuller test:
The null hypothesis is that the series is non-stationary, and we have a p-value low enough to reject the null, so the ADF test indicates that the series is stationary.
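The ADF call itself is not reproduced here. As a stand-in that needs only base R, the sketch below uses a different unit root test, the Phillips-Perron test (stats::PP.test), which shares the ADF null hypothesis of a unit root. Its p-value need not match the ADF result discussed above:

```r
# Phillips-Perron unit root test from stats (swapped in for the ADF test)
# Null hypothesis: the series has a unit root (is non-stationary)
pp <- PP.test(AirPassengers)
print(pp)

# TRUE means the unit-root null is rejected at the 5% level
print(pp$p.value < 0.05)
```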
The KPSS test, on the other hand, has a null hypothesis that the series is stationary.
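Base R has no KPSS function (tseries, listed at the top, provides kpss.test()). To show what the test computes, the sketch below builds the KPSS level-stationarity statistic from scratch: demean the series, take partial sums, and scale by a Bartlett-weighted long-run variance. The lag truncation l = trunc(3 * sqrt(n) / 13) is an assumption of this sketch; 0.463 is the 5% critical value from the KPSS table for level stationarity:

```r
# KPSS statistic for the null of level stationarity (sketch, not a packaged test)
kpss_level_stat <- function(x, l = trunc(3 * sqrt(length(x)) / 13)) {
  x <- as.numeric(x)
  n <- length(x)
  e <- x - mean(x)                 # residuals from regressing x on a constant
  s <- cumsum(e)                   # partial sums S_t
  s2 <- sum(e^2) / n               # lag-0 variance term
  for (j in seq_len(l)) {          # Bartlett-weighted autocovariance terms
    w <- 1 - j / (l + 1)
    s2 <- s2 + 2 * w * sum(e[(j + 1):n] * e[1:(n - j)]) / n
  }
  sum(s^2) / (n^2 * s2)
}

stat <- kpss_level_stat(AirPassengers)
print(stat)
# Values above 0.463 reject level stationarity at the 5% level
print(stat > 0.463)
```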
We are getting conflicting results from our first two unit root tests. The ADF test says the series is stationary, while the KPSS test rejects its null hypothesis, indicating that the series is non-stationary.
Here we will run the variance ratio test mentioned in the slides.
Again, we get additional evidence that our series is stationary, since the variance ratio deviates from 1, the value expected under a random walk.
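To show what the ratio measures, here is a minimal sketch of the raw variance ratio VR(q) only, with no test statistic or p-value (the vrtest package listed at the top provides formal versions of these tests). variance_ratio() is a hypothetical helper written for this sketch, and the horizons 2, 4 and 12 are arbitrary choices:

```r
# VR(q): variance of q-period changes divided by q times the variance of
# 1-period changes. Under a random walk VR(q) is close to 1.
variance_ratio <- function(x, q) {
  x <- as.numeric(x)
  r1 <- diff(x)             # 1-period changes
  rq <- diff(x, lag = q)    # overlapping q-period changes
  var(rq) / (q * var(r1))
}

# Sanity check on a simulated random walk: the ratio should be near 1
set.seed(1)
rw <- cumsum(rnorm(5000))
print(variance_ratio(rw, 2))

# Ratios for AirPassengers at a few horizons (in months)
vr_ap <- sapply(c(2, 4, 12), function(q) variance_ratio(AirPassengers, q))
print(round(vr_ap, 2))
```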
Building and Testing an ARIMA Model
ARIMA models
So we have two tests that provide us evidence of stationarity, and one test that shows us evidence of non-stationarity.
Based on the patterns we see in the ACF and PACF plots, the decomposition plots, and the unit root tests, it would be reasonable to initially guess that this series is an ARMA(1,1) process or an ARIMA(1,1,1) process. The forecast package gives us several easy-to-use functions to build and test each of these models.
Here we have fitted ARIMA models with and without differencing the series, since we got conflicting unit root test results. Based on the AIC and BIC diagnostics for each model, the ARIMA model with a unit root actually models the data better.
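A base-R sketch of the two candidate fits, using stats::arima() rather than the forecast package functions referred to above. method = "ML" is an assumption made to keep both fits on exact maximum likelihood, and bic() is a hypothetical helper written from the standard BIC formula:

```r
# ARMA(1,1) on the levels (d = 0) and ARIMA(1,1,1) with one difference (d = 1)
fit_arma  <- arima(AirPassengers, order = c(1, 0, 1), method = "ML")
fit_arima <- arima(AirPassengers, order = c(1, 1, 1), method = "ML")

# BIC from the log-likelihood; parameters = coefficients + innovation variance
bic <- function(fit) -2 * fit$loglik + log(fit$nobs) * (length(coef(fit)) + 1)

# Information criteria (lower is better)
ic <- rbind(
  ARMA_1_1    = c(AIC = fit_arma$aic,  BIC = bic(fit_arma)),
  ARIMA_1_1_1 = c(AIC = fit_arima$aic, BIC = bic(fit_arima))
)
print(round(ic, 1))
```

One caution: the likelihood of a differenced model is computed on the differenced series, so AIC and BIC are not strictly comparable across different orders of differencing; treat this comparison as indicative.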
Now we should take the residuals from the best-fitting model and plot the ACF again to see whether there is any leftover serial correlation to explain.
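A sketch of the residual check in base R, refitting the ARIMA(1,1,1) model with stats::arima() so the block runs on its own. The 1.96 / sqrt(n) band is the usual approximate 95% bound that acf() also draws:

```r
fit_arima <- arima(AirPassengers, order = c(1, 1, 1), method = "ML")
res <- residuals(fit_arima)

acf(res, lag.max = 36, main = "ACF of ARIMA(1,1,1) residuals")

# Autocorrelation at the 12-month seasonal lag vs. the approximate 95% band
res_acf <- acf(res, lag.max = 36, plot = FALSE)
band <- 1.96 / sqrt(length(res))
print(c(lag12 = res_acf$acf[13], band = band))
```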
Based on the ACF, it looks like we have some leftover seasonality to model out of the data. Notice that in the initial ARIMA model we provided the number of periods but did not specify an order of seasonality. Seasonal components may have their own ARIMA(p,d,q) process, though they tend to be less complicated than the full ARIMA. Since this seasonality peaks once every 12 months and slowly declines, it looks like it includes an AR(1) process.
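A sketch of adding a seasonal AR(1) term with stats::arima(). The seasonal order (1,0,0) with period 12 follows the reasoning above; method = "ML" is an assumption of this sketch, and whether the new term removes all leftover seasonality should be checked with the residual ACF again:

```r
# ARIMA(1,1,1)(1,0,0)[12]: seasonal AR(1) term with a 12-month period
fit_sar <- arima(AirPassengers, order = c(1, 1, 1),
                 seasonal = list(order = c(1, 0, 0), period = 12),
                 method = "ML")
print(fit_sar)

# Non-seasonal fit with the same differencing, for comparison
fit_arima <- arima(AirPassengers, order = c(1, 1, 1), method = "ML")
print(c(nonseasonal = fit_arima$aic, seasonal_ar1 = fit_sar$aic))
```

Both fits use d = 1 and no seasonal differencing, so their AIC values are computed on the same differenced data.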