Exploratory Analysis: Hands-on
Welcome to interactive learning
This will go a long way in helping you learn R hands-on, but beware: it can be addictive!
Remove the # (the comment marker) from each line of code, then run it.
- We are going to use the built-in mtcars dataset.
- Take a peek at the mtcars dataset using summary().
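A minimal sketch of this first step. The tutorial refers to the data as new_data but does not show where that object is created, so the code assumes it is simply a working copy of mtcars.

```r
# mtcars ships with base R (datasets package); data() copies it into the workspace
data(mtcars)

# Assumption: new_data is a working copy of mtcars
new_data <- mtcars

# Min, quartiles, median, mean and max for every column
summary(new_data)
```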
The dataset contains:
- ? observations
- ? variables
- To answer the questions above, use the dim() function.
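A short sketch of how dim() answers both questions at once, again assuming new_data is a copy of mtcars. nrow() and ncol() return the two parts separately.

```r
# Assumption: new_data is a working copy of mtcars
new_data <- mtcars

# dim() returns c(number of rows, number of columns)
dim(new_data)

# The same information, one part at a time
nrow(new_data)  # observations
ncol(new_data)  # variables
```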
Next, we examine the first five observations of the data; the rest of the observations are not shown. You can also see the type of each variable: chr (character), int (integer) and dbl (double).
- Use head() to see the first rows of the data.
- Use tail() to see the last rows as well.
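A minimal sketch, assuming new_data is a copy of mtcars. head() and tail() return six rows by default, so n = 5 is passed to get exactly five. dplyr's glimpse() is used here to print the chr/int/dbl type labels; note that every mtcars column is stored as a double.

```r
library(dplyr)  # provides glimpse()

# Assumption: new_data is a working copy of mtcars
new_data <- mtcars

# First five rows (the default would be six)
head(new_data, n = 5)

# Last rows; tail() also defaults to six
tail(new_data)

# One line per column with its type label, e.g. <dbl>
glimpse(new_data)
```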
Use base R to explore the data:
- table() looks at the frequencies of categorical data.
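As a worked example on a different column, here is a one-way and a two-way frequency table, assuming new_data is a copy of mtcars. In mtcars, cyl is the number of cylinders and am codes the transmission (0 = automatic, 1 = manual).

```r
# Assumption: new_data is a working copy of mtcars
new_data <- mtcars

# One-way table: how many cars have 4, 6 or 8 cylinders
table(new_data$cyl)

# Two-way table: cylinders (rows) against transmission type (columns)
table(new_data$cyl, new_data$am)
```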
Now you try:
table(new_data$vs)
Working with the tidyverse
Select variables, generate new variables and rename variables
We will work with these functions.
Select variables using dplyr::select()
When you work with large datasets with many columns, it is often easier to select only the columns you need. This creates a smaller dataset (fewer variables) that you can use for the initial part of the analysis, which makes data exploration faster and easier to follow.
However, for this exercise we need all the variables for exploration, so we will select everything.
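A minimal sketch of selecting every column, assuming new_data is a copy of mtcars. everything() is a tidyselect helper that matches all columns, so the result has the same variables as the input.

```r
library(dplyr)

# Assumption: new_data is a working copy of mtcars
new_data <- mtcars

# everything() matches every column, so nothing is dropped
all_vars <- new_data %>% select(everything())
```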
- Select the mpg, cyl and qsec variables.
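One way to do this with select(), assuming new_data is a copy of mtcars. The name small_data is just an illustrative choice; columns come out in the order you list them.

```r
library(dplyr)

# Assumption: new_data is a working copy of mtcars
new_data <- mtcars

# Keep only three columns, in the order listed
small_data <- new_data %>% select(mpg, cyl, qsec)
head(small_data)
```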
Extending the select() verb
Sometimes it is necessary to perform conditional selection on variables, because:
- at times you need only numerical variables for correlations
- you may only need categorical variables for testing independence
For such cases we can use functions such as select_if().
The code above will only select variables of class numeric.
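Because every mtcars column is numeric, select_if(is.numeric) on the raw data keeps everything. To make the effect visible, this sketch adds a hypothetical character column, model, built from the row names. It also shows select(where(is.numeric)), the form dplyr 1.0.0 and later recommend now that select_if() is superseded.

```r
library(dplyr)

# Hypothetical: add the car names (row names in mtcars) as a character column
new_data <- mtcars %>% mutate(model = rownames(mtcars))

# Keep only numeric columns, so the character column model is dropped
numeric_data <- new_data %>% select_if(is.numeric)

# Equivalent tidyselect style for dplyr >= 1.0.0
numeric_data2 <- new_data %>% select(where(is.numeric))

names(numeric_data)
```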
Generate a new variable using mutate()
With mutate(), you can generate a new variable. For example, in the dataset new_data, we want to create a new variable named log_mpg, which is a log transformation of mpg.
\[\text{log\_mpg} = \log(\text{mpg})\]
And let's observe the first five observations:
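A minimal sketch of this transformation, assuming new_data is a copy of mtcars. In R, log() is the natural logarithm, which matches the formula above.

```r
library(dplyr)

# Assumption: new_data is a working copy of mtcars
new_data <- mtcars

# Add log_mpg as a new column; log() is the natural log
new_data <- new_data %>% mutate(log_mpg = log(mpg))

# First five observations, including the new column
head(new_data, n = 5)
```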
Extending the mutate() function
It is often useful to perform conditional mutations on data, so that you only mutate a variable if a certain condition is met.
We often use mutate_if(), mutate_at() and mutate_all() to achieve this.
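A sketch showing each of the three scoped verbs, assuming new_data is a copy of mtcars. Choosing vs and am for mutate_at() and rounding to one digit are illustrative. These verbs still work but are superseded by across() in dplyr 1.0.0 and later.

```r
library(dplyr)

# Assumption: new_data is a working copy of mtcars
new_data <- mtcars

# mutate_at(): apply a function to the named columns only.
# vs and am are 0/1 codes, so treat them as factors
new_data <- new_data %>% mutate_at(vars(vs, am), as.factor)

# mutate_if(): apply a function only to columns that pass a test.
# vs and am are now factors, so is.numeric() skips them
new_data <- new_data %>% mutate_if(is.numeric, round, digits = 1)

# mutate_all(): apply a function to every column
all_text <- new_data %>% mutate_all(as.character)

str(new_data)
```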
- Check the data types before the next operation.
- We note that category, servings and high_traffic are characters when they should actually be factors. Let's change that.
Nice! We have turned every character variable into a factor.
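The general pattern is to check the column classes, convert every character column with mutate_if(is.character, as.factor), then check again. mtcars itself has no character columns, so this sketch adds a hypothetical model column from the row names to have something to convert.

```r
library(dplyr)

# Hypothetical: mtcars has no character columns, so add one from the row names
new_data <- mtcars %>% mutate(model = rownames(mtcars))

# Check data types before converting
sapply(new_data, class)

# Turn every character column into a factor
new_data <- new_data %>% mutate_if(is.character, as.factor)

# model should now be a factor
sapply(new_data, class)
```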
Rename variable using rename()
Now, we want to rename:
- the variable mpg to miles_per_gallon
- the variable cyl to cylinder
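A minimal sketch with rename(), assuming new_data is a copy of mtcars. The syntax is new_name = old_name, and columns you do not mention keep their names.

```r
library(dplyr)

# Assumption: new_data is a working copy of mtcars
new_data <- mtcars

# rename() takes new_name = old_name pairs
new_data <- new_data %>% rename(miles_per_gallon = mpg, cylinder = cyl)

names(new_data)
```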
How about we visualise the data with ggplot2?
- What is the relationship between mpg and qsec?
- Add labels.
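A possible starting point, using the original column names from mtcars (qsec is the quarter-mile time in seconds; mpg is miles per US gallon). A scatter plot shows the relationship, geom_smooth(method = "lm") adds a straight-line trend, and labs() sets the labels. The label wording is only illustrative.

```r
library(ggplot2)

# Assumption: new_data is a working copy of mtcars with the original column names
new_data <- mtcars

p <- ggplot(new_data, aes(x = qsec, y = mpg)) +
  geom_point() +
  # Linear-model trend line with its confidence band
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    title = "Fuel economy vs quarter-mile time",
    x = "Quarter-mile time (seconds)",
    y = "Miles per US gallon"
  )

print(p)
```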