Getting started with R

Data Science for Everyone

Author

Bongani Ncube

1. Getting to know R

People usually describe R as quirky. But I certainly think quirky is good: it means it's a little bit out of the norm, and I think that's what makes R, R. So, to ease into things, we'll briefly discuss one of the most important families of data types in base R: vectors.

In R, assignment statements have the following form:

object_name <- value

= will work in place of <-, but it will cause confusion later, so do not use it. In RStudio, use the keyboard shortcut Alt + - (the minus sign) to poof up the assignment operator, <-, in a flash. 💫
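Here is a tiny sketch of an assignment in action. The object name answer and its value are made up purely for illustration.

```r
# Assign the value 42 to a hypothetical object called answer
answer <- 42

# Typing the object's name on its own prints its value
answer
```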

Atomic vectors

The basic data structure in R is the vector (R’s 1D data structure). Vectors come in two flavours: atomic vectors and lists. The two differ in the types of their elements: all elements of an atomic vector must be the same type, whereas the elements of a list can have different types. There are four primary types of atomic vectors: logical, integer, double, and character (which contains strings).

Vectors are created using c(), short for combine. You can determine the type of a vector with typeof() and how many elements it contains with length().

Logical vector

  • c() will create a vector for you
  • typing the vector's name displays the output
  • typeof() determines the type of vector
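The steps above might look like the sketch below. The name lgl_vec and the values in it are assumptions chosen for illustration.

```r
# Create a logical vector (lgl_vec is a hypothetical name)
lgl_vec <- c(TRUE, FALSE, NA)

# Display the output
lgl_vec

# Determine the type of the vector: "logical"
typeof(lgl_vec)
```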

Double vector:

  • Inf, -Inf, and NaN are special values defined by the floating point standard.
  • add a value of your choice
  • display results
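A minimal sketch of a double vector, assuming the special values meant above are Inf, -Inf, and NaN. The name dbl_vec and the ordinary numbers in it are illustrative; swap in a value of your choice.

```r
# Create a double vector: ordinary numbers plus the special
# floating point values Inf, -Inf and NaN
dbl_vec <- c(1, 2.5, 1e3, Inf, -Inf, NaN)

# Display results
dbl_vec

# Determine the type of the vector: "double"
typeof(dbl_vec)
```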

Integer vector:

  • written similarly to doubles but must be followed by L
  • display the output
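To see what the L suffix does, here is a small sketch with made-up values. The name int_vec is hypothetical.

```r
# Integers are written like doubles but followed by L
int_vec <- c(1L, 6L, 10L)

# Display the output
int_vec

# With the L suffix the type is "integer"
typeof(int_vec)

# Without it, the same numbers are stored as "double"
typeof(c(1, 6, 10))
```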

Character vector

  • fill in your name
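A character vector could be sketched as follows. The name chr_vec and the placeholder strings are assumptions; replace "your name" with your own.

```r
# Create a character vector; each element is a string in quotes
chr_vec <- c("your name", "loves", "R")

# Display the output
chr_vec

# Determine the type of the vector: "character"
typeof(chr_vec)
```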

Dataframes

  • In R, data frames are the most common way of storing and analyzing data. However, vectors can also suffice for general and basic data manipulation.

With that said, let’s take atomic vectors out for a try!

We’ll start by looking at some simple data. Suppose a college takes a sample of student grades for a data science class:
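The original sample is not reproduced in this text, so the values below are illustrative. They were chosen to agree with what the tutorial says later on (22 students, a first grade of 50, a sixth grade of 3, and a mean close to 50).

```r
# Illustrative grades (assumed values, not the original sample)
grades <- c(50, 50, 47, 97, 49, 3, 53, 42, 26, 74, 82,
            62, 37, 15, 70, 27, 36, 35, 48, 52, 63, 64)

# Display the vector
grades
```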

So, how many students does the sample contain 🗒?

Determine number of elements in the vector
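A quick sketch of counting the students with length(). The grades vector is re-created with the same illustrative (assumed) values so the snippet runs on its own.

```r
# Illustrative grades (assumed values, not the original sample)
grades <- c(50, 50, 47, 97, 49, 3, 53, 42, 26, 74, 82,
            62, 37, 15, 70, 27, 36, 35, 48, 52, 63, 64)

# Determine the number of elements in the vector: 22
length(grades)
```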

Indexing Vectors

Oftentimes we may want to access only part of a vector, or perhaps an individual element. This is called indexing and is accomplished with square brackets, []. R has a very flexible system that gives us several choices of index:

  • Passing a vector of positive numbers returns the slice of the vector containing the elements at those locations. The first position is 1 (not 0, as in some other languages).

  • Passing a vector of negative numbers returns the slice of the vector containing the elements everywhere except at those locations.

  • Let’s get the first and sixth grade:

We should expect 50 and 3.
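Positive indexing could be sketched like this, again using the illustrative (assumed) grades so the snippet runs on its own.

```r
# Illustrative grades (assumed values, not the original sample)
grades <- c(50, 50, 47, 97, 49, 3, 53, 42, 26, 74, 82,
            62, 37, 15, 70, 27, 36, 35, 48, 52, 63, 64)

# Pass a vector of positive positions to pick the 1st and 6th grades
grades[c(1, 6)]
```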

Perfect!

  • Now let’s get a vector of all grades except for the 1st and the 6th student.

We should expect all other grades except those at index 1 and 6.
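Negative indexing works the same way with a minus sign in front of the positions. This sketch reuses the illustrative (assumed) grades.

```r
# Illustrative grades (assumed values, not the original sample)
grades <- c(50, 50, 47, 97, 49, 3, 53, 42, 26, 74, 82,
            62, 37, 15, 70, 27, 36, 35, 48, 52, 63, 64)

# Negative positions drop the 1st and 6th grades and keep the rest
grades[-c(1, 6)]
```

With 22 grades in the vector, the result holds the remaining 20.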

Alright, now that we know our way around vectors, it’s time we performed some analysis of the grades data.

  • First, we can find the simple average grade (in other words, the mean grade value).
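A minimal sketch using base R's mean() on the illustrative (assumed) grades:

```r
# Illustrative grades (assumed values, not the original sample)
grades <- c(50, 50, 47, 97, 49, 3, 53, 42, 26, 74, 82,
            62, 37, 15, 70, 27, 36, 35, 48, 52, 63, 64)

# Find the mean grade: the sum of the grades divided by how many there are
mean(grades)
```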

So the mean grade is just around 50 - more or less in the middle of the possible range from 0 to 100.

Combining vectors

Let’s add a second set of data for the same students, this time recording the typical number of hours per week they devoted to studying.
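As with the grades, the original study hours are not shown in this text, so the values below are illustrative, with one entry per student (22 in total).

```r
# Illustrative weekly study hours (assumed values, not the original sample)
study_hours <- c(10.0, 11.5, 9.0, 16.0, 9.25, 1.0, 11.5, 9.0, 8.5, 14.5, 15.5,
                 13.75, 9.0, 8.0, 15.5, 8.0, 9.0, 6.0, 10.0, 12.0, 12.5, 12.0)

# Display the vector
study_hours
```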

We can create a 2-dimensional matrix from grades and study_hours by combining the vectors by columns using cbind().

  • Create a 2D matrix
  • print output
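The sketch below combines the two vectors by columns. The data values and the name student_data are assumptions, and study_hours is placed first only so that it matches the later statement that the first column holds study_hours.

```r
# Illustrative data (assumed values, not the original sample)
grades <- c(50, 50, 47, 97, 49, 3, 53, 42, 26, 74, 82,
            62, 37, 15, 70, 27, 36, 35, 48, 52, 63, 64)
study_hours <- c(10.0, 11.5, 9.0, 16.0, 9.25, 1.0, 11.5, 9.0, 8.5, 14.5, 15.5,
                 13.75, 9.0, 8.0, 15.5, 8.0, 9.0, 6.0, 10.0, 12.0, 12.5, 12.0)

# Create a 2D matrix: each vector becomes one column
# (column order is an assumption; student_data is a hypothetical name)
student_data <- cbind(study_hours, grades)

# Print output
student_data
```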

Before going into anything complex, we might want to find out the dimensions of our object. Turns out, dim() can also help you retrieve the number of rows and columns of an object.

  • Dimension of the resulting matrix [Rows, Columns]
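A short sketch of dim() on the combined matrix, built from the same illustrative (assumed) data and hypothetical name student_data:

```r
# Illustrative data (assumed values, not the original sample)
grades <- c(50, 50, 47, 97, 49, 3, 53, 42, 26, 74, 82,
            62, 37, 15, 70, 27, 36, 35, 48, 52, 63, 64)
study_hours <- c(10.0, 11.5, 9.0, 16.0, 9.25, 1.0, 11.5, 9.0, 8.5, 14.5, 15.5,
                 13.75, 9.0, 8.0, 15.5, 8.0, 9.0, 6.0, 10.0, 12.0, 12.5, 12.0)
student_data <- cbind(study_hours, grades)

# Dimension of the resulting matrix: rows first, then columns
dim(student_data)
```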

Tip

As expected 🤩, the matrix contains 2 columns, each with 22 rows.

We subset matrices using the notation [row_number, column_number].

So, for instance, if we wanted to access the first element in the first column (study_hours), we would do it as below:

  • Show the first element of the first column
  • Show the third element of the second column
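Both lookups could be sketched as follows. The data values and the name student_data are illustrative, and the column order (study_hours first, grades second) is an assumption made to match the text above.

```r
# Illustrative data (assumed values, not the original sample)
grades <- c(50, 50, 47, 97, 49, 3, 53, 42, 26, 74, 82,
            62, 37, 15, 70, 27, 36, 35, 48, 52, 63, 64)
study_hours <- c(10.0, 11.5, 9.0, 16.0, 9.25, 1.0, 11.5, 9.0, 8.5, 14.5, 15.5,
                 13.75, 9.0, 8.0, 15.5, 8.0, 9.0, 6.0, 10.0, 12.0, 12.5, 12.0)
student_data <- cbind(study_hours, grades)

# Show the first element of the first column (row 1, column 1)
student_data[1, 1]

# Show the third element of the second column (row 3, column 2)
student_data[3, 2]
```

With these values, the first lookup gives the first student's study hours (10) and the second gives the third student's grade (47).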

That’s all, hope you enjoyed!