Teaching
Mathematics and Statistics teachings
Ordinary Differential Equations
Real Analysis
NC/ND mathematics
Linear Algebra
Advanced Calculus
Generalized linear models
Generalized Additive Models
Survival Analysis
Statistical Inference
Multivariate Statistics
Regression Analysis and ANOVA
Actuarial Statistics (CS1B & CS2B)
Regression and Classification
Unsupervised Learning
Econometrics
Advanced Tidyverse
Advanced Tidymodels
Caret
Rmarkdown
Quarto
Rshiny
h2o/MLR3
Advanced SQL queries
Advanced EXCEL
SPSS/STATA
I offer a blend of R training courses for individuals and professionals, designed to empower each participant at one of the following levels:
Essentials - Basics in R Data Analysis
- Duration: 12 hours over 4 days
- Price: US$30
This is perfect for those who wish to acquire a solid grounding in R for analyzing and understanding data. Learn how to process and manipulate data efficiently in R. By the end of the course you will be able to work with:
- vectors
- factors
- matrices
- lists
- subsetting data
- dataframes
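The data structures listed above can be sketched in a few lines of base R. This is only an illustrative example: the object names (heights, group, info, and so on) and the values are made up and are not taken from the course material.

```r
# Vector: ordered values of a single type
heights <- c(1.62, 1.75, 1.80, 1.68)

# Factor: categorical data with a fixed set of levels
group <- factor(c("control", "treated", "treated", "control"))

# Matrix: two-dimensional, single type (filled column by column)
m <- matrix(1:6, nrow = 2, ncol = 3)

# List: elements may have different types and lengths
info <- list(name = "demo", values = heights, flag = TRUE)

# Data frame: a table whose columns are equal-length vectors
df <- data.frame(height = heights, group = group)

# Subsetting
tall <- heights[heights > 1.7]          # logical subsetting of a vector
m_val <- m[2, 3]                        # row 2, column 3 of the matrix
treated <- df[df$group == "treated", ]  # rows of a data frame by condition
first_name <- info[["name"]]            # extract a single list element
```

Single brackets keep the container type (a vector stays a vector, a data frame stays a data frame), while double brackets pull out one element of a list.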
Advanced - Boost your expertise
Boost your expertise by learning more advanced R. By the end of the course you will have acquired a thorough, practical, intermediate-level mastery of R.
Statistical Analyst
Become an expert in statistical analysis by learning how to combine data analysis with statistical analysis.
General Information
Setup
Click the Setup link on the navbar at the top, review all the information, and follow the instructions prior to the workshop.
You should set aside a couple of hours to download, install, and test all the software needed for the course. All the software we’re using in class is open-source and freely available online. This setup must be completed prior to class, as we will not have much time for troubleshooting software installation issues during class. Email Bongani Ncube if you’re having difficulty.
Course Schedule
Intro to R
This novice-level introduction is directed toward life scientists with little to no experience with statistical computing or bioinformatics. This interactive session introduces the R statistical computing environment. The first part of this workshop will demonstrate very basic functionality in R, including functions, vectors, creating variables, getting help, filtering, data frames, plotting, and reading/writing files.
Learning Objectives
- Become familiar with the RStudio interface and project management using RStudio
- Using R scripts to make analyses reproducible
- Perform basic arithmetic operations in R
- Using functions, creating variables, getting help
- Installing and loading R packages
- Importing and inspecting data
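As a rough sketch of the objectives above, the following base R snippet does some arithmetic, creates variables, and round-trips a dataset through a CSV file. The built-in mtcars dataset and the temporary file are stand-ins chosen for illustration, not the workshop's own data.

```r
# Basic arithmetic and variable creation
x <- 7
y <- 3
total <- x + y        # addition
ratio <- x / y        # division
remainder <- x %% y   # modulo (remainder after division)

# Getting help: ?mean or help("mean") opens the documentation for mean()
# Installing and loading a package (install once, load in every session):
# install.packages("dplyr")
# library(dplyr)

# Write a built-in dataset to a temporary CSV file, then import it again
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
cars <- read.csv(csv_path)

# Inspect the imported data
dim(cars)           # number of rows and columns
head(cars, 3)       # first three rows
str(cars)           # column types
summary(cars$mpg)   # five-number summary plus the mean
```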
Advanced Data Manipulation with R
Data analysis involves a large amount of janitor work – munging and cleaning data to facilitate downstream data analysis. This session assumes a basic familiarity with R and covers tools and techniques for advanced data manipulation. It will cover data cleaning and “tidy data,” and will introduce R packages that enable data manipulation, analysis, and visualization using split-apply-combine strategies. Upon completing this lesson, students will be able to use the dplyr package in R to effectively manipulate and conditionally compute summary statistics over subsets of a “big” dataset containing many observations.
Learning Objectives
- Employ the filter operation to return only rows of data meeting a condition
- Employ the select function to subset data including only columns of interest
- Employ the mutate function to apply other chosen functions to existing columns and create new columns of data
- Employ the arrange function to sort data by columns of interest
- Use the group_by and summarize functions in combination to perform summary and statistical analyses over subgroupings of data
- Employ the ‘pipe’ operator, %>%, to link together a sequence of functions
- Reformat and reshape “messy” wide data to a tidy format using functions from the tidyr package
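The verbs in these objectives might be chained together as in the sketch below. It assumes the dplyr and tidyr packages are installed; the mtcars dataset and the small wide table are illustrative choices, not the dataset used in the session.

```r
library(dplyr)
library(tidyr)

# filter rows, select columns, mutate a new column, arrange the result
efficient <- mtcars %>%
  filter(mpg > 25) %>%
  select(mpg, cyl, wt) %>%
  mutate(wt_kg = wt * 1000 * 0.4536) %>%  # wt is recorded in units of 1000 lb
  arrange(desc(mpg))

# group_by + summarize: summary statistics per subgroup
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(n = n(), mean_mpg = mean(mpg))

# Reshape "messy" wide data (one column per year) into tidy long format
wide <- data.frame(id = 1:2, score_2020 = c(10, 20), score_2021 = c(15, 25))
long <- wide %>%
  pivot_longer(
    cols = starts_with("score_"),
    names_to = "year",
    names_prefix = "score_",
    values_to = "score"
  )
```

Each step receives the output of the previous one through the pipe, so the code reads top to bottom in the order the operations happen.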
Advanced Data Visualization with R and ggplot2
This session will cover fundamental concepts for creating effective data visualization and will introduce tools and techniques for visualizing large, high-dimensional data using R. We will review fundamental concepts for visually displaying quantitative information, such as using series of small multiples, avoiding “chart-junk,” and maximizing the data-ink ratio. After briefly covering data visualization using base R graphics, we will introduce the ggplot2 package for advanced high-dimensional visualization. We will cover the grammar of graphics (geoms, aesthetics, stats, and faceting), and using ggplot2 to create plots layer-by-layer. Upon completing this lesson, students will be able to use R to explore a high-dimensional dataset by faceting and scaling arbitrarily complex plots in small multiples.
Learning Objectives
- Understand the grammar of graphics, and about building a plot layer by layer
- Map features of the data to aesthetics of a plot
- Rescale data for more effective visualization
- Create typical visualizations, such as scatter plots, histograms, density plots, boxplots, and their alternatives.
- Faceting plots to show visualizations in small multiples
- Creating publication-ready plots and using themes
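A minimal sketch of building a plot layer by layer, assuming the ggplot2 package is installed. The mtcars dataset and the particular aesthetic mappings are illustrative assumptions only.

```r
library(ggplot2)

# Map data features to aesthetics, then add layers one at a time
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +             # geom: one point per car
  scale_x_log10() +                  # rescale the x axis
  facet_wrap(~ gear) +               # small multiples, one panel per gear count
  labs(
    x = "Weight (1000 lb, log scale)",
    y = "Miles per gallon",
    colour = "Cylinders"
  ) +
  theme_minimal()                    # theme: overall look of the plot

print(p)  # draw the plot on the active graphics device
```

Because the plot is an ordinary R object, layers, scales, and themes can be swapped independently while the data and aesthetic mappings stay the same.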
Reproducible Research & Dynamic Documents
Contemporary life sciences research is plagued by reproducibility issues. This session covers some of the barriers to reproducible research and how to start to address some of those problems during the data management and analysis phases of the research life cycle. In this session we will cover using R and dynamic document generation with RMarkdown and RStudio to weave together reporting text with executable R code to automatically generate reports in the form of PDF, Word, or HTML documents.
Learning Objectives
- Understand the benefits of using dynamic documentation for reproducible research
- Using markdown as a markup / formatting language
- Embedding R code in an RMarkdown document
- Compiling RMarkdown to an HTML or PDF report
- Making Quarto reports, presentations, blogs, and websites
Essentials of Mathematical Statistics in R
This session will provide hands-on instruction and exercises covering basic statistical analysis in R. This will cover descriptive statistics, t-tests, linear models, chi-square, clustering, dimensionality reduction, and resampling strategies. We will also cover methods for “tidying” model results for downstream visualization and summarization.
Learning Objectives
- Using exploratory data analysis and descriptive statistics to get a “feel” for the data you are working with
- Probability, random variables, discrete and continuous distributions, and the use of calculus to obtain expressions for parameters of these distributions such as the mean and variance (Bernoulli, binomial, Poisson, Geometric, Hypergeometric, Multinomial, negative binomial, Gaussian, Exponential, uniform, Gamma, Weibull, Chi-squared, Beta, Student’s T, F-distribution)
- Joint distributions for multiple random variables are introduced together with the important concepts of independence, correlation and covariance, marginal and conditional distributions
- Techniques for determining distributions of transformations of random variables
- Implementing statistical tests for continuous outcomes in R: t-tests, ANOVA, simple linear regression, and multiple linear regression
- Implementing statistical tests for categorical outcomes in R: chi-square tests, Fisher’s exact tests, logistic regression
- Perform power and sample size analysis using R
- “Tidying” the results of statistical analysis
- Understand the concept of simple and multiple linear regression
- Perform simple linear regression
- Perform multiple linear regression
- Perform model fit assessment of linear regression models
- Present and interpret the results of linear regression analyses
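Several of the tests and models in this list are available in base R. The sketch below uses the built-in mtcars dataset purely as an assumed example; the choice of variables is not from the course.

```r
# Descriptive statistics
summary(mtcars$mpg)

# Two-sample t-test: mpg by transmission type (am: 0 = automatic, 1 = manual)
tt <- t.test(mpg ~ am, data = mtcars)

# One-way ANOVA: mpg across cylinder groups
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(aov_fit)

# Simple and multiple linear regression
simple_fit <- lm(mpg ~ wt, data = mtcars)
multi_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Model fit assessment: R-squared and an F-test comparing the nested models
r2 <- summary(multi_fit)$r.squared
anova(simple_fit, multi_fit)

# Categorical outcome: Fisher's exact test on a contingency table
tab <- table(mtcars$am, mtcars$cyl)
fisher <- fisher.test(tab)

# Sample size per group for a two-sample t-test
# (effect size 1 SD, 80% power, 5% significance level)
pw <- power.t.test(delta = 1, sd = 1, power = 0.8, sig.level = 0.05)

# A simple way to "tidy" a model: coefficients as a data frame
coef_table <- as.data.frame(coef(summary(multi_fit)))
```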
Survival Analysis
This session will provide hands-on instruction and exercises covering survival analysis using R.
Learning Objectives
- Learn the meaning of terms used in survival analysis: hazard, survival function, Kaplan-Meier curve, censoring, and proportional hazards
- Censoring mechanisms (Type 1 right censoring, type 2 right censoring, interval censoring and left censoring)
- Constructing a survival table in R
- Using the survminer package to create Kaplan-Meier plots
- Perform Cox regression for survival analysis on multiple variables
- Categorizing continuous exposure variables for Kaplan-Meier analysis
- Parametric survival models in R
- to understand the basic concept of parametric survival analysis
- to understand the common parametric survival analysis models such as the exponential regression model and the Weibull survival model.
- to perform the analysis for the exponential regression model and the Weibull regression model
- Parametric survival models in the accelerated failure time metric
- The extended Cox model, interactions and time-varying covariates
- Joint modelling/Multivariate/Clustered survival data
- Frailty models
- Sample size calculations for survival studies
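As a hedged illustration of the objectives above, the sketch below fits a Kaplan-Meier curve, a Cox model, and two parametric models with the survival package. The lung dataset that ships with that package is an assumed example, and the plot uses base graphics rather than survminer to keep the dependencies small.

```r
library(survival)

# lung: time in days; status 1 = censored, 2 = dead; sex 1 = male, 2 = female

# Kaplan-Meier estimate of the survival function by sex
km <- survfit(Surv(time, status) ~ sex, data = lung)
summary(km, times = c(180, 365))    # survival table at 6 and 12 months
plot(km, col = c("blue", "red"), xlab = "Days", ylab = "Survival probability")

# Cox proportional hazards regression on several covariates
cox <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
hazard_ratios <- exp(coef(cox))

# Check the proportional hazards assumption
ph_check <- cox.zph(cox)

# Parametric models in the accelerated failure time metric
weib <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
expo <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "exponential")
```

The exponential model is the Weibull model with its scale parameter fixed at 1, so comparing the two fits is one way to ask whether a constant hazard is plausible.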
Generalized Linear Models
This session will provide hands-on instruction and exercises covering Generalized Linear Models using R.
Learning Objectives
- to understand the concept of simple and multiple binary logistic regression
- to perform simple binary logistic regression
- to perform multiple binary logistic regression
- to perform model assessment of binary logistic regression
- to present and interpret results from binary logistic regression
- to understand the concept of logistic regression model to analyze data with polychotomous (multinomial) outcome
- to estimate parameters of interest in a logistic regression model from data with polychotomous (multinomial) outcome
- to make inference based on a logistic regression model from data with polychotomous (multinomial) outcome
- to predict the outcome based on a logistic regression model from data with polychotomous (multinomial) outcome
- to perform model checking of a logistic regression model from data with polychotomous (multinomial) outcomes
- understand the basic concepts behind Poisson regression for count and rate data
- perform Poisson regression for count and rate and log-linear models
- perform model fit assessment
- present and interpret the results of Poisson regression analyses
- Methods for assessing model adequacy (residual analysis, likelihood ratio statistic, score statistic, Wald statistic, deviance statistic, Pearson chi-square statistic, Pearson residuals and deviance residuals, over-dispersion)
- Nominal and ordinal logistic regression for categorical response variables with more than two categories (Baseline-category logit model, Cumulative logits model, Proportional odds model, Adjacent-category logit model)
- Models for count data (Poisson regression and
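Binary logistic and Poisson regression can both be fitted with base R's glm() function. The sketch below uses the built-in mtcars and warpbreaks datasets as assumed examples, not the session's own data.

```r
# Binary logistic regression: transmission type (am: 0 = automatic, 1 = manual)
logit_simple <- glm(am ~ wt, data = mtcars, family = binomial)
logit_multi <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Odds ratios with Wald 95% confidence intervals
odds_ratios <- exp(cbind(OR = coef(logit_multi), confint.default(logit_multi)))

# Likelihood ratio test comparing the nested models
lrt <- anova(logit_simple, logit_multi, test = "Chisq")

# Predicted probabilities on the response scale
probs <- predict(logit_multi, type = "response")

# Poisson regression for count data: number of warp breaks per loom
pois_fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Model adequacy: Pearson and deviance residuals, and an over-dispersion check
pearson_res <- residuals(pois_fit, type = "pearson")
deviance_res <- residuals(pois_fit, type = "deviance")
dispersion <- sum(pearson_res^2) / df.residual(pois_fit)
```

A dispersion value well above 1 suggests the counts vary more than a Poisson model allows.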
Multivariate Statistics in R
Learning Objectives
After completing the module, the student will be able to:
- Explore and summarize multivariate data using graphical and numerical techniques
- Describe properties of multivariate distributions
- Demonstrate an understanding of limitations of some multivariate techniques
- Identify appropriate multivariate techniques to analyse multivariate data
- Perform multivariate analyses using R or Stata
- Explain and Interpret results from multivariate analyses
- Aspects of multivariate data (data organization & display, distance)
- Matrix algebra (positive definite matrices, mean vector and covariance matrix)
- Multivariate normal distribution
- Inference on mean vector
- MANOVA
- Multivariate Linear Regression
- Principal component analysis and PCA bi-plots
- Simple and Multiple Correspondence Analysis
- Multidimensional scaling
- Cluster Analysis
- Discriminant Analysis, Canonical Variate Analysis and Analysis of Distance
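A few of the techniques above (PCA with a bi-plot, cluster analysis, and multidimensional scaling) can be sketched with base R alone. The built-in USArrests dataset and the choice of four clusters are assumptions made for illustration.

```r
# Standardize the variables so each has mean 0 and variance 1
X <- scale(USArrests)

# Mean vector and covariance matrix of the raw data
mean_vec <- colMeans(USArrests)
cov_mat <- cov(USArrests)

# Principal component analysis on the standardized data
pca <- prcomp(USArrests, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
biplot(pca)   # PCA bi-plot of observations and variables

# Hierarchical cluster analysis on Euclidean distances
d <- dist(X)
hc <- hclust(d, method = "ward.D2")
clusters <- cutree(hc, k = 4)   # k = 4 is an arbitrary illustrative choice

# Classical (metric) multidimensional scaling into two dimensions
mds <- cmdscale(d, k = 2)
```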
Predictive Modeling & Forecasting
This session will provide hands-on instruction for using machine learning algorithms to predict a disease outcome. We will cover data cleaning, feature extraction, imputation, and using a variety of models to try to predict disease outcome. We will use resampling strategies to assess the performance of predictive modeling procedures such as Random Forest, stochastic gradient boosting, elastic net regularized regression (of which LASSO is a special case), and k-nearest neighbors. We will also demonstrate how to forecast future trends given historical infectious disease surveillance data using methodology that accounts for seasonality and nonlinearity.
Learning Objectives
- Using exploratory data analysis & reviewing data visualization techniques to get a “feel” for the data you are working with
- Feature extraction and variable re-coding for machine learning analysis
- Imputing missing data
- Using the caret package for automated model training and testing
- Understand how resampling techniques can be used to develop a predictive model
- Assess the performance of a variety of predictive models on a particular data set: random forest, support vector machines, k-nearest neighbor, and elastic net regularized regression
- Introduce forecasting and time series analysis
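The train/resample/assess workflow described above might look like the following sketch, assuming the caret package is installed. The iris dataset, the 80/20 split, the seed, and the k-nearest neighbor model are illustrative assumptions rather than the session's disease-outcome data.

```r
library(caret)

set.seed(42)  # make the random split and resampling reproducible

# Hold out 20% of the data, stratified by the outcome
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set <- iris[-idx, ]

# 5-fold cross-validation as the resampling strategy
ctrl <- trainControl(method = "cv", number = 5)

# Train k-nearest neighbors, centering and scaling predictors,
# and tune k over five candidate values
knn_fit <- train(
  Species ~ .,
  data = train_set,
  method = "knn",
  preProcess = c("center", "scale"),
  trControl = ctrl,
  tuneLength = 5
)

# Assess performance on the held-out data
pred <- predict(knn_fit, newdata = test_set)
cm <- confusionMatrix(pred, test_set$Species)
accuracy <- cm$overall[["Accuracy"]]
```

Swapping method = "knn" for another caret method name changes the model while the resampling and evaluation code stays the same, although other methods may need additional packages installed.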
Introduction to SQL and its integration with R
Unlock the Power of Data with SQL, dplyr, tidyverse
Are you ready to take your data science skills to the next level? Then this course is for you! You’ll immerse yourself in the exciting world of SQL and RStudio, learning how to integrate these powerful tools to optimize your data analytics. This course is designed to provide you with a solid and practical understanding of how to use SQL within the RStudio ecosystem.
Learning Objectives
- Introduction to SQL and RStudio
- We will become familiar with the RStudio environment and understand the basics of SQL.
- We’ll also explore SQL basics such as SELECT, FROM, WHERE, and JOIN, together with the DBI package.
- SQL Integration with dplyr
- We’ll learn how to use the dplyr package for data manipulation and how to integrate SQL queries with it.
- We’ll explore the key dplyr verbs, such as select, filter, mutate, summarize, and arrange, and how to perform SQL queries from within R.
- This part will also focus on SQL integration for efficient data manipulation.
- The tidyverse Ecosystem
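One way to see SQL and dplyr side by side is to run the same query both ways against a small database. This sketch assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed; the in-memory SQLite database and the table name cars are illustrative choices, not part of the course material.

```r
library(DBI)
library(dplyr)

# Create a throwaway in-memory SQLite database and load a table into it
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cars", mtcars)

# Plain SQL sent through DBI
sql_result <- dbGetQuery(con, "
  SELECT cyl, COUNT(*) AS n, AVG(mpg) AS mean_mpg
  FROM cars
  WHERE hp > 100
  GROUP BY cyl
  ORDER BY cyl
")

# The same query written with dplyr verbs; dbplyr translates it to SQL
# and collect() brings the result back into R
dplyr_result <- tbl(con, "cars") %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarize(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()

dbDisconnect(con)
```

Calling show_query() on the dplyr pipeline before collect() prints the SQL that would be sent to the database, which is a handy way to learn how the two map onto each other.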
R resources
Getting Help
Google it!: Try Googling generalized versions of any error messages you get. That is, remove text that is specific to your problem (names of variables, paths, datasets, etc.). You’d be surprised how many other people have probably had the same problem and solved it.
Stack Overflow: There are over 100,000 questions tagged with “R” on SO. Here are the most popular ones, ranked by vote. Always search before asking, and make a reproducible example if you want to get useful advice. This is a minimal example that allows others who are trying to help you to see the error themselves.
Read package vignettes. For example, see the dplyr CRAN page, scroll about halfway down to see the introduction to dplyr vignette.
General R Resources
- TryR: An interactive, browser-based R tutor
- Swirl: An R package that teaches you R (and statistics!) from within R
- Jenny Bryan’s Stat 545 “Data wrangling, exploration, and analysis with R” course material: An excellent resource for learning R, dplyr, and ggplot2
- DataCamp’s free introduction to R
- More DataCamp courses (UVA’s education benefits will cover these!).
- RStudio’s printable cheat sheets
- Rseek: A custom Google search for R-related sites
- Bioconductor vignettes, workflows, and course/conference materials
dplyr resources
ggplot2 resources
- The official ggplot2 documentation
- The ggplot2 book, edition 1, by the developer, Hadley Wickham
- New version of the ggplot2 book, freely available on GitHub
- The ggplot2 Google Group (mailing list, support forum)
- LearnR: A blog with a good number of posts describing how to reproduce various kinds of plots using ggplot2
- SO questions tagged with ggplot2
- A catalog of graphs made with ggplot2, complete with accompanying R code
- RStudio’s ggplot2 cheat sheet
Markdown / RMarkdown resources
- Basic Markdown + RMarkdown reference
- In-browser markdown editors:
- Minimal: bioconnector.github.io/markdown-editor
- Better: stackedit.io, dillinger.io
- A good markdown reference
- A good 10-minute markdown tutorial
- RStudio’s RMarkdown Cheat Sheet and RMarkdown Reference Sheet
- The RMarkdown documentation has an excellent getting started guide, a gallery of demos, and several articles illustrating advanced usage.
- The knitr website has lots of useful reference material about how knitr works, options, and more.