Preface
Why Statistics with R?
Philosophy
What is in this handbook?
Resources
About me
Four basic ingredients
1 Setting up R
1.1 R
1.2 RStudio
1.2.1 The RStudio IDE
1.2.2 Install packages
1.2.3 RStudio Projects
1.3 Git & GitHub
1.4 Resources
2 Data Import
2.1 Entering data
2.2 From Text
2.3 From Excel
2.4 From SPSS
2.5 From SAS
2.6 From Stata
2.7 From Systat
2.8 Data from R packages
3 R-Basics
3.1 Help
3.2 Data structures
3.2.1 Vectors
3.2.2 Sequences
3.2.3 Factors
3.2.4 Data frames
3.2.5 Tibbles
3.2.6 Matrix
3.2.7 List
3.2.8 Array
3.3 Dates
3.3.1 Date Conversion
3.3.2 Date to Character
3.4 Piping
3.5 Base pipes
3.6 Tidy pipes
4 R-Markdown
4.1 Installation
4.2 Resources
4.3 PowerPoint
I Data Preprocessing
Introduction
5 Data Manipulation
5.1 Tutorial
6 Data Wrangling
6.1 Wrangling Tutorial
6.2 Wrangling Tutorial 2
6.3 Data Manipulations
7 Missing Values
7.1 Deleting NA’s
7.2 Multiple Imputations
7.3 NA’s tutorial
8 Outliers
8.1 Outliers
8.1.1 Detection by plots
8.1.2 Using statistics
8.1.3 Using MAD
8.1.4 Interquartile Range (IQR)
8.1.5 Grubbs’ Test
8.1.6 Tools in R
8.2 Leverage
8.3 Influential
II Data Visualization
Introduction
ggplot2
8.3.1 Syntax
9 Aesthetic Mappings
9.1 Aesthetics
9.2 Coordinate systems
9.3 Color scales
9.4 Figure design
9.5 Right order
10 Visualizing Amounts
11 Visualizing Distributions
11.1 Histograms
11.2 Boxplots
12 Visualizing Proportions
13 Visualizing Trends
III Descriptive Statistics
Introduction
14 Data Tabulation
14.1 Frequency Tables
14.1.1 Tables in R
14.2 Cross-tabulations
14.2.1 Cross-tabs in R
14.3 Kable package
14.4 Tutorial
15 Univariate Analysis
15.1 Measurement Scales
15.2 Central Tendency
15.2.1 Arithmetic mean
15.2.2 Median
15.2.3 Mode
15.2.4 Quantiles
15.3 Dispersion
15.3.1 Range
15.3.2 Interquartile range
15.3.3 Variance
15.3.4 Standard deviation
15.3.5 % Variability
15.4 Chebychev’s rule
15.5 Empirical rule
15.6 Method of moments
15.7 Skewness
15.7.1 Skewness risk
15.8 Kurtosis
15.8.1 Kurtosis risk
15.9 Robust Statistics
15.9.1 Trimmed mean
15.9.2 Winsorized mean
15.9.3 Trimmed sd
15.9.4 MAD
15.9.5 IQR deviation
15.10 Summary reports
15.11 Tutorial
16 Bivariate Analysis
16.1 Spurious correlations
16.2 Bivariate data
16.3 Quantitative pairs
16.3.1 Scatterplots
16.3.2 Linear correlation
16.3.3 Partial correlations
16.3.4 Part correlation
16.4 Mixed scales
16.4.1 Dotplots
16.4.2 Boxplots
16.4.3 Rank correlations
16.4.4 Point-biserial correlation
16.5 Nonlinear correlation
16.5.1 eta
16.6 Correlation matrix
16.7 Qualitative pairs
16.7.1 Contingency table
16.7.2 Chi-square statistic
16.7.3 Mosaic plots
16.7.4 Pie charts
16.7.5 Barplots
16.7.6 Contingency correlations
16.8 Recreating data
IV Regression Analysis
Introduction
17 Simple Regression
17.1 Linear regression
17.2 Sample data
17.2.1 Univariate analysis
17.2.2 Scatterplots
17.2.3 Batter up
17.3 Sum of squared residuals
17.4 The linear model
17.5 Prediction and prediction errors
17.6 Model diagnostics
18 Multiple Regression
18.1 Sample data
18.1.1 Univariate analysis and correlation plots
18.1.2 Scatterplots
18.2 Simple Model
18.3 Model validation
18.4 Model diagnostics
18.4.1 Linearity
18.4.2 Nearly normal residuals
18.4.3 Constant variability
18.4.4 Outliers
18.4.5 Leverage points
18.4.6 Influential observations
18.4.7 Global tests of linear model assumptions
18.5 Nonlinear regression model
18.6 Multiple variables regression model
18.7 Evaluating multi-collinearity
18.8 Best subset regression
18.9 Stepwise regression
18.10 Comparing competing models
18.10.1 Akaike Information Criterion
18.10.2 Bayesian Information Criterion
18.10.3 Adjusted R-Squared
18.11 Cross Validation
18.12 Printing the final regression table
18.12.1 The ‘jtools’ package
18.12.2 The final model
18.12.3 The ‘stargazer’ package
18.13 Tutorial
19 GLM Regression
19.1 Beyond linear models
19.2 Logistic regression
19.2.1 Fitting a logistic regression model with glm()
19.2.2 Log-odds transform
19.2.3 Worked Example
19.2.4 Over-dispersion
19.2.5 Comparing overall models
19.3 Modeling probabilities
19.3.1 Dissecting the logistic model
19.3.2 Predicting
19.4 Case study
19.5 Probit regression
19.6 Summary
V Time Series Analysis
Introduction
20 Time Series
21 Time Series Smoothing
22 Time Series Models
Appendix
A R-Pubs
A.1 Prerequisites
A.2 Instructions
B Google Colab
C R & SQL
Statistics with R
Chapter 22 Time Series Models