Statistics with R
Chapter 10 Visualizing Amounts
This tutorial shows how to visualize amounts using bar charts.
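As a first orientation, here is a minimal sketch of a bar chart of amounts. It assumes the ggplot2 package is installed. The `sales` data frame and its values are made up purely for illustration and are not taken from this handbook.

```r
# Minimal sketch: a bar chart of amounts with ggplot2.
# The data are hypothetical, illustrative values only.
library(ggplot2)

sales <- data.frame(
  region = c("North", "South", "East", "West"),
  amount = c(120, 95, 143, 78)
)

# geom_col() takes bar heights directly from the values in the data.
# reorder() sorts the bars by amount so they are easier to compare.
p <- ggplot(sales, aes(x = reorder(region, amount), y = amount)) +
  geom_col() +
  labs(x = "Region", y = "Amount")

print(p)
```

`geom_col()` is used here because the amounts are already computed. When the bar heights should instead be counts of rows per category, `geom_bar()` is the usual choice.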