• Preface
    • Why Statistics with R?
    • Philosophy
    • What is in this handbook?
    • Resources
    • About me
  • Four basic ingredients
  • 1 Setting up R
    • 1.1 R
    • 1.2 RStudio
      • 1.2.1 The RStudio IDE
      • 1.2.2 Install packages
      • 1.2.3 RStudio Projects
    • 1.3 Git & GitHub
    • 1.4 Resources
  • 2 Data Import
    • 2.1 Entering data
    • 2.2 From Text
    • 2.3 From Excel
    • 2.4 From SPSS
    • 2.5 From SAS
    • 2.6 From Stata
    • 2.7 From Systat
    • 2.8 Data from R packages
  • 3 R-Basics
    • 3.1 Help
    • 3.2 Data structures
      • 3.2.1 Vectors
      • 3.2.2 Sequences
      • 3.2.3 Factors
      • 3.2.4 Data frames
      • 3.2.5 Tibbles
      • 3.2.6 Matrix
      • 3.2.7 List
      • 3.2.8 Array
    • 3.3 Dates
      • 3.3.1 Date Conversion
      • 3.3.2 Date to Character
    • 3.4 Piping
    • 3.5 Base pipes
    • 3.6 Tidy pipes
  • 4 R-Markdown
    • 4.1 Installation
    • 4.2 Resources
    • 4.3 Introduction
    • 4.4 R Markdown in RStudio
    • 4.5 Getting started
    • 4.6 Slideshows
    • 4.7 PowerPoint reports
  • I Data Preprocessing
  • Introduction
  • 5 Data Manipulation
    • 5.1 Tutorial
  • 6 Data Wrangling
    • 6.1 Wrangling Tutorial
    • 6.2 Wrangling Tutorial 2
    • 6.3 Data Manipulations
  • 7 Missing Values
    • 7.1 Deleting NAs
    • 7.2 Multiple Imputations
    • 7.3 NAs tutorial
  • 8 Outliers
    • 8.1 Outliers
      • 8.1.1 Detection by plots
      • 8.1.2 Using statistics
      • 8.1.3 Using MAD
      • 8.1.4 Interquartile Range (IQR)
      • 8.1.5 Grubbs’ Test
      • 8.1.6 Tools in R
    • 8.2 Leverage
    • 8.3 Influential
  • II Data Visualization
  • Introduction
    • ggplot2
      • 8.3.1 Syntax
  • 9 Aesthetic Mappings
    • 9.1 Aesthetics
    • 9.2 Coordinate systems
    • 9.3 Color scales
    • 9.4 Figure design
    • 9.5 Right order
  • 10 Visualizing Amounts
  • 11 Visualizing Distributions
    • 11.1 Histograms
    • 11.2 Boxplots
  • 12 Visualizing Proportions
  • 13 Visualizing Trends
  • III Descriptive Statistics
  • Introduction
  • 14 Data Tabulation
    • 14.1 Frequency Tables
      • 14.1.1 Tables in R
    • 14.2 Cross-tabulations
      • 14.2.1 Cross-tabs in R
    • 14.3 Kable package
    • 14.4 Tutorial
  • 15 Univariate Analysis
    • 15.1 Measurement Scales
    • 15.2 Central Tendency
      • 15.2.1 Arithmetic mean
      • 15.2.2 Median
      • 15.2.3 Mode
      • 15.2.4 Quantiles
    • 15.3 Dispersion
      • 15.3.1 Range
      • 15.3.2 Interquartile range
      • 15.3.3 Variance
      • 15.3.4 Standard deviation
      • 15.3.5 % Variability
    • 15.4 Chebyshev’s rule
    • 15.5 Empirical rule
    • 15.6 Method of moments
    • 15.7 Skewness
      • 15.7.1 Skewness risk
    • 15.8 Kurtosis
      • 15.8.1 Kurtosis risk
    • 15.9 Robust Statistics
      • 15.9.1 Trimmed mean
      • 15.9.2 Winsorized mean
      • 15.9.3 Trimmed sd
      • 15.9.4 MAD
      • 15.9.5 IQR deviation
    • 15.10 Summary reports
    • 15.11 Tutorial
  • 16 Bivariate Analysis
    • 16.1 Spurious correlations
    • 16.2 Bivariate data
    • 16.3 Quantitative pairs
      • 16.3.1 Scatterplots
      • 16.3.2 Linear correlation
      • 16.3.3 Partial correlations
      • 16.3.4 Part correlation
    • 16.4 Mixed scales
      • 16.4.1 Dotplots
      • 16.4.2 Boxplots
      • 16.4.3 Rank correlations
      • 16.4.4 Point-biserial correlation
    • 16.5 Nonlinear correlation
      • 16.5.1 eta
    • 16.6 Correlation matrix
    • 16.7 Qualitative pairs
      • 16.7.1 Contingency table
      • 16.7.2 Chi-square statistic
      • 16.7.3 Mosaic plots
      • 16.7.4 Pie charts
      • 16.7.5 Barplots
      • 16.7.6 Contingency correlations
    • 16.8 Recreating data
  • IV Regression Analysis
  • Introduction
  • 17 Simple Regression
    • 17.1 OLS approach
    • 17.2 Linear regression
    • 17.3 Sample data
      • 17.3.1 Univariate analysis
      • 17.3.2 Scatterplots
      • 17.3.3 Batter up
    • 17.4 Sum of squared residuals
    • 17.5 The linear model
    • 17.6 Prediction and prediction errors
    • 17.7 Model diagnostics
  • 18 Multiple Regression
    • 18.1 Sample data
      • 18.1.1 Univariate analysis and correlation plots
      • 18.1.2 Scatterplots
    • 18.2 Simple Model
    • 18.3 Model validation
    • 18.4 Model diagnostics
      • 18.4.1 Linearity
      • 18.4.2 Nearly normal residuals
      • 18.4.3 Constant variability
      • 18.4.4 Outliers
      • 18.4.5 Leverage points
      • 18.4.6 Influential observations
      • 18.4.7 Global tests of linear model assumptions
    • 18.5 Nonlinear regression model
    • 18.6 Multiple variables regression model
    • 18.7 Evaluating multi-collinearity
    • 18.8 Best subset regression
    • 18.9 Stepwise regression
    • 18.10 Comparing competing models
      • 18.10.1 Akaike Information Criterion
      • 18.10.2 Bayesian Information Criterion
      • 18.10.3 Adjusted R-Squared
    • 18.11 Cross Validation
    • 18.12 Printing the final regression table
      • 18.12.1 The ‘jtools’ package
      • 18.12.2 The final model
      • 18.12.3 The ‘stargazer’ package
      • 18.12.4 The “modelsummary” package
    • 18.13 TUTORIAL
  • 19 GLM Regression
    • 19.1 Maximum Likelihood
    • 19.2 Beyond linear models
    • 19.3 Logistic regression
      • 19.3.1 Fitting a logistic regression model with glm()
      • 19.3.2 Log-odds transform
      • 19.3.3 Worked Example
      • 19.3.4 Over-dispersion
      • 19.3.5 Comparing overall models
    • 19.4 Modeling probabilities
      • 19.4.1 Dissecting the logistic model
      • 19.4.2 Predicting
    • 19.5 Case study
    • 19.6 Probit regression
    • 19.7 Summary
  • V Time Series Analysis
  • Introduction
    • Other Representations
    • Date Versus Datetime
    • See Also
  • 20 Time Series
    • 20.1 Univariate Time Series Analysis
    • 20.2 Time series data
    • 20.3 Smoothing a Time Series
      • Problem
      • Solution
      • Discussion
      • See Also
    • 20.4 TS plots
    • 20.5 Time series components
      • 20.5.1 TS patterns
    • 20.6 Moving averages
      • 20.6.1 MA’s of MA’s
      • 20.6.2 Trend-cycle with seasonal data
    • 20.7 Decomposing Non-Seasonal Data
    • 20.8 Classical decomposition
      • 20.8.1 Classical additive decomposition
    • 20.9 X11 decomposition
    • 20.10 STL decomposition
    • 20.11 Seasonal Adjustments
      • 20.11.1 Extensions: X-12 and X-13
    • 20.12 Autocorrelation
      • 20.12.1 Trend and seasonality in ACF plots
      • 20.12.2 Monthly electricity production
      • 20.12.3 Monthly electricity production
      • 20.12.4 Monthly electricity production
      • 20.12.5 White noise
    • 20.13 Partial autocorrelation
    • 20.14 Stationarity
      • 20.14.1 Stationary?
      • 20.14.2 Stationary?
      • 20.14.3 Stationary?
    • 20.15 Transformations
      • 20.15.1 Calendar adjustments
      • 20.15.2 Population adjustments
      • 20.15.3 Inflation adjustments
      • 20.15.4 Mathematical transformations
    • 20.16 Differencing
      • 20.16.1 Second-order differencing
      • 20.16.2 Seasonal differencing
      • 20.16.3 Electricity production
      • 20.16.4 Electricity production
      • 20.16.5 Electricity production
      • 20.16.6 Electricity production
      • 20.16.7 Electricity production
      • 20.16.8 Seasonal differencing
      • 20.16.9 Interpretation of differencing
    • 20.17 Unit root tests
      • 20.17.1 KPSS test
      • 20.17.2 Automatically selecting differences
    • 20.18 Missing values in TSA
      • 20.18.1 Introduction
      • 20.18.2 Types of time series data
      • 20.18.3 Time series imputation
      • 20.18.4 Time-Series specific method
      • 20.18.5 The Combination of Seasonal Adjustment and other methods
      • 20.18.6 Video tutorial
      • 20.18.7 Cheat-sheet
    • 20.19 Identifying outliers
      • 20.19.1 Anomalies detection
    • 20.20 TUTORIAL
  • 21 Time Series Models
    • 21.1 Univariate Time Series Modeling
    • 21.2 Time series CHEAT-SHEET
      • 21.2.1 Data Preparation
      • 21.2.2 Exploring and Plotting ts Data
      • 21.2.3 Seasonality
      • 21.2.4 Lags and ACF, PACF
      • 21.2.5 White Noise and the Ljung-Box Test
      • 21.2.6 Model Selection
      • 21.2.7 Naive Models
      • 21.2.8 Residuals
      • 21.2.9 Evaluating Model Accuracy
      • 21.2.10 Many Models
    • 21.3 Naive approach
      • 21.3.1 Seasonal naive method
      • 21.3.2 Drift method
      • 21.3.3 Examples
    • 21.4 Linear models
      • 21.4.1 Multiple regression
      • 21.4.2 Some useful predictors for linear models
      • 21.4.3 Trend
      • 21.4.4 Beer production - example
    • 21.5 ETS models
      • 21.5.1 Historical perspective
      • 21.5.2 Simple method
      • 21.5.3 Optimisation
      • 21.5.4 ETS models with trend
      • 21.5.5 Holt and Winters model
      • 21.5.6 Holt-Winters additive model
      • 21.5.7 Holt-Winters multiplicative method
    • 21.6 Autoregressive models
      • 21.6.1 AR(1) model
      • 21.6.2 AR(2) model
      • 21.6.3 Stationarity conditions
    • 21.7 Moving Average (MA) models
      • 21.7.1 MA(1) model
      • 21.7.2 MA(2) model
    • 21.8 ARIMA models
      • 21.8.1 Exercise
    • 21.9 Seasonal ARIMA models
      • 21.9.1 Common ARIMA models
      • 21.9.2 Exercise
    • 21.10 TUTORIAL
  • Appendix
  • A R-Pubs
    • A.1 Prerequisites
    • A.2 Instructions
  • B Google Colab
  • C R & SQL
    • C.1 Preview a .sql file
    • C.2 SQL chunks in RMarkdown
    • C.3 Passing vars to/from SQL chunks
    • C.4 Query parameter
    • C.5 Multiple parameters
    • C.6 SQL FILES & CHUNKS

Statistics with R