Machine Learning in R (Part I)

Presented By Jared Lander
Chief Data Scientist at Lander Analytics

Jared Lander is the Founder and CEO of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup, and an Adjunct Professor of Statistics at Columbia University. With a master's from Columbia University in statistics and a bachelor's from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.

Presentation Description

Modern statistics has become almost synonymous with machine learning, a collection of techniques that take advantage of today's computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R and examines some of the underlying theory. We start with the foundation of it all, the linear model, and its generalization, the generalized linear model (GLM). We look at how to assess model quality with traditional measures and cross-validation and how to visualize models with coefficient plots. Next, we turn to penalized regression with the Elastic Net, and then to boosted decision trees with xgboost. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages.

Linear Models
Learn about the best fit line
Understand the formula interface in R
Understand the design matrix
Fit Models with `lm`
Visualize the coefficients with `coefplot`
Make predictions on new data
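
As a rough preview of these topics, the sketch below fits a linear model through the formula interface, inspects the design matrix, and predicts on new rows. The `mtcars` data, the chosen predictors, and the `new_cars` rows are assumptions made for illustration and are not taken from the course materials.

```r
# Illustrative data only: mtcars ships with R and is not the course dataset.
cars <- transform(mtcars, cyl = factor(cyl))

# Formula interface: response on the left, predictors on the right.
mpg_formula <- mpg ~ wt + hp + cyl

# The design matrix built from the formula:
# an intercept column, the numeric predictors, and dummy columns for cyl.
X <- model.matrix(mpg_formula, data = cars)
head(X)

# Fit the linear model by least squares.
mod <- lm(mpg_formula, data = cars)
summary(mod)

# Predictions for new rows (hypothetical values);
# factor levels must match those in the training data.
new_cars <- data.frame(
  wt  = c(2.5, 3.5),
  hp  = c(100, 150),
  cyl = factor(c("4", "6"), levels = levels(cars$cyl))
)
preds <- predict(mod, newdata = new_cars)
preds

# Coefficient plot, drawn only if the coefplot package is installed.
if (requireNamespace("coefplot", quietly = TRUE)) {
  print(coefplot::coefplot(mod))
}
```

The fitted values equal the design matrix multiplied by the coefficient vector, which is why understanding `model.matrix` helps when reading `lm` output.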

Generalized Linear Models
Learn about Logistic Regression for classification
Learn about Poisson Regression for count data
Fit models with `glm`
Visualize the coefficients with `coefplot`
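
The two model families listed above might be fitted along these lines. This is a minimal sketch: the built-in `mtcars` and `warpbreaks` datasets and the choice of predictors are assumptions for illustration, not the course's own examples.

```r
# Logistic regression for a binary outcome
# (am in mtcars: 0 = automatic, 1 = manual transmission).
logit_mod <- glm(am ~ wt + hp, data = mtcars,
                 family = binomial(link = "logit"))

# type = "response" returns fitted probabilities rather than log-odds.
probs <- predict(logit_mod, type = "response")
head(probs)

# Poisson regression for a count outcome
# (breaks in warpbreaks: number of warp breaks per loom).
pois_mod <- glm(breaks ~ wool + tension, data = warpbreaks,
                family = poisson(link = "log"))

# Fitted expected counts on the original scale.
rates <- predict(pois_mod, type = "response")

# Exponentiated coefficients act multiplicatively on the expected count.
exp(coef(pois_mod))

# Coefficient plot, drawn only if the coefplot package is installed.
if (requireNamespace("coefplot", quietly = TRUE)) {
  print(coefplot::coefplot(logit_mod))
}
```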

Model Assessment
Compare models
Learn the reasoning and process behind cross-validation
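
One way to compare models with both a traditional measure and cross-validation is sketched below, using `cv.glm` from the `boot` package. The two candidate models, the `mtcars` data, the seed, and the choice of five folds are all assumptions made for illustration.

```r
library(boot)

set.seed(1234)  # fold assignment is random, so fix the seed

# Two candidate models, fitted with glm so that cv.glm can refit them.
small <- glm(mpg ~ wt, data = mtcars, family = gaussian)
large <- glm(mpg ~ wt + hp + qsec, data = mtcars, family = gaussian)

# Traditional measure: AIC (lower is better).
AIC(small, large)

# 5-fold cross-validation: each fold is held out once while the model
# is refit on the remaining folds and scored on the held-out rows.
cv_small <- cv.glm(mtcars, small, K = 5)
cv_large <- cv.glm(mtcars, large, K = 5)

# delta[1] is the raw cross-validated estimate of prediction error
# (mean squared error under the default cost function).
cv_error <- c(small = cv_small$delta[1], large = cv_large$delta[1])
cv_error
```

AIC scores the fit on the training data with a penalty for complexity, while cross-validation scores predictions on rows the model did not see, so the two measures need not agree.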

Elastic Net
Learn about penalized regression with the Lasso and Ridge
Fit models with `glmnet`
Understand the coefficient path
View coefficients with `coefplot`
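
A penalized fit with `glmnet` could look roughly like the following. The `mtcars` data, the mixing value `alpha = 0.5`, the seed, and the five folds are assumptions for illustration; here coefficients are read off with `coef` rather than plotted.

```r
library(glmnet)

set.seed(1234)  # cross-validation folds are random

# glmnet takes a numeric predictor matrix, not a formula.
# Build it with model.matrix and drop the intercept column.
x <- model.matrix(mpg ~ ., data = mtcars)[, -1]
y <- mtcars$mpg

# alpha mixes the penalties: 1 is the Lasso, 0 is Ridge,
# and values in between give the Elastic Net.
enet <- glmnet(x, y, family = "gaussian", alpha = 0.5)

# Coefficient path: each curve is one coefficient as lambda varies.
plot(enet, xvar = "lambda", label = TRUE)

# Choose lambda by cross-validation.
cv_enet <- cv.glmnet(x, y, family = "gaussian", alpha = 0.5, nfolds = 5)

# Coefficients at the lambda with the lowest cross-validated error;
# entries shrunk exactly to zero are dropped from the model.
coefs <- coef(cv_enet, s = "lambda.min")
coefs
```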

Boosted Decision Trees
Learn how to make classifications (and regression) using recursive partitioning
Fit models with `xgboost`
Make compelling visualizations with `DiagrammeR`
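
Below is a small sketch of a boosted tree classifier with `xgboost`. The `mtcars` data, the predictors, and every parameter value are assumptions for illustration. The `xgboost` R interface has changed between releases, so treat the exact arguments as something to check against the installed version.

```r
library(xgboost)

set.seed(1234)

# xgboost works on numeric matrices; the label is a 0/1 outcome.
x <- as.matrix(mtcars[, c("wt", "hp", "qsec", "disp")])
y <- mtcars$am
dtrain <- xgb.DMatrix(data = x, label = y)

# Each boosting round adds a shallow tree that splits the data
# recursively to correct the errors of the trees before it.
bst <- xgb.train(
  params = list(
    objective = "binary:logistic",  # "reg:squarederror" for regression
    max_depth = 2,                  # depth of each tree
    eta = 0.3                       # learning rate
  ),
  data = dtrain,
  nrounds = 20,
  verbose = 0
)

# Predicted probabilities, then classes at a 0.5 cutoff.
probs <- predict(bst, dtrain)
classes <- as.integer(probs > 0.5)
table(predicted = classes, actual = y)

# Tree diagram, rendered through DiagrammeR when it is installed
# and the session is interactive.
if (interactive() && requireNamespace("DiagrammeR", quietly = TRUE)) {
  print(xgb.plot.tree(model = bst))
}
```

The table here is computed on the training rows, so it overstates accuracy; a held-out set or cross-validation gives a fairer picture.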

Presentation Curriculum

Machine Learning in R Part I