ODSC West 2019 Warm-Up: Machine Learning

Presented By ODSC West

Vinod Bakthavachalam

Vinod Bakthavachalam is a data scientist at Coursera working with the Content Strategy and Enterprise teams, where his work has recently focused on understanding the skills landscape around the world using Coursera data (see the Global Skills Index that Coursera recently published for some of his work). Prior to Coursera, he majored in Economics, Statistics, and Molecular and Cell Biology at UC Berkeley, and worked in quantitative finance.


Scott J Haines

Scott Haines is a distributed systems engineer focused on real-time, highly available, trustworthy analytics systems. He is a Principal Software Engineer on the Voice Insights team at Twilio, where he helped drive Spark adoption, streaming pipeline architecture best practices, and a massive stream processing platform. Prior to Twilio, he wrote the backend Java APIs for Yahoo Games, as well as the real-time game ranking/ratings engine (built on Storm) to provide personalized recommendations and page views for 10 million customers. He finished his tenure at Yahoo working for Flurry Analytics, where he wrote the alerts/notifications system for mobile.


Jane Adams

Jane Adams is an emergent media artist, working at the intersection of visual expression and scientific inquiry. As the Data Visualization Artist in Residence at the University of Vermont Complex Systems Center, Jane builds engaging, interactive, web-based visualizations of high-dimensional data for exploratory analysis. Her visualization research topics include social network lexical analysis, healthcare morbidity and mortality modeling, and geospatial temporal dynamics, all through a lens of complexity science. In her spare time, Jane experiments with music-color synesthesia, machine learning for computational creativity, self-sustaining aquaponic sculpture, and citizen science. She is the lead community organizer of Vermont Women in Machine Learning and Data Science (WiMLDS), and holds an MFA in Emergent Media. Stay in touch on Twitter @artistjaneadams.


Andrew Long, PhD

Andrew Long is a Senior Data Scientist at Fresenius Medical Care North America (FMCNA). Andrew holds a PhD in biomedical engineering from Johns Hopkins University and a Master’s degree in mechanical engineering from Northwestern University. Andrew joined FMCNA in 2017 after participating in the Insight Health Data Fellows Program. At FMCNA, he is responsible for building, piloting, and deploying predictive models using machine learning to improve the quality of life of every patient who receives dialysis from FMCNA. He currently has multiple models in production to predict which patients are at the highest risk of negative outcomes.

Presentation Description

Causal Inference & Machine Learning

Speaker: Vinod Bakthavachalam, Data Scientist at Coursera

Many data science problems, especially those that inform business and product strategy, involve understanding causal relationships. The standard way to measure these is through A/B testing, but that is often infeasible, which calls for alternative techniques from causal inference that are an essential component of any data scientist's toolkit. The talk will walk through these techniques, some applications, and recent work at the intersection of causal inference and machine learning to handle large data sets.


Real-ish Time Predictive Analytics with Spark Structured Streaming

Speaker: Scott J Haines, Principal Software Engineer at Twilio

In 20 short minutes, learn what becomes possible when you add Spark to your analytics pipeline. Learn how to effectively solve common data engineering problems with compile-time guarantees, such as how to ingest, normalize, transform, and join datasets in real time. Learn how to add insights on top of your streaming data with simple filters and pre-trained models.


Visualizing Complexity: Dimensionality Reduction and Network Science 

Speaker: Jane Adams, Data Visualization Artist at University of Vermont Complex Systems Center

Working with mathematicians, data scientists, and domain experts at the University of Vermont Complex Systems Center, data visualization artist Jane Adams has developed strategies for prototyping exploratory graphs of high-dimensional data. In this 90-minute workshop, Adams shares some of these methods for data discovery and interaction, navigating a creative workflow from paper prototypes of visual hypotheses through web-based interactive slices, offering critical insight for clustering, interpolation, and feature engineering.


Healthcare NLP with a doctor's bag of notes 

Speaker: Andrew Long, PhD, Data Scientist at Fresenius Medical Care

Nausea, vomiting, and diarrhea are words you would not frequently find in a natural language processing (NLP) project for tweets or product reviews. However, these words are common in healthcare. In fact, many clinical signs and patient symptoms (e.g. shortness of breath, fever, or chest pain) are only present in free-text notes and are not captured with structured numerical data. As a result, it is important for healthcare data scientists to be able to extract insight from unstructured clinical notes in electronic medical records. In this 20-minute warm-up, the audience will have the opportunity to learn about an NLP concept known as bag-of-words. The audience will also get a preview of the outline for the 90-minute workshop held at the upcoming ODSC West 2019.
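
For readers unfamiliar with the term, bag-of-words can be illustrated with a minimal sketch. It is not taken from the talk: the two notes are made-up examples, and `tokenize` and `bag_of_words` are hypothetical helper names. The idea is that each note becomes a vector of word counts over a shared vocabulary, and word order is ignored.

```python
import re
from collections import Counter


def tokenize(note):
    """Lowercase a note and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", note.lower())


def bag_of_words(notes):
    """Return (vocabulary, count vectors) for a list of free-text notes.

    The vocabulary is the sorted set of all tokens; each vector holds
    the count of every vocabulary word in the corresponding note.
    """
    vocab = sorted({tok for note in notes for tok in tokenize(note)})
    vectors = []
    for note in notes:
        counts = Counter(tokenize(note))
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


# Made-up clinical-style notes, for illustration only.
notes = [
    "Patient reports nausea and vomiting.",
    "No fever. Patient denies chest pain, reports nausea.",
]
vocab, vectors = bag_of_words(notes)
for word, *counts in zip(vocab, *vectors):
    print(f"{word:10s} {counts}")
```

Note that a plain count vector treats "denies chest pain" the same as "chest pain", so this representation does not capture negation. In practice a library such as scikit-learn's `CountVectorizer` would typically replace the hand-rolled helpers.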

Presentation Curriculum

Causal Inference & Machine Learning
06:49
Real-ish Time Predictive Analytics with Spark Structured Streaming
18:57
Data Art: Seeing the Future of Exploratory Analysis
14:38
Healthcare NLP with a doctor's bag of notes
17:24