Sean is the field data science lead at Databricks. He is an Apache Spark committer and PMC member, and co-author of Advanced Analytics with Spark. He started the Oryx project from his startup, Myrrix. Previously, he was director of Data Science at Cloudera and an engineer at Google.
We hear about "model bias," but really models are just mirrors of the data they were trained on. Can we use them to detect instances of bias in data, and not just make predictions? This talk will examine the results of the 2019 Stack Overflow Developer Survey, and apply Apache Spark and SHAP (SHapley Additive exPlanations) to study whether attributes like gender have outsized effects on developer salaries in certain instances.
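To give a feel for what a per-instance attribution looks like, here is a minimal sketch that computes exact Shapley values for a two-feature toy model in plain Python. It is not the code from the talk and does not use the SHAP library or Spark. The `toy_salary` function, the `group` and `years` features, and every number in it are invented for illustration and are not drawn from the survey.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition take their value from `baseline`.
    Cost grows exponentially with the number of features, so this is
    only practical for tiny examples.
    """
    features = list(instance)
    n = len(features)
    values = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                without = {g: (instance[g] if g in subset else baseline[g])
                           for g in features}
                with_f = dict(without, **{f: instance[f]})
                total += weight * (predict(with_f) - predict(without))
        values[f] = total
    return values


def toy_salary(x):
    # Hypothetical model: the penalty applies only to group "B" with
    # more than 10 years of experience. All numbers are made up.
    penalty = 8_000 if x["group"] == "B" and x["years"] > 10 else 0
    return 50_000 + 2_000 * x["years"] - penalty


baseline = {"years": 5, "group": "A"}
senior = {"years": 15, "group": "B"}
junior = {"years": 3, "group": "B"}

senior_attr = shapley_values(toy_salary, senior, baseline)
junior_attr = shapley_values(toy_salary, junior, baseline)
print(senior_attr)
print(junior_attr)
```

In this toy setup the `group` attribute receives a nonzero attribution for the senior instance and none for the junior one, which is the kind of instance-level difference a global feature-importance score would hide. For each instance, the attributions sum to the difference between its prediction and the baseline prediction.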