Yuriy is a highly versatile software engineer and data scientist with over 10 years of industry experience, as well as a successful track record in algorithmic programming contests and Kaggle competitions (currently top 2% worldwide). During his career, Yuriy has held a variety of R&D, data analysis, engineering, and leadership roles. He believes in T-shaped skills: his interests span technology in general, while his deepest knowledge is in data science and software architecture.
Target leakage is one of the most difficult problems in developing real-world models. It occurs when training data is contaminated with information that will not be available at prediction time. Data collection, feature engineering, partitioning, and model validation are all potential sources of data leakage. This talk offers real-life examples of data leakage at different stages of data science projects, discusses countermeasures, and lays out best practices for model validation.
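
As a minimal illustration of leakage at the feature engineering stage (a sketch that is not taken from the talk itself), consider mean-target encoding of a categorical feature. The toy data and the helper name `target_mean_by_category` are hypothetical and chosen only for this example. Computing the encoding over all rows lets holdout targets influence the training features, while computing it from the training partition alone does not.

```python
from collections import defaultdict


def target_mean_by_category(rows):
    """Return {category: mean target}, computed from the given rows only."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in rows:
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


# Hypothetical toy data: (category, target) pairs.
train_rows = [("a", 1.0), ("a", 0.0), ("b", 1.0)]
holdout_rows = [("a", 1.0), ("b", 0.0)]

# Leaky: the encoding is computed before partitioning, so it contains
# holdout targets that would not be known at prediction time.
leaky_encoding = target_mean_by_category(train_rows + holdout_rows)

# Safer: the encoding is computed from the training partition only
# and then reused unchanged when scoring holdout rows.
safe_encoding = target_mean_by_category(train_rows)

print("leaky:", leaky_encoding)
print("safe: ", safe_encoding)
```

The two encodings differ for both categories, which shows how a validation score computed on features built from the leaky encoding could look better than the model would actually perform on unseen data.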