Biased Training Data Set?

Hi all, one very simple and basic question. I checked the forum but could not find any information about this. The competition description says that 1502 out of 2224 people died, which gives a general survival chance of 32.5%. In the training data set, 549 out of 891 people died, i.e. a survival chance of 38.4%.

My first (test) approach was to submit the test data set with a randomly generated survival variable (32.5% survivors), which led to a score of 0.47847. Going by chance, I would have expected a score of about 0.52-0.56. This matches various statements in the forum about scores being lower than expected. Performing a simple t test between the death/survival ratio overall and in the training data set…
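
The arithmetic above can be checked with a short script. This is a minimal sketch, not the poster's own computation, and the helper names (`survival_rate`, `expected_random_accuracy`, `one_sample_proportion_z`) are hypothetical. It assumes that each test passenger is predicted to survive independently with probability p, so with a true survival rate q the expected accuracy is p·q + (1 − p)(1 − q). The post breaks off before giving its test result, so in place of a t test the sketch uses a one-sample z test for a proportion (normal approximation). It compares the training survival rate against the overall 722/2224 rate and treats the 891 training rows as a random sample, without any finite-population correction.

```python
import math

# Counts quoted in the post
TOTAL_PEOPLE = 2224
TOTAL_DIED = 1502
TRAIN_PEOPLE = 891
TRAIN_DIED = 549


def survival_rate(died: int, total: int) -> float:
    """Fraction of people who survived."""
    return (total - died) / total


def expected_random_accuracy(p_guess: float, q_true: float) -> float:
    """Expected accuracy when each passenger is independently predicted to
    survive with probability p_guess and the true survival rate is q_true."""
    return p_guess * q_true + (1 - p_guess) * (1 - q_true)


def one_sample_proportion_z(successes: int, n: int, p0: float) -> tuple[float, float]:
    """z statistic and two-sided p-value for H0: true proportion == p0.
    Uses the normal approximation; assumes independent draws (no finite-population correction)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    p_all = survival_rate(TOTAL_DIED, TOTAL_PEOPLE)
    p_train = survival_rate(TRAIN_DIED, TRAIN_PEOPLE)
    print(f"overall survival:  {p_all:.1%}")
    print(f"training survival: {p_train:.1%}")
    # Random guessing at the overall rate, if the test set matched the overall / training rate
    print(f"expected accuracy (q = overall):  {expected_random_accuracy(p_all, p_all):.3f}")
    print(f"expected accuracy (q = training): {expected_random_accuracy(p_all, p_train):.3f}")
    z, p = one_sample_proportion_z(TRAIN_PEOPLE - TRAIN_DIED, TRAIN_PEOPLE, p_all)
    print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

Under these assumptions the expected accuracy of random guessing lands at roughly 0.54 to 0.56, which is consistent with the 0.52-0.56 range expected in the post. Under the independence assumption, a score of 0.47847 therefore sits below what chance alone would suggest.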
