Test set and Titanic history

Dear Kaggle community, this is my first post for my first competition, so please forgive me if this is a naive question. Since the Titanic deaths are a matter of historical record, one can find all the details about the passengers online, including the target variable of interest, survived (see e.g. http://www.encyclopedia-titanica.org/titanic-victims/). If one makes a submission consisting only of 0’s and 1’s for the test set passengers, how can the competition organizer be sure that it was the output of a machine learning model and not just a quick hack with internet fact-checking? Is the test set, or the hidden test set, somehow randomizing the passenger details? Or is it a synthetic, toy data set? Of course, I love a fair data mining challenge and would myself try to be honest,…


