Reproducible research: Stripe’s approach to data science

When people talk about their data infrastructure, they tend to focus on the technologies: Hadoop, Scalding, Impala, and the like. However, we’ve found that just as important as the technologies themselves are the principles that guide their use. We’d like to share our experience with one such principle that we’ve found particularly useful: reproducibility. We’ll talk about our motivation for focusing on reproducibility, how we’re using Jupyter Notebooks as our core tool, and the workflow we’ve developed around Jupyter to operationalize our approach. Jupyter notebooks are a fantastic way to create reproducible data science research. Motivation Data tools are most often used to generate some kind of exploratory analysis report. At Stripe, an example is an investigation of the probability that a card gets declined, given the time since its…


Link to Full Article: Reproducible research: Stripe’s approach to data science

Pin It on Pinterest

Share This