Reproducibility is important in all research, but especially so in environmental health studies that can play a role in determining EPA standards and other policy changes. This talk will discuss some of the challenges of working with large health data sets and of creating reproducible data pipelines, and will review the structure and thinking behind one pipeline that prepares public data for use with our other data sources.