Code Check Guide

Code Check Guide Collective

1 Introduction

1.1 What is code check?

1.1.1 Why is it important?

Checks that results are reproducible
Ensures the correct analysis is conducted and follows the stated methodology
Provides a system for catching the mistakes that everyone will inevitably make

1.1.2 Why don’t people do it?

Lack of time
Not seen as a priority
Lack of expertise
Embarrassment about others seeing their code
Lack of incentives: error-mitigating procedures are only weakly related to the chances of publishing
Fear that errors will be noticed, leading to corrections and retractions

1.1.3 What specific problems does this guide address?

Time: helps prioritise review tasks within the time available
Priority: explains clearly how code review makes science better
Expertise: explains what types of code check can be done at different levels of expertise, and signposts resources for developing that expertise
Embarrassment: offers tips to help code writers feel more confident about showing their work to others
Incentives: explains how to give and get credit for code review
Reviewer pool: aims to increase and diversify the pool of potential reviewers

1.2 Who is this guide for?

1.2.1 Expertise

Novice to intermediate experience with research code (you should at least be able to write an analysis script in the language you’re checking)
Mainly end users of research code; no computer science background is assumed
No knowledge of Git or GitHub is assumed
Language-agnostic (examples may be given in R and Python)

1.2.2 Roles

Code reviewers
Code writers preparing their code for others to review
Code writers self-checking with the checklist (when others don’t have the time or expertise)

1.3 What are the goals?

1.3.1 Goals of code checking in general

1.3.1.1 Does it run?

Can a researcher who uses that language run it easily? Are any unusual or complex procedures explained?

1.3.1.2 Is it reproducible?

Do you get the same outputs as those reported? Is it straightforward to check them?
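One way to make outputs straightforward to check is to compare the values the code regenerates against the values printed in the report, allowing for the rounding used in the report. The sketch below is a minimal Python illustration under that assumption; the value names, the numbers and the find_mismatches helper are all hypothetical. A regenerated value is treated as consistent if it lies within half a unit of the last reported decimal place.

```python
# Hypothetical values: as printed in a manuscript, and the number of
# decimal places each one is reported to.
REPORTED = {"mean_rt": 512.3, "sd_rt": 48.7, "n": 120}
DECIMALS = {"mean_rt": 1, "sd_rt": 1, "n": 0}


def find_mismatches(reported, regenerated, decimals):
    """Hypothetical helper: return the names of reported values that the
    regenerated values do not support.

    A regenerated value supports a reported one if it lies within half a
    unit of the last reported decimal place.
    """
    mismatches = []
    for name, reported_value in reported.items():
        tolerance = 0.5 * 10 ** -decimals[name] + 1e-9  # tiny slack for float error
        if abs(regenerated[name] - reported_value) > tolerance:
            mismatches.append(name)
    return mismatches


# Hypothetical values obtained by re-running the analysis script.
regenerated = {"mean_rt": 512.2841, "sd_rt": 48.7912, "n": 120}
print(find_mismatches(REPORTED, regenerated, DECIMALS))  # ['sd_rt']
```

Here the regenerated SD (48.79) would be reported as 48.8, not 48.7, so it is flagged for the code writer to explain.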

1.3.1.3 Is it auditable/understandable?

Even if you don’t have the expertise to assess the stats or data processing, is the code well organised enough that you can work out what is intended, so that mistakes could be detected? Are the outputs detailed enough to allow interrogation?
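Outputs are easier to interrogate when the code reports what happens to the data at each step. The following is a minimal sketch, assuming the data are a list of participant records and using hypothetical exclusion rules; report and process are illustrative names, not part of any library. It records how many rows remain after each step, so a reviewer can compare the counts with those described in the methods.

```python
# Hypothetical raw data: one record per participant.
RAW = [
    {"id": 1, "age": 25, "rt": 480.0},
    {"id": 2, "age": 17, "rt": 530.0},
    {"id": 3, "age": 34, "rt": None},
    {"id": 4, "age": 41, "rt": 610.0},
]


def report(step, rows, log):
    """Record and print how many rows remain after a processing step."""
    log.append((step, len(rows)))
    print(f"{step}: {len(rows)} rows")
    return rows


def process(raw):
    """Apply hypothetical exclusion rules, logging the row count after each."""
    log = []
    rows = report("loaded", raw, log)
    rows = report("excluded under-18s", [r for r in rows if r["age"] >= 18], log)
    rows = report("dropped missing RT", [r for r in rows if r["rt"] is not None], log)
    return rows, log


rows, log = process(RAW)
# loaded: 4 rows
# excluded under-18s: 3 rows
# dropped missing RT: 2 rows
```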

1.3.1.4 Does it follow best practices?

Is there so much repeated code that it would benefit from modularisation? Does the code follow DRY (Don’t Repeat Yourself) and SPOT (Single Point of Truth)? Are the outputs of long processes saved to file and loaded from there? Are there unit tests? Do the variable names make sense? Do the reported results match what is shown in the output, with no rounding up or down?
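Several of these practices can be illustrated together. The sketch below assumes Python and uses only the standard library; load_or_compute and zscore are hypothetical helpers, not an established API. zscore keeps one definition of a calculation instead of repeating the formula wherever it is needed (DRY/SPOT), load_or_compute saves the output of a slow step to file and loads it on later runs, and test_zscore is a minimal unit test.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Hypothetical helper: load a saved result from `path` if it exists;
    otherwise run `compute()`, save the result to `path` and return it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


def zscore(values):
    """Hypothetical helper: the single definition of how a variable is
    standardised (using the sample SD), reused instead of repeating the formula."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def test_zscore():
    """Minimal unit test: [2, 4, 6] has mean 4 and sample SD 2."""
    assert [round(v, 6) for v in zscore([2.0, 4.0, 6.0])] == [-1.0, 0.0, 1.0]


test_zscore()
cache_path = os.path.join(tempfile.mkdtemp(), "standardised.pkl")
first = load_or_compute(cache_path, lambda: zscore([2.0, 4.0, 6.0]))   # computes and saves
second = load_or_compute(cache_path, lambda: zscore([2.0, 4.0, 6.0]))  # loads from file
```

This simple cache does not notice when the input data or the code change, so the saved file has to be deleted to force a re-run. Pickle files should also only be loaded from trusted sources.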

1.3.1.5 Is it correct and appropriate?

Is the code actually doing what is intended? Is what is intended correct? Some logical problems can be caught without domain knowledge, such as intending to filter out male subjects but actually filtering them IN. Many other problems require domain and/or statistical knowledge, so checking for them may only be feasible in some circumstances.
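The filtering example can be made concrete. In this hypothetical Python sketch the stated intent is to exclude male participants, but the first filter keeps only them. check_excluded is an illustrative helper showing the kind of check a reviewer can add without any domain knowledge: after the exclusion, no excluded participants should remain.

```python
# Hypothetical participant records.
PARTICIPANTS = [
    {"id": 1, "sex": "male"},
    {"id": 2, "sex": "female"},
    {"id": 3, "sex": "male"},
    {"id": 4, "sex": "female"},
]

# Intent (e.g. from a comment or the methods section): exclude male participants.
# Bug: this keeps ONLY the male participants.
buggy = [p for p in PARTICIPANTS if p["sex"] == "male"]

# Corrected: keep everyone who is not male.
fixed = [p for p in PARTICIPANTS if p["sex"] != "male"]


def check_excluded(rows, column, excluded_value):
    """Hypothetical helper: return True if no row still has the excluded value."""
    return all(r[column] != excluded_value for r in rows)


print(check_excluded(buggy, "sex", "male"))  # False: the filter kept the wrong group
print(check_excluded(fixed, "sex", "male"))  # True
```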

1.3.2 What are NOT the goals of code checking?

1.3.2.1 Debugging

Do not submit code for checking if it doesn’t run for you

1.3.2.2 Code Help

Do not expect the reviewer to write code for you

1.3.2.3 Stats Consulting

Do not rely on code check to assess the appropriateness of your scientific decisions or statistical analyses

1.3.3 Goals in this guide

The main focus is on the first three goals above (runnable, reproducible, auditable)
Best practices may be covered in an appendix
We won’t focus much on the last goal (correct and appropriate), as it requires statistical or domain knowledge