Reproducibility and Artifact Evaluation

The research presented at SIGSIM PADS typically encompasses artifacts beyond the document itself: software, mechanized proofs, models, test suites, benchmarks, and so on. In addition, papers present computational results that rely on executing these artifacts, underlining the development of modeling and simulation methods as an experimental science. Authors should be rewarded for creating systems that others in the community can build on and for providing documentation that enables the reproduction of the presented computational results.

Therefore, SIGSIM PADS has joined the group of ACM conferences that have established an optional reviewing process to evaluate artifacts and reproduce computational results (referred to as AERCR in the following). When submitting a contribution, the authors can choose whether to participate in AERCR. The outcome of AERCR does not affect the standard paper evaluation process. The reproducibility process only starts after the paper has been accepted.

Goals

The goals of AERCR are manifold:

  • to increase the trust in modeling and simulation research and enhance the credibility of the presented results;
  • to reward authors who put effort into documentation that facilitates the reuse of computational results and artifacts;
  • to make artifacts accessible and to prepare them for reuse;
  • to foster progress in modeling and simulation science and practice;
  • to enhance the visibility and impact of modeling and simulation research.

Criteria

When submitting the paper, the authors can choose the criteria against which the artifacts and computational results associated with their paper are evaluated, i.e., whether

  • the artifacts are permanently available for retrieval;
  • the artifacts are functional or reusable;
  • the computational results associated with the paper can be reproduced.

Badges

Depending on the criteria chosen and the evaluation process results (see below), one or more of the ACM badges (see ACM Artifact Review and Badging) may be stamped on the final paper published in the SIGSIM PADS proceedings or TOMACS special issue.

Artifacts Evaluated - Functional

The artifacts associated with the research are found to be documented, consistent, complete, and exercisable and include appropriate evidence of verification and validation.

  • Documented: At a minimum, an inventory of artifacts is included, and sufficient description is provided to enable the artifacts to be exercised.
  • Consistent: The artifacts are relevant to the associated paper and contribute in some inherent way to generating its main results.
  • Complete: All components relevant to the paper in question are included to the extent possible. (Proprietary artifacts need not be included. If they are required to exercise the package, this should be documented, along with instructions on obtaining them. Proxies for proprietary data should be included to demonstrate the analysis.)
  • Exercisable: Scripts and/or software used to generate the results in the associated paper can be successfully executed, and included data can be accessed and appropriately manipulated.

Artifacts Evaluated - Reusable

The artifacts associated with the paper are of a quality that significantly exceeds minimal functionality. That is, they have all the qualities of the Artifacts Evaluated – Functional level, but, in addition, they are very carefully documented and well-structured to the extent that reuse and repurposing are facilitated.

Artifacts Available

The authors have created artifacts relevant to this paper that have been placed in a publicly accessible archival repository. A DOI or link to this repository, along with a unique identifier for the object, is provided. Note that this badge will only be awarded in conjunction with one of the Artifacts Evaluated badges. Finally, the artifact must have an associated license, ideally one that allows use for comparison purposes.

Results Reproduced

This badge is applied to papers whose main results have been successfully obtained by a person or team other than the authors. In particular, the paper's main results were obtained in a subsequent study by a person or team other than the authors, using, in part, artifacts provided by the authors.

Evaluation Process

At the time of paper submission, the authors choose whether, and if so how, they want to participate by selecting none, one, or more of the criteria boxes. Only accepted papers will participate in the process of reviewing artifacts and reproducing computational results. The evaluation process will start shortly after the corresponding paper is accepted. Therefore, authors should document how to reproduce computational results and prepare their artifacts well in advance of the notification date. Please note that the Artifact Evaluation initiative by no means affects the acceptance of papers: authors who opt in do not have a higher chance of having their work accepted, nor does a failure to reproduce the results affect the acceptance of the paper.

The members of the AERCR committee are responsible for conducting the evaluation process according to the criteria the authors applied for when submitting their paper and the expectations put forward by the associated ACM badges. One member of the AERCR committee will be assigned to each paper’s artifacts and computational results.

Artifact Evaluation and Reproducibility Reports

To reward the authors, increase the visibility of reproduced research and the evaluated artifacts, and show how the initiative is carried out, the proceedings will include a short report citing the original paper and showing the reproduced results, making them available to readers. The Reproducibility Chairs will check these reports to ensure that they are objective and factual and that no negative results are attached to the original papers in the reports.

Artifact Details

To avoid excluding some papers, the AERCR committee will try to accept any artifact authors wish to submit. These can be models, software, mechanized proofs, test suites, data sets, etc. Obviously, the better the artifact is packaged, the more likely it is that the AERCR committee member can work with it. Submission of an artifact does not grant tacit permission to make its content public. No part of the artifacts will be published during or after the evaluation, and no part will be retained after the evaluation. Artifacts may include models, data files, proprietary binaries, etc. Authors are encouraged to anonymize data files where appropriate.

Submission Guidelines for the Artifacts

Please note that the SIGSIM PADS Artifact Evaluation initiative is an open reviewing process in which reviewers and authors know each other's identities. The review will start after the research paper has been accepted. We expect the artifacts to be submitted within one week of the acceptance notification. The AERCR chair will contact the authors with further information. To support an efficient process, authors should carefully follow the guidelines below when preparing the supplementary material. The submitted material will be a zip file that includes, depending on the criteria chosen, the following:

A README file describing the artifacts associated with the paper (e.g., data, simulation models, …). Please provide detailed information about the required environment in which the artifact(s) can be executed and evaluated, in particular:

  • A list of authors, highlighting the corresponding author for the artifact evaluation and reproduction of computational results,
  • Instructions on how to resolve software dependencies, if required,
  • Hardware requirements (e.g., amount of RAM, number of cores),
  • Instructions on how to check that the artifacts have been properly installed (see the sketch after this list),
  • Expected runtimes for each experiment,
  • A description of the software/hardware environment used in your experiments.
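
To illustrate the installation-check item above, a README might point reviewers at a small smoke-test script. The following is a minimal sketch only, assuming a hypothetical script name (check_install.py) and hypothetical dependencies (numpy, matplotlib, make); the lists would be adapted to the actual artifact.

    # check_install.py -- minimal installation smoke test (hypothetical artifact layout).
    # Exits with status 0 if the environment looks usable, non-zero otherwise.
    import importlib
    import shutil
    import sys

    REQUIRED_MODULES = ["numpy", "matplotlib"]  # assumed Python dependencies
    REQUIRED_TOOLS = ["make"]                   # assumed command-line tools

    def main() -> int:
        ok = True
        for module in REQUIRED_MODULES:
            try:
                importlib.import_module(module)
                print(f"[ok]   Python module '{module}' found")
            except ImportError:
                print(f"[fail] Python module '{module}' missing")
                ok = False
        for tool in REQUIRED_TOOLS:
            if shutil.which(tool):
                print(f"[ok]   command '{tool}' found")
            else:
                print(f"[fail] command '{tool}' missing")
                ok = False
        print("Installation check passed." if ok else "Installation check failed.")
        return 0 if ok else 1

    if __name__ == "__main__":
        sys.exit(main())

The README can then simply instruct the reviewer to run "python check_install.py" and expect an exit code of 0.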

In addition, for the availability criterion:

  • A license file,
  • The DOI of, or a persistent link to, the publicly accessible archival repository.

In addition, for the criterion of reproduced computational results:

  • A list of results to be reproduced (e.g., a list of tables and figures),
  • The claims supported by these results,
  • A script for every figure or table of the paper that contains results to be reproduced (a minimal sketch of such a script follows this list).
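
As an example of the per-figure scripts mentioned above, the following minimal sketch regenerates a single figure from pre-computed result data; the file names (results/figure3.csv, figure3.pdf), column names, and plotting choices are hypothetical placeholders, not prescribed by AERCR.

    # plot_figure3.py -- regenerates one figure from pre-computed results (hypothetical names).
    # Usage: python plot_figure3.py   (expects results/figure3.csv, writes figure3.pdf)
    import csv

    import matplotlib.pyplot as plt

    def main() -> None:
        processes, speedups = [], []
        with open("results/figure3.csv", newline="") as handle:
            for row in csv.DictReader(handle):  # assumed columns: processes, speedup
                processes.append(int(row["processes"]))
                speedups.append(float(row["speedup"]))
        plt.plot(processes, speedups, marker="o")
        plt.xlabel("Number of processes")
        plt.ylabel("Speedup")
        plt.savefig("figure3.pdf")
        print("Wrote figure3.pdf")

    if __name__ == "__main__":
        main()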

Of course, any additional information that appears relevant may be included to simplify the AERCR process.

Here are some general tips to make life easier for both authors and the AERCR committee:

  • Automate as much as possible: it saves a great deal of time by avoiding the need to re-run experiments that failed due to human error. This is feasible even for artifacts that need multiple nodes or that replicate configurations in multiple places. See this repository for an example of artifact automation with Docker; a minimal automation sketch also follows this list of tips.

  • Try out the artifacts on a blank environment, following the steps included in the documentation.

  • Log both successes and failures so that users know that something happened. Avoid logging unnecessary or confusing information, such as verbose output or expected failures. Log potential issues, such as an optional but recommended library not being present.

  • Measure resource use: on Linux, tools such as mpstat, iostat, vmstat, and ifstat measure CPU, I/O, memory, and network use, respectively, and /usr/bin/time -v measures the time and memory used by a command. This lets users know what to expect.
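
As a deliberately minimal illustration of the automation, logging, and runtime tips above, a top-level driver script might look as follows; the experiment names and commands are placeholders that would be replaced by the artifact's actual entry points.

    # run_all.py -- runs every experiment in sequence and logs successes, failures,
    # and wall-clock runtimes (experiment names and commands are placeholders).
    import subprocess
    import time

    EXPERIMENTS = {
        "table1": ["python", "experiments/table1.py"],
        "figure3": ["python", "plot_figure3.py"],
    }

    def main() -> None:
        for name, command in EXPERIMENTS.items():
            start = time.time()
            result = subprocess.run(command)
            elapsed = time.time() - start
            status = "ok" if result.returncode == 0 else f"failed (exit code {result.returncode})"
            print(f"[{name}] {status} after {elapsed:.1f} s")

    if __name__ == "__main__":
        main()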

We provide a blueprint for authors showing how such AERCR documentation might look: README.

ACM Badges

The following badges from the current version of ACM Artifact Review and Badging (v1.1) are applicable at SIGSIM PADS (more information here).

  • Artifacts Evaluated - Functional
  • Artifacts Evaluated - Reusable
  • Artifacts Available
  • Results Reproduced