What is this?

This is the Gerrit code review dataset released with the MSR 2016 data track paper "Mining the Modern Code Review Repositories: A Dataset of People, Process and Product". We extracted all of the data from Gerrit servers using our mining scripts, which are based on the official REST API. The data is stored in a relational database (MySQL) format.

MSR 2016 data showcase - Mining Code Review Repositories from Xin Yang

Target Projects

We exported the database objects to self-contained files for each project. All .sql files are available here:

If you want to use this dataset in a publication, please cite the following paper.
@InProceedings{Yang2016MSR,
  Author = {Yang, Xin and Kula, Raula Gaikovina and Yoshida, Norihiro and Iida, Hajimu},
  Title = {Mining the Modern Code Review Repositories: A Dataset of People, Process and Product},
  BookTitle = {Proceedings of the 13th International Conference on Mining Software Repositories},
  Pages = {460--463},
  Year = {2016}
}

Notice: To protect the privacy of developers, we have anonymized all developer usernames and email addresses.

Documentation

In our wiki pages, you can find the details of the database schema, how to query it using SQL, and how to obtain the source code.

Mining Scripts

The mining scripts can be found here. You can run or modify them to build your own dataset.
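
For orientation, the following is a minimal sketch of a query against the Gerrit REST API. It is not taken from our mining scripts. The function names are hypothetical, and the only Gerrit behavior assumed is the `/changes/` endpoint with its `q`, `n` and `S` parameters, plus the `)]}'` prefix that Gerrit prepends to JSON responses.

```python
import json
import urllib.parse
import urllib.request

# Gerrit prepends this line to JSON responses to guard against XSSI attacks.
XSSI_PREFIX = ")]}'"


def build_changes_url(base_url, query, limit=25, start=0):
    """Build a /changes/ query URL (hypothetical helper, not from the scripts)."""
    # q = search query, n = maximum number of results, S = results to skip
    params = urllib.parse.urlencode({"q": query, "n": limit, "S": start})
    return f"{base_url.rstrip('/')}/changes/?{params}"


def parse_gerrit_json(body):
    """Strip the XSSI prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def fetch_changes(base_url, query, limit=25, start=0):
    """Fetch one page of changes from a Gerrit server (needs network access)."""
    url = build_changes_url(base_url, query, limit, start)
    with urllib.request.urlopen(url) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))
```

A call such as `fetch_changes("https://gerrit.example.org", "status:merged", limit=100)` (the server URL is a placeholder) would return a list of change records. Increasing `start` in steps of `limit` pages through the results.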

Active Members

Current Work

Contacts

If you have any questions, please contact us (xinyang [at] ist.osaka-u.ac.jp).