Ten Simple Rules for Taking Advantage of Git and GitHub
Yasset Perez-Riverol; Laurent Gatto; Rui Wang; Timo Sachsenberg; Julian Uszkoreit; Felipe da Veiga Leprevost; Christian Fufezan; Tobias Ternent; Stephen J. Eglen; Daniel S. Katz; Tom J. Pollard; Alexander Konovalov; Robert M. Flight; Kai Blin; Juan Antonio Vizcaíno. Edited by Scott Markel.
PLoS Computational Biology, vol. 12, 2016.
Bioinformatics is a broad discipline in which one common denominator is the need to produce and/or use software that can be applied to biological data in different contexts. To enable and ensure the replicability and traceability of scientific claims, it is essential that the scientific publication, the corresponding datasets, and the data analysis are made publicly available [[1],[2]]. All software used for the analysis should be either carefully documented (e.g., for commercial software) or, better yet, openly shared and directly accessible to others [[3],[4]]. The rise of openly available software and source code alongside concomitant collaborative development is facilitated by the existence of several code repository services such as SourceForge, Bitbucket, GitLab, and GitHub, among others. These resources are also essential for collaborative software projects because they enable the organization and sharing of programming tasks between different remote contributors. Here, we introduce the main features of GitHub, a popular web-based platform that offers a free and integrated environment for hosting the source code, documentation, and project-related web content for open-source projects. GitHub also offers paid plans for private repositories (see Box 1) for individuals and businesses as well as free plans including private repositories for research and educational use.
Box 1
By default, GitHub repositories are freely visible to all. Many projects share their work publicly and openly from the start in order to gain visibility and to benefit early from community contributions. Other groups prefer to work privately until they are ready to share their work. Private repositories keep work hidden but also limit collaboration to those users who are given access to the repository. These repositories can be made public at a later stage, for example upon submission, acceptance, or publication of the corresponding journal articles. In some cases, when the collaboration was meant to be exclusively private, repositories might never be made publicly accessible.
GitHub relies, at its core, on the well-known and open-source version control system Git, originally designed by Linus Torvalds for the development of the Linux kernel and now developed and maintained by the Git community. One reason for GitHub’s success is that it offers more than a simple source code hosting service [[5],[6]]. It provides developers and researchers with a dynamic and collaborative environment, often referred to as a social coding platform, that supports peer review, commenting, and discussion [[7]]. Efforts ranging from individual and large bioinformatics projects to laboratory repositories and global collaborations have found GitHub to be a productive place to share code and ideas and to collaborate (see Table 1).
Table 1. Bioinformatics repository examples with good practices of using GitHub. The table contains the name of the repository, the type of example (issue tracking, branch structure, unit tests), and the URL of the example. All URLs are prefixed with https://github.com/. (doi:10.1371/journal.pcbi.1004947.t001)
Some of the recommendations outlined below are broadly applicable to repository hosting services. However, our main aim is to highlight specific GitHub features. We provide a set of recommendations that we believe will help the reader to take full advantage of GitHub’s features for managing and promoting projects in bioinformatics as well as in many other research domains. The recommendations are ordered to reflect a typical development process: learning Git and GitHub basics, collaboration, use of branches and pull requests, labeling and tagging of code snapshots, tracking project bugs and enhancements using issues, and dissemination of the final results.
The backbone of GitHub is the distributed version control system Git. Every change, from fixing a typo to a complete redesign of the software, is tracked and uniquely identified. Although Git has a large set of commands and can be used for rather complex operations, applying the basics requires only a handful of new concepts and commands and provides a solid foundation for efficiently tracking code and related content for research projects. Many introductory and detailed tutorials are available (see Table 2 below for a few examples). In particular, we recommend A Quick Introduction to Version Control with Git and GitHub by Blischak et al. [[5]].
In a nutshell, initializing a (local) repository (often abbreviated as repo) marks a directory as one to be tracked (Fig 1). All or parts of its content can be added explicitly to the list of files to track.
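The two steps just described, initializing a repository and explicitly adding content to track, can be sketched as follows. This is a minimal example that assumes the `git` command-line tool is installed and on the `PATH`; the helper `init_and_track` and the file names are hypothetical and chosen only for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def init_and_track(directory: Path, filenames: list[str]) -> list[str]:
    """Initialize a Git repository in `directory`, start tracking `filenames`,
    and return the paths Git now tracks. Illustrative helper only."""
    # `git init` creates the hidden .git/ folder that marks the directory
    # as a repository.
    subprocess.run(["git", "init", "--quiet"], cwd=directory, check=True)
    # `git add` places the named files on the list of tracked content;
    # "--" separates options from file paths.
    subprocess.run(["git", "add", "--", *filenames], cwd=directory, check=True)
    # `git ls-files` lists the paths Git is currently tracking.
    result = subprocess.run(
        ["git", "ls-files"],
        cwd=directory, check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "analysis.py").write_text("print('hello')\n")
        (project / "notes.txt").write_text("scratch notes, left untracked\n")
        # Only analysis.py is added, so notes.txt stays outside version control.
        print(init_and_track(project, ["analysis.py"]))
```

Tracking is opt-in: files that are never added remain in the directory but are not versioned, which is why only part of a directory's content may end up in the repository.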
Fig 1. The structure of a GitHub-based project, illustrating project structure and interactions with the community.