1. (MODAClouds) Software repositories

This document is a draft!

1.1. Hosting

The following public repositories were prepared for use by the MODAClouds consortium members. They are mainly meant to host those software components whose primary versioning system is either Git or Mercurial.

on GitHub:
- https://github.com/modaclouds -- a GitHub account;
- https://github.com/modaclouds-team -- a GitHub organization;
on BitBucket:
- https://bitbucket.org/modaclouds -- a BitBucket account;
- https://bitbucket.org/modaclouds-team -- a BitBucket team;

As noted above, all these repositories are public, thus only open-source software components should be hosted in them!

The supported versioning systems are:

Git repositories for GitHub;
Mercurial (HG) or Git repositories for BitBucket;
Subversion (SVN) access to Git repositories for GitHub; (this feature should be used only for checkouts and not for commits;)

The choice between GitHub and BitBucket, and between Git and Mercurial, is left open to each consortium member, especially since some software components are already hosted in one variant or the other. Moreover, it is not mandatory for a software component to be hosted in one of these accounts. However, if a partner does want to migrate from an existing SVN repository to either Git or Mercurial while keeping the history of the code (i.e. a complete migration), we can provide support.

Describe the possibility of mirroring.

automatic mirroring from other Git or Mercurial repositories hosted somewhere else;
automatic import (with history) from existing SVN repositories;

Describe the status of the internal SVN repository.

1.1.1. Repository types

1.1.1.1. "Account" repositories

The actual user account should be used only for:

various software components developed by the MODAClouds consortium members, which are officially released and endorsed by the consortium; (i.e. "released" or "endorsed" means that the software component and its corresponding repository are mentioned in various deliverables, web-sites, or other materials;)
any third-party software component that was patched, thus enabling or enhancing its usage within the MODAClouds project, and only if it is strictly required by the other components hosted in the account; (i.e. all other dependencies that are used "as-provided" should not be duplicated here;)
only source code of stable software components; (i.e. no experiments, prototypes, etc., which should instead be hosted in one of the organization / team accounts;)
only open-source code;

1.1.1.2. "Organization" / "team" repositories

The organization / team accounts should be used only for:

various software components that are either prototypes, experiments, under initial development, etc.;
any third-party software component that was patched, but not yet pushed to the user account repositories; (see also the constraints above;)
any other projects developed under the MODAClouds umbrella, like for example scripts, benchmarks, etc.;
again, only open-source code is allowed;

For distinguishing between the two sets of repositories we shall use the following terminology:

official repositories (or release repositories) are the ones in the actual user account (i.e. without the -team suffix);
experimental or collaborative repositories are the ones in the organization / team accounts (i.e. with the -team suffix);
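Because the distinction depends only on the account name (with or without the -team suffix), it can be checked mechanically, for example in a script that audits repository links. What follows is a minimal Python sketch. It assumes the four account URLs listed under "Hosting" are the only ones in use. The function name classify_account is hypothetical:

```python
from urllib.parse import urlparse

# The two hosting services listed under "Hosting".
KNOWN_HOSTS = {"github.com", "bitbucket.org"}


def classify_account(url: str) -> str:
    """Return "official" or "experimental" for a MODAClouds account or repository URL."""
    parsed = urlparse(url)
    if parsed.hostname not in KNOWN_HOSTS:
        raise ValueError(f"not a GitHub / BitBucket URL: {url}")
    # The first path component is the account / organization / team name.
    account = parsed.path.strip("/").split("/")[0]
    if account == "modaclouds":
        return "official"  # plain user account: release repositories
    if account == "modaclouds-team":
        return "experimental"  # -team suffix: experimental or collaborative
    raise ValueError(f"not a MODAClouds account: {account!r}")


print(classify_account("https://github.com/modaclouds"))
print(classify_account("https://bitbucket.org/modaclouds-team/some-repo"))
```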

1.1.1.3. "Individual" repositories

However, many of the MODAClouds consortium members already have independent repositories under other accounts (personal, institutional, or per-project). These should be considered equivalent to the experimental or collaborative repositories. Thus, if some of the software components are already hosted inside these "individual" accounts, using the experimental or collaborative repositories is completely optional, and is encouraged only when multiple developers (with different affiliations) work on the same repository. In particular, if only one developer maintains a repository, he can keep it under the "individual" account.

1.1.1.4. "Mirror" repositories

Should we keep mirrors (and if so, where) of those third-party software components which, although used "as-provided", have a high risk of "disappearing" (e.g. being deprecated by their developers and removed or renamed) or of having "unstable code histories" (i.e. non-fast-forward pushes, history rewriting, etc.)?

(These "mirror" repositories, which are for third-party software components, should not be confused with MODAClouds repository mirrors.)

1.1.2. Creation and access

For the "official repositories" (i.e. the actual user accounts) the following steps are proposed:

a very small set of persons is given the credentials to the account (i.e. login and password); (currently a few of the MODAClouds support team members;)
any consortium member can ask for a repository to be created --- by sending an email to the all@ mailing list and CC-ing the support@ email address --- specifying the following:
- the name of the repository (in accordance with the naming guidelines described below);
- the GitHub / BitBucket account of the person (or persons) that should be delegated with the repository administration;
after all discussions on the all@ mailing list have finished, the repository is created;
the designated person is granted push and administrative rights over the repository;
the designated person can now freely push any changes (in accordance with the various guidelines described below); moreover, he can designate other persons as committers or administrators;

For the "experimental or collaborative repositories":

a very small set of persons is given administrative rights over the organization / team; (e.g. exactly the same set of persons that have the credentials for the actual user account;)
any consortium member can ask for a designated person (or persons), with the corresponding GitHub / BitBucket account, to be added to the organization / team, simply by sending an email to the support@ email address;
any member of the organization / team can create any repository (in accordance with the naming guidelines described below);
any member of the organization / team can push to any repository of the same organization / team; (i.e. including those created by other members, although access to a particular repository can easily be restricted by modifying its rights, in order to reduce accidental updates;)

To summarize:

while anyone can create an "experimental or collaborative repository", only a small set of people can create "official repositories" which are then delegated to a designated person (or persons);
while anyone can push to any "experimental or collaborative repository", only the designated person (or persons) can push to an "official repository";
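The two rules above, together with the public read access stated below, can be expressed as a small permission model. This is only an illustrative Python sketch. The names Kind, can_create, can_push and can_read are hypothetical, and it ignores any per-repository restrictions that administrators may set:

```python
from enum import Enum


class Kind(Enum):
    OFFICIAL = "official"          # repositories in the plain user account
    EXPERIMENTAL = "experimental"  # repositories in the -team organization / team


def can_create(kind: Kind, *, holds_credentials: bool, is_team_member: bool) -> bool:
    """Official repositories: only credential holders; experimental: any team member."""
    if kind is Kind.OFFICIAL:
        return holds_credentials
    return is_team_member


def can_push(kind: Kind, *, is_designated: bool, is_team_member: bool) -> bool:
    """Official: the designated person(s) and the committers they appointed; experimental: any team member."""
    if kind is Kind.OFFICIAL:
        return is_designated
    return is_team_member


def can_read(kind: Kind) -> bool:
    """All repositories are public, so anyone (even anonymous users) can read them."""
    return True
```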

Obviously, anyone (including anonymous Internet users) can download the source code of any repository, whether among the "official" or the "experimental / collaborative" repositories.

1.2. Guidelines

1.2.1. Naming

All the repositories should follow these naming conventions:

prefixed with modaclouds-, for the newly developed software components within the MODAClouds project;
prefixed with a token indicating the origin (e.g. mosaic-, cloud4soa-, apache-, basho-, etc.), for those software components that were imported from other projects, and only slightly patched;
prefixed with modaclouds-, even for imported projects, if the modifications are substantial and diverge considerably from the original source; (this is required because the modified component could be considered a fork of the original one, and some licenses require a name change in such a case;)
it is advisable to use only lower-case Latin letters and numbers, with a hyphen (-) separating the tokens;
it is advisable to use full words instead of acronyms (except, obviously, universally accepted ones such as HTTP); although this leads to longer repository names, it gives better information about the contents of the repository;
in cases when the source code diverges radically from one version (branch) to another (thus forming separate development lines), with active development on both lines, and both lines are required within the project for a given release, two repositories should be kept: the latest one with the original name, and the older one suffixed with a token indicating its development line, such as -v2; (of course this could be solved by keeping separate branches within the same repository, but in some situations it is easier to just separate the repositories;) (see also the discussion related to branching;)
in cases when a repository is maintained as a convenience mirror that changes only the versioning system (for example, when the "native" versioning system of a component is Git, but a Mercurial mirror is kept for developers more accustomed to Mercurial), the mirror repository should be suffixed with a token indicating the new versioning system, such as -hg for a Mercurial mirror, while the original repository keeps its name without any suffix;
(in the extreme case the various suffixes are combined, like for example modaclouds-something-v2-hg for a Mercurial mirror of a Git repository, for the v2 development line of the something software component;)
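Most of these conventions are mechanical, so a simple check could enforce them before a repository is requested. What follows is a minimal Python sketch, with three assumptions. The list of known prefixes is the one from the examples above; the real list would have to be agreed upon. A -git suffix is allowed for Git mirrors by analogy with -hg. All names are hypothetical. The sketch cannot judge the non-mechanical rules, such as whether a fork diverges enough to need the modaclouds- prefix:

```python
import re
from typing import NamedTuple, Optional

# Prefixes from the examples above; the real list is an assumption.
KNOWN_PREFIXES = {"modaclouds", "mosaic", "cloud4soa", "apache", "basho"}
# "hg" comes from the guidelines; "git" is an assumed counterpart.
MIRROR_SUFFIXES = {"hg", "git"}
TOKEN = re.compile(r"[a-z0-9]+")
LINE_SUFFIX = re.compile(r"v[0-9]+")


class RepositoryName(NamedTuple):
    prefix: str
    component: str
    line: Optional[str]    # development-line suffix, e.g. "v2"
    mirror: Optional[str]  # versioning-system suffix, e.g. "hg"


def parse_repository_name(name: str) -> RepositoryName:
    tokens = name.split("-")
    # Lower-case Latin letters and digits only, one hyphen between tokens.
    if not all(TOKEN.fullmatch(token) for token in tokens):
        raise ValueError(f"{name!r}: use lower-case letters and digits separated by single hyphens")
    prefix, rest = tokens[0], tokens[1:]
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"{name!r}: unknown prefix {prefix!r}")
    # Suffixes are stripped from the end, in the combined order -vN-hg.
    mirror = rest.pop() if rest and rest[-1] in MIRROR_SUFFIXES else None
    line = rest.pop() if rest and LINE_SUFFIX.fullmatch(rest[-1]) else None
    if not rest:
        raise ValueError(f"{name!r}: missing the component name")
    return RepositoryName(prefix, "-".join(rest), line, mirror)


print(parse_repository_name("modaclouds-something-v2-hg"))
# RepositoryName(prefix='modaclouds', component='something', line='v2', mirror='hg')
```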

1.2.2. Branches

Within the context of this section the following conventions are made:

by "branch" it is meant:
- branches as understood within the Git ecosystem;
- (named) branches as understood within the Mercurial ecosystem;
- bookmarks as understood within the Mercurial ecosystem; (which, although they are the closest equivalent of Git branches, should be used sparingly;)
- (except the "main branch", these are equivalent to Subversion branches;)
by "main branch" it is meant:
- the master branch within the Git ecosystem; (or a different branch that was chosen as the default one in the GitHub or BitBucket administrative page of the repository;)
- the default branch within the Mercurial ecosystem; (although one can choose in BitBucket a different named branch to be shown by default in the various repository's pages, we refer herein strictly to the default branch;)
- (in both cases this is the equivalent to the Subversion trunk;)

For any software component hosted in one of these repositories, the "main branch" should always contain the latest stable source code of that particular component (i.e. the current head of the "main branch" should always have an accompanying version tag). This ensures that any checkout by an uninformed user yields at least clean and compilable code (and at best the latest usable code).

Any development, including maintenance of already stable versions (thus called development lines), should be made in branches other than the "main branch". Updating the "main branch" (which, as described above, represents the latest stable version) is then done, depending on the versioning system used, as follows:

in case of Git, either one of the following:
- developing on a separate "maintenance branch", then within the "main branch" merging the "maintenance branch" (which results in either a fast-forward or an actual merge commit), followed by tagging the "main branch" and (optionally) removing the "maintenance branch";
- developing on a separate "maintenance branch", then rebasing it on top of the "main branch", and proceeding as above;
- developing on a separate "maintenance branch", then tagging a particular commit, followed by changing the "main branch" to point to the newly created tag; (the disadvantage is that such a method potentially results in a non-fast-forward push of the "main branch", which in effect makes the "main branch" "jump" between different history lines;)
- developing directly on the "main branch", but prior to a push to the official repository, a tag is created; (possibly the simplest method;)
in case of Mercurial, the situation is more restrictive and only two options are available:
- developing on a separate "maintenance branch", then within the "main branch" (i.e. the default branch) merging the "maintenance branch" (which always results in a merge commit), followed by tagging the "main branch" and (optionally) closing the "maintenance branch";
- developing directly on the "main branch", but prior to a push to the official repository, a tag is created; (possibly the simplest method;)
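To illustrate the first Git option, the following Python sketch only builds and prints the command sequence. It executes nothing. The branch name maintenance-1.0, the tag message, and the remote name origin are hypothetical. In addition, git tag -s assumes that a GPG signing key is configured (see the tagging guidelines):

```python
from __future__ import annotations

import shlex


def git_release_commands(maintenance: str, version: str, message: str,
                         main: str = "master", remote: str = "origin") -> list[list[str]]:
    """Build (but do not run) the commands for the first Git option above."""
    return [
        ["git", "checkout", main],
        # Fast-forward if possible, otherwise an actual merge commit.
        ["git", "merge", maintenance],
        # Signed, annotated version tag on the updated main branch.
        ["git", "tag", "-s", version, "-m", message],
        # Publish both the main branch and the new tag.
        ["git", "push", remote, main, version],
        # Optional clean-up of the merged maintenance branch (local only).
        ["git", "branch", "-d", maintenance],
    ]


for command in git_release_commands("maintenance-1.0", "v1.0.1", "Fix start-up crash"):
    print(shlex.join(command))
```

Passing each list to subprocess.run(command, check=True) inside a working copy would execute the sequence. Printing it first keeps the sketch free of side effects.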

However, the previous techniques cover only the situation when a single development branch is active for a particular software component (i.e. the "main branch"), and they require that any new version is strictly developed on top of the previous one. Thus only sequential versions are possible (e.g. starting with v0.1.0, followed by v0.1.1, followed by v1.0.0, then v1.0.1, v1.1.0, v1.1.1, v2.0.0, etc.), and parallel releases are impossible (e.g. v2.0.0 followed by v2.0.1 in parallel with v1.5.0 followed by v1.5.1, etc.). The solution in such a situation again depends on the versioning system:

in case of Git, a separate branch is kept for each individual development line, with the latest stable line being the "main branch"; however, when switching from one generation to another (e.g. from v1.5.0 to v2.0.0), the "main branch" must be pointed to a different commit, which (since the two lines have diverged) results in a non-fast-forward push;
in case of Mercurial, either one of the following:
- the "main branch" is always reserved for the latest stable version, and a named branch is kept for each of the other development lines;
- a different repository is kept for the two development lines; (see the previous section about naming such repositories;)
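To find the tag each development line (and the "main branch") should currently point to, the version tags can be grouped by their X.Y prefix, as in the v2.0.x / v1.5.x example above. What follows is a minimal Python sketch. It assumes version tags only in the plain vX.Y.Z form. The function name latest_per_line is hypothetical:

```python
import re

VERSION_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def latest_per_line(tags):
    """Return (latest overall version tag, {"X.Y": latest vX.Y.Z tag of that line})."""
    best = {}
    for tag in tags:
        match = VERSION_TAG.fullmatch(tag)
        if match is None:
            continue  # snapshot and release tags are not version tags
        version = tuple(int(part) for part in match.groups())
        line = version[:2]  # the X.Y development line
        if line not in best or version > best[line]:
            best[line] = version
    lines = {f"{x}.{y}": "v{}.{}.{}".format(*v) for (x, y), v in sorted(best.items())}
    main = "v{}.{}.{}".format(*max(best.values())) if best else None
    return main, lines


tags = ["v1.5.0", "v2.0.0", "v1.5.1", "v2.0.1", "s2014-02-10", "modaclouds-r1.0.0"]
print(latest_per_line(tags))
# ('v2.0.1', {'1.5': 'v1.5.1', '2.0': 'v2.0.1'})
```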

The following naming conventions could be applied:

for the "main branch":
- although in Git the name master is customary, it is probably better to name it stable, to make the contract between the developer and the user clear;
- in Mercurial the default branch is always used; (optionally a stable bookmark can be created to point to the tip head;)
for the "next" development line (i.e. the next stable version):
- in Git a branch named development should be used;
- in Mercurial a branch named bX.Y should be used; (where X.Y is the number of the future vX.Y.Z release;) (optionally a development bookmark can be created to point to the bX.Y head;)
for the previous development lines (i.e. the previous stable versions):
- in Git, a branch named legacy should be used; (if more than one previously stable development line is maintained, then the branches should be named legacy-X.Y;)
- in Mercurial a branch named bX.Y should be used; (just like in the case of a development branch;) (again a bookmark named legacy or legacy-X.Y can be created;)
for all those development lines which are not currently maintained:
- in Git no branch should be used;
- in Mercurial the corresponding branches (i.e. bX.Y) should be marked as closed; (and no bookmarks should be used;)
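The conventions above reduce to a small lookup. The following Python sketch encodes them. The role names main, next, previous and unmaintained are assumed labels, and the function name is hypothetical:

```python
from typing import Optional

ROLES = {"main", "next", "previous", "unmaintained"}


def branch_name(vcs: str, role: str, line: Optional[str] = None,
                several_legacy: bool = False) -> Optional[str]:
    """Proposed branch name for a development line, or None when no branch is kept.

    vcs is "git" or "hg"; line is the X.Y number of the development line (e.g. "2.1").
    """
    if vcs not in ("git", "hg") or role not in ROLES:
        raise ValueError(f"unsupported combination: {vcs!r}, {role!r}")
    if role == "main":
        return "stable" if vcs == "git" else "default"
    if line is None:
        raise ValueError("the X.Y development line is required for this role")
    if vcs == "hg":
        # Mercurial branch names are permanent, hence always bX.Y; unmaintained
        # lines must additionally be closed (hg commit --close-branch).
        return f"b{line}"
    if role == "next":
        return "development"
    if role == "previous":
        return f"legacy-{line}" if several_legacy else "legacy"
    return None  # unmaintained line in Git: no branch is kept
```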

The reason why in Mercurial the branches are always named bX.Y, regardless of their status, is because branch names are immutable. (As opposed to Mercurial bookmarks or Git branches which are just pointers to a particular commit.)

It is still open for debate whether the "main branch" should contain the source code that is part of the latest MODAClouds release, as opposed to the latest component release (e.g. the source code of a component at its latest version, say v2.0, could be newer than, and thus incompatible with, the source code of the same component at the version, say v1.5, embedded in the latest MODAClouds release). However, because some components are shared between multiple projects, such a policy cannot be enforced for them; the question still stands for the MODAClouds-only components.

1.2.3. Tagging

The following types of tags should be present in all repositories:

version tags: prefixed with v and compatible with "Semantic Versioning" (http://semver.org/), used only for stable releases of that component; (e.g. v0.1.0, v2.1.3, etc.;) (the version "trees" are independent between various components within the MODAClouds ecosystem, and a similar version number does not imply any relation or compatibility between the two different components;)
snapshot tags: prefixed with s and followed by a date in the format YYYY-MM-DD, used only as internal reference points during development and collaboration (i.e. these versions should not be considered "released" or "stable"); (e.g. s2014-02-10, etc.;) (in the extremely rare case when two snapshots are made in the same day a letter suffix is added to the tag, such as -a;)
release tags: prefixed with modaclouds-r and followed by the version of the actual MODAClouds release, used to mark the version of the code which was included (or used) in a particular release; (e.g. modaclouds-r1.0.0, etc.)
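Regular expressions can recognise the three tag formats. The following Python sketch is a simplified reading of the rules above. It assumes that version tags carry no Semantic Versioning pre-release or build suffix, because they mark stable releases only. It also shows one way to choose the letter suffix for a same-day snapshot. The function names are hypothetical:

```python
from __future__ import annotations

import re
import string
from datetime import date

# Core Semantic Versioning numbers (no leading zeros); pre-release and build
# suffixes are deliberately rejected, as version tags mark stable releases.
_SEMVER = r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
VERSION_RE = re.compile(rf"v{_SEMVER}")
RELEASE_RE = re.compile(rf"modaclouds-r{_SEMVER}")
SNAPSHOT_RE = re.compile(r"s(\d{4}-\d{2}-\d{2})(?:-[a-z])?")


def classify_tag(tag: str) -> str:
    """Return "version", "release" or "snapshot", or raise ValueError."""
    if VERSION_RE.fullmatch(tag):
        return "version"
    if RELEASE_RE.fullmatch(tag):
        return "release"
    match = SNAPSHOT_RE.fullmatch(tag)
    if match:
        date.fromisoformat(match.group(1))  # rejects impossible dates such as 2014-02-30
        return "snapshot"
    raise ValueError(f"{tag!r} follows none of the tag conventions")


def next_snapshot_tag(day: date, existing: set[str]) -> str:
    """First free snapshot tag for the given day: sYYYY-MM-DD, then -a, -b, ..."""
    base = f"s{day.isoformat()}"
    if base not in existing:
        return base
    for letter in string.ascii_lowercase:
        candidate = f"{base}-{letter}"
        if candidate not in existing:
            return candidate
    raise RuntimeError(f"no free snapshot tag left for {day}")


print(classify_tag("v2.1.3"), classify_tag("modaclouds-r1.0.0"), classify_tag("s2014-02-10-a"))
print(next_snapshot_tag(date(2014, 2, 10), {"s2014-02-10"}))
```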

Unlike snapshot tags, version tags and release tags are to be considered immutable: once created, they should not be removed or changed. Snapshot tags, on the other hand, imply no commitment and (although this is inadvisable) can be changed or removed.
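One way to verify this immutability is to compare a repository's tag list at two points in time, for instance as reported by git ls-remote --tags. What follows is a minimal Python sketch that assumes this input format: a tab-separated commit id and reference per line, where a trailing ^{} marks the commit an annotated tag points to. The function names and the short commit ids in the usage example are hypothetical:

```python
from __future__ import annotations

import re

SNAPSHOT_RE = re.compile(r"s\d{4}-\d{2}-\d{2}(?:-[a-z])?")


def parse_ls_remote(output: str) -> dict[str, str]:
    """Map tag name -> commit id from the output of `git ls-remote --tags <remote>`."""
    tags: dict[str, str] = {}
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if not ref.startswith("refs/tags/"):
            continue
        name = ref[len("refs/tags/"):]
        if name.endswith("^{}"):
            # Peeled entry: the commit an annotated tag points to; prefer it.
            tags[name[:-3]] = sha
        else:
            tags.setdefault(name, sha)
    return tags


def forbidden_tag_changes(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List version / release tags that were removed or moved between two observations."""
    problems = []
    for name, commit in sorted(before.items()):
        if SNAPSHOT_RE.fullmatch(name):
            continue  # snapshot tags carry no commitment
        if name not in after:
            problems.append(f"{name}: removed")
        elif after[name] != commit:
            problems.append(f"{name}: moved from {commit} to {after[name]}")
    return problems


before = {"v1.0.0": "aaa", "modaclouds-r1.0.0": "aaa", "s2014-02-10": "bbb"}
after = {"v1.0.0": "ccc", "s2014-02-10": "ddd"}
print(forbidden_tag_changes(before, after))
# ['modaclouds-r1.0.0: removed', 'v1.0.0: moved from aaa to ccc']
```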

Moreover, it is advisable that each tag is cryptographically signed by the lead developer, and that the tag message contains a short description of the changes introduced by that particular version (as compared to its predecessor).

The release tags could have been prefixed only with r (e.g. r1.0.0); however, when a component (and thus the same repository) is used in multiple projects, adding the project name as a prefix eliminates any ambiguity. (For example, the mosaic-node repository was referenced in a few mOSAIC releases, is currently referenced in the MODAClouds releases, and could in the future be referenced in SPECS releases. Such a release tagging scheme allows the same repository to hold the tags for all these projects without any conflicts. Furthermore, it can help to differentiate between the variants and enhancements made within the three distinct projects.)

1.2.4. Required files

In each of the repositories there must be the following files:

a "read-me" file describing the source code to the user;
a "notice" file describing the copyright and licensing;

1.2.4.1. Read-me file

In the root of the repository there should be a "read-me" file, which should contain the following details:

a short description of the source code (e.g. what purpose it serves, where the user can find more about it, etc.;)
the copyright owner of the source code; (see the next section for more details;)
the license under which the source code is usable by a third party; (see the next section for more details;)
the fact that the code is "related" to the MODAClouds project, thus sponsored by the EU under the FP-7 program; (we say "related" because the code could have been imported from another project and only patched to work in the context of the MODAClouds project; for projects completely developed within MODAClouds this should be stated at the beginning of the code;)

Furthermore the read-me file can contain other useful information such as:

how to quickly install the software (either by downloading or by compiling from the source);
how to quickly use the software (i.e. common use cases);
links to more comprehensive documentation materials;
who were the developers and contributors;
in case the repository is a patched variant of a third-party software component, what the differences are, possibly why they were needed, and obviously a link to the original project;

The name of the read-me file should be either README.* or readme.* (i.e. either all upper-case or all lower-case letters, followed by an extension).

The format of the read-me file should be one of the following:

plain text (the name being README.txt or readme.txt);
Markdown (the name being README.md or readme.md);
reStructuredText (the name being README.rst or readme.rst);
any other markup language that is understood by both GitHub and BitBucket;
in any case the file must be easily readable by only using a plain text viewer / editor (such as vi, less, Notepad, etc.); (i.e. manually edited HTML is on the borderline, however no binary format files such as PDF or DOCX are allowed;)
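A rough automated check of these rules might look as follows. This Python sketch is only a heuristic. It assumes UTF-8 encoding, which the guidelines do not mandate, and it accepts only the three extensions listed above. It treats NUL bytes or invalid UTF-8 as a sign of a binary format such as PDF or DOCX. The function name readme_problems is hypothetical:

```python
from __future__ import annotations

import re

# Only the three formats listed above; other markups understood by both
# GitHub and BitBucket would have to be added explicitly.
README_RE = re.compile(r"(README|readme)\.(txt|md|rst)")


def readme_problems(filename: str, content: bytes) -> list[str]:
    """Return a list of problems found with a read-me file (empty if none)."""
    problems = []
    if not README_RE.fullmatch(filename):
        problems.append(f"unexpected read-me name {filename!r}")
    # Heuristic: NUL bytes or invalid UTF-8 suggest a binary format.
    if b"\x00" in content:
        problems.append("contains NUL bytes, probably a binary file")
    else:
        try:
            content.decode("utf-8")
        except UnicodeDecodeError:
            problems.append("not valid UTF-8 text")
    return problems
```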

To be written!

Add a template for the read-me file.

1.2.4.2. Notice file

In the root of the repository there should be a notice file, which should contain the following details:

the copyright owner of the source code; (most likely the institution the developer belongs to;)
the license under which the source code is usable by a third party; (it is enough to refer to the "common-name" of the license, and point the user to the dedicated license file holding the actual contents;) (for a list of open-source licenses and their "common-name", see the SPDX project licenses list at https://spdx.org/licenses;)

In addition to the mandatory information above, the notice file could also contain:

any "boiler-plate text" as recommended by the used license (such as "warranty" disclaimers);
exceptions to the notice (such as a different license for the documentation or for example source code);
any other informative text strictly related to licensing;

The name of the notice file should be either NOTICE.TXT or notice.txt, and its format should be plain text (i.e. without heavy markup languages).
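A script can check for both required files in the root of a working copy. Here is a minimal Python sketch that applies only the naming rules above. It does not inspect the contents of the files. The function name missing_required_files is hypothetical:

```python
from __future__ import annotations

import re
from pathlib import Path

README_RE = re.compile(r"(README|readme)\.[A-Za-z0-9]+")  # README.* / readme.*
NOTICE_NAMES = {"NOTICE.TXT", "notice.txt"}


def missing_required_files(root: Path) -> list[str]:
    """Describe which of the two required files are absent from the repository root."""
    names = {entry.name for entry in root.iterdir() if entry.is_file()}
    missing = []
    if not any(README_RE.fullmatch(name) for name in names):
        missing.append("read-me file (README.* or readme.*)")
    if not names & NOTICE_NAMES:
        missing.append("notice file (NOTICE.TXT or notice.txt)")
    return missing


if __name__ == "__main__":
    # Run from the root of a working copy.
    print(missing_required_files(Path(".")) or "all required files present")
```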

To be written!

Add a template for the notice file.

1.2.5. Repository layout

Except for the two required files described above (i.e. the "read-me" and the "notice" file), each repository should abide by the best practices of the programming language / environment / framework used by the project.

However the following generic guidelines could be applied:

any additional documentation --- which is not written with the aid of the programming language / environment / framework, such as Java API, etc. --- should be placed within a folder named documents (or Documents);
any example files --- such as source code snippets, scripts, or various files, which are not part of the actual software component, and which don't fit within the source code tree as mandated by the programming language / environment / framework --- should be placed within a folder named examples (or Examples);
any additional scripts --- which, just like above, don't belong to a particular framework --- should be placed within a folder named scripts (or Scripts);
any additional "resource" files --- again not mandated by a particular framework, but useful for the scripts or other contexts --- should be placed within a folder named resources (or Resources); (image or diagram files that are part of the documentation should be placed in the dedicated documentation folder;)
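A new repository could be given these folders from the start. The following Python sketch creates each lower-case folder unless it already exists in either case. Neither Git nor Mercurial tracks empty folders, so the sketch puts a placeholder file in each one. The name .keep is an assumed convention, not a feature of either tool:

```python
from __future__ import annotations

from pathlib import Path

FOLDERS = ("documents", "examples", "scripts", "resources")
PLACEHOLDER = ".keep"  # assumed name; neither Git nor Mercurial requires it


def create_skeleton(root: Path) -> list[Path]:
    """Create the missing optional folders (lower-case variant) under root."""
    created = []
    for name in FOLDERS:
        # Either variant (e.g. documents or Documents) counts as present.
        if (root / name).exists() or (root / name.capitalize()).exists():
            continue
        folder = root / name
        folder.mkdir(parents=True)
        # Empty folders are not tracked, so leave a file behind to commit.
        (folder / PLACEHOLDER).touch()
        created.append(folder)
    return created
```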

1.2.5.1. Repository examples

To be written!

Add an example for a Java project handled through Maven.

1.2.5.2. Script files

To be written!

Describe the files that should be present in the scripts folder, which would ease the integration process.

1.2.6. Miscellaneous

To be written!

file naming best practices;
executable files;
binary files;
archived files;

1.3. Workflows

To be written!

cloning third-party repositories within the official repositories;
cloning of official repositories by developers;