Infrastructure/Project Structure

From KDE Community Wiki
Revision as of 00:50, 13 September 2015 by Mpyne (talk | contribs) ('Divisions' was an awful name. dfaure suggested 'product' instead, which is much better.)

Improving KDE Project Organization

Note

This is a Wiki form of an email sent to the kde-core-devel mailing list under the subject "Proposal to improving KDE Software Repository Organization", dated 18 Aug 2014.


This is a proposal to evolve the current method of organizing our mass of KDE source code repositories, and their dependencies, as contained in the kde-build-metadata repository and used by kdesrc-build and build.kde.org (referred to as "CI"). This is needed in order to correct some deficiencies in the current specification, and to help better support changing trends in developer workflow.

Current Situation

If you're familiar with the current organization of "KDE build metadata" you should skip to the next section.

Currently, the git-based source code repositories that make up KDE.org's software releases are each given a "project path" that fully specifies the name of the module in a virtual hierarchy. For instance, kdesrc-build itself is really "extragear/utils/kdesrc-build", and KDE 4's kdelibs is "kde/kdelibs".

Since many modules support both KDE4 and Qt5/KF5 (or may in the future), some developers associated with KDE source code repositories introduced the "branch group" construct, which maps the git repository branch for the majority of repositories into a few broad groupings, such as "stable-qt4", "latest-qt4" and "kf5-qt5". Developers and users of kdesrc-build could then use these groups to easily build the appropriate git branch of the many repositories needed for current releases of KDE.org software. This also allowed the CI infrastructure to test the development branches of software based on either KDE4 or KF5, in addition to the libraries/Frameworks themselves.

Current Issues

Things have gone fairly well with branch groups, but there have been minor issues with the construct:

  1. The existing metadata listing dependencies between git repositories could not support multiple branch groups, as a repository's dependencies were not necessarily identical across every branch group it belonged to. We worked around this by forking the metadata such that each different branch group used a separate dependency file.
  2. Compounding that issue, different branch groups would have different sets of repositories. For instance, some repositories will never have a KF5-based release due to ongoing reorganization, and many repositories were born for KF5. By common agreement, software using kde-build-metadata now recognizes an empty git branch name to mean that a repository doesn't actually belong to the given branch group. This is still a workaround, however; if we forget to manually specify an empty branch, then CI and kdesrc-build will both try to build that repository as part of that branch group (using a default branch).
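
The empty-branch workaround in item 2 can be sketched as follows. This is only an illustration of the convention, not the actual kdesrc-build or CI code: the `BRANCH_GROUPS` table, the `resolve_branch` function, the branch names, and the `master` default are all assumptions made for the example.

```python
# Hypothetical branch-group table: repository -> {branch group -> git branch}.
# An empty string means "this repository is not part of that branch group".
# The branch names below are made up for the example.
BRANCH_GROUPS = {
    "kde/kdelibs": {"stable-qt4": "example-stable", "kf5-qt5": ""},
    "extragear/utils/kdesrc-build": {},  # nobody has filled this in yet
}

DEFAULT_BRANCH = "master"  # assumed default; the real tools may differ


def resolve_branch(repo, group):
    """Return the branch to build, or None if repo is excluded from group."""
    branches = BRANCH_GROUPS.get(repo, {})
    if group not in branches:
        # Forgotten entry: the repository is silently built from the default.
        return DEFAULT_BRANCH
    # An empty string is the agreed marker for "not in this branch group".
    return branches[group] or None
```

With this table, `resolve_branch("kde/kdelibs", "kf5-qt5")` gives `None`, while the repository with no entries at all falls back to the default branch, which is exactly the failure mode described above.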

Upcoming Problems

A larger concern (and what instigated this effort) is that the KF5 era will introduce multiple development models that are difficult for the CI infrastructure to efficiently support.

For example, testing the KF5-based Plasma 5 Workspace will eventually need to test both the stable and development tracks for Plasma 5. Under the branch group concept, this would lead to branch groups "kf5-qt5" and "kf5-qt5-stable" (or similar names).

However, the KF5 repositories that Plasma 5 requires do not have a split between stable and devel: they use a review-required process under which there is only one development track. In other words, Plasma 5's two development tracks will depend on only one KF5 track.

At this time, that means CI will have to build 56 KF5 modules to test Plasma 5-stable, and then re-build, re-install, etc. the exact same 56 modules to then test Plasma 5-devel. This re-build is required because experience has shown that built repositories cannot be assumed to be compatible between different branch groups (in fact many repositories are significantly different on-disk between branch groups). There's simply no data recorded at this point that delimits the ways in which repositories would remain compatible (or not) between different branch group combinations.

Solving this (so that the right 56 modules are retained and re-used) would require quite some manual hackery, and it's uncertain how easy these hacks are to implement within Jenkins and the CI infrastructure in the first place.

Overview of Proposed Fix

What we would like to do instead is the classic Comp. Sci. fix: Another layer of indirection.

In this case, we'd like to re-organize the kde-build-metadata to map to the same types of project divisions that we already intuitively utilize ourselves (i.e. the repositories that make up Plasma 5 are a different grouping than those that make up KDE Frameworks 5, which are different from those that make up KDevelop for KF5, etc.).

Under this scheme, the universe of all (KDE.org) git repositories would fall into this outline:

+ Product (e.g. KF5)
 + Track (a development track, e.g. "devel")
  + Repositories + Git branches

The following would be true of these 'products':

  • Each product/track combination could depend on a different product (e.g. Plasma5/Devel could depend on KF5/Devel).
  • Each product/track combination would list all git repositories that make up that track (wildcards will continue to be permitted), along with the git branch of that repository. E.g. Plasma5/Devel could include "kde/workspace/plasma-workspace: master", while Plasma5/Stable might include "kde/workspace/plasma-workspace: Plasma/5.0".
  • The "branch group" concept will be retained (both for backwards compat for kdesrc-build users and for ease of Jenkins implementation), and is the "most global" grouping (but now, of products, not repositories directly). Each product will map global branch group names to one of its tracks, if appropriate.

So "kf5-qt5" might mean "KF5/Devel, Plasma5/Devel, etc." while "kf5-qt5-stable" might mean "KF5/Devel, Plasma5/Stable, etc.". If CI builds "kf5-qt5-stable" and then builds "kf5-qt5", it would be able to skip building "KF5/Devel" the second time as it's stated to be compatible with both Plasma5 tracks.

  • Any given repository in a branch group would map to 0-1 products. 0, since a repository simply might not be present at all (and might even be in different products for different global branch groups...). 1, since there must be only 1 possible git branch name for a repository.
  • Instead of using a separate dependency file, intra-product dependencies would be listed along with the rest of the product/track details.
  • Likewise, inter-product dependencies would be supported (but the dependency would only be on the repository names, since the branches for that repository would be controlled by the product/track combination). This is to allow for smaller applications that depend on only a couple of Tier 1 KF5 repositories to be tested without building all 50+ KF5 modules too.
  • You can also simply depend on a product/track combo as a whole, without listing each individual dependency (similar to how many apps now depend on the virtual "kf5umbrella" repository).
  • A product can specify that certain of its tracks are equivalent. For instance, FooApp/stable might only require Plasma5/stable, but work perfectly fine with Plasma5/devel if it's already available, which is something Plasma5 can specify. This helps reduce combinatorial explosion for the CI infrastructure.
  • Every repository would need to be a member of some Product/Track combination to be seen by CI, even small apps.
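
The build re-use described for "kf5-qt5-stable" followed by "kf5-qt5" can be sketched roughly as below. The `BRANCH_GROUP_CONTENTS` table and the `plan_builds` function are hypothetical names for this illustration; how Jenkins would actually track built products is not specified here.

```python
# Hypothetical expansion of branch groups into (product, track) pairs,
# mirroring the "kf5-qt5" / "kf5-qt5-stable" example above.
BRANCH_GROUP_CONTENTS = {
    "kf5-qt5": {("KF5", "devel"), ("Plasma5", "devel")},
    "kf5-qt5-stable": {("KF5", "devel"), ("Plasma5", "stable")},
}


def plan_builds(groups):
    """Return [(group, pairs_to_build)], skipping pairs built earlier."""
    already_built = set()
    plan = []
    for group in groups:
        needed = BRANCH_GROUP_CONTENTS[group]
        # Only product/track pairs not yet built need any work.
        plan.append((group, sorted(needed - already_built)))
        already_built |= needed
    return plan


for group, pairs in plan_builds(["kf5-qt5-stable", "kf5-qt5"]):
    print(group, pairs)
```

In this sketch the second branch group only needs Plasma5/devel, because KF5/devel was already built for the first one.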

Detailed Outline

The JSON file already in use in the current specification would be modified to have (besides the boilerplate) a structure of the following form to hold the required data:

    "products": {
      "KF5": { ... },
      "Plasma": {
        "branch_group_tracks": {
          "kf5-qt5": "devel",
          "kf5-qt5-stable": "stable"
        },
        "products_needed": {
          "devel": {
            "Qt5": "devel",
            "Milou": "devel",
            "KF5": "devel"
          },
          "stable": {
            "Qt5": "stable",
            "Milou": "stable",
            "KF5": "devel"
          }
        },
        "repositories": {
          "kde/workspace/*": {
            "devel": "master",
            "stable": "Plasma/5.0"
          },
          "kde/workspace/oxygen": {
            "devel": "master"
            !! Wouldn't be included with the "stable" track at all!
          }
          *All* other modules would be listed here for this group
        },
        "excluded_repositories": [
          "kde/workspace/plasma-nm"  (maybe this goes with a separate product)
        ],
        "dependencies": {
          "*": {  <-- would apply to all tracks
            "products": [ <-- could be used to depend on entire products
              "KF5"
            ],
            "repositories": {
              "kde/workspace/*": "extragear/base/milou",
              "kde/workspace/plasma-workspace": "kde/workspace/libkscreen",
              more common deps go here...
            }
          },
          ( individual tracks could have added dependencies on repos or even
            whole products )
          "devel": {
            "repositories": {
              "kde/workspace/*": "project/only/devel/depends/on"
            }
          }
        }
      },
      "Qt4": { ... },
      etc.
    }

Some notes:

  • The branch_group_tracks section is where the global branch-group concept (latest-qt4, kf5-qt5-stable, etc.) would be mapped to the appropriate track for this product. This is perhaps most useful for CI, though kdesrc-build could still utilize it for those who manually list modules to build.
  • The products_needed section would list product/track pairs needed for each track in this product. This is not a dependency per se, it simply indicates to the CI infrastructure that repos from already-built products/tracks would not need to be rebuilt if they match the product/track requirements contained in this section. However any inter-product dependencies for repositories in this product must be to products mentioned in this section, so that it's possible to determine the appropriate branch to build.
  • The repositories section would list every single git repository that is part of this product/track, using the project path to name the repositories, and allowing wildcards as the existing metadata does. You'd have to be careful with wildcards not to accidentally include a repository from a different product (we anticipate validation tooling to help with this).

You'll also note that it's possible for different tracks to have different lists of repositories. It's even possible for a given repo to belong to different products, which is allowable as long as the graph of products/tracks for the whole branch group has that repo in no more than 1 product.
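
As a rough sketch of how a tool might read the repositories section, the lookup below returns the git branch for a repository in a given track. The `branch_for` function and the trimmed `PLASMA` data are hypothetical. The precedence rule (an exact name beats a wildcard, then the longest pattern wins) and the exact-name handling of excluded_repositories are assumptions of this example, since the proposal does not define them.

```python
from fnmatch import fnmatchcase

# Trimmed-down copy of the "Plasma" product from the outline above.
PLASMA = {
    "repositories": {
        "kde/workspace/*": {"devel": "master", "stable": "Plasma/5.0"},
        "kde/workspace/oxygen": {"devel": "master"},
    },
    "excluded_repositories": ["kde/workspace/plasma-nm"],
}


def branch_for(product, repo, track):
    """Return the git branch of repo in product/track, or None if absent."""
    if repo in product.get("excluded_repositories", []):
        return None
    repos = product["repositories"]
    matches = [pattern for pattern in repos if fnmatchcase(repo, pattern)]
    if not matches:
        return None
    # Assumption: an exact name beats a wildcard, then the longest pattern wins.
    best = max(matches, key=lambda pattern: (pattern == repo, len(pattern)))
    # A track missing from the entry means the repo is not part of that track.
    return repos[best].get(track)
```

Under these assumptions, "kde/workspace/oxygen" resolves to "master" for the devel track and to nothing for the stable track, matching the note in the outline that it would not be included with "stable" at all.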

  • The excluded_repositories section is optional, and would be used in situations where it's easier to use wildcards to include too many repositories into the product/track, and then filter out the repositories that should not be part of the product. It might be easier just to spell out each repository however...
  • The dependencies section is pretty much what it says on the tin, and strengthens the "compatibility non-interference" and ordering properties of products into actual dependencies, and also allows for repository to repository dependencies to be expressed for the CI (this would replace the dependency-data-foo files in kde-build-metadata).
  • The objects under dependencies are mappings of tracks to the dependency information itself. The * track would be used for dependencies common to every track of that product.
  • dependencies/$track/products is to allow entire products to be declared a dependency (the track is not specified, since it's already required to be noted in products_needed), and is optional.
  • The dependencies/$track/repositories section on the other hand, should always be present, at least to specify intra-product dependencies as needed by both CI and kdesrc-build. These dependencies are between repositories, not products, and don't include any branch information (since branches are now entirely determined by which product/track combination contains a repository).
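
The validation tooling mentioned for wildcards could, for instance, check that no repository is claimed by more than one product within a branch group. The sketch below is one possible shape for such a check. The `claims` and `find_conflicts` functions, the "Networking" product, and the wildcard precedence rule are all hypothetical.

```python
from fnmatch import fnmatchcase


def claims(product, repo, track):
    """True if product/track contains repo after wildcards and exclusions."""
    if repo in product.get("excluded_repositories", []):
        return False
    repos = product.get("repositories", {})
    matches = [pattern for pattern in repos if fnmatchcase(repo, pattern)]
    if not matches:
        return False
    # Assumption: an exact name beats a wildcard, then the longest pattern wins.
    best = max(matches, key=lambda pattern: (pattern == repo, len(pattern)))
    return track in repos[best]


def find_conflicts(products, group, repos):
    """Return {repo: [product names]} for repos claimed more than once."""
    conflicts = {}
    for repo in repos:
        owners = []
        for name, product in sorted(products.items()):
            track = product.get("branch_group_tracks", {}).get(group)
            if track is not None and claims(product, repo, track):
                owners.append(name)
        if len(owners) > 1:
            conflicts[repo] = owners
    return conflicts


# Made-up metadata: the wildcard in "Plasma" also catches plasma-nm, which a
# second (hypothetical) product lists explicitly.
PRODUCTS = {
    "Plasma": {
        "branch_group_tracks": {"kf5-qt5": "devel"},
        "repositories": {"kde/workspace/*": {"devel": "master"}},
    },
    "Networking": {
        "branch_group_tracks": {"kf5-qt5": "devel"},
        "repositories": {"kde/workspace/plasma-nm": {"devel": "master"}},
    },
}
```

Here `find_conflicts(PRODUCTS, "kf5-qt5", ["kde/workspace/plasma-nm"])` reports both products as owners. Adding "kde/workspace/plasma-nm" to the excluded_repositories of "Plasma" resolves the conflict, which is the use case that section is meant for.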

Repository dependencies can cross product boundaries (which is why every repo is required to be part of some product/track combination). Cross-product dependencies would still require an entry in products_needed (Milou, in this case) to figure out which track to use.
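
To illustrate why products_needed is required for cross-product dependencies, the sketch below resolves the Milou dependency to a concrete branch. The `resolve_dependency` function is hypothetical, the metadata is heavily trimmed, and the "milou-stable" branch name is made up for the example. Wildcards are ignored here for brevity.

```python
# Hypothetical, heavily trimmed metadata. "milou-stable" is a made-up branch.
PRODUCTS = {
    "Plasma": {
        "products_needed": {
            "devel": {"Milou": "devel", "KF5": "devel"},
            "stable": {"Milou": "stable", "KF5": "devel"},
        },
    },
    "Milou": {
        "repositories": {
            "extragear/base/milou": {"devel": "master", "stable": "milou-stable"},
        },
    },
}


def resolve_dependency(products, product, track, dep_repo):
    """Return (owning product, its track, git branch) for dep_repo."""
    needed = products[product]["products_needed"][track]
    for other, other_track in needed.items():
        repos = products.get(other, {}).get("repositories", {})
        branches = repos.get(dep_repo, {})  # exact names only, no wildcards
        if other_track in branches:
            return other, other_track, branches[other_track]
    raise LookupError(
        f"{dep_repo} is not provided by any product in products_needed"
    )
```

The dependency itself names only the repository. The track, and therefore the branch, comes from the products_needed entry of the depending product/track, so Plasma/stable and Plasma/devel can resolve the same dependency to different branches.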

Next Steps

Porting to the proposed new system would require code changes in both build.kde.org and kdesrc-build, testing, and setup of the required metadata in kde-build-metadata, with the wider community to be kept informed as progress is made.

The hope with all of this is to manage the complexity that arises from the interdependencies of git repository+branch combinations, in a way that allows us to maintain the value of using our CI testing infrastructure without needlessly recompiling and reinstalling software that should be compatible, and to do all of this in a way that aligns with our intuitive understanding of how we now organize our projects.