Skip to content

Git shared hosting quirk

IT

Show https://github.com/torvalds/linux/blob/b4061a10fc29010a610ff2b5b20160d7335e69bf/drivers/hid/hid-samsung.c#L113-L118 to a friend.

Oops 'eh? Yep, Linux has been backdoored.

Well, or not.

Konstantin Ryabitsev explains it nicely in a cgit mailing list email:

It is common for git hosting environments to configure all forks of the same repo to use an "object storage" repository. For example, this is what allows git.kernel.org's 600+ forks of linux.git to take up only 10GB on disk as opposed to 800GB. One of the side-effects of this setup is that any object in the shared repository can be accessed from any of the forks, which periodically confuses people into believing that something terrible has happened.

The hack was discussed on Github in Dec 2018 when it was discovered. I forgot about it again but Konstantin's mail brought the memory back and I think it deserves more attention.

I'm sure putting some illegal content into a fork and sending a made up "blob" URL to law enforcement would go quite far. Good luck explaining the issue. "Yes this is my repo" but "no, no that's not my data" ... "yes, it is my repo but not my data" ... "no we don't want that data either, really" ... "but, but there is nothing we can do, we host on github...1".

Updates

05.11.20 Nate Friedman (CEO of Github) promises

[..] we are going to make it much more obvious when you're viewing an orphaned commit.

For context: The source code of Github (the product) had been leaked as a commit to Github's own DMCA repository. The repository has turned into a playground since Github took down the hosting for youtube-dl as the result of a DMCA complaint.

14.11.20 Seems Github now adds a warning to commits that are not in a reachable branch Github commit warning message

28.01.22 Github currently fails to show the warning message, so https://github.com/torvalds/linux/tree/8bcab0346d4fcf21b97046eb44db8cf37ddd6da0 is making rounds now: Fake commit to Linus Torvalds' kernel repo updating the README file and claiming to have deleted Linux


  1. Actually there is something you can do. Making a repo private takes it out of the shared "object storage". You can make it public again afterwards. Seems to work at least for now. 

Debian Gitlab (salsa.debian.org) tricks

Debian

Debian is moving the git hosting from alioth.debian.org, an instance of Fusionforge, to salsa.debian.org which is a Gitlab instance.

There is some background reading available on https://wiki.debian.org/Salsa/. This also has pointers to an import script to ease migration for people that move repositories. It's definitely worth hanging out in #alioth on oftc, too, to learn more about salsa / gitlab in case you have a persistent irc connection.

As of now() salsa has 15,320 projects, 2,655 users in 298 groups.
Alioth has 29,590 git repositories (which is roughly equivalent to a project in Gitlab), 30,498 users in 1,154 projects (which is roughly equivalent a group in Gitlab).

So we currently have 50% of the git repositories migrated. One month after leaving beta. This is very impressive.
As Alioth has naturally accumulated some cruft, Alexander Wirt (formorer) estimates that 80% of the repositories in use have already been migrated.

So it's time to update your local .git/config URLs!

Mehdi Dogguy has written nice scripts to ease handling salsa / gitlab via the (extensive and very well documented) API. Among them is list_projects that gets you nice overview of the projects in a specific group. This is especially true for the "Debian" group that contains the former collab-maint repositories, so source code that can and shall be maintained by Debian Developers collectively.

Finding migrated repositories

Salsa can search quite quickly via the Web UI: https://salsa.debian.org/search?utf8=✓&search=htop

Salsa search screenshot

but finding the URL to clone the repository from is more clicks and ~4MB of data each time (yeah, the modern web), so

$ curl --silent https://salsa.debian.org/api/v4/projects?search="htop" | jq .
[
  {
    "id": 9546,
    "description": "interactive processes viewer",
    "name": "htop",
    "name_with_namespace": "Debian / htop",
    "path": "htop",
    "path_with_namespace": "debian/htop",
    "created_at": "2018-02-05T12:44:35.017Z",
    "default_branch": "master",
    "tag_list": [],
    "ssh_url_to_repo": "git@salsa.debian.org:debian/htop.git",
    "http_url_to_repo": "https://salsa.debian.org/debian/htop.git",
    "web_url": "https://salsa.debian.org/debian/htop",
    "avatar_url": null,
    "star_count": 0,
    "forks_count": 0,
    "last_activity_at": "2018-02-17T18:23:05.550Z"
  }
]

is a bit nicer.

Please notice the git url format is a bit odd, it's either
git@salsa.debian.org:debian/htop.git or
ssh://git@salsa.debian.org/debian/htop.git.

Notice the ":" -> "/" after the hostname. Bit me once.

Finding repositories to update

At this time I found it useful to check which of the repositories I have cloned had not yet been updated in the local .git/config:

find ~/debconf ~/my_sources ~/shared -ipath '*.git/config' -exec grep -H 'url.*git\.debian' '{}' \;

Thanks to Jörg Jaspert (Ganneff) the Debconf repositories have all been moved to Salsa now.
Hint: Bug him for his scripts if you need to do complex moves.

Updating the URLs has been an hours work on my side and there is little you can do to speed that up if - as in the Debconf case - teams have used the opportunity to clean up and things are not as easy as using sed -i.

But there is no reason to do this more than once, so for the laptops...

Speeding up migration on multiple devices

rsync -armuvz --existing --include="*/" --include=".git/config" --exclude="*" ~/debconf/ laptop:debconf/

will rsync the .git/config files that you changed to other systems where you keep partial copies.

On these a simple git pull to get up to remote HEAD or using the git_pull_all one-liner from https://daniel-lange.com/archives/99-Managing-a-project-consisting-of-multiple-git-repositories.html will suffice.

Git short URL

Stefano Rivera (tumbleweed) shared this clever trick:

git config --global url."ssh://git@salsa.debian.org/".insteadOf salsa:

This way you can git clone salsa:debian/htop.

Managing a project consisting of multiple git repositories

IT

The core team organizing DebConf, the annual Debian developer conference, reached out to me two weeks ago to help support this year's effort a bit.

I'm very happy to do so as Debian is a cornerstone of everything I do in the Open Source/Free Software space.

Screenshot of git_pull_all with color

To get me started I got access to a lot of mailing lists and irc channels. And even more git repositories. So many that the DebConf team even has an instruction page on how the repositories all fit together.

It's unfortunately quite common to split a bigger project into many git repositories to ease access rights management and reduce the noise and data transfer volume for the average user. The downside is, everybody ends up with a dozen or more individual repositories to keep pulling. And then there's git annex for yet another level of indirection.

Joey Hess, a former Debian developer, has even written an extensive tool, myrepos, to meta-manage the different repositories and it can do quite some magic across different SCMSs1. In my case this is a bit of an overkill though.

And using myrepos may get you confused at some point whether to now run mr or git directly for each batch of repos you have inherited over some time of working on multiple projects.2 Thus I prefer the simple route:

Check out each repository into a common top-level directory (~/debconf/ in this case) and then put the following two lines into an executable script git_pull_all into that top level directory:

#!/bin/sh
find ~/debconf -mindepth 1 -maxdepth 1 -type d -exec sh -c "cd {}; test -r .git/config && git pull $*" \;

This will allow you to pull all git repos with one command and keep the normal syntax for everything else you do with each repo.

The --mindepth and --maxdepth will instruct find to just go and run your git pull only inside each direct child of the top level directory. So recursion depth = 1. That is the single trick there is to this.

Updates:

If you like to have some color and a bit of a spaced layout for improved readability, try:

#!/bin/sh
find ~/debconf -mindepth 1 -maxdepth 1 -type d -exec sh -c "cd {}; test -r .git/config && (printf \"\033[1m\033[34m%-50s\033[0m\" \"\${PWD}:\" ; git pull $*)" \;

When you have pull.rebase=true set in your .gitconfig, you can run ./git_pull_all --no-rebase to avoid rebases in case you work somewhere and want to have the merge commits.

P.S.: The DebConf15 Heidelberg registration just opened, please check the DebConf15 homepage for news, venue information and please register if you want to come around.


  1. Source Code Management Systems, like git, mercurial (hg) or subversion (svn). Or God forbid ... cvs. I don't like the (D)VCS (Distributed) Version Control Systems moniker. Because that's not really all these systems do. Not even the most important piece of what they do these days. 

  2. With myrepos you can still work with each individual repository via git. Just so nobody will write in "but...".