Both at work and in my free time I interact with lots of different git repositories - across my machines I usually have about 100 different repositories checked out. Maintaining these clones by hand might be possible, but I am way too lazy for that and have written a Go program called ggman to manage all of them in a simple way.
Lots of repositories
Sometime around late 2013 when studying at university I was contributing to a research project called MathHub as part of working in a research group.
As part of the project, the group created various different git repositories on a private GitLab instance gl.mathhub.info that held source code in a DSL.
When working with this content, it was normal for me and others to constantly have somewhere between 20 and 50 repositories, organized in different GitLab groups, cloned on our machines.
We then built various other content from these sources.
While the git client could take care of the actual cloning, it soon became clear that maintaining these clones by hand was tedious and needed tool support.
For this purpose we ended up building a tool in Python -- called LocalMathHub or lmh for short -- to maintain a local tree of cloned repositories, running git clone and git pull on them as needed.
I took over lmh sometime in early 2014.
lmh cloned the various repositories into a structure that mirrored the repository / group structure on the GitLab instance.
For instance the repository gl.mathhub.info/hello/world would end up in a folder $HOME/MathHub/hello/world, the repository meta/inf would end up in a folder $HOME/MathHub/meta/inf and so on.
It furthermore supported our domain-specific build processes, deriving several research assets.
To this end, lmh resolved dependencies between repositories, acting very much like our own package manager similar to npm or pip.
Our building processes also required a full LaTeX installation, so the tool ended up being Dockerized.
Writing GitManager
Fast forward a couple of years to 2016 and I was working with lots of different code in lots of different repositories from GitHub.
I liked a somewhat organized hard disk, so I again wanted to have a local tree mirroring the structure of repositories on GitHub.
But as GitHub was not our private GitLab instance, lmh didn't support it.
Besides, lmh was very much focused on our specific building process and was Dockerized - that seemed overkill for the task at hand.
So I wrote a very simple tool in Python, which I called GitManager, that did exactly this. It was used by setting up a configuration file like:
> Projects
>> tkw1536
https://github.com/tkw1536/GitManager.git
https://github.com/tkw1536/tkw01536.de.git
https://github.com/tkw1536/guys.wtf.git
This file says to create a "Projects" folder. Then inside it, create a "tkw1536" subfolder. Finally clone the three git repositories into this folder.
With the help of GitManager I could also use commands like git-manager pull or git-manager push to pull or push all local repositories.
This made my job simpler, but I now had to maintain this configuration file.
I eventually added a function to automatically discover newly cloned repositories and rewrite the configuration file for me.
This helped a bit more, and I ended up using git-manager for all my clones for a couple years.
Design goals for ggman
Fast forward again a couple more years to 2019 and I got annoyed at needing to update that configuration file. So I decided to redo my repository management.
I looked around and there were other tools at the time - however these usually had some downsides:
- They were limited to one repository provider. For example GitHub CLI only worked with GitHub repositories. At the same time GitLab CLI only worked with GitLab repositories.
- Tools typically encouraged a flat directory structure. For example, they cloned repositories directly under a Projects folder. Two repositories that had little to do with each other might end up on disk directly next to each other.
- Some tools were only available from within an IDE or GUI.
Each of these was only an annoyance on its own -- but taken together they meant existing tools could not do what I wanted.
As I was starting out with Go at the time, I decided to use the opportunity to learn the language properly and start writing my own tool.
I couldn't think of a good name, so I eventually settled on ggman - with "man" standing for "manager" and the two g's standing for "git" and "go".
In order to best fit my own workflow, and to prevent me having to rewrite the tool again in the future, I decided on several goals for ggman.
As I have been using ggman without any major rewrites[1] ever since, I consider ggman and these goals successful.
In particular, I decided ggman should:
- be command-line first;
- be simple to install, configure and use;
- encourage an obvious hierarchical directory structure, but remain fully functional with any directory structure;
- remain free of repository provider-specific code; and
- not store any repository-specific data outside of the repositories themselves (enabling the user to switch back to only git at any point).
In order to explain how ggman is designed to achieve these goals, I feel like it is best to describe how to install and use it.
The source code of ggman lives on GitHub. It builds into a single binary, and installing it means dropping that binary into a directory on the user's $PATH.
The binary uses the git executable if the user has it installed, and automatically falls back to the go-git library if not.
The user can optionally configure several shell aliases by evaluating the output of ggman shellrc in their shell's profile.
Cloning repositories with ggman
Once installed, ggman manages all git repositories inside a given root directory, and automatically sets up new repositories relative to the URLs they are cloned from.
This root folder defaults to ~/Projects, but can be customized using a $GGROOT environment variable.
The first ggman command users will likely interact with is one like the following:
$ ggman clone https://github.com/tkw1536/ggman.git
Cloning "git@github.com:tkw1536/ggman.git" into "/Users/whoever/Projects/github.com/tkw1536/ggman" ...
Cloning into '/Users/whoever/Projects/github.com/tkw1536/ggman'...
remote: Enumerating objects: 133, done.
remote: Counting objects: 100% (133/133), done.
remote: Compressing objects: 100% (130/130), done.
remote: Total 133 (delta 16), reused 23 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (133/133), 188.95 KiB | 355.00 KiB/s, done.
Resolving deltas: 100% (16/16), done.
The ggman clone command is intended to clone a repository into the local directory structure.
It achieves this using several steps:
1. Parse the provided URL into its so-called URL components. Here, the components of a URL are the hostname, the username and the '/'-separated elements of the path. A username of git as well as a trailing suffix of .git are dropped.

   Some examples:

   | URL | Components |
   | --- | --- |
   | git@github.com/user/repo | github.com, user, repo |
   | github.com/hello/world.git | github.com, hello, world |
   | gitlab.com/some/repo | gitlab.com, some, repo |
   | user@server.com:repo.git | server.com, user, repo |

   The ggman comps command is a utility that allows us to print out the components of a specific URL:

   $ ggman comps https://github.com/tkw1536/ggman.git
   github.com
   tkw1536
   ggman

   Notice how the components of a URL are identical if cloned via SSH:

   $ ggman comps git@github.com:tkw1536/ggman.git
   github.com
   tkw1536
   ggman

   This means the same underlying operation happens, regardless of whether the https or ssh URL is passed to ggman clone. This component abstraction is not specific to GitHub - it allows ggman to remain provider independent and works with (almost) any repository host[2].

2. Assign the repository a local path using these components, and create parent folders as needed. In this case the target path would be $GGROOT/github.com/tkw1536/ggman. The ggman clone command above would create the $GGROOT/github.com and $GGROOT/github.com/tkw1536 folders as needed.

3. Figure out which URL to clone the repository from. This is achieved by turning the components back into a form which git understands. We can inspect this using the ggman canon command. In our case:

   $ ggman canon https://github.com/tkw1536/ggman.git
   git@github.com:tkw1536/ggman.git

   As you can see, ggman defaults to cloning using an ssh clone URL. This can be configured using a so-called CANSPEC (short for "canonization specification"), but I won't go into detail here.

4. Finally, invoke the git command to actually clone the repository.
Finding and performing actions on repositories
But ggman can do more than clone new repositories.
It can also perform actions across existing repositories.
Actions in principle take the form ggman [FILTERS] ACTION.
The supported actions are things which effectively map to plain git commands, such as:
- ggman pull, which pulls changes from remotes into the local repositories;
- ggman push, which pushes changes to remote repositories;
- ggman exec COMMAND, which directly invokes an external command; or
- ggman ls, which prints a list of local repositories.
By default, any action will act on all repositories existing in some sub-directory of $GGROOT.
For example:
$ ggman ls
/Users/whoever/Projects/github.com/hello/world
/Users/whoever/Projects/github.com/tkw1536/ggman
/Users/whoever/Projects/github.com/tkw1536/tkw01536.de
/Users/whoever/Projects/gitlab.com/lorem/ipsum
This lists all locally cloned repositories.
It is also possible to only act on a subset of repositories using a "FILTER" argument.
The simplest one is the --for filter, which fuzzy matches against repositories.
For example:
$ ggman --for github.com ls
/Users/whoever/Projects/github.com/hello/world
/Users/whoever/Projects/github.com/tkw1536/ggman
/Users/whoever/Projects/github.com/tkw1536/tkw01536.de
This command lists only repositories that match "github.com" in their URL.
It no longer shows the gitlab.com repository from above.
As the matching is fuzzy, it also allows omitting characters or components. For example:
$ ggman --for lo/ips ls
/Users/whoever/Projects/gitlab.com/lorem/ipsum
will only match the lorem/ipsum repository.
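The exact matching rules are not spelled out here, so the following is only one plausible way to get the behaviour shown in the two examples, not a description of ggman's real algorithm. It assumes that a pattern is split on '/', that each piece must match a consecutive component of the repository, and that a piece matches when its characters appear in order inside the component. The names isSubsequence and matches are invented for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// isSubsequence reports whether all characters of pattern occur in s in
// order (not necessarily adjacent), ignoring case.
func isSubsequence(pattern, s string) bool {
	p := []rune(strings.ToLower(pattern))
	i := 0
	for _, r := range strings.ToLower(s) {
		if i < len(p) && p[i] == r {
			i++
		}
	}
	return i == len(p)
}

// matches reports whether the '/'-separated pattern fuzzy matches some
// consecutive run of the repository's components.
func matches(pattern string, comps []string) bool {
	parts := strings.Split(strings.Trim(pattern, "/"), "/")
	for start := 0; start+len(parts) <= len(comps); start++ {
		ok := true
		for i, part := range parts {
			if !isSubsequence(part, comps[start+i]) {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	repos := [][]string{
		{"github.com", "hello", "world"},
		{"github.com", "tkw1536", "ggman"},
		{"github.com", "tkw1536", "tkw01536.de"},
		{"gitlab.com", "lorem", "ipsum"},
	}
	for _, pattern := range []string{"github.com", "lo/ips"} {
		for _, comps := range repos {
			if matches(pattern, comps) {
				fmt.Println(pattern, "matches", strings.Join(comps, "/"))
			}
		}
	}
}
```

Matching against components rather than the raw URL means the pattern behaves the same whether a repository was cloned via https or ssh.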
For convenience ggman provides two shell aliases that make use of the --for filter:
- ggcd PATTERN, which finds a repository matching a given pattern and cds into it. For example, ggcd lo/ip would cd into /Users/whoever/Projects/gitlab.com/lorem/ipsum. This makes it extremely simple to find the project belonging to a repository and start working on it.
- ggcode PATTERN, which is like ggcd except that it opens a Visual Studio Code instance in the desired directory. This makes it extremely quick to start coding on a specific repository, without having to navigate through various user interfaces.
Another filter worth mentioning is the special --here filter which only matches the repository in the current directory (even if not under $GGROOT).
There are also other filters, see the README for a complete list.
Other ggman functionality
ggman has a lot more functionality, but describing everything would make this blog post much longer.
I would however like to quickly mention a couple of other commands:
- ggman web, which opens the current repository in a web browser. This is useful to quickly use GitHub's web interface to look at issues, or check on the status of CI.
- ggman relocate, which moves cloned repositories into the paths that ggman clone would have cloned them to. Together with the --here filter, this allows moving repositories not originally within $GGROOT into the structure.
- ggman fix, which updates remote URLs to use their canonical variant. Useful e.g. if some repositories were manually cloned using https.
If you are interested, I encourage you to have a look at the README or ask me.
Conclusion
And that is already all I want to say for now, thank you for reading 😀.
In summary: I work with lots of git repositories.
I wrote a tool called ggman to maintain and expand a local directory structure of all of these.
It can locate where specific repositories are cloned to, and run operations such as git pull or git push across all of them.
Feel free to try it out and give me feedback at https://github.com/tkw1536/ggman.
[1] Unless you count me changing several implementation details under the hood, but I do not.
[2] The concept of components works great with only one exception: URLs with custom ports like git@my.domain.com:2222/hello/world.git. The workaround I've used so far is to skip canonicalization for those, or to set up a config file specific to the host.