A customer wanted to move away from IBM Rational Synergy (called Synergy from here on) to Git. This post looks at a few options for doing this and their trade-offs, and describes the option the customer chose. The actual migration, using PySynergy, is briefly described. An attempt was also made at creating a new migration tool, but that effort was abandoned because it took too much time.
Please note that this article (over)simplifies the Synergy model. It is not meant to describe the Synergy model.
The customer wanted to move away from IBM Rational Synergy. Because Synergy had been the way of working for several years, a lot of history had been recorded. That history is very useful for things such as tracking down bugs or implementation decisions. Or at least for finding the developer who made the decision.
There are several ways of keeping history when migrating. The first option is, of course, to migrate nothing. Given our need, this is not what we wanted. The second option is to keep each version of each release of a project. This means that each released version of a project is recorded in Git. As a result, you can see what has changed between releases of a project, but you cannot see which developer changed which parts. The final option is to record each change. This option records all history and allows you to do better analysis on the history. Given the need of the developers, the last option was chosen.
When developers were asked how much history they wanted to preserve, several of them wanted all of it. They gave the earlier argument: preserving all history allows for tracking down bugs and implementation decisions.
Synergy and Git don’t share the same approach to versioning. First of all, Synergy is a tool which provides much more functionality than Git. Synergy is a Software Configuration Management (SCM) tool which Wikipedia (as of 21 May 2018) describes as follows:
“Rational Synergy is a software tool that provides software configuration management (SCM) capabilities for all artifacts related to software development including source code, documents and images as well as the final built software executable and libraries. Rational Synergy also provides the repository for the change management tool known as Rational Change. Together these two tools form an integrated configuration management and change management environment that is used in software development organizations that need controlled SCM processes and an understanding of what is in a build of their software.”
Although Synergy does have versions for each individual file, changes to files are usually combined in tasks, which have their own state. A developer can create a task and change several files under it in their own local version of the project (even changing the same file multiple times, resulting in multiple versions). When the developer completes the task, the versions are distributed to the main project and to each developer's local version of the project.
A task seems to map nicely to a commit in Git. In practice though, multiple tasks are developed in parallel (each developer works on their own task). In our case, this resulted in a highly spaghettified history in Git, which makes it hard to make sense of the commit history graph. Nevertheless, other functionality such as `git blame` is still a valuable tool.
The question is whether you really need the full history. It really is a trade-off between costs and gains. Abandoning history is free, but there are no gains. Keeping history at the project-release level is easy to do (just check out each project release in Synergy, copy it to your Git repository and commit all the changes), but gains are limited. Migrating full history takes a lot of effort, but it preserves the most history. If you hardly use the history at all, then you should probably stick to keeping history at the project-release level. If you heavily rely on the history, then it would be a good decision to migrate as much as you can.
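The project-release approach can be scripted in a few lines. The following is a minimal sketch, assuming each release has already been checked out of Synergy into its own directory. The function names and the `releases` list are made up for illustration, and the `run` parameter only exists so the Git calls can be replaced during a dry run.

```python
import shutil
import subprocess
from pathlib import Path


def clear_worktree(repo: Path) -> None:
    """Remove everything from the working tree except the .git directory."""
    for entry in repo.iterdir():
        if entry.name == ".git":
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def commit_release(repo: Path, release_dir: Path, label: str,
                   run=subprocess.run) -> None:
    """Replace the working tree with one release snapshot and commit it."""
    clear_worktree(repo)  # so files deleted between releases disappear too
    shutil.copytree(release_dir, repo, dirs_exist_ok=True)  # Python 3.8+
    run(["git", "add", "--all"], cwd=repo, check=True)
    run(["git", "commit", "--allow-empty", "-m", f"Import release {label}"],
        cwd=repo, check=True)


def import_releases(repo: Path, releases, run=subprocess.run) -> None:
    """releases: iterable of (label, directory) pairs, oldest first."""
    for label, release_dir in releases:
        commit_release(repo, Path(release_dir), label, run=run)
```

Clearing the working tree before each copy matters: without it, files that were removed between two releases would silently survive in Git.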
Fortunately, a team at Nokia had already created PySynergy. This is a tool which migrates all history between two given releases (incorporating all releases in between). It does this using the task-based approach described before. The tool is no longer maintained and is a bit rough (as can be expected from migration software), but it satisfied our needs.
Migration is done in two parts. The first part uses the command line interface (CLI) of Synergy to query all relevant history and stores it locally for faster access. The second part converts the locally stored history to the Git fast-import format. This export can then easily be imported by Git.
It should be noted, though, that the conversion is not that straightforward. Synergy does not keep a detailed history as Git does. It keeps a current version of a project and records which tasks are applied to that project. You cannot go back to a specific time for a project. Synergy simply does not have that information. Also, tasks can span multiple projects at the same time. Furthermore, one task can add a file to a project and a following task can remove that file. The file is then no longer part of the project structure, but the task still holds this information. PySynergy, however, does its best to reconstruct as much as it can.
Running the first part of PySynergy will probably take the most time. The CLI of Synergy isn't that fast, and PySynergy needs to call it multiple times for each object (file, project, task, …). It is very likely that the first part of the migration needs multiple days, depending on the size (number of releases, files, tasks) of the project. And, of course, you’ll need an active instance of Synergy, otherwise the CLI will not work. Also, beware that PySynergy was written for an older version of Synergy. Some parts of the CLI have been deprecated in newer versions of Synergy, and the output of the CLI may differ from what it was when PySynergy was written. Finally, as warned by others, PySynergy also requires a fairly clean way of working with Synergy.
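To give an idea of what a fast-import stream looks like, here is a minimal sketch that turns a list of tasks into one. This is not PySynergy's code. The `TaskCommit` class is a hypothetical, heavily simplified stand-in for a completed task, and all timestamps are assumed to be in UTC.

```python
from dataclasses import dataclass, field


@dataclass
class TaskCommit:
    """Hypothetical, simplified view of one completed Synergy task."""
    author: str
    email: str
    timestamp: int  # seconds since the Unix epoch, assumed UTC
    message: str
    changed: dict = field(default_factory=dict)  # path -> file content (bytes)
    deleted: list = field(default_factory=list)  # paths removed by the task


def data_block(payload: bytes) -> bytes:
    """Encode a payload as a fast-import 'data' command with an exact byte count."""
    return b"data %d\n%s\n" % (len(payload), payload)


def to_fast_import(tasks, ref: str = "refs/heads/master") -> bytes:
    """Build a fast-import stream with one commit per task, oldest first."""
    out = []
    for mark, task in enumerate(tasks, start=1):
        ident = f"{task.author} <{task.email}> {task.timestamp} +0000".encode()
        out.append(f"commit {ref}\n".encode())
        out.append(b"mark :%d\n" % mark)
        out.append(b"author " + ident + b"\n")
        out.append(b"committer " + ident + b"\n")
        out.append(data_block(task.message.encode()))
        for path in task.deleted:
            out.append(f"D {path}\n".encode())
        for path, content in sorted(task.changed.items()):
            # 'inline' means the file content follows directly as a data block.
            out.append(f"M 100644 inline {path}\n".encode())
            out.append(data_block(content))
        out.append(b"\n")
    return b"".join(out)
```

Writing the result to a file and feeding it to `git fast-import` inside an empty repository creates the commits. The stream is built as bytes because the `data` command counts bytes, not characters.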
In the end we migrated the history using PySynergy. Although it took a while, the history seems to be in order.
During the migration I noticed a Synergy backup file. After inspecting this file, it was fairly clear that it is simply a database dump, record by record, and that it includes the version history. I wrote a Python script to convert the database dump to a SQLite database which we can run queries on directly. This effort can be found in the ccm_backup_reader repository on GitHub. Lots of information on Synergy can be found by creative googling. Internals, such as the database structure, can be derived from several articles. It should be noted that this effort was taken on as my own side project, in my own time, and thus time was limited.
To speed up the first part of the migration I tried to create a substitute for the CLI tool. This substitute tries to mimic the workings of the original CLI tool and presents results in the same manner. Some commands are rather straightforward, such as getting the attributes for an object (file, task, project, …). Other commands, such as running queries, are less transparent because the inner workings are not publicly visible. The effort was abandoned after a while and never tested.
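As an illustration of why SQLite is convenient here, the sketch below loads records into a table and looks up the attributes of one object. The table layout and the record shape (object id, attribute name, value) are assumptions made up for this example. They are not the actual Synergy dump format, nor the schema used by ccm_backup_reader.

```python
import sqlite3


def load_records(conn: sqlite3.Connection, records) -> None:
    """Store (object_id, name, value) tuples in a hypothetical 'attrib' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attrib "
        "(object_id TEXT, name TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO attrib VALUES (?, ?, ?)", records)
    conn.commit()


def attributes_of(conn: sqlite3.Connection, object_id: str) -> dict:
    """Return all attributes of one object as a {name: value} dict."""
    rows = conn.execute(
        "SELECT name, value FROM attrib WHERE object_id = ? ORDER BY name",
        (object_id,),
    )
    return dict(rows.fetchall())
```

A single indexed query like this replaces a call to the Synergy CLI per object, which is where most of the migration time goes.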
I also tried to implement a migration strategy myself. Given that we have direct access to the database (SQLite) ourselves, we can directly perform queries. There is no need to read history and store it somewhere. To support this, a very simple object model on top of the database was implemented. Using these objects, I tried to replay tasks on projects. Although this is very fast (compared to PySynergy), given the loose history of Synergy this attempt failed. In my case, the tasks were not always bound to a single version of a project, or even a single project. As a result some changes cannot be applied.
Note that this is a different approach than PySynergy takes. PySynergy does not replay tasks, but rather takes differences between two releases of a project and derives changes, tasks, etc from these differences.
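The difference-based idea can be sketched as follows. This is a simplified illustration, not PySynergy's actual implementation. A release is assumed to be a plain dictionary mapping file paths to object versions, and `task_of` is a hypothetical lookup from a (path, version) pair to the task that created that version.

```python
def diff_releases(old: dict, new: dict) -> dict:
    """Compare two release snapshots given as {path: object version}."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "modified": sorted(
            path for path in old_paths & new_paths if old[path] != new[path]
        ),
    }


def group_by_task(changes: dict, new: dict, task_of: dict) -> dict:
    """Group added and modified paths by the task that produced the new version.

    task_of is a hypothetical mapping {(path, version): task id}. Versions
    without a known task are collected under "unknown".
    """
    grouped = {}
    for path in changes["added"] + changes["modified"]:
        task = task_of.get((path, new[path]), "unknown")
        grouped.setdefault(task, []).append(path)
    return grouped
```

Removed files have no version in the newer release, so this sketch cannot attribute removals to a task. That is one example of the information that has to be reconstructed some other way.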
I stopped working on this project as my time was limited and PySynergy seemed to work. If you wish to migrate your Synergy history yourself, without using PySynergy, then working directly on the database dump is probably your best bet. Be warned though, this is no easy task.
Migrating history from Synergy to Git is certainly possible. It should be noted that, depending on the level of history one wants to preserve, this is no simple task.