Subversion Dump File Validation

Common result from converting a complex non-Subversion source code repository to a Subversion dump file
Hi everyone,
Many new SourceHosting.net clients have existing repositories that they want to import into a Subversion repository here. If we’re lucky, the client already uses Subversion, and importing the repository is as simple as dumping and loading it.
Converting from CVS to SVN is quite easy with the cvs2svn script: it offers a flexible set of command-line options, and we rarely run into problems. If, however, the client uses a different source code control system, the process can get trickier.
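For readers who haven't done it before, here is a rough sketch of the dump-and-load step, driven from Python. The commands are the standard `svnadmin create`, `svnadmin dump` and `svnadmin load`; the function names and any repository paths you pass in are hypothetical, and this is an illustration rather than the exact procedure we run.

```python
import subprocess


def dump_command(repo_path):
    # "svnadmin dump" writes the dump stream to standard output.
    return ["svnadmin", "dump", "--quiet", repo_path]


def load_command(repo_path):
    # "svnadmin load" reads a dump stream from standard input.
    return ["svnadmin", "load", "--quiet", repo_path]


def copy_repository(source_repo, target_repo):
    """Create target_repo, then pipe a dump of source_repo into it."""
    subprocess.run(["svnadmin", "create", target_repo], check=True)
    dump = subprocess.Popen(dump_command(source_repo), stdout=subprocess.PIPE)
    load = subprocess.run(load_command(target_repo), stdin=dump.stdout)
    dump.stdout.close()
    if dump.wait() != 0 or load.returncode != 0:
        raise RuntimeError("svnadmin dump/load failed")
```

Calling `copy_repository` with an existing repository path and a new, empty target path is the whole import when the source is already Subversion.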
There are several repository converters available for download, including Polarion Importer for SVN and VSS2SVN. These tools are welcome additions to a release engineer’s bag of tricks, especially if different groups in an organization have not standardized on a single SCM system and you’re trying to convert everyone to Subversion.
As we’ve used the tools mentioned above more and more frequently, we’ve found that they work well for new clients with reasonably small and uncomplicated repositories. However, as complexity grows with more tags, branches, and merge points, so does the likelihood of producing a corrupted or logically incorrect Subversion dump file.
This problem is illustrated by a recent repository conversion in which the new client used a tool to convert their existing VSS repository to a Subversion dump file in preparation for their migration to SourceHosting.net. On cursory inspection the dump file contents looked reasonable, but partway through the load, svnadmin reported the following error:
    <<< Started new transaction, based on original revision 33
    svnadmin: File not found: transaction '30-1', path '/src/docs/ChangeLog'
         * editing path : src/docs/ChangeLog ...
After reading through the dump file, we discovered that it contained a sequence of operations on the “/src/docs/ChangeLog” file that occurred before the file had been added to the repository. We’ve also run across negative-numbered revisions and attempts to delete files from locations that don’t exist. All of these situations will abort repository loading.
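To make these failure modes concrete, here is a minimal sketch of a checker for them. It is written in Python rather than with the Perl SVN::Dumpfile module, and the names `read_records` and `validate` are ours, purely for illustration. It relies only on the basic dump layout: header lines, a blank line, then `Content-length` bytes of body. It also makes simplifying assumptions: a copy is treated as copying whatever currently lives under the source path (the copy-from revision is ignored), and node kinds and property contents are not examined.

```python
import io


def read_records(stream):
    """Yield one dict of headers per dump record, skipping record bodies."""
    while True:
        line = stream.readline()
        if not line:
            return
        if not line.strip():
            continue  # blank separator between records
        headers = {}
        while line.strip():
            text = line.decode("utf-8", errors="replace").rstrip("\r\n")
            key, _, value = text.partition(": ")
            headers[key] = value
            line = stream.readline()
        # The body (properties and/or file text) is Content-length bytes.
        stream.read(int(headers.get("Content-length", 0)))
        yield headers


def validate(stream):
    """Return a list of human-readable problems found in a dump stream."""
    problems = []
    live = set()  # paths that exist at this point in the history
    rev = None
    for h in read_records(stream):
        if "Revision-number" in h:
            rev = int(h["Revision-number"])
            if rev < 0:
                problems.append(f"negative revision number {rev}")
            continue
        if "Node-path" not in h:
            continue  # format-version or UUID record
        path = h["Node-path"].strip("/")
        action = h.get("Node-action")
        if action in ("change", "delete", "replace") and path not in live:
            problems.append(f"r{rev}: {action} on missing path '{path}'")
        if action in ("delete", "replace"):
            # Deleting a directory also removes everything beneath it.
            live = {p for p in live
                    if p != path and not p.startswith(path + "/")}
        if action in ("add", "replace"):
            src = h.get("Node-copyfrom-path")
            if src is not None:
                # Assumption: copy the source's current children.
                src = src.strip("/")
                live |= {path + p[len(src):]
                         for p in live if p.startswith(src + "/")}
            live.add(path)
    return problems
```

Opening a dump file in binary mode and passing it to `validate` returns an empty list when none of these problems are found. A dump like the one described above would instead produce an entry naming the revision and the path that was changed before it was added.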
These errors aren’t difficult to detect, or even to fix: the Perl CPAN module SVN::Dumpfile lets us parse the dump file and rewrite it slightly. The module extracts data from a dump file and also allows new data to be inserted into it.
To make it easier to test software that generates Subversion dump files, we are building on the work of SVN::Dumpfile and creating a sequence of dump file validation tests. Not even the svnadmin tests included in the Subversion distribution parse dump files, so perhaps this work will eventually migrate into that suite as well.
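As a rough idea of what a sequence of validation tests could look like, here is a small Python sketch (not our actual test suite, and not based on SVN::Dumpfile). Each check is a named function run over the parsed record headers. The two checks shown, the check names, and the helper `read_headers` are all illustrative assumptions.

```python
import io


def read_headers(stream):
    """Return a list of header dicts, one per record, skipping bodies."""
    records = []
    line = stream.readline()
    while line:
        if line.strip():
            headers = {}
            while line.strip():
                text = line.decode("utf-8", errors="replace").strip()
                key, _, value = text.partition(": ")
                headers[key] = value
                line = stream.readline()
            # Skip the record body so its contents are never parsed as headers.
            stream.read(int(headers.get("Content-length", 0)))
            records.append(headers)
        line = stream.readline()
    return records


def has_format_header(records):
    # A dump stream starts with its format-version record.
    return bool(records) and "SVN-fs-dump-format-version" in records[0]


def revisions_increase(records):
    revs = [int(r["Revision-number"])
            for r in records if "Revision-number" in r]
    return all(a < b for a, b in zip(revs, revs[1:]))


# Checks run in order; new ones are added by appending to this list.
CHECKS = [
    ("format header first", has_format_header),
    ("revisions increase", revisions_increase),
]


def run_checks(stream):
    """Run every check and return a dict of check name -> pass/fail."""
    records = read_headers(stream)
    return {name: check(records) for name, check in CHECKS}
```

Keeping each test as a small function over already-parsed records means a converter's output can be run through the whole list at once, with the result showing which checks failed.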
If you have any suggestions or ideas for specific tests that you would like included, please let us know.
Link Summary
- http://svnbook.red-be....svnadmin.c.dump.html
- http://svnbook.red-be....svnadmin.c.load.html
- http://cvs2svn.tigris.org/
- http://cvs2svn.tigris.../cvs2svn.html#cmd-ref
- http://community.pola...p;project=svnimporter
- http://www.pumacode.org/projects/vss2svn
- http://search.cpan.org/dist/SVN-Dumpfile/
- http://svn.collab.net...ine/svnadmin_tests.py