cvs2svn Features

The primary goal of cvs2svn is to migrate as much information as possible from your old CVS repository to your new Subversion or git repository.

Unfortunately, CVS doesn't record complete information about your project's history. For example, CVS doesn't record what file modifications took place together in the same CVS commit. Therefore, cvs2svn attempts to infer from CVS's incomplete information what really happened in the history of your repository. So the second goal of cvs2svn is to reconstruct as much of your CVS repository's history as possible.

The third goal of cvs2svn is to allow you to customize the conversion process and the form of your output repository as flexibly as possible. cvs2svn has very many conversion options that can be used from the command line, many more that can be configured via an options file, and provides many hooks to allow even more extreme customization by writing Python code.

Feature summary

No information lost
cvs2svn works hard to avoid losing any information from your CVS repository (unless you specifically ask for a partial conversion using --trunk-only or --exclude).
Changesets
CVS records modifications file-by-file, and does not keep track of what files were modified at the same time. cvs2svn uses information like the file modification times, log messages, and dependency information to deduce the original changesets. cvs2svn allows changesets that affect multiple branches and/or multiple projects (as is allowed by CVS), or it can be configured to split such changesets up into separate commits (--no-cross-branch-commits; see also options file).
Multiproject conversions
cvs2svn can convert a CVS repository that contains multiple projects into a single Subversion repository with the conventional multiproject directory layout. See the FAQ for more information.
Branch vs. tag
CVS allows the same symbol name to be used sometimes as a branch, sometimes as a tag. cvs2svn has options and heuristics to decide how to convert such "mixed" symbols (--symbol-hints, --force-branch, --force-tag, --symbol-default).
Branch/tag exclusion
cvs2svn allows the user to specify branches and/or tags that should be excluded from the conversion altogether (--symbol-hints, --exclude). It checks that the exclusions are self-consistent (e.g., it doesn't allow a branch to be excluded if a branch that sprouts from it is not excluded).
Branch/tag renaming
cvs2svn can rename branches and tags during the conversion using regular-expression patterns (--symbol-transform).
Choosing SVN paths for branches/tags
You can choose what SVN paths to use as the trunk/branches/tags directories (--trunk, --branches, --tags), or set arbitrary paths for specific CVS branches/tags (--symbol-hints). For example, you might want to store some tags to the project/tags directory, but others to project/releases.
Branch and tag parents
In many cases, the CVS history is ambiguous about which branch served as the parent of another branch or tag. cvs2svn determines the most plausible parent for symbols using cross-file information. You can override cvs2svn's choices on a case-by-case basis by using the --symbol-hints option.
Branch and tag creation times
CVS does not record when branches and tags are created. cvs2svn creates branches and tags at a reasonable time, consistent with the file revisions that were tagged, and tries to create each one within a single Subversion commit if possible.
Mime types
CVS does not record files' mime types. cvs2svn provides several mechanisms for choosing reasonable file mime types (--mime-types, --auto-props).
Binary vs. text
Many CVS users do not systematically record which files are binary and which are text. (This is mostly important if the repository is used on non-Unix systems.) cvs2svn provides a number of ways to infer this information (--eol-from-mime-type, --default-eol, --keywords-off, --auto-props).
Subversion file properties
Subversion allows arbitrary text properties to be attached to files. cvs2svn provides a mechanism to set such properties when a file is first added to the repository (--auto-props) as well as a hook that users can use to set arbitrary file properties via Python code.
Handling of .cvsignore
.cvsignore files in the CVS repository are converted into the equivalent svn:ignore properties in the output. By default, the .cvsignore files themselves are not included in the output; this behavior can be changed by specifying the --keep-cvsignore option.
Subversion repository customization
cvs2svn provides many options that allow you to customize the structure of the resulting Subversion repository (--trunk, --branches, --tags, --include-empty-directories, --no-prune, --symbol-transform, etc.; see also the additional customization options available by using the --options-file method).
Support for multiple character encodings
CVS does not record which character encoding was used to store metainformation like file names, author names and log messages. cvs2svn provides options to help convert such text into UTF-8 (--encoding, --fallback-encoding).
Vendor branches
CVS supports "vendor branches", which (under some circumstances) affect the contents of the main line of development. cvs2svn detects vendor branches whenever possible and handles them intelligently. For example,
  • cvs2svn explicitly copies vendor branch revisions back to trunk so that a checkout of trunk gives the same results under SVN as under CVS.
  • If a vendor branch is excluded from the conversion, cvs2svn grafts the relevant vendor branch revisions onto trunk so that the contents of trunk are still the same as in CVS. If other tags or branches sprout from these revisions, they are grafted to trunk as well.
  • When a file is imported into CVS, CVS creates two revisions ("1.1" and "1.1.1.1") with the same contents. cvs2svn discards the redundant "1.1" revision in such cases (since revision "1.1.1.1" will be copied to trunk anyway).
  • Often users create vendor branches unnecessarily by using "cvs import" to import their own sources into the CVS repository. Such vendor branches do not contain any useful information, so by default cvs2svn excludes any vendor branch that was only used for a single import. You can change this default behavior by specifying the --keep-trivial-imports option.
CVS quirks
cvs2svn goes to great length to deal with CVS's many quirks. For example,
  • CVS introduces spurious "1.1" revisions when a file is added on a branch. cvs2svn discards these revisions.
  • If a file is added on a branch, CVS introduces a spurious "dead" revision at the beginning of the branch to indicate that the file did not exist when the branch was created. cvs2svn deletes these spurious revisions and adds the file on the branch at the correct time.
Robust against repository corruption
cvs2svn knows how to handle several types of CVS repository corruption that have been reported frequently, and gives informative error messages in other cases:
  • An RCS file that exists both in and out of the "Attic" directory.
  • Multiple deltatext blocks for a single CVS file revision.
  • Multiple revision headers for the same CVS file revision.
  • Tags and branches that refer to non-existent revisions or ill-formed revision numbers.
  • Repeated definitions of a symbol name to the same revision number.
  • Branches that have no associated labels.
  • A directory name that conflicts with a file name (in or out of the Attic).
  • Filenames that contain forbidden characters.
  • Log messages with variant end-of-line styles.
  • Vendor branch declarations that refer to non-existent branches.
Timestamp error correction
Many CVS repositories contain timestamp errors due to servers' clocks being set incorrectly during part of the repository's history. cvs2svn's history reconstruction is relatively robust against timestamp errors and it writes monotonic timestamps to the Subversion repository.
Scalable
cvs2svn stores most intermediate data to on-disk databases so that it can convert very large CVS repositories using a reasonable amount of RAM. Conversions are organized as multiple passes and can be restarted at an arbitrary pass in the case of problems.
Configurable/extensible using Python
Many aspects of the conversion can be customized using Python plugins that interact with cvs2svn through documented interfaces (--options).