Jun 28, 2010

Java copies files 80 times faster than commercial backup software

I cannot believe what I saw today. I bought a brand new hard drive, a 500 GB 2.5" USB hard disk. It's a Western Digital My Passport Essential. It was about 15 euros more expensive than another 500 GB 2.5" USB drive from Western Digital - so the difference must be the software. I thought: Okay, let's try this backup software, I really need one.
So I installed everything and ran my first backup. After almost half an hour I took a look at the result. I was shocked. At this speed, the full backup would take days! I started a self-written Java program called Syncr, whose sole purpose is to copy files fast. Syncr was over 80 times faster. Of course, the comparison is not entirely fair. The software from Western Digital, called WD SmartWare 1.2.0.8, Copyright 2009 by Western Digital, can do much more than my hand-written Syncr. The table below lists all differences. Nevertheless, 80 times is really a lot.

Syncr uses very few tricks to achieve this speed:
  • It does not copy redundant information. However, the speedup was measured only on the files that were actually copied.
    • It skips redundant folders with these names: "Temp", "RECYCLER", "System Volume Information", "Google Desktop"
    • It always skips files named "hiberfil.sys", "pagefile.sys", "Thumbs.db", "autorun.inf", "UsrClass.dat", "UsrClass.dat.LOG", "ntuser.dat.LOG", "ntuser.dat", "parent.lock"
    • It does not copy files that are already present on the target drive with the same name and change date
  • It writes certain kinds of directories directly into a zip file on the target drive - this is much faster than copying first and zipping later, and even faster than not zipping at all, because many small files cost a lot of file system operations. By default, only Eclipse projects and Eclipse workspaces are zipped. Overall, not many files are zipped.
  • It uses Java NIO, which is faster than the old java.io.
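The tricks above can be sketched roughly as follows. This is not Syncr's actual source: the class and method names (`SyncrSketch`, `sync`, `copyIfNeeded`, `zipDirectory`) are made up for illustration, and the sketch assumes a modern JDK (9 or later) with `java.nio.file`, which did not exist yet in 2010. Only the skip lists and the general ideas come from the list above. How Syncr decides which folders to zip is not shown here.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of the tricks listed above; not Syncr's real code. */
final class SyncrSketch {

    // Folder and file names taken from the list above.
    static final Set<String> SKIPPED_DIRS = Set.of(
            "Temp", "RECYCLER", "System Volume Information", "Google Desktop");
    static final Set<String> SKIPPED_FILES = Set.of(
            "hiberfil.sys", "pagefile.sys", "Thumbs.db", "autorun.inf", "UsrClass.dat",
            "UsrClass.dat.LOG", "ntuser.dat.LOG", "ntuser.dat", "parent.lock");

    /** True if the target file exists and has the same change date as the source. */
    static boolean isUpToDate(Path source, Path target) throws IOException {
        return Files.isRegularFile(target)
                && Files.getLastModifiedTime(source).toMillis()
                == Files.getLastModifiedTime(target).toMillis();
    }

    /** Copies one file via NIO channels unless it is skipped or unchanged. */
    static boolean copyIfNeeded(Path source, Path target) throws IOException {
        if (SKIPPED_FILES.contains(source.getFileName().toString())
                || isUpToDate(source, target)) {
            return false;
        }
        Files.createDirectories(target.toAbsolutePath().getParent());
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            while (position < size) { // transferTo may move fewer bytes than requested
                position += in.transferTo(position, size - position, out);
            }
        }
        // Keep the change date so that the next run recognises the file as unchanged.
        Files.setLastModifiedTime(target, Files.getLastModifiedTime(source));
        return true;
    }

    /** Mirrors sourceRoot into targetRoot; returns the number of files copied. */
    static int sync(Path sourceRoot, Path targetRoot) throws IOException {
        int[] copied = {0};
        Files.walkFileTree(sourceRoot, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                Path name = dir.getFileName();
                boolean skip = !dir.equals(sourceRoot) && name != null
                        && SKIPPED_DIRS.contains(name.toString());
                return skip ? FileVisitResult.SKIP_SUBTREE : FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Path target = targetRoot.resolve(sourceRoot.relativize(file).toString());
                if (copyIfNeeded(file, target)) {
                    copied[0]++;
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return copied[0];
    }

    /** Streams a whole directory straight into a single zip file on the target drive. */
    static void zipDirectory(Path dir, Path zipFile) throws IOException {
        Files.createDirectories(zipFile.toAbsolutePath().getParent());
        List<Path> files;
        try (Stream<Path> paths = Files.walk(dir)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                // Zip entry names use forward slashes on every platform.
                String entryName = dir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }
}
```

Calling `SyncrSketch.sync(source, target)` twice in a row should copy nothing the second time, because the change dates then match. The zip helper creates only one file on the target instead of one per source file, which is where the saving in file system operations would come from.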
All in all, I can explain neither the slow speed of WD SmartWare nor the fast speed of Syncr. As a side remark: Syncr is also faster than Windows Explorer (which is why Syncr was written in the first place).
Because of all this, I released Syncr into the wild today. Enjoy.

Quick facts about Syncr:

| Feature | WD SmartWare | Syncr |
|---|---|---|
| Business model | Commercial | Open source (BSD) |
| Front-end | Graphical | Java source code, run from Eclipse |
| Installer | Windows, Mac | None |
| Languages | 28 | English only |
| Usability | Very easy | Easy for Java developers of any level, impossible for non-developers |
| Design | Aesthetic visualisation of current hard drive content | Accurate log messages |
| Backup storage format | Special folder system that contains the original files plus additional .dcm files | Original folder structure with original files. Some folders are automatically stored as .zip archives. |
| Product URL | http://www.wdc.com/en/products/wdsmartware/ | http://code.google.com/p/syncr/ |
| Versioning (can get back an older version of a file) | Yes | No |
| Incremental backup | Yes | Yes |
| Constant incremental background backup | Yes | No |
| Experimental results | | |
| CPU usage | 20-40% | 20-40% |
| Files created to back up 100 files | ca. 200 | 100 or just one zip file |
| Memory usage | 164 MB | 10-400 MB |
| Files copied after first 25 minutes | 4200 files | 5152 files (1 resulting zip file counting as 1 file) |
| Data copied after first 25 minutes | 136 MB | 11 GB |
| Estimated time for a full backup (of my 88 GB) | 11 days, 5.6 hours | 3.4 hours |
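
The time estimates in the table are simple extrapolations from the first 25 minutes. Here is a minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and a constant copy rate. The class and method names are made up for illustration.

```java
/** Hypothetical helper that extrapolates the full backup time from a sample. */
final class BackupEstimate {

    /** Hours needed for totalMb when copiedMb were copied in elapsedMinutes. */
    static double estimateHours(double totalMb, double copiedMb, double elapsedMinutes) {
        return totalMb / copiedMb * elapsedMinutes / 60.0;
    }

    public static void main(String[] args) {
        double totalMb = 88_000; // 88 GB, assuming 1 GB = 1000 MB
        double smartWareHours = estimateHours(totalMb, 136, 25);  // 136 MB in 25 minutes
        double syncrHours = estimateHours(totalMb, 11_000, 25);   // 11 GB in 25 minutes
        System.out.printf("WD SmartWare: %.0f days, %.1f hours%n",
                Math.floor(smartWareHours / 24), smartWareHours % 24);
        System.out.printf("Syncr: %.1f hours%n", syncrHours);
        System.out.printf("Speedup: %.1f times%n", smartWareHours / syncrHours);
    }
}
```

With these inputs the WD SmartWare estimate comes out at roughly 269.6 hours, which matches the 11 days, 5.6 hours in the table. The Syncr estimate comes out at about 3.3 hours, slightly below the 3.4 hours in the table, presumably because the 11 GB figure is rounded. The ratio of the two is a little over 80.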

    2 comments:

    1. Reimplementing rsync? :)

      It would be interesting to compare it with rsync, from cygwin say?

    2. Hmm, nice idea. One day I might do that :-)
