[Beowulf] strange problem with large file moving between server
j.sassmannshausen at ucl.ac.uk
Thu Oct 23 13:06:19 PDT 2014
Further to my last email: the problem is sorted. In the end it turned out
that the SCSI HBA had a problem. Trying to update the firmware resulted in a
completely inoperable card. :-(
Fortunately, I had a different card, so the problem is sorted now.
Thanks to everybody for their suggestions.
All the best from London
On Sunday 21 September 2014 Jörg Saßmannshausen wrote:
> Dear all,
> I have a rather strange problem with one of my file servers, which I have
> recently upgraded in order to accommodate more disc space.
> The problem: I copied the files from the old file space to a temporary
> disc storage space using this rsync command:
> rsync -vrltH -pgo --stats -D --numeric-ids -x oldserver:foo tempspace:baa
> I have been doing this for some years now and have never had any problems.
> As always, I ran md5sum afterwards to be sure there is not a problem
> later and the user is not losing data. This time, for a rather large file
> (around 16 GB), the md5sum failed after I moved the files from the temp
> space back to the new destination using the same command as above.
> Still having access to the old file space, I decided to move this file
> again from the old file space. Strangely enough, rsync did not sync the
> file again, so I had to delete it first. Even after deleting the file and
> re-syncing it from the old source, the md5sum is wrong.
> Copying the file to a different file space did not cause this problem,
> i.e. the md5sum is correct.
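The md5sum comparison described above can be sketched in Python roughly as follows. This is only an illustration, not the procedure actually used on the servers: the function names `file_md5` and `copies_match` and the 1 MiB chunk size are assumptions made for the example. The file is read in chunks so that a 16 GB archive never has to fit into memory.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks of `chunk_size` bytes (1 MiB here, an
    arbitrary choice), so memory use stays flat regardless of file size.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, copy_path):
    """Return True if both files have the same MD5 digest."""
    return file_md5(source_path) == file_md5(copy_path)
```

Calling `copies_match` on the original and on the copy gives the same yes/no answer as comparing the two md5sum outputs by hand. Note that it reads both files in full, so for a file of this size it is bound by disc and network throughput.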
> As it is a tar.gz file, I simply decided to decompress the original file on
> the different file server. That worked. The file where the md5sum is wrong
> did not decompress on the different file server but crashed with an error
> message when I executed gunzip. So the file is broken.
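As a cross-check that does not need the original file at all, a gzip archive can be tested for integrity by decompressing it and discarding the output, similar in spirit to `gunzip -t`. The sketch below uses Python's standard `gzip` module; the function name `gzip_is_intact` and the chunk size are assumptions for illustration only.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at `path` decompresses without error.

    The decompressed data is thrown away; only the ability to read the
    stream to its end (including the trailing CRC check) is tested.
    """
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad header or CRC mismatch, EOFError a truncated
        # file, and zlib.error a damaged compressed stream.
        return False
    return True
```

A check like this only shows that the archive is internally consistent. It says nothing about where the damage happened, so it complements the checksum comparison rather than replacing it.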
> The setup:
> Originally I was using an old Infortrend box which had old PATA discs in
> it. This box is connected via SCSI to a frontend server which exports the
> file space via iSCSI. The backend for that, i.e. the one the user is
> accessing, is on a different physical machine and it is a Xen guest. The
> reason for that setup is that the frontend is acting as a backup server
> and I don't want people to have access to it.
> I then exchanged the Infortrend box for a more recent model which has SATA
> capabilities but still has a SCSI connection to the frontend. The frontend
> is the same. I got a new controller for that box as the old one was
> broken. There are no changes in the backend; that is still the same Xen
> guest on the same hardware.
> What I cannot work out is why the old Infortrend box does not have any
> problems with the new file, while the newer one does. Also, when I
> copied over some files (again using the rsync command above), a few
> files did not copy correctly (again according to md5sum) in the first
> instance but did so later.
> I find that highly alarming as it means that, at least for larger and/or
> some binary files, there seems to be a problem. However, I am not sure
> where to look as I am out of ideas.
> Could it be there is a problem with the 'new' controller?
> In all cases I was using ext4 as a file system and I did not have any
> problems with that.
> Has anybody got any thoughts on this?
> All the best from a sunny London
> P.S. To make things worse I am off on a work-related trip from Monday
> onwards and I have been working on that problem since Friday evening.
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
email: j.sassmannshausen at ucl.ac.uk
Please avoid sending me Word or PowerPoint attachments.