[Beowulf] strange problem with large file moving between server

Dimitris Zilaskos dimitrisz at gmail.com
Thu Oct 2 02:35:13 PDT 2014


Hello,

RAM somewhere could also be faulty. Have a look at the logs for any ECC
errors (both system memory and RAID controller) and memtest the boxes
involved for a couple of days. I would suggest some stress testing of the
new server if not done already.
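
One cheap stress test along these lines is to hash the same unchanged file several times on the new server. If the hashes differ between rounds, the fault is more likely in the read path (RAM, controller, cabling) than in the data on disc. The following is only a sketch under assumptions: reread_md5 is a hypothetical helper name, and repeated reads may be served from the page cache unless the file is larger than RAM or the cache is dropped in between.

```shell
# Hypothetical helper: hash the same file several times and compare results.
# Usage: reread_md5 FILE [ROUNDS]   (ROUNDS defaults to 5)
# Returns 0 if all rounds agree, 1 on a mismatch, 2 if FILE is unreadable.
reread_md5() {
    file=$1
    rounds=${2:-5}
    [ -r "$file" ] || return 2
    first=""
    i=1
    while [ "$i" -le "$rounds" ]; do
        # Read via stdin so only the hash (no file name) is printed.
        sum=$(md5sum < "$file" | cut -d' ' -f1)
        echo "round $i: $sum"
        if [ -z "$first" ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            echo "MISMATCH on round $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# Example (the path is a placeholder):
#   reread_md5 /srv/newspace/bigfile.tar.gz 10
# Kernel-reported memory errors can be searched for with something like
#   dmesg | grep -i -E 'edac|ecc|machine check'
# although the exact message wording depends on kernel and hardware.
```

A mismatch here would not say which component is at fault, only that reads of the same data are not reproducible.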

Best regards,

Dimitris



On Sun, Sep 21, 2014 at 3:22 PM, Jörg Saßmannshausen <j.sassmannshausen at ucl.ac.uk> wrote:

> Dear all,
>
> I have a rather strange problem with one of my file servers, which I
> recently upgraded in order to accommodate more disc space.
>
> The problem: I copied the files from the old file space to a temporary
> disc storage space using this rsync command:
>
> rsync -vrltH -pgo --stats -D --numeric-ids -x oldserver:foo  tempspace:baa
>
> I have been doing this for some years now and never had any problems.
>
> As always, I ran md5sum afterwards to make sure there is no problem later
> and the user does not lose data. This time, for a rather large file
> (around 16 GB), the md5sum check failed after I moved the files from the
> temp space back to the new destination using the same command as above.
>
> As I still have access to the old file space, I decided to copy this file
> again from the old file space. Strangely enough, rsync did not sync the
> file again, so I had to delete it first. Even after deleting the file and
> re-syncing it from the old source, the md5sum is wrong.
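
A note on rsync skipping the file: by default rsync's quick check only compares size and modification time, so a copy that is corrupt but has the same size and mtime is treated as up to date. The -c/--checksum option makes rsync compare file contents by checksum instead. Below is a minimal sketch for verifying a copy independently of rsync. verify_copy is a hypothetical helper name, and it assumes both paths are reachable from the same machine.

```shell
# Hypothetical helper: compare the MD5 of a source file and its copy.
# Returns 0 if they match, 1 if they differ, 2 if either is unreadable.
verify_copy() {
    src=$1
    dst=$2
    [ -r "$src" ] && [ -r "$dst" ] || return 2
    src_sum=$(md5sum < "$src" | cut -d' ' -f1)
    dst_sum=$(md5sum < "$dst" | cut -d' ' -f1)
    if [ "$src_sum" = "$dst_sum" ]; then
        echo "OK $dst"
    else
        echo "MISMATCH $dst (source $src_sum, copy $dst_sum)" >&2
        return 1
    fi
}

# Example (paths are placeholders):
#   verify_copy /old/foo/archive.tar.gz /new/baa/archive.tar.gz
# To make rsync itself compare contents, add --checksum to the options:
#   rsync -vrltH -pgo --stats -D --numeric-ids -x --checksum SRC DEST
```

Note that --checksum reads every file in full on both sides, so it is considerably slower than the default quick check.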
>
> Copying the file to a different file space did not cause this problem,
> i.e. the md5sum is correct.
> As it is a tar.gz file, I simply decided to decompress the original file
> on the different file server. That worked. The file with the wrong md5sum
> did not decompress on the different file server; gunzip aborted with an
> error message. So the file is broken.
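
For checks like this, gzip -t tests the integrity of a compressed file without writing out the decompressed data, which saves disc space when the archive is around 16 GB. A small sketch, where check_gz is a hypothetical helper name:

```shell
# Hypothetical helper: test a gzip file's integrity without extracting it.
# Returns 0 if gzip -t accepts the file, 1 otherwise.
check_gz() {
    # Feed the file on stdin so the file name suffix does not matter.
    if gzip -t < "$1" 2>/dev/null; then
        echo "intact: $1"
    else
        echo "corrupt or unreadable: $1" >&2
        return 1
    fi
}

# Example (the path is a placeholder):
#   check_gz /srv/newspace/archive.tar.gz
```

This only validates the gzip stream (including its CRC), not the tar archive inside it.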
>
> The setup:
>
> Originally I was using an old Infortrend box which had old PATA discs in
> it. This box is connected via SCSI to a frontend server which exports the
> file space via iSCSI. The backend for that, i.e. the one the user is
> accessing, is on a different physical machine and is a Xen guest. The
> reason behind that setup is that the frontend is acting as a backup server
> and I don't want people to have access to it.
> I then exchanged the Infortrend box for a more recent model which has SATA
> capabilities but still has a SCSI connection to the frontend. The frontend
> is the same. I got a new controller for that box as the old one was broken.
> There are no changes in the backend; it is still the same Xen guest on the
> same hardware.
>
> What I cannot work out is why the old Infortrend box does not have any
> problems with the file while the newer one does. Also, when I copied
> over some files (again using the rsync command above), a few files did not
> copy correctly (again according to md5sum) in the first instance but did
> so later.
>
> I find that highly alarming, as it means that at least for larger and/or
> some binary files there seems to be a problem. However, I am not sure
> where to look, as I am out of ideas.
>
> Could it be that there is a problem with the 'new' controller?
> In all cases I was using ext4 as the file system and I did not have any
> problems with that.
>
> Anybody got some sentiments here?
>
> All the best from a sunny London
>
> Jörg
>
> P.S. To make things worse, I am off on a work-related trip from Monday
> onwards, and I have been working on this problem since Friday evening.
>
>
>
> --
> *************************************************************
> Dr. Jörg Saßmannshausen, MRSC
> University College London
> Department of Chemistry
> Gordon Street
> London
> WC1H 0AJ
>
> email: j.sassmannshausen at ucl.ac.uk
> web: http://sassy.formativ.net
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
>