Discussion:
[Grml] dupmerge - : Value too large for defined data type
hansbkk
2011-02-06 15:59:54 UTC
Permalink
First of all, heaps of gratitude for the excellent contribution,
tremendously helpful tool.

I really like the way you keep my mdadm raids in order, compared to
some of the mangling I see under other "rescue disks" (actually
re-writing the preferred minor?)

What brought me to Grml was the fact that you include the dupmerge
tool out of the box, but in running it over a large set of video files
(around 5 TB at the moment, ext3 filesystem) I'm getting tons of the
above error messages on large files (some are up to 16GB). Having such
a tool fail on the largest files kind of defeats the purpose, doesn't
it? 8-)
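
For reference, "Value too large for defined data type" is the glibc message text for EOVERFLOW, which stat() returns when a file's size does not fit in a 32-bit off_t (anything above 2 GiB minus one byte). That would match a binary built without large file support, though whether the dupmerge 1.70 build shipped here was compiled that way is only an assumption. A minimal Python sketch (the function name and the limit constant are illustrative, not part of dupmerge) that lists the files such a build could not stat:

```python
import os
import stat

# Largest size a signed 32-bit off_t can represent (2 GiB - 1 byte).
OFF_T_32_MAX = 2**31 - 1


def files_over_limit(root, limit=OFF_T_32_MAX):
    """Yield (path, size) for regular files under root larger than limit bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable or vanished entry, skip it
            if stat.S_ISREG(st.st_mode) and st.st_size > limit:
                yield path, st.st_size
```

Calling files_over_limit() on the top of the video tree yields the files a non-large-file build would be expected to choke on; if that list matches the files in the error output, the build flags are the likely culprit rather than the filesystem.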

I see you're running v1.70, while Sourceforge lists v1.73 as current
(as of 2008!), and I see references on the interwebs to v1.74, so
consider this if nothing else a request to update.

If it's not difficult, it would also be great to be able to check out
another one or two similar deduplication tools, unless you've already
checked them all out and consider dupmerge2 the best?

Here's a good overview:

http://www.asheesh.org/note/software/duplicate-files.html

My vote would be for freedup and rdfind.

Thanks for your consideration
T o n g
2011-02-07 00:42:26 UTC
Permalink
Post by hansbkk
http://www.asheesh.org/note/software/duplicate-files.html
in which he picked rdfind as the winner.
Post by hansbkk
My vote would be for freedup and rdfind.
+1 for rdfind, which was included in sid not long ago.

I looked at its algorithm and speed comparison -- it is definitely a
winner.
--
Tong (remove underscore(s) to reply)
http://xpt.sourceforge.net/techdocs/
http://xpt.sourceforge.net/tools/
Michael Prokop
2011-02-07 12:26:29 UTC
Permalink
Post by T o n g
Post by hansbkk
http://www.asheesh.org/note/software/duplicate-files.html
which he picked rdfind as the winner.
Post by hansbkk
My vote would be for freedup and rdfind.
+1 for rdfind, which was included in sid not long ago.
I looked at its algorithm and speed comparison -- it is definitely a
winner.
Therefore I just added rdfind to the software selection of GRML_FULL.
Thanks for reporting back!

regards,
-mika-
hansbkk
2011-02-09 19:22:16 UTC
Permalink
Thanks much Mika, a great (and quick!) response.

I've been doing some scripting based on dupmerge, so please also let
us know whether that will be updated or replaced by rdfind.
Post by T o n g
Post by T o n g
Post by hansbkk
http://www.asheesh.org/note/software/duplicate-files.html
which he picked rdfind as the winner.
+1 for rdfind, which was included in sid not long ago.
Therefore I just added rdfind to the software selection of GRML_FULL.
Thanks for reporting back!
regards,
-mika-
Ulrich Dangel
2011-02-09 20:36:58 UTC
Permalink
Post by hansbkk
I've been doing some scripting based on dupmerge, so please also let
us know whether that will be updated or replaced by rdfind.
rdfind is already included in grml-full. It should also be in the daily
images. I'm not sure if dupmerge will be dropped, but it is still
included in Grml.

Ulrich
--
twitter: @mr_ud | identica: @mru
IRCNet: mru | freenode: mrud