Duplicated files not the same? Interpreting the diff command (solved - sort of)

cobaka
Posts: 572
Joined: Thu Jul 16, 2020 6:04 am
Location: Central Coast, NSW - au
Has thanked: 94 times
Been thanked: 63 times

Post by cobaka »

Hello all

For conclusion - see my second posting

I began with the file S15Pup32-22.12-230901.iso and used the command "duplicate" in ROX to create a new file.
I called this "file_S15X". I duplicated S15X (using ROX) to create a second file called "file_S15Y".

I ran diff and got the result below:
# diff -q <(xxd ./file_S15X) <(xxd ./file_S15Y) <--< the command
Files /dev/fd/63 and /dev/fd/62 differ <--< the result

What's going on here?
How can 'duplicated' files differ?
Clearly I don't understand something. What don't I understand?

собака

Last edited by cobaka on Fri Oct 13, 2023 7:16 am, edited 1 time in total.

собака --> это Русский --> a dog
"c" -- say "s" - as in "see" or "scent" or "sob".

Keef
Posts: 274
Joined: Tue Dec 03, 2019 8:05 pm
Has thanked: 3 times
Been thanked: 75 times

Re: interpreting the diff command

Post by Keef »

Why can't you run diff on the 2 files directly? As in: diff -q file_S15X file_S15Y.

Trapster
Posts: 181
Joined: Sat Aug 01, 2020 7:44 pm
Has thanked: 1 time
Been thanked: 52 times

Re: interpreting the diff command

Post by Trapster »

I cannot answer why "duplicate" gives you files that differ but here is a good page on comparing ISO files.

https://linuxhint.com/comparing_iso_images/

bigpup
Moderator
Posts: 6985
Joined: Tue Jul 14, 2020 11:19 pm
Location: Earth, South Eastern U.S.
Has thanked: 906 times
Been thanked: 1522 times

Re: interpreting the diff command

Post by bigpup »

diff -q file_S15X file_S15Y

This should work if both files are in the same directory: open that directory in a ROX window, select right-click menu -> Window -> Terminal Here, and run the command in that terminal.

Works for me.

I tried a few files, making duplicates as you did, and diff checking them.

For me it returned no differences.

I made a change in one to see if it detected this.

diff found the two files did now differ.

What Puppy version are you using?

What Rox version?

Maybe do the duplications over again.

The things you do not tell us, are usually the clue to fixing the problem.
When I was a kid, I wanted to be older.
This is not what I expected :o

Keef
Posts: 274
Joined: Tue Dec 03, 2019 8:05 pm
Has thanked: 3 times
Been thanked: 75 times

Re: interpreting the diff command

Post by Keef »

The files are in the same folder, but Cobaka is first running them through xxd (which I think has something to do with hexdumps). This means it is not really a diff problem, as another process is involved.
That's why I suggested just using diff.

bigpup
Moderator
Posts: 6985
Joined: Tue Jul 14, 2020 11:19 pm
Location: Earth, South Eastern U.S.
Has thanked: 906 times
Been thanked: 1522 times

Re: interpreting the diff command

Post by bigpup »

I did name the duplicate file with the same type label at the end.

.iso
.txt
.pdf
etc.....................

If you duplicated an .iso file, did you put .iso at the end of the duplicate file name? :idea:

MochiMoppel
Posts: 1233
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 21 times
Been thanked: 437 times

Re: interpreting the diff command

Post by MochiMoppel »

Keef wrote: Thu Oct 12, 2023 6:32 pm

The files are in the same folder, but Cobaka is first running them through xxd (which I think has something to do with hexdumps). This means it is not really a diff problem, as another process is involved.
That's why I suggested just using diff.

..but that would have been too easy :lol:
I'm pretty sure it would have shown that the 2 duplicates are in fact identical.
Could also have been confirmed with running md5sum file_S15X file_S15Y
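
For anyone who wants to try the direct comparison on something small first, here is a minimal sketch. The temporary directory and the file names A and B are made up for illustration and only stand in for the two duplicated ISOs. It assumes diff (with the -s option used later in this thread) and md5sum are installed.

```shell
# Create a small sample file and a copy of it
# (hypothetical stand-ins for the two duplicated ISO files).
tmpdir=$(mktemp -d)
printf 'puppy linux test data\n' > "$tmpdir/A"
cp "$tmpdir/A" "$tmpdir/B"

# Compare the files directly, no hexdump needed.
# -s makes diff say so explicitly when the files are identical.
diff_report=$(diff -s "$tmpdir/A" "$tmpdir/B")
echo "$diff_report"

# Checksums of identical files match. Reading from stdin keeps the
# file name out of the md5sum output, so the two strings can be compared.
sum_a=$(md5sum < "$tmpdir/A")
sum_b=$(md5sum < "$tmpdir/B")
[ "$sum_a" = "$sum_b" ] && checksums=same || checksums=different
echo "checksums: $checksums"

rm -rf "$tmpdir"
```

Neither command builds a text dump of the files first, so the 2GB hexdumps never come into play.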

However I have no idea why Cobaka did what he did. It can have dramatic effects.
I tried to replicate his issue with an iso of F96-CE, size 475M
Used ROX to duplicate the iso to file A, then duplicated A to B. Everything fine: diff and md5sum report that both files are equal.

Then I used bash process substitution to feed the output of xxd to diff (diff requires files as input)
diff <(xxd A) <(xxd B)
As a result my system froze. Had to do a hard reset :twisted:

Next attempt: Redirecting xxd A to file AX and xxd B to file BX.
The resulting hexdump files are - as can be expected - huge: 2GB each.
Can diff compare two 2GB text files on my machine? I have only 2GB RAM...
Short answer: It can't:

# diff AX BX
Killed

I like the "Killed " response. Much better than what bash did by actually killing my whole system.

Maybe Cobaka's bash version is different, keeping file descriptors /dev/fd/63 and /dev/fd/62 open in a somewhat truncated state, readable by diff and subsequently leading to a correct "Files differ" message, but this wouldn't mean that the 2 duplicated binaries are different. Anyway, none of my problems :lol:

cobaka
Posts: 572
Joined: Thu Jul 16, 2020 6:04 am
Location: Central Coast, NSW - au
Has thanked: 94 times
Been thanked: 63 times

Re: Duplicated files not the same? Interpreting the diff command (final)

Post by cobaka »

This posting is about the 'diff' command (from the terminal). I 'diff'd two files that should have been identical.
Following observations from @MochiMoppel, @bigpup, @Trapster and @Keef I created two new files. (I erased the original files).
Again, I used 'duplicate' from ROX. This time I retained the *.iso designation.
I ran the command:

# diff -s S_X.iso S_Y.iso
I got:
Files S_X.iso and S_Y.iso are identical

Then
# diff -q <(xxd ./S_X.iso) <(xxd ./S_Y.iso)
# diff -s <(xxd ./S_X.iso) <(xxd ./S_Y.iso)
I got
Files /dev/fd/63 and /dev/fd/62 are identical

I have no idea why /dev/fd/62 and /dev/fd/63 are mentioned. The report above was copied and pasted from the terminal screen.
And yes - looking at the terminal screen, I saw EXACTLY the text you see above.
Perhaps '62' and '63' are an intermediate result from xxd - a hex dump program.
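
A note on those names: they most likely come from bash rather than from xxd. With process substitution, bash replaces each <(...) with the name of a file descriptor connected to that command's output, and diff only ever sees that name. A minimal sketch, assuming bash is installed; the exact name or number printed can vary between systems:

```shell
# Process substitution is a bash feature, so run the demo under bash explicitly.
# bash swaps <(...) for a file descriptor name; echo just prints that name.
fd_name=$(bash -c 'echo <(true)')
echo "$fd_name"

# diff is handed two such names instead of real file names,
# which is why they show up in its report.
report=$(bash -c 'diff -s <(printf "abc\n") <(printf "abc\n")')
echo "$report"
```

So the numbers are not an intermediate result of the hexdump; they are just the "file names" diff was given.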

However - the result of the comparison today came in moments.
Yesterday the comparison took many seconds. Perhaps a minute.

I cannot explain, but think (perhaps) "diff" was "diffing" an entire directory.
I sent the output to a text file ( > dump.txt ) At the end, the text file was greater than 2GiB.
I tried to view dump.txt with Geany. That brought uPupBB64 - the Bionic Pup - to its knees. I had to reboot.

Thanks to all who answered. This is the first time I used the diff command. Trying is a good way to learn Linux.
Understanding terminal commands takes effort, but I think the reward is worth the effort.

собака

Burunduk
Posts: 251
Joined: Thu Jun 16, 2022 6:16 pm
Has thanked: 7 times
Been thanked: 127 times

Re: Duplicated files not the same? Interpreting the diff command (solved - sort of)

Post by Burunduk »

cobaka wrote: Thu Oct 12, 2023 7:43 am

I ran diff and got the result below:
# diff -q <(xxd ./file_S15X) <(xxd ./file_S15Y) <--< the command
Files /dev/fd/63 and /dev/fd/62 differ <--< the result

This is a very convoluted way to compare files.
I tested it with an iso file of about 500 MB and two of its duplicates on Fossapup64. It took about 5 minutes (!) to complete.
The result is weird:
first OK,
later - the files differ.
It turned out that this happened after suspending. Random file, random offset, and exactly 12 consecutive bytes changed.
It looks like a hardware problem but I don't get it. My system often runs for a week or so and is suspended many times between reboots. The puppy sfs files are copied to RAM. I should have noticed this problem before.
What I'm trying to say is: there is a small probability that those two files were indeed different.

MochiMoppel wrote: Fri Oct 13, 2023 3:07 am

Then I used bash process substitution to feed the output of xxd to diff (diff requires files as input)
diff <(xxd A) <(xxd B)
As a result my system froze. Had to do a hard reset :twisted:

The memory usage depends on the -q option.
In my tests, the CPU usage was about 50%, RAM usage:
diff -q <(xxd A) <(xxd B) - no noticeable change,
diff <(xxd A) <(xxd B) - increase steadily up to several gigabytes, released when done.
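
To see what the two modes actually have to produce, here is a small sketch on throwaway text files. The file names A and B and their contents are invented for illustration, and it does not measure memory; it only shows the difference in output.

```shell
# Two 1000-line text files that differ in exactly one line
# (hypothetical sample data, not hexdumps of real ISOs).
tmpdir=$(mktemp -d)
seq 1 1000 > "$tmpdir/A"
sed '500s/.*/changed/' "$tmpdir/A" > "$tmpdir/B"

# With -q, diff only has to answer yes or no.
# diff exits with status 1 when files differ, hence the "|| true".
brief=$(diff -q "$tmpdir/A" "$tmpdir/B" || true)
echo "$brief"

# Without -q, diff has to work out which lines changed and print them.
full=$(diff "$tmpdir/A" "$tmpdir/B" || true)
echo "$full"

rm -rf "$tmpdir"
```

Working out the changed lines is the part that needs the file contents held in memory, which would fit the RAM figures above.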

MochiMoppel
Posts: 1233
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 21 times
Been thanked: 437 times

Re: Duplicated files not the same? Interpreting the diff command (solved - sort of)

Post by MochiMoppel »

Burunduk wrote: Fri Oct 13, 2023 1:43 pm

The memory usage depends on the -q option.
In my tests, the CPU usage was about 50%, RAM usage:
diff -q <(xxd A) <(xxd B) - no noticeable change,
diff <(xxd A) <(xxd B) - increase steadily up to several gigabytes, released when done.

OK, using the -q option, also used by Cobaka, might have prevented the crash, but it would have made the whole exercise even more pointless. I thought that the idea was to compare hexdumps in order to find out where the ISO binaries differ. The -q option only tells us whether they differ or not, and this could have been answered without creating these hexdump monsters.

A much more efficient way to find differing bytes in binaries would be to use the cmp command (or maybe even xdelta3). In contrast, diff compares text files, loads everything into RAM and is the wrong tool for the job.
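
A minimal sketch of the cmp approach, using two tiny made-up files (A and B are hypothetical names) that differ in a single byte. The exact wording of cmp's messages can vary slightly between implementations.

```shell
# Two 10-byte files that differ only in byte 5.
tmpdir=$(mktemp -d)
printf 'AAAAAAAAAA' > "$tmpdir/A"
printf 'AAAABAAAAA' > "$tmpdir/B"

# Plain cmp reports the position of the first differing byte and stops.
# cmp exits with status 1 when files differ, hence the "|| true".
first_diff=$(cmp "$tmpdir/A" "$tmpdir/B" || true)
echo "$first_diff"

# cmp -l lists every differing byte: its offset (decimal, counted from 1)
# followed by the two byte values in octal.
all_diffs=$(cmp -l "$tmpdir/A" "$tmpdir/B" || true)
echo "$all_diffs"

# cmp -s prints nothing and only sets the exit status (0 = identical).
if cmp -s "$tmpdir/A" "$tmpdir/A"; then self_check=identical; else self_check=different; fi
echo "$self_check"

rm -rf "$tmpdir"
```

On two ISO duplicates, cmp -l would have listed changed bytes like the 12 consecutive ones mentioned earlier in the thread, without any hexdump.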
