Duplicated files not the same? Interpreting the diff command (solved - sort of)

Posted: Thu Oct 12, 2023 7:43 am
by cobaka

Hello all

For the conclusion, see my second posting.

I began with the file S15Pup32-22.12-230901.iso and used the command "duplicate" in ROX to create a new file.
I called this "file_S15X". I duplicated S15X (using ROX) to create a second file called "file_S15Y".

I ran diff and got the result below:
# diff -q <(xxd ./file_S15X) <(xxd ./file_S15Y) <--< the command
Files /dev/fd/63 and /dev/fd/62 differ <--< the result

What's going on here?
How can 'duplicated' files differ?
Clearly I don't understand something. What don't I understand?

собака


Re: interpreting the diff command

Posted: Thu Oct 12, 2023 12:18 pm
by Keef

Why can't you run diff on the 2 files directly? As in: diff -q file_S15X file_S15Y.


Re: interpreting the diff command

Posted: Thu Oct 12, 2023 1:34 pm
by Trapster

I cannot answer why "duplicate" gives you files that differ but here is a good page on comparing ISO files.

https://linuxhint.com/comparing_iso_images/


Re: interpreting the diff command

Posted: Thu Oct 12, 2023 6:26 pm
by bigpup

diff -q file_S15X file_S15Y

This should work if both files are in the same place on the partition and you run the command in a terminal opened at that location (in the ROX window, right-click menu -> Window -> Terminal Here).

Works for me.

I tried a few files, making duplicates as you did, and diff checking them.

For me it returned no differences.

I made a change in one to see if it detected this.

diff found the two files did now differ.

What Puppy version are you using?

What Rox version?

Maybe do the duplications over again.


Re: interpreting the diff command

Posted: Thu Oct 12, 2023 6:32 pm
by Keef

The files are in the same folder, but Cobaka is first running them through xxd (which I think has something to do with hexdumps). This means it is not really a diff problem, as another process is involved.
That's why I suggested just using diff.


Re: interpreting the diff command

Posted: Thu Oct 12, 2023 6:35 pm
by bigpup

I named each duplicate file with the same type label (file extension) at the end:

.iso
.txt
.pdf
etc.

If you duplicated an .iso file, did you put .iso at the end of the duplicate file name? :idea:


Re: interpreting the diff command

Posted: Fri Oct 13, 2023 3:07 am
by MochiMoppel
Keef wrote: Thu Oct 12, 2023 6:32 pm

The files are in the same folder, but Cobaka is first running them through xxd (which I think has something to do with hexdumps). This means it is not really a diff problem, as another process is involved.
That's why I suggested just using diff.

..but that would have been too easy :lol:
I'm pretty sure it would have shown that the 2 duplicates are in fact identical.
This could also have been confirmed by running md5sum file_S15X file_S15Y
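
A minimal sketch of that checksum check. It uses small hypothetical demo files A and B in a temporary directory rather than the real ISOs, and assumes md5sum is installed (it ships with GNU coreutils and BusyBox):

```shell
#!/bin/sh
# Hypothetical demo files; substitute the real file names as needed.
tmpdir=$(mktemp -d)
printf 'example payload\n' > "$tmpdir/A"
cp "$tmpdir/A" "$tmpdir/B"

# md5sum prints "<hash>  <path>" for each file; keep only the hash column.
sum_a=$(md5sum "$tmpdir/A" | cut -d ' ' -f 1)
sum_b=$(md5sum "$tmpdir/B" | cut -d ' ' -f 1)

# Equal hashes mean the contents match (barring a hash collision).
if [ "$sum_a" = "$sum_b" ]; then
    verdict="identical"
else
    verdict="different"
fi
echo "A=$sum_a B=$sum_b -> $verdict"

rm -rf "$tmpdir"
```

The check reads each file once and keeps only a short hash, so it does not need the huge hexdumps discussed in this thread.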

However I have no idea why Cobaka did what he did. It can have dramatic effects.
I tried to replicate his issue with an iso of F96-CE, size 475M
Used ROX to duplicate the iso to file A, then duplicated A to B. Everything fine: diff and md5sum report that both files are equal.

Then I used bash process substitution to feed the output of xxd to diff (diff requires files as input)
diff <(xxd A) <(xxd B)
As a result my system froze. Had to do a hard reset :twisted:

Next attempt: Redirecting xxd A to file AX and xxd B to file BX.
The resulting hexdump files are - as can be expected - huge: 2GB each.
Can diff compare two 2GB text files on my machine? I have only 2GB RAM...
Short answer: It can't:

# diff AX BX
Killed

I like the "Killed" response. Much better than what bash did by actually killing my whole system.

Maybe Cobaka's bash version is different, keeping file descriptors /dev/fd/63 and /dev/fd/62 open in a somewhat truncated state, readable by diff and subsequently leading to a correct "Files differ" message, but this wouldn't mean that the 2 duplicated binaries are different. Anyway, none of my problems :lol:


Re: Duplicated files not the same? Interpreting the diff command (final)

Posted: Fri Oct 13, 2023 7:15 am
by cobaka

This posting is about the 'diff' command (from the terminal). I 'diff'd two files that should have been identical.
Following observations from @MochiMoppel, @bigpup, @Trapster and @Keef I created two new files. (I erased the original files).
Again, I used 'duplicate' from ROX. This time I retained the *.iso designation.
I ran the command:

# diff -s S_X.iso S_Y.iso
I got:
Files S_X.iso and S_Y.iso are identical

Then
# diff -q <(xxd ./S_X.iso) <(xxd ./S_Y.iso)
# diff -s <(xxd ./S_X.iso) <(xxd ./S_Y.iso)
I got
Files /dev/fd/63 and /dev/fd/62 are identical

I have no idea why /dev/fd/62 and 63 are mentioned, but the report above was copied and pasted from the terminal screen.
And yes - looking at the terminal screen, I saw EXACTLY the text you see above.
Perhaps '62' and '63' are an intermediate result from xxd - a hex dump program.
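
A note on those names: the /dev/fd/NN paths come from bash, not from xxd. Each <(...) process substitution is replaced by the path of a pipe carrying that command's output, and diff simply reports the paths it was given. A minimal sketch, assuming bash on Linux (the exact numbers can vary, and systems without /dev/fd use named pipes instead):

```shell
#!/bin/bash
# Process substitution is a bash feature, not plain POSIX sh.
LC_ALL=C
export LC_ALL

# echo receives two path names, one per <(...) substitution.
paths=$(echo <(true) <(true))
echo "$paths"

# diff is handed the same kind of paths, so its report names them
# instead of the original files. The two demo streams here are identical.
report=$(diff -s <(printf 'a\nb\n') <(printf 'a\nb\n'))
echo "$report"
```

So 62 and 63 are just file descriptor numbers for the two pipes, which is why they appear no matter which files are being compared.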

However - the result of the comparison today came in moments.
Yesterday the comparison took many seconds. Perhaps a minute.

I cannot explain it, but I think (perhaps) "diff" was "diffing" an entire directory.
I sent the output to a text file ( > dump.txt ). At the end, the text file was larger than 2 GiB.
I tried to view dump.txt with Geany. That brought uPupBB64 - the Bionic Pup - to its knees. I had to re-boot.

Thanks to all who answered. This is the first time I used the diff command. Trying is a good way to learn Linux.
Understanding terminal commands takes effort, but I think the reward is worth the effort.

собака


Re: Duplicated files not the same? Interpreting the diff command (solved - sort of)

Posted: Fri Oct 13, 2023 1:43 pm
by Burunduk
cobaka wrote: Thu Oct 12, 2023 7:43 am

I ran diff and got the result below:
# diff -q <(xxd ./file_S15X) <(xxd ./file_S15Y) <--< the command
Files /dev/fd/63 and /dev/fd/62 differ <--< the result

This is a very convoluted way to compare files.
Tested it with an iso file of about 500 MB and two duplicates of it on Fossapup64. It took about 5 minutes (!) to complete.
The result is weird:
first OK,
later - the files differ.
It turned out that this happened after suspending. Random file, random offset, and exactly 12 consecutive bytes changed.
It looks like a hardware problem but I don't get it. My system often runs for a week or so and is suspended many times between reboots. The puppy sfs files are copied to RAM. I should have noticed this problem before.
What I'm trying to say is: there is a small probability that those two files were indeed different.

MochiMoppel wrote: Fri Oct 13, 2023 3:07 am

Then I used bash process substitution to feed the output of xxd to diff (diff requires files as input)
diff <(xxd A) <(xxd B)
As a result my system froze. Had to do a hard reset :twisted:

The memory usage depends on the -q option.
In my tests, the CPU usage was about 50%, RAM usage:
diff -q <(xxd A) <(xxd B) - no noticeable change,
diff <(xxd A) <(xxd B) - increases steadily up to several gigabytes, released when done.


Re: Duplicated files not the same? Interpreting the diff command (solved - sort of)

Posted: Sat Oct 14, 2023 12:55 am
by MochiMoppel
Burunduk wrote: Fri Oct 13, 2023 1:43 pm

The memory usage depends on the -q option.
In my tests, the CPU usage was about 50%, RAM usage:
diff -q <(xxd A) <(xxd B) - no noticeable change,
diff <(xxd A) <(xxd B) - increase steadily up to several gigabytes, released when done.

OK, using the -q option, also used by Cobaka, might have prevented the crash, but it would have made the whole exercise even more pointless. I thought that the idea was to compare hexdumps in order to find out, where the ISO binaries differ. The -q option only tells us if they differ or not, and this could have been answered without creating these hexdump monsters.

A much more efficient way to find differing bytes in binaries would be to use the cmp command (or maybe even xdelta3). In contrast, diff is meant for comparing text files and loads everything into RAM, so it is the wrong tool for the job.
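
A minimal sketch of the cmp approach. It uses hypothetical demo files A and B (1 MiB of zeros, with one byte deliberately changed in the copy) instead of real ISOs:

```shell
#!/bin/sh
# Hypothetical demo files in a temporary directory.
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/A" bs=1024 count=1024 2>/dev/null
cp "$tmpdir/A" "$tmpdir/B"

# cmp -s prints nothing; exit status 0 means the files are identical.
if cmp -s "$tmpdir/A" "$tmpdir/B"; then
    same_before="yes"
else
    same_before="no"
fi
echo "identical right after copying: $same_before"

# Overwrite the single byte at offset 1000 in B.
# conv=notrunc keeps the rest of the file in place.
printf 'X' | dd of="$tmpdir/B" bs=1 seek=1000 conv=notrunc 2>/dev/null

# cmp -l lists every differing byte: its 1-based position, then the
# byte value in each file in octal. cmp exits with status 1 when the
# files differ, so "|| true" keeps the script going under "set -e".
diff_list=$(cmp -l "$tmpdir/A" "$tmpdir/B" || true)
echo "$diff_list"

# Position of the first differing byte (offset 1000 is byte 1001).
first_diff=$(printf '%s\n' "$diff_list" | awk 'NR==1 {print $1}')
echo "first differing byte: $first_diff"

rm -rf "$tmpdir"
```

For the files in this thread the equivalent would be cmp file_S15X file_S15Y, which stops at the first difference, or cmp -l to list all of them. cmp works on the raw bytes directly, so no hexdump is needed.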