View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0000362||file||General||public||2022-06-25 14:43||2022-07-04 19:45|
|Fixed in Version||HEAD|
|Summary||0000362: File name in output shortened with -raw option|
|Description||With recent PR/351: CathyKMeow: octalify unprintable characters in filenames unless raw, a regression has been introduced that has ramifications for Midnight Commander (mc). MC doesn’t use the -raw option, and does not expect the octal version of the file name in the output. This is not the issue, though.|
With the -raw option, the file name in the output is truncated. It looks like the name is cut to /n/ bytes, where /n/ is the number of characters rather than the number of bytes.
testö.jpg = 9 characters, but 10 bytes (because ö requires 2 bytes in UTF-8).
file then outputs testö.jp, which is 9 bytes long.
|Steps To Reproduce||Have a file with one or more characters ≥ U+0080, e.g. ä ö ü Χ Л هل |
file -r <filename> will output the filename shortened
file äöü -r
|Additional Information||I noticed different behavior when using shell globbing.|
file -r * in a directory with äöü will output äöü correctly, but will shorten Χαίρετε to Χαίρετ and Здравствуйте to Здравс
|Tags||No tags attached.|
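The truncation reported in the description is consistent with a byte buffer being cut at the character count. The Python sketch below only reproduces that symptom. It is not taken from the `file` source, the cause is an assumption, and the helper name `truncate_to_char_count` is made up for illustration.

```python
def truncate_to_char_count(name: str) -> str:
    """Cut the UTF-8 encoding of `name` after len(name) bytes.

    Mimics the reported symptom (a character count used as a byte
    length). This is an assumed cause, not the actual implementation.
    """
    encoded = name.encode("utf-8")
    cut = encoded[:len(name)]  # character count misused as a byte count
    # errors="replace" keeps the sketch safe if the cut splits a character
    return cut.decode("utf-8", errors="replace")


for name in ("testö.jpg", "Здравствуйте"):
    # characters, bytes, truncated name
    print(len(name), len(name.encode("utf-8")), truncate_to_char_count(name))
```

For testö.jpg this prints 9, 10 and testö.jp; for Здравствуйте it prints 12, 24 and Здравс, matching the output quoted in the report.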
Thinking about it...
"PR/351: CathyKMeow: octalify unprintable characters in filenames unless raw"
why are characters ≥ U+0080 even considered unprintable?
The change was originally introduced because of some issues with control characters < U+0020 (especially \n), see Bug 351.
Bug 351 is closed and I can't comment there, so I'm commenting here.
1) `ls` and `find` replace non-printable characters only if stdout is a tty. There is no character replacement for piped output:
$ mkdir a$'\n'b
$ ls | cat
$ find . | cat
2) Characters above 0x80 aren't non-printable.
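To illustrate point 2: whether a character is printable is a property of the decoded character, not of its individual bytes. The Python sketch below uses `str.isprintable()` as a stand-in for a locale-aware check such as C's `iswprint`. That substitution and the function name `escape_unprintable` are assumptions made for illustration.

```python
def escape_unprintable(name: str) -> str:
    """Octal-escape only characters that are not printable.

    Works on decoded characters, so letters such as ö or Л pass through
    unchanged, while control characters such as newline are escaped.
    """
    out = []
    for ch in name:
        if ch.isprintable():
            out.append(ch)
        else:
            # escape each UTF-8 byte of the unprintable character
            out.extend("\\%03o" % b for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_unprintable("a\nb"))        # newline becomes \012
print(escape_unprintable("testö.jpg"))   # left unchanged
```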
Hm, the changes from bug 351 lead to Midnight commander not properly detecting images with umlauts etc. in the file name. See https://www.midnight-commander.org/ticket/4377
I find it strange that midnight commander would not use piped output, so in theory the change should not even have any effect.
Without the -r option, all characters above 0x80 get octalified.
alex@horus:~> file testö.jpg
test\303\266.jp: JPEG image data, JFIF standard 1.01, resolution (DPCM), density 118x118, segment length 16, progressive, precision 8, 1706x1132, components 3
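The `\303\266` in that output is the UTF-8 encoding of ö (bytes 0xC3 0xB6) written in octal. Here is a rough Python sketch of such a byte-wise escape, assuming that every byte outside printable ASCII is escaped. It is a guess at the behavior seen above, not the `file` source, and `octalify_bytes` is a hypothetical name.

```python
def octalify_bytes(raw: bytes) -> str:
    """Octal-escape every byte outside the printable ASCII range."""
    out = []
    for b in raw:
        if 0x20 <= b < 0x7F:
            out.append(chr(b))        # printable ASCII is kept as-is
        else:
            out.append("\\%03o" % b)  # everything else becomes \ooo
    return "".join(out)


print(octalify_bytes("testö.jpg".encode("utf-8")))  # test\303\266.jpg
```

Because the check runs per byte, both bytes of ö fall outside the ASCII range and are escaped, even though the character itself is printable.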
> I find it strange that midnight commander would not use piped output, so in theory the change should not even have any effect.
Yep, two overlapping issues together lead to the bug. First, the `file` utility corrupts filenames. Second, `mc` relies on the filename from file's output.
The first one can be fixed by checking isatty(STDOUT_FILENO) as other tools do. The second one can be fixed by removing the filename from the output with the --brief option (or even by using libmagic directly).
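The first suggestion could look roughly like this in Python, where `os.isatty` plays the role of C's `isatty(STDOUT_FILENO)`. The function names are hypothetical, and the escaping rule (ASCII control characters only) is an assumption rather than what `file` does.

```python
import os

STDOUT_FILENO = 1  # same value as the C constant


def escape_controls(name: str) -> str:
    """Replace ASCII control characters with octal escapes."""
    return "".join(
        "\\%03o" % ord(ch) if ord(ch) < 0x20 or ord(ch) == 0x7F else ch
        for ch in name
    )


def display_name(name: str, fd: int = STDOUT_FILENO) -> str:
    """Escape the name only when `fd` refers to a terminal.

    When output goes to a pipe or a file, the name is passed through
    unchanged so that a consumer such as mc sees the real bytes.
    """
    return escape_controls(name) if os.isatty(fd) else name


print(display_name("a\nb"))
```

Run directly in a terminal, the last line prints the escaped form; piped through `cat`, it prints the name unchanged, which mirrors the `ls` and `find` behavior described earlier.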
But I still can't understand why all non-ASCII characters are considered non-printable.
Sorry ro-ee, maybe I misunderstood your previous comment. I know about the mc bug and commented there as well. I was going to create a ticket for `file` here, but you did it first.
The fix for "bug 351" is implemented incorrectly. I'd like to draw CathyKMeow's attention to this, but I can't comment on or reopen ticket 351. This issue affects not only mc but any other software that invokes `file` and reads the filename; it also confuses users.
||Try it now.|
|2022-06-25 14:43||ro-ee||New Issue|
|2022-06-25 14:43||ro-ee||File Added: image.png|
|2022-06-25 18:03||ro-ee||Note Added: 0003766|
|2022-06-28 03:05||dimich||Note Added: 0003767|
|2022-06-28 08:53||ro-ee||Note Added: 0003768|
|2022-06-28 09:15||dimich||Note Added: 0003769|
|2022-06-28 09:44||dimich||Note Added: 0003770|
|2022-07-04 19:45||christos||Assigned To||=> christos|
|2022-07-04 19:45||christos||Status||new => resolved|
|2022-07-04 19:45||christos||Resolution||open => fixed|
|2022-07-04 19:45||christos||Fixed in Version||=> HEAD|
|2022-07-04 19:45||christos||Note Added: 0003779|