View Issue Details

ID: 0000362
Project: file
Category: General
View Status: public
Last Update: 2022-07-04 19:45
Reporter: ro-ee
Assigned To: christos
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 5.42
Fixed in Version: HEAD
Summary: 0000362: File name in output shortened with -raw option
Description: The recent change PR/351 (CathyKMeow: "octalify unprintable characters in filenames unless raw") introduced a regression that has ramifications for Midnight Commander (mc). mc doesn't use the -raw option and does not expect the octal version of the file name in the output. That is not the issue reported here, though.

With the -raw option, the file name in the output is shortened. It looks like the name is cut to n bytes, where n is the number of characters in the name, not the number of bytes.

Example:
Testö.jpg = 9 characters, but 10 bytes (because ö requires 2 bytes in UTF-8).
file then outputs Testö.jp, which is a string 9 bytes long.
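
The arithmetic can be checked with a short Python sketch. It assumes the name is stored as UTF-8 with a precomposed ö (U+00F6) and that the bug amounts to cutting the byte string at the character count. `truncate_to_char_count` is a hypothetical helper for illustration, not a function from file.

```python
def truncate_to_char_count(name: str) -> str:
    """Cut the UTF-8 bytes of `name` at its character count (the suspected bug)."""
    encoded = name.encode("utf-8")
    cut = encoded[:len(name)]  # character count misused as a byte count
    # An incomplete trailing sequence is shown as U+FFFD, as many terminals do.
    return cut.decode("utf-8", errors="replace")


name = "Test\u00f6.jpg"  # Testö.jpg
print(len(name), len(name.encode("utf-8")))  # character count, byte count
print(truncate_to_char_count(name))
print(truncate_to_char_count("\u00e4\u00f6\u00fc"))  # äöü
```

Under these assumptions the first name loses its final `g`, and `äöü` is cut in the middle of `ö`, which leaves an invalid byte that is displayed as a replacement character.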
Steps To Reproduce: Have a file whose name contains one or more characters ≥ U+0080, e.g. ä ö ü Χ Л هل

file -r <filename> will output the file name shortened:

file äöü -r
ä�: empty

Additional Information: I noticed different behavior when using shell globbing.

file -r * in a directory containing äöü will output äöü correctly, but will shorten Χαίρετε to Χαίρετ and Здравствуйте to Здравс.
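
One reading that would be consistent with these three outputs, offered only as an assumption and not confirmed against the file source: every name is cut to N bytes, where N is the character count of the longest name in the batch. A small Python sketch of that hypothesis (`truncate_batch` is a hypothetical helper, and the three names are assumed to be the only files in the directory):

```python
def truncate_batch(names):
    """Cut each name's UTF-8 bytes at the longest name's character count."""
    width = max(len(n) for n in names)  # assumed: width is counted in characters
    return [
        n.encode("utf-8")[:width].decode("utf-8", errors="replace")
        for n in names
    ]


names = ["äöü", "Χαίρετε", "Здравствуйте"]
for original, shown in zip(names, truncate_batch(names)):
    print(original, "->", shown)
```

With a width of 12, `äöü` (6 bytes) fits, while the Greek and Cyrillic names (2 bytes per character) are cut after 6 characters.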
Tags: No tags attached.

Activities

ro-ee

2022-06-25 14:43

reporter  

image.png (3,142 bytes)   

ro-ee

2022-06-25 18:03

reporter   ~0003766

Thinking about it...
"PR/351: CathyKMeow: octalify unprintable characters in filenames unless raw"

Why are characters ≥ U+0080 even considered unprintable?
The change was originally introduced because of some issues with control characters < U+0020 (especially \n), see Bug 351.

dimich

2022-06-28 03:05

reporter   ~0003767

Bug 351 is closed and I can't comment there, so I'll comment here.
1) `ls` and `find` replace non-printable characters only if stdout is a tty. There is no character replacement for piped output:
```
$ mkdir a$'\n'b
$ ls | cat
a
b
$ find . | cat
.
./a
b
```
2) Characters above 0x80 aren't non-printable.

ro-ee

2022-06-28 08:53

reporter   ~0003768

Hm, the changes from bug 351 cause Midnight Commander to not properly detect images with umlauts etc. in the file name. See https://www.midnight-commander.org/ticket/4377
I find it strange that midnight commander would not use piped output, so in theory the change should not even have any effect.

Without the -r option, all bytes at or above 0x80 get octalified:

alex@horus:~> file testö.jpg
test\303\266.jp: JPEG image data, JFIF standard 1.01, resolution (DPCM), density 118x118, segment length 16, progressive, precision 8, 1706x1132, components 3
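
The `\303\266` in that output is the two-byte UTF-8 encoding of ö written byte by byte in octal. A minimal Python sketch that reproduces the escaping, assuming every byte outside printable ASCII is escaped; `octalify` is a hypothetical name used here for illustration, and the sketch does not model the truncation of the name.

```python
def octalify(raw: bytes) -> str:
    """Write bytes outside printable ASCII (0x20..0x7e) as backslash-octal."""
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E else "\\%03o" % b
        for b in raw
    )


print(octalify("test\u00f6.jpg".encode("utf-8")))
```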

dimich

2022-06-28 09:15

reporter   ~0003769

> I find it strange that midnight commander would not use piped output, so in theory the change should not even have any effect.
Yep, two overlapping issues together lead to the bug. First, the `file` utility corrupts filenames. Second, `mc` relies on the filename from file's output.
The first one can be fixed by checking isatty(STDOUT_FILENO), as other tools do. The second one can be fixed by removing the filename from the output with the --brief option (or even by using libmagic directly).
But I still can't understand why all non-ASCII characters are considered non-printable.
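
A rough Python sketch of the first fix, modelled on the `ls` and `find` behaviour described earlier in this thread: escape only when standard output is a terminal, and pass the bytes through unchanged when it is piped. It also assumes that only ASCII control bytes are escaped, which is what this thread argues for and not necessarily what file does. `display_name` is a hypothetical helper, and `sys.stdout.isatty()` stands in for the C call isatty(STDOUT_FILENO).

```python
import sys


def display_name(raw: bytes, to_tty: bool) -> bytes:
    """Escape control bytes as octal only when writing to a terminal."""
    if not to_tty:
        return raw  # piped output: leave the name untouched
    out = bytearray()
    for b in raw:
        if b < 0x20 or b == 0x7F:  # ASCII control characters only
            out += b"\\%03o" % b
        else:
            out.append(b)  # bytes >= 0x80 pass through unchanged
    return bytes(out)


# Decide once, based on where standard output is going.
to_tty = sys.stdout.isatty()
print(display_name(b"a\nb", to_tty))
```

With this split, a consumer such as mc that reads the output through a pipe would get the name byte for byte, while an interactive user would still be protected from embedded newlines.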

dimich

2022-06-28 09:44

reporter   ~0003770

Sorry ro-ee, maybe I misunderstood your previous comment. I know about the mc bug and commented there as well. I was going to create a ticket for `file` here, but you did it first.
The fix for "bug 351" is implemented incorrectly. I'd draw CathyKMeow's attention to this, but I can't comment on or reopen ticket 351. This issue affects not only mc but any other software that invokes `file` and reads the filename; it also confuses users.

christos

2022-07-04 19:45

manager   ~0003779

Try it now.

Issue History

Date Modified Username Field Change
2022-06-25 14:43 ro-ee New Issue
2022-06-25 14:43 ro-ee File Added: image.png
2022-06-25 18:03 ro-ee Note Added: 0003766
2022-06-28 03:05 dimich Note Added: 0003767
2022-06-28 08:53 ro-ee Note Added: 0003768
2022-06-28 09:15 dimich Note Added: 0003769
2022-06-28 09:44 dimich Note Added: 0003770
2022-07-04 19:45 christos Assigned To => christos
2022-07-04 19:45 christos Status new => resolved
2022-07-04 19:45 christos Resolution open => fixed
2022-07-04 19:45 christos Fixed in Version => HEAD
2022-07-04 19:45 christos Note Added: 0003779