View Issue Details

ID: 0000447
Project: file
Category: General
View Status: public
Last Update: 2023-05-21 16:10
Reporter: Albrecht
Assigned To: christos
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: x86_64
OS: Debian
OS Version: Bookworm
Product Version: 5.44
Fixed in Version: 5.45
Summary: 0000447: MIME type output: missing separator between matches from multiple magic files
Description: To detect some broken or exotic file formats, I use a custom magic file in addition to the standard one that ships with the Debian package. For example, consider the following simple rule for broken (typically malware) RTF files, which Word does open:

0 string {\\rt Rich Text Format (invalid header)
!:mime text/rtf

On Debian Bullseye (file 5.39) this worked as expected for MIME type detection, e.g. with the test files in the attached ZIP:

file --mime-type -k -m ./magext.mgc:/usr/share/misc/magic Test.rtf
Test.rtf: text/rtf\012-
file --mime-type -k -m ./magext.mgc:/usr/share/misc/magic broken.rtf
broken.rtf: text/rtf\012-

On Debian Bookworm (file 5.44), the output is:

file --mime-type -k -m ./magext.mgc:/usr/share/misc/magic Test.rtf
Test.rtf: text/rtftext/rtf
file --mime-type -k -m ./magext.mgc:/usr/share/misc/magic broken.rtf
broken.rtf: text/rtf

This looks as if the usual separator (“\012- ”) is missing when the MIME types come from different magic files. For inputs that produce multiple MIME types from the same magic file, the output is separated correctly.
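Tools that consume `file -k` output typically rely on this separator. According to the file(1) manual, each subsequent match under -k is prefixed with the literal string "\012- " (an escaped newline) unless -r is given. The sketch below is a hypothetical helper, not part of file: `split_mime_matches` and `has_fused_types` are illustrative names, and it assumes the "name: " prefix is present (no -b). It shows how the 5.44 output collapses into a single bogus token.

```python
from typing import List

# Literal text that `file -k` prints before each subsequent match when -r
# is not given: the newline is escaped as "\012" and followed by "- ".
SEPARATOR = "\\012-"


def split_mime_matches(line: str) -> List[str]:
    """Split one line of `file --mime-type -k` output into MIME types.

    Hypothetical helper: assumes the "name: " prefix is present (no -b)
    and that the file name itself does not contain ": ".
    """
    _, _, result = line.partition(": ")
    parts = (part.strip() for part in result.split(SEPARATOR))
    return [part for part in parts if part]


def has_fused_types(types: List[str]) -> bool:
    """True if any entry holds more than one "type/subtype" pair, which is
    what a missing separator produces (e.g. "text/rtftext/rtf")."""
    return any(t.count("/") > 1 for t in types)


if __name__ == "__main__":
    # Output lines for Test.rtf as quoted in this report.
    for line in ("Test.rtf: text/rtf\\012-",     # file 5.39
                 "Test.rtf: text/rtftext/rtf"):  # file 5.44
        types = split_mime_matches(line)
        print(types, "fused" if has_fused_types(types) else "ok")
```

Run on the two quoted lines, the 5.39 output yields `['text/rtf'] ok`, while the 5.44 output yields `['text/rtftext/rtf'] fused`, so a consumer can no longer recover the individual matches.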
Steps To Reproduce:
* unpack the attached ZIP file
* cd file_issue
* if necessary, edit the script variable MAGIC to point to the standard magic file (the value in the script is the Debian file location)
* ./runtest.sh

Note: the archive contains the results of running the script on Bullseye/5.39 and Bookworm/5.44, respectively.
Additional Information: For the RTF example above, the issue could be avoided by adding a check such as “not followed by the character f”. However, I noticed some more complex cases where, e.g., the standard magic patterns classify the input as text/plain, whereas my rules actually detect message/rfc822. As in the RTF example above, the output is “message/rfc822text/plain”, so this looks like a more general issue to me.
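The “not followed by the character f” workaround can be illustrated outside of magic syntax. The following is a hypothetical Python emulation (the name `is_invalid_rtf_header` is illustrative, not part of file or libmagic) of a rule that matches "{\rt" at offset 0 only when the next byte is not "f", so well-formed "{\rtf" headers are left to the standard rules. As noted above, this only sidesteps the double match for RTF; it does not address the missing separator.

```python
# Hypothetical emulation of the workaround, not libmagic code: match the
# bytes "{\rt" (magic string "{\\rt") at offset 0 only when the following
# byte is not "f", so valid "{\rtf" headers fall through to the standard rules.
PREFIX = b"{\\rt"


def is_invalid_rtf_header(data: bytes) -> bool:
    if not data.startswith(PREFIX):
        return False
    # Slicing (not indexing) yields b"" at end of data instead of raising.
    return data[len(PREFIX):len(PREFIX) + 1] != b"f"


if __name__ == "__main__":
    print(is_invalid_rtf_header(b"{\\rtf1\\ansi"))  # False: valid RTF header
    print(is_invalid_rtf_header(b"{\\rt1\\ansi"))   # True: broken header
```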
Tags: No tags attached.

Activities

Albrecht

2023-05-11 17:24

reporter  

file_issue.zip (2,590 bytes)

christos

2023-05-21 16:10

manager   ~0003932

Fixed, thanks!

Issue History

Date Modified | Username | Field | Change
2023-05-11 17:24 | Albrecht | New Issue |
2023-05-11 17:24 | Albrecht | File Added: file_issue.zip |
2023-05-21 16:09 | christos | Assigned To | => christos
2023-05-21 16:09 | christos | Status | new => assigned
2023-05-21 16:10 | christos | Status | assigned => resolved
2023-05-21 16:10 | christos | Resolution | open => fixed
2023-05-21 16:10 | christos | Fixed in Version | => 5.45
2023-05-21 16:10 | christos | Note Added: 0003932 |