View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000447 | file | General | public | 2023-05-11 17:24 | 2023-05-21 16:10 |
Reporter | Albrecht | Assigned To | christos | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Platform | x86_64 | OS | Debian | OS Version | Bookworm |
Product Version | 5.44 | ||||
Fixed in Version | 5.45 | ||||
Summary | 0000447: MIME type output: missing separator between matches from multiple magic files | ||||
Description | To detect some broken or exotic file formats, I use a custom magic file in addition to the standard one shipped with the Debian package. For example, consider the following simple rule for broken RTF files (typically malware, which Word nevertheless opens): `0 string {\\rt Rich Text Format (invalid header)` with the line `!:mime text/rtf` below it. On Debian Bullseye (file 5.39) this worked as expected for detecting the MIME type, e.g. with the simple files in the attached ZIP: `file --mime-type -k -m ./magext.mgc:/usr/share/misc/magic Test.rtf` prints `Test.rtf: text/rtf\012-`, and `file --mime-type -k -m ./magext.mgc:/usr/share/misc/magic broken.rtf` prints `broken.rtf: text/rtf\012-`. On Debian Bookworm (file 5.44) the same two commands print `Test.rtf: text/rtftext/rtf` and `broken.rtf: text/rtf`, respectively, which looks as if the usual separator (“\012- ”) between multiple MIME types coming from different magic files is missing. For any input that produces multiple MIME types from the same magic file, the output is separated correctly. | ||||
Steps To Reproduce | * unpack the attached ZIP file * cd file_issue * if necessary, edit the script variable MAGIC to point to the standard magic file (the value in the script is the Debian file location) * ./runtest.sh Note: the archive contains the results of running the script on Bullseye/5.39 and Bookworm/5.44, respectively. | ||||
Additional Information | For the RTF example above, the issue could be worked around by adding a check like “not followed by the char f”. However, I noticed some more complex cases, e.g. where the standard magic patterns classify the input as text/plain whereas my rules detect message/rfc822. As in the RTF example, the output is “message/rfc822text/plain”, so this looks like a more general issue to me. | ||||
Tags | No tags attached. | ||||
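
For anyone post-processing `file --mime-type -k` output in scripts, the effect of the missing separator can be illustrated with a small parser. This is only a sketch: the helper names `split_mime_types` and `is_well_formed` are hypothetical, the separator is assumed to be the literal text `\012- ` as quoted in the description, and the single-slash check is a rough heuristic rather than a full MIME type validation.

```python
# Assumption: file prints the separator as the literal characters
# backslash, 0, 1, 2, dash, space (as quoted in the report).
SEPARATOR = "\\012- "


def split_mime_types(line):
    """Split one 'name: type[SEP type...]' output line into (name, [types]).

    Hypothetical helper; splits the name at the first ': ', so file names
    that themselves contain ': ' are not handled.
    """
    name, _, rest = line.partition(": ")
    types = [t.strip() for t in rest.split(SEPARATOR) if t.strip()]
    return name, types


def is_well_formed(mime_type):
    """Heuristic: a single MIME type contains exactly one '/'."""
    return mime_type.count("/") == 1


if __name__ == "__main__":
    samples = [
        "Test.rtf: text/rtf\\012- text/plain",  # separated output
        "Test.rtf: text/rtftext/rtf",            # as reported for 5.44
        "mail.eml: message/rfc822text/plain",    # name is illustrative
    ]
    for sample in samples:
        name, types = split_mime_types(sample)
        suspicious = [t for t in types if not is_well_formed(t)]
        print(name, types, "suspicious:", suspicious)
```

With the separator present, each type is returned as its own list entry; with the 5.44 output, the concatenated string comes back as one entry and is flagged by the slash count.
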
Date Modified | Username | Field | Change |
---|---|---|---|
2023-05-11 17:24 | Albrecht | New Issue | |
2023-05-11 17:24 | Albrecht | File Added: file_issue.zip | |
2023-05-21 16:09 | christos | Assigned To | => christos |
2023-05-21 16:09 | christos | Status | new => assigned |
2023-05-21 16:10 | christos | Status | assigned => resolved |
2023-05-21 16:10 | christos | Resolution | open => fixed |
2023-05-21 16:10 | christos | Fixed in Version | => 5.45 |
2023-05-21 16:10 | christos | Note Added: 0003932 |