View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000427 | file | General | public | 2023-02-18 00:37 | 2023-05-09 18:08 |
Reporter | a1rind | Assigned To | christos | ||
Priority | high | Severity | major | Reproducibility | always |
Status | resolved | Resolution | reopened | ||
Product Version | 5.44 | ||||
Fixed in Version | 5.45 | ||||
Summary | 0000427: docx file is determined as zip | ||||
Description | Hi! There is an OOXML-format docx file that is detected as application/zip. Unfortunately I cannot share the document yet, but I have some debug info that hopefully helps. `zipinfo` lists the following entries:
```
Zip file size: 36239 bytes, number of entries: 36
-rw---- 4.5 fat 399 b- stor 80-Jan-01 00:00 [trash]/0000.dat
-rw---- 4.5 fat 739 b- defN 80-Jan-01 00:00 _rels/.rels
-rw---- 4.5 fat 41347 b- defN 80-Jan-01 00:00 word/document.xml
-rw---- 4.5 fat 1116 b- defN 80-Jan-01 00:00 docProps/app.xml
-rw---- 4.5 fat 381 b- stor 80-Jan-01 00:00 [trash]/0002.dat
-rw---- 4.5 fat 269 b- stor 80-Jan-01 00:00 [trash]/0003.dat
-rw---- 4.5 fat 450 b- stor 80-Jan-01 00:00 [trash]/0001.dat
-rw---- 4.5 fat 288 b- defN 80-Jan-01 00:00 word/_rels/header1.xml.rels
-rw---- 4.5 fat 288 b- defN 80-Jan-01 00:00 word/_rels/header3.xml.rels
-rw---- 4.5 fat 3225 b- defN 80-Jan-01 00:00 word/fontTable.xml
-rw---- 4.5 fat 2864 b- defN 80-Jan-01 00:00 word/footer1.xml
-rw---- 4.5 fat 3380 b- defN 80-Jan-01 00:00 word/header1.xml
-rw---- 4.5 fat 9807 b- defN 80-Jan-01 00:00 word/header2.xml
-rw---- 4.5 fat 3380 b- defN 80-Jan-01 00:00 word/header3.xml
-rw---- 4.5 fat 680 b- defN 80-Jan-01 00:00 word/media/image1.wmf
-rw---- 4.5 fat 38367 b- defN 80-Jan-01 00:00 word/numbering.xml
-rw---- 4.5 fat 9410 b- defN 80-Jan-01 00:00 word/settings.xml
-rw---- 4.5 fat 31843 b- defN 80-Jan-01 00:00 word/styles.xml
-rw---- 4.5 fat 6992 b- defN 80-Jan-01 00:00 word/theme/theme1.xml
-rw---- 4.5 fat 483 b- defN 80-Jan-01 00:00 word/webSettings.xml
-rw---- 4.5 fat 1768 b- stor 80-Jan-01 00:00 [trash]/0005.dat
-rw---- 4.5 fat 296 b- defS 80-Jan-01 00:00 customXml/_rels/item1.xml.rels
-rw---- 4.5 fat 201 b- defS 80-Jan-01 00:00 customXml/itemProps2.xml
-rw---- 4.5 fat 219 b- defS 80-Jan-01 00:00 customXml/item2.xml
-rw---- 4.5 fat 201 b- defS 80-Jan-01 00:00 customXml/itemProps1.xml
-rw---- 4.5 fat 296 b- defS 80-Jan-01 00:00 customXml/_rels/item2.xml.rels
-rw---- 4.5 fat 443 b- stor 80-Jan-01 00:00 [trash]/0004.dat
-rw---- 4.5 fat 2383 b- defN 80-Jan-01 00:00 word/_rels/document.xml.rels
-rw---- 4.5 fat 236 b- stor 80-Jan-01 00:00 [trash]/0006.dat
-rw---- 4.5 fat 201 b- defS 80-Jan-01 00:00 customXml/itemProps3.xml
-rw---- 4.5 fat 296 b- defS 80-Jan-01 00:00 customXml/_rels/item3.xml.rels
-rw---- 4.5 fat 775 b- defN 80-Jan-01 00:00 docProps/core.xml
-rw---- 4.5 fat 563 b- defN 80-Jan-01 00:00 docProps/custom.xml
-rw---- 4.5 fat 2530 b- defN 80-Jan-01 00:00 [Content_Types].xml
-rw---- 4.5 fat 11932 b- defS 80-Jan-01 00:00 customXml/item1.xml
-rw---- 4.5 fat 587 b- defS 80-Jan-01 00:00 customXml/item3.xml
36 files, 178635 bytes uncompressed, 31036 bytes compressed: 82.6%
```
The first entry listed comes from a [trash] directory ([trash]/0000.dat), and the regex at line 36 here (https://github.com/file/file/blob/master/magic/Magdir/msooxml#L36) does not expect such an entry. Furthermore, according to the OOXML specification, a trash directory can exist:
> Trash items represent parts that have been discarded or are no longer in use. Trash items shall not conform to OPC part naming guidelines as defined in ECMA-376-2 and shall not be associated with a content type. All trash items shall follow the naming scheme: [trash]/HHHH.dat where H represents a hexadecimal digit.

As I understand it, the msooxml magic rules expect the files in a certain order so that they can identify the correct content type from the bytes at certain offsets. The presence of trash items causes this to fail. Any tips and tricks to skip over trash items? Thanks! | ||||
Tags | bug, magic | ||||
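To illustrate the failure described above, here is a minimal Python sketch, not part of file(1). It reads the name stored in the first ZIP local file header (the bytes starting at offset 0x1E, where the quoted rule looks) and tests it against a Python translation of the pre-fix pattern. The helper names `first_entry_name` and `build_zip` and the sample archives are assumptions made for illustration, and the check is simplified to test only the first entry name.

```python
import io
import re
import struct
import zipfile

# Python translation of the pattern in the pre-fix msooxml rule quoted in this issue.
FIRST_ENTRY_RE = re.compile(rb"\[Content_Types\]\.xml|_rels/\.rels|docProps|customXml")


def first_entry_name(data: bytes) -> bytes:
    """Return the file name stored in the first ZIP local file header."""
    if data[:4] != b"PK\x03\x04":
        raise ValueError("data does not start with a ZIP local file header")
    # The name length is a little-endian uint16 at offset 26;
    # the name itself starts at offset 30 (0x1E).
    (name_len,) = struct.unpack_from("<H", data, 26)
    return data[0x1E:0x1E + name_len]


def build_zip(names) -> bytes:
    """Build an in-memory ZIP whose entries are written in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, b"x")
    return buf.getvalue()


# Hypothetical sample archives: one starts with a trash item, one does not.
trashed = build_zip(["[trash]/0000.dat", "_rels/.rels", "word/document.xml"])
regular = build_zip(["[Content_Types].xml", "_rels/.rels", "word/document.xml"])

for label, blob in (("trash first", trashed), ("regular", regular)):
    name = first_entry_name(blob)
    print(label, name, FIRST_ENTRY_RE.search(name) is not None)
```

With these sample archives the pattern does not match the trash-first name and does match the regular one, which is consistent with the reported fallback to application/zip.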
|
Hi! Any thoughts on this? |
|
Does this diff fix it?

    --- msooxml 16 Aug 2022 11:16:39 -0000 1.18
    +++ msooxml 5 Mar 2023 19:51:25 -0000
    @@ -33,7 +33,7 @@
     # make sure the first file is correct
     >0x1E use msooxml
     >0x1E default x
    ->>0x1E regex \\[Content_Types\\]\\.xml|_rels/\\.rels|docProps|customXml
    +>>0x1E regex \\[trash\\]|\\[Content_Types\\]\\.xml|_rels/\\.rels|docProps|customXml
     # skip to the second local file header
     # since some documents include a 520-byte extra field following the file
     # header, we need to scan for the next header

|
|
Hi! Thanks for the response. The suggested change doesn't fix the problem. I think we need to skip the trash files so that the logic after the regex reads bytes from the expected file header. As you may have noticed, the trash files are not in any particular order; they can appear anywhere, not just at the start or at the end. Unfortunately I cannot share the document yet, but I will soon, to make debugging easier. Kind regards! |
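The idea of skipping trash items wherever they appear can be sketched in Python as follows. This is only an illustration of the suggestion, not how the msooxml magic rules work or how the fix was implemented: it uses Python's `zipfile` module, which magic rules cannot do. The function names, the returned labels, and the sample entry list are assumptions.

```python
import io
import re
import zipfile

# Naming scheme quoted in the issue description: [trash]/HHHH.dat
TRASH_RE = re.compile(r"\[trash\]/[0-9A-Fa-f]{4}\.dat")


def physical_names(zf: zipfile.ZipFile):
    """Entry names ordered by local-header offset, i.e. as laid out in the file."""
    return [info.filename for info in sorted(zf.infolist(), key=lambda i: i.header_offset)]


def guess_ooxml_kind(data: bytes):
    """Hypothetical classifier: drop trash items, then look at part name prefixes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = [n for n in physical_names(zf) if not TRASH_RE.fullmatch(n)]
    if "[Content_Types].xml" not in names:
        return None  # not an OPC package as far as this sketch can tell
    for prefix, kind in (("word/", "docx"), ("xl/", "xlsx"), ("ppt/", "pptx")):
        if any(n.startswith(prefix) for n in names):
            return kind
    return None


def build_zip(names) -> bytes:
    """Build an in-memory ZIP whose entries are written in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, b"x")
    return buf.getvalue()


# Entry order loosely modelled on the start of the zipinfo listing in the description.
sample = build_zip([
    "[trash]/0000.dat",
    "_rels/.rels",
    "word/document.xml",
    "docProps/app.xml",
    "[trash]/0002.dat",
    "[Content_Types].xml",
])
print(guess_ooxml_kind(sample))
```

Because the trash entries are filtered out before any decision is made, their position in the archive no longer matters in this sketch.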
|
Hi! I've attached the problematic document. I had to remove some confidential information and re-zip it manually, keeping the files in the same order as before. Thanks! |
|
Fixed, thanks! |
|
Hi! Thanks a lot for looking into this. However, I don't think the latest changes fix the issue. When I try the latest magic rules, the file is still recognized as application/zip:
```
file -m msooxml unsupported-prepared.docx
```
Produces:
```
Zip archive data, at least v2.0 to extract, compression method=store
```
Also, when I try to compile the rules with the latest changes, I get the following error:
```
/usr/share/file/magic/mail.news, 84: Warning: Unparsable number `xu \b, dcrypt version %d'
```
|
Hi! Any thoughts on the issue? Or am I doing something wrong? Kind regards! |
|
Why is it picking up files from /usr/share/file/magic? Is there some environment setting? Also, line 84 in the most recent version of file does not match that string... |
|
Sorry for getting back to you late on this. It turned out that the newer changes work only with the latest version. I tested with file-5.44 and it works fine, but it does not work with file-5.41. I was unable to test file-5.42 and file-5.43. |
|
Submitter verified it is fixed on the latest version. |
Date Modified | Username | Field | Change |
---|---|---|---|
2023-02-18 00:37 | a1rind | New Issue | |
2023-02-18 00:37 | a1rind | Tag Attached: bug | |
2023-02-18 00:37 | a1rind | Tag Attached: magic | |
2023-02-28 16:46 | a1rind | Note Added: 0003898 | |
2023-03-05 19:51 | christos | Assigned To | => christos |
2023-03-05 19:51 | christos | Status | new => assigned |
2023-03-05 19:52 | christos | Status | assigned => feedback |
2023-03-05 19:52 | christos | Note Added: 0003903 | |
2023-03-09 10:43 | a1rind | Note Added: 0003909 | |
2023-03-09 10:43 | a1rind | Status | feedback => assigned |
2023-03-13 11:33 | a1rind | Note Added: 0003915 | |
2023-03-13 11:33 | a1rind | File Added: unsupported-prepared.docx | |
2023-03-14 19:46 | christos | Status | assigned => resolved |
2023-03-14 19:46 | christos | Resolution | open => fixed |
2023-03-14 19:46 | christos | Fixed in Version | => 5.45 |
2023-03-14 19:46 | christos | Note Added: 0003916 | |
2023-03-15 12:49 | a1rind | Status | resolved => feedback |
2023-03-15 12:49 | a1rind | Resolution | fixed => reopened |
2023-03-15 12:49 | a1rind | Note Added: 0003918 | |
2023-03-21 12:16 | a1rind | Note Added: 0003919 | |
2023-03-21 12:16 | a1rind | Status | feedback => assigned |
2023-03-21 14:03 | christos | Note Added: 0003920 | |
2023-04-04 10:22 | a1rind | Note Added: 0003921 | |
2023-05-09 18:08 | christos | Status | assigned => resolved |
2023-05-09 18:08 | christos | Note Added: 0003924 |