View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000590 | file | General | public | 2024-12-17 20:15 | 2024-12-17 20:24 |
Reporter | hiran | Assigned To | |||
Priority | normal | Severity | minor | Reproducibility | always |
Status | new | Resolution | open | ||
Product Version | 5.45 | ||||
Summary | 0000590: Mime-type for .docx files not always detected correctly | ||||
Description | I have several Word documents (.docx). For some of them, file reports this: hiran@silver:~/test$ file -i somefile.docx somefile.docx: application/octet-stream; charset=binary hiran@silver:~/test$ file somefile.docx somefile.docx: Microsoft OOXML hiran@silver:~/test$ For others, it reports this: hiran@silver:~/test$ file somefile2.docx somefile2.docx: Microsoft Word 2007+ hiran@silver:~/test$ file -i somefile2.docx somefile2.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document; charset=binary hiran@silver:~/test$ I strongly suspect the difference depends on the age of the documents. Documents identified as Microsoft Word 2007+ get the correct MIME type, but any application relying on the MIME type will be confused by the documents identified as Microsoft OOXML, for which the MIME type emitted is application/octet-stream. On the other hand, the format is not unknown to file, since it reliably identifies these documents as OOXML. This looks like a mistake. Can it be corrected, or is this behavior as specified? | ||||
Steps To Reproduce | Use Word documents created by different versions of MS Office, all stored in .docx format. Run the commands shown in the description on each document and compare the reported file type with the reported MIME type. | ||||
Tags | No tags attached. | ||||
|
This issue was detected and recorded here: https://github.com/paperless-ngx/paperless-ngx/discussions/8489 |
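The reproduction steps above can be scripted so that affected documents stand out. The following is a minimal sketch, not part of file itself: `classify_mime` is a hypothetical helper name introduced here, and the script only assumes that `file` supports the `-b` (brief) and `--mime-type` options.

```shell
#!/bin/sh
# Sketch: print the textual description and the MIME type side by side,
# and flag files for which only the catch-all MIME type is reported.
# classify_mime is a hypothetical helper, not something provided by file(1).

# Print "generic" when the MIME type is the catch-all
# application/octet-stream, otherwise print "specific".
classify_mime() {
    case "$1" in
        application/octet-stream*) echo generic ;;
        *) echo specific ;;
    esac
}

# For every path given on the command line, print one tab-separated line:
# path, description, MIME type, verdict.
for f in "$@"; do
    desc=$(file -b "$f")              # -b: omit the file name from the output
    mime=$(file -b --mime-type "$f")  # --mime-type: print only the MIME type
    printf '%s\t%s\t%s\t%s\n' "$f" "$desc" "$mime" "$(classify_mime "$mime")"
done
```

Saved under a name such as `check-mime.sh` (the name is arbitrary), it could be run as `sh check-mime.sh *.docx`. A line that ends in `generic` while its description mentions OOXML would correspond to the behavior reported in this issue.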