View Issue Details

ID: 0000590
Project: file
Category: General
View Status: public
Last Update: 2024-12-17 20:24
Reporter: hiran
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Product Version: 5.45
Summary: 0000590: Mime-type for .docx files not always detected correctly
Description: I have a couple of Word documents (.docx). For some of them, file reports this:

hiran@silver:~/test$ file -i somefile.docx
somefile.docx: application/octet-stream; charset=binary
hiran@silver:~/test$ file somefile.docx
somefile.docx: Microsoft OOXML
hiran@silver:~/test$

For others, it reports this:

hiran@silver:~/test$ file somefile2.docx
somefile2.docx: Microsoft Word 2007+
hiran@silver:~/test$ file -i somefile2.docx
somefile2.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document; charset=binary
hiran@silver:~/test$

I strongly suspect this depends on the age of the documents. Documents identified as "Microsoft Word 2007+" get the correct MIME type, but any application relying on the MIME type will be confused by the documents identified as "Microsoft OOXML", because the MIME type emitted for them is application/octet-stream. On the other hand, these documents are not unknown to file, as it reliably identifies them as OOXML.

It looks like a mistake. Can it be corrected or is this as per specification?
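
As a stopgap for applications that depend on the MIME type, the container can be inspected directly. Below is a minimal sketch in Python, assuming the affected files are ordinary OOXML ZIP packages whose main part sits at the conventional path word/document.xml. The names looks_like_docx and guess_mime are hypothetical, and this is not how file itself performs detection.

```python
import zipfile

DOCX_MIME = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)


def looks_like_docx(path):
    """Return True if path is a ZIP package containing the usual .docx parts."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
    # Assumption: a Word document carries [Content_Types].xml and keeps its
    # main part at word/document.xml. Unusual packages may place it elsewhere.
    return "[Content_Types].xml" in names and "word/document.xml" in names


def guess_mime(path):
    """Fall back to application/octet-stream when the check does not match."""
    return DOCX_MIME if looks_like_docx(path) else "application/octet-stream"
```

Calling guess_mime("somefile.docx") on one of the affected files should show whether the package at least has the expected layout, independent of what file -i prints.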
Steps To Reproduce: Use Word documents created by different versions of MS Office, all stored in .docx format.
Run the commands shown in the description and check both the emitted file type and the MIME type.
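
To compare many documents at once, the two invocations can be scripted. This is a rough sketch, assuming file is on the PATH and supports the -b and --mime-type options. The helper names describe and report are made up for illustration, and the run parameter exists only so the command runner can be swapped out.

```python
import subprocess
from pathlib import Path


def describe(path, run=subprocess.run):
    """Return (description, mime_type) as reported by file for one path."""

    def call(*opts):
        # -b suppresses the leading "filename:" prefix in the output.
        result = run(
            ["file", "-b", *opts, str(path)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()

    return call(), call("--mime-type")


def report(folder, run=subprocess.run):
    """Print one tab-separated line per .docx file found in folder."""
    for docx in sorted(Path(folder).glob("*.docx")):
        description, mime = describe(docx, run=run)
        print(f"{docx.name}\t{description}\t{mime}")
```

Running report(".") in the test directory lists each document with its description and MIME type side by side, which makes the mismatching files easy to spot.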
Tags: No tags attached.

Activities

hiran

2024-12-17 20:24

reporter   ~0004134

This issue was detected and recorded here: https://github.com/paperless-ngx/paperless-ngx/discussions/8489

Issue History

Date Modified Username Field Change
2024-12-17 20:15 hiran New Issue
2024-12-17 20:24 hiran Note Added: 0004134