View Issue Details

ID: 0000554
Project: file
Category: General
View Status: public
Last Update: 2024-11-10 16:59
Reporter: Slush9
Assigned To: christos
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Linux
OS: Debian
OS Version: 12
Product Version: 5.44
Fixed in Version: HEAD
Summary: 0000554: .eml file identified as text/html (should be message/rfc822) when "Subject:" header is first
Description: Hi,

I have a .eml file generated by Apple Mail which seems to fully conform to RFC 5322 but is being identified by file-5.44 as text/html:

$ file --mime-type Re_\ We\'ve\ made\ changes\ to\ your\ bill.eml
Re_ We've made changes to your bill.eml: text/html

$ file -v
file-5.44
magic file from /etc/magic:/usr/share/misc/magic

I've determined that the misidentification is caused by the first header in the file being "Subject:":

$ head -1 Re_\ We\'ve\ made\ changes\ to\ your\ bill.eml
Subject: Re: We've made changes to your bill

If I move the "Date:" header to the top and run the command again, I get the expected identification:

$ head -1 Re_\ We\'ve\ made\ changes\ to\ your\ bill.eml
Date: Mon, 18 Dec 2023 23:34:30 +1100

$ file --mime-type Re_\ We\'ve\ made\ changes\ to\ your\ bill.eml
Re_ We've made changes to your bill.eml: message/rfc822
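The experiment above (moving one header to the top and re-running `file`) can be scripted so each header can be tried in turn. This is a minimal Python sketch, assuming the message is held in memory as a string; the helper name `move_header_to_top` is hypothetical. It treats everything before the first empty line as the header section and keeps folded continuation lines (those starting with a space or tab) attached to their field.

```python
def move_header_to_top(raw: str, name: str) -> str:
    """Return `raw` with every `name:` header field moved to the top.

    Hypothetical helper for reproducing the test in this report. Each line
    keeps its original ending (LF or CRLF) because splitlines keeps them.
    """
    lines = raw.splitlines(keepends=True)
    # The header section ends at the first empty line (RFC 5322, section 2.1).
    end = next((i for i, line in enumerate(lines) if line.strip() == ""), len(lines))
    header_lines, rest = lines[:end], lines[end:]

    # Group each field with its folded continuation lines (RFC 5322,
    # section 2.2.3), which begin with a space or a tab.
    fields: list[list[str]] = []
    for line in header_lines:
        if fields and line[:1] in (" ", "\t"):
            fields[-1].append(line)
        else:
            fields.append([line])

    # Header names are case-insensitive, so compare in lower case.
    prefix = name.lower() + ":"
    moved = [f for f in fields if f[0].lower().startswith(prefix)]
    kept = [f for f in fields if not f[0].lower().startswith(prefix)]
    return "".join("".join(f) for f in moved + kept) + "".join(rest)


raw = (
    "Subject: Re: We've made changes to your bill\n"
    "References: <a@example.com>\n"
    " <b@example.com>\n"
    "Date: Mon, 18 Dec 2023 23:34:30 +1100\n"
    "\n"
    "Body\n"
)
print(move_header_to_top(raw, "Date").splitlines()[0])
# Date: Mon, 18 Dec 2023 23:34:30 +1100
```

To preserve CRLF line endings when working on a real .eml file, read and write it with `newline=""` passed to `open`, then run `file --mime-type` on the rewritten copy.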

False identifications also occur when other headers are placed first (see Additional Information below).
Additional Information: Of all the headers in this particular email, only "Date:" and "From:" cause `file` to correctly identify it as message/rfc822, and only when one of them is placed first. Any of the other headers in this email:

- Cc:
- Content-Type:
- In-Reply-To:
- Message-Id:
- Mime-Version:
- References:
- Subject:
- To:
- X-Apple-Base-Url:
- X-Apple-Mail-Remote-Attachments:
- X-Apple-Mail-Signature:
- X-Apple-Windows-Friendly:
- X-Uniform-Type-Identifier:
- X-Universally-Unique-Identifier:

if placed first, cause the file to be misidentified as either text/html or text/plain.

I appreciate that it wouldn't be possible for the libmagic maintainers to predict every possible first header in a .eml file, but "To:", "Cc:", and "Subject:" are all highly prevalent and should perhaps be checked.
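To make the reported behaviour concrete, here is a toy Python sketch of a check that only accepts a fixed set of header names at the very start of the file. It is not how libmagic is implemented. The tuples `SEEN_WORKING_IN_544` and `WITH_FIX` are hypothetical and cover only the headers discussed in this report; the real magic database may recognise other headers as well.

```python
# Hypothetical illustration, NOT libmagic's implementation: a toy check that
# reports "looks like mail" only when the first line starts with a known
# header name, mirroring the behaviour described in this report.

# Assumption: the headers from this report's email that file 5.44 was
# observed to accept at the top of the file.
SEEN_WORKING_IN_544 = ("Date:", "From:")

# Assumption: the same set plus the headers named in the fix note.
WITH_FIX = SEEN_WORKING_IN_544 + ("To:", "Cc:", "Subject:")


def looks_like_rfc822(data: bytes, known_headers: tuple[str, ...]) -> bool:
    """Return True if the first line of `data` begins with one of `known_headers`.

    The comparison here is case-sensitive; whether the real magic rules are
    is not stated in the report.
    """
    first_line = data.split(b"\n", 1)[0].decode("ascii", errors="replace")
    return first_line.startswith(known_headers)


sample = b"Subject: Re: We've made changes to your bill\r\nDate: Mon, 18 Dec 2023 23:34:30 +1100\r\n\r\n"
print(looks_like_rfc822(sample, SEEN_WORKING_IN_544))  # False
print(looks_like_rfc822(sample, WITH_FIX))             # True
```

A prefix list like this can never be complete, which is the trade-off the report acknowledges: extending it to the most common first headers catches more real-world messages without trying to enumerate every possible one.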
Tags: No tags attached.

Activities

christos (manager), 2024-11-10 16:59, note ~0004097:

Added To, Cc, Subject

Issue History

Date Modified     Username  Field                Change
2024-08-25 06:48  Slush9    New Issue
2024-11-10 16:59  christos  Assigned To          => christos
2024-11-10 16:59  christos  Status               new => assigned
2024-11-10 16:59  christos  Status               assigned => resolved
2024-11-10 16:59  christos  Resolution           open => fixed
2024-11-10 16:59  christos  Fixed in Version     => HEAD
2024-11-10 16:59  christos  Note Added: 0004097