View Issue Details

ID: 0000445
Project: file
Category: General
View Status: public
Last Update: 2023-05-21 17:24
Reporter: milahu
Assigned To: christos
Status: feedback
Resolution: open
Product Version: 5.44
Summary: 0000445: file/libmagic fails to detect cp1252 encoding
Description: The actual result is the encoding "unknown-8bit":

$ printf "what\x92s up?\n" | file -i -
/dev/stdin: text/plain; charset=unknown-8bit

The expected result is cp1252:

$ printf "what\x92s up?\n" | chardetect
<stdin>: Windows-1252 with confidence 0.73

$ printf "what\x92s up?\n" | iconv -f cp1252 -t utf8
what’s up?
Tags: No tags attached.



Note ~0003938 (manager), 2023-05-21 17:24

Fixing it would require either implementing more heuristics to improve charset detection, or using a third-party library that already does this well. Both are fairly large projects.
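
For illustration only, here is a minimal Python sketch of the kind of extra heuristic the note refers to. It is not how file/libmagic or chardet work. The function name `guess_charset` and the decision rule are assumptions made for this example. The rule relies on one fact: bytes 0x80-0x9F are mostly printable characters in Windows-1252 (0x92 is a right single quotation mark) but are C1 control characters in ISO-8859-1, so their presence in otherwise plain text hints at cp1252.

```python
# Hypothetical helper for illustration; not part of file/libmagic or chardet.

# Bytes in the range 0x80-0x9F that Windows-1252 leaves undefined.
CP1252_UNDEFINED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def guess_charset(data: bytes) -> str:
    """Return a coarse charset guess for a byte string."""
    # Pure 7-bit input needs no further analysis.
    if all(b < 0x80 for b in data):
        return "us-ascii"

    # Valid UTF-8 is treated as UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Collect the bytes on which cp1252 and ISO-8859-1 disagree.
    c1_range = [b for b in data if 0x80 <= b <= 0x9F]

    if c1_range and not any(b in CP1252_UNDEFINED for b in c1_range):
        # Printable in cp1252, control characters in ISO-8859-1.
        return "cp1252"
    if not c1_range:
        # Only 0xA0-0xFF present: both encodings agree on these bytes.
        # Other ISO-8859 variants are not considered in this sketch.
        return "iso-8859-1"
    # Bytes that cp1252 leaves undefined: give up, as file does today.
    return "unknown-8bit"
```

With the input from the report, `guess_charset(b"what\x92s up?\n")` returns `"cp1252"`. A rule this simple cannot tell cp1252 apart from other Windows code pages that also assign printable characters to 0x80-0x9F, which is part of why a robust fix is a larger project.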

Issue History

Date Modified      Username   Field                 Change
2023-04-28 16:00   milahu     New Issue
2023-05-21 17:23   christos   Assigned To           => christos
2023-05-21 17:23   christos   Status                new => assigned
2023-05-21 17:24   christos   Status                assigned => feedback
2023-05-21 17:24   christos   Note Added: 0003938