View Issue Details

ID: 0000265
Project: file
Category: [All Projects] General
View Status: public
Last Update: 2021-07-01 09:01
Reporter: jidanni
Assigned To: christos
Priority: normal
Severity: minor
Reproducibility: always
Status: feedback
Resolution: open
Product Version:
Target Version:
Fixed in Version:
Summary: 0000265: big5 interpreted as ISO-8859
Description:
$ date
西元2021年05月14日 (週五) 23時29分33秒 CST
$ date|file -
/dev/stdin: UTF-8 Unicode text
$ date|iconv -t big5|file -
/dev/stdin: ISO-8859 text
Tags: No tags attached.

Activities

christos

2021-07-01 09:01

manager   ~0003624

Detecting non-Unicode character sets is heuristic at best. There are libraries dedicated to this, such as https://github.com/google/compact_enc_det/tree/master/compact_enc_det. file(1) takes the simplistic approach of assuming ISO-8859-1 for non-ASCII files that contain only those code points, since it was historically the most popular encoding.
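
To illustrate why the report sees "ISO-8859 text", here is a minimal Python sketch of the kind of heuristic described in the note. It is not file(1)'s actual code: the function names `guess_encoding` and `plausible_encodings`, the result labels, and the choice to treat bytes 0xA0-0xFF as "ISO-8859 characters" are all assumptions made for illustration. It relies only on Python's built-in `utf-8`, `big5`, and `iso-8859-1` codecs.

```python
def guess_encoding(data: bytes) -> str:
    """Rough byte-range classifier (hypothetical, not file(1)'s implementation)."""
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # Assumption: bytes 0xA0-0xFF count as ISO-8859 high characters,
    # while 0x80-0x9F (C1 controls) do not.
    if all(b < 0x80 or b >= 0xA0 for b in data):
        return "ISO-8859"
    return "unknown 8-bit"


def plausible_encodings(data: bytes, candidates: tuple) -> list:
    """Return the candidate codecs that decode the data without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
            ok.append(name)
        except UnicodeDecodeError:
            pass
    return ok


# The date string from the report, as UTF-8 and as Big5.
sample = "西元2021年05月14日 (週五) 23時29分33秒 CST\n"
utf8_bytes = sample.encode("utf-8")
big5_bytes = sample.encode("big5")

print(guess_encoding(utf8_bytes))   # UTF-8
print(guess_encoding(big5_bytes))   # ISO-8859
print(plausible_encodings(big5_bytes, ("utf-8", "big5", "iso-8859-1")))
```

The Big5 bytes for this text are all either ASCII or in the 0xA1-0xFE range, so a byte-range check alone cannot tell them apart from ISO-8859 text. Trial decoding narrows things down but does not settle it either: the same bytes decode cleanly as both `big5` and `iso-8859-1`, which is why dedicated detectors fall back on statistical models.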

Issue History

Date Modified Username Field Change
2021-05-14 15:31 jidanni New Issue
2021-07-01 08:59 christos Assigned To => christos
2021-07-01 08:59 christos Status new => assigned
2021-07-01 09:01 christos Status assigned => feedback
2021-07-01 09:01 christos Note Added: 0003624