View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000265 | file | General | public | 2021-05-14 15:31 | 2021-10-28 15:34 |
Reporter | jidanni | Assigned To | christos | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Summary | 0000265: big5 interpreted as ISO-8859 | ||||
Description | $ date<br>西元2021年05月14日 (週五) 23時29分33秒 CST<br>$ date|file -<br>/dev/stdin: UTF-8 Unicode text<br>$ date|iconv -t big5|file -<br>/dev/stdin: ISO-8859 text | ||||
Tags | No tags attached. | ||||
|
Detecting non-Unicode character sets is a heuristic at best. There are libraries dedicated to this, such as compact_enc_det (https://github.com/google/compact_enc_det/tree/master/compact_enc_det). file(1) takes the simplistic approach of assuming ISO-8859-1 for non-ASCII files that contain only code points valid in that encoding, since it was the most popular encoding in the past. |
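
To illustrate why the Big5 output in the report ends up labelled as ISO-8859, here is a minimal Python sketch of a byte-range heuristic of the kind described above. It is not file(1)'s actual code: the function name `guess_encoding`, the returned labels, and the exact order of checks are assumptions made for illustration, and the real implementation classifies control characters in more detail.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical, simplified byte-range heuristic (not file(1)'s real logic)."""
    # Pure 7-bit data: call it ASCII.
    if all(b < 0x80 for b in data):
        return "ASCII"
    # Strictly valid UTF-8 is a strong signal, so test it before the 8-bit fallback.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # Fallback: if no byte falls in the C1 control range 0x80-0x9F,
    # every byte is a printable ISO-8859-1 character, so assume that family.
    if not any(0x80 <= b <= 0x9F for b in data):
        return "ISO-8859"
    return "unknown"


# Sample text taken from the report's `date` output.
sample = "西元2021年05月14日 (週五) 23時29分33秒 CST"

as_utf8 = sample.encode("utf-8")
as_big5 = sample.encode("big5")

print(guess_encoding(as_utf8))
print(guess_encoding(as_big5))
# Every byte sequence decodes as ISO-8859-1, so decoding alone cannot rule it out.
print(as_big5.decode("latin-1"))
```

For this sample, the Big5 lead bytes all lie at 0xA1 or above and the trail bytes lie in 0x40-0x7E or 0xA1-0xFE, so no byte lands in 0x80-0x9F and the sketch reports ISO-8859 for the Big5 bytes, matching the behaviour in the report. Telling Big5 apart from ISO-8859-1 would need statistical or dictionary-based detection, which is what dedicated libraries provide.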
|
Can't/won't fix. |
Date Modified | Username | Field | Change |
---|---|---|---|
2021-05-14 15:31 | jidanni | New Issue | |
2021-07-01 08:59 | christos | Assigned To | => christos |
2021-07-01 08:59 | christos | Status | new => assigned |
2021-07-01 09:01 | christos | Status | assigned => feedback |
2021-07-01 09:01 | christos | Note Added: 0003624 | |
2021-10-28 15:34 | christos | Status | feedback => closed |
2021-10-28 15:34 | christos | Resolution | open => fixed |
2021-10-28 15:34 | christos | Note Added: 0003655 |