View Issue Details

ID: 0000186
Project: file
Category: [All Projects] General
View Status: public
Last Update: 2021-10-12 18:24
Reporter: joveler
Assigned To: christos
Priority: normal
Severity: minor
Reproducibility: always
Status: confirmed
Resolution: fixed
Product Version: 5.39
Target Version:
Fixed in Version: 5.40
Summary: 0000186: Korean text file misidentified as 'COM executable for DOS'
Description:
[Summary]
Some Korean text files encoded as EUC-KR (aka CP949 on Windows) are misidentified as 'COM executable for DOS'.
Part of the COM signatures should be disabled to fix this.

[Technical Detail]
EUC-KR encodes about 4% of Korean characters with the lead byte 0xB8, i.e. as 'B8xx' ('륫/B8A0' ~ '뫼/B8FE').
In libmagic, the simplest COM signature only checks for 0xB8 at offset 0.
As a result, libmagic reports a false positive on any EUC-KR text that starts with one of these characters.
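
The collision described above can be sketched in a few lines of Python. This is only a model of the check as the report describes it (a single comparison of byte 0 with 0xB8), not libmagic's actual rule evaluation. The function names are made up for this illustration. The share computation assumes Python's euc_kr codec and counts only Hangul syllables that have a two-byte EUC-KR code.

```python
def matches_simplest_com_rule(data: bytes) -> bool:
    """Model of the check described in the report: only byte 0 is compared with 0xB8."""
    return data[:1] == b"\xb8"


def lead_byte_share(lead: int = 0xB8) -> float:
    """Share of two-byte EUC-KR Hangul syllables whose first byte equals `lead`."""
    total = 0
    hits = 0
    for codepoint in range(0xAC00, 0xD7A4):  # Unicode Hangul Syllables block
        try:
            encoded = chr(codepoint).encode("euc_kr")
        except UnicodeEncodeError:
            continue  # syllable not representable by this codec
        if len(encoded) != 2:
            continue  # skip syllables without a plain two-byte code
        total += 1
        if encoded[0] == lead:
            hits += 1
    return hits / total


# A text whose first character uses the byte pair 0xB8 0xFE quoted in the report.
sample = b"\xb8\xfe".decode("cp949").encode("cp949") + b" text\r\n"

if __name__ == "__main__":
    print(matches_simplest_com_rule(sample))
    print(f"{lead_byte_share():.1%}")
```

Any text file that begins with one of these characters passes the modelled check, regardless of what follows the first byte.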

Windows Notepad (prior to Windows 10 19H1) used the ANSI code page as its default encoding.
This means that almost every text file produced on Korean Windows is encoded as EUC-KR.
This is therefore a critical issue for Korean text files, as many of them are misidentified as executables.

[Fix]
To reduce the negative impact, I propose disabling the simplest COM file signature.
I have attached the diff file.

Steps To Reproduce: Run the file command on the attached euckr_falsepositive.txt.

$ file euckr_falsepositive.txt
euckr_falsepositive.txt: COM executable for DOS

$ file euckr_falsepositive.txt --mime-type
euckr_falsepositive.txt: application/x-dosexec
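
Without the attachment, a stand-in file can be generated with a short Python sketch. The file name euckr_sample.txt and the helper names are hypothetical, and the content is not the reporter's original file. Whether the misidentification actually appears depends on the installed version of file and on which other rules match first.

```python
import shutil
import subprocess
from pathlib import Path


def build_sample_bytes() -> bytes:
    """Return a short CP949 text whose first byte is 0xB8."""
    first_char = b"\xb8\xfe".decode("cp949")  # byte pair quoted in the report
    return (first_char + " sample text\r\n").encode("cp949")


def run_file(path: Path):
    """Return the output of `file` for `path`, or None if `file` is not installed."""
    exe = shutil.which("file")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, str(path)], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


if __name__ == "__main__":
    sample_path = Path("euckr_sample.txt")  # hypothetical stand-in file name
    sample_path.write_bytes(build_sample_bytes())
    print(run_file(sample_path))
```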
Tags: No tags attached.

Activities

joveler

2020-08-24 02:14

reporter  

0001-Disable-simplest-COM-signature-to-avoid-FP.patch (1,869 bytes)
From 31245d71d9d279b649f5a13c2aee60525266d8f6 Mon Sep 17 00:00:00 2001
From: Hajin Jang <hajin_jang@worksmobile.com>
Date: Mon, 24 Aug 2020 11:02:34 +0900
Subject: [PATCH] Disable simplest COM signature to avoid FP

The simplest COM signature causes false-positive on EUC-KR text files.
Disable it to avoid misidentification.
---
 magic/Magdir/msdos | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/magic/Magdir/msdos b/magic/Magdir/msdos
index 8bf85892..0b7993ff 100644
--- a/magic/Magdir/msdos
+++ b/magic/Magdir/msdos
@@ -565,17 +565,19 @@
 # syslinux version (4.x)
 # "COM executable (COM32R)" or "Syslinux COM32 module" by TrID
 >>>1	lelong		0x21CD4CFe	\b, relocatable)
-# remaining are DOS COM executables starting with assembler instruction MOV
-# like FreeDOS BANNER*.COM FINDDISK.COM GIF2RAW.COM WINCHK.COM
-# MS-DOS SYS.COM RESTART.COM
-# SYSLINUX.COM (version 1.40 - 2.13)
-# GFXBOOT.COM (version 3.75)
-# COPYBS.COM POWEROFF.COM INT18.COM
->>1	default	x			COM executable for DOS
-!:mime	application/x-dosexec
-#!:mime	application/x-ms-dos-executable
-#!:mime	application/x-msdos-program
-!:ext com
+# Hajin Jang <hajin_jang@worksmobile.com>:
+# Disable simplest COM signature to prevent false positive on some EUC-KR text files.
+## remaining are DOS COM executables starting with assembler instruction MOV
+## like FreeDOS BANNER*.COM FINDDISK.COM GIF2RAW.COM WINCHK.COM
+## MS-DOS SYS.COM RESTART.COM
+## SYSLINUX.COM (version 1.40 - 2.13)
+## GFXBOOT.COM (version 3.75)
+## COPYBS.COM POWEROFF.COM INT18.COM
+#>>1	default	x			COM executable for DOS
+#!:mime	application/x-dosexec
+##!:mime	application/x-ms-dos-executable
+##!:mime	application/x-msdos-program
+#!:ext com
 
 # URL:		https://en.wikipedia.org/wiki/UPX
 # Reference:	https://github.com/upx/upx/archive/v3.96.zip/upx-3.96/
-- 
2.28.0.windows.1

christos

2020-09-06 15:14

manager   ~0003482

Patched, thanks!

christos

2021-10-12 18:24

manager   ~0003648

Will revert for now and revisit. Breaks too many com executables. Perhaps we can limit it on what follows b8?
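
One possible reading of "limit it on what follows b8" is sketched below in Python. This is a hypothetical heuristic, not something libmagic is known to implement. It assumes that a file is more likely text when its first bytes decode cleanly as CP949 and contain no control characters other than tab, CR and LF. The function names and the 64-byte probe size are made up for this illustration.

```python
def looks_like_cp949_text(data: bytes, probe: int = 64) -> bool:
    """Heuristic: does the start of `data` read as CP949 text rather than code?"""
    head = data[:probe]
    try:
        text = head.decode("cp949")
    except UnicodeDecodeError as err:
        # Tolerate only a two-byte character cut in half by the probe window.
        truncated = len(data) > probe and err.start == len(head) - 1
        if not truncated:
            return False
        text = head[: err.start].decode("cp949")
    return all(ch.isprintable() or ch in "\t\r\n" for ch in text)


def classify_b8_file(data: bytes) -> str:
    """Classify a file that may start with 0xB8 (MOV AX, imm16 in a COM file)."""
    if data[:1] != b"\xb8":
        return "not covered by this rule"
    if looks_like_cp949_text(data):
        return "probably CP949/EUC-KR text"
    return "COM executable candidate"


if __name__ == "__main__":
    # mov ax, 4C00h ; int 21h  (a minimal DOS program that exits)
    print(classify_b8_file(b"\xb8\x00\x4c\xcd\x21"))
    print(classify_b8_file(b"\xb8\xfe text\r\n"))
```

The heuristic is imperfect: a COM file whose immediate operand and following opcodes happen to form valid CP949 sequences would still be classified as text.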

Issue History

Date Modified Username Field Change
2020-08-24 02:14 joveler New Issue
2020-08-24 02:14 joveler File Added: euckr_falsepositive.txt
2020-08-24 02:14 joveler File Added: 0001-Disable-simplest-COM-signature-to-avoid-FP.patch
2020-09-06 15:14 christos Assigned To => christos
2020-09-06 15:14 christos Status new => assigned
2020-09-06 15:14 christos Status assigned => resolved
2020-09-06 15:14 christos Resolution open => fixed
2020-09-06 15:14 christos Fixed in Version => 5.40
2020-09-06 15:14 christos Note Added: 0003482
2021-10-12 18:24 christos Status resolved => confirmed
2021-10-12 18:24 christos Note Added: 0003648