View Issue Details

ID: 0000418
Project: file
Category: General
View Status: public
Last Update: 2023-01-24 20:30
Reporter: joveler
Assigned To: christos
Priority: normal
Severity: tweak
Reproducibility: have not tried
Status: resolved
Resolution: fixed
Product Version: 5.44
Fixed in Version: 5.45
Summary: 0000418: Patch for HWP file format signature
Description: This patch revises HancomOffice HWP (Hangul Word Processor) document file format signatures.
HancomOffice HWP is a word processor (or semi-desktop publishing software) mainly used in the Republic of Korea.

*Changes*
1. Add support for the HWPX format
- Hancom is promoting a move from HWP 5.0 to HWPX as its primary supported format.
- HWPX (OWPML) is based on the OCF specification (a PKZIP container), so the signature goes into magic/Magdir/archive.

2. Update filetype of HWP 3.0/5.0 format
- The HWP 3.0/5.0 filetype now starts with `Hancom HWP (Hangul Word Processor) file`.
- The current HWP 3.0/5.0 filetype contains `Hangul (Korean)`, which is highly ambiguous.
  In this context, Hangul is the trademarked name of the word processor, not the Korean script.
  Also, the HWP formats do not distinguish between Korean and global releases of the HWP program.
- I used the company name (Hancom) and program name (HWP), following the OOXML filetype convention (e.g. Microsoft Word 2007+).
  I also added the full name of the HWP program, 'Hangul Word Processor', to avoid ambiguity between the program name and the file extension.
- The HWP 3.0 format is a proprietary binary format, so its signature was already in magic/Magdir/wordprocessors.
- The HWP 5.0 format uses the MS compound document format, like MS Office 97 ~ 2003.
  The filetype string is hardcoded in src/readcdf.c and also exists in magic/Magdir/ole2compounddocs. Both files were patched.
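
For illustration, the two binary signatures in change 2 can be approximated in a few lines of Python. This is a minimal sketch, not what file(1) does: `describe_hwp_binary` and its constants are hypothetical names, the OLE2 header bytes are the well-known compound file magic rather than something stated in this report, and the HWP 5.0 branch only searches for the UTF-16LE `FileHeader` entry name instead of parsing the compound file structure.

```python
from typing import Optional

# Signature string taken from the Magdir/wordprocessors rule in the patch.
HWP3_SIGNATURE = b"HWP Document File"
# Assumption: standard OLE2 / compound file header magic (not quoted in this report).
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# Directory entry names in compound files are UTF-16LE, hence lestring16 in the magic rule.
FILEHEADER_NAME = "FileHeader".encode("utf-16-le")


def describe_hwp_binary(data: bytes) -> Optional[str]:
    """Return the post-patch description for HWP 3.0/5.0 data, or None.

    Simplified: a real detector would locate the FileHeader entry through
    the compound file directory instead of searching the whole buffer.
    """
    if data.startswith(HWP3_SIGNATURE):
        return "Hancom HWP (Hangul Word Processor) file, version 3.0"
    if data.startswith(OLE2_MAGIC) and FILEHEADER_NAME in data:
        return "Hancom HWP (Hangul Word Processor) file, version 5.0"
    return None
```

The order of the checks does not matter here, because the two formats start with different bytes; a buffer that matches neither returns `None` so the caller can fall through to other rules.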

*Before Patch*
```
/c/Joveler/Build/Joveler.FileMagician/Joveler.FileMagician.Tests/Samples/HWP2016.hwp: Hangul (Korean) Word Processor File 5.x
/c/Joveler/Build/Joveler.FileMagician/Joveler.FileMagician.Tests/Samples/HWP2016.hwpx: Zip data (MIME type "application/hwp+zip"?)
/c/Joveler/Build/Joveler.FileMagician/Joveler.FileMagician.Tests/Samples/HWP97.hwp: Hangul (Korean) Word Processor File 3.0
```

*After Patch*
```
/c/Joveler/Build/Joveler.FileMagician/Joveler.FileMagician.Tests/Samples/HWP2016.hwp: Hancom HWP (Hangul Word Processor) file, version 5.0
/c/Joveler/Build/Joveler.FileMagician/Joveler.FileMagician.Tests/Samples/HWP2016.hwpx: Hancom HWP (Hangul Word Processor) file, HWPX
/c/Joveler/Build/Joveler.FileMagician/Joveler.FileMagician.Tests/Samples/HWP97.hwp: Hancom HWP (Hangul Word Processor) file, version 3.0
```
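
The HWPX result above relies on the OCF convention that the first ZIP entry is an uncompressed `mimetype` file, which puts the subtype string at a fixed offset. The sketch below shows why the new rule tests offset 50. It assumes the standard 30-byte ZIP local file header with no extra field; `looks_like_hwpx` and `make_ocf_container` are hypothetical helpers, and the checks at offsets 0, 30 and 38 are my reading of the container layout, not rules quoted from this patch.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"


def looks_like_hwpx(data: bytes) -> bool:
    """Check the leading bytes for an OCF container with an hwp+zip mimetype."""
    return (
        data[0:4] == ZIP_LOCAL_HEADER          # ZIP local file header magic
        and data[30:38] == b"mimetype"         # first entry name follows the 30-byte header
        and data[38:50] == b"application/"     # stored (uncompressed) entry content
        and data[50:57] == b"hwp+zip"          # the string tested by the new >>50 rule
    )


def make_ocf_container(mimetype: str) -> bytes:
    """Build an in-memory ZIP whose first entry is a stored 'mimetype' file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", mimetype, compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()
```

Calling `looks_like_hwpx(make_ocf_container("application/hwp+zip"))` exercises the positive case, while an `application/epub+zip` container fails only the last comparison, which mirrors how the HWPX rule sits next to the EPUB rule in magic/Magdir/archive.
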
Tags: hwp hwpx magic

Activities

joveler

2023-01-20 17:58

reporter  

file-5.44-hwp.diff (3,151 bytes)   
diff --git a/file-5.44-org/magic/Magdir/archive b/file-5.44-mod/magic/Magdir/archive
index a706556..abc8740 100644
--- a/file-5.44-org/magic/Magdir/archive
+++ b/file-5.44-mod/magic/Magdir/archive
@@ -1669,6 +1669,16 @@
 >>50	string	epub+zip	EPUB document
 !:mime application/epub+zip
 
+# From: Hajin Jang <jb6804@naver.com>
+# hwpx (OWPML) document format follows OCF specification.
+# Hangul Word Processor 2010+ supports HWPX format.
+# URL: https://www.hancom.com/etc/hwpDownload.do
+#      https://standard.go.kr/KSCI/standardIntro/getStandardSearchView.do?menuId=503&topMenuId=502&ksNo=KSX6101
+#      https://e-ks.kr/streamdocs/view/sd;streamdocsId=72059197557727331
+>>50	string	hwp+zip     Hancom HWP (Hangul Word Processor) file, HWPX
+!:mime application/hwp+zip
+!:ext	hwpx
+
 # From:	Joerg Jenderek
 # URL:	http://en.wikipedia.org/wiki/CorelDRAW
 # NOTE:	version; til 2 WL-based; from 3 til 13 by ./riff; from 14 zip based
diff --git a/file-5.44-org/magic/Magdir/ole2compounddocs b/file-5.44-mod/magic/Magdir/ole2compounddocs
index dc08e9c..9a89ebe 100644
--- a/file-5.44-org/magic/Magdir/ole2compounddocs
+++ b/file-5.44-mod/magic/Magdir/ole2compounddocs
@@ -262,9 +262,11 @@
 !:ext	tpl
 #
 # URL:	https://en.wikipedia.org/wiki/Hangul_(word_processor)
+#       https://www.hancom.com/etc/hwpDownload.do
 # Note:	"HWP Document File" signature found in FileHeader
+# Hangul Word Processor WORDIAN, 2002 and later is using HWP 5.0 format.
 # Second directory entry name FileHeader hint for Thinkfree Office document
->>>>128 	lestring16	FileHeader		: Hangul (Korean) 5.0 Word Processor File
+>>>>128 	lestring16	FileHeader		: Hancom HWP (Hangul Word Processor) file, version 5.0
 #!:mime	application/haansofthwp
 !:mime	application/x-hwp
 # https://example-files.online-convert.com/document/hwp/example.hwp
diff --git a/file-5.44-org/magic/Magdir/wordprocessors b/file-5.44-mod/magic/Magdir/wordprocessors
index be71676..034c034 100644
--- a/file-5.44-org/magic/Magdir/wordprocessors
+++ b/file-5.44-mod/magic/Magdir/wordprocessors
@@ -381,8 +381,11 @@
 >10	byte	!0	\b, v%d.
 >11	byte	x	\b%d
 
-# Hangul (Korean) Word Processor File
-0	string	HWP\ Document\ File	Hangul (Korean) Word Processor File 3.0
+# Hancom HWP (Hangul Word Processor)
+# Hangul Word Processor 3.0 through 97 used HWP 3.0 format.
+# URL: https://www.hancom.com/etc/hwpDownload.do
+0	string	HWP\ Document\ File     Hancom HWP (Hangul Word Processor) file, version 3.0
+!:ext	hwp
 
 # CosmicBook, from Benoit Rouits
 0       string  CSBK    Ted Neslson's CosmicBook hypertext file
diff --git a/file-5.44-org/src/readcdf.c b/file-5.44-mod/src/readcdf.c
index 1e2593a..5a730af 100644
--- a/file-5.44-org/src/readcdf.c
+++ b/file-5.44-mod/src/readcdf.c
@@ -613,7 +613,7 @@ file_trycdf(struct magic_set *ms, const struct buffer *b)
 		    sizeof(HWP5_SIGNATURE) - 1) == 0) {
 		    if (NOTMIME(ms)) {
 			if (file_printf(ms,
-			    "Hangul (Korean) Word Processor File 5.x") == -1)
+			    "Hancom HWP (Hangul Word Processor) file, version 5.0") == -1)
 			    return -1;
 		    } else if (ms->flags & MAGIC_MIME_TYPE) {
 			if (file_printf(ms, "application/x-hwp") == -1)

joveler

2023-01-24 15:59

reporter   ~0003890

Here are test sample files of the HWP format family.
HWP97.hwp (8,975 bytes)
HWP2016.hwp (9,216 bytes)
HWP2016.hwpx (14,377 bytes)

christos

2023-01-24 20:30

manager   ~0003891

Committed, thanks

Issue History

Date Modified Username Field Change
2023-01-20 17:58 joveler New Issue
2023-01-20 17:58 joveler Tag Attached: hwp hwpx magic
2023-01-20 17:58 joveler File Added: file-5.44-hwp.diff
2023-01-24 15:59 joveler Note Added: 0003890
2023-01-24 15:59 joveler File Added: HWP97.hwp
2023-01-24 15:59 joveler File Added: HWP2016.hwp
2023-01-24 15:59 joveler File Added: HWP2016.hwpx
2023-01-24 20:30 christos Assigned To => christos
2023-01-24 20:30 christos Status new => assigned
2023-01-24 20:30 christos Status assigned => resolved
2023-01-24 20:30 christos Resolution open => fixed
2023-01-24 20:30 christos Fixed in Version => 5.45
2023-01-24 20:30 christos Note Added: 0003891