View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000603 | file | General | public | 2024-12-29 05:16 | 2024-12-31 22:54 |
Reporter | Anton Monroe | Assigned To | christos | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | assigned | Resolution | open | ||
Product Version | 5.46 | ||||
Summary | 0000603: regex: $ does not match CRLF line-ending | ||||
Description | In a regular expression, $ represents the end of a line. "regex <string>$" works for files with LF line-endings but not for CRLF line-endings. | ||||
Steps To Reproduce | Using the attached files, type file -m regex.magic lf.c crlf.c | ||||
Tags | No tags attached. | ||||
|
regex.magic (183 bytes)
0 regex \^#[[:space:]]*ifdef found #ifdef
>0 regex \^#[[:space:]]*endif$ \b, found #endif$
# >0 default x \b, did not find #endif$
|
Perhaps use [\r\n] instead of $ if you want that? file(1) just uses regex(3). |
|
Okay, it sounds like my complaint is with regex(3) then. I was misled by the fact that grep on OS/2 behaves sanely.

Searching for \r\n would only work for a CRLF line ending. The portable way to do it seems to be to search for [[:space:]]*$. Most of the tests in Magdir/c-lang already use that; the test for "endif$" may have been an oversight. So in Magdir/c-lang you might want to change

    0 search/8192 endif
    >0 regex \^#[[:space:]]*(if\|ifn)def
    >>&0 regex \^#[[:space:]]*endif$ C source text
    !:mime text/x-c

to

    0 search/8192 endif
    >0 regex \^#[[:space:]]*(if\|ifn)def
    >>&0 regex \^#[[:space:]]*endif[[:space:]]*$ C source text
    !:mime text/x-c

There are some other files that look like they might benefit from the same fix, but I'm not qualified to offer suggestions about them.

Start of rant:

The documentation for 'grep' says "The caret '^' and the dollar sign '$' are special characters that respectively match the empty string at the beginning and end of a line." The "end of a line" is a concept that is common to all operating systems. A CRLF marks the end of a line just as much as an LF, and should be treated the same.

I discovered today that GNU grep (version 3.6) on Linux does not do that. The GNU grep (version 3.8) that I use on OS/2 does: a '$' matches the end of a line, whether the line is terminated by an LF, a CRLF, or the end of the file. I don't know how it does it, but it is more logical and more useful this way. 'file' is a good example of why treating them alike is a good idea, because it must deal with files from multiple operating systems.

If '$' only matches the LF character then why have the '^' and '$' meta-characters at all? And why document that '$' represents "end of a line" when it only represents the end of some lines?

End of rant
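For anyone who wants to check the behavior outside of file(1), here is a minimal sketch against POSIX regex(3). It rests on two assumptions that are not confirmed in this thread: that file(1) compiles its patterns with flags similar to REG_EXTENDED | REG_NEWLINE, and that the magic-file escapes (\^ and \|) reach regcomp() as plain ^ and |. The helper name magic_regex_matches is hypothetical and is not part of file(1).

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of file(1).
 * Returns 1 if pattern matches somewhere in text, 0 if it does not,
 * and -1 if the pattern fails to compile.
 *
 * Assumption: REG_EXTENDED | REG_NEWLINE approximates the flags that
 * file(1) uses. With REG_NEWLINE, '$' matches the empty string
 * immediately before a '\n' (or at the end of the string), so a '\r'
 * in front of the '\n' is treated as ordinary line content.
 */
int magic_regex_matches(const char *pattern, const char *text)
{
    regex_t rx;
    int rc;

    if (regcomp(&rx, pattern, REG_EXTENDED | REG_NEWLINE | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&rx, text, 0, NULL, 0);
    regfree(&rx);

    return rc == 0 ? 1 : 0;
}
```

With this helper, the pattern "^#[[:space:]]*endif$" should match the text "#ifdef X\n#endif\n" but not "#ifdef X\r\n#endif\r\n", because the character before the final '\n' is '\r' and not the 'f' of endif. The pattern "^#[[:space:]]*endif[[:space:]]*$" should match both, since [[:space:]] includes '\r'.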
Date Modified | Username | Field | Change |
---|---|---|---|
2024-12-29 05:16 | Anton Monroe | New Issue | |
2024-12-29 05:16 | Anton Monroe | File Added: regex.magic | |
2024-12-29 05:16 | Anton Monroe | File Added: lf.c | |
2024-12-29 05:16 | Anton Monroe | File Added: crlf.c | |
2024-12-29 19:53 | christos | Assigned To | => christos |
2024-12-29 19:53 | christos | Status | new => assigned |
2024-12-29 19:54 | christos | Status | assigned => feedback |
2024-12-29 19:54 | christos | Note Added: 0004147 | |
2024-12-31 22:54 | Anton Monroe | Note Added: 0004148 | |
2024-12-31 22:54 | Anton Monroe | Status | feedback => assigned |