View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000142 | tcsh | General | public | 2020-02-14 14:32 | 2020-02-19 14:06 |
Reporter | bitstreamout | Assigned To | christos | ||
Priority | normal | Severity | major | Reproducibility | sometimes |
Status | assigned | Resolution | open | ||
Platform | x86_64 | OS | linux | ||
Product Version | 6.22.02 | ||||
Summary | 0000142: Dot locking sometimes does block after reboot | ||||
Description | During reboot it happens sometimes that one of my open tcsh sesssions does not get enough time to write out history and got killed by systemd before the dot lock file had been removed. It would be a a solution to combine the dot locking method with fcntl(F_SETLKW/F_UNLCK) as this does work over NFS at least for Linux. This also would allow to avoid polling due waiting on F_SETLKW ... | ||||
Steps To Reproduce | Open several tcsh sessions, do some work therein, and reboot | ||||
Tags | No tags attached. | ||||
|
Perhaps it is better to give up with a message to the user instead of getting stuck... I don't want to add more complexity to the locking. The point of dotlock is to not use the fcntl() based in the first place which is even less reliable over NFS... |
|
Hmmm ... on Linux based systems itz is possible to determine the process which holds the file descriptor. But this requires to parse /proc/ ... on the other hand most modern Linux systems have a tmpfs based /dev/shm worls writable and sticky directory which not only tmpfs but also a local file system. Besdide this : I had used some years a patch set from redhat with fcntl() based locking which never had shown problems over NFS ... only the porting to the next tcsh version had become complicated and fragile with increasing version number. |
|
Yes, but remember tcsh runs on other OS's including windows... And this is a corner case that does not usually happen. Doesn't systemd have a way to increase the grace time it waits before killing the processes? |
|
tcsh-6.22.02-local-dotlock.dif (1,861 bytes)
Avoid left over dot lock file after reboot --- dotlock.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) --- dotlock.c +++ dotlock.c 2020-02-17 11:16:22.785018224 +0000 @@ -30,8 +30,38 @@ #define O_SYNC 0 #endif +#if defined(__linux__) +# include <sys/statfs.h> +# include <unistd.h> +# ifndef TMPFS_MAGIC +# define TMPFS_MAGIC 0x01021994 +# endif +#endif + #include "dotlock.h" +#if defined(__linux__) +static char *sys_tmpdir; +static int +dosys_tmpdir () +{ + static char *shm = "/dev/shm"; + struct statfs fs; + static int doshm; + + if (doshm) + return (sys_tmpdir != NULL); + + doshm++; + + if (statfs(shm, &fs) < 0 || fs.f_type != TMPFS_MAGIC || eaccess(shm, W_OK|X_OK)) + return 0; + + sys_tmpdir = shm; + return 1; +} +#endif + static int create_exclusive(const char *); /* * Create a unique file. O_EXCL does not really work over NFS so we follow @@ -140,7 +170,17 @@ dot_lock(const char *fname, int pollinte (void)sigaddset(&nset, SIGTSTP); (void)sigaddset(&nset, SIGCHLD); +#if defined(__linux__) + const char *ptr; + if ((ptr = strrchr(fname, '/')) && dosys_tmpdir()) { + ptr++; + fname = ptr; + (void)snprintf(path, sizeof(path), "%s/%s.lock", sys_tmpdir, fname); + } else + (void)snprintf(path, sizeof(path), "%s.lock", fname); +#else (void)snprintf(path, sizeof(path), "%s.lock", fname); +#endif retval = -1; for (;;) { @@ -174,6 +214,16 @@ dot_unlock(const char *fname) { char path[MAXPATHLEN]; +#if defined(__linux__) + const char *ptr; + if ((ptr = strrchr(fname, '/')) && dosys_tmpdir()) { + ptr++; + fname = ptr; + (void)snprintf(path, sizeof(path), "%s/%s.lock", sys_tmpdir, fname); + } else + (void)snprintf(path, sizeof(path), "%s.lock", fname); +#else (void)snprintf(path, sizeof(path), "%s.lock", fname); +#endif (void)unlink(path); } |
|
Not really as the most tcsh process do poll by sleeping within usleep() ... using F_SETLKW would avoid this but requires interrupt handling. Currently I use a workaround, which is use /dev/shm as this file system is always fresh because it is tmpfs based |
|
What happens when the home directory is mounted over NFS and shared between different hosts? How will the other systems see the tmpfs lock? Let's try to treat the cause (systemd killing tcsh too quickly) instead of the symptoms (tcsh leaving a stray lock file around because it got kill -9'ed). |
|
Good point, personal I use different histories for different hosts. The fcntl() method had worked here over years without any problems, that means I had never seen any problems nor had got any bug reports. Maybe this is also Linux specific ... only rebooted (linux based) NFS servers may cause that a lock gets lost. For this a local tcsh has to write out its history at the same time as the NFS server is rebooted |
|
in create_exclusive(): How about open the fname after link(path, fname) and then unlink(fname)? As long as the process hold the file descriptor the dot lock file i That means that if the process gets kill-9'ed the file will disapear with the last open file descriptor tcsh-6.22.02-local-dotlock-2.dif (2,594 bytes)
Avoid left over dot lock file after reboot --- dotlock.c | 20 +++++++++++--------- dotlock.h | 2 +- sh.hist.c | 11 +++++++---- 3 files changed, 19 insertions(+), 14 deletions(-) --- dotlock.c +++ dotlock.c 2020-02-19 10:00:31.999490600 +0000 @@ -76,7 +76,7 @@ create_exclusive(const char *fname) * We try to create the unique filename. */ for (ntries = 0; ntries < 5; ntries++) { - fd = open(path, O_WRONLY|O_CREAT|O_TRUNC|O_EXCL|O_SYNC, 0); + fd = open(path, O_WRONLY|O_CREAT|O_TRUNC|O_EXCL|O_SYNC, S_IWUSR); if (fd != -1) { (void)close(fd); break; @@ -110,8 +110,14 @@ create_exclusive(const char *fname) errno = EEXIST; return -1; } - return 0; + fd = open(fname, O_WRONLY|O_EXCL|O_SYNC, 0); + if (fd < 0) { + errno = EEXIST; + return -1; + } + (void)unlink(fname); + return fd; bad: serrno = errno; (void)unlink(path); @@ -146,9 +152,8 @@ dot_lock(const char *fname, int pollinte for (;;) { handle_pending_signals(); (void)sigprocmask(SIG_BLOCK, &nset, &oset); - if (create_exclusive(path) != -1) { + if ((retval = create_exclusive(path)) != -1) { (void)sigprocmask(SIG_SETMASK, &oset, NULL); - retval = 0; break; } else @@ -170,10 +175,7 @@ dot_lock(const char *fname, int pollinte } void -dot_unlock(const char *fname) +dot_unlock(const int fd) { - char path[MAXPATHLEN]; - - (void)snprintf(path, sizeof(path), "%s.lock", fname); - (void)unlink(path); + close(fd); } --- dotlock.h +++ dotlock.h 2020-02-19 09:40:16.034422159 +0000 @@ -30,6 +30,6 @@ * pollinterval -- Interval (miliseconds) to check for lock, -1 return */ int dot_lock(const char *fname, int pollinterval); -void dot_unlock(const char *fname); +void dot_unlock(const int fd); #endif /* #ifndef _DOTLOCK_H_ */ --- sh.hist.c +++ sh.hist.c 2020-02-19 09:48:57.640589111 +0000 @@ -1209,9 +1209,11 @@ fmthist(int fmt, ptr_t ptr) } static void -dotlock_cleanup(void* lockpath) +dotlock_cleanup(void* xfdp) { - dot_unlock((char*)lockpath); + int *fdp; + fdp = xfdp; + dot_unlock(*fdp); } /* Save history before exiting the shell. */ @@ -1284,11 +1286,12 @@ rechist(Char *fname, int ref) jmp_buf_t osetexit; if (lock) { #ifndef WINNT_NATIVE + int fdlk; char *lockpath = strsave(short2str(fname)); cleanup_push(lockpath, xfree); /* Poll in 100 miliseconds interval to obtain the lock. */ - if ((dot_lock(lockpath, 100) == 0)) - cleanup_push(lockpath, dotlock_cleanup); + if ((fdlk = dot_lock(lockpath, 100)) != -1) + cleanup_push(&fdlk, dotlock_cleanup); #endif } getexit(osetexit); |
|
Then locking does not work. Another process can do the same (on the same or a different machine). This is the whole idea behind dot locking: that you can only touch the file once you've established you are the owner (by creating the link and making sure that the link count is 2). |
|
Hmm ... here it seems to work at least on the same machine ``` openat(AT_FDCWD, "/suse/werner/.tcsh/.noether.d574", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_SYNC, 0200) = 3 close(3) = 0 link("/suse/werner/.tcsh/.noether.d574", "/suse/werner/.tcsh/werner@noether.lock") = -1 EEXIST (File exists) unlink("/suse/werner/.tcsh/.noether.d574") = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 nanosleep({tv_sec=0, tv_nsec=100000000}, NULL) = 0 rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM CHLD TSTP TTIN TTOU], [], 8) = 0 uname({sysname="Linux", nodename="noether", ...}) = 0 getpid() = 11676 openat(AT_FDCWD, "/suse/werner/.tcsh/.noether.2be3e", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_SYNC, 0200) = 3 close(3) = 0 link("/suse/werner/.tcsh/.noether.2be3e", "/suse/werner/.tcsh/werner@noether.lock") = 0 openat(AT_FDCWD, "/suse/werner/.tcsh/werner@noether.lock", O_WRONLY|O_EXCL|O_SYNC) = 3 unlink("/suse/werner/.tcsh/werner@noether.lock") = 0 stat("/suse/werner/.tcsh/.noether.2be3e", {st_mode=S_IFREG|0200, st_size=0, ...}) = 0 unlink("/suse/werner/.tcsh/.noether.2be3e") = 0 ``` ``` openat(AT_FDCWD, "/suse/werner/.tcsh/.noether.dc5c", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_SYNC, 0200) = 3 close(3) = 0 link("/suse/werner/.tcsh/.noether.dc5c", "/suse/werner/.tcsh/werner@noether.lock") = 0 openat(AT_FDCWD, "/suse/werner/.tcsh/werner@noether.lock", O_WRONLY|O_EXCL|O_SYNC) = 3 unlink("/suse/werner/.tcsh/werner@noether.lock") = 0 stat("/suse/werner/.tcsh/.noether.dc5c", {st_mode=S_IFREG|0200, st_size=0, ...}) = 0 unlink("/suse/werner/.tcsh/.noether.dc5c") = 0 ``` |
|
The point is that at the time you remove the lock there could be another process (on a different machine perhaps) who owns the lock and is currently writing the file. The lock is a sign that the file is busy. If you remove it indiscriminately there is no point in creating it in the first place. |
Date Modified | Username | Field | Change |
---|---|---|---|
2020-02-14 14:32 | bitstreamout | New Issue | |
2020-02-16 16:13 | christos | Assigned To | => christos |
2020-02-16 16:13 | christos | Status | new => assigned |
2020-02-16 16:15 | christos | Status | assigned => feedback |
2020-02-16 16:15 | christos | Note Added: 0003371 | |
2020-02-17 10:05 | bitstreamout | Note Added: 0003374 | |
2020-02-17 10:05 | bitstreamout | Status | feedback => assigned |
2020-02-17 17:53 | christos | Note Added: 0003375 | |
2020-02-17 17:54 | christos | Note Edited: 0003375 | |
2020-02-18 08:06 | bitstreamout | File Added: tcsh-6.22.02-local-dotlock.dif | |
2020-02-18 08:06 | bitstreamout | Note Added: 0003376 | |
2020-02-18 20:20 | christos | Note Added: 0003377 | |
2020-02-19 07:54 | bitstreamout | Note Added: 0003379 | |
2020-02-19 10:05 | bitstreamout | File Added: tcsh-6.22.02-local-dotlock-2.dif | |
2020-02-19 10:05 | bitstreamout | Note Added: 0003380 | |
2020-02-19 13:43 | christos | Note Added: 0003381 | |
2020-02-19 14:00 | bitstreamout | Note Added: 0003382 | |
2020-02-19 14:06 | christos | Note Added: 0003383 |