#3859 closed defect (fixed)
No output from joel scripts in telnet
Reported by: | Chris Johns | Owned by: | Chris Johns |
---|---|---|---|
Priority: | normal | Milestone: | 5.1 |
Component: | shell | Version: | 5 |
Severity: | blocker | Keywords: | |
Cc: | Blocked By: | ||
Blocking: |
Description
Running a joel
script in a telnet session results in the output being sent to the global stdout
. For example:
$ telnet 1.2.3.4 Trying 1.2.3.4... Connected to 1.2.3.4. Escape character is '^]'. RTEMS Shell on /dev/pty0. Use 'help' to list commands. [/] # cat j #! joel ls -las / [/] # ./j [/] #
The bug is a new shell main loop task will default to the global stdout
, stdin
etc and has no information about the parent's std
handles. A joel
script runs in it's own work task and does not know the telnet's std
handles.
There are a related set of issues in the handling of the shell_env
variable, POSIX key handling and the use of the external call rtems_shell_main_loop
.
The telnet example in libbsd
has:
static void telnet_shell(char *name, void *arg) { rtems_shell_env_t env; memset(&env, 0, sizeof(env)); env.devname = name; env.taskname = "TLNT"; env.login_check = NULL; env.forever = false; rtems_shell_main_loop(&env); }
This is problematic as control of the env
has been lost and this make backwards comptatable changes difficult. Control of this struct
needs to be brought back under the shell code.
Currently the posix key is set in the parent task only when the run entry point is used. The run's created shell_env
is then passed to the shell's main loop task as an argument from which it is cloned. This means an env
is malloc
ed in each run call and again in the main loop of the shell.
The current code leaks memory as repeated calls to a joel
script in a shell will set the key over and over. The destructor is only called when the task is deleted. We have to assume the cleanup of any shell_env
allocated externally to the shell code has to be handled externally.
Setting the key in the main loop task is problematic because telnet code such as the example in libbsd
uses a local stack shell_env
and the key has a destructor that blindly free
's the key's memory when a task is released.
Changes:
- Add
parent_stdout
,parent_stdin
, andparent_stderr
to theshell_env
and set to the parent'sstd
handles. - Add a
managed
flag toshell_env
and only set when allocated byrtems_shell_init_env
. Changertems_shell_env_free
to onlyfree
theshell_env
ifmanaged
. - Remove all key sets and have only one in the shell's main loop code.
- Change
rtems_shell_init_env
to get the current tasks key and clone that before cloning the global env. - Update
rtems_shell_dup_current_env
to set the parentstd
handles. - Have the main loop use the parent
std
handles rather than the global handles. - Check the
magic
field has been set in the shell's main loop and raise an error if not set. The only code to set this field should reside inshell.c
. Code such aslibbsd
will need to callrtems_shell_dup_current_env
.
Change History (35)
comment:1 follow-up: 2 Changed on 02/04/20 at 16:25:01 by Sebastian Huber
comment:2 follow-up: 3 Changed on 02/05/20 at 02:14:46 by Chris Johns
Replying to Sebastian Huber:
Sounds that it is currently pretty broken.
There seems to be a few overlapping issues that make things a little confusing.
What about changing the managed member into a destroy handler which is initialized by the constructor?
Yes, my patch I am testing does this. I have an issue on 4.11 where a printf line in changes what happens and I am not sure why. I need to resolve this. I will then create a 4.11 clone of this ticket and post a patch for that branch.
comment:3 Changed on 02/10/20 at 21:59:43 by Chris Johns
Replying to Chris Johns:
I have an issue on 4.11 where a printf line in changes what happens and I am not sure why. I need to resolve this. I will then create a 4.11 clone of this ticket and post a patch for that branch.
I have chased the issue down to how we set up the re-entrant struct newlib uses in our threads. My changes follow what exists ...
FILE *input = fopen(shell_env->input, "r"); if (input == NULL) error(...); stdin = input;
The only calls on stdin
before the assignment are fileno(stdin)
and this does not touch the re-entrant struct so the first call to libc after the assignment is setvbuf(stdout, NULL, _IONBF, 0);
and newlib performs it's lazy initialisation and resets the re-entrant struct.
This issue is present on 4.11 and 5 as far as I can see.
comment:4 Changed on 02/11/20 at 00:26:20 by Chris Johns
Blocked By: | 3870 added |
---|
comment:5 Changed on 02/17/20 at 04:51:50 by Chris Johns
It is not easy making a joel
test because of the way we have used the linker mapped printf
to printk
.
comment:6 Changed on 02/17/20 at 06:42:57 by Sebastian Huber
Only printf(), puts(), and putchar() are wrapped. The shell related code should use fprintf() etc. Using printf() would be a bug.
comment:7 Changed on 02/17/20 at 20:51:46 by Chris Johns
A test is not easy as a shell test cannot use any standard commands, eg echo
because these call printf
. The post test patch provides a command that uses fprintf
and this works enough to test the shell start, main loop, and input and output support.
comment:8 Changed on 02/18/20 at 02:55:20 by Chris Johns
Blocked By: | 3870 removed |
---|
comment:9 follow-up: 10 Changed on 02/18/20 at 06:06:14 by Sebastian Huber
It is a bug that the echo command uses putchar(). It should use fputc(c, stdout).
comment:10 Changed on 02/18/20 at 06:55:17 by Chris Johns
Replying to Sebastian Huber:
It is a bug that the echo command uses putchar(). It should use fputc(c, stdout).
Could you please explain why?
Other commands like ls
use printf
.
comment:11 follow-up: 12 Changed on 02/18/20 at 07:02:47 by Sebastian Huber
Every shell related code which uses printf(), puts(), putchar(), etc. which work on file descriptors is broken. The file descriptors are global. Only the stdin, stdout, and stderr file pointers are thread-specific.
comment:12 Changed on 02/18/20 at 07:09:20 by Chris Johns
But ls
works nicely with telnet, actually all commands I know of work well with the redirection we implement. I am sorry you have lost here with your bug comment.
My comments relate to our testsuite and the hack we have to switch printf via the linker to keep the tests as small as possible.
comment:13 Changed on 02/18/20 at 07:27:06 by Sebastian Huber
Sorry for the confusion. I checked the Newlib code and it actually uses stdout for printf(). I thought it uses directly a file descriptor, but this is not the case. I vaguely remember that Ralf changed some printf() to fprintf(stdout, ...) for whatever reason.
comment:14 Changed on 04/09/20 at 00:21:41 by Chris Johns
I have found the default command handler in libbsd needs updating ...
https://git.rtems.org/rtems-libbsd/tree/rtemsbsd/telnetd/telnetd-service.c#n72
The input
and output
fields needs have strings. I will change the shell's handling of NULL
to reference the parent descriptors to catch user's code.
comment:15 Changed on 04/14/20 at 22:30:49 by Chris Johns <chrisj@…>
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
In [changeset:"d007cc2ceef48ce47c157512370c1bf2c45596f3/rtems" d007cc2/rtems]:
comment:16 Changed on 08/06/20 at 06:47:29 by Sebastian Huber
Resolution: | fixed |
---|---|
Status: | closed → reopened |
The telnet shell
static void telnet_shell(char *name, void *arg) { rtems_shell_env_t env; memset(&env, 0, sizeof(env)); env.devname = name; env.taskname = "TLNT"; env.login_check = NULL; env.forever = false; rtems_shell_main_loop(&env); }
used in the libbsd test code is now broken:
telnet 169.254.140.254 Trying 169.254.140.254... Connected to 169.254.140.254. Escape character is '^]'. invalid shell environment passed to the main loop) Connection closed by foreign host.
How should the code be instead?
comment:17 Changed on 08/06/20 at 06:50:18 by Sebastian Huber
Severity: | normal → blocker |
---|
comment:18 Changed on 08/06/20 at 06:51:07 by Sebastian Huber
There should be an advice in the migration section in the user manual.
comment:19 Changed on 08/06/20 at 10:42:21 by Chris Johns
The example code provided is broken and has always been broken. I only know it exists because libbsd had it. The ticket describes the situation and item 7) in the list says how it is to be resolved and that interface has always been there for just this situation.
Please unblock this ticket. Thanks.
comment:20 Changed on 08/06/20 at 11:17:13 by Sebastian Huber
I have to fix the code in libbsd. This code was also copied to applications which are now also broken. This is definitely something for the migration section.
comment:21 follow-up: 22 Changed on 08/06/20 at 11:51:15 by Chris Johns
Thanks and yes something should be added. I will take a look.
comment:22 Changed on 08/06/20 at 11:58:17 by Sebastian Huber
Replying to Chris Johns:
Thanks and yes something should be added. I will take a look.
https://lists.rtems.org/pipermail/devel/2020-August/061166.html
I already fixed libbsd (the commits don't show up in the tickets).
comment:23 Changed on 08/06/20 at 17:22:46 by Sebastian Huber
The proposed fix doesn't work. If you close a telnet session and try to connect again the target crashes with a NULL pointer access:
#0 fileno (f=0x0) at ../../../../../../../../gnu-mirror-gcc-c72a1b6/newlib/libc/stdio/fileno.c:69 #1 0x0042e866 in rtems_shell_main_loop (shell_env=0x0) at ../../../cpukit/libmisc/shell/shell.c:910 #2 0x00104880 in telnet_shell (name=0x774600 "/dev/pty4", arg=0x0) at ../../testsuite/media01/test_main.c:133 #3 0x004113e8 in telnetd_session_task (arg=7816628) at ../../../cpukit/telnetd/telnetd.c:179 #4 0x00425f78 in _Thread_Handler () at ../../../cpukit/score/src/threadhandler.c:143
comment:24 Changed on 08/07/20 at 07:10:16 by Sebastian Huber
The following change fixes the NULL pointer access:
diff --git a/cpukit/libmisc/shell/shell.c b/cpukit/libmisc/shell/shell.c index 0b06e8b4d1..72aeb23c43 100644 --- a/cpukit/libmisc/shell/shell.c +++ b/cpukit/libmisc/shell/shell.c @@ -234,12 +234,6 @@ static void rtems_shell_clear_shell_env(void) eno = pthread_setspecific(rtems_shell_current_env_key, NULL); if (eno != 0) rtems_error(0, "pthread_setspecific(shell_current_env_key): clear"); - - /* - * Clear stdin and stdout file pointers of they will be closed - */ - stdin = NULL; - stdout = NULL; } /*
What is the purpose of this code?
comment:25 Changed on 08/07/20 at 07:18:44 by Chris Johns
A thread instance that inherits the std*
file handles does not close them, for example a joel
script would close the parents session's handles.
comment:26 Changed on 08/07/20 at 07:25:05 by Sebastian Huber
Setting the stdin and stdout to NULL pointers doesn't look like a good idea to me. I still don't understand what you want to do with this. Which code reads the stdin and stdout pointers and checks them for NULL?
comment:27 Changed on 08/07/20 at 07:28:25 by Chris Johns
Nothing more than what was in the code before ..
https://git.rtems.org/rtems/tree/cpukit/libmisc/shell/shell.c?h=4.11#n242
Something else must has changed. I had tested it on RTEMS 5.
comment:28 Changed on 08/07/20 at 07:31:24 by Sebastian Huber
I have this problem on the RTEMS 5 branch using RTEMS 5 tools.
comment:29 Changed on 08/07/20 at 07:33:38 by Chris Johns
Yes I am seeing it as well. I am little confused.
I am not sure why the closing of a previous session would effect a new session as the parent (telnetd) should be handling this. The new session should have a new set of std*
handles?
comment:30 Changed on 08/07/20 at 07:42:18 by Sebastian Huber
For my taste the shell code tinkers too much with the std*
handles. I lost track who is responsible for set up and tear down. What is the purpose of the
FILE *parent_stdin; FILE *parent_stdout; FILE *parent_stderr;
Who is the parent?
comment:31 Changed on 08/07/20 at 07:49:21 by Chris Johns
I have a pretty good handle (hah pun intended) on these handles. I will try and take a look.
The parent handles are the wrapper of the shell env. For example they are global handles for a console shell. With telnetd
they are the session handles. If you run a joel
script they are ones used:
[/] # cat j #! joel ls -las / cat /j [/] # ./j total 6 0 drwxr-xr-x 1 root root 6160 Jan 1 1988 dev 0 drwxr-xr-x 1 root root 2240 Jan 1 1988 etc 0 -rwxrwxrwx 1 root root 31 Jan 1 1988 j 1 drwx--x--x 3 root root 512 Jan 1 1988 ram 0 drwxr-xr-x 1 root root 0 Jan 1 1988 tmp 0 drwxr-xr-x 1 root root 280 Jan 1 1988 var #! joel ls -las / cat /j
The threads that run the joel
script inherit the telnet session handlers.
comment:32 Changed on 08/08/20 at 07:45:29 by Chris Johns
I have looked into the issue and the reuse of the telnetd
session thread is clashing with the way the shell's environment is currently being cleaned up. The telnetd
software creates a number of sessions and it reuses these sessions for the incoming connections. This means the pty
file handles assigned to each session are reused. Clearing these file handles corrupts the telnetd
session.
Setting of stdin
, stdout
and also stderr
to NULL
when cleaning up a shell environment is important because thread termination closes those files if present and open and if the file pointers have been inherited from the parent thread the parent can be left without a valid stdin
, stdout
and stderr
. This happens with a joel
script. It could also happen with a command that is placed in the background (if we supported this) or a command that dispatches a worker thread.
This ticket was originally about a bug where joel
script output was not redirected to the parent thread's stdout. They happened to work for the console because the console shell's standard handles are the global ones. A new thread inherits the global handles and the clean up also manages the closing of those, or the lack of it. However a joel
script with telnet was broken.
The solution is to move the setting of stdin
, stdout
and stderr
out of clearing the environment and to only do this when the rtems_shell_task
exits the rtems_shell_main_loop
.
For the record a debug trace from shell.c
running a joel
script is:
shell[0a010001]: run: env: 0x62accd0 shell[0a010001]: run out: 1 (0xbeb028) shell[0a010001]: run in: 0 (0xbeafa8) shell[0a01001c]: env: 0x62accd0 shell[0a01001c]: child out: 1 (0xbeb028) shell[0a01001c]: child in: 0 (0xbeafa8) shell[0a010017]: dup: global: copy: 0x2f5d1b8 out: 166 (0x2f02738) in: 165 (0x2f026b8) shell[0a010017]: env: 0x2f5d1b8 shell[0a010017]: child out: 166 (0x2f02738) shell[0a010017]: child in: 165 (0x2f026b8) shell[0a010017]: script: in: /j out: stdout shell[0a010017]: run: env: 0x63558a8 shell[0a010017]: run out: 166 (0x2f02738) shell[0a010017]: run in: 165 (0x2f026b8) shell[0a01001d]: env: 0x63558a8 shell[0a01001d]: child out: 166 (0x2f02738) shell[0a01001d]: child in: 35 (0x629b260) shell[0a01001d]: end: 0 0 shell[0a01001d]: child in-to-close: 0x629b260 shell[0a01001d]: child out-to-close: 0 shell[0a010017]: run: end: sc:0 ec:0 shell[0a010017]: script: end: 0 shell[0a010017]: end: 0 0 shell[0a010017]: child in-to-close: 0 shell[0a010017]: child out-to-close: 0
comment:33 Changed on 08/10/20 at 05:13:42 by Chris Johns <chrisj@…>
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
In [changeset:"95036a45914c6d0bb93f73c4f18f12d87a334293/rtems" 95036a45/rtems]:
comment:34 Changed on 08/10/20 at 09:47:45 by Sebastian Huber <sebastian.huber@…>
In [changeset:"cb4358c9b16289d938f41d04a71dfaddb9716b0e/rtems-docs" cb4358c/rtems-docs]:
comment:35 Changed on 08/13/20 at 09:38:54 by Sebastian Huber <sebastian.huber@…>
In [changeset:"49f7e05dbd0f7eb548f05211b8f04e8c8fdac823/rtems-docs" 49f7e05/rtems-docs]:
Sounds that it is currently pretty broken.
What about changing the managed member into a destroy handler which is initialized by the constructor?