#3859 closed defect (fixed)

No output from joel scripts in telnet

Reported by: Chris Johns Owned by: Chris Johns
Priority: normal Milestone: 5.1
Component: shell Version: 5
Severity: blocker Keywords:
Cc: Blocked By:
Blocking:

Description

Running a joel script in a telnet session results in the output being sent to the global stdout. For example:

$ telnet 1.2.3.4
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.

RTEMS Shell on /dev/pty0. Use 'help' to list commands.
[/] # cat j
#! joel
ls -las /
[/] # ./j
[/] #

The bug is a new shell main loop task will default to the global stdout, stdin etc and has no information about the parent's std handles. A joel script runs in it's own work task and does not know the telnet's std handles.

There are a related set of issues in the handling of the shell_env variable, POSIX key handling and the use of the external call rtems_shell_main_loop.

The telnet example in libbsd has:

static void
telnet_shell(char *name, void *arg)
{
	rtems_shell_env_t env;

	memset(&env, 0, sizeof(env));

	env.devname = name;
	env.taskname = "TLNT";
	env.login_check = NULL;
	env.forever = false;

	rtems_shell_main_loop(&env);
}

This is problematic as control of the env has been lost and this make backwards comptatable changes difficult. Control of this struct needs to be brought back under the shell code.

Currently the posix key is set in the parent task only when the run entry point is used. The run's created shell_env is then passed to the shell's main loop task as an argument from which it is cloned. This means an env is malloced in each run call and again in the main loop of the shell.

The current code leaks memory as repeated calls to a joel script in a shell will set the key over and over. The destructor is only called when the task is deleted. We have to assume the cleanup of any shell_env allocated externally to the shell code has to be handled externally.

Setting the key in the main loop task is problematic because telnet code such as the example in libbsd uses a local stack shell_env and the key has a destructor that blindly free's the key's memory when a task is released.

Changes:

  1. Add parent_stdout, parent_stdin, and parent_stderr to the shell_env and set to the parent's std handles.
  2. Add a managed flag to shell_env and only set when allocated by rtems_shell_init_env. Change rtems_shell_env_free to only free the shell_env if managed.
  3. Remove all key sets and have only one in the shell's main loop code.
  4. Change rtems_shell_init_env to get the current tasks key and clone that before cloning the global env.
  5. Update rtems_shell_dup_current_env to set the parent std handles.
  6. Have the main loop use the parent std handles rather than the global handles.
  7. Check the magic field has been set in the shell's main loop and raise an error if not set. The only code to set this field should reside in shell.c. Code such as libbsd will need to call rtems_shell_dup_current_env.

Change History (35)

comment:1 Changed on Feb 4, 2020 at 4:25:01 PM by Sebastian Huber

Sounds that it is currently pretty broken.

What about changing the managed member into a destroy handler which is initialized by the constructor?

comment:2 in reply to:  1 ; Changed on Feb 5, 2020 at 2:14:46 AM by Chris Johns

Replying to Sebastian Huber:

Sounds that it is currently pretty broken.

There seems to be a few overlapping issues that make things a little confusing.

What about changing the managed member into a destroy handler which is initialized by the constructor?

Yes, my patch I am testing does this. I have an issue on 4.11 where a printf line in changes what happens and I am not sure why. I need to resolve this. I will then create a 4.11 clone of this ticket and post a patch for that branch.

comment:3 in reply to:  2 Changed on Feb 10, 2020 at 9:59:43 PM by Chris Johns

Replying to Chris Johns:

I have an issue on 4.11 where a printf line in changes what happens and I am not sure why. I need to resolve this. I will then create a 4.11 clone of this ticket and post a patch for that branch.

I have chased the issue down to how we set up the re-entrant struct newlib uses in our threads. My changes follow what exists ...

  FILE *input = fopen(shell_env->input, "r");
  if (input == NULL)
     error(...);
  stdin = input;

The only calls on stdin before the assignment are fileno(stdin) and this does not touch the re-entrant struct so the first call to libc after the assignment is setvbuf(stdout, NULL, _IONBF, 0); and newlib performs it's lazy initialisation and resets the re-entrant struct.

This issue is present on 4.11 and 5 as far as I can see.

comment:4 Changed on Feb 11, 2020 at 12:26:20 AM by Chris Johns

Blocked By: 3870 added

comment:5 Changed on Feb 17, 2020 at 4:51:50 AM by Chris Johns

It is not easy making a joel test because of the way we have used the linker mapped printf to printk.

comment:6 Changed on Feb 17, 2020 at 6:42:57 AM by Sebastian Huber

Only printf(), puts(), and putchar() are wrapped. The shell related code should use fprintf() etc. Using printf() would be a bug.

comment:7 Changed on Feb 17, 2020 at 8:51:46 PM by Chris Johns

A test is not easy as a shell test cannot use any standard commands, eg echo because these call printf. The post test patch provides a command that uses fprintf and this works enough to test the shell start, main loop, and input and output support.

comment:8 Changed on Feb 18, 2020 at 2:55:20 AM by Chris Johns

Blocked By: 3870 removed

comment:9 Changed on Feb 18, 2020 at 6:06:14 AM by Sebastian Huber

It is a bug that the echo command uses putchar(). It should use fputc(c, stdout).

comment:10 in reply to:  9 Changed on Feb 18, 2020 at 6:55:17 AM by Chris Johns

Replying to Sebastian Huber:

It is a bug that the echo command uses putchar(). It should use fputc(c, stdout).

Could you please explain why?

Other commands like ls use printf.

comment:11 Changed on Feb 18, 2020 at 7:02:47 AM by Sebastian Huber

Every shell related code which uses printf(), puts(), putchar(), etc. which work on file descriptors is broken. The file descriptors are global. Only the stdin, stdout, and stderr file pointers are thread-specific.

comment:12 in reply to:  11 Changed on Feb 18, 2020 at 7:09:20 AM by Chris Johns

But ls works nicely with telnet, actually all commands I know of work well with the redirection we implement. I am sorry you have lost here with your bug comment.

My comments relate to our testsuite and the hack we have to switch printf via the linker to keep the tests as small as possible.

comment:13 Changed on Feb 18, 2020 at 7:27:06 AM by Sebastian Huber

Sorry for the confusion. I checked the Newlib code and it actually uses stdout for printf(). I thought it uses directly a file descriptor, but this is not the case. I vaguely remember that Ralf changed some printf() to fprintf(stdout, ...) for whatever reason.

comment:14 Changed on Apr 9, 2020 at 12:21:41 AM by Chris Johns

I have found the default command handler in libbsd needs updating ...

https://git.rtems.org/rtems-libbsd/tree/rtemsbsd/telnetd/telnetd-service.c#n72

The input and output fields needs have strings. I will change the shell's handling of NULL to reference the parent descriptors to catch user's code.

comment:15 Changed on Apr 14, 2020 at 10:30:49 PM by Chris Johns <chrisj@…>

Resolution: fixed
Status: assignedclosed

In d007cc2/rtems:

libmisc/shell: Fix the handling of joel scripts in telnet

  • Fix the passing of std[in/out] to child threads
  • Fix deleting of managed memory in the key destructor
  • Only set the key in the main loop thread
  • Only allocate a shell env outside of the main loop
  • Fix memory leak if the task start fails
  • Remove error level from shell env, it cannot be returned this way. Add exit_code but the API is broken so it cannot be returned.

Closes #3859

comment:16 Changed on Aug 6, 2020 at 6:47:29 AM by Sebastian Huber

Resolution: fixed
Status: closedreopened

The telnet shell

static void
telnet_shell(char *name, void *arg)
{
	rtems_shell_env_t env;

	memset(&env, 0, sizeof(env));

	env.devname = name;
	env.taskname = "TLNT";
	env.login_check = NULL;
	env.forever = false;

	rtems_shell_main_loop(&env);
}

used in the libbsd test code is now broken:

telnet 169.254.140.254
Trying 169.254.140.254...
Connected to 169.254.140.254.
Escape character is '^]'.
invalid shell environment passed to the main loop)
Connection closed by foreign host.

How should the code be instead?

comment:17 Changed on Aug 6, 2020 at 6:50:18 AM by Sebastian Huber

Severity: normalblocker

comment:18 Changed on Aug 6, 2020 at 6:51:07 AM by Sebastian Huber

There should be an advice in the migration section in the user manual.

comment:19 Changed on Aug 6, 2020 at 10:42:21 AM by Chris Johns

The example code provided is broken and has always been broken. I only know it exists because libbsd had it. The ticket describes the situation and item 7) in the list says how it is to be resolved and that interface has always been there for just this situation.

Please unblock this ticket. Thanks.

comment:20 Changed on Aug 6, 2020 at 11:17:13 AM by Sebastian Huber

I have to fix the code in libbsd. This code was also copied to applications which are now also broken. This is definitely something for the migration section.

comment:21 Changed on Aug 6, 2020 at 11:51:15 AM by Chris Johns

Thanks and yes something should be added. I will take a look.

comment:22 in reply to:  21 Changed on Aug 6, 2020 at 11:58:17 AM by Sebastian Huber

Replying to Chris Johns:

Thanks and yes something should be added. I will take a look.

https://lists.rtems.org/pipermail/devel/2020-August/061166.html

I already fixed libbsd (the commits don't show up in the tickets).

comment:23 Changed on Aug 6, 2020 at 5:22:46 PM by Sebastian Huber

The proposed fix doesn't work. If you close a telnet session and try to connect again the target crashes with a NULL pointer access:

#0  fileno (f=0x0) at ../../../../../../../../gnu-mirror-gcc-c72a1b6/newlib/libc/stdio/fileno.c:69
#1  0x0042e866 in rtems_shell_main_loop (shell_env=0x0) at ../../../cpukit/libmisc/shell/shell.c:910
#2  0x00104880 in telnet_shell (name=0x774600 "/dev/pty4", arg=0x0) at ../../testsuite/media01/test_main.c:133
#3  0x004113e8 in telnetd_session_task (arg=7816628) at ../../../cpukit/telnetd/telnetd.c:179
#4  0x00425f78 in _Thread_Handler () at ../../../cpukit/score/src/threadhandler.c:143
Last edited on Aug 6, 2020 at 5:28:52 PM by Sebastian Huber (previous) (diff)

comment:24 Changed on Aug 7, 2020 at 7:10:16 AM by Sebastian Huber

The following change fixes the NULL pointer access:

diff --git a/cpukit/libmisc/shell/shell.c b/cpukit/libmisc/shell/shell.c
index 0b06e8b4d1..72aeb23c43 100644
--- a/cpukit/libmisc/shell/shell.c
+++ b/cpukit/libmisc/shell/shell.c
@@ -234,12 +234,6 @@ static void rtems_shell_clear_shell_env(void)
   eno = pthread_setspecific(rtems_shell_current_env_key, NULL);
   if (eno != 0)
     rtems_error(0, "pthread_setspecific(shell_current_env_key): clear");
-
-  /*
-   * Clear stdin and stdout file pointers of they will be closed
-   */
-  stdin = NULL;
-  stdout = NULL;
 }
 
 /*

What is the purpose of this code?

comment:25 Changed on Aug 7, 2020 at 7:18:44 AM by Chris Johns

A thread instance that inherits the std* file handles does not close them, for example a joel script would close the parents session's handles.

comment:26 Changed on Aug 7, 2020 at 7:25:05 AM by Sebastian Huber

Setting the stdin and stdout to NULL pointers doesn't look like a good idea to me. I still don't understand what you want to do with this. Which code reads the stdin and stdout pointers and checks them for NULL?

comment:27 Changed on Aug 7, 2020 at 7:28:25 AM by Chris Johns

Nothing more than what was in the code before ..
https://git.rtems.org/rtems/tree/cpukit/libmisc/shell/shell.c?h=4.11#n242
Something else must has changed. I had tested it on RTEMS 5.

comment:28 Changed on Aug 7, 2020 at 7:31:24 AM by Sebastian Huber

I have this problem on the RTEMS 5 branch using RTEMS 5 tools.

comment:29 Changed on Aug 7, 2020 at 7:33:38 AM by Chris Johns

Yes I am seeing it as well. I am little confused.

I am not sure why the closing of a previous session would effect a new session as the parent (telnetd) should be handling this. The new session should have a new set of std* handles?

Last edited on Aug 7, 2020 at 7:34:11 AM by Chris Johns (previous) (diff)

comment:30 Changed on Aug 7, 2020 at 7:42:18 AM by Sebastian Huber

For my taste the shell code tinkers too much with the std* handles. I lost track who is responsible for set up and tear down. What is the purpose of the

  FILE *parent_stdin;
  FILE *parent_stdout;
  FILE *parent_stderr;

Who is the parent?

comment:31 Changed on Aug 7, 2020 at 7:49:21 AM by Chris Johns

I have a pretty good handle (hah pun intended) on these handles. I will try and take a look.

The parent handles are the wrapper of the shell env. For example they are global handles for a console shell. With telnetd they are the session handles. If you run a joel script they are ones used:

[/] # cat j
#! joel
ls -las /
cat /j
[/] # ./j
total 6
0 drwxr-xr-x   1 root  root   6160 Jan  1  1988 dev
0 drwxr-xr-x   1 root  root   2240 Jan  1  1988 etc
0 -rwxrwxrwx   1 root  root     31 Jan  1  1988 j
1 drwx--x--x   3 root  root    512 Jan  1  1988 ram
0 drwxr-xr-x   1 root  root      0 Jan  1  1988 tmp
0 drwxr-xr-x   1 root  root    280 Jan  1  1988 var
#! joel
ls -las /
cat /j

The threads that run the joel script inherit the telnet session handlers.

comment:32 Changed on Aug 8, 2020 at 7:45:29 AM by Chris Johns

I have looked into the issue and the reuse of the telnetd session thread is clashing with the way the shell's environment is currently being cleaned up. The telnetd software creates a number of sessions and it reuses these sessions for the incoming connections. This means the pty file handles assigned to each session are reused. Clearing these file handles corrupts the telnetd session.

Setting of stdin, stdout and also stderr to NULL when cleaning up a shell environment is important because thread termination closes those files if present and open and if the file pointers have been inherited from the parent thread the parent can be left without a valid stdin, stdout and stderr. This happens with a joel script. It could also happen with a command that is placed in the background (if we supported this) or a command that dispatches a worker thread.

This ticket was originally about a bug where joel script output was not redirected to the parent thread's stdout. They happened to work for the console because the console shell's standard handles are the global ones. A new thread inherits the global handles and the clean up also manages the closing of those, or the lack of it. However a joel script with telnet was broken.

The solution is to move the setting of stdin, stdout and stderr out of clearing the environment and to only do this when the rtems_shell_task exits the rtems_shell_main_loop.

For the record a debug trace from shell.c running a joel script is:

shell[0a010001]: run: env: 0x62accd0
shell[0a010001]: run out: 1 (0xbeb028)
shell[0a010001]: run  in: 0 (0xbeafa8)
shell[0a01001c]: env: 0x62accd0
shell[0a01001c]: child out: 1 (0xbeb028)
shell[0a01001c]: child  in: 0 (0xbeafa8)
shell[0a010017]: dup: global: copy: 0x2f5d1b8 out: 166 (0x2f02738) in: 165 (0x2f026b8)
shell[0a010017]: env: 0x2f5d1b8
shell[0a010017]: child out: 166 (0x2f02738)
shell[0a010017]: child  in: 165 (0x2f026b8)
shell[0a010017]: script: in: /j out: stdout
shell[0a010017]: run: env: 0x63558a8
shell[0a010017]: run out: 166 (0x2f02738)
shell[0a010017]: run  in: 165 (0x2f026b8)
shell[0a01001d]: env: 0x63558a8
shell[0a01001d]: child out: 166 (0x2f02738)
shell[0a01001d]: child  in: 35 (0x629b260)
shell[0a01001d]: end: 0 0
shell[0a01001d]: child in-to-close: 0x629b260
shell[0a01001d]: child out-to-close: 0
shell[0a010017]: run: end: sc:0 ec:0
shell[0a010017]: script: end: 0
shell[0a010017]: end: 0 0
shell[0a010017]: child in-to-close: 0
shell[0a010017]: child out-to-close: 0

comment:33 Changed on Aug 10, 2020 at 5:13:42 AM by Chris Johns <chrisj@…>

Resolution: fixed
Status: reopenedclosed

In 95036a45/rtems:

shell: Only clear std handles when the shell task exits

Clearing the std file handles when the main loop exited crashes
telnetd as it reuses its session threads.

Closes #3859

comment:34 Changed on Aug 10, 2020 at 9:47:45 AM by Sebastian Huber <sebastian.huber@…>

In cb4358c/rtems-docs:

user: Add shell environment migration aid

Update #3859.

comment:35 Changed on Aug 13, 2020 at 9:38:54 AM by Sebastian Huber <sebastian.huber@…>

In 49f7e05/rtems-docs:

user: Fix format

Update #3859.

Note: See TracTickets for help on using tickets.