Discussion:
portable sftp oddity: sftp, redirection of stderr and ControPersist
(too old to reply)
Nick Holland
1970-01-01 00:00:00 UTC
Permalink
hi,

Ran into a problem which I thought was an AIXism, but have since found
that it can be reproduced on Linux and MacOS. It can NOT be reproduced
on OpenBSD.

Reproduced on:
AIXv7.1 OpenSSH v6.0p1
RedHat 6.8 OpenSSH 5.4p1
Redhat 7.2 OpenSSH 6.6.1p1
MacOS 10.11 (sorry, forgot to grab the OpenSSH version)

Could not reproduce on:
OpenBSD 5.9-current, march snapshot, OpenSSH_7.2
OpenBSD 5.3-current, April snapshot, OpenSSH_6.2

Demonstration: set up an account on a system that can ssh to itself and
authenticate via keys.

Create and run this script:
=====
#!/usr/bin/ksh

mkdir -p ~/dest

for X in 1 2 3 4 5 6 7 8 9 10; do
echo $X
echo "cd dest
put $0" | sftp -b - localhost 2>&1 | tee -a outfile
done
=====

Change first line to whatever shell your system uses (that's AIX's
default), otherwise, should be pretty portable.

Run it, it should copy itself to a directory in your home directory ten
times, should take just a few seconds.


Now, add the following to ~/.ssh/config :
=====
ControlMaster auto
ControlPath ~/.ssh/control/%h:%r:%p
ControlPersist 10s
=====

re-run script.
* What I think should happen is the persistent control channel should
greatly reduce the SSH connection time, so it should run significantly
faster.

* What DOES happen on platforms with a problem: the "ControlPersist"
value becomes an SSH rate limiter -- instead of holding a connection
OPEN for ten seconds, it PREVENTS another SSH session from starting for
ten seconds! So, instead of taking maybe four seconds before, now it
takes 104 seconds (10 x 10 seconds + ssh connection overhead). Change
the ControlPersist to some other value, the overall speed goes up or
down, but never as fast as without the ControlPersist options in place.


here's where it gets weird.
Remove the "2>&1" from the sftp line, and the problem goes away -- the
script runs much faster with the .ssh/config file than without it.
Unfortunately, I need the stderr output. :-/

It appears to be the "2>&1 | {cmd}" structure that is at fault --
doesn't matter if {cmd} is tee, wc, or a shell function (as it was in my
"real" project).

I have found that 2>error.file |tee outfile works, and then I can append
the error.file output into the end of the "outfile" and accomplish my
goals, but that's kinda ugly.

Thanks for looking!

Nick.
Jakub Jelen
2016-07-11 08:07:16 UTC
Permalink
Post by Nick Holland
hi,
Ran into a problem which I thought was an AIXism, but have since found
that it can be reproduced on Linux and MacOS. It can NOT be
reproduced on OpenBSD.
AIXv7.1 OpenSSH v6.0p1
RedHat 6.8 OpenSSH 5.4p1
Redhat 7.2 OpenSSH 6.6.1p1
MacOS 10.11 (sorry, forgot to grab the OpenSSH version)
OpenBSD 5.9-current, march snapshot, OpenSSH_7.2
OpenBSD 5.3-current, April snapshot, OpenSSH_6.2
Demonstration: set up an account on a system that can ssh to itself
and authenticate via keys.
=====
#!/usr/bin/ksh
mkdir -p ~/dest
for X in 1 2 3 4 5 6 7 8 9 10; do
echo $X
echo "cd dest
put $0" | sftp -b - localhost 2>&1 | tee -a outfile
done
=====
Change first line to whatever shell your system uses (that's AIX's
default), otherwise, should be pretty portable.
Run it, it should copy itself to a directory in your home directory
ten times, should take just a few seconds.
=====
ControlMaster auto
ControlPath ~/.ssh/control/%h:%r:%p
ControlPersist 10s
=====
re-run script.
* What I think should happen is the persistent control channel should
greatly reduce the SSH connection time, so it should run significantly
faster.
* What DOES happen on platforms with a problem: the "ControlPersist"
value becomes an SSH rate limiter -- instead of holding a connection
OPEN for ten seconds, it PREVENTS another SSH session from starting
for ten seconds! So, instead of taking maybe four seconds before, now
it takes 104 seconds (10 x 10 seconds + ssh connection overhead).
Change the ControlPersist to some other value, the overall speed goes
up or down, but never as fast as without the ControlPersist options in
place.
here's where it gets weird.
Remove the "2>&1" from the sftp line, and the problem goes away -- the
script runs much faster with the .ssh/config file than without it.
Unfortunately, I need the stderr output. :-/
It appears to be the "2>&1 | {cmd}" structure that is at fault --
doesn't matter if {cmd} is tee, wc, or a shell function (as it was in
my "real" project).
I have found that 2>error.file |tee outfile works, and then I can
append the error.file output into the end of the "outfile" and
accomplish my goals, but that's kinda ugly.
This looks like related to the bug #1988 [1]. The mux process is holding
the stderr. Could you test it with the patch from the referenced bug
applied?
Also as I see you are using RHEL, you can contact your support to
provide a test package (we plan to fix this in RHEL7.3).

[1] https://bugzilla.mindrot.org/show_bug.cgi?id=1988
--
Jakub Jelen
Associate Software Engineer
Security Technologies
Red Hat
Nick Holland
1970-01-01 00:00:00 UTC
Permalink
...
Post by Jakub Jelen
This looks like related to the bug #1988 [1]. The mux process is holding
the stderr. Could you test it with the patch from the referenced bug
applied?
Also as I see you are using RHEL, you can contact your support to
provide a test package (we plan to fix this in RHEL7.3).
[1] https://bugzilla.mindrot.org/show_bug.cgi?id=1988
well, RHEL is what I reproduced the error on, the actual problem cropped up
in an AIX project. Just not too easy to test AIX if you don't have power-
hungry, outrageously expensive hw on hand. :-/

So. I loaded up a CentOS 7.2 VM, downloaded the most recent OpenSSH-portable
tar file (openssh-7.2p2.tar.gz), added zlib-devel and openssl-devel,
./configure, make, make install'd the program. pkill'ed sshd,
/usr/local/sbin/sshd to start sshd....and same results.

I then applied this patch:

diff --git a/log.c b/log.c
index a5ba22f..9011f9d 100644
--- a/log.c
+++ b/log.c
@@ -308,7 +308,7 @@ log_change_level(LogLevel new_log_level)
int
log_is_on_stderr(void)
{
- return log_on_stderr;
+ return log_on_stderr && log_stderr_fd == STDERR_FILENO;
}

/* redirect what would usually get written to stderr to specified file */
diff --git a/ssh.c b/ssh.c
index ec8a515..65e0217 100644
--- a/ssh.c
+++ b/ssh.c
@@ -1346,7 +1346,7 @@ static void
control_persist_detach(void)
{
pid_t pid;
- int devnull;
+ int devnull, keep_stderr;

debug("%s: backgrounding master process", __func__);

@@ -1377,8 +1377,10 @@ control_persist_detach(void)
error("%s: open(\"/dev/null\"): %s", __func__,
strerror(errno));
} else {
+ keep_stderr = log_is_on_stderr() && debug_flag;
if (dup2(devnull, STDIN_FILENO) == -1 ||
- dup2(devnull, STDOUT_FILENO) == -1)
+ dup2(devnull, STDOUT_FILENO) == -1 ||
+ (!keep_stderr && dup2(devnull, STDERR_FILENO) == -1))
error("%s: dup2: %s", __func__, strerror(errno));
if (devnull > STDERR_FILENO)
close(devnull);

and seems to work, at least once I modified my script to use the sftp I had compiled in /usr/local/bin rather than the "factory" /usr/bin (oops).

my test script:
====
#!/bin/bash

mkdir -p ~/dest

for X in 1 2 3 4 5 6 7 8 9 10; do
echo $X
echo "cd dest
put $0" | /usr/local/bin/sftp -b - localhost 2>&1 | tee -a outfile
done
====
So yes, looks like there's a fix on the way. Thanks!

Nick.

Loading...