Discussion:
[Bug] Regression problem in transfer.sh for OpenSSH 7.1 P2 on HPE NSE above dd-size 32k
Randall S. Becker
2016-02-08 19:36:39 UTC
G'Day,

I am requesting help resolving an issue with the OpenSSH 7.1 P2 regression
suite on the HPE NonStop platform: regress/transfer.sh fails for every
dd-size above 32k. The preceding tests all pass, but in the for-loop inside
regress/transfer.sh, when s is 64k, the command:

dd if=$DATA obs=${s} 2> /dev/null | \
    ${SSH} -q -$p -F $OBJ/ssh_proxy somehost "cat > ${COPY}"

does not transfer any data: regress/copy is empty and the test fails. I have
checked the platform's read/write API code, which has no trouble
transferring data at 1Mb, so that's probably not it. Is there some ioctl or
other system call, possibly new in this release, that might be failing in
this specific situation? I have looked for arithmetic problems but have not
found any; int, long, and size_t are all 32-bit on the box. 56k is a common
boundary in some functions, but I don't see what OpenSSH would be doing in
this regard. Our previous version, 6.7 based at commit 28453d5, did not have
this problem, and regress/transfer.sh has no substantive differences
compared to back then. The code is built using OpenSSL 1.0.2f.
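One way to tell whether dd or ssh is losing the data is to run the same pipeline with the ssh hop removed. The following is a minimal sketch, not part of the OpenSSH suite: it assumes a POSIX shell, awk, cmp, and a writable /tmp, and the DATA/COPY defaults and the check_dd_size function are hypothetical names chosen for illustration. The size list covers the sizes mentioned in this thread plus a few small ones; adjust it to match the loop in your copy of transfer.sh.

```shell
#!/bin/sh
# Sketch: exercise only the dd half of the transfer.sh pipeline by
# replacing the ssh hop with a plain "cat". DATA and COPY default to
# hypothetical scratch paths, not the regress/ files.
DATA=${DATA:-/tmp/dd-check.data}
COPY=${COPY:-/tmp/dd-check.copy}

# Generate about half a megabyte of non-repeating text if DATA does not
# exist yet, so every block size below is exercised with real content.
if [ ! -s "$DATA" ]; then
	awk 'BEGIN { for (i = 0; i < 100000; i++) print i }' > "$DATA"
fi

check_dd_size() {
	rm -f "$COPY"
	# Same shape as the test command, minus ssh. dd's own diagnostics
	# stay on stderr here, unlike in transfer.sh, so a rejected obs
	# value is visible.
	dd if="$DATA" obs="$1" | cat > "$COPY"
	if cmp -s "$DATA" "$COPY"; then
		echo "dd-size $1: ok"
	else
		echo "dd-size $1: mismatch, copy has $(wc -c < "$COPY") bytes"
		return 1
	fi
}

for s in 10 100 1k 32k 64k 128k 256k; do
	check_dd_size "$s"
done
```

On a box whose dd rejects an obs value, dd should print its complaint to stderr and the copy should come out empty, which is the symptom described above; seeing the same result with ssh out of the picture points at dd rather than OpenSSH.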

Getting detailed debug logs of the regression test would also really help
here. Anyone have any pointers on where I can look to try to resolve this?

Thanks,

Randall
--
-- Randall Becker
-- Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)/NonStop(211288444200000000)
-- In my real life, I talk too much.
Darren Tucker
2016-02-08 20:30:18 UTC
On Tue, Feb 9, 2016 at 6:36 AM, Randall S. Becker
<***@nexbridge.com> wrote:
[...]
Post by Randall S. Becker
Getting detailed debug logs of the regression test would also really help
here. Anyone have any pointers on where I can look to try to resolve this?
The logs should be in the regress directory. There should be ssh.log
and sshd.log for the currently running test, and failed-ssh.log and
failed-sshd.log containing the concatenation of the logs from any
failed tests. You can run "make t-exec LTESTS=transfer" to run just
that single test while you're playing around to save some time.

I have seen something similar on old FreeBSD systems when the test was
run on an NFS mount. I never figured out why this was, but running
the test on local disk worked in that case.
--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
Randall S. Becker
2016-02-08 21:51:18 UTC
I've got those logs. Unfortunately, I was hoping for more details. The logs
do show the test failure, but nothing in failed-sshd.log indicates a specific
problem. Some of the formats are a bit wonky, but I can live with those.
Since you mentioned NFS, which could have been constrained by UDP packet
sizes, and this platform sometimes cannot read beyond 56Kb off disk (in some
situations), it may be the read of regress/data that is failing. Any pointers
to where that read happens, so that I can verify that data is actually
getting into sshd from the disk?

Cheers,
Randall
Darren Tucker
2016-02-08 22:05:13 UTC
On Tue, Feb 9, 2016 at 8:51 AM, Randall S. Becker
<***@nexbridge.com> wrote:
[...]
Post by Randall S. Becker
I've got those logs. Unfortunately, hoping for more details.
Me too. For example, a copy of those logs :-)
Post by Randall S. Becker
The logs do
show the test failure, but no details in failed-sshd.log indicating a
specific problem. Some of the formats are a bit wonky but I can live with
those. Since you mentioned NFS, which could have been constrained by UDP
packet sizes, and this platform sometimes cannot read beyond 56Kb off disk
(in some situations), it may be the read of regress/data that is failing.
Any pointers where that is hiding so that I can verify that data is actually
getting into sshd from the disk?
Unfortunately, debugging test failures is a pain that I don't have a
good answer to.

One thing I do is edit regress/test-exec.sh to add a "set -x" at the
top and an "exit 1" in the fail() function. Now when you run the test,
you'll see the exact command that failed, and the script exits at that
point without cleaning up, leaving the keys and configs in place so
you can rerun the command in your shell and play around with
truss/strace/whatever your platform has.
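The tweak can be pictured as below. This is an illustrative sketch only: the fail() here is a simplified, hypothetical stand-in, since the real function in regress/test-exec.sh also records the failure in the test logs; only the added "set -x" and "exit 1" lines reflect the change described above.

```shell
#!/bin/sh
# Illustration of the test-exec.sh tweak, applied to a simplified,
# hypothetical fail() rather than the real one.

set -x    # added at the top: print each command before it runs

RESULT=0

fail ()
{
	RESULT=1
	echo "FAIL: $*"
	exit 1    # added: stop at the first failure, before any cleanup runs
}
```

Because the script now stops inside fail(), the last command printed by set -x is the one that failed, and (per the description above) the keys and configs it used are still on disk for a manual rerun.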
Randall S. Becker
2016-02-08 22:23:36 UTC
Tracked down to a bad 'dd' implementation, which I'm working on fixing. The likelihood of this problem being in OpenSSH just dropped to under 5% ;)

Thanks Darren.

Cheers,
Randall
Randall S. Becker
2016-02-09 00:19:09 UTC
So here's where the 5% comes in. The platform's dd explicitly checks
SSIZE_MAX and restricts which obs values are legitimate. From limits.h
on NSE:
#define SSIZE_MAX 53248 /* max single I/O size, 52K */
The result is that dd fails its parameter check for the sizes above 32k:
64k, 128k, and 256k all exceed 53248 bytes, so they are invalid on this
platform and should have been omitted there in any event. Is this a
legitimate bug in the OpenSSH regression suite?
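If the suite were to guard against this, one option would be to drop any dd-size larger than the platform's SSIZE_MAX before looping. The sketch below is hypothetical and is not how transfer.sh actually handles it: dd_size_bytes and usable_dd_sizes are invented names, only plain byte counts and the "k" suffix are handled, and it assumes getconf can report SSIZE_MAX, falling back to the 53248 value from the NSE limits.h quoted above.

```shell
#!/bin/sh
# Hypothetical guard: skip dd-size values the platform cannot handle.

# Convert a dd-style size (plain byte count, or a count with a "k" suffix
# meaning 1024) to bytes. Other suffixes are not handled in this sketch.
dd_size_bytes() {
	case $1 in
	*k) echo $(( ${1%k} * 1024 )) ;;
	*)  echo "$1" ;;
	esac
}

# usable_dd_sizes LIMIT SIZE...: print, space-separated, the sizes whose
# byte value does not exceed LIMIT.
usable_dd_sizes() {
	_max=$1
	shift
	_out=
	for _s in "$@"; do
		if [ "$(dd_size_bytes "$_s")" -le "$_max" ]; then
			_out="${_out:+$_out }$_s"
		fi
	done
	printf '%s\n' "$_out"
}

# Assumption: getconf knows SSIZE_MAX here; otherwise fall back to the
# value from the NSE limits.h (53248 bytes, i.e. 52K).
limit=$(getconf SSIZE_MAX 2>/dev/null)
case $limit in
''|*[!0-9]*) limit=53248 ;;
esac

for s in $(usable_dd_sizes "$limit" 10 100 1k 32k 64k 128k 256k); do
	echo "would run transfer check with dd-size $s"
done
```

With the NSE value, `usable_dd_sizes 53248 10 100 1k 32k 64k 128k 256k` prints "10 100 1k 32k", which lines up with the sizes that passed on that box.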

Cheers,
Randall
