Tests FAIL on 32-bit architectures: "Simple profile unicast packet loss" test segfaults
Hello all,
forwarding Simon McVittie's corresponding Debian bug report here, with some inline comments (but omitting a Debian-specific final paragraph, and also cutting some parts that would otherwise be messed up by GitLab's interface):
The new version of librist in unstable is failing its build-time tests on all 32-bit architectures (armel, armhf, i386 and several ports architectures):
Here are some pointers directly to the relevant build logs of the affected Debian release architectures:
General build log overview page
Summary of Failures:
3/21 libRIST:simple+unicast / Simple profile unicast packet loss 10% FAIL 0.05s killed by signal 11 SIGSEGV
4/21 libRIST:simple+unicast / Simple profile unicast packet loss 25% FAIL 0.03s killed by signal 11 SIGSEGV
It's likely to be easy to reproduce this on an amd64 machine by building the i386 package (debuild -ai386), either cross-compiling or in an i386 chroot, container or VM. This is a regression since 0.2.7+dfsg-1, so I would suggest starting by reviewing the differences between those versions.
JFTR, this failure can indeed easily be reproduced on i386, e.g. in a Docker container as follows:
# create and enter a Docker instance
docker run -it --rm --name librist_test i386/debian:sid
# inside that instance add the required package sources
sed -e 's#deb$#deb-src#' /etc/apt/sources.list.d/debian.sources > /etc/apt/sources.list.d/debian-src.sources
apt update
apt -y upgrade
# install librist build dependencies and provide librist sources
apt -y build-dep librist
cd $HOME
apt source librist
cd librist-0.2.8+dfsg/
# start building
dpkg-buildpackage -us -uc
This could be related to one of these compiler warnings, which indicate assumptions in the source code that are not true on ILP32 architectures:
../tools/srp_shared.c: In function ‘user_verifier_lookup’:
../tools/srp_shared.c:176:43: warning: left shift count >= width of type [-Wshift-count-overflow]
176 | *generation = (buf.st_mtim.tv_sec << 32) | buf.st_mtim.tv_nsec;
../tools/ristreceiver.c: In function ‘cb_stats’:
../tools/ristreceiver.c:446:86: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
446 | rist_prometheus_parse_stats(prom_stats_ctx, stats_container, (uint64_t)arg);
../tools/ristsender.c: In function ‘setup_rist_peer’:
../tools/ristsender.c:447:74: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
447 | if (rist_stats_callback_set(ctx, setup->statsinterval, cb_stats, (void*)w->id) == -1) {
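For illustration, here is a minimal sketch of how warnings of this kind are usually silenced in a width-independent way. It is not a tested patch against librist: the helper names pack_mtime, id_to_arg and arg_to_id are hypothetical, and the assumption that the ID fits into 32 bits is mine, not taken from the source.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper, not librist code: packs seconds and nanoseconds
 * into one 64-bit value. Widening to uint64_t *before* the shift keeps
 * the shift count below the operand width even where time_t is 32-bit. */
static uint64_t pack_mtime(time_t sec, long nsec)
{
    assert(nsec >= 0 && nsec < 1000000000L);
    return ((uint64_t)sec << 32) | (uint64_t)nsec;
}

/* Hypothetical helpers: carry a small integer ID through a void*
 * callback argument. uintptr_t has the width of a pointer on both
 * ILP32 and LP64, so neither direction changes size in a single cast.
 * Assumption: the ID fits into 32 bits. */
static void *id_to_arg(uint32_t id)
{
    return (void *)(uintptr_t)id;
}

static uint64_t arg_to_id(void *arg)
{
    return (uint64_t)(uintptr_t)arg;
}
```

Whether any of these three places is actually responsible for the SIGSEGV is unverified; the sketch only shows the pattern that avoids the warnings.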
Or it could be something more subtle that gcc is unable to diagnose, like type-size assumptions in a varargs function.
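To make the varargs remark concrete, here is a small sketch of the kind of mismatch meant. The functions sum_u64 and sum_two_sizes are made up for this example and do not exist in librist. The callee reads every variadic argument as uint64_t, so a caller passing a size_t directly would be fine on LP64 but undefined behaviour on ILP32, where size_t is only 32 bits wide, and gcc cannot warn about it.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variadic helper: sums `count` arguments, reading each
 * one as a uint64_t. The caller must really pass 64-bit values. */
static uint64_t sum_u64(int count, ...)
{
    va_list ap;
    uint64_t total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, uint64_t);
    va_end(ap);
    return total;
}

/* Caller side: calling sum_u64(2, a, b) directly would push two
 * 32-bit values on ILP32 while the callee reads two 64-bit ones.
 * The explicit casts make the argument width independent of the ABI. */
static uint64_t sum_two_sizes(size_t a, size_t b)
{
    return sum_u64(2, (uint64_t)a, (uint64_t)b);
}
```

This is only an example of the bug class, not a claim that librist contains this exact pattern.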
JFTR, I haven't yet spent any more time investigating this.
(It also failed to build on arm64, but that seems to be unrelated: a different test failed there with an assertion failure, that test has failed similarly in the past, and it doesn't fit the pattern of 64-bit architectures succeeding but 32-bit failing. That's out of scope for this particular bug report.)
This refers to another SIGABRT experienced on arm64 in the past, cf. Build logs for librist on arm64.
Overall I don't know how much time I can spend digging any deeper into this, hence reporting this issue just to let you know.
Best regards, Flo