Age | Commit message | Author |
|
The no_regression_p() predicate function is called from two steps:
check_regression and update_baseline. While we want no_regression_p
to output regression info during check_regression step, we don't want
that during update_baseline (during update_baseline no_regression_p
is called to compare current results against pre-baseline ones).
To fix the above we generate regression information in artifacts
directory of the current step, and copy it to the top-level only
during check_regression step.
Change-Id: Ib03d82f7afe60c406c7942f2fa086855df371171
|
|
Change-Id: I316f363139e1153539205a11e916df5da6f9d552
|
|
Annotations provide human-readable comments and data on regressions.
For this we ignore all lines in "results" file starting with "#".
Change-Id: I31328fe2c13895a7379137b00891a9d2fc0a39d7
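A minimal sketch of skipping annotation lines when reading a "results" file. The helper name strip_annotations and the sample file contents are assumptions made for illustration; only the "lines starting with #" rule comes from the message above.

```bash
# Hypothetical helper: print a "results" file without annotation lines.
strip_annotations ()
{
    # Drop every line whose first character is "#".
    # grep exits with 1 when nothing is left to print; treat that as success.
    grep -v '^#' "$1" || true
}

# Made-up sample: two annotations and two data lines.
results=$(mktemp)
printf '%s\n' '# regression noticed in build step' '1' '# free-form note' '2' > "$results"
strip_annotations "$results"    # prints "1" and "2" on separate lines
rm -f "$results"
```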
|
|
... to be able to check whether problems occur with BFD and Gold linkers.
Change-Id: Ie7f9bb116cdc8d28c00483bb73c0a5aedbed5d11
|
|
... not just say it. Declare regressions at 1.5%+ executable size
increases.
Change-Id: I4f5a9aae08783bba0349cc5def7e12bc8ee9702c
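One way to express a "1.5% or more" size check in integer-only shell arithmetic is sketched below. The function name size_regressed_p and its arguments are hypothetical; the actual check in the scripts may be written differently.

```bash
# size_regressed_p REF_SIZE NEW_SIZE   (hypothetical helper)
# Succeeds (exit status 0) when NEW_SIZE is at least 1.5% bigger than REF_SIZE.
size_regressed_p ()
{
    local ref=$1 new=$2
    # new >= ref * 1.015 is the same as new * 1000 >= ref * 1015,
    # which avoids floating point that the shell does not support.
    [ $((new * 1000)) -ge $((ref * 1015)) ]
}

if size_regressed_p 100000 101500; then
    echo "size regression"
fi
```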
|
|
no_regression_to_base_p
Fix thinko in second to last commit: we started using $new_artifacts
in no_regression_to_base_p(), but did not define it.
This patch defines it, as well as $ref_artifacts, and replaces $1 and $2
appropriately.
Change-Id: I6f24ae4144a9f876669fa1140175338aa5465c57
|
|
When a regression occurs after the build phase, don't include harmless
errors from the build stage, and provide info from the test phase.
Change-Id: I75ce76e81ecbda4492bb49e7d765c25cac599f12
|
|
Change-Id: I37cfe3be1af1c7c76c96c9286ccfd378ce55ad38
|
|
... job into tcwg-benchmark-tk1 and tcwg-benchmark-tx1.
Change-Id: Ieccafcaa47fa522cc00d6ffd69fcc3494eb5a120
|
|
Ignore symbols in LTO results, and consider only DSOs with 10% and
above contribution to profile.
Change-Id: I14d003baf0c9ba7fbf938657b84048af99eaef73
|
|
Change-Id: Iad249faa9f2299f064251e95c82cf2cc638ef2e6
|
|
Change-Id: If0f2a847aeca409bc7c7833482c6808bc45aec10
|
|
With recent changes to bmk-scripts we can handle failed benchmarks.
Change-Id: Ice28da9d95a455cb9de5f76e24afdfb79a47dae9
|
|
Change-Id: I68c6296bea24fe795375dc583172efdfac2a8536
|
|
Change-Id: If24c9ee3a8d42224284905f6d1a1fd6a9748874e
|
|
Change-Id: Ib8ec48ad9670f722df57c844ce44f9936c0f96b9
|
|
We trigger bisect for regressions that made it to score 0 and above.
For tcwg_gnu and tcwg_bmk builds we have an off-by-1 error at the
moment.
E.g., for benchmarking we bisect problems only when the
benchmarking step has successfully finished, which ignores benchmarking
build failures and miscompilations.
Similarly, for tcwg_gnu we ignore bootstrap failures and bisect only
testsuite regressions.
This patch fixes that.
Change-Id: I35412788b243d7dc7098ea0d6fd3b200c1df3e32
|
|
Now that we have tcwg_bmk_tk1 and tcwg_bmk_tx1 bmk projects.
Change-Id: I1d783379fde9b8c7d76cc44abab3a775db5bdeb0
|
|
Change-Id: Ie3c2f99e7ec5359d21529db659ac857d3a7750cf
|
|
Change-Id: I74d7b740d8a8e0807fe27024c95c2f0102fb4c22
|
|
... which happens when benchmark has several symbols regressed.
Change-Id: Ife19a7a70fe20ef884fd6184f59efdb4eb698518
|
|
... for several reasons:
1. The first reason is to avoid picking up noise from the rest
of benchmarks and mistakenly mark "noisy" revisions bad. In particular,
bi-modal benchmarks tend to interfere with bisections.
2. The second reason is to avoid picking up regressions that have since
been fixed.
3. The last reason is to speed up bisections. For -O2/-O3 benchmarking
we run 3-4 benchmarks sequentially, and reducing that to 1 gives us 3x-4x
improvement.
Change-Id: Iad6275e3b3efe50ed804f79426a1f94e09cd49bb
|
|
... during 2 months between gcc-8-branch and gcc-9-branch.
Change-Id: I1fd2187bf2521d32a45ada0e59b64f8fe1bbe972
|
|
Update missed uses of ${cflags[@]}.
Change-Id: I415e4d1298344bcb9e38d560ffd8928e60786755
|
|
... as part of ci_config setting. E.g., for "-O3 -funroll-loops" one
could use gnu-master-aarch64-spec2k6-O3_funroll-loops.
Change-Id: I666c708e79070895e3bfd4b10de7484823c5ccdb
|
|
... and fix tcwg_bmk-build.sh's hook along the way. Bash's read
returns failure for a final line that lacks a trailing newline, so we
need to add newlines (using echo).
Change-Id: I62950c598872dd23bef4aea3d981f09c1fa14945
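The effect is easy to reproduce outside of tcwg_bmk-build.sh. In the sketch below, count_lines is a hypothetical helper; it shows a "while read" loop skipping a final line that has no newline, and echo restoring it.

```bash
# Count lines with a plain "while read" loop (hypothetical helper).
count_lines ()
{
    local n=0 line
    # read returns non-zero at end of input, including when the last
    # line is not terminated by a newline, so that line never reaches
    # the loop body.
    while read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

printf 'a\nb' | count_lines             # prints 1: "b" lacks a newline
echo "$(printf 'a\nb')" | count_lines   # prints 2: echo appends the newline
```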
|
|
... down to a component. E.g., in tcwg_bmk-gnu-* builds we track
binutils, gcc and glibc, but most regressions happen due to gcc
changes. Therefore instead of splitting binutils+gcc+glibc build
into builds for binutils, gcc and glibc it is better to trigger
binutils+glibc and gcc builds. If the regression does occur in
binutils+glibc build, then it will be split into binutils and glibc
builds.
Change-Id: Ifd2effea69fe17b3271645f9a6c1e482cb79ceac
|
|
Change-Id: If952d6851add543bbbd149968c6279a484c4cfac
|
|
... for handling unexpected inputs. Fail when "$metric" or
"$result" is in unexpected format (not a number).
Change-Id: I0e113e6dfc4e0e60f76665ae3fed6c690b5a0dc8
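A sketch of this kind of input check follows. The helper name assert_number is hypothetical, and "number" is assumed here to mean a non-negative integer; the real scripts may accept other formats.

```bash
# assert_number NAME VALUE   (hypothetical helper)
# Fails with a message on stderr unless VALUE consists only of digits.
assert_number ()
{
    case "$2" in
        ''|*[!0-9]*)
            # Empty string, or at least one non-digit character.
            echo "ERROR: $1 is not a number: '$2'" >&2
            return 1
            ;;
    esac
}

metric=42
result=105
assert_number metric "$metric"
assert_number result "$result"
```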
|
|
It turns out that a shell pipe runs its commands in subshells, and, therefore,
all variable changes made in the 2nd part of the pipe are gone as soon
as the command finishes. I.e.,
===
str=foobar
echo hello | read str
echo $str
===
will print "foobar".
In tcwg_bmk-build.sh we use "tail | while read; do ...; status=1; done"
to set $status to "1" if we detect regressions. Turns out this doesn't
work, due to "status=1" being executed in a subshell.
Rewrite using process substitution.
Good summary on bash subshells: https://unix.stackexchange.com/questions/442692/is-a-subshell
Change-Id: I69949b38802dfba445e2a1dc30bc696cee38bb15
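The difference between the two loop forms can be sketched as follows. The input lines and the variable names verdict, name and pipe_status are made up for illustration and are not taken from tcwg_bmk-build.sh.

```bash
# Illustrative input, not real benchmark output.
report=$'ok,foo\nregression,bar\nok,baz'

# Pipe form: the while loop is a pipeline stage, so it runs in a subshell
# and its assignment to $status is lost when the pipeline finishes.
status=0
printf '%s\n' "$report" | while IFS=, read -r verdict name; do
    if [ "$verdict" = regression ]; then status=1; fi
done
pipe_status=$status
echo "pipe: status=$pipe_status"

# Process substitution form: the loop runs in the current shell and only
# the producer runs in a subshell, so the assignment survives.
status=0
while IFS=, read -r verdict name; do
    if [ "$verdict" = regression ]; then status=1; fi
done < <(printf '%s\n' "$report")
echo "process substitution: status=$status"
```

With bash's default settings (lastpipe disabled) this prints "pipe: status=0" followed by "process substitution: status=1".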
|
|
When we pass ${rr[top_artifacts]}/results_id to "run_step ... benchmark"
we end up with artifacts directory for this step something like
artifacts/11-benchmark-.../home/$USER/<path>/.../results_id/ -- because
we include all arguments to the run_step in the pretty name of the step.
This patch makes $pretty_step name absorb arguments up to "--" in its
name, and also flattens any paths by changing "/" to "-".
Change-Id: I700818409337b9884284d785641edae4db419462
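The naming rule described above could look roughly like this. The function pretty_step_name is a hypothetical stand-in; the real run_step also adds a step number and may join the parts differently.

```bash
# pretty_step_name STEP [ARGS...] [-- EXTRA...]   (hypothetical helper)
# Joins the step name and its arguments with "-", stops at "--",
# and flattens any paths by turning "/" into "-".
pretty_step_name ()
{
    local name="" arg
    for arg in "$@"; do
        # Arguments after "--" are not part of the name.
        if [ "$arg" = "--" ]; then
            break
        fi
        name="${name:+$name-}$arg"
    done
    printf '%s\n' "$name" | tr '/' '-'
}

pretty_step_name benchmark -- /tmp/artifacts/results_id   # prints "benchmark"
pretty_step_name benchmark out/dir -- ignored             # prints "benchmark-out-dir"
```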
|
|
... e.g., for Os-vs-Os_LTO and O2-vs-O2_LTO configurations.
We build a single toolchain and then run 2 sets of benchmarks --
one for Opt1 flags and another for Opt2 flags. We then compare
Opt1 results vs Opt2 results at the binary level (comparing at
symbol level doesn't make sense at least for LTO). We expect
Opt2 results to be no worse than Opt1 results.
The regression is defined as follows: Opt2 results for a benchmark are no
worse than Opt1 results in base-artifacts, but Opt2 results for the
same benchmark are worse than Opt1 results in current artifacts.
Change-Id: I73bc6fe8c9ae2aa40c5f3910e41a4eae8138c878
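The definition can be written down as a small predicate. This is a sketch only: opt_regression_p is a hypothetical name, and the results are assumed to be integers where lower is better (e.g., code size in bytes).

```bash
# opt_regression_p BASE_OPT1 BASE_OPT2 CUR_OPT1 CUR_OPT2   (hypothetical)
# Succeeds when Opt2 was no worse than Opt1 in base-artifacts,
# but is worse than Opt1 in the current artifacts.
opt_regression_p ()
{
    local base_opt1=$1 base_opt2=$2 cur_opt1=$3 cur_opt2=$4
    # Assumption: smaller numbers are better results.
    [ "$base_opt2" -le "$base_opt1" ] && [ "$cur_opt2" -gt "$cur_opt1" ]
}

# Base: Opt2 (900) beats Opt1 (1000); current: Opt2 (1100) loses to Opt1 (1000).
if opt_regression_p 1000 900 1000 1100; then
    echo "regression"
fi
```

Under this definition, a benchmark where Opt2 was already worse than Opt1 in base-artifacts is not reported again.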
|
|
... to be more modular. Logic to compare CSVs is moved to
compare_results(), which outputs result "1" for CSV entries
with no regression, and "100" for regressed CSV entries.
Use no_build_regression_p() helper to check for build regressions.
Change-Id: Ifaec214843b052bc2fb5e280b0f5e6fbcdf6574f
|
|
... which was ignored due to symbol being padded with spaces at the end
for better formatting by csvs2table.py.
Change-Id: Id391e8f77e5c718998a809633a3a8a98d8969191
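A short sketch of why padding breaks an exact string match, and one way to strip it. The helper trim_trailing_space and the sample symbol are made up; the actual fix in the scripts may differ.

```bash
# Hypothetical helper: remove trailing whitespace from a string.
trim_trailing_space ()
{
    printf '%s\n' "$1" | sed 's/[[:space:]]*$//'
}

# A symbol name padded with spaces for table alignment.
symbol='main    '
[ "$symbol" = "main" ] || echo "padded symbol does not match"
[ "$(trim_trailing_space "$symbol")" = "main" ] && echo "trimmed symbol matches"
```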
|
|
... after adding more fields to output from bmk-scripts/csvs2table.py.
The extra fields all ended up in the "$size" variable.
Change-Id: Iabeee2f6b0892d9e27b047be8ea273bc04bc7fc4
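This behavior follows from how read assigns fields: the last variable receives everything that is left on the line. The row below is made up for illustration; the real csvs2table.py columns are not shown in this log.

```bash
# Made-up CSV row with two more fields than the reader expects.
row='bench,symbol,1024,extra1,extra2'

# Three variables for five fields: the last variable absorbs the rest.
IFS=, read -r bmk symbol size_old <<EOF
$row
EOF
echo "size=$size_old"    # size=1024,extra1,extra2

# A trailing catch-all variable keeps $size clean.
IFS=, read -r bmk symbol size rest <<EOF
$row
EOF
echo "size=$size"        # size=1024
```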
|
|
... from top-level artifacts. This is preparation to run two
benchmarking steps to compare non-LTO vs LTO results.
Change-Id: I45e5658513ce2eda92fa30f165df35f0ba9200f5
|
|
... to improve consistency of results.
Change-Id: Id5e9034e79f9c35544fcb694f61b447b6272a01c
|
|
... so that it'll be easier to analyze history of bmk results.
Change-Id: Ib7188df3d61f045c12d2759b50f19b039f85b8b9
|
|
... to a file.
Change-Id: I205efe0a039be84b1674609ea076f4349dab33db
|
|
Change-Id: Ia34ac0f4998644c593684e3d3239d5f82f21e4d4
|
|
... by putting results to bash array ${results[@]}.
Change-Id: I8e189f6f14578e8a96a33c0beb2dc8ae13ca2ef2
|
|
Change-Id: I55457f26521b9bbd7e5b4dd30c9828c94fcebf3f
|
|
Change-Id: Ifb27567fafa37df78fb979eccf7a14f990dbc637
|
|
Update to launch benchmarking job with new labels. Note that this will
also change the paths to the results directories.
Change-Id: Ia31d208b75f331d61a0d523fb9c170aaf6f39c5e
|
|
For GCC this is done by default; for LLVM, --target=armv7a-linux-gnueabihf
switches from Thumb1 to Thumb2.
Change-Id: I3a42df06090067988936f0ae63b95b41c0e1eda5
|
|
Change-Id: I38b92233e5b45a32787f197694d05cfec2d0c8ae
|
|
... on AArch32 for LLVM in -Os and -Oz configurations.
These are miscompiled and take 5+ hours to fail.
See https://projects.linaro.org/browse/LLVM-592 for details.
Change-Id: Id059fe1cb34809a03dccea00aa2c0aafc532fe56
|
|
... to get fewer false positives.
Change-Id: Iacf660810fe30a9ccae6137e948671fdea6abc76
|
|
... because with static linking we import -O2-built glibc,
which accounts for 50-80% of final binary size.
Change-Id: I3357832783a100da004c27180da56eeabb5d9a0f
|
|
Pass arguments to build_abe() before "--" and extra arguments
to various components after "--".
Change-Id: I35d56a2eecc5bc710d81e4cf02da1bdbbc67cdea
|