llvm-project/libc/test/src/math/exhaustive · Cangjie/llvm-project - AtomGit

Tue Ly: [libc] Implement sinf function that is correctly rounded to all rounding modes.

d883a4ad, created on July 22, 2022

File	Last commit	Last updated
CMakeLists.txt	[libc] Implement sinf function that is correctly rounded to all rounding modes. - We use a simple range reduction for `pi/16 < |x|`: Let `k = round(x / pi)` and `y = (x/pi) - k`. So `k` is an integer and `-0.5 <= y <= 0.5`. Then `sin(x) = sin(y*pi + k*pi) = (-1)^(k & 1) * sin(y*pi) ~ (-1)^(k & 1) * y * P(y^2)`, where `y*P(y^2)` is a degree-15 minimax polynomial generated by Sollya with: `> P = fpminimax(sin(x*pi)/x, [|0, 2, 4, 6, 8, 10, 12, 14|], [|D...|], [0, 0.5]);` - Performance benchmark using the perf tool from the CORE-MATH project (https://gitlab.inria.fr/core-math/core-math/-/tree/master) on Ryzen 1700: Before this patch (not correctly rounded): `$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf CORE-MATH reciprocal throughput : 17.892 System LIBC reciprocal throughput : 25.559 LIBC reciprocal throughput : 29.381` After this patch (correctly rounded): `$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf CORE-MATH reciprocal throughput : 17.896 System LIBC reciprocal throughput : 25.740 LIBC reciprocal throughput : 27.872 LIBC reciprocal throughput : 20.012 (with -msse4.2 flag) LIBC reciprocal throughput : 14.244 (with -mfma flag)` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D123154	3 years ago
cosf_test.cpp	[libc] Fix 64-bit Apple ARM support and header includes. Reviewers: sivachandra. Differential Revision: https://reviews.llvm.org/D114236	4 years ago
exhaustive_test.cpp	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
exhaustive_test.h	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
exp2f_test.cpp	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
expf_test.cpp	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
expm1f_test.cpp	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
fmod_generic_impl_test.cpp	[libc][math] fmod/fmodf implementation. This is an implementation of the find-remainder fmod function from the standard libm. The underlying algorithm was developed by myself, but it was probably invented earlier. Some features of the implementation: 1. The code is written in more-or-less modern C++. 2. One general implementation for both float and double precision numbers. 3. Platform/architecture dependent and independent code and tests are split. 4. Tests cover 100% of the code for both float and double numbers. Test cases with NaN/Inf etc. are copied from glibc. 5. The new implementation is in general 2-4 times faster for “regular” x,y values. It can be 20 times faster for huge x/y values, but can also be 2 times slower in the double denormalized range (according to the perf tests provided). 6. Two different implementations of the division loop are provided. On some platforms division can be a very time-consuming operation. Depending on the platform, it can be 3-10 times slower than multiplication. Performance tests: The test is based on the core-math project (https://gitlab.inria.fr/core-math/core-math). At Tue Ly's suggestion I took the hypot function and used it as a template for fmod, preserving all test cases. `./check.sh <--special|--worst> fmodf` passed. `CORE_MATH_PERF_MODE=rdtsc ./perf.sh fmodf` results are `GNU libc version: 2.35 GNU libc release: stable 21.166 <-- FPU 51.031 <-- current glibc 37.659 <-- this fmod version.`	3 years ago
hypotf_test.cpp	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
log10f_test.cpp	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
log1pf_test.cpp	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
log2f_test.cpp	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
logf_test.cpp	[libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because testing performance differs greatly between ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. Therefore the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995	3 years ago
sinf_test.cpp	[libc] Implement sinf function that is correctly rounded to all rounding modes. - We use a simple range reduction for `pi/16 < |x|`: Let `k = round(x / pi)` and `y = (x/pi) - k`. So `k` is an integer and `-0.5 <= y <= 0.5`. Then `sin(x) = sin(y*pi + k*pi) = (-1)^(k & 1) * sin(y*pi) ~ (-1)^(k & 1) * y * P(y^2)`, where `y*P(y^2)` is a degree-15 minimax polynomial generated by Sollya with: `> P = fpminimax(sin(x*pi)/x, [|0, 2, 4, 6, 8, 10, 12, 14|], [|D...|], [0, 0.5]);` - Performance benchmark using the perf tool from the CORE-MATH project (https://gitlab.inria.fr/core-math/core-math/-/tree/master) on Ryzen 1700: Before this patch (not correctly rounded): `$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf CORE-MATH reciprocal throughput : 17.892 System LIBC reciprocal throughput : 25.559 LIBC reciprocal throughput : 29.381` After this patch (correctly rounded): `$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf CORE-MATH reciprocal throughput : 17.896 System LIBC reciprocal throughput : 25.740 LIBC reciprocal throughput : 27.872 LIBC reciprocal throughput : 20.012 (with -msse4.2 flag) LIBC reciprocal throughput : 14.244 (with -mfma flag)` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D123154	3 years ago
sqrtf_test.cpp	[libc] Fix 64-bit Apple ARM support and header includes. Reviewers: sivachandra. Differential Revision: https://reviews.llvm.org/D114236	4 years ago
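
The range reduction described in the sinf commit message (`k = round(x / pi)`, `y = (x/pi) - k`, `sin(x) = (-1)^(k & 1) * sin(y*pi)`) can be illustrated with a small sketch. This is not the libc implementation: it works in double precision, the names `Reduced`, `reduce`, and `sin_via_reduction` are hypothetical, and `std::sin` stands in for the Sollya-generated polynomial `y*P(y^2)`, whose coefficients are not given in the listing.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper type: x = (y + k) * pi with k an integer
// and -0.5 <= y <= 0.5.
struct Reduced {
  std::int64_t k; // nearest integer to x / pi
  double y;       // remainder of x / pi after subtracting k
};

// Range reduction as stated in the commit message:
// k = round(x / pi), y = (x / pi) - k.
inline Reduced reduce(double x) {
  const double pi = 3.14159265358979323846;
  const double t = x / pi;
  const double k = std::round(t);
  return {static_cast<std::int64_t>(k), t - k};
}

// sin(x) = (-1)^(k & 1) * sin(y * pi).
// Assumption: std::sin replaces the degree-15 minimax polynomial here,
// so this only demonstrates the identity, not correct rounding.
inline double sin_via_reduction(double x) {
  const double pi = 3.14159265358979323846;
  const Reduced r = reduce(x);
  const double s = std::sin(r.y * pi);
  return (r.k & 1) ? -s : s;
}
```

For example, `x = 10.0` reduces to `k = 3` with `y` of about 0.183, so the sign flips because `k` is odd. Dividing by a rounded `pi` loses accuracy for large `|x|`, which is one reason a production implementation needs more care than this sketch.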