| File | Last commit | Last updated |
|---|---|---|
| | [libc] Implement sinf function that is correctly rounded to all rounding modes. We use a simple range reduction for `pi/16 < \|x\|`: let `k = round(x / pi)` and `y = (x/pi) - k`, so k is an integer and `-0.5 <= y <= 0.5`. Then `sin(x) = sin(y*pi + k*pi) = (-1)^(k & 1) * sin(y*pi) ~ (-1)^(k & 1) * y * P(y^2)`, where `y*P(y^2)` is a degree-15 minimax polynomial generated by Sollya with: `P = fpminimax(sin(x*pi)/x, [\|0, 2, 4, 6, 8, 10, 12, 14\|], [\|D...\|], [0, 0.5]);`. Performance benchmark using the perf tool from the CORE-MATH project (https://gitlab.inria.fr/core-math/core-math/-/tree/master) on a Ryzen 1700, run as `$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf`. Before this patch (not correctly rounded): CORE-MATH reciprocal throughput: 17.892; System LIBC reciprocal throughput: 25.559; LIBC reciprocal throughput: 29.381. After this patch (correctly rounded): CORE-MATH reciprocal throughput: 17.896; System LIBC reciprocal throughput: 25.740; LIBC reciprocal throughput: 27.872; LIBC reciprocal throughput: 20.012 (with -msse4.2 flag); LIBC reciprocal throughput: 14.244 (with -mfma flag). Reviewed By: zimmermann6. Differential Revision: https://reviews.llvm.org/D123154 | 3 years ago |
| | [libc] Fix 64-bit Apple ARM support and header includes. Reviewers: sivachandra. Differential Revision: https://reviews.llvm.org/D114236 | 4 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc][math] fmod/fmodf implementation. This is an implementation of the find-remainder fmod function from the standard libm. The underlying algorithm was developed by myself, but it was probably invented before. Some features of the implementation: 1. The code is written in more-or-less modern C++. 2. One general implementation for both float and double precision numbers. 3. Platform/architecture-dependent and independent code and tests are split. 4. Tests cover 100% of the code for both float and double numbers. Test cases with NaN/Inf etc. are copied from glibc. 5. The new implementation is in general 2-4 times faster for "regular" x, y values. It can be 20 times faster for huge x/y values, but can also be 2 times slower in the double denormalized range (according to the perf tests provided). 6. Two different implementations of the division loop are provided. On some platforms division can be a very time-consuming operation; depending on the platform it can be 3-10 times slower than multiplication. Performance tests: the test is based on the core-math project (https://gitlab.inria.fr/core-math/core-math). At Tue Ly's suggestion I took the hypot function and used it as a template for fmod, preserving all test cases. `./check.sh <--special\|--worst> fmodf` passed. `CORE_MATH_PERF_MODE=rdtsc ./perf.sh fmodf` results are: GNU libc version: 2.35; GNU libc release: stable; 21.166 <-- FPU; 51.031 <-- current glibc; 37.659 <-- this fmod version. | 3 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc][math] Improved ExhaustiveTest performance. The previous implementation split value ranges across threads. Because the tested functions perform very differently over different ranges, CPU utilization was poor. The current implementation splits the test range into small pieces, and each thread takes the next piece when it finishes the previous one. As a result, the CPU load is constant during testing. Differential Revision: https://reviews.llvm.org/D128995 | 3 years ago |
| | [libc] Implement sinf function that is correctly rounded to all rounding modes. We use a simple range reduction for `pi/16 < \|x\|`: let `k = round(x / pi)` and `y = (x/pi) - k`, so k is an integer and `-0.5 <= y <= 0.5`. Then `sin(x) = sin(y*pi + k*pi) = (-1)^(k & 1) * sin(y*pi) ~ (-1)^(k & 1) * y * P(y^2)`, where `y*P(y^2)` is a degree-15 minimax polynomial generated by Sollya with: `P = fpminimax(sin(x*pi)/x, [\|0, 2, 4, 6, 8, 10, 12, 14\|], [\|D...\|], [0, 0.5]);`. Performance benchmark using the perf tool from the CORE-MATH project (https://gitlab.inria.fr/core-math/core-math/-/tree/master) on a Ryzen 1700, run as `$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf`. Before this patch (not correctly rounded): CORE-MATH reciprocal throughput: 17.892; System LIBC reciprocal throughput: 25.559; LIBC reciprocal throughput: 29.381. After this patch (correctly rounded): CORE-MATH reciprocal throughput: 17.896; System LIBC reciprocal throughput: 25.740; LIBC reciprocal throughput: 27.872; LIBC reciprocal throughput: 20.012 (with -msse4.2 flag); LIBC reciprocal throughput: 14.244 (with -mfma flag). Reviewed By: zimmermann6. Differential Revision: https://reviews.llvm.org/D123154 | 3 years ago |
| | [libc] Fix 64-bit Apple ARM support and header includes. Reviewers: sivachandra. Differential Revision: https://reviews.llvm.org/D114236 | 4 years ago |
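
The range reduction described in the sinf commit message can be illustrated with a small Python sketch. This is only an illustration of the identity sin(x) = (-1)^(k & 1) * sin(y*pi), not the libc implementation: the function names `reduce_arg` and `sin_via_reduction` are hypothetical, `math.sin` stands in for the minimax polynomial y * P(y^2) (whose coefficients are elided in the commit message), and plain double arithmetic is assumed, so it makes no claim of correct rounding. The commit applies the reduction only for pi/16 < |x|; the sketch applies it to every input for simplicity.

```python
import math


def reduce_arg(x):
    """Split x so that x/pi = k + y, with k an integer and -0.5 <= y <= 0.5."""
    t = x / math.pi
    k = round(t)   # nearest integer (ties go to even, which keeps |y| <= 0.5)
    y = t - k
    return k, y


def sin_via_reduction(x):
    """Evaluate sin(x) as (-1)^(k & 1) * sin(y*pi) after range reduction."""
    k, y = reduce_arg(x)
    sign = -1.0 if (k & 1) else 1.0
    # Stand-in for y * P(y^2): the real polynomial coefficients are not
    # given in the source, so the library sine is used on the reduced argument.
    return sign * math.sin(y * math.pi)


if __name__ == "__main__":
    for x in (0.5, 4.0, -7.25, 100.0):
        print(x, sin_via_reduction(x), math.sin(x))
```

Because `k & 1` only inspects the parity of k, the sign flip costs a single bit test, and the polynomial only ever needs to be accurate on the small interval [-0.5, 0.5].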