This is a patch to implement pr30640.
When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.
This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.
Differential Revision: https://reviews.llvm.org/D25521
llvm-svn: 284276
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.
llvm-svn: 284268
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)
Reviewers: arsenm, nhaehnle
Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D24672
llvm-svn: 284267
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses. SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:
and v1, v0, 0xffffff
mul24 v2, v1, v1 ; Multiply ignoring high 8-bits.
To:
mul24 v2, v0, v0
Where before this would not be optimized, because v1 has multiple uses.
Reviewers: bogner, arsenm
Subscribers: nhaehnle, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24964
llvm-svn: 284266
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
Use PackedRegisterRef to store the register information in the graph nodes.
This commit also removes support for virtual registers. It has never been
tested or used. It will be possible to add it back if there is a need.
llvm-svn: 284255
Add support for loading multiple coverage readers into a single
CoverageMapping instance. This should make it easier to prepare a
unified coverage report for multiple binaries.
Differential Revision: https://reviews.llvm.org/D25535
llvm-svn: 284251
This is an improvement when compiling with llvm. llvm doesn't inline
the call to insert, so the align is always executed and shows up in
the profile.
With gcc the call to insert is inlined and the align computation moved
and done only if needed.
With this patch we explicitly only compute it if it is needed.
In the two tests with debug info, the speedup was
scylla
master 3.008959365
patch 2.932080942 1.02621974786x faster
firefox
master 6.709823604
patch 6.592387227 1.01781393795x faster
In all others the difference was in the noise.
llvm-svn: 284249
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
Differential Revision: https://reviews.llvm.org/D23446
llvm-svn: 284248
Summary:
* Describe new (3.3) parameter attribute group encoding, leaving old encoding there with a note about legacy
* Bring TYPE_BLOCK docs up to date
* Remove docs about obsolete (pre 3.0) TYPE_SYMTAB_BLOCK, TST_CODE_ENTRY
* Fix a couple of incorrect comments and remove one unused enum definition along the way
This addresses https://llvm.org/bugs/show_bug.cgi?id=28941.
Patch by: Ismail Badawi <ibadawi@cisco.com>
Differential Revision: https://reviews.llvm.org/D25623
llvm-svn: 284246
This test was apparently checking for 2 independent folds, but we have
plenty of tests for those individual folds already. We are lacking
vector tests, however, because we don't have the shift folds for vectors.
llvm-svn: 284243
Prefer add/zext because they are better supported in terms of value-tracking.
Note that the backend should be prepared for this IR canonicalization
(including vector types) after:
https://reviews.llvm.org/rL284015
Differential Revision: https://reviews.llvm.org/D25135
llvm-svn: 284241
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25289
llvm-svn: 284224
Mostly this just means changing the triple from aarch64-apple-ios to the generic
aarch64--. Only one test needs more significant changes, but GlobalISel already
does the right thing so it's ok to just change the checks.
Differential Revision: https://reviews.llvm.org/D25532
llvm-svn: 284223
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.
Reviewers: dsanders, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D21473
llvm-svn: 284218
Committing in the name of Ziv Izhar: After check-all and LGTM .
The following patch is for compatability with Microsoft.
Microsoft ignores the keyword "short" when used after a jmp, for example:
__asm {
jmp short label
label:
}
A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly.
link to test: https://reviews.llvm.org/D24958
Differential Revision: https://reviews.llvm.org/D24957
llvm-svn: 284211
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.
llvm-svn: 284204
Added relocation names:
- R_AMDGPU_GOTPCREL32_LO
- R_AMDGPU_GOTPCREL32_HI
- R_AMDGPU_REL32_LO
- R_AMDGPU_REL32_HI
AMDGPU isa only supports 32-bit immediates. In order to access 64-bit address we need to generate 32-bit lo/hi relocations, and do the right math (separate patch). Currently we only generate one 32 bit relocation for lower bits for each access, losing higher bits. Hence we need relocations listed above.
Differential Revision: https://reviews.llvm.org/D25546
llvm-svn: 284191
Summary:
This will be used by ThinLTO to set the amount of backend
parallelism, which performs better when restricted to the number
of physical cores (on X86 at least, where getHostNumPhysicalCores is
currently defined). If not available this falls back to
thread::hardware_concurrency.
Note I didn't add to the thread class since that is a typedef to
std::thread where available.
Reviewers: mehdi_amini
Subscribers: beanz, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D25585
llvm-svn: 284180
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.
llvm-svn: 284175
Windows itanium is equivalent to MSVC except in C++ mode. Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.
llvm-svn: 284173