On some targets, the penalty of executing runtime unrolling checks
and then not the unrolled loop can be significantly detrimental to
performance. This results in the need to be more conservative with
the unroll count, keeping a trip count of 2 reduces the overhead as
well as increasing the chance of the unrolled body being executed. But
being conservative leaves performance gains on the table.
This patch enables the unrolling of the remainder loop introduced by
runtime unrolling. This can help reduce the overhead of misunrolled
loops because the cost of non-taken branches is much less than the
cost of the backedge that would normally be executed in the remainder
loop. This allows larger unroll factors to be used without suffering
performance loses with smaller iteration counts.
Differential Revision: https://reviews.llvm.org/D36309
llvm-svn: 310824
causing compile time issues.
Moreover, the patch *deleted* the flag in addition to changing the
default, and links to a code review that doesn't even discuss the flag
and just has an update to a Clang test case.
I've followed up on the commit thread to ask for numbers on compile time
at this point, leaving the flag in place until things stabilize, and
pointing at specific code that seems to exhibit excessive compile time
with this patch.
Original commit message for r310583:
"""
[ValueTracking] Enabling ValueTracking patch by default (recommit). Part 2.
The original patch was an improvement to IR ValueTracking on
non-negative integers. It has been checked in to trunk (D18777,
r284022). But was disabled by default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
""""
llvm-svn: 310816
Summary:
In Python 2, calling `dict.items()` returns an indexable `list`, whereas
on Python 3 it returns a set-like `dict_items` object, which cannot be
indexed. Explicitly onvert the `dict_items` object so that it can be
indexed when using Python 3.
In combination with D36622, D36623, and D36624, this change allows
`opt-viewer.py` to exit successfully when run with Python 3.4.
Test Plan:
Run `opt-viewer.py` using Python 3.4 and confirm it does not encounter a
runtime error when when indexing into `dict.items()`.
Reviewers: anemet
Reviewed By: anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36630
llvm-svn: 310810
introduce a miscompile bug.
There appears to be a bug where the generated code to extract the sign
bit doesn't work correctly for 32-bit inputs. I've replied to the
original commit pointing out the problem. I think I see by inspection
(and reading the manual for PPC) how to fix this, but I can't be 100%
confident and I also don't know what the best way to test this is.
Currently it seems nearly impossible to get the backend to hit this code
path, but the patch autohr is likely in a better position to craft such
test cases than I am, and based on where the bug is it should be easily
done.
Original commit message for r310346:
"""
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE
Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it
adds the handling for the special case where RHS == 0.
Differential Revision: https://reviews.llvm.org/D34048
"""
llvm-svn: 310809
Summary:
These functions were overly complicated. The body of this function was rechecking for an And operation to find the constant, but we already knew we were looking at two Ands ORed together and the pieces are in variables. We already had earlier nearby code that checked for ConstantInts. So just inline the remaining parts into the earlier code.
Next step is to use m_APInt instead of ConstantInt.
Reviewers: spatel, efriedma, davide, majnemer
Reviewed By: spatel
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D36439
llvm-svn: 310806
This allows using semicolons for bundling up more than one
statement per line. This is used within the mingw-w64 project in some
assembly files that contain code for multiple architectures.
Differential Revision: https://reviews.llvm.org/D36366
llvm-svn: 310797
It was mistakenly added to that list in D23560 (committed in rL285712). RISCV
is an experimental backend and should never have been in that list, I
mistakenly interpreted LLVM_ALL_TARGETS as a list of all targets rather than
targets to build by default. Unfortunately, because of this the RISCV backend
has been building by default when it shouldn't be.
This commet adds a description comment, which should help to avoid such
mistakes in the future.
See my message to llvm-dev for more information and analysis
<http://lists.llvm.org/pipermail/llvm-dev/2017-August/116347.html>.
Differential Revision: https://reviews.llvm.org/D36538
llvm-svn: 310796
Previously it would not return true for extracting either of the upper quarters of a 512-bit registers.
For mask registers we support extracting anything from index 0. And otherwise we only support extracting the upper half of a register.
Differential Revision: https://reviews.llvm.org/D36638
llvm-svn: 310794
Summary:
Without the SrcVT its hard to know what is really being asked for. For example if your target has 128, 256, and 512 bit vectors. Maybe extracting 128 from 256 is cheap, but maybe extracting 128 from 512 is not.
For x86 we do support extracting a quarter of a 512-bit register. But for i1 vectors we don't have isel patterns for extracting arbitrary pieces. So we need this to have a correct implementation of isExtractSubvectorCheap for mask vectors.
Reviewers: RKSimon, zvi, efriedma
Reviewed By: RKSimon
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D36649
llvm-svn: 310793
This is a continuation patch for commit r307529 which completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target (see also https://reviews.llvm.org/D35019).
In this patch we added the scheduling information of additional SNB instructions that were missing from the patch commit r307529, fixed the scheduling of several resource groups that include only port0 instead of port05 (i.e., port0 OR port5) and fixed several incorrect instructions' scheduling in the r307529 commit.
The patch also includes the X87 instructions which were missing in previous patch commit r307529 as reported in bugzilla bug 34080.
Reviewers: zvi, RKSimon, chandlerc, igorb, m_zuckerman, craig.topper, aymanmus, dim
Differential Revision: https://reviews.llvm.org/D36388
llvm-svn: 310792
An existing test should have covered this but a typo caused it to fail. I've kept both as the codegen for the typo case needs addressing as well.
llvm-svn: 310791
K0 isn't expected as a write-mask, so provide a detailed error here, instead of the more generic one (invalid op for insn)
Conforms with gas
Differential Revision: https://reviews.llvm.org/D36570
llvm-svn: 310789
Add an X86 combine for TESTM when one of the operands is a BUILD_VECTOR(0,0,...).
TESTM op0, BUILD_VECTOR(0,0,...) -> BUILD_VECTOR(0,0,...)
TESTM BUILD_VECTOR(0,0,...), op1 -> BUILD_VECTOR(0,0,...)
Differential Revision:
https://reviews.llvm.org/D36536
llvm-svn: 310787
Summary:
Previously we were creating the flag result with MVT::Other which is interpretted as a Chain node. If we used a memory form of the instruction we would end up with a copyToReg that consumed the chain result of the adcx instruction instead of the flag result.
Pretty sure we should be using MVT::i32 here, that's what we do other places we create these node types.
We should probably consider this for 5.0 as well.
Reviewers: RKSimon, zvi, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36645
llvm-svn: 310784
If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index.
Reapplied with fix to only work with simple value types.
Committed on behalf of @jbhateja (Jatin Bhateja)
Differential Revision: https://reviews.llvm.org/D35788
llvm-svn: 310782
Summary:
isThumb returns true for Thumb triples (little and big endian), isARM
returns true for ARM triples (little and big endian).
There are a few more checks using arm/thumb that are not covered by
those functions, e.g. that the architecture is either ARM or Thumb
(little endian) or ARM/Thumb little endian only.
Reviewers: javed.absar, rengolin, kristof.beyls, t.p.northover
Reviewed By: rengolin
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D34682
llvm-svn: 310781
An `external_weak` global may be intended to resolve as a null pointer if it's
not defined, so it doesn't make sense to use a copy relocation for it.
Differential Revision: https://reviews.llvm.org/D36604
llvm-svn: 310773
As noted in the test comment, instcombine now produces the masked
shift value even when it's not included in the source, so we should
handle this.
Although the AMD/Intel docs don't say it explicitly, over-rotating
the narrow ops produces the same results. An existence proof that
this works as expected on all x86 comes from gcc 4.9 or later:
https://godbolt.org/g/K6rc1A
llvm-svn: 310770
Summary:
The stack alignment depends on the ABI (16 bytes for N32 and N64 and 8
bytes for O32), not the CPU type.
Reviewers: sdardis
Reviewed By: sdardis
Subscribers: atanasyan, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D36326
llvm-svn: 310768
Updating remark API to newer OptimizationDiagnosticInfo API. This
allows remarks to show up in diagnostic yaml file, and enables use
of opt-viewer tool.
Hotness information for remarks (L505 and L751) do not display hotness
information, most likely due to profile information not being
propagated yet. Unsure if this is the desired outcome.
Patch by Tarun Rajendran.
Differential Revision: https://reviews.llvm.org/D36127
llvm-svn: 310763
Summary:
Previously we would use these instructions if sse was disabled and fastmath was enabled.
As mentioned in D28335, this is a bad idea.
Reviewers: efriedma, scanon, DavidKreitzer
Reviewed By: DavidKreitzer
Subscribers: zvi, llvm-commits
Differential Revision: https://reviews.llvm.org/D36344
llvm-svn: 310762
When the access to a weak symbol is not a call, the access has to be
able to produce the value 0 at runtime.
We were sometimes producing code sequences where that was not possible
if the code was leaded more than 4g away from 0.
llvm-svn: 310756
PDBs need to contain 1 module for each object file/compiland,
and a special one synthesized by the linker. This one contains
a symbol record for each output section in the executable with
its address information. This patch adds such symbols to the
linker module. Note that we also are supposed to add an
S_COFFGROUP symbol for what appears to be each input section that
contributes to each output section, but it's not entirely clear
how to generate these yet, so I'm leaving that for a separate
patch.
llvm-svn: 310754
Two of the Windows bots are failing test\CodeGen\X86\GlobalISel\select-inc.mir
which should not have been affected by the change. Reverting while I investigate.
Also reverted r310735 because it builds on r310716.
llvm-svn: 310745