Forward declare DemandedBits in IVDescriptors, and move include
into the cpp file. Also drop the include from LoopUtils, which
does not need it at all.
When clang is built against a prebuilt LLVM, LLVM_SOURCE_DIR is
empty, which due to a cmake quirk caused list lengths to get out
of sync. Add a workaround.
After finding all such gadgets in a given function, the pass minimally inserts
LFENCE instructions in such a manner that the following property is satisfied:
for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at
least one LFENCE instruction. The algorithm that implements this minimal
insertion is influenced by an academic paper that minimally inserts memory
fences for high-performance concurrent programs:
http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf
The algorithm implemented in this pass is as follows:
1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
-SOURCE instructions (also includes function arguments)
-SINK instructions
-Basic block entry points
-Basic block terminators
-LFENCE instructions
2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
4. Insert one LFENCE along each CFG edge that was cut in step 3.
5. Go to step 2.
6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.
By default, the heuristic used in Step 3 is a greedy heuristic that avoids
inserting LFENCEs into loops unless absolutely necessary. There is also a
CLI option to load a plugin that can provide even better optimization,
inserting fewer fences, while still mitigating all of the LVI gadgets.
The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin,
and a description of the pass's behavior with the plugin can be found here:
https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.
Differential Revision: https://reviews.llvm.org/D75937
Graceful lit shutdown on user keyboard interrupt [Ctrl+C] was a
longstanding goal of mine. After a few refactorings this revision
finally enables it. We use the following strategy to deal with
KeyboardInterrupt:
https://noswap.com/blog/python-multiprocessing-keyboardinterrupt
Printing of a helpful summary for interrupted runs (just as the one for
completed runs) will be tackled in future revisions.
Reviewed By: serge-sans-paille, rnk
Differential Revision: https://reviews.llvm.org/D77365
Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.
More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).
Also adds a new target feature to X86: +lvi-load-hardening
The feature can be added via the clang CLI using -mlvi-hardening.
Differential Revision: https://reviews.llvm.org/D75936
Summary:
This patch includes two extensions:
1. It extends the GraphDiff to also keep the original list of updates
after legalization, not just the deletes/insert vectors.
It also provides an API to pop the first update (the updates are store
in reverse, such that the first update is at the end of the list)
2. It adds a bool to mark whether the given updates should be applied as
given, or applied in reverse. This moves the task of reversing the
updates (when the caller needs this) to a functionality inside
GraphDiff, versus having the caller do this.
The two changes could be split into two patches, but they seemed
reasonably small to be reviewed together.
Reviewers: kuhar, dblaikie
Subscribers: hiraditya, george.burgess.iv, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77167
Adding a pass that replaces every ret instruction with the sequence:
pop <scratch-reg>
lfence
jmp *<scratch-reg>
where <scratch-reg> is some available scratch register, according to the
calling convention of the function being mitigated.
Differential Revision: https://reviews.llvm.org/D75935
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: kparzysz, sdesmalen, efriedma
Reviewed By: kparzysz
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77267
Previously, the tablegen() cmake command, which defines custom
commands for running tablegen, included several hardcoded paths. This
becomes unwieldy as there are more users for which these paths are
insufficient. For most targets, cmake uses include_directories() and
the INCLUDE_DIRECTORIES directory property to specify include paths.
This change picks up the INCLUDE_DIRECTORIES property and adds it
to the include path used when running tablegen. As a side effect, this
allows us to remove several hard coded paths to tablegen that are redundant
with specified include_directories().
I haven't removed the hardcoded path to CMAKE_CURRENT_SOURCE_DIR, which
seems generically useful. There are several users in clang which apparently
don't have the current directory as an include_directories(). This could
be considered separately.
The new version of this path uses list APPEND rather than list TRANSFORM,
in order to be compatible with cmake 3.4.3. If we update to cmake 3.12 then
we can use list TRANSFORM instead.
Differential Revision: https://reviews.llvm.org/D77156
We can fix register class of PHI based on its all AGPR uses.
That leaves behind all PHIs which were already processed
earlier. Propagate RC back to PHI operands of a PHI.
Differential Revision: https://reviews.llvm.org/D77344
As detailed on PR45043, static analysis was warning that the StringRef::iterator Position argument was being ignored and the function was hardwired to use the Current iterator.
This patch ensures we use the provided iterator and removes the (barely necessary) setError wrapper that always used Current.
Differential Revision: https://reviews.llvm.org/D76512
Extracting to the same index that we are going to insert back into
allows forming select ("blend") shuffles and enables further transforms.
Admittedly, this is a quick-fix for a more general problem that I'm
hoping to solve by adding transforms for patterns that start with an
insertelement.
But this might resolve some regressions known to be caused by the
extract-extract transform (although I have not gotten more details on
those yet).
In the motivating case from PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
The combination of subsequent instcombine and codegen transforms gets us this improvement:
vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3]
vhaddps %xmm1, %xmm1, %xmm4
vmovshdup %xmm1, %xmm3 ## xmm3 = xmm1[1,1,3,3]
vaddps %xmm0, %xmm2, %xmm0
vaddps %xmm1, %xmm3, %xmm1
vshufps $200, %xmm4, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm4[0,3]
vinsertps $177, %xmm1, %xmm0, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm1[2]
-->
vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3]
vhaddps %xmm1, %xmm1, %xmm1
vaddps %xmm0, %xmm2, %xmm0
vshufps $200, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,3]
Differential Revision: https://reviews.llvm.org/D76623
Added unit tests for 2 scenarios that were failing.
Made replace_path_prefix back to 3 parameters instead of 5, simplifying the implementation. The other 2 were always used with the default value.
This commit is intended to be the first of 3:
1) simplify/fix replace_path_prefix.
2) use it in the context of -fdebug-prefix-map and -fmacro-prefix-map (see D76869).
3) Make Windows version of replace_path_prefix insensitive to both case and separators (slash vs backslash).
Differential Revision: https://reviews.llvm.org/D77223
Previously, the tablegen() cmake command, which defines custom
commands for running tablegen, included several hardcoded paths. This
becomes unwieldy as there are more users for which these paths are
insufficient. For most targets, cmake uses include_directories() and
the INCLUDE_DIRECTORIES directory property to specify include paths.
This change picks up the INCLUDE_DIRECTORIES property and adds it
to the include path used when running tablegen. As a side effect, this
allows us to remove several hard coded paths to tablegen that are redundant
with specified include_directories().
I haven't removed the hardcoded path to CMAKE_CURRENT_SOURCE_DIR, which
seems generically useful. There are several users in clang which apparently
don't have the current directory as an include_directories(). This could
be considered separately.
Differential Revision: https://reviews.llvm.org/D77156
Extend lowerShuffleWithPACK/matchShuffleWithPACK/createPackShuffleMask to handle compaction style shuffle masks that can be lowered to chains of PACKSS/PACKUS if their inputs are suitably sign/zero extended.
This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask should recognise the PACKSS/PACKUS chains.
As discussed in post-commit review in https://reviews.llvm.org/D73501
if the goal of this is to help vectorizer, then we should actually
be teaching vectorizer to do this, because right now this rewrite
is still budget-limited, which isn't what we'd want.
Additionally, while the rest of the patch series was universally profitable,
this particular patch is reportedly (https://reviews.llvm.org/D73501#1905171)
exposing cost-modeling issues on ARM.
So let's just back this particular patch out. Once there's an undo transform,
this could be considered for reintegration.
This reverts commit 44edc6fd2c63b7db43e13cc8caf1fee79bebdb5f.
We got some of the potential optimizations with D76727 and D76844.
There are 2 likely enhancements that we could add to -vector-combine
to get most of the remaining cases:
1. Allow bitcasted shuffle mask narrowing (widen the elements).
2. Combine shuffle-of-shuffle into a single shuffle.
This is already partly handled by the x86 backend, but the
tests here show that we still miss some of the potential
combines.
Currently when the target is big-endian vmov.i64 reverses the order of the two
words of the vector. This is correct only when the underlying element type is
32-bit, as actually what it should be doing is considering it a vector of the
underlying type and reversing the elements of that.
Differential Revision: https://reviews.llvm.org/D76515
If we have an element-wise vmov immediate instruction then a subsequent vrev
with width greater or equal to the vmov element width, then that vrev won't do
anything. Add a DAG combine to convert bitcasts that would become such vrevs
into vector_reg_casts instead.
Differential Revision: https://reviews.llvm.org/D76514