mirror of
https://github.com/RPCS3/llvm-mirror.git
synced 2025-02-01 05:01:59 +01:00
make fast unaligned memory accesses implicit with SSE4.2 or SSE4a
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.

This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has generally fast unaligned memory ops.

A motivating use case for this change is a clang invocation that doesn't explicitly set the CPU, but does target a feature that we know only exists on a CPU that supports fast unaligned memops. For example:
$ clang -O1 foo.c -mavx

This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449

Before this patch, we used different store types depending on whether the example can be lowered as a memset or not.

Differential Revision: http://reviews.llvm.org/D12288

llvm-svn: 245950
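The memset-versus-stores distinction mentioned in the message can be pictured with a small hypothetical source file. This is a sketch only, not the reproducer from PR24449: the function names and the 64-byte size are assumptions made for illustration. Both functions zero the same buffer, one through a library call and one through explicit stores, so a compiler may lower them along different paths.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical buffer size, chosen only for illustration.
constexpr std::size_t kBytes = 64;

// Zeroing expressed as a library call; the compiler may lower this as a memset.
void zero_with_memset(unsigned char *p) {
  std::memset(p, 0, kBytes);
}

// The same zeroing written as explicit element stores; the optimizer may or
// may not recognize this loop as a memset idiom.
void zero_with_stores(unsigned char *p) {
  for (std::size_t i = 0; i < kBytes; ++i)
    p[i] = 0;
}
```

Comparing the assembly generated for the two functions, with and without a feature flag such as -mavx, is one way to observe whether the store types agree.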
parent a1bc27c2d3
commit 7089a6d8e2
@@ -192,6 +192,13 @@ void X86Subtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
   // Parse features string and set the CPU.
   ParseSubtargetFeatures(CPUName, FullFS);
 
+  // All CPUs that implement SSE4.2 or SSE4A support unaligned accesses of
+  // 16-bytes and under that are reasonably fast. These features were
+  // introduced with Intel's Nehalem/Silvermont and AMD's Family10h
+  // micro-architectures respectively.
+  if (hasSSE42() || hasSSE4A())
+    IsUAMemUnder32Slow = false;
+
   InstrItins = getInstrItineraryForCPU(CPUName);
 
   // It's important to keep the MCSubtargetInfo feature bits in sync with
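The rule added in this hunk can be exercised in isolation with a standalone model. This is a minimal sketch, not the LLVM X86Subtarget class: the struct name, the public feature fields, and applyImpliedFeatures() are hypothetical, and the flag is assumed to start out true, standing in for a CPU definition that marks unaligned accesses as slow.

```cpp
// Standalone model of the implication added in this hunk. NOT the real
// X86Subtarget: SubtargetModel and applyImpliedFeatures() are hypothetical.
struct SubtargetModel {
  bool HasSSE42 = false;
  bool HasSSE4A = false;

  // Assumption for this sketch: the flag starts as "slow", as if set by a
  // CPU definition before the implied-feature rule runs.
  bool IsUAMemUnder32Slow = true;

  bool hasSSE42() const { return HasSSE42; }
  bool hasSSE4A() const { return HasSSE4A; }

  // Mirrors the condition from the patch: either feature clears the flag.
  void applyImpliedFeatures() {
    if (hasSSE42() || hasSSE4A())
      IsUAMemUnder32Slow = false;
  }
};
```

In this model, a subtarget with neither feature keeps the slow flag, while enabling either SSE4.2 or SSE4a clears it regardless of the starting value.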
@@ -55,6 +55,11 @@
 ; Slow chips use 4-byte stores. Fast chips with SSE or later use something other than 4-byte stores.
 ; Chips that don't have SSE use 4-byte stores either way, so they're not tested.
 
+; Also verify that SSE4.2 or SSE4a imply fast unaligned accesses.
+
+; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=sse4.2 2>&1 | FileCheck %s --check-prefix=FAST
+; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=sse4a 2>&1 | FileCheck %s --check-prefix=FAST
+
 define void @store_zeros(i8* %a) {
 ; SLOW-NOT: not a recognized processor
 ; SLOW-LABEL: store_zeros: