llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 22:12:57 +02:00

Author	SHA1	Message	Date
Rafael Espindola	4a3584032f	For COFF and MachO, compute the gap between two symbols. Before r238028 we used to do this in O(N^2), now we do it in O(N log N). llvm-svn: 238698	2015-05-31 23:15:35 +00:00
NAKAMURA Takumi	1533760873	ARMConstantIslandPass.cpp: Prune an empty \brief. [-Wdocumentation] llvm-svn: 238697	2015-05-31 23:05:35 +00:00
Colin LeMahieu	1e71c87eeb	[Hexagon] Including raw_ostream for debug builds. llvm-svn: 238695	2015-05-31 22:29:33 +00:00
Colin LeMahieu	f339dd00b7	[Hexagon] classes are actually structs. llvm-svn: 238694	2015-05-31 22:18:42 +00:00
Rafael Espindola	86cf1be090	Use a range loop. NFC. llvm-svn: 238693	2015-05-31 22:13:51 +00:00
Colin LeMahieu	fd4f2786fc	[Hexagon] Adding MC packet shuffler. llvm-svn: 238692	2015-05-31 21:57:09 +00:00
Tim Northover	0eb976c493	ARM: recommit r237590: allow jump tables to be placed as constant islands. The original version didn't properly account for the base register being modified before the final jump, so caused miscompilations in Chromium and LLVM. I've fixed this and tested with an LLVM self-host (I don't have the means to build & test Chromium). The general idea remains the same: in pathological cases jump tables can be too far away from the instructions referencing them (like other constants) so they need to be movable. Should fix PR23627. llvm-svn: 238680	2015-05-31 19:22:07 +00:00
Benjamin Kramer	bccd30ee34	[MC] Simplify code. No functionality change intended. llvm-svn: 238676	2015-05-31 18:49:28 +00:00
Davide Italiano	ef67976879	Clarify how the binary file checked in was generated. llvm-svn: 238665	2015-05-30 22:43:36 +00:00
Colin LeMahieu	60beb2e4b6	[Hexagon] Adding override specifier and removing erroneous assertion llvm-svn: 238664	2015-05-30 20:03:07 +00:00
Keno Fischer	6925aa5ef4	Add RelocVisitor support for MachO This commit adds partial support for MachO relocations to RelocVisitor. A simple test case is added to show that relocations are indeed being applied and that using llvm-dwarfdump on MachO files no longer errors. Correctness is not yet tested, due to an unrelated bug in DebugInfo, which will be fixed with appropriate testcase in a followup commit. Differential Revision: http://reviews.llvm.org/D8148 llvm-svn: 238663	2015-05-30 19:44:53 +00:00
Colin LeMahieu	c59d7c9784	[Hexagon] Adding basic relaxation functionality. llvm-svn: 238660	2015-05-30 18:55:47 +00:00
Colin LeMahieu	8b4d5b0298	[MC] Allow backends to decide relaxation for unresolved fixups. Differential Revision: http://reviews.llvm.org/D8217 llvm-svn: 238659	2015-05-30 18:42:22 +00:00
Kostya Serebryany	444683ece7	[lib/Fuzzer] make assertions more informative and update comments for the user-supplied mutator llvm-svn: 238658	2015-05-30 17:33:13 +00:00
Benjamin Kramer	54e2da2000	[MC] Reorder MCSymbol members to reduce padding. sizeof(MCSymbol) goes from 72 to 64 bytes on x86_64. llvm-svn: 238655	2015-05-30 13:52:30 +00:00
Simon Pilgrim	280d31052e	Stripped trailing whitespace. NFC. llvm-svn: 238654	2015-05-30 13:01:42 +00:00
Renato Golin	068c4bcd5c	Comment change. NFC That comment misleads the current discussions in mentioned bug. Leave the discussions to the bug. Also, adding a future change FIXME. llvm-svn: 238653	2015-05-30 10:44:07 +00:00
Chandler Carruth	e918d09d5b	[x86] Unify the horizontal adding used for popcount lowering taking the best approach of each. For vNi16, we use the SHL + ADD + SRL pattern, which seems easily the best. For vNi32, we use the PUNPCK + PSADBW + PACKUSWB pattern. In some cases there is a huge improvement with this in IACA's estimated throughput -- over 2x higher throughput!!!! -- but the measurements are too good to be true. In one narrow case, the SHL + ADD + SHL + ADD + SRL pattern looks slightly faster, but I'm not sure I believe any of the measurements at this point. Both are the exact same uops though. Hard to be confident of anything past that. If anyone wants to collect very detailed (Agner-level) timings with the result of this patch, or with the i32 case replaced with SHL + ADD + SHL + ADD + SRL, I'd be very interested. Note that you'll need to test it on both Ivybridge and Haswell, with each of SSE3, SSSE3, and AVX selected as I saw unique behavior in each of these buckets with IACA all of which should be checked against measured performance. But this patch is still a useful improvement by dropping duplicate work and getting the much nicer PSADBW lowering for v2i64. I'd still like to rephrase this in terms of generic horizontal sum. It's a bit lame to have a special case of that just for popcount. llvm-svn: 238652	2015-05-30 10:35:03 +00:00
Renato Golin	c0a1933df0	[ARMTargetParser] Move IAS arch ext parser. NFC The plan was to move the whole table into the already existing ArchExtNames but some fields depend on a table-generated file, and we don't yet have this feature in the generic lib/Support side. Until the minimum target-specific table-generated files are available in a generic fashion to these libraries, we'll have to keep it in the ASM parser. llvm-svn: 238651	2015-05-30 10:30:02 +00:00
Chandler Carruth	5708da7c46	[x86] Split out the horizontal byte sum lowering component of the LUT lowering into a helper function. NFC. llvm-svn: 238650	2015-05-30 09:46:16 +00:00
Craig Topper	baa40e9044	[TableGen] Merge RecTy::typeIsConvertibleTo and RecTy::baseClassOf. NFC typeIsConvertibleTo was just calling baseClassOf(this) on the argument passed to it, but there weren't different signatures for baseClassOf so passing 'this' didn't really do anything interesting. typeIsConvertibleTo could have just been a non-virtual method in RecTy. But since that would be kind of a silly method, I instead re-distributed the logic from baseClassOf into typeIsConvertibleTo. llvm-svn: 238648	2015-05-30 07:36:01 +00:00
Craig Topper	0ffd711178	Fix indentation. NFC. llvm-svn: 238647	2015-05-30 07:35:21 +00:00
Craig Topper	06ba30a59e	[TableGen] Remove all the variations of RecTy::convertValue and just handle the conversions in convertInitializerTo directly. This saves a bunch of vtable entries. NFC llvm-svn: 238646	2015-05-30 07:34:51 +00:00
Chandler Carruth	b3f0f1a41d	[x86] Update the order of instructions after I switched to a bitcast helper that skips creating a cast when it isn't necessary. It's really somewhat concerning that this was caused by the presence of a no-op bitcast, but... llvm-svn: 238642	2015-05-30 06:02:37 +00:00
David Majnemer	4c20c6ffa9	[WinCOFF] Add support for the .safeseh directive .safeseh adds an entry to the .sxdata section to register all the appropriate functions which may handle an exception. This entry is not a relocation to the symbol but instead the symbol table index of the function. llvm-svn: 238641	2015-05-30 04:56:02 +00:00
Chandler Carruth	230340df2b	[x86] Replace the long spelling of getting a bitcast with the much shorter one. NFC. In addition to being much shorter to type and requiring fewer arguments, this change saves over 30 lines from this one file, all wasted on total boilerplate... llvm-svn: 238640	2015-05-30 04:23:13 +00:00
Chandler Carruth	2cc1323bd7	[x86] Replace the long spelling of getting a bitcast with the new short spelling. NFC. llvm-svn: 238639	2015-05-30 04:19:57 +00:00
Chandler Carruth	6f372f0515	[sdag] Add the helper I most want to the DAG -- building a bitcast around a value using its existing SDLoc. Start using this in just one function to save omg lines of code. llvm-svn: 238638	2015-05-30 04:14:10 +00:00
Chandler Carruth	0c88847b8a	[x86] Restore the bitcasts I removed when refactoring this to avoid shifting vectors of bytes as x86 doesn't have direct support for that. This removes a bunch of redundant masking in the generated code for SSE2 and SSE3. In order to avoid the really significant code size growth this would have triggered, I also factored the completely repetitive logic for shifting and masking into two lambdas which in turn makes all of this much easier to read IMO. llvm-svn: 238637	2015-05-30 04:05:11 +00:00
Chandler Carruth	11c24e4998	[x86] Implement a faster vector population count based on the PSHUFB in-register LUT technique. Summary: A description of this technique can be found here: http://wm.ite.pl/articles/sse-popcount.html The core of the idea is to use an in-register lookup table and the PSHUFB instruction to compute the population count for the low and high nibbles of each byte, and then to use horizontal sums to aggregate these into vector population counts with wider element types. On x86 there is an instruction that will directly compute the horizontal sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily. Various tricks are used to get vNi32 and vNi16 from the vNi8 that the LUT computes. The base implementation of this, and most of the work, was done by Bruno in a follow up to D6531. See Bruno's detailed post there for lots of timing information about these changes. I have extended Bruno's patch in the following ways: 0) I committed the new tests with baseline sequences so this shows a diff, and regenerated the tests using the update scripts. 1) Bruno had noticed and mentioned in IRC a redundant mask that I removed. 2) I introduced a particular optimization for the i32 vector cases where we use PSHL + PSADBW to compute the low i32 popcounts, and PSHUFD + PSADBW to compute doubled high i32 popcounts. This takes advantage of the fact that to line up the high i32 popcounts we have to shift them anyways, and we can shift them by one fewer bit to effectively divide the count by two. While the PSHUFD based horizontal add is no faster, it doesn't require registers or load traffic the way a mask would, and provides more ILP as it happens on different ports with high throughput. 3) I did some code cleanups throughout to simplify the implementation logic.
4) I refactored it to continue to use the parallel bitmath lowering when SSSE3 is not available to preserve the performance of that version on SSE2 targets where it is still much better than scalarizing as we'll still do a bitmath implementation of popcount even in scalar code there. With #1 and #2 above, I analyzed the result in IACA for sandybridge, ivybridge, and haswell. In every case I measured, the throughput is the same or better using the LUT lowering, even v2i64 and v4i64, and even compared with using the native popcnt instruction! The latency of the LUT lowering is often higher than the latency of the scalarized popcnt instruction sequence, but I think those latency measurements are deeply misleading. Keeping the operation fully in the vector unit and having many chances for increased throughput seems much more likely to win. With this, we can lower every integer vector popcount implementation using the LUT strategy if we have SSSE3 or better (and thus have PSHUFB). I've updated the operation lowering to reflect this. This also fixes an issue where we were scalarizing horribly some AVX lowerings. Finally, there are some remaining cleanups. There is duplication between the two techniques in how they perform the horizontal sum once the byte population count is computed. I'm going to factor and merge those two in a separate follow-up commit. Differential Revision: http://reviews.llvm.org/D10084 llvm-svn: 238636	2015-05-30 03:20:59 +00:00
Chandler Carruth	7c7805b691	[x86] Restructure the parallel bitmath lowering of popcount into a separate routine, generalize it to work for all the integer vector sizes, and do general code cleanups. This dramatically improves lowerings of byte and short element vector popcount, but more importantly it will make the introduction of the LUT-approach much cleaner. The biggest cleanup I've done is to just force the legalizer to do the bitcasting we need. We run these iteratively now and it makes the code much simpler IMO. Other changes were minor, and mostly naming and splitting things up in a way that makes it more clear what is going on. The other significant change is to use a different final horizontal sum approach. This is the same number of instructions as the old method, but shifts left instead of right so that we can clear everything but the final sum with a single shift right. This seems likely better than a mask which will usually have to read the mask from memory. It is certainly fewer u-ops. Also, this will be temporary. This and the LUT approach share the need of horizontal adds to finish the computation, and we have more clever approaches than this one that I'll switch over to. llvm-svn: 238635	2015-05-30 03:20:55 +00:00
Jim Grosbach	30efd68a58	MC: Clean up MCExpr naming. NFC. llvm-svn: 238634	2015-05-30 01:25:56 +00:00
Filipe Cabecinhas	37a6f20080	[BitcodeReader] Change an assert to a call to Error() It's reachable from user input. Bug found with AFL fuzz. llvm-svn: 238633	2015-05-30 00:17:20 +00:00
Fiona Glaser	246d8a03ef	SelectionDAG: fix logic for promoting shift types r238503 fixed the problem of too-small shift types by promoting them during legalization, but the correct solution is to promote only the operands that actually demand promotion. This fixes a crash on an out-of-tree target caused by trying to promote an operand that can't be promoted. llvm-svn: 238632	2015-05-29 23:37:22 +00:00
Reid Kleckner	2f05d5a280	[WinEH] Adjust the 32-bit SEH prologue to better match reality It turns out that _except_handler3 and _except_handler4 really use the same stack allocation layout, at least today. They just make different choices about encoding the LSDA. This is in preparation for lowering the llvm.eh.exceptioninfo(). llvm-svn: 238627	2015-05-29 22:57:46 +00:00
Jingyue Wu	5cf995662b	[docs] fix the declarations of the llvm.nvvm.ptr.gen.to.* intrinsics Summary: These intrinsics should take a generic input address space and output a non-generic address space. Test Plan: no Reviewers: jholewinski, eliben Reviewed By: eliben Subscribers: eliben, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10132 llvm-svn: 238620	2015-05-29 22:18:03 +00:00
Reid Kleckner	44a9f03b05	Disable FP elimination in funcs using 32-bit MSVC EH personalities The value in 'ebp' acts as an implicit argument to the outlined handlers, and is recovered with frameaddress(1). llvm-svn: 238619	2015-05-29 21:58:11 +00:00
Rafael Espindola	27d0b57917	Remove getData. This completes the mechanical part of merging MCSymbol and MCSymbolData. llvm-svn: 238617	2015-05-29 21:45:01 +00:00
Reid Kleckner	0679765fdf	Only add the EH state insertion pass on 32-bit Windows llvm-svn: 238612	2015-05-29 20:43:10 +00:00
Rafael Espindola	b365b7fead	Remove the MCSymbolData typedef. The getData member function is next. llvm-svn: 238611	2015-05-29 20:41:47 +00:00
Rafael Espindola	d8fde42970	Merge MCSymbol and MCSymbolData. As a transition hack leave MCSymbolData as a typedef of MCSymbol. I will be removing that in a second. llvm-svn: 238609	2015-05-29 20:31:23 +00:00
Kostya Serebryany	74916b0deb	[lib/Fuzzer] relax an assertion llvm-svn: 238608	2015-05-29 20:31:17 +00:00
Rafael Espindola	caf32aa8f6	Rename getOrCreateSymbolData to registerSymbol and return void. Another step in merging MCSymbol and MCSymbolData. llvm-svn: 238607	2015-05-29 20:21:02 +00:00
Benjamin Kramer	0e31955b32	Replace push_back(Constructor(foo)) with emplace_back(foo) for non-trivial types If the type isn't trivially moveable emplace can skip a potentially expensive move. It also saves a couple of characters. Call sites were found with the ASTMatcher + some semi-automated cleanup. memberCallExpr( argumentCountIs(1), callee(methodDecl(hasName("push_back"))), on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))), hasArgument(0, bindTemporaryExpr( hasType(recordDecl(hasNonTrivialDestructor())), has(constructExpr()))), unless(isInTemplateInstantiation())) No functional change intended. llvm-svn: 238602	2015-05-29 19:43:39 +00:00
Rafael Espindola	f5a21976c3	Move Flags from MCSymbolData to MCSymbol. llvm-svn: 238598	2015-05-29 19:07:51 +00:00
Rafael Espindola	144bb0a67a	Fix build without asserts. llvm-svn: 238597	2015-05-29 19:04:38 +00:00
Rafael Espindola	d72e3e8f02	Pass MCSymbols to the helper functions in MCELF.h. llvm-svn: 238596	2015-05-29 18:47:23 +00:00
Chris Bieneman	0f8139f9c6	[CMake] Bug 23468 - LLVM_OPTIMIZED_TABLEGEN does not work with Visual Studio Summary: Multi-configuration builds put their binaries into ${CMAKE_BINARY_DIR}/Release/bin/. The table-gen cross-compilation support needs to take that into account. Reviewers: yaron.keren Reviewed By: yaron.keren Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10102 llvm-svn: 238592	2015-05-29 18:34:41 +00:00
Rafael Espindola	3bfb0f5ec1	Use an explicitly defaulted constructor. llvm-svn: 238591	2015-05-29 18:31:17 +00:00
Rafael Espindola	72bab9e25e	Pass a MCSymbol to needsRelocateWithSymbol. llvm-svn: 238589	2015-05-29 18:26:09 +00:00
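
The nibble lookup-table idea described in commit 11c24e4998 can be sketched in portable scalar C++. This is only an illustration of the technique, not the code from the patch: the names `NibblePopcount`, `Bytes16`, `bytePopcounts`, `popcountV2I64`, and `matchesNaive` are hypothetical, the loop over bytes stands in for a single PSHUFB lookup, and the per-half byte sum stands in for the horizontal sum over the low 8 and high 8 bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Popcount of every 4-bit value. In the vector lowering this table would sit
// in a register and be indexed by PSHUFB (assumption: scalar stand-in here).
constexpr std::array<std::uint8_t, 16> NibblePopcount = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

// Hypothetical stand-in for a 128-bit vector of sixteen bytes.
using Bytes16 = std::array<std::uint8_t, 16>;

// Per-byte popcount: look up the low and the high nibble, then add.
Bytes16 bytePopcounts(const Bytes16 &V) {
  Bytes16 Result{};
  for (std::size_t I = 0; I < V.size(); ++I)
    Result[I] = static_cast<std::uint8_t>(NibblePopcount[V[I] & 0x0F] +
                                          NibblePopcount[V[I] >> 4]);
  return Result;
}

// Sum the byte counts of the low 8 and high 8 bytes separately, giving one
// count per 64-bit lane.
std::array<std::uint64_t, 2> popcountV2I64(const Bytes16 &V) {
  const Bytes16 Counts = bytePopcounts(V);
  std::array<std::uint64_t, 2> Sums = {0, 0};
  for (std::size_t I = 0; I < Counts.size(); ++I)
    Sums[I / 8] += Counts[I];
  return Sums;
}

// Cross-check one byte against a bit-at-a-time count.
bool matchesNaive(std::uint8_t B) {
  unsigned Naive = 0;
  for (unsigned Bits = B; Bits != 0; Bits >>= 1)
    Naive += Bits & 1u;
  Bytes16 V{};
  V[0] = B;
  return bytePopcounts(V)[0] == Naive;
}
```

Calling `popcountV2I64` on a vector whose low half holds `0xFF` and `0x01` and whose high half holds `0xF0` and `0x03` yields 9 and 6 for the two lanes. Narrower element types would need extra steps to split these sums, which is what the vNi32 and vNi16 tricks in the commit messages address.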

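Commit 54e2da2000 shrinks MCSymbol by reordering its members to reduce padding. The effect can be shown with a small hypothetical record; `PaddedRecord`, `ReorderedRecord`, and `bytesSaved` are illustrative names and do not reflect the real MCSymbol fields. The exact sizes depend on the target ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout with a wide member between two narrow ones. The
// compiler must pad after Kind to align Offset, and again after Flags to
// round the struct size up to its alignment.
struct PaddedRecord {
  std::uint8_t Kind;
  std::uint64_t Offset;
  std::uint8_t Flags;
};

// Same members, widest first. The two narrow members now share one padded
// tail instead of each getting its own padding.
struct ReorderedRecord {
  std::uint64_t Offset;
  std::uint8_t Kind;
  std::uint8_t Flags;
};

// Bytes saved per object on whatever ABI this is compiled for.
constexpr std::size_t bytesSaved() {
  return sizeof(PaddedRecord) - sizeof(ReorderedRecord);
}
```

On a typical x86_64 ABI, where `std::uint64_t` is 8-byte aligned, the first layout occupies 24 bytes and the second 16. Sorting members from widest to narrowest is a common rule of thumb, though it is not necessarily how the MCSymbol change was arrived at.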

117678 Commits