* Ensure ::PtrSame<T, Derived> is true.
* Allow id_base, id_step and id_count to be of enumeration type.
* Fix potential deadlock in kernel explorer.
idm::select:
* Allow to select multiple inherited object types for idm::select.
* Allow function reference types. (they don't allow access to operator() directly, use deducing std::function constructor instead)
* Ensure ::is_same_ptr<T, object_type> is true.
Turns out some games don't configure proper channel counts after all,
which triggers an ensure in cellAudioOutGetState.
Let's select the current sound_mode in cellAudioOutConfigure.
Keep the old one if no match was found.
Also moves some code from AudioBackend to cellAudioOut for thread safety (see mutex).
* BufferUtils: use naive function pointer on Apple arm64
Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.
* build-macos: fix source maps for Mac
Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.
* LLVM PPU: fix aarch64 on macOS
Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.
* virtual memory: use 16k pages on aarch64 macOS
Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.
* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS
Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.
* virtual memory: fix W^X toggles on macOS aarch64
Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
enforcement while not needing to actually separate the memory
allocated for code/data.
* PPU: implement aarch64 specific functions
Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.
* PPU: Mark rpcs3 calls as non-tail
Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.
* macOS/arm64: compatibility fixes
* vm: patch virtual memory for arm64 macOS
Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.
* PPU: remove wrong comment
* PPU: fix a merge regression
* vm: remove 16k page hacks
* PPU: formatting fixes
* PPU: fix arm64 null function assembly
* ppu: clean up arch-specific instructions
- Adds AVX512 path for upload_untouched u16 with primitive restart, and
AVX2 and AVX512 paths for upload_untouched without restart
- The AVX512 paths handle the remainder in simd code with masking, which
provided a large speedup
- On my i5-1135G7 in demons souls benchmarking a scene in boletaria with
a lot of geometry on screen via perf:
SSE4_1 0.64%
AVX2 0.59%
AVX512 0.56%
AVX512 w/ remainder masking 0.51%
- This instruction can clamp a value between a range of values, something which previously needed 2 instructions.
- With the immediate byte set to 0x2 it will compute the minimum between the absolute value of the first input and the second input, and then copy the sign from the first input to the result.