Hyper-Threading

I ran my own experiments to better understand how Hyper-Threading works internally on Windows, using hand-written assembly benchmarks — from single-register multiplications to SHA-256 hashing.

NUMA and memory performance

If you have a dual-socket CPU, each socket may have dedicated RAM assigned to it. This can lead to performance penalties when a processor accesses memory from the “wrong” memory bank. Below is a simple benchmark and an explanation of how to utilize the full bandwidth properly.

Search String Performance

For some reason, I need a fast method for string search. The LabVIEW Search/Split String function is relatively slow, I’ve done this using String Zilla, and achieved around a 20x boost with AVX2/AVX512.

Slow String Performance

I just encountered slow string performance in LabVIEW when a large string is passed to a DLL as a C string pointer. It is always better to pass it as ‘Adapt to Type’ instead of a pointer.