Assembly

Measuring the True Cost of Div (32‑bit vs 64‑bit with Rust and Inline Asm)

Modern CPUs are fast—but some instructions still hide surprising costs. One of the most misunderstood is DIV. Is 32‑bit division faster than 64‑bit? Does instruction width matter anymore on x86‑64?

To answer this properly, we need more than wall‑clock timers. We need cycle counters, instruction retirement statistics, serialization barriers, and tight control over CPU affinity.
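As a rough illustration of the ingredients listed above, here is a minimal sketch that assumes an x86-64 target and stable Rust inline assembly. The helper names (`fenced_rdtsc`, `div32`, `div64`, `ticks_per_div32`, `ticks_per_div64`) are hypothetical and are not taken from the post. The sketch reads the time-stamp counter between `LFENCE` barriers and times a dependent chain of `DIV` instructions. The TSC counts reference ticks rather than core cycles, and the sketch neither pins the thread to a core nor reads instruction retirement counters, so treat its numbers as indicative only.

```rust
use std::arch::asm;
use std::hint::black_box;

/// Read the time-stamp counter with LFENCE on both sides, so earlier
/// instructions have completed and later ones have not yet started.
#[inline(always)]
fn fenced_rdtsc() -> u64 {
    let lo: u32;
    let hi: u32;
    unsafe {
        asm!("lfence", "rdtsc", "lfence",
             out("eax") lo, out("edx") hi, options(nostack));
    }
    ((hi as u64) << 32) | lo as u64
}

/// 32-bit unsigned DIV: EDX:EAX / divisor -> (quotient, remainder).
#[inline(always)]
fn div32(dividend: u32, divisor: u32) -> (u32, u32) {
    assert!(divisor != 0, "DIV by zero raises #DE");
    let (quot, rem): (u32, u32);
    unsafe {
        asm!("div {d:e}",
             d = in(reg) divisor,
             inout("eax") dividend => quot,
             inout("edx") 0u32 => rem, // upper half of the dividend is zero
             options(nomem, nostack));
    }
    (quot, rem)
}

/// 64-bit unsigned DIV: RDX:RAX / divisor -> (quotient, remainder).
#[inline(always)]
fn div64(dividend: u64, divisor: u64) -> (u64, u64) {
    assert!(divisor != 0, "DIV by zero raises #DE");
    let (quot, rem): (u64, u64);
    unsafe {
        asm!("div {d}",
             d = in(reg) divisor,
             inout("rax") dividend => quot,
             inout("rdx") 0u64 => rem, // upper half of the dividend is zero
             options(nomem, nostack));
    }
    (quot, rem)
}

/// Average TSC ticks per 32-bit DIV in a dependent chain (`iters` > 0).
fn ticks_per_div32(iters: u64) -> f64 {
    let mut x: u32 = black_box(u32::MAX);
    let start = fenced_rdtsc();
    for _ in 0..iters {
        // Feed the quotient back in so each DIV waits for the previous one.
        x = div32(x, 3).0 | 0x8000_0000;
    }
    let end = fenced_rdtsc();
    black_box(x);
    end.wrapping_sub(start) as f64 / iters as f64
}

/// Average TSC ticks per 64-bit DIV in a dependent chain (`iters` > 0).
fn ticks_per_div64(iters: u64) -> f64 {
    let mut x: u64 = black_box(u64::MAX);
    let start = fenced_rdtsc();
    for _ in 0..iters {
        x = div64(x, 3).0 | (1u64 << 63);
    }
    let end = fenced_rdtsc();
    black_box(x);
    end.wrapping_sub(start) as f64 / iters as f64
}
```

Calling `ticks_per_div32(1_000_000)` and `ticks_per_div64(1_000_000)` from a release build gives a first comparison of the two widths. The figures include loop overhead and depend on the operand values, so they are a starting point and not a definitive measurement.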

Hyper-Threading

I ran my own experiments to better understand how Hyper-Threading works internally on Windows, using hand-written assembly benchmarks — from single-register multiplications to SHA-256 hashing.

Search String Performance

I needed a fast method for string search. The LabVIEW Search/Split String function is relatively slow, so I implemented the search using StringZilla and achieved around a 20x speedup with AVX2/AVX512.

Slow String Performance

I just encountered slow string performance in LabVIEW when a large string is passed to a DLL as a C string pointer. It is always better to pass it as ‘Adapt to Type’ instead of a pointer.

Writing a DLL for LabVIEW in Assembler

Nowadays, development in pure Assembler is not very popular, because modern compilers can generate “good” code. On the other hand, it is a very good exercise that helps you understand calling conventions and how they work at a very low level.