Assembly

Measuring the True Cost of Div (32‑bit vs 64‑bit with Rust and Inline Asm)

Modern CPUs are fast—but some instructions still hide surprising costs. One of the most misunderstood is DIV. Is 32‑bit division faster than 64‑bit? Does instruction width matter anymore on x86‑64?

To answer this properly, we need more than wall‑clock timers. We need cycle counters, instruction retirement statistics, serialization barriers, and tight control over CPU affinity.
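As a rough illustration of the cycle-counter and serialization-barrier part, here is a minimal sketch, not the post's actual benchmark. It assumes an x86-64 target and times a dependent chain of DIV instructions between two LFENCE-fenced RDTSC reads. The names `fenced_tsc`, `div32`, `div64`, and `ticks_per_div` are hypothetical helpers introduced for this example. RDTSC counts reference ticks rather than core clock cycles, and CPU pinning and retired-instruction counters are not shown.

```rust
use std::arch::asm;
use std::arch::x86_64::{_mm_lfence, _rdtsc};

/// Read the time-stamp counter with LFENCE on both sides, so earlier
/// instructions complete before the read and later ones start after it.
fn fenced_tsc() -> u64 {
    unsafe {
        _mm_lfence();
        let t = _rdtsc();
        _mm_lfence();
        t
    }
}

/// 32-bit DIV: divides EDX:EAX (here 0:dividend) by `divisor`.
/// Returns (quotient, remainder).
fn div32(dividend: u32, divisor: u32) -> (u32, u32) {
    assert!(divisor != 0, "DIV by zero raises a divide error");
    let (mut eax, mut edx) = (dividend, 0u32);
    unsafe {
        asm!(
            "div {d:e}",
            d = in(reg) divisor,
            inout("eax") eax,
            inout("edx") edx,
            options(nomem, nostack),
        );
    }
    (eax, edx)
}

/// 64-bit DIV: divides RDX:RAX (here 0:dividend) by `divisor`.
/// Returns (quotient, remainder).
fn div64(dividend: u64, divisor: u64) -> (u64, u64) {
    assert!(divisor != 0, "DIV by zero raises a divide error");
    let (mut rax, mut rdx) = (dividend, 0u64);
    unsafe {
        asm!(
            "div {d}",
            d = in(reg) divisor,
            inout("rax") rax,
            inout("rdx") rdx,
            options(nomem, nostack),
        );
    }
    (rax, rdx)
}

/// Average TSC ticks per iteration. Each result feeds the next call,
/// so the divisions form a dependent chain and cannot overlap.
fn ticks_per_div(iters: u64, mut step: impl FnMut(u64) -> u64) -> f64 {
    let mut x = u64::MAX;
    let start = fenced_tsc();
    for _ in 0..iters {
        x = step(x);
    }
    let end = fenced_tsc();
    std::hint::black_box(x); // keep the chain's final value observable
    end.wrapping_sub(start) as f64 / iters as f64
}

fn main() {
    let iters = 1_000_000;
    // OR in the top bit so the dividend stays large on every iteration.
    let t32 = ticks_per_div(iters, |x| div32(x as u32 | 0x8000_0000, 3).0 as u64);
    let t64 = ticks_per_div(iters, |x| div64(x | (1 << 63), 3).0);
    println!("32-bit DIV: {t32:.1} TSC ticks per iteration");
    println!("64-bit DIV: {t64:.1} TSC ticks per iteration");
}
```

The measured numbers include loop overhead and depend on the CPU model, frequency scaling, and operand values, so they are only meaningful as a relative comparison on one machine.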

How to get assembly output from Rust Code

From time to time, it’s useful to inspect the assembly output generated from Rust code. A common problem is locating a specific piece of Rust code inside the assembly listing. Below is a simple trick to make this much easier.
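One common way to do this, shown here as an assumption and not necessarily the trick the post describes, is to bracket the code of interest with inline-assembly comments that survive into the listing. The sketch below assumes an x86-64 target, where `#` starts an assembler comment. The function `sum_of_squares` and the `MARK_BEGIN`/`MARK_END` labels are hypothetical names chosen for illustration.

```rust
use std::arch::asm;

/// Example function whose generated code we want to find in the listing.
/// `#[inline(never)]` keeps it as a separate symbol instead of being
/// merged into its caller.
#[inline(never)]
pub fn sum_of_squares(values: &[u32]) -> u32 {
    // An asm block containing only a comment emits no instruction,
    // but the comment text appears in the assembly output.
    unsafe {
        asm!("# MARK_BEGIN sum_of_squares", options(nomem, nostack, preserves_flags));
    }

    let mut total = 0u32;
    for &v in values {
        total = total.wrapping_add(v.wrapping_mul(v));
    }

    unsafe {
        asm!("# MARK_END sum_of_squares", options(nomem, nostack, preserves_flags));
    }
    total
}

fn main() {
    println!("{}", sum_of_squares(&[1, 2, 3]));
}
```

Building with `rustc -O --emit asm` (or `cargo rustc --release -- --emit asm`) and searching the resulting `.s` file for `MARK_BEGIN` should lead straight to the relevant instructions. The markers act as optimization barriers, so the code between them may differ slightly from an unmarked build.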

Hyper-Threading

I ran my own experiments to better understand how Hyper-Threading works internally on Windows, using hand-written assembly benchmarks — from single-register multiplications to SHA-256 hashing.

Search String Performance

I needed a fast method for string search. The LabVIEW Search/Split String function is relatively slow, so I implemented the search with StringZilla and achieved around a 20x speedup with AVX2/AVX-512.

Slow String Performance

I recently ran into slow string performance in LabVIEW when passing a large string to a DLL as a C string pointer. It is always better to pass it as ‘Adapt to Type’ instead of a pointer.