Smallest Hello World Application
A simple exercise — how to develop the smallest possible Hello World application.
When iterating diagonals of a 3D array, the performance differences between LabVIEW and Rust become very noticeable. Even with a straightforward implementation, the generated machine code tells a story about what’s happening under the hood.
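To make the indexing concrete, here is a minimal Rust sketch (not the post's benchmark) that walks the main diagonal [i][i][i] of an n×n×n array. It assumes row-major storage in one flat slice, and the function name `main_diagonal` is hypothetical.

```rust
/// Collects the main diagonal [i][i][i] of an n×n×n array stored row-major
/// in one flat slice, where element [i][j][k] lives at (i*n + j)*n + k.
fn main_diagonal(data: &[f64], n: usize) -> Vec<f64> {
    assert_eq!(data.len(), n * n * n, "slice must hold n^3 elements");
    // Moving from [i][i][i] to [i+1][i+1][i+1] advances the flat index by
    // n*n + n + 1, so the diagonal is a constant-stride walk.
    let stride = n * n + n + 1;
    data.iter().step_by(stride).copied().collect()
}

fn main() {
    let n = 3;
    let data: Vec<f64> = (0..n * n * n).map(|x| x as f64).collect();
    println!("{:?}", main_diagonal(&data, n)); // [0.0, 13.0, 26.0]
}
```

Because consecutive diagonal elements are a fixed `n*n + n + 1` elements apart, the loop reduces to a strided walk, which is the kind of loop whose generated machine code is worth comparing between compilers.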
Modern CPUs are fast—but some instructions still hide surprising costs. One of the most misunderstood is DIV. Is 32‑bit division faster than 64‑bit? Does instruction width matter anymore on x86‑64?
To answer this properly, we need more than wall‑clock timers. We need cycle counters, instruction retirement statistics, serialization barriers, and tight control over CPU affinity.
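A minimal Rust sketch of such a measurement, assuming an x86-64 CPU: a dependent chain of divisions (each quotient feeds the next dividend, so the loop is latency-bound), timed with RDTSC fenced by LFENCE. The helper names are hypothetical, the result is in TSC ticks rather than core cycles, and thread pinning and retirement counters are not shown. Some LLVM x86 tunings may insert a runtime check that falls back to a 32-bit DIV when both 64-bit operands fit in 32 bits, so the 64-bit chain keeps its dividend above 2^32 to force the real 64-bit instruction.

```rust
use std::hint::black_box;

/// Dependent chain of 32-bit divisions: each quotient becomes the next
/// dividend, so the loop runs at DIV latency. OR-ing in the top bit keeps
/// the dividend large instead of letting it shrink to zero.
#[inline(never)]
fn div_chain_u32(iters: u32, divisor: u32) -> u32 {
    let mut x = u32::MAX;
    for _ in 0..iters {
        x = (x / divisor) | (1u32 << 31);
    }
    x
}

/// The same chain with 64-bit operands; the dividend always stays above 2^32.
#[inline(never)]
fn div_chain_u64(iters: u32, divisor: u64) -> u64 {
    let mut x = u64::MAX;
    for _ in 0..iters {
        x = (x / divisor) | (1u64 << 63);
    }
    x
}

/// Reads the TSC with LFENCE on both sides so surrounding instructions
/// cannot drift across the read (x86-64 only).
#[cfg(target_arch = "x86_64")]
fn fenced_tsc() -> Option<u64> {
    use std::arch::x86_64::{_mm_lfence, _rdtsc};
    unsafe {
        _mm_lfence();
        let t = _rdtsc();
        _mm_lfence();
        Some(t)
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn fenced_tsc() -> Option<u64> {
    None
}

/// Average TSC ticks per iteration of `f`, or None without a TSC.
fn ticks_per_iter(iters: u32, f: impl Fn(u32)) -> Option<f64> {
    let start = fenced_tsc()?;
    f(iters);
    let end = fenced_tsc()?;
    Some(end.saturating_sub(start) as f64 / iters as f64)
}

fn main() {
    let iters = 10_000_000;
    // black_box on the divisor stops the compiler from turning a division
    // by a known constant into a multiply-and-shift sequence.
    let d32 = ticks_per_iter(iters, |n| {
        black_box(div_chain_u32(n, black_box(3)));
    });
    let d64 = ticks_per_iter(iters, |n| {
        black_box(div_chain_u64(n, black_box(3)));
    });
    println!("32-bit DIV: {d32:?} ticks/iter, 64-bit DIV: {d64:?} ticks/iter");
}
```

LFENCE is used here for ordering; CPUID, a fully serializing instruction, is the other common choice before RDTSC.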
From time to time, it’s useful to inspect the assembly output generated from Rust code. A common problem is locating a specific piece of Rust code inside the assembly listing. Below is a simple trick to make this much easier.
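One common way to do this (not necessarily the trick the post describes) is to give the function an unmangled symbol name and keep it out of line, so it appears under its own label in the `.s` file. A minimal sketch; `sum_of_squares` is a hypothetical example function:

```rust
use std::hint::black_box;

/// Hypothetical function whose code we want to find in the listing.
/// `#[no_mangle]` keeps the symbol name readable (`sum_of_squares`), and
/// `#[inline(never)]` stops it from being folded into its caller.
#[no_mangle]
#[inline(never)]
pub fn sum_of_squares(n: u64) -> u64 {
    (1..=n).map(|i| i * i).sum()
}

fn main() {
    // black_box prevents the call from being evaluated at compile time.
    let r = sum_of_squares(black_box(10));
    println!("{r}"); // 385
}
```

Building with `cargo rustc --release -- --emit asm` writes the listing to `target/release/deps/`, where the function can be found by searching for `sum_of_squares`. Under the 2024 edition the attribute must be written `#[unsafe(no_mangle)]`.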
Sometimes it’s necessary to inspect the machine code behind a LabVIEW-generated executable. Running such an application under a binary debugger is straightforward; locating the desired code is only slightly tricky.
Basically, on CPUs with an invariant TSC, the base (non-turbo) CPU frequency is the number of TSC increments per second.
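This suggests a quick way to estimate the TSC rate: count TSC increments while the OS clock advances by a known interval. A minimal Rust sketch, assuming x86-64 and an invariant TSC (reported by CPUID leaf 0x80000007, EDX bit 8); `estimate_tsc_hz` is a hypothetical helper name.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Raw RDTSC read (x86-64 only); None on other architectures.
#[cfg(target_arch = "x86_64")]
fn read_tsc() -> Option<u64> {
    Some(unsafe { std::arch::x86_64::_rdtsc() })
}

#[cfg(not(target_arch = "x86_64"))]
fn read_tsc() -> Option<u64> {
    None
}

/// Estimates the TSC rate in Hz: TSC increments counted while the OS
/// monotonic clock advances by roughly `window`.
fn estimate_tsc_hz(window: Duration) -> Option<f64> {
    let t0 = Instant::now();
    let c0 = read_tsc()?;
    sleep(window);
    let c1 = read_tsc()?;
    let secs = t0.elapsed().as_secs_f64();
    Some(c1.saturating_sub(c0) as f64 / secs)
}

fn main() {
    match estimate_tsc_hz(Duration::from_millis(500)) {
        Some(hz) => println!("TSC rate: about {:.0} MHz", hz / 1e6),
        None => println!("no TSC on this architecture"),
    }
}
```

The 500 ms window trades run time against timer and scheduling jitter; on such a CPU the result should land close to the nominal base frequency.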
This code measures the latency and throughput of CPU instructions (draft).
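The usual way to separate the two is to time one dependent chain (each instruction waits for the previous result, exposing latency) against several independent chains the CPU can overlap (exposing throughput). A minimal portable Rust sketch of that idea for 64-bit multiplication, timed with `std::time::Instant`; the kernel names, the constant `OPS` and the multiplier are illustrative choices, not the post's code.

```rust
use std::hint::black_box;
use std::time::Instant;

const OPS: u64 = 20_000_000;

/// One dependent chain: every multiply waits for the previous result,
/// so the time per op approximates the instruction's latency.
#[inline(never)]
fn mul_dependent(seed: u64, k: u64) -> u64 {
    let mut a = seed;
    for _ in 0..OPS {
        a = a.wrapping_mul(k);
    }
    a
}

/// Four independent chains: the CPU can overlap them, so the time per op
/// approaches the reciprocal throughput instead.
#[inline(never)]
fn mul_independent(seed: u64, k: u64) -> u64 {
    let (mut a, mut b, mut c, mut d) = (seed, seed ^ 1, seed ^ 2, seed ^ 3);
    for _ in 0..OPS / 4 {
        a = a.wrapping_mul(k);
        b = b.wrapping_mul(k);
        c = c.wrapping_mul(k);
        d = d.wrapping_mul(k);
    }
    a ^ b ^ c ^ d
}

/// Nanoseconds per multiply for the given kernel; black_box hides the
/// arguments so the compiler cannot precompute the result.
fn ns_per_op(kernel: fn(u64, u64) -> u64) -> f64 {
    let start = Instant::now();
    black_box(kernel(black_box(3), black_box(0x9E37_79B9_7F4A_7C15)));
    start.elapsed().as_nanos() as f64 / OPS as f64
}

fn main() {
    let lat = ns_per_op(mul_dependent);
    let thr = ns_per_op(mul_independent);
    println!("latency: {lat:.3} ns/op, reciprocal throughput: {thr:.3} ns/op");
}
```

Multiplying the ns/op figures by the core clock frequency converts them to cycles. It is worth checking the generated assembly to confirm that each loop step still contains one IMUL per chain.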
This code reads memory blocks of different sizes and measures the bandwidth.
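A minimal Rust sketch of the same idea: read a buffer of each size repeatedly and divide the bytes read by the elapsed time. The sizes and the function name `read_bandwidth_gbs` are illustrative assumptions; one would expect the rate to drop as the working set grows out of L1, L2 and L3 into DRAM.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Sequentially reads a buffer of about `size_bytes` bytes `repeats` times
/// and returns the read bandwidth in GB/s (1 GB = 1e9 bytes).
fn read_bandwidth_gbs(size_bytes: usize, repeats: usize) -> f64 {
    let words = (size_bytes / 8).max(1);
    // Filled before timing, so page faults are not part of the measurement.
    let buf: Vec<u64> = (0..words as u64).collect();
    let mut sum = 0u64;
    let start = Instant::now();
    for _ in 0..repeats {
        // black_box hides the buffer from the optimizer so every pass is executed.
        for &w in black_box(&buf).iter() {
            sum = sum.wrapping_add(w);
        }
    }
    let secs = start.elapsed().as_secs_f64();
    black_box(sum);
    (words * 8 * repeats) as f64 / secs / 1e9
}

fn main() {
    // Illustrative sizes; adjust them to the cache sizes of the CPU under test.
    for kib in [16usize, 256, 4096, 65536] {
        let size = kib * 1024;
        // Keep the total traffic around 1 GiB per size.
        let repeats = ((1usize << 30) / size).max(1);
        println!("{kib:>6} KiB: {:.1} GB/s", read_bandwidth_gbs(size, repeats));
    }
}
```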
The easiest way to print integer and floating-point values from Assembly Code to the Console
I ran my own experiments to better understand how Hyper-Threading works internally on Windows, using hand-written assembly benchmarks — from single-register multiplications to SHA-256 hashing.
llvm-mca is a performance analysis tool to statically measure the performance of machine code in a specific CPU.
For some reason, I needed a fast string search method. The LabVIEW Search/Split String function is relatively slow, so I did it with StringZilla instead and got around a 20x speedup with AVX2/AVX-512.
I just encountered slow string performance in LabVIEW when a large string is passed to a DLL as a C string pointer. It is always better to pass it as ‘Adapt to Type’ instead of a pointer.
B&R PLCs can be programmed in C and C++, but you can also use assembly. Below is a short guide on how to do that.
In some cases, we need to measure very short intervals (hundreds of CPU cycles) directly in assembly. We can do this with the cpuid/rdtsc combination.
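A minimal Rust sketch of the cpuid/rdtsc pattern using the `core::arch` intrinsics, assuming x86-64. CPUID is a serializing instruction, so placing it before RDTSC ensures earlier instructions have completed before the timestamp is read. The helper names are hypothetical, and each reading includes the fixed CPUID+RDTSC overhead, which the sketch estimates with an empty measurement and subtracts.

```rust
use std::hint::black_box;

/// CPUID (serializing) followed by RDTSC: every earlier instruction must
/// finish before the timestamp is taken. x86-64 only.
#[cfg(target_arch = "x86_64")]
fn serialized_tsc() -> Option<u64> {
    use std::arch::x86_64::{__cpuid, _rdtsc};
    unsafe {
        let _ = black_box(__cpuid(0));
        Some(_rdtsc())
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn serialized_tsc() -> Option<u64> {
    None
}

/// TSC ticks spent in `f`, including the fixed CPUID+RDTSC overhead.
fn measure_ticks<F: FnMut()>(mut f: F) -> Option<u64> {
    let start = serialized_tsc()?;
    f();
    let end = serialized_tsc()?;
    Some(end.saturating_sub(start))
}

fn main() {
    // The minimum over many runs filters out interrupts and cache misses.
    let overhead = (0..100).filter_map(|_| measure_ticks(|| {})).min();
    let work = (0..100)
        .filter_map(|_| {
            measure_ticks(|| {
                // A short dependent chain kept opaque with black_box:
                // roughly a few hundred cycles of work.
                let mut x = black_box(1u64);
                for _ in 0..100 {
                    x = black_box(x.wrapping_mul(3));
                }
                black_box(x);
            })
        })
        .min();
    if let (Some(o), Some(w)) = (overhead, work) {
        println!("overhead: {o} ticks, work: about {} ticks", w.saturating_sub(o));
    }
}
```

Intel's white paper on benchmarking code execution times refines this by ending the measurement with RDTSCP followed by CPUID. The TSC counts at a constant rate, so the result is in TSC ticks, which match core cycles only when the core runs at its base frequency.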