How to write a DLL in Rust and call it from LabVIEW

13 May, 2024

(Last Modified: 19 November, 2024)

Calling external code written in other languages is a great feature of LabVIEW. In general, you can wrap almost anything into a DLL and then call it with the Call Library Function Node. This time we will mix Rust code with LabVIEW.

The Rust programming language helps you write faster, more reliable software. High-level ergonomics and low-level control are often at odds in programming language design; Rust challenges that conflict. Through balancing powerful technical capacity and a great developer experience, Rust gives you the option to control low-level details (such as memory usage) without all the hassle traditionally associated with such control.

You need 64-bit LabVIEW and Rust, of course. To get started, download Rust and install it as described there.

If you already have Rust installed, you can update it to the latest version by running

> rustup update

Before continuing, I recommend reading the Call Library Function Node reference manual and the article Configuring the Call Library Function Node.

The first step is to create a “minimal viable” DLL in Rust. By default Cargo builds an executable or a Rust-only library, so you need to configure it to produce a C-compatible dynamic library.

All you need is just two files: Cargo.toml and lib.rs in the \src subfolder:

│
│ Cargo.toml
│
└─── src
        lib.rs

Cargo.toml:

[package]
name = "my_rust_lib"
version = "0.1.0"

[lib]
crate-type = ["cdylib"]

and a first lib.rs with just one function, which adds two floats:

// return sum of two floats
#[no_mangle]
pub extern "C" fn Add(a: f64, b: f64) -> f64 {
    a + b
}
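
Before switching to LabVIEW, it can be handy to check the exported function from Rust itself. The following is only a minimal sketch, assuming the same Add function as above; the test module and its test name are illustrative and not part of the original library. An extern "C" function is still an ordinary Rust function, so it can be called directly from a unit test and run with cargo test:

```rust
// Same function as in lib.rs above: exported with an unmangled name
// and the C calling convention.
#[no_mangle]
pub extern "C" fn Add(a: f64, b: f64) -> f64 {
    a + b
}

// Illustrative unit test (hypothetical, not in the original lib.rs).
// It is compiled only for `cargo test`, so it does not end up in the DLL.
#[cfg(test)]
mod tests {
    use super::Add;

    #[test]
    fn add_returns_sum() {
        // 1.5 and 2.25 are exactly representable, so an exact comparison is fine.
        assert_eq!(Add(1.5, 2.25), 3.75);
    }
}
```

This only verifies the arithmetic; the Call Library Function Node configuration (calling convention, parameter types) still has to be checked on the LabVIEW side.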

To compile this, run

> cargo build --release

   Compiling my_rust_lib v0.1.0
    Finished `release` profile [optimized] target(s) in 0.62s

Your library will be located in the \target\release folder:

│ Cargo.toml
│
├─── src
│ lib.rs
│
└─── target
    │
    └─── release
        │ my_rust_lib.dll
        │ my_rust_lib.dll.exp
        │ my_rust_lib.dll.lib
        │ my_rust_lib.pdb

Now it's time to launch LabVIEW and configure the Call Library Function Node:

parameters:

And it works:

Of course, you can pass more complicated data structures, not only scalars, for example, arrays:

// return sum of array
#[no_mangle]
pub extern "C" fn SumArray(ptr: *const f64, length: usize) -> f64 {
    // SAFETY: the caller must pass a valid pointer to `length` f64 values.
    // The data is only read, so a shared (non-mutable) slice is enough.
    let data = unsafe { std::slice::from_raw_parts(ptr, length) };
    data.iter().sum()
}
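
The version above trusts the caller completely. As a hedged sketch of a more defensive variant: std::slice::from_raw_parts requires a non-null pointer even for an empty slice, so it may be worth guarding against a null pointer or a zero length before building the slice. The name SumArrayChecked is an assumption made for this example, not something from the original library:

```rust
use std::slice;

// Hypothetical, more defensive variant of SumArray (the name is an assumption).
// Returns 0.0 for a null pointer or a zero length instead of building a slice.
#[no_mangle]
pub extern "C" fn SumArrayChecked(ptr: *const f64, length: usize) -> f64 {
    if ptr.is_null() || length == 0 {
        return 0.0;
    }
    // SAFETY: the pointer is non-null; the caller (LabVIEW) must still make sure
    // it refers to at least `length` valid f64 values for the whole call.
    let data = unsafe { slice::from_raw_parts(ptr, length) };
    data.iter().sum()
}
```

The check cannot detect a wrong length, so the array size wired in LabVIEW still has to match the actual array.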

Then the LabVIEW code:

The array is passed as an Array Data Pointer:

If you need to iterate over all elements and return a result, the following snippet could be useful:

#[no_mangle]
pub extern "C" fn ffc_rust(
    arr1: *const i16,
    arr2: *const i16,
    arr3: *const f32,
    result: *mut i16,
    size: usize,
) {
    // SAFETY: the caller must pass four valid buffers of at least `size` elements each.
    unsafe {
        for i in 0..size {
            // wrapping_sub makes the i16 overflow behavior explicit (same as `-` in release builds)
            let subtraction_result = (*arr1.add(i)).wrapping_sub(*arr2.add(i));
            // the f32 -> i16 cast saturates at the i16 limits
            let multiplication_result = (subtraction_result as f32 * *arr3.add(i)) as i16;
            *result.add(i) = multiplication_result;
        }
    }
}

Here we perform a kind of flat-field correction: subtract arr2 (which acts as the “offset”) from arr1 (our data), then multiply by arr3 (the gain) and write the result to the output array:

This is how it works.
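
The same correction can also be written so that the unsafe part shrinks to the pointer-to-slice conversion. This is only a sketch under stated assumptions: the names ffc_slices and ffc_rust_slices are hypothetical, and the wrapper assumes that the result buffer does not overlap any of the input buffers.

```rust
use std::slice;

// Safe core working on slices: it cannot read or write out of bounds.
// Processes as many elements as the shortest slice holds.
pub fn ffc_slices(data: &[i16], offset: &[i16], gain: &[f32], result: &mut [i16]) {
    for (((out, &d), &o), &g) in result.iter_mut().zip(data).zip(offset).zip(gain) {
        // wrapping_sub: explicit wrap-around on i16 overflow.
        // The f32 -> i16 cast saturates at the i16 limits.
        *out = (d.wrapping_sub(o) as f32 * g) as i16;
    }
}

// Hypothetical exported wrapper (the name is an assumption).
#[no_mangle]
pub extern "C" fn ffc_rust_slices(
    arr1: *const i16,
    arr2: *const i16,
    arr3: *const f32,
    result: *mut i16,
    size: usize,
) {
    if size == 0 || arr1.is_null() || arr2.is_null() || arr3.is_null() || result.is_null() {
        return;
    }
    // SAFETY: the caller guarantees that every pointer refers to `size` valid
    // elements and that `result` does not overlap the input buffers.
    unsafe {
        ffc_slices(
            slice::from_raw_parts(arr1, size),
            slice::from_raw_parts(arr2, size),
            slice::from_raw_parts(arr3, size),
            slice::from_raw_parts_mut(result, size),
        );
    }
}
```

With a gain large enough to push a value beyond the i16 range, the result is clamped to 32767 (or -32768) rather than wrapped around, because float-to-integer casts in Rust saturate.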

Rust is positioned as a fast and safe alternative to C. Programming in Rust instead of C has several benefits, most notably improved memory safety and, as a consequence, better security, with performance in the same range as C. From the point of view of the generated code, it is interesting to disassemble a Rust-based library and take a look inside:

This is how Add(double a, double b) is compiled:

public Add
	proc near
	addsd   xmm0, xmm1
	retn
	endp

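Everything is by the book, no surprises. The two parameters are passed in xmm0 and xmm1, and the result is returned in xmm0, as described in the x64 calling convention.

SumArray is not fully optimal, but perfectly acceptable. The loop is unrolled eight times, but every addition is still a scalar addsd, because floating-point addition is not associative and the compiler has to preserve the summation order: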

                public SumArray
SumArray        proc near               ; DATA XREF: .rdata:off_18001CD98↓o
                test    rdx, rdx
                jz      short loc_180001029
                mov     eax, edx
                and     eax, 7
                cmp     rdx, 8
                jnb     short loc_18000102D
                xorpd   xmm0, xmm0
                xor     r8d, r8d
                jmp     short loc_180001080
; ---------------------------------------------------------------------------

loc_180001029:                          ; CODE XREF: SumArray+3↑j
                xorps   xmm0, xmm0
                retn
; ---------------------------------------------------------------------------

loc_18000102D:                          ; CODE XREF: SumArray+E↑j
                and     rdx, 0FFFFFFFFFFFFFFF8h
                xorpd   xmm0, xmm0
                xor     r8d, r8d
                nop     dword ptr [rax+rax+00000000h]

loc_180001040:                          ; CODE XREF: SumArray+6E↓j
                addsd   xmm0, qword ptr [rcx+r8*8]
                addsd   xmm0, qword ptr [rcx+r8*8+8]
                addsd   xmm0, qword ptr [rcx+r8*8+10h]
                addsd   xmm0, qword ptr [rcx+r8*8+18h]
                addsd   xmm0, qword ptr [rcx+r8*8+20h]
                addsd   xmm0, qword ptr [rcx+r8*8+28h]
                addsd   xmm0, qword ptr [rcx+r8*8+30h]
                addsd   xmm0, qword ptr [rcx+r8*8+38h]
                add     r8, 8
                cmp     rdx, r8
                jnz     short loc_180001040

loc_180001080:                          ; CODE XREF: SumArray+17↑j
                test    rax, rax
                jz      short locret_18000109D
                lea     rcx, [rcx+r8*8]
                xor     edx, edx
                nop     dword ptr [rax+rax+00h]

loc_180001090:                          ; CODE XREF: SumArray+8B↓j
                addsd   xmm0, qword ptr [rcx+rdx*8]
                inc     rdx
                cmp     rax, rdx
                jnz     short loc_180001090

locret_18000109D:                       ; CODE XREF: SumArray+73↑j
                retn
SumArray        endp

In my experience, C code compiled with a highly optimizing compiler such as Intel's can still be slightly faster than Rust, but in general I’m impressed by the results. In any case, some performance tuning tricks are described in the article Cheap tricks for high-performance Rust.
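
One such trick, shown here only as a hedged sketch, applies to the SumArray loop: the compiler keeps a single running sum because it must preserve the order of floating-point additions. Splitting the sum into several independent accumulators removes that dependency and gives the optimizer room to use packed instructions. The helper name sum_lanes is hypothetical, and whether the loop actually gets vectorized depends on the compiler version and flags, so it is worth checking the disassembly again.

```rust
// Hypothetical helper: sums a slice using 4 independent accumulators.
// Note: the additions happen in a different order than in a plain
// `iter().sum()`, so the result may differ in the last bits.
pub fn sum_lanes(data: &[f64]) -> f64 {
    let mut acc = [0.0f64; 4];
    let chunks = data.chunks_exact(4);
    // Elements that do not fill a complete chunk of 4.
    let tail = chunks.remainder();
    for chunk in chunks {
        for lane in 0..4 {
            acc[lane] += chunk[lane];
        }
    }
    let tail_sum: f64 = tail.iter().sum();
    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail_sum
}
```

Inside the exported function, the slice built with from_raw_parts could be passed to such a helper instead of calling iter().sum() on it.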

For further optimization, place a config.toml file into the .cargo subfolder:

│ Cargo.toml
│
├───.cargo
│      config.toml
│
├───src
│      lib.rs
│
└───target
    │
    └───release
        │ my_rust_lib.dll
        │ tester.vi

In config.toml, set the following options:

[build]
# rustflags = ["-O", "-C", "target-feature=+avx"]
rustflags = ["-O", "-C", "target-cpu=native"]

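Of course, be careful with “target-cpu=native”: the compiler may then use every instruction set extension of the build machine, so the DLL is only guaranteed to run on CPUs that support the same extensions. It does pay off, though. For example, here is the assembly listing with the default release flags: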

loc_180001190:                          ; CODE XREF: ffc_rust+18B↓j
	movq    xmm2, qword ptr [rcx+r11*2]
	movq    xmm3, qword ptr [rdx+r11*2]
	psubw   xmm2, xmm3
	punpcklwd xmm2, xmm2
	psrad   xmm2, 10h
	cvtdq2ps xmm2, xmm2
	movups  xmm3, xmmword ptr [r8+r11*4]
	mulps   xmm3, xmm2
	movshdup xmm2, xmm3
	movaps  xmm4, xmm0
	maxss   xmm4, xmm2
	movaps  xmm2, xmm1
	minss   xmm2, xmm4
	cvttss2si esi, xmm2
...

and with target-cpu=native we can see:

loc_180001210:                          ; CODE XREF: ffc_rust+587↓j
	vmovdqu xmm2, xmmword ptr [rcx+r11*2]
	vmovdqu xmm3, xmmword ptr [rcx+r11*2+10h]
	vmovdqu xmm4, xmmword ptr [rcx+r11*2+20h]
	vmovdqu xmm5, xmmword ptr [rcx+r11*2+30h]
	vpsubw  xmm2, xmm2, xmmword ptr [rdx+r11*2]
	vpsubw  xmm3, xmm3, xmmword ptr [rdx+r11*2+10h]
	vpsubw  xmm4, xmm4, xmmword ptr [rdx+r11*2+20h]
	vpsubw  xmm5, xmm5, xmmword ptr [rdx+r11*2+30h]
	vpmovsxwd ymm2, xmm2
	vcvtdq2ps ymm2, ymm2
	vpmovsxwd ymm3, xmm3
	vcvtdq2ps ymm3, ymm3
	vpmovsxwd ymm4, xmm4
	vcvtdq2ps ymm4, ymm4
	vpmovsxwd ymm5, xmm5
	vmulps  ymm16, ymm2, ymmword ptr [r8+r11*4]
	vmulps  ymm17, ymm3, ymmword ptr [r8+r11*4+20h]
	vmulps  ymm4, ymm4, ymmword ptr [r8+r11*4+40h]
	vcvtdq2ps ymm2, ymm5
	vmulps  ymm2, ymm2, ymmword ptr [r8+r11*4+60h]
	vmovshdup xmm3, xmm16
	vmaxss  xmm3, xmm0, xmm3
	vminss  xmm3, xmm1, xmm3
	vcvttss2si esi, xmm3
	vmaxss  xmm3, xmm0, xmm16
	vminss  xmm3, xmm1, xmm3
	vcvttss2si edi, xmm3
	...

So the loop gets unrolled and uses the 256-bit ymm* registers.
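
Since a library built this way may contain instructions the target machine does not support, it can be useful to check the CPU at run time before relying on such a build. The sketch below uses the standard is_x86_feature_detected! macro; the exported name HasAvx2 and the idea of calling it from LabVIEW are assumptions made for illustration. Such a check would have to live in a DLL built without target-cpu=native, otherwise the check itself might not run on an older CPU.

```rust
// Hypothetical helper (the name is an assumption): returns 1 if the CPU
// running this code supports AVX2, otherwise 0.
#[no_mangle]
pub extern "C" fn HasAvx2() -> i32 {
    // The detection macro exists only on x86 / x86_64 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return 1;
        }
    }
    0
}
```

A caller could use the returned flag to decide which of two DLL builds to load, one generic and one built for newer CPUs.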

A quick benchmark shows an improvement from 19 ms to 16 ms for a 4096x4096 U16 “image”:

Two more links:

Creating A DLL With Rust.

How to call Rust code from LabVIEW / LabVIEW wrapper around the Rust.