My take on an embedded "Hello World!" example
Intro
In my opinion, the usual embedded "Hello World!" aka "Blink a LED" examples tend to leave you with a lot of question marks.
They help to overcome common difficulties with setting up the toolchain and getting something to run. There is a lot of value in that, but when I started learning embedded Rust, I wished for more.
Out of this motivation, I created my own "Hello World!" example in 2023 as a reflection of my learning journey from 2019 to 2023.
When I recently came across it again, I wanted to "modernise" it. The stm32f3xx-hal crate is no longer actively developed, and by now there is a very well-maintained HAL available with embassy-stm32. Also, the embedded-hal 1.0 release has taken place in the meantime.
With that in mind, I decided to update the used crates, polish the example a bit, and make it accessible to others. Maybe it will fill a gap for some people.
Blinking examples are fairly unimpressive from a functionality point of view. You flash them onto the board, and an LED blinks... wow.
The interesting part is exploring the code base to see how it is done. Rust is not just C with different syntax. I think these examples do a fairly good job of demonstrating that Rust has more to offer than just new syntax.
So I decided to write this post as a companion to read while exploring the code, which blinks an LED at different abstraction levels, making it possible to explore the differences between Rust and C beyond the syntax.
In the following, I don’t want to simply repeat the content of the examples, but rather describe what learning content each example can provide beyond code and comments. In these five examples, most of the important concepts of embedded Rust are present. The idea is to have the post open in addition to the code and explore these concepts.
You can find the example code on GitHub.
Examples
In the repository you can find five examples that show how to blink LEDs. Each example can be flashed to the demo board via probe-rs, and every time you will see a blinking LED (in the last example, even two). We start from raw pointers and work our way up the abstraction stack.
Minimal - Unsafe raw pointer
This first example is as minimal as it gets from a dependency and language-features point of view.
Disclaimer: It is possible to make it even more minimal without cortex-m-rt and stm32f3, but I consider the stuff before main as out of scope.
It shows the basic way to declare constants and work with raw pointers in Rust to access the memory-mapped peripherals. A little proof that, in fact, it is possible to have raw access to the hardware in Rust.
It also highlights where the unsafe keyword needs to be used. When I was exploring where unsafe was needed and where not, I fell down quite the rabbit hole.
Creating a Raw Pointer
Creating a const pointer is considered safe.
const RCC_BASE: *const u32 = 0x4002_1000 as *const u32;
This made sense because the mere existence of a pointer is not a problem; only when we dereference the pointer might we run into problems. So creating a pointer on its own doesn’t introduce undefined behavior.
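The distinction is easy to see even on the host. Here is a minimal sketch, using a local variable instead of a hardware address (make_ptr and volatile_read are illustrative helpers, not part of the example code):

```rust
use core::ptr;

/// Safe: merely forming a raw pointer touches no memory.
fn make_ptr(value: &u32) -> *const u32 {
    value as *const u32
}

/// The unsafe part is the dereference: we must prove validity ourselves.
/// read_volatile additionally prevents the compiler from caching or
/// eliding the access, which matters for memory-mapped registers.
fn volatile_read(ptr: *const u32) -> u32 {
    unsafe { ptr::read_volatile(ptr) }
}

fn main() {
    let value: u32 = 0x42;
    let ptr = make_ptr(&value); // no unsafe needed here
    println!("{:#x}", volatile_read(ptr)); // prints 0x42
}
```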
The Confusing Part: Unsafe Pointer Arithmetic
What really got me digging deep was this line of code:
// Create mutable pointer to AHBENR register
let ahbenr = unsafe { RCC_BASE.add(RCC_AHBENR / size_of::<usize>()).cast_mut() };
Here we add an offset to the register base to get the pointer to the correct register. At first glance, this looked identical to the previous example — no dereferencing, just the creation of a pointer to an address. So why does this require unsafe?
It turns out I am not the only one wondering, so I quickly found a Reddit post that does a fairly good job of providing an answer. In its comments and the provided links, you are able to dig as deep as you like. I will leave it at this point here, as I am in no way qualified to explain it better than the sources you find in the Reddit post.
Why is .add() unsafe?
In short: pointer arithmetic on its own can invoke undefined behavior, and therefore Rust demands the unsafe keyword here. If this is not satisfying, read the sources linked in the post.
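For comparison, Rust also offers a safe sibling, wrapping_add, which drops the in-bounds requirement. A small sketch (the function names are my own, and the array stands in for a register block):

```rust
/// `add` is unsafe: the resulting pointer must stay within the same
/// allocated object, or behavior is undefined -- even without any
/// dereference ever happening.
fn offset_with_add(base: *const u32, words: usize) -> *const u32 {
    unsafe { base.add(words) }
}

/// `wrapping_add` is safe: it makes no in-bounds promise, so forming
/// the pointer is fine. Dereferencing it still requires unsafe and a
/// valid address, of course.
fn offset_with_wrapping_add(base: *const u32, words: usize) -> *const u32 {
    base.wrapping_add(words)
}

fn main() {
    let regs = [0u32; 8]; // stand-in for a block of registers
    let base = regs.as_ptr();
    // Both compute the same in-bounds address here.
    assert_eq!(offset_with_add(base, 4), offset_with_wrapping_add(base, 4));
}
```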
From pointers to PAC (Peripheral Access Crate)
Apart from demonstrating the basic usage of a PAC crate, this example also shows how to work with global static data and interrupts in Rust.
Coming from C, I was used to declaring a global static uint32_t to access a tick value from both the interrupt context and the main context. On a Cortex-M4, I knew this was safe because atomic 32-bit operations are available. But when I tried to port this approach to Rust, I quickly realized it led to a lot of unsafe blocks. Writing unsafe feels uncomfortable — and that’s a good thing. It pushes you to look for other, better ways to solve the problem.
Fortunately, the Cortex-M4 core supports atomic operations, and Rust provides atomic types to take advantage of that. By using a static AtomicUsize, we can read and write to a global variable without any unsafe. This is one of the beautiful aspects of Rust: it lets you express your assumptions and safety contracts directly in the type system. If I try to compile this code for hardware that lacks atomic support, the compiler will fail the build — reducing the risk of subtle bugs or undefined behavior in the future.
So, problem solved? Yes, but there is more to learn!
When learning Rust, you quickly notice that variables are immutable by default. Adding the mut keyword makes them mutable. So, mut controls whether a variable can be modified. Simple enough — but not the whole story.
Rust claims to be memory-safe and free of data races, proven at compile time. It achieves this through the rules of ownership and borrowing. When you create a reference to a value, you borrow it. Rust enforces that there can either be:
- One exclusive mutable reference (&mut T), or
- Any number of non-exclusive immutable references (&T).
This ensures that if there’s a single mutable reference, all reads and writes are safe — and if there are multiple immutable references, only reads can occur.
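These rules can be seen in a few lines; the commented-out lines are the kind of code the borrow checker rejects:

```rust
/// One exclusive borrow, then several shared borrows -- this compiles
/// because the exclusive borrow ends before the shared ones begin.
fn demo() -> u32 {
    let mut value = 0u32;

    // One exclusive mutable reference: reads and writes are fine.
    let exclusive = &mut value;
    *exclusive += 1;

    // Any number of shared references: reads only.
    let a = &value;
    let b = &value;

    // let c = &mut value; // error: cannot borrow `value` as mutable
    // println!("{a}");    // ...while it is also borrowed as immutable
    *a + *b
}

fn main() {
    println!("{}", demo()); // prints 2
}
```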
Now, when we define a global static variable, the compiler has no way to know how and from where that variable will be accessed. If the variable isn’t mutable, that’s fine — no mutation can happen. But if it is mutable (static mut), the compiler can no longer guarantee exclusive access. That’s why any use of static mut is automatically considered unsafe. The compiler is telling us: “You’re on your own — I can’t check this for you.”
This is, of course, a simplified explanation of Rust’s ownership and borrowing model, but it should be enough for our example here. There are excellent resources for exploring this topic in depth, and I highly recommend doing so.
But here’s an interesting question: how can you mutate a supposedly immutable static, like an AtomicUsize, using methods like .store()?
This is where the concept of interior mutability comes in. Interior mutability allows data to be changed even when you only have an immutable (&T) reference. At first, that sounds like it breaks Rust’s guarantees — but it doesn’t, because the type itself enforces safety internally.
The way I like to think about it: when you write unsafe, you are responsible for ensuring correctness. With types that implement interior mutability, the type’s author has already done that work for you. You interact with the type through safe methods that internally handle synchronization, atomicity, or borrowing rules. From the compiler’s perspective, you’re still using only immutable references — but mutation happens safely inside the type.
There are several clever implementations of interior mutability in Rust. A great place to learn more is the core::cell module documentation.
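The simplest of these is Cell, which can be illustrated in a few host-side lines:

```rust
use std::cell::Cell;

/// Cell allows mutation through a shared (&T) reference. It stays safe
/// by never handing out references into its contents: values are only
/// copied in and out, so no aliasing rules can be violated.
fn bump(counter: &Cell<u32>) {
    counter.set(counter.get() + 1);
}

fn main() {
    let counter = Cell::new(0);
    bump(&counter); // note: &Cell<u32>, not &mut Cell<u32>
    bump(&counter);
    println!("{}", counter.get()); // prints 2
}
```

Note that Cell is not Sync, so it cannot be shared with an interrupt context on its own; that is exactly why atomics (or critical sections) are the tool of choice for the global statics in this example.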
In the case of AtomicUsize, the safety is easy to reason about: every access is atomic, so there are no data races — even with multiple concurrent writers.
In embedded systems, we often need global statics to synchronize data between the interrupt and main contexts. Understanding safe abstractions like AtomicUsize is essential. They let us write correct, concurrent code without resorting to unsafe, and without having to manually reason about every access pattern each time.
A general remark: It is a widely shared consensus in the Rust community that it’s more helpful to think in terms of exclusive versus non-exclusive references rather than mutable versus immutable. The concept of interior mutability effectively breaks the simple mutable/immutable distinction, making the exclusivity model a more accurate way to understand Rust’s borrowing rules.
From PAC to HAL (Hardware Abstraction Layer)
I think the most interesting part here is to explore how the HAL crates (same as the PAC crates) model the peripherals as a singleton structure.
// PAC example
// PAC models all MCU Vendor specific peripherals as a singleton struct to invoke ownership rules on hardware
// take() call will only succeed once as it takes the singleton instance
// all subsequent calls will return None
let p = Peripherals::take().expect("First line in main - Singleton should not be taken here");
// HAL example
// same as in the PAC example the peripherals are taken as a singleton struct
// embassy_stm32::init will panic internally if a second call occurs
// in addition init also performs initialization of the clock tree and other necessary setup
// the default config is a safe but maybe not optimal configuration
let p = embassy_stm32::init(Config::default());
Methods like take or the embassy-stm32 init return a struct combining all hardware resources of the MCU. A second call to these methods will panic or return a None value.
After obtaining the resources, Rust keeps track of the hardware via its ownership rules. This is very helpful, as it eliminates potential sources of errors.
In traditional embedded C programming, it’s easy to make mistakes that involve accessing a peripheral in multiple places at once. For example, two modules might both think they “own” a GPIO pin, or an interrupt handler might modify a peripheral register while the main loop also tries to use it. Such issues can lead to undefined behavior, random glitches, or even hardware lockups.
In C, it’s up to you — the developer — to ensure that only one part of your program interacts with a given hardware resource at a time. The compiler can’t help you; it doesn’t understand the concept of ownership for hardware.
In Rust, on the other hand, we have the compiler helping us:
let mut orange_led_pin = Output::new(p.PE10, Level::Low, Speed::High);
The ownership of pin PE10 is moved from the peripheral structure p to the LED. Subsequent uses of PE10 will lead to a compiler error.
This is the power of Rust’s ownership system applied to hardware: once you hand over a peripheral or a pin to a driver, you can’t accidentally use it again. The compiler enforces exclusivity at compile time.
What’s remarkable is that all of this safety happens at compile time. There’s no runtime cost — no checks or extra flags. The compiler ensures correct usage before your code even runs.
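The mechanics can be shown in isolation with toy types; PE10 and Output below are my own stand-ins that mirror the embassy-stm32 pattern in spirit only:

```rust
/// Toy stand-in for a zero-sized HAL pin type.
struct PE10;

/// Toy stand-in for an output driver: it owns the pin it was given
/// for its whole lifetime.
struct Output {
    _pin: PE10,
}

impl Output {
    fn new(pin: PE10) -> Self {
        Output { _pin: pin }
    }
}

fn main() {
    let pe10 = PE10;
    let _led = Output::new(pe10); // ownership of the pin moves here

    // let _other = Output::new(pe10);
    // ^ error[E0382]: use of moved value: `pe10`
}
```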
From HAL to BSP (Board Support Package)
While HAL crates give you safe, device-specific access to microcontroller peripherals, BSPs take the idea one step further — they build hardware abstraction at the board level, allowing you to write code that’s portable across different microcontrollers or even entirely different platforms.
In this example you can find an implementation of a BSP layer to enable MCU- and board-agnostic application development.
A HAL crate is usually specific to a family of microcontrollers — for example, stm32f3xx-hal, nrf52-hal, or, as in this example, embassy-stm32. It provides safe abstractions over registers, clocks, and peripherals such as GPIO, UART, or SPI.
But if you’re developing a complete product, you typically care not just about the MCU, but about how its peripherals are wired up on your board. Maybe LED1 is on pin PE10, a sensor is connected to I2C1, and a UART goes to a debug connector. This is the level where we talk about components.
A Board Support Package captures exactly that information.
Defining a Board Struct
In this example, a BSP is implemented as a module (for a real application this might also be a separate crate) that exposes a Board struct describing the available components.
We define our board like this:
pub struct Board {
pub orange_led: OrangeLed,
pub red_led: RedLed,
}
Our board has two LEDs to keep it simple for now. The types of the two members describe what they are. In our case, OrangeLed and RedLed refer to two LEDs with the colors found on our STM32F3Discovery board. But what are these types?
Here is the full definition of the board module:
#[cfg(feature = "bsp_stm32f3discovery")]
mod board_stm32f3discovery;
#[cfg(feature = "bsp_host")]
mod board_host;
#[cfg(feature = "bsp_stm32f3discovery")]
pub use board_stm32f3discovery::{OrangeLed, RedLed};
#[cfg(feature = "bsp_host")]
pub use board_host::{OrangeLed, RedLed};
/// A board struct that describes all available hardware components on the board
pub struct Board {
pub orange_led: OrangeLed,
pub red_led: RedLed,
}
At the BSP level, I want not only to define board-level components but also to stay generic over different target boards. In the board module, we use type aliasing to define a Board structure that can be implemented by different board configurations. Here we use Rust’s feature system to either build for the actual demo board or for a host-based simulated board. Depending on the selected feature, the types in the board structure resolve to different implementations provided by the mod board_* modules.
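A simplified host-only sketch of the aliasing idea; SimulatedPin and this Led layout are made up for illustration and are not the repository's actual definitions:

```rust
/// Simplified generic LED driver (stand-in for the example's Led type).
struct Led<T> {
    pin: T,
}

/// Host-side simulated pin (hypothetical, in the spirit of board_host).
struct SimulatedPin {
    high: bool,
}

// The aliasing trick: the public name stays stable while the concrete
// type behind it is selected per board. A board_stm32f3discovery module
// would alias the same name to a HAL-backed output pin instead.
type OrangeLed = Led<SimulatedPin>;

fn main() {
    let led: OrangeLed = Led {
        pin: SimulatedPin { high: false },
    };
    println!("led is on: {}", led.pin.high); // the simulated LED starts out off
}
```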
The application code using the Board doesn’t know about this at all. It only sees the public Board type and the component type names like OrangeLed and RedLed. Type aliasing, in combination with embedded-hal traits, allows for a board definition that can be provided for different targets and then consumed by application logic that accepts this generic Board definition.
When you create the Board, it initializes and takes ownership of all the underlying pins and peripherals needed through the HAL crate. All unused peripherals are not accessible and are dropped at the end of new. Only the actual components created here are available to the application code.
impl Board {
fn new() -> Self {
use embassy_stm32::gpio::{Level, Output, Speed};
let p = embassy_stm32::init(Default::default());
// init method takes ownership of all peripherals
// and performs initialization of clocks, etc.
let orange_pin = Output::new(p.PE10, Level::Low, Speed::High);
let red_pin = Output::new(p.PE9, Level::Low, Speed::High);
let orange_led = Led::new(orange_pin, LedState::Off);
let red_led = Led::new(red_pin, LedState::Off);
Self {
orange_led,
red_led,
}
}
}
From that point on, your application simply uses the Board instance — it doesn’t need to know which MCU or which exact pins are involved.
Generic Components via embedded-hal
The application logic depends not on the Board itself but is again generic and bound by the embedded-hal traits. The embedded-hal crate defines a set of common traits that describe how embedded hardware behaves — traits for things like digital input/output, SPI, I2C, timers, and serial interfaces.
An application task then looks like this:
fn led_task<T: OutputPin>(led: &mut Led<T>) {
// toggle LED
led.toggle();
// wait for 1 second
block_for(Duration::from_millis(1000));
}
It doesn’t matter to what type the Board members resolve, as the contract to the application is only the embedded-hal trait. As long as your implementation of the Board provides the relevant embedded-hal traits, the application will run — whether on an STM32, an nRF, an RP2040, or even a simulated environment on your PC.
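To make the mechanism concrete, here is a self-contained sketch. Note that the OutputPin trait below is a simplified stand-in for embedded_hal::digital::OutputPin (the real trait's methods return Result and there is an associated Error type), and Led and HostPin are illustrative, not the repository's code:

```rust
/// Simplified stand-in for embedded_hal::digital::OutputPin.
trait OutputPin {
    fn set_high(&mut self);
    fn set_low(&mut self);
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum LedState {
    On,
    Off,
}

/// The component type: generic over any pin implementation.
struct Led<T: OutputPin> {
    pin: T,
    state: LedState,
}

impl<T: OutputPin> Led<T> {
    fn new(pin: T, state: LedState) -> Self {
        Led { pin, state }
    }

    /// Flip the LED by driving the pin and tracking the new state.
    fn toggle(&mut self) {
        self.state = match self.state {
            LedState::On => {
                self.pin.set_low();
                LedState::Off
            }
            LedState::Off => {
                self.pin.set_high();
                LedState::On
            }
        };
    }
}

/// Host "pin" that just records its level -- a simulated target.
struct HostPin {
    high: bool,
}

impl OutputPin for HostPin {
    fn set_high(&mut self) {
        self.high = true;
    }
    fn set_low(&mut self) {
        self.high = false;
    }
}

fn main() {
    let mut led = Led::new(HostPin { high: false }, LedState::Off);
    led.toggle();
    println!("led is on: {}", led.pin.high); // prints true
}
```

Swapping HostPin for a real HAL pin changes nothing in Led or in the application logic; that is the whole point of bounding on the trait.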
True Hardware Independence
This approach creates a clean separation of concerns:
The HAL deals with MCU-specific register access.
The BSP defines how that MCU’s peripherals are wired on your particular board and allows for multi-target support beyond MCU hardware.
The application only interacts with abstract traits defined by embedded-hal.
Your application never directly touches a microcontroller or peripheral. Instead, it “talks” to a Board that implements the right traits. When you move to a different MCU — or even to a host PC for testing — you can simply implement another Board struct that fulfills the same interface. The rest of your code doesn’t need to change; it continues using the same traits it already knows.
The Final Step Toward Portability
With this layer in place, your embedded Rust project achieves true hardware independence. The BSP serves as the final bridge between device-specific HALs and fully generic, portable application logic.
All that is left to do is to adjust the application’s entry point to support both cortex-m and host.
// Main entry point for the application running on the embedded target
#[cfg(feature = "bsp_stm32f3discovery")]
#[cortex_m_rt::entry]
fn main() -> ! {
let board = Board::default();
app::run(board);
}
// Main entry point for the application running on the host target
#[cfg(feature = "bsp_host")]
fn main() {
let board = Board::default();
app::run(board);
}
mod app {
// Application logic that is generic over the board
use embassy_time::{Duration, block_for};
use embedded_hal::digital::OutputPin;
use lets_blink::board::Board;
use lets_blink::led::Led;
// Run the application with the given board
pub fn run(board: Board) -> ! {
let mut orange_led = board.orange_led;
loop {
led_task(&mut orange_led);
}
}
/// Simple blocking LED task
fn led_task<T: OutputPin>(led: &mut Led<T>) {
// toggle LED
led.toggle();
// wait for 1 second
block_for(Duration::from_millis(1000));
}
}
With this in place, the BSP example can be built for both the demo board and a host target.
With the alias cargo rh you can run the BSP example on your host machine.
It’s a powerful idea: once the interfaces are defined in terms of embedded-hal traits, your application becomes agnostic to the hardware it’s running on. You can now focus entirely on the logic of what your system does — and let the BSP handle how that logic interacts with the physical world.
Finally, from blocking to async
Many bare-metal systems are heavily I/O-driven, where the application must perform multiple concurrent tasks that depend on external events — sensor readings, communication interfaces, timers, and so on.
In this example we run two independent concurrent tasks in a single thread context with the help of Rust’s async/await support.
In the absence of an RTOS with threads, all code runs within a single context — typically the super loop. This means you cannot simply block on I/O operations, as doing so would prevent other parts of the system from running. The usual solution in C or traditional embedded code is to manage a tangle of state machines and non-blocking polling loops manually. It works, but it’s cumbersome, error-prone, and hard to maintain as complexity grows.
This is exactly where async in Rust becomes such a powerful tool.
What async Brings to Embedded Systems
Rust’s async support takes care of the state machines for you. You can now write linear, blocking-looking code inside separate asynchronous tasks — and the compiler automatically transforms it into the necessary non-blocking state machines.
This means you can express complex concurrent logic in a clean, sequential way while still running efficiently in a single-threaded embedded environment.
And the best part: Rust’s design choice for async is genius for embedded systems. The language itself only provides the syntax and compile-time machinery to generate and manage these state machines — the actual execution model is left completely open for implementation.
There is no built-in runtime, no implicit threading, and no hidden allocation — only the building blocks you need to implement concurrency safely and efficiently on your own terms.
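This transparency is easy to demonstrate on the host: a future is just a value you can poll by hand. In the sketch below, poll_once is my own helper, and the no-op waker is built the same way as in the example's minimal executor:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A waker that does nothing -- enough to poll a future manually.
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(
    |p| RawWaker::new(p, &NOOP_VTABLE), // clone
    |_| (),                             // wake
    |_| (),                             // wake_by_ref
    |_| (),                             // drop
);

/// Poll a future exactly once (illustrative helper, not a std API).
fn poll_once<T>(fut: impl Future<Output = T>) -> Poll<T> {
    let waker =
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    fut.as_mut().poll(&mut cx)
}

fn main() {
    // An async block compiles into an anonymous state-machine type
    // implementing Future; nothing runs until it is polled.
    let fut = async { 2 + 2 };

    // With no await points, the state machine completes on the first poll.
    assert_eq!(poll_once(fut), Poll::Ready(4));
}
```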
The Role of the Waker
At the heart of async lies the Future trait and the Waker concept. The Waker defines a universal interface between the executor, async tasks, and I/O drivers. This separation means that executors, tasks, and drivers can be written independently and still interoperate seamlessly.
You can run async tasks and drivers from any executor, and conversely, use any executor to drive your async code — a level of modularity that fits perfectly into Rust’s philosophy of composition and explicit control.
This flexibility allows embedded developers to scale from extremely lightweight loops up to feature-rich, multi-task executors — all using the same async syntax and semantics.
Lightweight by Design
The basic async/await logic in Rust adds very little overhead. Threading, scheduling, and memory allocation are all left to the specific executor implementation, so you can keep things as minimal or as sophisticated as your target allows.
As is customary in Rust, there is no magic in async. Everything revolves around the Future trait and a few key types. This makes async not only transparent but also predictable and deterministic — both critical properties in embedded systems.
The result is a model that allows composable, concurrent programs without the cost or complexity of an RTOS.
Async vs. RTOS
An RTOS solves concurrency by running multiple threads in parallel, each with its own stack and context, managed by a kernel. This works well but comes with overhead — in memory usage, context switches, and synchronization mechanisms.
Rust’s async model solves the same problem differently. Instead of threads with their own contexts, async uses compiler-generated state machines within a single stack and context. This eliminates much of the runtime overhead and fits perfectly on resource-constrained systems.
To be fair, this is not a true async-versus-RTOS discussion. An RTOS also solves problems other than concurrency, so for a given problem the answer might be an RTOS with async. Using an RTOS does not exclude async usage, and the other way around.
Executors in the Embedded Rust Ecosystem
A popular async executor for bare-metal is embassy. It provides an advanced, efficient, and well-maintained runtime tailored for embedded devices.
If you prefer something even more lightweight, there’s lilos from Cliff Biffle — a minimal async executor designed to run on the tiniest systems. It also has really helpful documentation, not only about the executor but also great general explanations of async in Rust.
And in my own example, I demonstrate an even simpler executor that just runs all async tasks unconditionally in a loop. It’s a minimal setup, but it’s enough to explore the power of async and to start building up your own executor with exactly the features you need.
/// Async provides the Waker mechanism so that tasks are only woken when progress is expected.
/// We use a no-operation waker, as we just poll futures without sleeping.
/// We poll all futures without checking whether any progress was made.
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(
|x| RawWaker::new(x, &NOOP_VTABLE), // clone
|_| (), // wake
|_| (), // wake_by_ref
|_| (), // drop
);
/// Absolutely minimal async executor
pub fn execute(task: &mut [Pin<&mut dyn Future<Output = Infallible>>]) -> ! {
let w =
unsafe { core::task::Waker::from_raw(RawWaker::new(core::ptr::null(), &NOOP_VTABLE)) };
let mut c = core::task::Context::from_waker(&w);
loop {
for t in task.iter_mut() {
match t.as_mut().poll(&mut c) {
core::task::Poll::Ready(_) => unreachable!(),
core::task::Poll::Pending => (),
}
}
}
}
If you only want to use async for its state machine generation, you can even skip the Waker and sleeping logic entirely — just poll all tasks continuously. It’s a great way to get started without adding complexity.
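To show what driving tasks with such an executor looks like, here is a host-side, finite cousin of it; YieldNow and execute_until_done are illustrative names of my own (the real execute loops forever with `-> !`):

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(
    |p| RawWaker::new(p, &NOOP_VTABLE), // clone
    |_| (),                             // wake
    |_| (),                             // wake_by_ref
    |_| (),                             // drop
);

/// Cooperative yield point: Pending on the first poll, Ready on the next.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Finite cousin of the example's `execute`: round-robin poll until
/// every task has completed.
fn execute_until_done(tasks: &mut [Pin<&mut dyn Future<Output = ()>>]) {
    let waker =
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut done = vec![false; tasks.len()];
    while done.iter().any(|d| !d) {
        for (i, task) in tasks.iter_mut().enumerate() {
            if !done[i] {
                if let Poll::Ready(()) = task.as_mut().poll(&mut cx) {
                    done[i] = true;
                }
            }
        }
    }
}

fn main() {
    let count = Cell::new(0u32);
    // Two "blink" tasks interleave purely via compiler-generated state
    // machines: one stack, no threads, no heap allocation for the tasks.
    let t1: Pin<&mut dyn Future<Output = ()>> = pin!(async {
        for _ in 0..2 {
            count.set(count.get() + 1);
            YieldNow(false).await;
        }
    });
    let t2: Pin<&mut dyn Future<Output = ()>> = pin!(async {
        for _ in 0..3 {
            count.set(count.get() + 10);
            YieldNow(false).await;
        }
    });
    execute_until_done(&mut [t1, t2]);
    assert_eq!(count.get(), 32); // 2 * 1 + 3 * 10
}
```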
Why This Matters
For small, I/O-bound embedded programs, this model is truly revolutionary. It gives you the expressive power of concurrent code, without the memory and timing overhead of an RTOS.
A major milestone in my Rust journey was discovering async and seeing how it transformed my code. When I ported a fairly complex device driver from C to async Rust, that was the moment I became completely convinced about embedded Rust. It fundamentally changed how I think about concurrency and structure in firmware.
Closing Thoughts
Learning embedded Rust is a journey that starts small but quickly scales in depth.
From minimal unsafe pointers to safe Peripheral Access Crates, HAL abstractions, BSPs, and finally async tasks, each layer builds upon the last.
By exploring these layers, you can see how Rust’s ownership, type system, and concurrency model help write safe, maintainable, and portable firmware.
And to demonstrate all of it you only need a blinking LED!
Happy Blinking!