Safe zero-copy operations in C#

216 points by sedatk 2 days ago

> Spans and slice-like structures are the future of safe memory operations in modern programming languages. Embrace them.

I've been using Span<T> very aggressively and it makes a massive difference in cases where you need logical views into the same physical memory. All of my code has been rewritten to operate in terms of spans instead of arrays where possible.

It can be easy to overlook ToArray() or (more likely) code that implies its use in a large codebase. Even small, occasional allocations are all it takes to move your working set out of the happy place and get the GC cranking. The difference in performance can be unreasonable in some cases.

You can even do things like:

  Span<byte> arena = stackalloc byte[1024];
  var segment0 = arena.Slice(10);
  var segment1 = arena.Slice(10, 200);
  ...

The above will incur no GC pressure/activity at all. Everything happens on the stack.
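
A slightly fuller sketch of the same idea, assuming a C# version where `stackalloc` can be assigned directly to a `Span<byte>` (the names `header` and `payload` and the sizes are made up for illustration):

```csharp
using System;

// Stack-allocated scratch buffer; no GC allocation is involved.
Span<byte> arena = stackalloc byte[1024];

// Slices are views over the same memory, not copies.
Span<byte> header = arena.Slice(0, 10);
Span<byte> payload = arena.Slice(10, 200);

header.Fill(0xFF);
payload[0] = 42;

// Writes through a slice are visible through the parent span.
Console.WriteLine(arena[0]);  // 255
Console.WriteLine(arena[10]); // 42
```

Declaring the local as `Span<byte>` rather than `var` matters here: with `var`, the result of `stackalloc` is a raw pointer and needs an `unsafe` context.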

rurban - 2 days ago

[flagged]
- adrian_b - 2 days ago
  
  That looks more like the effect of a bad compiler.
  On most modern CPUs, including Intel/AMD & ARM-based, memory accesses with relative-addressing to the stack have at least the same performance if not much better performance than memory accesses through pointers to the heap. The stack is also normally cached automatically by the memory prefetcher.
  So whatever caused your performance increase was not an inefficiency of the stack accesses, but some other difference in the code generated by the compiler. Such an unexpected and big performance difference could normally be caused only by a compiler bug.
  A special case when variables allocated on the stack can lead to low performance is when a huge number of variables or many arrays are allocated, and then they are only sparsely used and the initial stack is much smaller. Then any new allocation of an array or big bunch of variables may exceed the stack size, which will cause a page fault, so that the operating system will grow the stack by one memory page.
  This kind of transparent memory allocation by page faults can be much slower than the explicit memory allocation done by malloc or new. This is why big arrays should normally be allocated either statically or in the heap.
  - osigurdson - 2 days ago
    
    What they are saying is the stackalloc approach causes no GC pressure. You can run that in a tight / infinite loop with essentially no downside. Using a regular heap allocated array in the same situation will hammer the GC.
    C# doesn't do escape analysis and automatically put things on the stack like (for example) Go does. However the stack size is limited in C# so I wouldn't suggest going too crazy with stackalloc and deep stacks / recursion. You don't want a stack overflow!
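
    One common way to keep stackalloc bounded is to use it only below a size threshold and rent from a pool otherwise. A minimal sketch, assuming a hypothetical helper `ToHex` and an arbitrary 128-char threshold (neither comes from the thread):

```csharp
using System;
using System.Buffers;

Console.WriteLine(ToHex(new byte[] { 0xDE, 0xAD, 0xBE, 0xEF })); // deadbeef

static string ToHex(ReadOnlySpan<byte> data)
{
    const int StackLimit = 128; // hypothetical threshold, in chars
    int needed = data.Length * 2;

    // Small buffers live on the stack; larger ones are rented from the shared pool.
    char[]? rented = null;
    Span<char> buffer = needed <= StackLimit
        ? stackalloc char[StackLimit]
        : (rented = ArrayPool<char>.Shared.Rent(needed));
    try
    {
        const string digits = "0123456789abcdef";
        for (int i = 0; i < data.Length; i++)
        {
            buffer[2 * i] = digits[data[i] >> 4];
            buffer[2 * i + 1] = digits[data[i] & 0xF];
        }
        return new string(buffer.Slice(0, needed));
    }
    finally
    {
        // Only the pooled path has anything to give back.
        if (rented is not null) ArrayPool<char>.Shared.Return(rented);
    }
}
```

    The stack path uses a fixed size, so the stack cost per call is constant regardless of input length.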
    
    ygra - 2 days ago
    
    > C# doesn't do escape analysis and automatically put things on the stack
    There has been work ongoing in this direction since .NET 9 at least, but the effect is very limited currently. The following code, however, has no allocations at runtime, despite having an object creation in the code:
    https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcDMACR2DC2A3ut...
    
    Izikiel43 - 2 days ago
    
    > C# doesn't do escape analysis
    In .NET 9 they started on that, and in .NET 10 they have improved it. You can check the .NET 10 performance blog post by Stephen Toub for more info.
- uecker - 2 days ago
  
  I do not understand this comment. A stack access should never be more expensive than access to malloced data. This is easy to see: Malloc gives you a pointer. An address computation to find a variable on the stack also gives you a pointer but is much cheaper than malloc. If the compiler recomputes the address to a stack variable then this must be because it is deemed cheaper than wasting a register to cache the pointer. Nowadays compilers can transform malloc calls into stack-allocated variables in certain cases.
- whizzter - 2 days ago
  
  Not trying to mock here, but did you program assembly in the 80s/90s and/or look at compilers involved with CPU's of comparable complexity to the 68000 series or older?
  Yes, relative addressing on older machines like that does hurt a ton (I was coding a jam-game for the GameBoy recently and had to re-orient some of my code away from stack-usage since I've gotten a tad "lazy" over the years and didn't care to program 100% assembly).
  On modern machines (since circa Pentium generation CPU's) that can do single-cycle multiplications, memory-offset accesses are often irrelevant for the performance of access operations.
  More importantly, at about 500mhz-1ghz the sheer latency of main-memory accesses vs cached accesses will start to become your main worry since cache-misses start to approach _hundreds of cycles_ (this is why we have the entire data-oriented-design philosophy growing), while on the other hand the modern CPU's can pipeline many instructions to even make the impact of bounds checks more or less insignificant compared to cache misses.
- nlitened - 2 days ago
  
  > Just a few days ago I changed stack access to heap access in a larger generated perfect hash and got 100x better performance
  Link me the code so that I can show what you did wrong
  - rurban - 2 days ago
    
    https://github.com/rurban/cmph/commit/4bb99882cc21fd44e86602...
    I did nothing wrong :)
    And I remember better now. I changed stack access to immediate global access to const arrays. And what was insane was the compilation time, not so much the run time. -O2 timed out, and when it did compile it was 10x slower at run time. Now compilation is immediate.
    This compiles large perfect hashes to C code. Like gperf, just better and for huge arrays.
    
    nlitened - 2 days ago
    
    Thank you for linking.
    In this case, as I understand, you switched from a local array which is re-initialized with values upon each function invocation to a global array that is pre-initialized before even the program starts. So my understanding is you're measuring the speed between repeated re-initialization of your local array and no initialization at all — not between "relative" stack access and "absolute" heap or global memory access.
    As I understand, to do it the way you wanted initially, your local array must be `static const` instead of just const (so that there's a single copy of it, essentially making it a global variable as well). To be honest, I am not sure why the C compiler doesn't optimize it this way automatically; likely I don't know the C spec well enough to know the reasons.
- davidatbu - 2 days ago
  
  Super interesting! Was this C# or something? is there a write-up/mini-blogpost about this somewhere?

buybackoff - 2 days ago

For working with array elements without bounds checks, this is the modern alternative to pointers, without object pinning for GC and the "unsafe" keyword: MemoryMarshal.GetArrayDataReference<T>(T[]). This is still totally unsafe, but is "modern safer unsafe" that works with `ref`s and makes friends with System.Runtime.CompilerServices.Unsafe.

Funny point: the verbosity of this method and the SRCS.Unsafe ones makes them look slower vs pointers at a subconscious level for me, but they are as fast if not faster for juggling with knives in C#.

The `fixed` keyword is mostly for fast transient pinning of data. Raw pointers from `fixed` remain handy in some cases, e.g. for alignment when working with AVX, but even this can be done with `ref`s, which can reference an already pinned array from Pinned Object Heap or native memory. Most APIs accept `ref`s and GC continues tracking underlying objects.

See the subtle difference here for common misuse of fixed to get array data pointer: https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcDMACR2DC2A3ut...

Spans are great, but sometimes raw `ref`s are a better fit for a task, to get the last bits of performance.

bengarney - 2 days ago
I increasingly wonder if writing and binding performance critical things in C/C++ would be less overall effort. Performant zero-alloc C# vs C/C++ is backdoor magic vs first class language support. Boxing gloves vs. surgical gloves.
C# _can_ do this! But I face many abstractions: special perf APIs, C#, IL, asm. Outcomes will vary with language version, runtime version, platform, IL2CPP/Burst/Mono/dotnet. But C/C++ has one layer of abstraction (the compiler), and it's locked in once I compile it.
I want to do the thing as exactly and consistently as possible in the simplest way possible!
A build environment that compiles .cpp alongside .cs (no automatic bindings, just compilation) would be so nice for this.
----
Example of what I mean regarding abstractions:
```
  void addBatch(int *a, int *b, int count)
  {
    for(int i=0; i<count; i++) 
      a[i] += b[i]; 
  }
```
versus:
```
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void AddBatch(int[] a, int[] b, int count)
    {
        ref int ra = ref MemoryMarshal.GetArrayDataReference(a);
        ref int rb = ref MemoryMarshal.GetArrayDataReference(b);
        for (nint i = 0, n = (nint)count; i < n; i++)
            Unsafe.Add(ref ra, i) += Unsafe.Add(ref rb, i);
    }
```
(This is obviously a contrived example, my point is to show the kinds of idioms at play.)
- int_19h - 2 days ago
  
  But your first code snippet is also valid C# if you just throw in `unsafe` there. And, generally speaking, everything that you can do in C (not C++) can be done in C# with roughly the same verbosity.
  - bengarney - a day ago
    
    It is, but it isn't quite the same as C, either. That is to say, there is some semi-unknowable stack of stuff happening under the covers.
    I will predict the future: you will pull up the JIT assembly output to make the case that they output similarly performant assembly on your preferred platform, and that you just have to do X to make sure that the code behaves that way.
    But my problem is that we are invoking the JIT in the conversation at all. The mental model for any code like this inevitably involves a big complex set of interacting systems and assumptions. Failure to respect them results in crashes or unexpected performance roadblocks.
    
    int_19h - a day ago
    
    I don't see what makes JIT any different from AOT in this case. But C# can be AOT-compiled as well.
    Will it be as efficient? Probably not; C++ compilers have been in the optimization game for a very long time and have gotten crazy good at it. Not to mention that the language itself is defined in a way that essentially mandates a highly optimizing compiler to get decent performance out of it (and avoid unnecessary creation of temporaries and lots of calls to very tiny functions), which then puts pressure on implementations.
    But my point is that this is not a question of language, but implementation. Again, your C example is literally, token-for-token, valid C# as well. And, in general, you can take any random C program and mechanically convert it to C# with the exact same semantics and mostly the same look (with minor variations like the need to use stackalloc for local arrays). So if it's all 1:1, equivalent perf is certainly achievable, and indeed I'd expect a C# AOT compiler to do exactly the same thing as the C compiler here, especially if both are using the same backend; e.g. LLVM.
    Now in practice the implementations are what they are, and so even if you are writing C# code "C-style", it's likely to be marginally slower because the optimizer is not as good. But the question then becomes whether it's "good enough", and in many cases the answer is "yes" - by writing low-level C# you already get the 90% perf boost compared to high-level code, and rewriting that in C so that it can be compiled with a more optimizing compiler will net you maybe 10% for a lot more effort needed to then integrate the pieces.
- buybackoff - 2 days ago
  I use an extension for arrays, something like:
  
    internal static class ArrayExtensions
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static ref T RefAtUnsafe<T>(this T[] array, nint index)
        {
    #if DEBUG
            return ref array[index];
    #else
            Debug.Assert((uint)index < array.Length, "RefAtUnsafe: (uint)index < array.Length");
            return ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), (nuint)index);
    #endif
        }
    }
  
  then your example turns into:
  
    public static void AddBatch(int[] a, int[] b, int count)
    {
        // Storing a reference is often more expensive than re-taking it in a loop; requires benchmarking
        for (nint i = 0; i < (uint)count; i++)
            a.RefAtUnsafe(i) += b.RefAtUnsafe(i);
    }
  
  The JITted assembly: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8AB...
  I'm convinced C# is so much better for high perf code, because yes it can do everything (including easy-to-use x-arch SIMD), but it lets one not bother about things that do not matter and use safe code. It's so pragmatic.
  See also the top comments from a recent thread, I totally agree. https://news.ycombinator.com/item?id=45253012
  BTW, do not use [MethodImpl(MethodImplOptions.AggressiveOptimization)], it disables TieredPGO, which is a huge thing for latest .NET versions.
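
  For the cross-architecture SIMD mentioned above, a minimal sketch using `System.Numerics.Vector<T>` might look like the following. This is only one possible shape and uses safe span slicing rather than the `ref` tricks from this thread; whether it beats the scalar loop is something to benchmark, not assume.

```csharp
using System;
using System.Numerics;

int[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int[] b = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 };
AddBatch(a, b);
Console.WriteLine(string.Join(",", a)); // 11,22,33,44,55,66,77,88,99,110

static void AddBatch(Span<int> a, ReadOnlySpan<int> b)
{
    if (b.Length < a.Length) throw new ArgumentException("b is shorter than a");

    int width = Vector<int>.Count; // lanes per vector, depends on the hardware
    int i = 0;

    // Vectorized body: process `width` elements per iteration.
    for (; i <= a.Length - width; i += width)
    {
        Span<int> dst = a.Slice(i, width);
        var sum = new Vector<int>(dst) + new Vector<int>(b.Slice(i, width));
        sum.CopyTo(dst);
    }

    // Scalar tail for the leftover elements.
    for (; i < a.Length; i++)
        a[i] += b[i];
}
```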
  - bengarney - a day ago
    
    The world falls into two categories for me. "Must be fast" and "I don't care (much)". C/C++ is ideal for the first one, and C# is awesome for the second.
    My argument isn't that C# is bad or performance is unachievable. It's that the mental overhead to write something that has consistent, high performance in C/C++ is very low. In other words, for the amount of mental effort, knowledge, and iteration it takes to write something fast + maintainable in C#, would I be better served by just writing it in C/C++?
    The linked assembly is almost certainly non-optimal; compare to -O3 of the C version: https://godbolt.org/z/f5qKhrq1G - I automatically get SIMD usage and many other optimizations.
    You can certainly make the argument that if X, Y, Z is done, your thing would be fast/faster. But that's exactly my argument. I don't want to do X, Y, Z to get good results if I don't have to (`return ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), (nuint)index);` and using/not using `[MethodImpl(MethodImplOptions.AggressiveOptimization)]` are non-trivial mental overhead!).
    I want to write `foo.bar` and get good, alloc free, optimized results... and more importantly, results that behave the same everywhere I deploy them, not dependent on language version, JIT specifics, etc.
    If I was operating in a domain where I could not ever take the C/C++ path, these features of C# are of course very welcome. And in general more power/expressiveness is very good. But circling back, I wonder if my energy is better spent doing a C version than contorting C# to do what I want.
    
    buybackoff - a day ago
    
    It just looks like you are much more fluent in C/C++ than in C#.
EgorBo - a day ago

"modern safer unsafe". It's actually quite the opposite most of the time; void* is safer as it often doesn't involve the GC. I recommend reading https://learn.microsoft.com/en-us/dotnet/standard/unsafe-cod...
- buybackoff - a day ago
  
  I mostly have GC holes in mind when I say "safer". Or heap fragmentation, even if it's the POH.

progmetaldev - 2 days ago

I truly appreciate articles like this. I am using the Umbraco CMS, and have written code so that the entire system keeps running on less than the recommended requirements. While I don't see a use for Span<T> yet, I could definitely see it being useful for a website with an enormous amount of content.

I am currently looking into making use of "public readonly record struct" for the models that I create for my views. Of course, I need to performance profile the code versus using standard classes with readonly properties where appropriate, but since most of my code is short-lived for pulling from the CMS to hydrate classes for the views, I'm not sure how much of a benefit I will get. Luckily I'm in a position to work on squeezing as much performance as possible between major projects.

I'm curious if anyone has found any serious performance benefit from using a Span<T> or a "public readonly record struct" in a .NET CMS, where the pages are usually fire and forget? I have spent years (since 2013) trying to squeeze every ounce of performance from the code, as I work with quite a few smaller businesses, and even the rest of my team are starting to look into Wix or Squarespace, since it doesn't require a "me" to be involved to get a site up and running.

To my credit and/or surprise, I haven't dealt with a breach to my knowledge, and I read logs and am constantly reviewing code as it is my passion (at least working within the confines of the Umbraco CMS, although it isn't my only place of knowledge). I used to work with PHP and CodeIgniter pre-2013 (then Kohana a bit while making the jump from PHP to .NET). I enjoy C#, and feel like I am able to gain quite a bit of performance from it, but if anyone has any ideas for me on how to create even more value from this, I would be extremely interested.

jiggawatts - 2 days ago

For a CMS or any similar situation, you can get huge performance improvements from higher level changes than Span<T>. Using the HTTP cache-control headers correctly in conjunction with a CDN can provide an order of magnitude improvement. Simply sending less HTML/CSS/JS by using a more efficient layout template can similarly have a multiplier effect on the entire site.
In my experience, the biggest wins by far were achieved by using the network tab of the browser F12 tools. The next biggest was Azure Application Insights profiler running in production. Look at the top ten most expensive database queries and tune them to death.
The use of Span<T> and the like is much more important for the authors of shared libraries than for "end users" writing a web app. Speaking of which, you can increase your usage of it by simply updating your NuGet package versions, your .NET version to 9 or 10, etc... This will provide thousands of such micro optimisations for very little effort!
fabian2k - 2 days ago

For a CMS I'd usually suspect the major bottlenecks to be in the DB queries. Especially when the language is already pretty fast by default like C#.
You really need to measure before going to low level optimizations like this. Odds are in this case that the overhead is in the framework/CMS, and you gain the most by understanding how it works and how to use it better.
Span<T> is really more of an optimization you should pay attention to when you write lower level library code.
- LorenPechtel - a day ago
  
  There's space for evil.
  MySQL, C#. I have a rather nasty query, two of the fields in it are actually arrays and have their own tables, in most cases all the children must be read. Strange, my code takes a lot longer to execute the child-reading portion than the console does. Profiler time....the hot spot is the routine (in the library, not my code) that returns the value of the named field! Rewrote the big reads to translate the column names to indexes, then use those to read the fields. I've forgotten just how big the speedup was but that lookup was using the majority of the time of the whole routine.
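
  The name-to-index trick can be sketched with the standard ADO.NET `GetOrdinal` call. To keep it runnable without a database, this uses an in-memory `DataTable` as a stand-in for the query result; the column names are invented for the example.

```csharp
using System;
using System.Data;

// In-memory stand-in for a query result, so the sketch runs without a database.
var table = new DataTable();
table.Columns.Add("id", typeof(int));
table.Columns.Add("name", typeof(string));
table.Rows.Add(1, "alpha");
table.Rows.Add(2, "beta");

using var reader = table.CreateDataReader();

// Resolve column names to ordinals once, outside the row loop.
int idCol = reader.GetOrdinal("id");
int nameCol = reader.GetOrdinal("name");

int idSum = 0;
string names = "";
while (reader.Read())
{
    // Index-based getters avoid repeating the name lookup for every row.
    idSum += reader.GetInt32(idCol);
    names += reader.GetString(nameCol);
}
Console.WriteLine($"{idSum} {names}"); // 3 alphabeta
```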
MarkSweep - 2 days ago

> I'm curious if anyone has found any serious performance benefit from using a Span<T> or a "public readonly record struct" in a .NET CMS
This response is not directly answering that "in a .NET CMS" part of your question. I'm just trying to say how to think about when to worry about optimizations.
These sorts of micro optimizations are best considered when you are trying to solve a particular performance problem, particularly when you are dealing with a site that is not getting a lot of hits. I've experienced using small business ecommerce websites where each page load takes 5 seconds and given up trying to buy something. In that case profiling the site and figuring out the problem is very worthwhile.
When you have a site getting a lot of hits, these sorts of performance optimizations can help you save cost. If your service takes 100 servers to run and you can find some performance tweaks to get down to 75 servers, that may be worth the engineering effort.
My recommendation is to use a profiler of some type, either on your application in aggregate to identify hot spots or in search of the source of a particular performance problem. Once you identify a hot spot, construct a micro benchmark of the problem in BenchmarkDotNet and try to use tools like Span<T> to fix the problem.
whizzter - 2 days ago

Like others have mentioned, in a CMS like project your bottlenecks are more likely in terms of database and/or caching.
Span<T>, stackalloc and value-structs will matter more when writing heavy data/number crunching scenarios like image processing, games, "AI"/vector queries or things like _implementing_ database engines (see yesterday's discussion on the guys announcing they're using C++, where Rust, Go, Erlang, Java and C# were discussed for comparison https://news.ycombinator.com/item?id=45389744 ).
I often spend my days writing applications that are reminiscent of CMS workloads, and while I sometimes do structs, I've not really brought out my low-level optimization skills more than a few times in the past 6 years. 95% of the time it's bad usage of DBs.
pjmlp - 2 days ago

Usually CMS performance problems are related to the database, or how rendering components are being used, or wrongly cached.
In the two .NET CMSs I have experience with, Sitecore and Optimizely, something like Span would hardly bring any improvement. Rather, check the way their ORM is being used, do some direct SQL, cache some renderings in a different way, and cross-check that the CMS APIs are being used correctly.
WorldMaker - 2 days ago

> I'm curious if anyone has found any serious performance benefit from using a Span<T> or a "public readonly record struct" in a .NET CMS, where the pages are usually fire and forget?
Most of the benefits of Span<T> you gain by keeping up with .NET upgrades. Span<T> is a low level optimization that benefits things like ASP.NET internals far more than most user code. Each version of .NET since Span<T> was added has improved the use of it. Additionally in C#, the compiler prefers Span<T> overloads when they make sense so just rebuilding for the most recent .NET opts you in to the benefits. Whether or not those are "serious" benefits is a matter of taste and also a reminder that your code probably doesn't spend all of its time doing low level things. (Your database query time, for instance is generally going to have a bigger impact.)
I'd add a big word of caution for "public readonly record struct". I've seen a few codebases start using that wordy version "by default" and then build themselves into a far bigger terrible performance pit than they expected. There's a lot of good reasons that "record" defaults to class and has you opt in to struct behavior. It's a lot easier to reason about classes. It's a lot easier to understand the performance trade-offs on the side of classes. The GC is your friend, not your enemy, even and sometimes especially for short-lived data. (Gen0 collections are often very fast. The "nursery" was designed for fire-and-forget data churn. It's the bread-and-butter job of a generational garbage collector to separate the churn from the stable, handle the churn quickly and keep the stable stabler.)
Structs are pass-by-value, which means as soon as they exit the "lucky path" of staying in the same stack they are copied from place to place. If your models include a lot of other structs, you start copying memory a lot more regularly. If your structs grow too large for certain stackframe quotas they get boxed onto the GC heap anyway not saving you heap allocations.
Classes are pass-by-reference. If you are using "readonly" as a part of your structs to build immutable data models, all the copies add up from every immutable data change creating a new struct. Whereas "regular" immutable records (classes) can share structure between each other by reference (the immutable parts that don't change can use the same references and thus share the same memory).
If your models are more than a couple integers and have any sort of nesting or complex relationships, "public readonly record struct" can be a premature optimization that actually ends up costing you performance. Not every bit of data can be moved to the stack and not every bit of data should be moved to the stack. Keep in mind there are trade-offs and a healthy performing .NET application generally uses a smart mixture of stack and GC, because they are both important tools in the toolbelt. Like I said, there are reasons that "public record" defaults to "class" and "public readonly record struct" is the wordy opt-in and it is useful to keep them in mind.
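
The structure-sharing point can be seen in a small sketch. The types `PageModel`, `Seo` and `SeoStruct` are hypothetical view models invented for the example, not anything from Umbraco:

```csharp
using System;

var page = new PageModel("Home", new[] { "a", "b" }, new Seo("Home", "Welcome"));

// `with` on a record class makes a shallow copy: only Title differs.
var renamed = page with { Title = "Start" };

// The unchanged parts are shared by reference, not duplicated.
Console.WriteLine(ReferenceEquals(page.Meta, renamed.Meta)); // True
Console.WriteLine(ReferenceEquals(page.Tags, renamed.Tags)); // True

// A record struct is copied field by field on every assignment or by-value pass.
var s1 = new SeoStruct("Home", "Welcome");
var s2 = s1;
Console.WriteLine(s1 == s2); // True: equal by value, but two separate copies

public record Seo(string Title, string Description);
public record PageModel(string Title, string[] Tags, Seo Meta);
public readonly record struct SeoStruct(string Title, string Description);
```

With two string fields the struct copy is cheap; the cost grows as structs nest other structs, which is the trade-off described above.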

Freedom2 - 2 days ago

I recall a specific project involving a network appliance that generated large log streams. Our bottleneck was the log parser, which was aggressively using string.Substring() to isolate fields. This approach continuously allocated new string objects on the heap, which led to excessive pressure on the GC.

The transition to using ReadOnlySpan<char> immediately addressed the allocation issue. We were able to represent slices of the incoming buffer without any heap allocations and the parser logic was simplified significantly.
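
A minimal sketch of that kind of change, assuming a made-up pipe-delimited log format and a hypothetical helper `NextField` (the real parser is not shown in this thread):

```csharp
using System;

string line = "2024-01-01T00:00:00Z|WARN|disk usage 91%";

// Each field is a view into `line`; no substring is allocated.
ReadOnlySpan<char> rest = line.AsSpan();
ReadOnlySpan<char> timestamp = NextField(ref rest);
ReadOnlySpan<char> level = NextField(ref rest);
ReadOnlySpan<char> message = rest;

Console.WriteLine(level.SequenceEqual("WARN".AsSpan())); // True
Console.WriteLine(message.ToString()); // allocates only here, for printing

// Returns the text before the next '|' and advances `rest` past it.
static ReadOnlySpan<char> NextField(ref ReadOnlySpan<char> rest)
{
    int sep = rest.IndexOf('|');
    if (sep < 0)
    {
        ReadOnlySpan<char> last = rest;
        rest = default;
        return last;
    }
    ReadOnlySpan<char> field = rest.Slice(0, sep);
    rest = rest.Slice(sep + 1);
    return field;
}
```

The `string.Substring()` version would allocate one new string per field per line; here the only allocation is the final `ToString()`.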

rkagerer - 2 days ago

Array element accesses are bounds-checked in C# for safety. But that means there's a performance impact...

Why don't we have hardware support for this yet? (i.e. CPU instructions that are bounds-aware?)

Edit: Do we?

https://stackoverflow.com/questions/40752436/do-any-cpus-hav...

mwsherman - 2 days ago

I move between Go and C#. I wrote a zero-allocation package in Go [1] and then ported to C# — and the allocations exploded!

I had forgotten, or perhaps never realized, that substrings in C# allocate. The solution was Spans.

Notably, it caused me to realize that Go had “spans” designed in from the start.

[1] https://github.com/clipperhouse/uax29

yeasku - 2 days ago

Strings in C# are immutable.
To work with strings you should use StringBuilder.
- kbolino - 2 days ago
  
  Eric Lippert describes the difference between immutability and what he calls "persistence" and explains why C#/.NET copies the string contents to make a substring: https://stackoverflow.com/a/6750591/814422
  Go's strings are also immutable and yet substrings share the same internal memory. Java/JVM also has immutable strings and yet substrings shared the char[] array of the parent string up until Java 7, when they switched to copying instead (for the same reason as .NET): https://mail.openjdk.org/pipermail/core-libs-dev/2012-June/0...
  - yeasku - 7 hours ago
    
    That SO link is really good, thank you for the comment.
- orphea - 2 days ago
  > Strings in C# are immutable.
  Yes, but
  > To work with strings you should use StringBuilder.
  It helps combine strings together. The author needed the opposite - split/slice strings.
neonsunset - 2 days ago

No, slices in Go are more akin to ArraySegment but with resizing/copy-on-append. It does not have the same `byref` mechanism .NET supports, which can reference arbitrary memory (GC-owned or otherwise) in a unified way as a single (special) pointer type.
- kbolino - 2 days ago
  This is wrong.
  Slices in Go are not restricted to GC memory. They can also point to stack memory (simply slice a stack-allocated array; though this often fails escape analysis and spills onto the heap anyway), global memory, and non-Go memory.
  The three things in a slice are the (arbitrary) pointer, the length, and the capacity: https://go.dev/src/runtime/slice.go
  Go's GC recognizes internal pointers, so unlike ArraySegment<T>, there's no requirement to point at the beginning of an allocation, nor any need to store an offset (the pointer is simply advanced instead). Go's GC also recognizes off-heap (foreign) pointers, so the ordinary slice type handles them just fine.
  The practical differences between a Go slice []T and a .NET Span<T> are only that:
  1. []T has an extra field (capacity), which is only really used by append()
  2. []T itself can spill onto the managed heap without issue (*)
  Go 1.17 even made it easy to construct slices around off-heap memory with unsafe.Slice: https://pkg.go.dev/unsafe#Slice
  (*): Span<T> is a "ref struct" which restricts it to the stack (see https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...); whereas, []T can be safely stored anywhere *T can
  - kbolino - a day ago
    
    (can't respond directly and don't have the rep to vouch)
    > Span bounds are guaranteed to be correct at all times and compiler explicitly trusts this (unless constructed with unsafe), because span is larger than a single pointer, its assignment is not atomic, therefore observing a torn span will lead to buffer overrun, heap corruption, etc. when such access is not synchronized, which would make .NET not memory safe
    Indeed, the lack of this restriction is actually a (minor) problem in Go. It is possible to have a torn slice, string, or interface (the three fat pointers) by mutably sharing such a variable across goroutines. This is the only (known) source of memory unsafety in otherwise safe Go, but it is a notable hole: https://research.swtch.com/gorace
  - neonsunset - a day ago
    
    [dead]
- int_19h - 2 days ago
  
  Go pointers can point at the stack or inside objects just fine, they are exactly as expressive as C# unsafe pointers (i.e. more expressive than `ref`).
  What Go can't do is create a single-element slice out of a variable or pointer to it. But that just means code duplication if you need to cover both cases, not that it's not expressible at all.
  - kbolino - a day ago
    
    > What Go can't do is create a single-element slice out of a variable or pointer to it.
    
      var x int
      s := unsafe.Slice(&x, 1)
      fmt.Println(&x == &s[0]) // Output: true
    
    int_19h - a day ago
    
    Good catch! That takes care of the unsafe pointer case, but not the safe ref case.
    There's no reason for this to be unsafe - you're asking for a 1-element slice, and the compiler knows that the variable is always going to be there as long as the reference exists.
    In C#, `Span<T>` has a (safe) constructor from `ref T`.
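
    A minimal sketch of that constructor, assuming .NET 7 or later where `Span<T>(ref T)` is public:

```csharp
using System;

int x = 1;

// Wrap a single local variable as a one-element span; no unsafe code needed.
var single = new Span<int>(ref x);
single[0] = 42;

Console.WriteLine(x);             // 42
Console.WriteLine(single.Length); // 1
```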

hahn-kev - 2 days ago

I'm mainly a C# dev and love spans.

I understand it's not in the same realm as Rust, but how comparable is this to some of the power that Rust gives you?

whizzter - 2 days ago

I think the main differentiator is this:
C# is GC'd: the system protects memory that is in use, and while it allows things like Span<>, Memory<>, etc., there are some constraints due to its sloppy lifetime reasoning. In general, though, usage is "easy" since you do not need to care about lifetimes.
Rust has lifetime semantics built into the core of the language. The compiler knows much better what's safe, but it will also forbid you early from doing things that are safe but not provable (they're improving the checkers based on experience, though). Due to its knowledge, it is better at handling allocations exactly.
Personally, as someone with an assembly, C, C++, etc. background, I see the allure of Rust, I do see it as a plus if less experienced devs who really need perf go for Rust for critical components, and I'm thinking of trying some project in Rust...
So far, though, I've not seen a project where the slight performance improvement of Rust would outweigh the "productivity" gain from being able to make the somewhat "sloppy" connections that C# allows.
- int_19h - 2 days ago
  
  C# ref types also have a form of borrow checker that ensures that they never outlive whatever they point to. It's a more simplified version than Rust because lifetimes are always inferred.
  - whizzter - 19 hours ago
    
    Yeah, ran into one of those rules the other week. Was kinda annoying since it was a very theoretical issue compared to the real case I had that was completely safe.
    
    neonsunset - 18 hours ago
    
    [dead]
osigurdson - 2 days ago

For many projects, the GC is a complete non-issue and you never think about it. For some it is a huge issue and you are always thinking about this ghost in the background and how to craft things so that the ghost doesn't haunt you. If you always have to think about it like this you are probably better off using a non-GCed language.
afdbcreid - 2 days ago

It's a taste. C# allows the more common patterns, but Rust allows much more.
For example, Rust allows the equivalent of storing `Span<T>` (called slice in Rust) everywhere (including on the heap, although this is rare).
- sedatk - 2 days ago
  
  C# has a separate heap storable span equivalent called Memory<T>.
  - afdbcreid - 2 days ago
    
    AFAIK it's less efficient though.
    
    sedatk - 2 days ago
    
    Yes, but it can still beat copying.
- Rohansi - 2 days ago
  
  The restriction in C# comes from its ability to reference stack allocated memory. I'm not familiar with Rust but it probably figures it out based on the lifetime of T.
  - Measter - 2 days ago
    
    It's not the lifetime of T, but the lifetime of whatever is storing T.
louthy - 2 days ago

This should help:
“A comparison of Rust’s borrow checker to the one in C#”
https://em-tg.github.io/csborrow/

titzer - 2 days ago

This is why Virgil has support for ranges in the language, which are better than slices. They are value types that represent a subset of a larger array. They can also be off-heap, which allows a Range<byte> to safely refer to memory-mapped buffers.

https://github.com/titzer/virgil/blob/master/doc/tutorial/Ra...

algorithmsRcool - 2 days ago

Isn't Span<T> the exact same concept?
- titzer - 2 days ago
  
  Conceptually they are almost identical, though Virgil has explicit source syntax for making a subrange. One important difference, according to [1] Span<T> can only live on the stack. There are no lifetime restrictions for Range<T> in Virgil.
  [1] https://learn.microsoft.com/en-us/archive/msdn-magazine/2018...
  - achandlerwhite - 2 days ago
    
    Memory<T> is the heap compatible version of Span<T> in C# for anyone curious.

klabetron - 2 days ago

To the OP: awesome article. Quick question: what’s the definition of your `swap(array, i, pivotIndex)` function? Am I missing something, or does it just assume the standard set temp to a, set a to b, and set b to temp?

sedatk - 2 days ago

Thanks! I went through several iterations of the code samples while writing the article, and swap() was just a remnant. You guessed correctly: it was supposed to be replaced with tuple swaps, "(a,b)=(b,a)". Now done. :)
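
For readers who haven't seen the tuple form, a rough sketch of how it might look inside a span-based partition. This is not the article's code: the Lomuto scheme, the `Partition` name and the sample data are assumptions made for illustration.

```csharp
using System;

// Lomuto-style partition over a span, using the last element as the pivot.
// The swap is a tuple assignment, so no swap() helper is needed.
static int Partition(Span<int> span)
{
    int pivotIndex = span.Length - 1;
    int pivot = span[pivotIndex];
    int store = 0;
    for (int i = 0; i < pivotIndex; i++)
    {
        if (span[i] < pivot)
        {
            (span[i], span[store]) = (span[store], span[i]);
            store++;
        }
    }
    // Move the pivot into its final position.
    (span[store], span[pivotIndex]) = (span[pivotIndex], span[store]);
    return store;
}

int[] data = { 5, 2, 9, 1, 7 };
int p = Partition(data);   // the array converts implicitly to Span<int>

Console.WriteLine(p);                       // 3
Console.WriteLine(string.Join(",", data));  // 5,2,1,7,9
```

The span indexer returns a reference, which is why it can appear on the left side of the tuple assignment.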

pjmlp - 2 days ago

This is a good example of learning how to use the tools a programming language offers. Just saying a programming language has a GC and is thus bad is meaningless without understanding what is actually available.

Regarding,

> Spans and slice-like structures in are the future of safe memory operations in modern programming languages.

It is sad how long stuff takes to reach mainstream technology. In Oberon, the equivalent declaration of partition would be,

    PROCEDURE partition(span: ARRAY OF INTEGER): INTEGER

And if the type is the special case of ARRAY OF BYTE (need to import SYSTEM for that), then any type representation can be mapped into a span of bytes.

You will find similar capabilities in Cedar, Modula-2+, Modula-3, among several others.

Modern memory-safe languages are finally catching up with 1990s research; pity it always takes this long for cool ideas to be adopted.

Having said this, I feel modern .NET has all the features that made me like Modula-3 back in the day, even if some are a bit convoluted like inline arrays in structs.
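
On the ARRAY OF BYTE point, the closest .NET counterpart I'm aware of is `MemoryMarshal.AsBytes`, sketched below. One caveat: unlike the Oberon description, it only accepts element types that contain no object references. The `numbers` array is just sample data.

```csharp
using System;
using System.Runtime.InteropServices;

int[] numbers = { 1, 2, 3 };

// Reinterpret the same memory as a span of bytes; nothing is copied.
Span<byte> bytes = MemoryMarshal.AsBytes(numbers.AsSpan());
Console.WriteLine(bytes.Length); // 12, i.e. 3 * sizeof(int)

// Writes through the byte view are visible in the original array.
bytes.Clear();
Console.WriteLine(numbers[1]);   // 0
```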

whizzter - 2 days ago

Quite the contrary: once you introduce GC usage into languages without direct support, you're basically always running a substandard GC, because so many advances (enabled by read/write barriers) aren't really available without language support.
Then people go around shouting that GCs are bad because they used them with a language that made them use some of the worst ones out there.

uecker - 2 days ago

Also works in C: https://uecker.codeberg.page/2025-07-02.html

aw1621107 - 2 days ago

I don't think that's quite a 1-to-1 match for what's described in the article. Both C#'s Span<T> and your span type are type- and bounds-safe, but the former has additional restrictions placed on its usage thanks to the `ref` keyword that guarantee that it will be free of lifetime errors as well without needing to involve the runtime.
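
To make those restrictions concrete, a small sketch. The commented-out lines are ones I would expect the compiler to reject, and the `Memory<T>` part shows the usual heap-friendly alternative; the variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

int[] backing = { 1, 2, 3, 4, 5 };

// Span<T> is a ref struct, so the compiler keeps it on the stack.
Span<int> view = backing.AsSpan(1, 3);
// var bad = new List<Span<int>>();  // rejected: a ref struct cannot be used as this type argument
// object boxed = view;              // rejected: a ref struct cannot be boxed

// Memory<T> is an ordinary struct, so it can sit inside a heap-allocated list.
var stored = new List<Memory<int>> { backing.AsMemory(1, 3) };

// A Span is obtained from the Memory only at the point of use.
stored[0].Span[0] = 20;
view[2] = 40;

Console.WriteLine(string.Join(",", backing)); // 1,20,3,40,5
```

Both views alias the same array, so nothing is copied in either case; the difference is only in where the view itself is allowed to live.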
- uecker - 2 days ago
  
  True, that part is different and not safe in C. Lifetime is something that will need new language extensions / annotations.

smilekzs - 2 days ago

Anecdote: 9 years ago I was at MSFT. Their hands forced by long GC pauses, many teams eventually turned to hand-rolling their own flavor of string_view in C#. It was literally xkcd.com/927 back then when you tried to interface with some other team's packages and each side had its own same-but-different string_view class. Glad to see this finally enjoying language and stdlib support.

ZeroConcerns - 2 days ago

(ReadOnly)Span<T> has been available for 8 years now, and even before that, in the legacy Framework, there were common readonly-ref string slicers.
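
For reference, roughly what the built-in string_view equivalent looks like today. This is a sketch assuming .NET Core 2.1 or later for the span-based `int.Parse` overload; the input string is made up.

```csharp
using System;

string line = "width=1024;height=768";

// AsSpan gives a view over the string's characters; slicing does not allocate.
ReadOnlySpan<char> span = line.AsSpan();
int sep = span.IndexOf(';');
ReadOnlySpan<char> first = span.Slice(0, sep);                   // "width=1024"
ReadOnlySpan<char> number = first.Slice(first.IndexOf('=') + 1); // "1024"

// Parses straight from the view, with no intermediate substring.
int width = int.Parse(number);
Console.WriteLine(width); // 1024
```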
- pjmlp - 2 days ago
  
  ArraySegment for example.
- Izikiel43 - a day ago
  
  You are assuming most devs know about this.
  I do check the standard library for things that sound like they should be there, as they're common enough. In my experience this approach is not as common as you would expect, and the same goes for C# at MSFT: I don’t know how many people using Framework knew about ArraySegment.