"But almost all programs have paths that crash, and perhaps the density of crashes will be tolerable."
This is a very odd statement. Mature C programs written by professional coders (Redis is a good example) basically never crash in the experience of users. Crashing, in such programs, is a rare occurrence mostly obtained by attackers on purpose, looking for code paths that generate a memory error which - if the program is used as it should be - are never reached.
This does not mean that C code never segfaults: it happens, especially when developed without care and the right amount of testing. But the code that is the most security sensitive, like C Unix servers, is high quality and crashes are mostly a security problem and a lot less a stability problem.
Notice that it says "almost all programs" and not "almost all _C_ programs".
I think if you understand the meaning of "crash" to include any kind of unhandled state that causes the program to terminate execution then it includes things like unwrapping a None value in Rust or any kind of uncaught exception in Python.
That interpretation makes sense to me in terms of the point he's making: Fil-C replaces memory unsafety with program termination, which is strictly worse than e.g. (safe) Rust which replaces memory unsafety with a compile error. But it's also true that most programs (irrespective of language, and including Rust) have some codepaths in which programs can terminate where the assumed invariants aren't upheld, so in practice that's often acceptable behaviour, as long as the defect rate is low enough.
Of course there is also a class of programs for which that behaviour is not acceptable, and in those cases Fil-C (along with most other languages, including Rust absent significant additional tooling) isn't appropriate.
> Rust which replaces memory unsafety with a compile error
Rust uses panics for out-of-bounds access protection.
The benefit of dynamic safety checking is that it's more precise. There's a large class of valid programs that are not unsafe that will run fine in Fil-C but won't compile in Rust.
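To make that concrete, here is a hypothetical sketch (mine, not from the thread): well-defined C that passes Fil-C's runtime checks, while a naive safe-Rust port holding two simultaneous `&mut` references into the same buffer would be rejected by the borrow checker.

```c
#include <stdio.h>

/* Hypothetical sketch: two live pointers into the same array, both used for
 * writes. This is well-defined, memory-safe C, so it runs fine under Fil-C's
 * dynamic checks; a literal safe-Rust translation holding two simultaneous
 * &mut references into one buffer would not compile, even though the
 * accesses never overlap. */
static void bump_both(int *a, int *b) {
    *a += 1;
    *b += 1;
}

int main(void) {
    int buf[2] = {10, 20};
    bump_both(&buf[0], &buf[1]);   /* two pointers into the same object */
    printf("%d %d\n", buf[0], buf[1]);
    return 0;
}
```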
A lot of my programs crash, and that’s a deliberate choice. If you call one of them like “./myprog.py foo.txt”, and foo.txt doesn’t exist, it’ll raise a FileNotFoundError and fail with a traceback. Thing is, that’s desirable here. I could wrap that in a try/except block, but I’d either be adding extraneous info (“print(‘the file does not exist’); raise”) or throwing away valuable info by swallowing the traceback so the user doesn’t see the context of what failed.
My programs can’t do anything about that situation, so let it crash.
Same logic for:
* The server in the config file doesn’t exist.
* The given output file has bad permissions.
* The hard drive is full.
Etc. And again, that’s completely deliberate. There’s nothing I can do in code to fix those issues, so it’s better to fail with enough info that the user can diagnose and fix the problem.
That was in Python. I do the same in Rust, again, deliberately. While of course we all handle the weird cases we’re prepared to handle, I definitely write most database calls like “foo = db.exec(query)?” because if PostgreSQL can’t execute the query, the safest option is to panic instead of trying foolhardily to get back the last known safe state.
And of course that’s different for different use cases. If you’re writing a GUI app, it makes much more sense to pop up a dialog and make the user go fix the issue before retrying.
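For what it's worth, the same fail-fast idea looks roughly like this in plain C (a minimal sketch under my own assumptions, not anyone's actual script): check, report exactly what failed, and exit rather than limping on.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal fail-fast sketch: if the input can't be opened there is nothing
 * useful the program can do, so report exactly what failed and exit. */
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 2;
    }
    FILE *f = fopen(argv[1], "r");
    if (!f) {
        /* surface the path and the OS error instead of swallowing them */
        fprintf(stderr, "%s: cannot open %s: %s\n",
                argv[0], argv[1], strerror(errno));
        return 1;
    }
    /* ... real work would go here ... */
    fclose(f);
    return 0;
}
```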
I don't think it's an odd statement. It's not about segfaults, but use-after-free (and similar) bugs, which don't crash in C, but do crash in Fil-C. With Fil-C, if there is such a bug, it will crash, but if the density of such bugs is low enough, it is tolerable: it will just crash the program, but will not cause an expensive and urgent CVE ticket. The bug itself may still need to be fixed.
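A minimal sketch of the kind of bug being described (hypothetical code, not taken from Fil-C's documentation): with an ordinary C toolchain the read below usually "works" and silently returns stale or reused memory, which is the exploitable case; under Fil-C the access to freed memory is detected and the program panics instead.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *name = malloc(16);
    if (!name) return 1;
    strcpy(name, "alice");
    free(name);

    /* Use-after-free: undefined behaviour in standard C. With a typical
     * allocator this read rarely segfaults - it just returns whatever now
     * occupies that memory. Under Fil-C the same access is caught at runtime
     * and the process panics, turning a silent corruption into a crash. */
    printf("%s\n", name);
    return 0;
}
```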
The paragraph refers to detecting such bugs during compilation versus crashing at runtime. The "almost all programs have paths that crash" means all programs have a few bugs that can cause crashes, and that's true. Professional coders do not attempt to write 100% bug-free code, as that wouldn't be an efficient use of their time. Now the question is, should professional coders convert the (existing) C code to eg. Rust (where likely the compiler detects the bug), or should they use Fil-C, and so save the time needed to convert the code?
> it will just crash the program, but will not cause an expensive and urgent CVE ticket.
Unfortunately, security hysteria also treats any crash as "an expensive and urgent CVE ticket". See, for instance, ReDoS, where auditors will force you to update a dependency even if there's no way for a user to provide the vulnerable input (for instance, it's fixed in the configuration file).
Doesn't Fil-C use a garbage collector to address use-after-free? For a real use-after-free to be possible there must be some valid pointer to the freed allocation, in which case the GC just keeps it around and there's no overt crash.
Yes, Fil-C uses some kind of garbage collector. But it can still detect use-after-free: In the 'free' call, the object is marked as free. In the garbage collection (in the mark phase), if a reference is detected to an object that was freed, then the program panics. Sure, it is also possible to simply ignore the 'free' call - in which case you "just" have a memory leak. I don't think that's what Fil-C does by default however. (This would be more like the behavior of the Boehm GC library for C, if I understand correctly.)
Ok, you are right. My point is, yes it is possible to panic on use-after-free with Fil-C. With Fil-C, a live reference to a freed object can be detected.
I'm not sure what you mean. Do you mean there is a bug _in the garbage collection algorithm_ if the object is not freed in the very next garbage collection cycle? Well, it depends: the garbage collector could defer collection of some objects until memory is low. Multi-generational garbage collection algorithms often do this.
I think what you've written is pretty much what the "almost all programs have paths that crash" was intended to convey.
I think "perhaps the density of crashes will be tolerable" means something like "we can reasonably hope that the crashes from Fil-C's memory checks will only be of the same sort, that aren't reached when the program is used as it should be".
I think the point is that Fil-C makes programs crash which didn't crash before because use-after-free didn't trigger a segfault. If anything, I'd cite Redis as an example that you can build a safe C program if you go above and beyond in engineering effort... most software doesn't, sadly.
Redis uses a whole lot of fiddly data structures that turn out to involve massive amounts of unsafe code even in Rust. You'd need to use something like Frama-C to really prove it safe beyond reasonable doubt. (Or the Rust equivalents that are currently in the works, and being used in an Amazon-funded effort to meticulously prove soundness of the unsafe code in libstd.) Compiling it using Fil-C is a nice academic exercise but not really helpful, since the whole point of those custom data structures is peak performance.
There are obviously multiple levels of correctness. Formal verification is just the very top of that spectrum, but it does come at extraordinary effort.
It is a question of probability and effort. My personal estimation rule for my type of projects is that it takes 3 times longer to go from my prototype to something I‘m comfortable having others use, and another such factor to get to an early resemblance of a product. In a recent interview I read, an AI expert said each additional 9 in terms of error probability takes the same effort.
Most software written does not serve a serious nation-level user base but caters to a relatively small set of users. The effort spent eradicating errors needs to be justified by the effort of workarounds, remediation work and customer impact. "Will not be fixed" can be a rational decision.
I think the focus should be on tools with high surface area that enforce security boundaries. Especially those where performance is not so important. Like sudo, openssh, polkit, PAM modules. It would make a lot more sense than these half-baked Rust rewrites that just take away features. (I'm biased; I personally had a backup script broken by uutils.) I think rewrites in Rust need 100% bit-for-bit feature parity before replacing the battle-tested existing tools in the C userland. I say this as someone who writes Rust security tools for Linux.
I heard this argument about Rust vs. C: Rust might be memory safe, but the reason memory safety issues are so prominent in C programs is that basically every other kind of problem has been fixed throughout their lifetime, so these are the only kind of issues that remain. Both in terms of security and stability.
This is very much not the case for programs that are much newer, even if they are written in Rust they still need years of maturation before they reach the quality of older C programs, as Rust programs suffer from non-memory safety issues just as much. That's why just rewriting things in Rust isn't a panacea.
The perfect example of this is the Rust coreutils drama that has been going on.
I can only quote (from the top of my head) the Android team's findings, that having a C++ codebase extended with Rust cut down significantly on the number of memory safety-related issues. The reasoning was that since the stable C++ codebase was no longer actively changed, only patched, and new features were implemented in Rust, the C++ codebase could go through this stabilization phase where almost all safety issues are found.
I don't agree with that assessment at all. The reason memory safety issues are so prominent is that they are extremely likely to be exploitable. Of course you can write exploitable bugs in any language, but most bug classes are unlikely to be exploitable. A bug that always crashes is about a trillion times less severe than a bug that allows someone else to take control of your computer.
How many "mature C programs" try to recover in a usable way when malloc() returns NULL? That's a crash - a well-behaved one (no UB involved) hence not one that would be sought by most attackers other than a mere denial of service - but still a crash.
On 64-bit systems (esp Linux ones) malloc almost never returns NULL but keeps overallocating (aka overcommitting). You don't get out-of-memory errors / kills until you access it.
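A small illustration of that behaviour (Linux-specific and dependent on the kernel's overcommit settings, so treat it as a sketch): the allocation call itself succeeds, and the failure only shows up once the pages are actually touched.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    /* Ask for far more memory than the machine has. With default Linux
     * overcommit settings this malloc usually still returns non-NULL,
     * because pages are only backed by real memory when first written. */
    size_t huge = (size_t)1 << 40;          /* 1 TiB */
    char *p = malloc(huge);
    if (!p) {
        puts("malloc returned NULL");       /* the 'textbook' failure path */
        return 1;
    }
    puts("malloc succeeded");
    /* Touching the pages is what eventually triggers the OOM killer,
     * not the allocation call itself. */
    memset(p, 0, huge);
    puts("unreachable on most machines");
    free(p);
    return 0;
}
```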
One divide when it comes to using Fil-C is C as an application (git) vs C as a library from another language (libgit2).
Suppose we assume that many C applications aren’t performance sensitive and can easily take a 2-4x performance hit without noticing. Browsers and OS internals being obvious exceptions. The ideal candidates are like the ones djb writes, and he’s already a convert to Fil-C. sudo, sshd, curl - all seem like promising candidates.
But as far as I can tell, Fil-C doesn’t work for C libraries that can be called from elsewhere. Even if it could be made to work, the reason other languages like Python or Node use C libraries is for speed. If they were ok with it being 2-4x slower, they would just write ordinary Python or Javascript.
C (and C++) are fundamentally important because of their use in performance sensitive contexts like operating systems, browsers and libraries. If we’re restricting Fil-C to pure C/C++ applications that aren’t performance sensitive, that might still be very important and useful, but it’s a small slice of the large C/C++ pie.
Also, it’s a great tool for an existing C application, certainly. A performance hit in exchange for security is a reasonable trade off while making a battle hardened application work. But for a new application, would someone choose Fil-C over other performant GC languages like Go or Java or C#? I’d be keen to hear why.
Still, I want to stress - this is a great project and it’ll generate a lot of value.
> If they were ok with it being 2-4x slower, they would just write ordinary Python or Javascript.
Python and JavaScript are much more than 4x slower than C/C++ for workloads that are git-like (significant amount of compute, not just I/O bound)
> C (and C++) are fundamentally important because of their use in performance sensitive contexts like operating systems, browsers and libraries
That's a fun thing to say but it isn't really true. C/C++ are fundamentally important for lots of reasons. In many cases, folks choose C and C++ because that's the only way to get access to the APIs you need to get the job done.
Why can't it work? You need to assume that the C library is only ever passed well-behaved pointers and callbacks in order to avoid invoking UB that it can't know about - but other than that it's just a matter of marshaling from the usual C ABI to the Fil-C ABI, which should be doable.
I’m assuming the calling program is a GC language like Python or Node (the most popular runtimes by far), but the same holds with other popular languages like Ruby. Why would a GC language call out to slow code that runs its own separate GC? Now you have two GCs running, neither of which knows about the other. I’m not declaring it’s impossible, I’m asking why someone would want to do this.
An example: GitHub’s entire business revolves around calling libgit2 (C) from Ruby. Are they more likely to slow down libgit2 and make it substantially more complex by running 2 GCs side by side, or are they going to risk accept any potential unsafety in regular C? It’s 100% the latter, I’ll bet on that.
A native library is an obvious memory safety hole so I don't see why would it be that controversial to want to fill it even if it introduces another GC (but working on an independent heap, so the slowdown is not necessarily multiplicative)
> ...Now you have two GCs running, neither of which knows about the other. ...
For a strictly time-limited interaction (like what's involved in a FFI call) it's not that bad. Everything that GC2 might directly access is temporarily promoted to a root for GC1, and vice versa.
No one is asking them to stop using libgit2 though. They’re going to continue using it. If they find a serious bug, they’ll fix it and continue using it.
The cost of all the additional hardware is just not worth it. If it was a choice between higher hardware costs, higher request latency, greater operational complexity of a new technology and rewriting libgit2 in a different language without all those tradeoffs, GitHub definitely chooses the latter.
But it’s never going to reach that point because they’ll continue using libgit2 compiled by clang forever.
One thing I've been wondering recently about Fil-C - why now? And I don't mean that in a dismissive way at all, I'm genuinely curious about the history. Was there some relatively recent fundamental breakthrough or other change that prevented a Fil-C-like approach from being viable before? Was it a matter of finding the right approach/implementation (i.e., a "software" problem), or is there something about modern hardware which makes the approach impractical otherwise? Something else?
I wrote a bounds checking patch to GCC (mentioned in a link from the article) back in 1995. It did full bounds checking of C & C++ while being compatible with existing libraries and ABIs, making it a bit more practical than Fil-C to deploy in the real world. You only had to recompile your application, if you trusted the libraries (although the bounds checking obviously didn't extend into the libraries unless you recompiled them). It didn't do the GC thing, but instead detected use after free at the point of use.
> Was there some relatively recent fundamental breakthrough or other change that prevented a Fil-C-like approach from being viable before?
The provenance model for C is very recent (and still a TS, not part of the standard). Prior to that, there was a vague notion that the C abstract machine has quasi-segmented memory (you aren't really allowed to do arithmetic on a pointer to an "object" to reach a different "object") but this was not clearly stated in usable terms.
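A tiny example of the kind of cross-object arithmetic the provenance rules rule out (a sketch of the general idea, not text from the TS): even if the computed address happens to equal &b, the pointer's provenance is still a, so the access is undefined.

```c
#include <stdio.h>

int main(void) {
    int a = 1;
    int b = 2;

    /* &a + 1 is a valid "one past the end" pointer for the object a, but it
     * may not be dereferenced. Even if it happens to compare equal to &b,
     * its provenance is still a, so using it to read or write b is undefined
     * behaviour under the provenance model (and was already informally
     * disallowed by the old "quasi-segmented" reading of the standard). */
    int *p = &a + 1;
    if ((void *)p == (void *)&b) {
        *p = 42;   /* UB: reaching b through a pointer derived from a */
    }
    printf("%d %d\n", a, b);
    return 0;
}
```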
Also in practical terms, you have a lot more address space to "waste" in 64 bit. It would have been frivolous in 32 and downright offending in 16 bit code.
Beyond the Git history, is there any write-up of the different capability designs you've gone with?
I'm interested in implementing a safe low-level language with less static information around than C has (e.g. no static pointer-int distinction), but I'd rather keep around the ability to restrict capabilities to only refer to subobjects than have the same compatibility guarantees Invisicaps provide, so I was hoping to look into Monocaps (or maybe another design, if there's one that might fit better).
That's a really interesting timeline! Sounds like it's been stewing for a lot longer than I expected. Was there anything in particular around 2018 that changed your opinion on the idiotic-ness of the premise?
If a hypothetical time machine allowed you to send the InvisiCaps idea back to your 2004-era self, do you think the approach would have been feasible back then as well?
Long long ago, in 2009, Graydon was my official on-boarding mentor when I joined the Mozilla Javascript team. Rust already existed then but, as he notes, was quite different then. For one thing, it was GC'd, like Fil-C. Which I like -- I write a lot of my C/C++ code using Boehm GC, have my own libraries designed knowing GC is there, etc.
This has obviously been 'rust'ling some feathers, as it challenges some of the arguments laid out in the past; but once the dust settles, it is a major net benefit to the community.
I hope you get financed and can support other platforms than linux again.
> This has obviously been 'rust'ling some feathers,
I'm a Rust user and a fan. But memory safe C is actually an exciting prospect. I was hoping that the rise of Rust would encourage others to prioritize memory safety and come up with approaches that are much more ergonomic to the developers.
> as it challenges some of the arguments laid out in the past
Genuinely curious. What are the assumptions you have in mind that Fil-C challenges? (This isn't a rhetorical question. I'm just trying to understand memory safety concepts better.)
> but once the dust settles, it is a major net benefit to the community.
Agreed, this is big! If Fil-C can fulfill its promise to make old C code memory safe, it will be a massive benefit to the world. God knows how many high-consequence bugs and vulnerabilities hide in those.
Same here, I don't have any use for Rust, and am perfectly fine with automatic resource management languages (regardless of the approach).
However, Rust has been quite successful in making more developers think about less known type systems; besides affine types, there are also linear types, effects, dependent types, proof systems.
And we as an industry aren't going to throw away the millions and millions of lines written in C, C++ and, to a lesser extent, Objective-C, thus efforts like Fil-C are quite welcome.
> I was hoping that the rise of Rust would encourage others to prioritize memory safety and come up with approaches that are much more ergonomic to the developers.
That's the end-goal right? I don't write Rust code myself, but I'm glad its existence means there's safer code out there now, and like you I have been looking forward to seeing shifts in safety expectations. I'm not surprised that it's happening so slowly though.
Yes, safety got more important, and it's great to support old C code in a safe way. The performance drop and especially the GC of Fil-C do limit the usage however. I read there are some ideas for Fil-C without GC; I would love to hear more about that!
But all existing programming languages seem to have some disadvantage: C is fast but unsafe. Fil-C is C compatible but requires GC, more memory, and is slower. Rust is fast, uses little memory, but is verbose and hard to use (borrow checker). Python, Java, C# etc are easy to use, concise, but, like Fil-C, require tracing GC and so more memory, and are slow.
I think the 'perfect' language would be as concise as Python, statically typed, not require tracing GC like Swift (use reference counting), support some kind of borrow checker like Rust (for the most performance critical sections). And leverage the C ecosystem, by transpiling to C. And so would run on almost all existing hardware, and could even be used in the kernel.
> The performance drop and especially the GC of Fil-C do limit the usage however. I read there are some ideas for Fil-C without GC; I would love to hear more about that!
I love how people assume that the GC is the reason for Fil-C being slower than C and that somehow, if it didn't have a GC, it wouldn't be slower.
Well I didn't mean GC is the reason for Fil-C being slower. I mean the performance drop of Fil-C (as described in the article) limits the usage, and the GC (independently) limits the usage.
I understand raw speed (of the main thread) of Fil-C can be faster with tracing GC than Fil-C without. But I think there's a limit on how fast and memory efficient Fil-C can get, given it necessarily has to do a lot of things at runtime, versus compile time. Energy usage and memory usage of a programming language that uses a tracing GC are higher than of one without. At least, if memory management logic can be done at compile time.
For Fil-C, a lot of the memory management logic, and checks, necessarily needs to happen at runtime. Unless the code is annotated somehow, but then it wouldn't be pure C any longer.
These might all be slower than well written C or rust, but they're not nearly the same magnitude of slow. Java is often within a magnitude of C/C++ in practice, and threading is less of a pain. Python can easily be 100x slower, and until very recently, threading wasn't even an option for more CPU due to the GIL so you needed extra complexity to deal with that
There's also Golang, which is in the same ballpark as java and c
You are right, languages with tracing GC are fast. Often, they are faster than C or Rust, if you measure peak performance of a micro-benchmark that does a lot of memory management. But that is only true if you just measure the speed of the main thread :-) Tracing garbage collection does most of the work in separate threads, and so is often not visible in benchmarks. Memory usage is also not easily visible, but languages with tracing GC need about twice the amount of memory as eg. C or Rust. (When using an arena allocator in C, you can get faster still, at the cost of memory usage.)
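For reference, a minimal arena (bump) allocator of the sort alluded to above - a sketch, not production code: allocation is just a pointer bump and everything is released at once, which is why it can beat general-purpose malloc at the cost of holding all memory until the arena is reset.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal bump/arena allocator sketch. Allocations are O(1) pointer bumps,
 * there is no per-object free, and the whole arena is released in one call. */
typedef struct {
    char  *base;
    size_t used;
    size_t cap;
} arena;

static int arena_init(arena *a, size_t cap) {
    a->base = malloc(cap);
    a->used = 0;
    a->cap  = cap;
    return a->base != NULL;
}

static void *arena_alloc(arena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;          /* keep 16-byte alignment */
    if (a->used + n > a->cap) return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

static void arena_release(arena *a) {    /* frees everything at once */
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}
```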
Yes, Python is especially slow, but I think it's probably more because it's dynamically typed, and not compiled. I found PyPy is quite fast.
I've built high load services in Java. GC can be an issue if it gets bad enough to have to pause, but it's in no way a big performance drain regularly.
pypy is fast compared to plain python, but it's not remotely in the same ballpark as C, Java, Golang
Sure, it's not a big performance drain. For the vast majority of software, it is fine. Usually, the ability to write programs more quickly in eg. Java (not having to care about memory management) outweighs the possible gain of Rust, which can reduce memory usage and total energy usage (because no background threads are needed for GC). I also write most software in Java. Right now, the ergonomic cost of languages that don't require tracing GC is just too high. But I don't think this is a law of nature; it's just that there are no better languages yet that don't require a tracing GC. The closest is probably Swift, from a memory / energy usage perspective, but it has other issues.
Surprisingly, Java is right behind manual memory managed languages in terms of energy use, due to its GC being so efficient. It turns out that if your GC can "sprint very fast", you can postpone running it till the last second, and memory drains the same amount no matter what kind of garbage it holds. Also, just "booking" that this region is now garbage without doing any work is also cheaper than calling potentially a chain of destructors or incrementing/decrementing counters.
In most cases the later entries in a language for the benchmark game are increasingly hyper-optimized and non-idiomatic for that language, which is exactly where C# will say "Here's some dangerous features, be careful" and the other languages are likely to suggest you use a bare metal language instead.
Presumably the benchmark game doesn't allow "I wrote this code in C" as a Python submission, but it would allow unsafe C# tricks ?
Unsafe C# is still C# though. Also C# has a lot more control over memory than Java for example, so you don't actually need to use unsafe to be fast. Or are you trying to say that C# is only fast when using unsafe?
Likely just that the fastest implementations in the benchmarks game are using those features and so aren't really a good reflection of the language as it is normally used. This is a problem for any language on the list, really; the fastest implementations are probably not going to reflect idiomatic coding practices.
Nim fits most of those descriptors, and it’s become my favorite language to use. Like any language, it’s still a compromise, but it sits in a really nice spot in terms of compromises, at least IMO. Its biggest downsides are all related to its relative “obscurity” (compared to the other mentioned languages) and resulting small ecosystem.
The advantage of Fil-C is that it's C, not some other language. For the problem domain it's most suited to, you'd do C/C++, some other ultra-modern memory-safe C/C++ system, or Rust.
I agree. Nim is memory safe, concise, and fast. In my view, Nim lacks a very clear memory management strategy: it supports ARC, ORC, manual (unsafe) allocation, move semantics. Maybe supporting fewer options would be better? Usually, adding things that are lacking is easier than removing features, especially if the community is small and you don't want to alienate too many people.
> And leverage the C ecosystem, by transpiling to C
I heavily doubt that this would work reliably on arbitrary C compilers, as the interpretation of the standard gets really wonky and certain constructs that should work might not even compile. Typically such things target GCC because it has such a large backend of supported architectures. But LLVM supports a large overlapping number too - that's why it's supported to build the Linux kernel under clang and why Rust can support so many microcontrollers. For Rust, that's why there's the rust codegen gcc effort which uses GCC as the backend instead of LLVM, to flesh out the supported architectures further. But generally transpilation is used as a stopgap for anything in this space, not an ultimate target, for lots of reasons, not least of which is that there are optimizations that aren't legal in C but are legal in another language, which transpilation would inhibit.
> Rust is fast, uses little memory, but is verbose and hard to use (borrow checker).
It’s weird to me that my experience is that it was as hard to pick up the borrow checker as the first time I came upon list comprehension. In essence it’s something new I’d never seen before but once I got it it went into the background noise and is trivial to do most of the time, especially since the compiler infers most lifetimes anyway. Resistance to learning is different than being difficult to learn.
Well "transpiling to C" does include GCC and clang, right? Sure, trying to support _all_ C compilers is nearly impossible, and not what I mean. Quite many languages support transpiling to C (even Go and Lua), but in my view that alone is not sufficient for a C replacement in places like the Linux kernel: for this to work, tracing GC can not be used. And this is what prevents Fil-C and many other languages to be used in that area.
Rust borrow checker: the problem I see is not so much that it's hard to learn, but requires constant effort. In Rust, you are basically forced to use it, even if the code is not performance critical. Sure, Rust also supports reference counting GC, but that is more _verbose_ to use... It should be _simpler_ to use in my view, similar to Python. The main disadvantage of Rust, in my view, is that it's verbose. (Also, there is a tendency to add too many features, similar to C++, but that's a secondary concern).
> Rust also supports reference counting GC, but that is more _verbose_ to use... It should be _simpler_ to use in my view, similar to Python. The main disadvantage of Rust, in my view, is that it's verbose.
I think there's space for Rust to become more ergonomic, but its goals limit just how far it can go. At the same time I think there's space to take Rust and make a Rust# that goes further toward the Swift/Scala end of the spectrum, where things like auto-cloning of references are implemented first, and that can consume Rust libraries. From the organizational point of view, you can see it as a mix between nightly and editions. From a user's point of view you can look at it as a mode to make refactoring faster, onboarding easier and a test bed for language evolution. Not being Rust itself, it would also allow for different stability guarantees (you can have breaking changes every year), which also means you can be bolder in trying things out knowing you're not permanently stuck with them. People who care about performance, correctness and reuse can still use Rust. People who would be well served by Swift/Scala have access to Rust's libraries and toolchain.
> (Also, there is a tendency to add too many features, similar to C++, but that's a secondary concern).
These two quoted sentiments seem contradictory: making Rust less verbose to interact with reference counted values would indeed be adding a feature.
Someone, maybe Tolnay?, recently posted a short Go snippet that segfaults because the virtual function table pointer and data pointer aren't copied atomically or mutexed. The same thing works in swift, because neither is thread safe. Swift is also slower than go unless you pass unchecked, making it even less safe than go. C#/f# are safer from that particular problem and more performant than either go or swift, but have suffered from the same deserialization attacks that java does. Right now if you want true memory and thread safety, you need to limit a GC language to zero concurrency, use a borrow checker, i.e. rust, or be purely functional, which in production would mean haskell. None of those are effortless, and which is easiest depends on you and your problem. Rust is easiest for me, but I keep thinking if I just write enough haskell it will all click. I'm worried if my brain starts working that way about the impacts on things other than writing Haskell.
Replying to myself because a vouch wasn't enough to bring the post back from the dead. They were partially right and educated me. The downvotes were unnecessary. MS did start advising against dangerous deserializers 8yrs ago. They were only deprecated three years ago though, and only removed last year. Some of the remaining are only mostly safe and then only if you follow best practice. So it isn't a problem entirely of the past, but it has gotten a lot better.
Unless you are writing formal proofs nothing is completely safe; GC languages had found a sweet spot until increased concurrency started uncovering thread safety problems. Rust seems to have found a sweet spot that is usable despite the grumbling. It could probably be made a bit easier. The compiler already knows when something needs to be Send or Sync, and it could just do that invisibly, but that would lead people to code in a way that had lots of locking, which is slow and generates deadlocks too often. This way the wordiness of shared mutable state steers you towards avoiding it except when a functional design pattern wouldn't be performant. If you have to use Mutex a lot in Rust, stop fighting the borrow checker and listen to what it is saying.
Yes. I do like Swift as a language. The main disadvantages of Swift, in my view, are: (A) The lack of an (optional) "ownership" model for memory management. So you _have_ to use reference counting everywhere. That limits the performance. This is measurable: I converted some micro-benchmarks to various languages, and Swift does suffer on the memory management intensive tasks [1]. (B) Swift is too Apple-centric currently. Sure, this might become a non-issue over time.
The borrow checker involves documenting the ownership of data throughout the program. That's what people are calling "overly verbose" and saying it "makes comprehensive large-scale refactoring impractical" as an argument against Rust. (And no it doesn't, it's just keeping you honest about what the refactor truly involves.)
The annoying experience with the borrow checker is when following the compiler errors after making a change until you hit a fundamental ownership problem a few levels away from the original change that precludes the change (like ending up with a self-referential borrow). This can bite even experienced developers, depending on how many layers of indirection there are (and sometimes the change that would be adding a single Rc or Cell in a field isn't applicable because it happens in a library you don't control). I do still prefer hitting that wall than having it compile and end up with rare incorrect runtime behaviour (with any luck, a segfault), but it is more annoying than "it just works because the GC dealt with it for me".
> Quite many languages support transpiling to C (even Go and Lua)
Source? I’m not familiar with official efforts here. I see one in the community for Lua but nothing for Go. It’s rare for languages to use this as anything other than a stopgap or a neat community poc. But my point was precisely this - if you’re only targeting GCC/LLVM, you can just use their backend directly rather than transpiling to C which only buys you some development velocity at the beginning (as in easier to generate that from your frontend vs the intermediate representation) at the cost of a worse binary output (since you have to encode the language semantics on top of the C virtual machine which isn’t necessarily free). Specifically this is why transpile to C makes no sense for Rust - it’s already got all the infrastructure to call the compiler internals directly without having to go through the C frontend.
> Rust borrow checker: the problem I see is not so much that it's hard to learn, but requires constant effort. In Rust, you are basically forced to use it, even if the code is not performance critical
You're only forced to use it when you're storing references within a struct. In like 99% of all other cases the compiler will correctly infer the lifetimes for you. Not sure when the last time was you tried to write rust code.
> Sure, Rust also supports reference counting GC, but that is more _verbose_ to use... It should be _simpler_ to use in my view, similar to Python.
Any language targeting the performance envelope rust does needs GC to be opt in. And I’m not sure how much extra verbosity there is to wrap the type with RC/Arc unless you’re referring to the need to throw in a RefCell/Mutex to support in place mutation as well, but that goes back to there not being an alternative easy way to simultaneously have safety and no runtime overhead.
> The main disadvantage of Rust, in my view, is that it's verbose.
Sure, but compared to what? It's actually a lot more concise than C/C++ if you consider how much boilerplate dancing there is with header files and compilation units. And if you start factoring in that few people actually seem to know what the rule of 0 is and how to write exception safe code, there's drastically less verbosity and the verbosity is impossible to use incorrectly. Compared to Python, sure, but then go use something like otterlang [1] which gives you close to Rust performance with a syntax closer to Python. But again, it's a different point on the Pareto frontier - there's no one language that could rule them all because they're orthogonal design criteria that conflict with each other. And no one has figured out how to have a cohesive GC that transparently and progressively lets you go between no GC, ref GC and tracing GC, despite foundational research a few years back showing that ref GC and tracing GC are part of the same spectrum and that high performing implementations of both tend to converge on the same set of techniques.
I agree transpile to C will not result in the fastest code (and of course not the fastest toolchain), but having the ability to convert to C does help in some cases. Besides the ability to support some more obscure targets, I found it's useful for building a language, for unit tests [1]. One of the targets, in my case, is the XCC C compiler, which can run in WASM and convert to WASM, and so I built the playground for my language using that.
> transpiling to C (even Go and Lua)
Go: I'm sorry, I thought TinyGo internally converts to C, but it turns out that's not true (any more?). That leaves https://github.com/opd-ai/go2c which uses TinyGo and then converts the LLVM IR to C. So, I'm mistaken, sorry.
> You're only forced to use it when you're storing references within a struct.
Well, that's quite often, in my view.
> Not sure when the last time was you tried to write rust code.
I'm not a regular user, that's true [2]. But I do have some knowledge in quite many languages now [3] and so I think I have a reasonable understanding of the advantages and disadvantages of Rust as well.
> Any language targeting the performance envelope rust does needs GC to be opt in.
Yes, I fully agree. I just think that Rust has the wrong default: it uses single ownership / borrowing by _default_, and RC/Arc is more like an exception. I think most programs could use RC/Arc by default, and only use ownership / borrowing where performance is critical.
> The main disadvantage of Rust, in my view, is that it's verbose.
>> Sure, but compared to what?
Compared to most languages, actually [4]. Rust is similar to Java and Zig in this regard. Sure, we can argue the use case of Rust is different than eg. Python.
Yes, they might lose the meaningless benchmarks game that gets thrown around; what matters is whether they are fast enough for the problem that is being solved.
If everyone actually cared about performance above anything else, we wouldn't have an Electron crap crisis.
Seems like Windows is trying to address the Electron problem by adopting React Native for their WinAppSDK. RN is not just a cross-platform solution, but a framework that allows Windows to finally tap into the pool of devs used to that declarative UI paradigm. They appear to be standardizing on TypeScript, with C++ for the performance-critical native parts. They leverage the scene graph directly from WinAppSDK. By prioritizing C++ over C# for extensions and TS for the render code, they might actually hit the sweet spot.
That C++ support the WinUI team's marketing keeps talking about relies on a framework that is no longer being developed.
> The reason the issues page only lets you create a bug report is because cppwinrt is in maintenance mode and no longer receiving new feature work. cppwinrt serves an important and specific role, but further feature development risks destabilizing the project. Additional helpers are regularly contributed to complimentary projects such as https://github.com/microsoft/wil/.
I don't know; I think what matters is that performance is close to the best you can reasonably get in any other language.
People don't like leaving performance on the table. It feels stupid and it lets competitors have an easy advantage.
The Electron situation is not because people don't care about performance; it's because they care more about some other things (e.g. not having to do 4x the work to get native apps).
Your second paragraph kind of contradicts the last one.
And yes, caring more about other things is why performance isn't the top number one item, and most applications have long stopped being written in pure C or C++ since the early 2000's.
We go even further in several abstraction layers, nowadays with the ongoing uptake of LLMs and agentic workflows in iPaaS low code tools.
Personally at work I haven't written a pure 100% C or C++ application since 1999, always a mix of Tcl, Perl, Python, C# alongside C or C++, private projects is another matter.
Most applications stopped being written in C/C++ when Java first came out - the first memory safe language with mass enterprise adoption. Java was the Rust of the mid-1990s, even though it used a GC which made it a lot slower and clunkier than actual Rust.
I would say that the "first" belongs to Smalltalk, Visual Basic and Delphi.
What Java had going for it was the massive scale of Sun's marketing, and the JDK being available as free beer, however until Eclipse came to be, all IDEs were commercial, and everyone was coding in Emacs, vi (no vim yet), nano, and so on.
However it only became viable after Java 1.3, when Hotspot became part of Java's runtime.
I agree with the spirit of your comment though, and I also think that the only reason the blow given by Java to C and C++ wasn't bigger is that AOT tools were only available at high commercial prices.
Many folks use C and C++, not due to their systems programming features, rather they are the only AOT compiled languages that they know.
There are surprisingly many languages that support transpiling to C: Python (via Cython), Go (via TinyGo), Lua (via eLua), Nim, Zig, Vlang. The main advantage (in my view) is to support embedded systems, which might not match your use case.
I suppose /some/ performance loss is inevitable. But this could be quite a game changer. As more folks play with it, performing benchmarks, etc -- it should reveal which C idioms incur the most/least performance hits under Fil-C. So with some targeted patching of C code, we may end up with a rather modest price for the memory safety
And I'm not done optimizing. The perf will get better. Rust and Yolo-C will always be faster, but right now we can't know what the difference will be.
Top optimization opportunities:
- InvisiCaps 2.0. While implementing the current capability model, when I was about 3/4 of the way done with the rewrite, I realized that if I had done it differently I would have avoided two branch+compares on every pointer load. That's huge! I just haven't had the appetite for doing yet another rewrite recently. But I'll do it eventually.
- ABI. Right now, Fil-C uses a binary interface that relies on lowering to what ELF is capable of. This introduces a bunch of overhead on every global variable access and every function call. All of this goes away if Fil-C gets its own object file format. That's a lot of work, but it will happen if Fil-C gets more adoption.
- Better abstract interpreter. Fil-C already has an abstract interpreter in the compiler, but it's not nearly as smart as it could be. For example, it doesn't have octagon domain yet. Giving it octagon domain will dramatically improve the performance of loops.
- More intrinsics. Right now, a lot of libc functions that are totally memory safe but are implemented in assembly are implemented in plain Fil-C instead right now, just because of how the libc ports happened to work out. Like, say you call some <math.h> function that takes doubles and returns doubles - it's going to be slower in Fil-C today because you'll end up in the generic C code version compiled with Fil-C. No good reason for this! It's just grunt work to fix!
- The calling convention itself is trash right now - it involves passing things through a thread-local buffer. It's less trashy than the calling convention I started out with (that allocated everything in the heap lmao), but still. There's nothing fundamentally preventing a Fil-C register-based calling convention, but it would take a decent amount of work to implement.
There are probably other perf optimization opportunities that I'm either forgetting right now or that haven't been found yet. It's still early days!
I've always been firmly in the 'let it crash' camp for bugs, the sooner and the closer to the offending piece of code you can generate a crash the better. Maybe it would be possible to embed Fil-C in a test-suite combined with a fuzzing like tool that varies input to try really hard to get a program to trigger an abend. As long as it is possible to fuzz your way to a crash in Fil-C that would be a sign that there is more work to do.
That way 'passes Fil-C' would be a bit like running code under valgrind and move the penalty to the development phase rather than the runtime. Is this feasible or am I woolgathering, and is Fil-C only ever going to work by using it to compile the production code?
From what I understand some things in Fil-C work "as expected" instead of crashing (e.g. dereferencing a pointer to an out of scope variable will give you the old value of that variable), so it won't work as a sanitizer.
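A concrete (hypothetical) instance of what that description means: in standard C this is undefined behaviour and exactly what a sanitizer would flag, whereas per the description above Fil-C keeps the storage alive and simply hands back the stale value, so it would not surface as a crash.

```c
#include <stdio.h>

/* Returns the address of a local whose lifetime ends at the closing brace.
 * In standard C, reading through the returned pointer is undefined behaviour
 * and is exactly what a sanitizer would flag. Per the comment above, Fil-C's
 * model keeps the underlying storage valid, so the read just yields the old
 * value (42) instead of trapping - safe, but useless as a bug detector. */
static int *escape_local(void) {
    int x = 42;
    return &x;
}

int main(void) {
    int *p = escape_local();
    printf("%d\n", *p);
    return 0;
}
```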
Fil-C will crash on memory corruption too. In fact, its main advantage is crashing sooner.
All the quick fixes for C that don't require code rewrites boil down to crashing. They don't make your C code less reliable, they just make the unreliability more visible.
To me, Fil-C is most suited to be used during development and testing. In production you can use other sandboxing/hardening solutions that have lower overhead, after hopefully shaking out most of the bugs with Fil-C.
The great thing about such crashes is that if you have coredumps enabled you can just load the crashed binary into GDB and type 'where', and you can most likely figure out immediately from inspecting the call stack what the actual problem is. This was/is my go-to method to find really hard to reproduce bugs.
I think the issue with this approach is it’s perfectly reasonable in Fil-C to never call `free` because the GC will GC. So if you develop on Fil-C, you may be leaking memory if you run in production with Yolo-C.
Fil-C uses `free()` to mark memory as no longer valid, so it is important to keep using manual memory management to let Fil-C catch UAF bugs (which are likely symptoms of logic bugs, so you'd want to catch them anyway).
The whole point of Fil-C is having C compatibility. If you're going to treat it as a deployment target on its own, it's a waste: you get overhead of a GC language, but with clunkiness and tedium of C, instead of nicer language features that ground-up GC languages have.
graydon points in that direction, but since you're here: how feasible is a hypothetical Fil-Unsafe-Rust? would you need to compile the whole program in Fil-Rust to get the benefits of Fil-Unsafe-Rust?
It's reasonably easy if you can treat the Safe Rust and Fil-Unsafe-Rust code as accessing different address spaces (in the C programming sense of "a broad subset of memory that a pointer is limited to", not the general OS/hardware sense), since that's essentially what the bespoke Fil-C ABI amounts to in the first place. Which of course is not really a good fit for every use of Unsafe Rust, but might suffice for some of them.
Miri does do that? It is not aware of the distinction to begin with (which is one of the use cases of the tool: it lets us exercise safe code to ensure there aren't memory violations caused by incorrect MIR lowering). I might be mistaking what you mean. Miri's big limitation is not being able to interface with FFI.
hmmm I thought miri was used in the compiler for static analysis, wasn't aware it's a runtime interpreter.
I guess the primary reason would be running hardened code in production without compromising performance too much, same as you would run Fil-C compiled software instead of the usual way. I've no idea if it's feasible to run miri in prod.
- Don’t put flags in the high bits of the aux pointer. Instead if an object has flags, it’ll have a fatter header. Most objects don’t have flags.
- Give up on lock freedom of atomic pointers. This is a fun one because theoretically, it’s worse. But it comes with a net perf improvement because there’s no need to check the low bit of lowers.
If you are not writing anything performance sensitive, you shouldn't be using C in the first place. Even if Fil-C greatly reduces its overhead, I can't see it ever being a good idea for actual release builds.
As a Linux user of two decades, memory safety has never been a major issue that I would be willing to trade performance for. It doesn't magically make my application work; it just panics instead of crashes, same end result for me. It just makes it so the issue cannot be exploited by an attacker. Which is good, but Linux has already been safe enough to be the main choice to run on servers, so meh. The whole memory safety cult is weird.
I guess Fil-C could have a place in the testing pipeline. Run some integration tests on builds made with it and see if stuff panics.
That said, Fil-C is a super cool project. I don't mean to throw any shade at it.
People with Linux servers keep getting hacked so idk if I buy the argument “if it’s in use it’s good enough”. That’s like saying “everyone else runs Pentium 2, why would I upgrade to Pentium 3?”
While memory safety can help reduce many security vulnerabilities it is not the only source of vulnerabilities. Furthermore as for getting hacked I would suspect the main problems to be social engineering, bad configuration and lack of maintenance and not really the software itself being insecure.
> That’s like saying “everyone else runs Pentium 2, why would I upgrade to Pentium 3?”
No one should blindly upgrade because bigger number is better. If I look into new hardware I research benchmarks and figure out if it would enable me to (better) run the software/games I care about it and if the improvement is worth my money.
Same with security. You need to read actual studies and figure out what the cost/benefit of certain measures is.
There are safer alternatives to Linux but apparently the situation isn't bad enough for people to switch to them.
And I am not saying you should create new projects in C or C++. Most people should not. But there is a lot of battle tested C and C++ code out there and to act as if we suddenly have this big problem with memory safety is a weird narrative to push. And if you discover a vulnerability, well, fix it instead of wrapping it in Fil-C and making the whole thing slower.
Getting a "not available in your state" page, does anyone have an archive? I've only recently tried out fil-c and hope to use it in some work projects.
I am very excited about this. Thanks to HN I see such things. Too bad normal media is no longer interested in anything that isn't AI. On topic: I am quite sceptical about all this rusting only because we can. Going Rust makes the amount of programmers willing to look at the code quite small. A way to add this kind of checking to C will, on the other hand, open up the whole C community to a needed thing: memory safety.
As the article points out, this does not solve all the things Rust does (apart from memory/performance, things like point 3). So new code would be preferable in something like Rust (and some other PLs). However, a lot of existing code is in C and most of it will stay in C. So Fil-C seems to be really useful here.
Here’s what Fil-C gives you that -fbounds-safety doesn’t:
- Fil-C gives you comprehensive memory safety while -fbounds-safety just covers bounds. For example, Fil-C panics on use after free and has well defined semantics on ptr-int type confusion.
- -fbounds-safety requires you to modify your code. Fil-C makes unmodified C/C++ code memory safe.
FWIW, I worked on -fbounds-safety and I still think it’s a good idea. :-)
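For context, a rough sketch of the difference (the annotation name and placement follow my reading of the -fbounds-safety proposal, so treat them as assumptions): -fbounds-safety needs source changes like the one below to learn the bounds, whereas Fil-C tracks the same information through its pointer capabilities with no changes to the code.

```c
#include <stddef.h>

/* Fallback so the sketch also compiles without -fbounds-safety; under the
 * real feature, __counted_by is the bounds annotation the compiler consumes. */
#ifndef __counted_by
#define __counted_by(f)
#endif

/* Sketch: under -fbounds-safety the programmer annotates which field carries
 * the element count so the compiler can insert bounds checks. */
struct buffer {
    size_t len;
    int *__counted_by(len) data;   /* bounds come from len */
};

/* The same loop compiled with Fil-C needs no annotation: every pointer
 * already carries bounds (and liveness) in its capability, so out-of-bounds
 * or use-after-free accesses panic at runtime instead. */
int sum(const struct buffer *b) {
    int total = 0;
    for (size_t i = 0; i < b->len; i++)
        total += b->data[i];
    return total;
}
```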
It's always seemed obvious to me that it would be better to make C safer than it would be to rewrite the billions of lines of C that run all our digital infrastructure. Of course that will get pushback from people who care more about rewriting it in a specific language, but pragmatically it's the obvious solution. Nice to see stuff like Fil-C proving it's possible, and if the performance gap can get within 10% (which seems very possible) it would be a no-brainer.
It depends how much the C software is "done" vs being updated and extended. Some legacy projects need a rewrite/rearchitecting anyway (even well-written battle-tested code may stop meeting requirements simply due to the world changing around it).
It also doesn't have to be a complete all-at-once rewrite. Plain C can easily co-exist with other languages, and you can gradually replace it by only writing new code in another language.
Seems like soapboxing for Rust via backhanded compliments about this amazing tool. If anything, this tool makes rewriting in Rust that much less attractive. If C and C++ get tools like this that deliver 90% of the benefits of Rust without a rewrite or learning a new and equally complex language, then we can avoid needlessly fracturing the software world. I really think we were there before Fil-C, but this is potentially a game-changer.
I don’t think Fil-C supplants Rust; Rust still has a place for things like kernel development where Fil-C would not be accepted since it wouldn’t work there. But also, Rust today has significantly better performance and memory usage, so it makes more sense for greenfield projects that might otherwise consider C/C++. Not to mention that Rust as a language is drastically easier and faster to develop in due to a modern package management system, a good fast cohesive std library, true cross platform support, and static catching of all the issues that would otherwise cause Fil-C to crash, in addition to having better performance without effort.
Fil-C is an important tool to secure traditional software but it doesn’t yet compete with Rust in the places it’s competing with C and C++ in greenfield projects (and it may never - that’s ok - it’s still valuable to have a way to secure existing code without rewriting it).
And I disagree with the characterization of Graydon’s blog. It’s literally praising Fil-C and saying it’s a valuable piece of tech in the landscape of language dev and worth paying attention to as a serious way to secure a huge amount of existing code. The only position Graydon takes is that safety is a critically important quality of software and Fil-C is potentially an important part of the story of moving the industry forward.
Don't get me wrong, it sounds positive. A direct attack on Fil-C would have seemed mean-spirited so there is a lot of misdirection. Maybe the author doesn't even see what he's doing because he's so deep in it. But to me the message is clear. No matter what tools are developed for C and C++ to mitigate memory issues, Rust people will never concede that enough of these issues have been solved. They demand a complete rewrite of everything, or at least gradual replacement of all C and C++ code with Rust. Even if Rust is worse in other ways, does not deliver true safety, has technical shortcomings and worse licensing, etc.
This post is very polite compared to what I've seen from some Rust fanatics. But it still strikes me as talking down to the C and C++ community, as if these languages are beyond redemption because they don't work the same as Rust.
Graydon's post was about as full-throated an endorsement of Fil-C as you can get, including noting where its innovations could be used to improve Rust safety. The fact that you see undertones of some sort of deep-set Rust agenda to unseat C and C++ is, I think, more a reflection on just how deep down the rabbit hole some Rust critics have gone, seeing so-called Rust zealots hiding in every shadow of the internet.
> deliver 90% of the benefits of Rust without a rewrite
Rust with 1/4 of the speed doesn't feel like 90% of the benefits of Rust. I'm sure the author will make Fil-C faster in time, but Rust is always going to be much faster.
I wasn't suggesting that you should run everything with Fil-C all the time. If you run it sometimes, you're likely to catch most problems. The ideal tool would be CHERI or something. I think Rust makes a big mistake with its maximal error checking every time you compile, among its other flaws. Rust compile times are high compared to similar C++ code. The compiler has a high amount of necessary complexity that comes into play every time you run the code with a few lines of changes. Of course, C++ has higher compile times than Go and C, and probably some other languages, but they are fairly different languages with different error modes.
Let me put it another way. We could say that documentation, code formatting, and even profiling, all have a place in development. So would running a model checker or something. But we don't generally make compilers implement these features and make that a mandatory part of the build. I think the complex memory borrowing scheme of Rust unnecessarily forces the compiler to check a lot of extra stuff specific to a certain class of problem. You don't get a choice about how paranoid you need to be about errors. You just have to eat that long compile time, every time.
And people who just don't like the Rust "style" and would rather write new software in a familiar language with all the features, like classic OOP, that they are used to.
There's something off-putting about how Rust people communicate. Like you're talking to people selling something, or people with ulterior motives. They love talking about memory safety as if it's the only thing standing between humanity and secure, reliable software. They love external audiences, as their brand of fearmongering, virtue signaling and offering of panacea-like solutions resonates with a certain kind of risk-averse decision maker, who tend to be in power in uncomfortable numbers.
First of all, Rust's 'fearless concurrency' largely boils down to 'no concurrency' - Rust has about as much concurrency as Javascript - you can copy objects between threads, but not share memory beyond that, with certain libraries allowing some escape hatches in Rust's case.
Additionally, the case for aliasing control leading to better and safer code that's easier to reason about just isn't really true in practice - it's rare that, for example, your function can accidentally alias memory in strictly typed languages - and when it does, it's usually intentional. The concern pretty much only manifests in memcpy, as the ugly hack of 'strict aliasing' - assuming two pointers of incompatible types point to different bits of memory - works very well in practice.
It even helps with situations people complain about in Rust, like when objects A and B (both mutably borrowed) take a reference to some common service object (which is super common) - but that sort of code simply doesn't compile in Rust.
All in all, I don't dislike Rust as it is, but I do dislike the project's tendency to do activism, trying to bully its technical skeptics into submission (which is unfortunately what all activism really is - when you've lost the argument, try to be louder than the other guy and paint him as reprehensible). I wish they had focused on fixing the technical issues instead. There has been research into ownership schemes; some exist that are less restrictive than Rust's while offering the same safety guarantees.
In my personal opinion Rust is not done on the conceptual level - by 'done' I mean Rust serving its purpose as the language it claims to be. Maybe there will be Rust 2.0 which will overhaul the ownership system completely or maybe there'll be another language that will do what Rust does but better.
Edit: I wish I could claim I'm some sort of tin-foil conspiracy theorist, but I'm commenting under an article written by one of the key people behind Rust, and it reeks of this attitude.
> First of all, Rust's 'fearless concurrency' largely boils down to 'no concurrency' - Rust has about as much concurrency as Javascript - you can copy objects between threads, but not share memory beyond that
...this is just blatantly false? Like it is false to the extent that I am confused as to what you could even possibly be talking about - I don't know how anyone who has actually written non-trivial programs in both languages could come to the conclusion that they have the same memory model (or much of anything in common when it comes to threads, really).
"But almost all programs have paths that crash, and perhaps the density of crashes will be tolerable."
This is a very odd statement. Mature C programs written by professional coders (Redis is a good example) basically never crash in the experience of users. Crashing, in such programs, is a rare occurrence mostly obtained by attackers on purpose, looking for code paths that generate a memory error that - if the program is used as it should - are never reached.
This does not mean that C code never segfaults: it happens, especially when developed without care and the right amount of testing. But the code that is the most security sensitive, like C Unix servers, is high quality and crashes are mostly a security problem and a lot less a stability problem.
Notice that it says "almost all programs" and not "almost all _C_ programs".
I think if you understand the meaning of "crash" to include any kind of unhandled state that causes the program to terminate execution then it includes things like unwrapping a None value in Rust or any kind of uncaught exception in Python.
That interpretation makes sense to me in terms of the point he's making: Fil-C replaces memory unsafety with program termination, which is strictly worse than e.g. (safe) Rust which replaces memory unsafety with a compile error. But it's also true that most programs (irrespective of language, and including Rust) have some codepaths in which programs can terminate where the assumed variants aren't upheld, so in practice that's often an acceptable behaviour, as long as the defect rate is low enough.
Of course there is also a class of programs for which that behaviour is not acceptable, and in those cases Fil-C (along with most other languages, including Rust absent significant additional tooling) isn't appropriate.
> Rust which replaces memory unsafety with a compile error
Rust uses panics for out-of-bounds access protection.
The benefit of dynamic safety checking is that it's more precise. There's a large class of valid programs that are not unsafe that will run fine in Fil-C but won't compile in Rust.
A lot of my programs crash, and that’s a deliberate choice. If you call one of them like “./myprog.py foo.txt”, and foo.txt doesn’t exist, it’ll raise a FileNotFound exception and fail with a traceback. Thing is, that’s desirable here. If could wrap that in a try/except block, but I’d either be adding extraneous info (“print(‘the file does not exist’); raise”) or throwing away valuable info by swallowing the traceback so the user doesn’t see the context of what failed.
My programs can’t do anything about that situation, so let it crash.
Same logic for:
* The server in the config file doesn’t exist.
* The given output file has bad permissions.
* The hard drive is full.
Etc. And again, that’s completely deliberate. There’s nothing I can do in code to fix those issues, so it’s better to fail with enough info that the user can diagnose and fix the problem.
That was in Python. I do the same in Rust, again, deliberately. While of course we all handle the weird cases we’re prepared to handle, I definitely write most database calls like “foo = db.exec(query)?” because if PostgreSQL can’t execute the query, the safest option is to panic instead of trying foolhardily to get back the last known safe state.
And of course that’s different for different use cases. If you’re writing a GUI app, it makes much more sense to pop up a dialog and make the user go fix the issue before retrying.
I don't think it's odd statement. It's not about segfaults, but use-after-free (and similar) bugs, which don't crash in C, but do crash in Fil-C. With Fil-C, if there is such a bug, it will crash, but if the density of such bugs is low enough, it is tolerable: it will just crash the program, but will not cause an expensive and urgent CVE ticket. The bug itself may still need to be fixed.
The paragraph refers to detecting such bugs during compilation versus crashing at runtime. The "almost all programs have paths that crash" means all programs have a few bugs that can cause crashes, and that's true. Professional coders do not attempt to write 100% bug-free code, as that wouldn't be an efficient use of their time. Now the question is: should professional coders convert the (existing) C code to e.g. Rust (where the compiler likely detects the bug), or should they use Fil-C, and so save the time it would take to convert the code?
> it will just crash the program, but will not cause an expensive and urgent CVE ticket.
Unfortunately, security hysteria also treats any crash as "an expensive and urgent CVE ticket". See, for instance, ReDoS, where auditors will force you to update a dependency even if there's no way for a user to provide the vulnerable input (for instance, it's fixed in the configuration file).
Doesn't Fil-C use a garbage collector to address use-after-free? For a real use-after-free to be possible there must be some valid pointer to the freed allocation, in which case the GC just keeps it around and there's no overt crash.
Yes, Fil-C uses some kind of garbage collector. But it can still detect use-after-free: In the 'free' call, the object is marked as free. In the garbage collection (in the mark phase), if a reference is detected to an object that was freed, then the program panics. Sure, it is also possible to simply ignore the 'free' call - in which case you "just" have a memory leak. I don't think that's what Fil-C does by default however. (This would be more like the behavior of the Boehm GC library for C, if I understand correctly.)
I don’t think that’s how it works. Once an object is freed, any access will crash. You’re allowed to still have a reference to it.
Ok, you are right. My point is, yes, it is possible to panic on use-after-free with Fil-C. With Fil-C, a live reference to a freed object can be detected.
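For illustration, a minimal sketch (my own toy example, not from Fil-C's test suite) of the kind of bug this is about - under a regular C compiler this is silent undefined behaviour that may appear to work, while Fil-C panics deterministically at the access:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);
    if (p == NULL) return 1;
    *p = 42;
    free(p);            /* the allocation is now marked freed; the pointer value survives */
    printf("%d\n", *p); /* ordinary C: undefined behaviour; Fil-C: a deterministic panic here */
    return 0;
}
```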
A free()-d object that is NOT garbage-collected during the next collection is a bug in itself.
The Fil-C GC will only GC a free'd object if it succeeds at repointing all capabilities to it to point at the free singleton instead.
Don't worry, it's totally sound.
I'm not sure what you mean. Do you mean there is a bug _in the garbage collection algorithm_ if the object is not freed in the very next garbage collection cycle? Well, it depends: the garbage collector could defer collection of some objects until memory is low. Multi-generational garbage collection algorithms often do this.
I think what you've written is pretty much what the "almost all programs have paths that crash" was intended to convey.
I think "perhaps the density of crashes will be tolerable" means something like "we can reasonably hope that the crashes from Fil-C's memory checks will only be of the same sort, that aren't reached when the program is used as it should be".
I think the point is that Fil-C makes programs crash which didn't crash before because use-after-free didn't trigger a segfault. If anything, I'd cite Redis as an example that you can build a safe C program if you go above and beyond in engineering effort... most software doesn't, sadly.
Redis uses a whole lot of fiddly data structures that turn out to involve massive amounts of unsafe code even in Rust. You'd need to use something like Frama-C to really prove it safe beyond reasonable doubt. (Or the Rust equivalents that are currently in the works, and being used in an Amazon-funded effort to meticulously prove soundness of the unsafe code in libstd.) Compiling it using Fil-C is a nice academic exercise but not really helpful, since the whole point of those custom data structures is peak performance.
seL4 is the example of building a safe C program if you go above and beyond in effort.
It's provably safer than Rust, for example.
There are obviously multiple levels of correctness. Formal verification is just the very top of that spectrum, but it does come at extraordinary effort.
It is a question of probability and effort. My personal rule of thumb for my kind of projects is that it takes about 3 times longer to go from a prototype to something I'm comfortable having others use, and another such factor to get to an early resemblance of a product. In a recent interview I read, an AI expert said each additional 9 of reliability takes the same effort again.
Most software written does not serve a serious nation-level user base but caters to a relatively small set of users. The effort spent eradicating errors needs to be justified against the cost of workarounds, remediation work and customer impact. "Won't fix" can be a rational decision.
I think the focus should be on tools with a large attack surface that enforce security boundaries, especially those where performance is not so important: sudo, openssh, polkit, PAM modules. That would make a lot more sense than these half-baked Rust rewrites that just take away features. (I'm biased: I personally had a backup script broken by uutils.) I think rewrites in Rust need 100% bit-for-bit feature parity before replacing the battle-tested existing tools in the C userland. I say this as someone who writes Rust security tools for Linux.
I heard this argument about Rust vs. C: Rust might be memory safe, but the reason memory safety issues are so prominent in C programs is that basically every other kind of problem has been fixed over their lifetime, so these are the only kinds of issues that remain - both in terms of security and stability.
This is very much not the case for programs that are much newer, even if they are written in Rust they still need years of maturation before they reach the quality of older C programs, as Rust programs suffer from non-memory safety issues just as much. That's why just rewriting things in Rust isn't a panacea.
The perfect example of this is the Rust coreutils drama that has been going on.
I can only quote (from the top of my head) the Android team's findings, that having a C++ codebase extended with Rust cut down significantly on the number of memory safety-related issues. The reasoning was that since the stable C++ codebase was no longer actively changed, only patched, and new features were implemented in Rust, the C++ codebase could go through this stabilization phase where almost all safety issues are found.
I don't agree with that assessment at all. The reason memory safety issues are so prominent is that they are extremely likely to be exploitable. Of course you can write exploitable bugs in any language, but most bug classes are unlikely to be exploitable. A bug that always crashes is about a trillion times less severe than a bug that allows someone else to take control of your computer.
How many "mature C programs" try to recover in a usable way when malloc() returns NULL? That's a crash - a well-behaved one (no UB involved) hence not one that would be sought by most attackers other than a mere denial of service - but still a crash.
> when malloc() returns NULL? That's a crash - a well-behaved one (no UB involved)
Wrong, dereferencing a NULL pointer is UB.
On 64-bit systems (especially Linux ones) malloc almost never returns NULL but keeps overallocating (aka overcommitting). You don't get out-of-memory errors / kills until you access the memory.
Exactly. Also, it is extremely rare.
One divide when it comes to using Fil-C is C as an application (git) vs C as a library from another language (libgit2).
Suppose we assume that many C applications aren’t performance sensitive and can easily take a 2-4x performance hit without noticing. Browsers and OS internals being obvious exceptions. The ideal candidates are like the ones djb writes, and he’s already a convert to Fil-C. sudo, sshd, curl - all seem like promising candidates.
But as far as I can tell, Fil-C doesn’t work for C libraries that can be called from elsewhere. Even if it could be made to work, the reason other languages like Python or Node use C libraries is for speed. If they were ok with it being 2-4x slower, they would just write ordinary Python or Javascript.
C (and C++) are fundamentally important because of their use in performance sensitive contexts like operating systems, browsers and libraries. If we’re restricting Fil-C to pure C/C++ applications that aren’t performance sensitive, that might still be very important and useful, but it’s a small slice of the large C/C++ pie.
Also, it’s a great tool for an existing C application, certainly. A performance hit in exchange for security is a reasonable trade off while making a battle hardened application work. But for a new application, would someone choose Fil-C over other performant GC languages like Go or Java or C#? I’d be keen to hear why.
Still, I want to stress - this is a great project and it’ll generate a lot of value.
> If they were ok with it being 2-4x slower, they would just write ordinary Python or Javascript.
Python and JavaScript are much more than 4x slower than C/C++ for workloads that are git-like (significant amount of compute, not just I/O bound)
> C (and C++) are fundamentally important because of their use in performance sensitive contexts like operating systems, browsers and libraries
That's a fun thing to say but it isn't really true. C/C++ are fundamentally important for lots of reasons. In many cases, folks choose C and C++ because that's the only way to get access to the APIs you need to get the job done.
Why can't it work? You need to assume that the C library is only ever passed well-behaved pointers and callbacks in order to avoid invoking UB that it can't know about - but other than that it's just a matter of marshaling from the usual C ABI to the Fil-C ABI, which should be doable.
I’m assuming the calling program is a GC language like Python or Node (the most popular runtimes by far), but the same holds for other popular languages like Ruby. Why would a GC language call out to slow code that runs its own separate GC? Now you have two GCs running, neither of which knows about the other. I’m not declaring it’s impossible; I’m asking why someone would want to do this.
An example: GitHub’s entire business revolves around calling libgit2 (C) from Ruby. Are they more likely to slow down libgit2 and make it substantially more complex by running 2 GCs side by side, or are they going to risk accept any potential unsafety in regular C? It’s 100% the latter, I’ll bet on that.
A native library is an obvious memory safety hole, so I don't see why it would be that controversial to want to fill it, even if it introduces another GC (but one working on an independent heap, so the slowdown is not necessarily multiplicative).
> ...Now you have two GCs running, neither of which knows about the other. ...
For a strictly time-limited interaction (like what's involved in a FFI call) it's not that bad. Everything that GC2 might directly access is temporarily promoted to a root for GC1, and vice versa.
The former could still be cheaper than dropping libgit2 altogether.
No one is asking them to stop using libgit2 though. They’re going to continue using it. If they find a serious bug, they’ll fix it and continue using it.
The cost of all the additional hardware is just not worth it. If it was a choice between higher hardware costs, higher request latency, greater operational complexity of a new technology and rewriting libgit2 in a different language without all those tradeoffs, GitHub definitely chooses the latter.
But it’s never going to reach that point because they’ll continue using libgit2 compiled by clang forever.
That's very interesting to learn.
One thing I've been wondering recently about Fil-C - why now? And I don't mean that in a dismissive way at all, I'm genuinely curious about the history. Was there some relatively recent fundamental breakthrough or other change that prevented a Fil-C-like approach from being viable before? Was it a matter of finding the right approach/implementation (i.e., a "software" problem), or is there something about modern hardware which makes the approach impractical otherwise? Something else?
I wrote a bounds checking patch to GCC (mentioned in a link from the article) back in 1995. It did full bounds checking of C & C++ while being compatible with existing libraries and ABIs, making it a bit more practical than Fil-C to deploy in the real world. You only had to recompile your application, if you trusted the libraries (although the bounds checking obviously didn't extend into the libraries unless you recompiled them). It didn't do the GC thing, but instead detected use after free at the point of use.
https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html
> Was there some relatively recent fundamental breakthrough or other change that prevented a Fil-C-like approach from being viable before?
The provenance model for C is very recent (and still a TS, not part of the standard). Prior to that, there was a vague notion that the C abstract machine has quasi-segmented memory (you aren't really allowed to do arithmetic on a pointer to an "object" to reach a different "object") but this was not clearly stated in usable terms.
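Roughly, the provenance rules say you can't reach one object through a pointer derived from another, even if the addresses happen to line up - a small illustrative sketch of my own, not taken from the TS:

```c
#include <stdio.h>

int main(void) {
    int a[2] = {1, 2};
    int b = 3;
    int *end = a + 2;    /* one-past-the-end pointer: legal to form, not to dereference */
    printf("a ends at %p, b lives at %p\n", (void *)end, (void *)&b);
    /* Even if b happens to sit right after a in memory, reaching it through a pointer
       derived from a has the wrong provenance: *end is undefined behaviour, and a
       checked implementation such as Fil-C can trap it at runtime. */
    printf("%d\n", *end);
    return 0;
}
```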
Also, in practical terms, you have a lot more address space to "waste" in 64-bit. It would have been frivolous in 32-bit and downright offensive in 16-bit code.
I’ve been thinking about this problem since 2004.
Here’s a rough timeline:
- 2004-2018: I had ideas of how to do it but I thought the whole premise (memory safe C) was idiotic.
- 2018-2023: I no longer thought the premise was idiotic but I couldn’t find a way to do it that would result in fanatical compatibility.
- 2023-2024: early Fil-C versions that were much less compatible and much less performant
- end of 2024: InvisiCaps breakthrough that gives current fanatical compatibility and “ok” performance.
It’s a hard problem. Lots of folks have tried to find a way to do it. I’ve tried many approaches before finding the current one.
Beyond the Git history, is there any write-up of the different capability designs you've gone with?
I'm interested in implementing a safe low-level language with less static information around than C has (e.g. no static pointer-int distinction), but I'd rather keep around the ability to restrict capabilities to only refer to subobjects than have the same compatibility guarantees Invisicaps provide, so I was hoping to look into Monocaps (or maybe another design, if there's one that might fit better).
I summarize past attempts in https://fil-c.org/invisicaps
That's a really interesting timeline! Sounds like it's been stewing for a lot longer than I expected. Was there anything in particular around 2018 that changed your opinion on the idiotic-ness of the premise?
If a hypothetical time machine allowed you to send the InvisiCaps idea back to your 2004-era self, do you think the approach would have been feasible back then as well?
> Was there anything in particular around 2018 that changed your opinion on the idiotic-ness of the premise?
The observation that the C variants used on GPUs are simplistic takes on memory safe C
This is super kind and awesome, I'm seriously flattered!
Did you know each other at Apple?
Long long ago, in 2009, Graydon was my official on-boarding mentor when I joined the Mozilla Javascript team. Rust already existed then but, as he notes, was quite different then. For one thing, it was GC'd, like Fil-C. Which I like -- I write a lot of my C/C++ code using Boehm GC, have my own libraries designed knowing GC is there, etc.
Yeah we interacted at Apple
Great work for the community.
This has obviously been 'rust'ling some feathers, as it challenges some of the arguments laid out in the past; but once the dust settles, it is a major net benefit to the community.
I hope you get financed and can support platforms other than Linux again.
> This has obviously been 'rust'ling some feathers,
I'm a Rust user and a fan. But memory safe C is actually an exciting prospect. I was hoping that the rise of Rust would encourage others to prioritize memory safety and come up with approaches that are much more ergonomic to the developers.
> as it challenges some of the arguments laid past
Genuinely curious. What are the assumptions you have in mind that Fil-C challenges? (This isn't a rhetorical question. I'm just trying to understand memory safety concepts better.)
> but once the dust settles, it is a major net benefit to the community.
Agreed, this is big! If Fil-C can fulfill its promise to make old C code memory safe, it will be a massive benefit to the world. God knows how many high-consequence bugs and vulnerabilities hide in that code.
Same here, I don't have any use for Rust, and am perfectly fine with automatic resource management languages (regardless of the approach).
However, Rust has been quite successful at making more developers think about less well-known type systems: besides affine types, there are also linear types, effects, dependent types, proof systems.
And we as an industry aren't going to throw away the millions and millions of lines of code written in C, C++ and, to a lesser extent, Objective-C, so efforts like Fil-C are quite welcome.
> I was hoping that the rise of Rust would encourage others to prioritize memory safety and come up with approaches that are much more ergonomic to the developers.
That's the end-goal right? I don't write Rust code myself, but I'm glad its existence means there's safer code out there now, and like you I have been looking forward to seeing shifts in safety expectations. I'm not surprised that it's happening so slowly though.
Yes, safety got more important, and it's great to support old C code in a safe way. The performance drop and especially the GC of Fil-C do limit the usage however. I read there are some ideas for Fil-C without GC; I would love to hear more about that!
But all existing programming languages seem to have some disadvantage: C is fast but unsafe. Fil-C is C compatible but requires GC, more memory, and is slower. Rust is fast and uses little memory, but is verbose and hard to use (borrow checker). Python, Java, C# etc. are easy to use and concise, but, like Fil-C, require tracing GC and so more memory, and are slow.
I think the 'perfect' language would be as concise as Python, statically typed, not require tracing GC like Swift (use reference counting), support some kind of borrow checker like Rust (for the most performance critical sections). And leverage the C ecosystem, by transpiling to C. And so would run on almost all existing hardware, and could even be used in the kernel.
> The performance drop and especially the GC of Fil-C do limit the usage however. I read there are some ideas for Fil-C without GC; I would love to hear more about that!
I love how people assume that the GC is the reason for Fil-C being slower than C and that somehow, if it didn't have a GC, it wouldn't be slower.
Fil-C is slower than C because of InvisiCaps. https://fil-c.org/invisicaps
The GC is crazy fast and fully concurrent/parallel. https://fil-c.org/fugc
Removing the GC is likely to make Fil-C slower, not faster.
Well I didn't mean GC is the reason for Fil-C being slower. I mean the performance drop of Fil-C (as described in the article) limits the usage, and the GC (independently) limits the usage.
I understand raw speed (of the main thread) of Fil-C can be faster with tracing GC than Fil-C without. But I think there's a limit on how fast and memory efficient Fil-C can get, given it necessarily has to do a lot of things at runtime, versus compile time. Energy usage and memory usage of a programming language that uses a tracing GC are higher than of one without - at least if the memory management logic can be done at compile time.
For Fil-C, a lot of the memory management logic, and checks, necessarily needs to happen at runtime. Unless the code is annotated somehow, but then it wouldn't be pure C any longer.
> Python, Java, C# [...] are slow
These might all be slower than well written C or rust, but they're not nearly the same magnitude of slow. Java is often within a magnitude of C/C++ in practice, and threading is less of a pain. Python can easily be 100x slower, and until very recently, threading wasn't even an option for more CPU due to the GIL so you needed extra complexity to deal with that
There's also Golang, which is in the same ballpark as java and c
You are right, languages with tracing GC are fast. Often, they are faster than C or Rust, if you measure peak performance of a micro-benchmark that does a lot of memory management. But that is only true if you just measure the speed of the main thread :-) Tracing garbage collection does most of the work in separate threads, and so is often not visible in benchmarks. Memory usage is also not easily visible, but languages with tracing GC need about twice the amount of memory as e.g. C or Rust. (When using an arena allocator in C, you can get faster still, at the cost of memory usage.)
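For what it's worth, by arena allocator I mean the usual bump pointer over one big block - a bare-bones sketch of my own, not any specific library:

```c
#include <stddef.h>

typedef struct {
    char  *base;   /* one big block obtained up front */
    size_t used;
    size_t cap;
} arena;

/* Bump-pointer allocation: no per-object headers, no free lists. */
void *arena_alloc(arena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;            /* keep 16-byte alignment */
    if (n > a->cap - a->used) return NULL; /* out of arena space */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Everything allocated from the arena is released in one shot. */
void arena_reset(arena *a) { a->used = 0; }
```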
Yes, Python is especially slow, but I think that's probably more because it's dynamically typed and not compiled. I found PyPy is quite fast.
I've built high load services in Java. GC can be an issue if it gets bad enough to have to pause, but it's in no way a big performance drain regularly.
pypy is fast compared to plain python, but it's not remotely in the same ballpark as C, Java, Golang
Sure, it's not a big performance drain. For the vast majority of software, it is fine. Usually, the ability to write programs more quickly in e.g. Java (not having to care about memory management) outweighs the possible gains of Rust, which can reduce memory usage and total energy usage (because no background threads are needed for GC). I also write most software in Java. Right now, the ergonomic cost of languages that don't require tracing GC is just too high. But I don't think this is a law of nature; it's just that there are no better languages yet that don't require a tracing GC. The closest is probably Swift, from a memory / energy usage perspective, but it has other issues.
> and total energy usage
Surprisingly, Java is right behind manually memory-managed languages in terms of energy use, due to its GC being so efficient. It turns out that if your GC can "sprint very fast", you can postpone running it until the last second, and memory draws the same power no matter what kind of garbage it holds. Also, just "booking" that a region is now garbage without doing any work is cheaper than potentially calling a chain of destructors or incrementing/decrementing counters.
Of these languages, C# may actually be the fastest.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
In most cases the later entries in a language for the benchmark game are increasingly hyper-optimized and non-idiomatic for that language, which is exactly where C# will say "Here's some dangerous features, be careful" and the other languages are likely to suggest you use a bare metal language instead.
Presumably the benchmark game doesn't allow "I wrote this code in C" as a Python submission, but it would allow unsafe C# tricks ?
Unsafe C# is still C# though. Also C# has a lot more control over memory than Java for example, so you don't actually need to use unsafe to be fast. Or are you trying to say that C# is only fast when using unsafe?
Likely just that the fastest implementations in the benchmarks game are using those features and so aren't really a good reflection of the language as it is normally used. This is a problem for any language on the list, really; the fastest implementations are probably not going to reflect idiomatic coding practices.
Nim fits most of those descriptors, and it’s become my favorite language to use. Like any language, it’s still a compromise, but it sits in a really nice spot in terms of compromises, at least IMO. Its biggest downsides are all related to its relative “obscurity” (compared to the other mentioned languages) and resulting small ecosystem.
The advantage of Fil-C is that it's C, not some other language. For the problem domain it's most suited to, you'd do C/C++, some other ultra-modern memory-safe C/C++ system, or Rust.
I agree. Nim is memory safe, concise, and fast. In my view, Nim lacks a very clear memory management strategy: it supports ARC, ORC, manual (unsafe) allocation, move semantics. Maybe supporting fewer options would be better? Usually, adding things that are lacking is easier than removing features, especially if the community is small and you don't want to alienate too many people.
> And leverage the C ecosystem, by transpiling to C
I heavily doubt that this would work reliably on arbitrary C compilers, as the interpretation of the standard gets really wonky and certain constructs that should work might not even compile. Typically such things target GCC because it has such a large backend of supported architectures. But LLVM supports a large overlapping number too - that's why it's possible to build the Linux kernel under Clang and why Rust can support so many microcontrollers. For Rust, that's why there's the rustc codegen gcc effort, which uses GCC as the backend instead of LLVM to flesh out the supported architectures further. But generally transpilation is used as a stopgap for anything in this space, not an ultimate target, for lots of reasons - not least of which is that there are optimizations that aren't legal in C but are legal in another language, and transpilation would inhibit them.
> Rust is fast, uses little memory, but us verbose and hard to use (borrow checker).
It’s weird to me - in my experience, picking up the borrow checker was about as hard as the first time I came upon list comprehensions. In essence it’s something new I’d never seen before, but once I got it, it went into the background noise and is trivial to deal with most of the time, especially since the compiler infers most lifetimes anyway. Resistance to learning is different from being difficult to learn.
Well "transpiling to C" does include GCC and clang, right? Sure, trying to support _all_ C compilers is nearly impossible, and not what I mean. Quite many languages support transpiling to C (even Go and Lua), but in my view that alone is not sufficient for a C replacement in places like the Linux kernel: for this to work, tracing GC can not be used. And this is what prevents Fil-C and many other languages to be used in that area.
Rust borrow checker: the problem I see is not so much that it's hard to learn, but that it requires constant effort. In Rust, you are basically forced to use it, even if the code is not performance critical. Sure, Rust also supports reference-counting GC, but that is more _verbose_ to use... It should be _simpler_ to use in my view, similar to Python. The main disadvantage of Rust, in my view, is that it's verbose. (Also, there is a tendency to add too many features, similar to C++, but that's a secondary concern.)
> Rust also supports reference counting GC, but that is more _verbose_ to use... It should be _simpler_ to use in my view, similar to Python. The main disadvantage of Rust, in my view, is that it's verbose.
I think there's space for Rust to become more ergonomic, but its goals limit just how far it can go. At the same time I think there's space to take Rust and make a Rust# that goes further toward the Swift/Scala end of the spectrum, where things like auto-cloning of references are implemented first, and that can consume Rust libraries. From the organizational point of view, you can see it as a mix between nightly and editions. From a user's point of view, you can look at it as a mode that makes refactoring faster and onboarding easier, and a test bed for language evolution. Not being Rust itself, it would also allow for different stability guarantees (you can have breaking changes every year), which also means you can be bolder in trying things out, knowing you're not permanently stuck with them. People who care about performance, correctness and reuse can still use Rust. People who would be well served by Swift/Scala have access to Rust's libraries and toolchain.
> (Also, there is a tendency to add too many features, similar to C++, but that's a secondary concern).
These two quoted sentiments seem contradictory: making Rust less verbose to interact with reference counted values would indeed be adding a feature.
> Sure, Rust also supports reference counting GC, but that is more _verbose_ to use... It should be _simpler_ to use in my view, similar to Python.
If that's what you're looking for, you can use Swift. The latest release has memory safety by default, just like Rust.
Someone, maybe Tolnay, recently posted a short Go snippet that segfaults because the virtual function table pointer and data pointer aren't copied atomically or protected by a mutex. The same thing happens in Swift, because neither is thread safe. Swift is also slower than Go unless you pass unchecked (disabling safety checks), making it even less safe than Go. C#/F# are safer from that particular problem and more performant than either Go or Swift, but have suffered from the same deserialization attacks that Java does. Right now, if you want true memory and thread safety, you need to limit a GC language to zero concurrency, use a borrow checker (i.e. Rust), or be purely functional, which in production would mean Haskell. None of those are effortless, and which is easiest depends on you and your problem. Rust is easiest for me, but I keep thinking if I just write enough Haskell it will all click. I'm worried about what it would do to everything other than writing Haskell if my brain started working that way.
Replying to myself because a vouch wasn't enough to bring the post back from the dead. They were partially right and educated me. The downvotes were unnecessary. MS did start advising against dangerous deserializers 8yrs ago. They were only deprecated three years ago though, and only removed last year. Some of the remaining are only mostly safe and then only if you follow best practice. So it isn't a problem entirely of the past, but it has gotten a lot better.
Unless you are writing formal proofs, nothing is completely safe. GC languages had found a sweet spot until increased concurrency started uncovering thread safety problems. Rust seems to have found a sweet spot that is usable despite the grumbling. It could probably be made a bit easier. The compiler already knows when something needs to be Send or Sync, and it could just do that invisibly, but that would lead people to code in a way that has lots of locking, which is slow and generates deadlocks too often. This way the wordiness of shared mutable state steers you towards avoiding it except when a functional design pattern wouldn't be performant. If you have to use Mutex a lot in Rust, stop fighting the borrow checker and listen to what it is saying.
Yes. I do like Swift as a language. The main disadvantages of Swift, in my view, are: (A) The lack of an (optional) "ownership" model for memory management, so you _have_ to use reference counting everywhere. That limits the performance. This is measurable: I converted some micro-benchmarks to various languages, and Swift does suffer on the memory-management-intensive tasks [1]. (B) Swift is too Apple-centric currently. Sure, this might become a non-issue over time.
[1] https://github.com/thomasmueller/bau-lang/blob/main/doc/perf...
Re: borrow checker
Isn't it just enforcing something you should be doing in every language anyway, i.e. thinking about ownership of data?
The borrow checker involves documenting the ownership of data throughout the program. That's what people are calling "overly verbose" and saying it "makes comprehensive large-scale refactoring impractical" as an argument against Rust. (And no it doesn't, it's just keeping you honest about what the refactor truly involves.)
The annoying experience with the borrow checker is when following the compiler errors after making a change until you hit a fundamental ownership problem a few levels away from the original change that precludes the change (like ending up with a self referencial borrow). This can bite even experienced developers, depending on how many layers of indirection there are (and sometimes the change that would be adding a single Rc or Cell in a field isn't applicable because it happens in a library you don't control). I do still prefer hitting that wall than having it compile and end up with rare incorrect runtime behaviour (with any luck, a segfault), but it is more annoying than "it just works because the GC dealt with it for me".
> Quite many languages support transpiling to C (even Go and Lua)
Source? I’m not familiar with official efforts here. I see one in the community for Lua but nothing for Go. It’s rare for languages to use this as anything other than a stopgap or a neat community poc. But my point was precisely this - if you’re only targeting GCC/LLVM, you can just use their backend directly rather than transpiling to C which only buys you some development velocity at the beginning (as in easier to generate that from your frontend vs the intermediate representation) at the cost of a worse binary output (since you have to encode the language semantics on top of the C virtual machine which isn’t necessarily free). Specifically this is why transpile to C makes no sense for Rust - it’s already got all the infrastructure to call the compiler internals directly without having to go through the C frontend.
> Rust borrow checker: the problem I see is not so much that it's hard to learn, but requires constant effort. In Rust, you are basically forced to use it, even if the code is not performance critical
You're only forced to use it when you're storing references within a struct. In like 99% of all other cases the compiler will correctly infer the lifetimes for you. Not sure when the last time was that you tried to write Rust code.
> Sure, Rust also supports reference counting GC, but that is more _verbose_ to use... It should be _simpler_ to use in my view, similar to Python.
Any language targeting the performance envelope Rust does needs GC to be opt-in. And I'm not sure how much extra verbosity there is in wrapping the type with Rc/Arc, unless you're referring to the need to throw in a RefCell/Mutex to support in-place mutation as well, but that goes back to there not being an alternative easy way to simultaneously have safety and no runtime overhead.
> The main disadvantage of Rust, in my view, is that it's verbose.
Sure, but compared to what? It's actually a lot more concise than C/C++ if you consider how much boilerplate dancing there is with header files and compilation units. And if you factor in that few people actually seem to know what the rule of 0 is and how to write exception-safe code, there's drastically less verbosity, and the verbosity is impossible to use incorrectly. Compared to Python, sure, but then go use something like otterlang [1], which gives you close to Rust performance with a syntax closer to Python. But again, it's a different point on the Pareto frontier - there's no one language that could rule them all, because they're orthogonal design criteria that conflict with each other. And no one has figured out how to have a cohesive GC that transparently and progressively lets you go between no GC, refcounting GC and tracing GC, despite foundational research a few years back showing that refcounting GC and tracing GC are part of the same spectrum and that high-performing implementations of both tend to converge on the same set of techniques.
[1] https://github.com/jonathanmagambo/otterlang
I agree transpile to C will not result in the fastest code (and of course not the fastest toolchain), but having the ability to convert to C does help in some cases. Besides the ability to support some more obscure targets, I found it's useful for building a language, for unit tests [1]. One of the targets, in my case, is the XCC C compiler, which can run in WASM and convert to WASM, and so I built the playground for my language using that.
> transpiling to C (even Go and Lua)
Go: I'm sorry, I thought TinyGo internally converts to C, but it turns out that's not true (any more?). That leaves https://github.com/opd-ai/go2c which uses TinyGo and then converts the LLVM IR to C. So, I'm mistaken, sorry.
Lua: One is https://github.com/davidm/lua2c but I thought eLua also converts to C.
> Your only forced to use it when you’re storing references within a struct.
Well, that's quite often, in my view.
> Not sure when the last time was you tried to write rust code.
I'm not a regular user, that's true [2]. But I do have some knowledge in quite many languages now [3] and so I think I have a reasonable understanding of the advantages and disadvantages of Rust as well.
> Any language targeting the performance envelope rust does needs GC to be opt in.
Yes, I fully agree. I just think that Rust has the wrong default: it uses single ownership / borrowing by _default_, and RC/Arc is more like an exception. I think most programs could use RC/Arc by default, and only use ownership / borrowing where performance is critical.
> The main disadvantage of Rust, in my view, is that it's verbose. >> Sure, but compared to what?
Compared to most languages, actually [4]. Rust is similar to Java and Zig in this regard. Sure, we can argue the use case of Rust is different than eg. Python.
[1] https://github.com/thomasmueller/bau-lang [2] https://github.com/thomasmueller/lz4_simple [3] https://github.com/thomasmueller/bau-lang/tree/main/src/test... [4] https://github.com/thomasmueller/bau-lang/blob/main/doc/conc...
Slow to whom, though?
Yes, they might lose the meaningless benchmarks game that gets thrown around; what matters is whether they are fast enough for the problem being solved.
If everyone actually cared about performance above anything else, we wouldn't have an Electron crap crisis.
Seems like Windows is trying to address the Electron problem by adopting React Native for their WinAppSDK. RN is not just a cross-platform solution, but a framework that allows Windows to finally tap into the pool of devs used to that declarative UI paradigm. They appear to be standardizing on TypeScript, with C++ for the performance-critical native parts. They leverage the scene graph directly from WinAppSDK. By prioritizing C++ over C# for extensions and TS for the render code, they might actually hit the sweet spot.
https://microsoft.github.io/react-native-windows/docs/new-ar...
Anything related to WinUI is a bad joke.
Have fun following the discussions and amount of bugs,
https://github.com/microsoft/microsoft-ui-xaml
That C++ support that WinUI team marketing keeps talking about relies on a framework that is no longer being developed.
> The reason the issues page only lets you create a bug report is because cppwinrt is in maintenance mode and no longer receiving new feature work. cppwinrt serves an important and specific role, but further feature development risks destabilizing the project. Additional helpers are regularly contributed to complimentary projects such as https://github.com/microsoft/wil/.
From https://github.com/microsoft/cppwinrt/issues/1289#issuecomme...
I don't know; I think what matters is that performance is close to the best you can reasonably get in any other language.
People don't like leaving performance on the table. It feels stupid and it lets competitors have an easy advantage.
The Electron situation is not because people don't care about performance; it's because they care more about some other things (e.g. not having to do 4x the work to get native apps).
Your second paragraph kind of contradicts the last one.
And yes, caring more about other things is why performance isn't the top number one item, and most applications have long stopped being written in pure C or C++ since the early 2000's.
We go even further in several abstraction layers, nowadays with the ongoing uptake of LLMs and agentic workflows in iPaaS low code tools.
Personally at work I haven't written a pure 100% C or C++ application since 1999, always a mix of Tcl, Perl, Python, C# alongside C or C++, private projects is another matter.
Most applications stopped being written in C/C++ when Java first came out - the first memory safe language with mass enterprise adoption. Java was the Rust of the mid-1990s, even though it used a GC which made it a lot slower and clunkier than actual Rust.
I would say that the "first" belongs to Smalltalk, Visual Basic and Delphi.
What Java had going for it was the massive scale of Sun's marketing, and the JDK being available as free beer, however until Eclipse came to be, all IDEs were commercial, and everyone was coding in Emacs, vi (no vim yet), nano, and so on.
However it only became viable after Java 1.3, when Hotspot became part of Java's runtime.
I agree with the spirit of your comment though, and I also think the only reason the blow Java dealt to C and C++ wasn't bigger is that AOT tools were only available at high commercial prices.
Many folks use C and C++, not due to their systems programming features, rather they are the only AOT compiled languages that they know.
I think transpiling to C is probably the least interesting way to tap into C. FFI is a lot more valuable (and doable).
There are surprisingly many languages that support transpiling to C: Python (via Cython), Go (via TinyGo), Lua (via eLua), Nim, Zig, Vlang. The main advantage (in my view) is to support embedded systems, which might not match your use case.
Eiffel, that is how it always worked, a VM based workflow for development (Melt VM), compilation via C or C++ for release builds.
I suppose /some/ performance loss is inevitable. But this could be quite a game changer. As more folks play with it, performing benchmarks, etc. -- it should reveal which C idioms incur the most/least performance hits under Fil-C. So with some targeted patching of C code, we may end up paying a rather modest price for the memory safety.
And I'm not done optimizing. The perf will get better. Rust and Yolo-C will always be faster, but right now we can't know what the difference will be.
Top optimization opportunities:
- InvisiCaps 2.0. While implementing the current capability model, when I was about 3/4 of the way done with the rewrite, I realized that if I had done it differently I would have avoided two branch+compares on every pointer load. That's huge! I just haven't had the appetite for doing yet another rewrite recently. But I'll do it eventually.
- ABI. Right now, Fil-C uses a binary interface that relies on lowering to what ELF is capable of. This introduces a bunch of overhead on every global variable access and every function call. All of this goes away if Fil-C gets its own object file format. That's a lot of work, but it will happen if Fil-C gets more adoption.
- Better abstract interpreter. Fil-C already has an abstract interpreter in the compiler, but it's not nearly as smart as it could be. For example, it doesn't have octagon domain yet. Giving it octagon domain will dramatically improve the performance of loops.
- More intrinsics. A lot of libc functions that are totally memory safe but are normally implemented in assembly are implemented in plain Fil-C right now, just because of how the libc ports happened to work out. Like, say you call some <math.h> function that takes doubles and returns doubles - it's going to be slower in Fil-C today because you'll end up in the generic C version compiled with Fil-C. No good reason for this! It's just grunt work to fix!
- The calling convention itself is trash right now - it involves passing things through a thread-local buffer. It's less trashy than the calling convention I started out with (that allocated everything in the heap lmao), but still. There's nothing fundamentally preventing a Fil-C register-based calling convention, but it would take a decent amount of work to implement.
There are probably other perf optimization opportunities that I'm either forgetting right now or that haven't been found yet. It's still early days!
This is such an interesting project.
I've always been firmly in the 'let it crash' camp for bugs: the sooner and the closer to the offending piece of code you can generate a crash, the better. Maybe it would be possible to embed Fil-C in a test suite combined with a fuzzing-like tool that varies input to try really hard to get a program to trigger an abend. As long as it is possible to fuzz your way to a crash in Fil-C, that would be a sign that there is more work to do.
That way 'passes Fil-C' would be a bit like running code under valgrind and move the penalty to the development phase rather than the runtime. Is this feasible or am I woolgathering, and is Fil-C only ever going to work by using it to compile the production code?
From what I understand some things in Fil-C work "as expected" instead of crashing (e.g. dereferencing a pointer to an out of scope variable will give you the old value of that variable), so it won't work as a sanitizer.
You can use the built-in sanitizer from your compiler though.
At that point why use Fil-C for this though?
Because you don't want to let it crash in production? Sanitizer for testing Fil-C for shipping.
Fil-C will crash on memory corruption too. In fact, its main advantage is crashing sooner.
All the quick fixes for C that don't require code rewrites boil down to crashing. They don't make your C code less reliable, they just make the unreliability more visible.
To me, Fil-C is most suited to be used during development and testing. In production you can use other sandboxing/hardening solutions that have lower overhead, after hopefully shaking out most of the bugs with Fil-C.
The great thing about such crashes is if you have coredumps enabled that you can just load the crashed binary into GDB and type 'where' and you most likely can immediately figure out from inspecting the call stack what the actual problem is. This was/is my go-to method to find really hard to reproduce bugs.
I think the issue with this approach is it’s perfectly reasonable in Fil-C to never call `free` because the GC will GC. So if you develop on Fil-C, you may be leaking memory if you run in production with Yolo-C.
Fil-C uses `free()` to mark memory as no longer valid, so it is important to keep using manual memory management to let Fil-C catch UAF bugs (which are likely symptoms of logic bugs, so you'd want to catch them anyway).
The whole point of Fil-C is having C compatibility. If you're going to treat it as a deployment target on its own, it's a waste: you get overhead of a GC language, but with clunkiness and tedium of C, instead of nicer language features that ground-up GC languages have.
> Rust and Yolo-C will always be faster
graydon points in that direction, but since you're here: how feasible is a hypothetical Fil-Unsafe-Rust? would you need to compile the whole program in Fil-Rust to get the benefits of Fil-Unsafe-Rust?
What is Fil-Rust and Fil-Unsafe-Rust?
It's reasonably easy if you can treat the Safe Rust and Fil-Unsafe-Rust code as accessing different address spaces (in the C programming sense of "a broad subset of memory that a pointer is limited to", not the general OS/hardware sense), since that's essentially what the bespoke Fil-C ABI amounts to in the first place. Which of course is not really a good fit for every use of Unsafe Rust, but might suffice for some of them.
what would fil-rust do that miri doesn't?
e.g. validate safety across safe/unsafe boundaries
Miri does do that? It is not aware of the distinction to begin with (which is one of the use cases of the tool: it lets us exercise safe code to ensure there aren't memory violations caused by incorrect MIR lowering). I might be mistaking what you mean. Miri's big limitation is not being able to interface with FFI.
hmmm I thought miri was used in the compiler for static analysis, wasn't aware it's a runtime interpreter.
I guess the primary reason would be running hardened code in production without compromising performance too much, same as you would run Fil-C compiled software instead of the usual way. I've no idea if it's feasible to run miri in prod.
Can you elaborate on what makes ELF (potentially with custom sections/extension and maybe custom ld.so plugin) insufficient?
A lot of remarkably unusual stuff has been shoved into the format without breaking the tooling, so wondering what the restrictions are.
Love your Yolo-C remark. :)
The savings of two conditional branches sounds interesting; what would the change be?
- Don’t put flags in the high bits of the aux pointer. Instead if an object has flags, it’ll have a fatter header. Most objects don’t have flags.
- Give up on lock freedom of atomic pointers. This is a fun one because theoretically, it’s worse. But it comes with a net perf improvement because there’s no need to check the low bit of lowers.
Scary! I'm excited to see how it turns out.
So you'd have to implement binfmt_misc for the new binary format? Will you need to write your own ld.so?
Yes and yes
If you are not writing anything performance sensitive, you shouldn't be using C in the first place. Even if Fil-C greatly reduces its overhead, I can't see it ever being a good idea for actual release builds.
As a Linux user of two decades, memory safety has never been a major issue that I would be willing to trade performance for. It doesn't magically make my application work; it just panics instead of crashing - same end result for me. It just makes it so the issue cannot be exploited by an attacker. Which is good, but Linux has already been safe enough to be the main choice to run on servers, so meh. The whole memory safety cult is weird.
I guess Fil-C could have a place in the testing pipeline. Run some integration tests on builds made with it and see if stuff panics.
That said, Fil-C is a super cool project. I don't mean to throw any shade at it.
> If you are not writing anything performance sensitive, you shouldn't be using C in the first place.
Then why are all of the IO-bound low level pieces of Linux userland written in C?
Take just one example: udevd. I have a Fil-C version. There is zero observable difference in performance.
People with Linux servers keep getting hacked so idk if I buy the argument “if it’s in use it’s good enough”. That’s like saying “everyone else runs Pentium 2, why would I upgrade to Pentium 3?”
While memory safety can help reduce many security vulnerabilities, it is not the only source of them. Furthermore, as for getting hacked, I would suspect the main problems are social engineering, bad configuration, and lack of maintenance, not the software itself being insecure.
> That’s like saying “everyone else runs Pentium 2, why would I upgrade to Pentium 3?”
No one should blindly upgrade because a bigger number is better. If I look into new hardware, I research benchmarks and figure out whether it would let me (better) run the software/games I care about and whether the improvement is worth my money.
Same with security. You need to read actual studies and figure out what the cost/benefit of certain measures is.
There are safer alternatives to Linux but apparently the situation isn't bad enough for people to switch to them.
And I am not saying you should create new projects in C or C++. Most people should not. But there is a lot of battle-tested C and C++ code out there, and acting as if we suddenly have this big problem with memory safety is a weird narrative to push. And if you discover a vulnerability, well, fix it instead of wrapping it in Fil-C and making the whole thing slower.
403's until you go to https://graydon2.dreamwidth.org/ first
403 until you go to https://graydon2.dreamwidth.org/ first with JavaScript enabled and temporarily allow third-party scripts from awswaf.com.
I only got a captcha prompt on the direct link. Perhaps you have something that disables the captcha, and thus got 403'd?
Getting a "not available in your state" page, does anyone have an archive? I've only recently tried out fil-c and hope to use it in some work projects.
https://web.archive.org/web/20251107024022/https://graydon2....
I am very excited about this. Thanks to HN I see such things; too bad normal media is no longer interested in anything that isn't AI. On topic: I am quite sceptical about all this rewriting in Rust just because we can. Going Rust shrinks the pool of programmers willing to look at the code. A way to add these checks to C will, on the other hand, open the whole C community up to a needed thing: memory safety.
Garbage collection is GOFAI (the LISP folks came up with it) and of course GOFAI is AI.
As the article points out, this does not solve everything Rust does (beyond memory/performance, there are things like point 3). So new code would preferably be written in something like Rust (or certain other PLs). However, a lot of existing code is in C and most of it will stay in C, so Fil-C seems really useful there.
I can see the value this brings vs regular C, but I’m less clear on what this brings on top of -fbounds-safety
-fbounds-safety is awesome!
Here’s what Fil-C gives you that -fbounds-safety doesn’t:
- Fil-C gives you comprehensive memory safety while -fbounds-safety just covers bounds. For example, Fil-C panics on use after free and has well defined semantics on ptr-int type confusion.
- -fbounds-safety requires you to modify your code. Fil-C makes unmodified C/C++ code memory safe.
FWIW, I worked on -fbounds-safety and I still think it’s a good idea. :-)
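For readers who haven't seen it, here's a rough sketch of the kind of source change -fbounds-safety asks for, using the `__counted_by` annotation from the Clang proposal (the struct and function names are made up for illustration):

```c
#include <stddef.h>

/* The annotation tells the compiler which field bounds the pointer,
   so it can insert bounds checks on accesses through `data`. */
struct buffer {
    size_t len;
    int *__counted_by(len) data;   /* data must point to at least len ints */
};

int sum(const struct buffer *b) {
    int total = 0;
    for (size_t i = 0; i < b->len; i++)
        total += b->data[i];       /* checked against len */
    return total;
}
```

Fil-C, by contrast, derives bounds from each object's capability at runtime, so the unannotated original compiles as-is.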
It's always seemed obvious to me that it would be better to make C safer than it would be to rewrite the billions of lines of C that run all our digital infrastructure. Of course that will get pushback from people who care more about rewriting it in a specific language, but pragmatically it's the obvious solution. Nice to see stuff like Fil-C proving it's possible, and if the performance gap can get within 10% (which seems very possible) it would be a no-brainer.
It depends how much the C software is "done" vs being updated and extended. Some legacy projects need a rewrite/rearchitecting anyway (even well-written battle-tested code may stop meeting requirements simply due to the world changing around it).
It also doesn't have to be a complete all-at-once rewrite. Plain C can easily co-exist with other languages, and you can gradually replace it by only writing new code in another language.
Memory safety isn't the only benefit of rewriting C code in Rust. IMO it's maybe not even the biggest.
For example you also get a far stronger type system (leading to fewer logic bugs) and modern tooling.
Seems like soapboxing for Rust via backhanded compliments about this amazing tool. If anything, this tool makes rewriting in Rust that much less attractive. If C and C++ get tools like this that deliver 90% of the benefits of Rust without a rewrite or learning a new and equally complex language, then we can avoid needlessly fracturing the software world. I really think we were there before Fil-C, but this is potentially a game-changer.
I don’t think Fil-C supplants Rust; Rust still has a place for things like kernel development, where Fil-C wouldn’t be accepted because it wouldn’t work there. But Rust today also has significantly better performance and memory usage, so it makes more sense for greenfield projects that might otherwise consider C/C++. Not to mention that Rust as a language is drastically easier and faster to develop in, thanks to a modern package manager, a good, fast, cohesive standard library, true cross-platform support, and static catching of the issues that would otherwise make Fil-C crash at runtime, on top of better performance without extra effort.
Fil-C is an important tool to secure traditional software but it doesn’t yet compete with Rust in the places it’s competing with C and C++ in greenfield projects (and it may never - that’s ok - it’s still valuable to have a way to secure existing code without rewriting it).
And I disagree with the characterization of Graydon’s blog. It’s literally praising Fil-C and saying it’s a valuable piece of tech in the landscape of language dev and worth paying attention to as a serious way to secure a huge amount of existing code. The only position Graydon takes is that safety is a critically important quality of software and Fil-C is potentially an important part of the story of moving the industry forward.
> Seems like soapboxing for Rust via backhanded compliments about this amazing tool.
I'm not sure how you read it that way? To me it reads like "yes, this is a good and notable thing even if it's not perfect".
(The creator of Fil-C is also in this thread and doesn't appear to be reading it that way...)
Don't get me wrong, it sounds positive. A direct attack on Fil-C would have seemed mean-spirited so there is a lot of misdirection. Maybe the author doesn't even see what he's doing because he's so deep in it. But to me the message is clear. No matter what tools are developed for C and C++ to mitigate memory issues, Rust people will never concede that enough of these issues have been solved. They demand a complete rewrite of everything, or at least gradual replacement of all C and C++ code with Rust. Even if Rust is worse in other ways, does not deliver true safety, has technical shortcomings and worse licensing, etc.
This post is very polite compared to what I've seen from some Rust fanatics. But it still strikes me as talking down to the C and C++ community, as if these languages are beyond redemption because they don't work the same as Rust.
Graydon's post was about as full-throated an endorsement of Fil-C as you can get, including noting where its innovations could be used to improve Rust safety. The fact that you see undertones of some deep-seated Rust agenda to unseat C and C++ is, I think, more a reflection of just how far down the rabbit hole some Rust critics have gone, seeing so-called Rust zealots hiding in every shadow of the internet.
From my skim, I didn't see anything mean spirited.
So far as I've seen, Graydon is not a zealot and he doesn't play political games. It was a shame to lose his guiding hand.
> deliver 90% of the benefits of Rust without a rewrite
Rust with 1/4 of the speed doesn't feel like 90% of the benefits of Rust. I'm sure the author will make Fil-C faster in time, but Rust is always going to be much faster.
Maybe always faster, but perhaps not always much faster.
I wasn't suggesting that you should run everything with Fil-C all the time. If you run it sometimes, you're likely to catch most problems. The ideal tool would be CHERI or something. I think Rust makes a big mistake with its maximal error checking on every compile, among its other flaws. Rust compile times are high compared to similar C++ code. The compiler has a large amount of necessary complexity that comes into play every time you rebuild, even after changing only a few lines. Of course, C++ has higher compile times than Go and C, and probably some other languages, but those are fairly different languages with different error modes.
Let me put it another way. We could say that documentation, code formatting, and even profiling, all have a place in development. So would running a model checker or something. But we don't generally make compilers implement these features and make that a mandatory part of the build. I think the complex memory borrowing scheme of Rust unnecessarily forces the compiler to check a lot of extra stuff specific to a certain class of problem. You don't get a choice about how paranoid you need to be about errors. You just have to eat that long compile time, every time.
And people who just don't like the Rust "style" and would rather write new software in a familiar language, with all the features (like classic OOP) they are used to.
You can use classic OOP in Rust (even implementation inheritance, via the generic typestate pattern!) It's just not a good idea.
When I heard that writing a linked list in Rust is a challenging problem, I knew it wasn't for me lol.
There are many languages in this space to choose from.
There's something off-putting about how Rust people communicate. It's like you're talking to people selling something, or people with ulterior motives. They love talking about memory safety as if it's the only thing standing between humanity and secure, reliable software. They love external audiences, because their brand of fearmongering, virtue signaling, and panacea-like solutions resonates with a certain kind of risk-averse decision maker, who tend to be in power in uncomfortable numbers.
First of all, Rust's 'fearless concurrency' largely boils down to 'no concurrency' - Rust has about as much concurrency as Javascript - you can copy objects between threads, but not share memory beyond that, with certain libraries allowing some escape hatches in Rust's case.
Additionally, the case for aliasing control leading to better, safer code that's easier to reason about just isn't really true in practice - it's rare that, for example, your function can accidentally alias memory in strictly typed languages, and when it does it's usually intentional. The concern pretty much only manifests in memcpy, because the ugly hack of 'strict aliasing' - assuming two pointers of incompatible types point to different bits of memory - works very well in practice.
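To spell out what that strict-aliasing assumption buys the optimizer (a textbook example, not tied to any particular codebase):

```c
/* Because a and b have incompatible pointed-to types, a strict-aliasing
   compiler assumes they never refer to the same memory, so it may keep
   *a in a register and fold the return value to a constant. */
int combine(int *a, float *b) {
    *a = 1;
    *b = 2.0f;   /* assumed not to touch *a */
    return *a;   /* can be optimized to: return 1; */
}
```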
It even helps with situations people complain about in Rust, like when objects A and B (both mutably borrowed) each take a reference to some common service object (which is super common) - but that sort of code simply doesn't compile in Rust.
All in all, I don't dislike Rust as it is, but I do dislike the project's tendency toward activism that tries to bully its technical skeptics into submission (which is unfortunately what all activism really is - when you've lost the argument, be louder than the other guy and paint him as reprehensible). I wish they had focused on fixing the technical issues instead. There has been research into ownership schemes; some exist that are less restrictive than Rust's while offering the same safety guarantees.
In my personal opinion Rust is not done on the conceptual level - by 'done' I mean Rust serving its purpose as the language it claims to be. Maybe there will be Rust 2.0 which will overhaul the ownership system completely or maybe there'll be another language that will do what Rust does but better.
Edit: I wish I could claim I'm some sort of tin-foil conspiracy theorist, but I'm commenting under an article written by one of the key people behind Rust, and it reeks of this attitude.
> First of all, Rust's 'fearless concurrency' largely boils down to 'no concurrency' - Rust has about as much concurrency as Javascript - you can copy objects between threads, but not share memory beyond that
...this is just blatantly false? Like it is false to the extent that I am confused as to what you could even possibly be talking about - I don't know how anyone who has actually written non-trivial programs in both languages could come to the conclusion that they have the same memory model (or much of anything in common when it comes to threads, really).
THERE IS NO ALTERNATIVE TO ADOPTION OF THOUGHTFUL DESIGN AND PRINCIPLES.