For Toolchain Builders — Ship More, Maintain Less

This part is for tooling developers — those building new languages, compilers, debuggers, or targeting alternate EVMs and virtual machines. If you're only looking to use solx to compile smart contracts, Part 1 should cover everything you need.

Even so, we recommend starting with Part 1 for context, then continuing here.

With two man-years of engineering effort, solx already produces more gas-efficient runtime code than solc’s --via-ir --optimize pipeline - and it does so before we have tuned the LLVM optimizer or implemented any EVM-specific optimizations.

Thanks to LLVM, solx’s optimizer and code generator are under 8,000 lines of code and far easier to maintain than a custom pipeline. Much of the complexity typical of compiler development can be offloaded to LLVM’s existing infrastructure.

In this part, we briefly cover solx’s internal structure, how it can be reused or extended, and what it offers if you’re planning to:

develop a new language or adapt an existing one for smart contract development,
implement an experimental EVM feature,
or retarget Solidity to something like RISC-V.

solx Architecture Overview

solx combines the standard Solidity front-end with a new LLVM-based back-end.

It reuses the entire solc front-end (lexer, parser, AST), ensuring full Solidity support and compatibility with language updates. From there, solc lowers the AST to Yul IR, which we translate into LLVM IR.

We also support solc’s legacy pipeline. Since it doesn't expose an IR, we lift the generated EVM assembly into LLVM IR. Because we observed that some contracts which compile cleanly with the legacy pipeline fail or behave inconsistently under --via-ir, we’ve made the legacy path the default - just as solc does today. Users must explicitly pass --via-ir to use the Yul-based flow.

Once in LLVM IR, solx applies standard LLVM optimizations, then hands off to our custom EVM backend. This handles instruction selection, scheduling, stackification, and assembly or binary emission - reusing as much of LLVM’s infrastructure as possible.
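
To make the two paths easier to compare, here is a minimal sketch that models them as ordered lists of stages. The stage names are descriptive labels taken from the text above, not real solx identifiers, and the `pipeline` function is hypothetical.

```python
# Illustrative model of the two solx compilation paths described above.
# Stage names are descriptive labels, not real solx identifiers.

FRONT_END = ["solc lexer", "solc parser", "solc AST"]
BACK_END = [
    "LLVM optimizations",
    "instruction selection",
    "scheduling",
    "stackification",
    "assembly or binary emission",
]


def pipeline(via_ir: bool = False) -> list[str]:
    """Return the ordered stages for the legacy (default) or Yul-based flow."""
    if via_ir:
        # --via-ir: solc lowers the AST to Yul, which is translated to LLVM IR.
        middle = ["Yul IR", "translate Yul to LLVM IR"]
    else:
        # Default: solc's legacy pipeline exposes no IR, so the generated
        # EVM assembly is lifted into LLVM IR instead.
        middle = ["EVM assembly (legacy codegen)", "lift EVM assembly to LLVM IR"]
    return FRONT_END + middle + BACK_END


if __name__ == "__main__":
    for flag in (False, True):
        label = "--via-ir" if flag else "default"
        print(label, "->", " | ".join(pipeline(flag)))
```

Both flows share the same front end and the same back end; only the two middle stages differ, which is why everything after LLVM IR can be reused unchanged.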

This design let us deliver a working pre-alpha with two man-years of engineering effort, but it comes with trade-offs: some optimizations are currently inhibited, and binaries can be larger. We briefly touched on these issues in Part 1. To address them, we're developing a new high-level Solidity IR based on LLVM's MLIR framework.

But let’s start by focusing on what solx offers today as compiler infrastructure.

Why Modularity Matters: Solidity for RISC-V, Rust for EVM

One of LLVM’s most important innovations, dating back to the early 2000s, was its Intermediate Representation. At the time, most compilers tightly coupled language-specific ASTs with target-specific code generators. LLVM IR introduced a complete and self-contained representation of programs, including type information and debug metadata. This let compiler components and tools operate entirely at the IR level without requiring access to language-specific internals.

Frontend developers started to rely on a simple and stable interface - LLVM IR - without needing to understand code generation or debug formats. Backend developers only needed to support that IR format, regardless of how the source language worked. The optimizer, meanwhile, transformed IR and automatically updated associated metadata and debug info. And in 2025, LLVM’s optimizer preserves debug information in optimized builds reasonably well, thanks in large part to demands from the gaming industry, which needs fully optimized code to remain debuggable.

LLVM’s modularity is what makes solx adaptable.

Vitalik Buterin recently floated the idea of replacing the EVM with a RISC-V–based machine for Ethereum’s execution layer. If that ever happens, solx is ready - we estimate that adapting our LLVM IR from EVM to RISC-V would require less than 10% of the IR to change. In contrast, compilers like solc and Vyper would face a much steeper challenge. They would either need to start emitting LLVM IR or implement instruction selection, scheduling, register allocation, and binary emission for RISC-V from scratch.

This flexibility goes beyond backends. With LLVM, writing a new language for Ethereum requires just two entry points: IRBuilder for IR construction, and DIBuilder for debug info. You don’t need to know how the EVM works, or what ethdebug expects. You just write to the IR.
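
As a rough illustration of what "writing to the IR" means, the sketch below emits textual LLVM IR for a two-operand function. It is a toy stand-in: a real front end would call LLVM's IRBuilder API rather than format strings, and `emit_binary_function` is a hypothetical helper. The `i256` type is chosen because the EVM word is 256 bits wide; whether solx's IR uses exactly this shape is an assumption.

```python
# Toy stand-in for a front end that "just writes to the IR".
# A real front end would use LLVM's IRBuilder API; this sketch formats the
# equivalent textual LLVM IR by hand. `emit_binary_function` is hypothetical.


def emit_binary_function(name: str, opcode: str, bits: int = 256) -> str:
    """Emit an LLVM IR function that applies `opcode` to two integers."""
    ty = f"i{bits}"  # LLVM IR supports arbitrary-width integers such as i256
    return "\n".join([
        f"define {ty} @{name}({ty} %a, {ty} %b) {{",
        "entry:",
        f"  %result = {opcode} {ty} %a, %b",
        f"  ret {ty} %result",
        "}",
    ])


if __name__ == "__main__":
    print(emit_binary_function("add_words", "add"))
```

Nothing in the emitted function mentions the EVM stack, opcodes, or debug formats; those concerns stay on the backend side of the IR boundary.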

That puts the widely discussed idea of Rust for EVM within reach. We’re not working on this right now, and Rust isn’t a language designed for smart contract development. Still, making Rust target the EVM subset of LLVM IR that we designed could be done with about one engineer-year of effort.

Modularity also means the ability to focus - without having to maintain a large codebase.

Smaller Codebase, Easier Maintenance

At first glance, relying on LLVM might seem counterintuitive - it's a large system, and integrating it can appear to add complexity. But think of the alternative. Without reuse, not just optimizations, but all the foundational infrastructure - IR design, printing, parsing, assembly emission, and more - must be built from scratch and then maintained. In contrast, for solx we only maintain the translation to LLVM IR and a custom LLVM backend for the EVM. To give you a sense of the difference, below is the cloc output for the EVM backend in solx compared to libyul and libevmasm from solc.

> cloc solidity/libyul
---------------------------------------------------------------------------
Language                     files          blank        comment           code
---------------------------------------------------------------------------
C++                             96           2652           2269          15585
C/C++ Header                   108           1784           3781           5597
CMake                            1              2              0            207
Markdown                         1              2              0              3
---------------------------------------------------------------------------
SUM:                           206           4440           6050          21392
---------------------------------------------------------------------------

> cloc solidity/libevmasm
---------------------------------------------------------------------------
Language                     files          blank        comment           code
---------------------------------------------------------------------------
C++                             20            716            649           6603
C/C++ Header                    24            583            995           2601
CMake                            1              1              0             47
---------------------------------------------------------------------------
SUM:                            45           1300           1644           9251
---------------------------------------------------------------------------

> cloc --force-lang="C/C++ Header",def llvm/lib/Target/EVM
---------------------------------------------------------------------------
Language                     files          blank        comment           code
---------------------------------------------------------------------------
C++                             39           1180           1400           5661
C/C++ Header                    24            366            434           1198
TableGen                         4            241            191            915
CMake                            4             16              0            106
LLVM IR                          1             15              0             69
---------------------------------------------------------------------------
SUM:                            72           1818           2025           7949
---------------------------------------------------------------------------
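
Taking the SUM rows at face value, the comparison works out as follows. The snippet only redoes the arithmetic from the three tables above; the variable names are ours.

```python
# Code line counts copied from the SUM rows of the cloc tables above.
SOLC_LIBYUL = 21392
SOLC_LIBEVMASM = 9251
SOLX_EVM_BACKEND = 7949

solc_total = SOLC_LIBYUL + SOLC_LIBEVMASM
ratio = solc_total / SOLX_EVM_BACKEND

if __name__ == "__main__":
    print(f"libyul + libevmasm: {solc_total} lines")
    print(f"solx EVM backend:   {SOLX_EVM_BACKEND} lines")
    print(f"ratio:              {ratio:.1f}x")
```

That is 30,643 lines against 7,949, or a little under four times as much code on the solc side for the components compared here.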

But that’s not the whole story. LLVM does not natively support stack machines, so we implemented stackification ourselves, drawing inspiration from both the designs used in solc and LLVM’s WebAssembly backend. This significantly increased the amount of code we maintain today.

Still, one can imagine generalizing stackification across stack-based targets - similar to how LLVM handles register machines through declarative target descriptions and shared algorithms. In LLVM, most of the code generation logic - like instruction selection, register allocation, and scheduling - is written once and reused across architectures. Targets such as x86, ARM, or RISC-V don’t each implement these from scratch; instead, they define their register sets, constraints, and instruction mappings declaratively. The LLVM backend then uses a shared pipeline to apply this information in a uniform way.
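
The "describe the target declaratively, share the algorithm" pattern can be sketched in a few lines. This is only an illustration of the idea: the tables below are not LLVM's TableGen format, and `TargetDescription` and `select_instructions` are hypothetical names. The mnemonics themselves are real EVM opcodes and WebAssembly instructions.

```python
# Sketch of "describe the target declaratively, share the algorithm".
# The tables and names below are illustrative, not LLVM's TableGen format.
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetDescription:
    name: str
    mnemonics: dict  # generic operation -> target instruction


# Two stack-based targets, described purely as data.
EVM = TargetDescription("evm", {"add": "ADD", "mul": "MUL"})
WASM = TargetDescription("wasm", {"add": "i32.add", "mul": "i32.mul"})


def select_instructions(ops, target):
    """Shared selector: the same code serves every described target."""
    return [target.mnemonics[op] for op in ops]


if __name__ == "__main__":
    program = ["add", "mul"]
    for target in (EVM, WASM):
        print(target.name, select_instructions(program, target))
```

Adding a third target here means adding one more table, not another selector. A generalized stackification pass would aim for the same property at a much larger scale.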

If integrated upstream, a generalized stackification would be a meaningful contribution to LLVM itself - advancing support for all stack-based architectures and saving significant human-hours across the compiler ecosystem. It would also reduce the solx maintenance burden in the long run. Here is the line count for our current stackification logic.

---------------------------------------------------------------------------
Language                     files          blank        comment           code
---------------------------------------------------------------------------
C++                              9            530            701           2759
C/C++ Header                     4            103            120            467
---------------------------------------------------------------------------
SUM:                            13            633            821           3226
---------------------------------------------------------------------------

This reflects a broader paradigm common in Web2 tooling: upstream whatever can be upstreamed. Maintaining compilers is expensive, and reducing custom logic is a proven way to keep development velocity high in the long run.

Tooling Drives Compiler Development Speed

Small teams rarely have time to build tools that support compiler development. But since solx is built on LLVM, it gets them for free. Here are three tools that, in our experience, have each saved us at least a human-day during the week we were working on this blog post.

Note: To try these tools yourself, you’ll need to build our LLVM fork - these tools aren’t bundled with solx directly.

`opt` and `llc`: Step-by-Step Transformation Tracking

opt and llc are LLVM’s command-line tools for running optimization passes and lowering IR to target code, respectively—and they’re invaluable for understanding what the compiler is doing.

In Part 1, we analyzed a contract’s performance and identified which transformations contributed to the observed gas savings. With LLVM, this kind of investigation is straightforward: opt -print-changed shows which passes modified the IR, and --debug reveals how those transformations were applied—or why some didn’t happen. The same applies to llc, which provides similar insights for the code generation phase.
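
A small helper like the one below can assemble these invocations. It is a sketch under assumptions: `contract.ll` is a placeholder file name, the `default<O2>` pipeline is our choice, and using `-print-after-all` for llc is one option among several. The flags themselves are standard LLVM options, though which ones your build of the fork accepts may differ (`-debug`, for instance, requires an assertions-enabled build).

```python
# Builds opt/llc command lines for inspecting transformations.
# The file name and pass pipeline are placeholders chosen for illustration.
import shlex


def opt_command(ir_path: str, passes: str = "default<O2>", debug: bool = False) -> list[str]:
    """Build an `opt` invocation that reports which passes changed the IR."""
    cmd = ["opt", f"-passes={passes}", "-print-changed", "-disable-output"]
    if debug:
        # -debug is only available in LLVM builds with assertions enabled.
        cmd.append("-debug")
    return cmd + [ir_path]


def llc_command(ir_path: str, debug: bool = False) -> list[str]:
    """Build an `llc` invocation that dumps the code after each codegen pass."""
    cmd = ["llc", "-print-after-all", "-o", "/dev/null"]  # Unix-style null sink
    if debug:
        cmd.append("-debug")
    return cmd + [ir_path]


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without LLVM.
    print(shlex.join(opt_command("contract.ll")))
    print(shlex.join(llc_command("contract.ll", debug=True)))
```

The printed lines can be pasted into a shell, or passed to `subprocess.run` once the tools from the LLVM fork are on your PATH.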

`bugpoint`: Automatic Test Case Reduction

bugpoint is LLVM’s built-in tool for automatically minimizing test cases that trigger miscompilations or crashes.

While working on a stack-too-deep issue we found in a contract, we used bugpoint to reduce the size of problematic LLVM IR inputs—shrinking one from 67.6 KB to 8.6 KB of textual LLVM IR representation, and another from 75.3 KB to just 5.9 KB. These reductions made debugging faster and more focused.
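
To show the idea behind automatic reduction, here is a deliberately simple greedy reducer. It is not bugpoint's algorithm: bugpoint works on IR structure such as functions, blocks, and instructions, while this toy drops one text line at a time. `reduce_lines` and the failure predicate are hypothetical.

```python
# Toy test-case reducer: greedily removes lines while the failure persists.
# This illustrates the idea only; bugpoint reduces IR structurally.


def reduce_lines(lines, still_fails):
    """Return a smaller list of lines for which `still_fails` is still True."""
    if not still_fails(lines):
        raise ValueError("input does not reproduce the failure")
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if still_fails(candidate):
                # The line was irrelevant to the failure: drop it and rescan.
                lines = candidate
                changed = True
                break
    return lines


if __name__ == "__main__":
    # Hypothetical failure: it needs both "x" and "y" to be present.
    def fails(ls):
        return "x" in ls and "y" in ls

    print(reduce_lines(["x", "a", "y", "b"], fails))
```

The result is minimal in the sense that removing any single remaining line makes the failure disappear, which is the property that makes reduced inputs so much easier to debug.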

`llvm-lit` and `update_llc_test_checks.py`: Making Regression Testing Simple

The contract reductions from the previous example were saved along with the bug fixes they exposed. Stackification is difficult to debug, and minimal test cases that trigger specific corner cases are worth preserving. LLVM’s testing tools make that easy: with llvm-lit, you can add IR-based tests that specify how to run the compiler and what to check. update_llc_test_checks.py then auto-generates assembly checks based on the current output.
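
For readers who haven't seen one, an IR-based test is an IR file with a `RUN:` line saying how to invoke the compiler and `CHECK:` lines saying what the output must contain. The sketch below embeds such a file and a much-simplified matcher, as a stand-in for LLVM's FileCheck: it only verifies that each `CHECK:` pattern appears, in order, as a substring. The test contents and the sample output are made up for illustration and are not real solx output.

```python
# Simplified stand-in for FileCheck: ordered substring matching of CHECK lines.
# The embedded test and the sample output are invented for illustration.

LIT_TEST = r"""
; RUN: llc < %s | FileCheck %s
define i256 @add_words(i256 %a, i256 %b) {
; CHECK: add_words:
; CHECK: ADD
  %r = add i256 %a, %b
  ret i256 %r
}
"""


def extract_checks(test_text, prefix="CHECK"):
    """Collect the patterns that follow '; CHECK:' in the test file."""
    marker = f"; {prefix}:"
    return [
        line.split(marker, 1)[1].strip()
        for line in test_text.splitlines()
        if marker in line
    ]


def checks_pass(test_text, output):
    """True if every CHECK pattern occurs in `output`, in the given order."""
    pos = 0
    for pattern in extract_checks(test_text):
        found = output.find(pattern, pos)
        if found < 0:
            return False
        pos = found + len(pattern)
    return True


if __name__ == "__main__":
    placeholder_output = "add_words:\n    ADD\n"  # not real compiler output
    print(checks_pass(LIT_TEST, placeholder_output))
```

In the real workflow you would not write the `CHECK:` lines by hand: update_llc_test_checks.py regenerates them from the compiler's current output, and llvm-lit runs the `RUN:` line for you.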

These tools haven’t saved us a day yet on stackification - but based on prior experience, we know they will.

LLVM for Ethereum

solx isn't only a command-line compiler - it’s a rethink of how Ethereum tooling can be built.

By leveraging LLVM, solx provides a modular, maintainable, and extensible compiler infrastructure. Teams can specialize, collaborate, and reuse tooling instead of rebuilding everything from scratch.

It also challenges a long-standing belief: that the EVM is too different for general-purpose tooling. Yes, the EVM has unique constraints. But it shares more with other platforms than often acknowledged. Like embedded systems, it imposes strict size limits, making optimization a correctness issue. Like game engines, debugging optimized builds is essential - replaying transactions of optimized bytecode requires it. And EVM isn’t even the only stack-based VM; LLVM already supports WebAssembly.

LLVM for Ethereum isn’t theoretical anymore - it’s here.

And with LLVM, tools like llvm-cov for coverage, lldb and VS Code for step-debugging, and other mainstream dev tools are within reach.

Building With Us

If you're working on dev tools, solx is ready to be integrated and extended to fit your needs. Here’s how you can start collaborating with us.

🧪 Integrate solx into dev tools: From a CLI perspective, solx works as a drop-in solc replacement. But if you encounter edge cases, we’re happy to help. If you need access to internal LLVM options not exposed in the CLI or to LLVM tools, we’re happy to assist with that as well.
🤝 Contribute to solx: solx is an open-source project, and we welcome contributions - with no gatekeeping. Small additions like tests, bug fixes, or features are always welcome. If you’re planning something larger, we recommend reaching out early - it helps align efforts and avoid duplicate work. We do expect major contributions to be maintained.
💬 Stay in touch: You can reach us via Telegram or by email at solx@matterlabs.dev. We’re always interested in hearing what you’re building, what you need, and where we can make things easier.