Last updated: 2024-03-28 21:36:36

Improved Multithreading in wgpu - Arcanization Lands on Trunk

gfx-rs nuts and bolts, 2023-11-24

Arcanization is a large refactoring of wgpu’s internals aimed at reducing lock contention and providing better performance when using wgpu on multiple threads. It was just merged into wgpu’s trunk branch and will be published as part of the 0.19 release, scheduled for around January 17th.

A Long Journey

Before diving into the technical details, let’s have a quick look at the history of this project. The work started sometime around mid-2021, with significant involvement from @pythonesque, @kvark and @cwfitzgerald. It went through multiple revisions, moving from one person to the next, until @gents83 picked it up and opened a pull request on March 30th 2023.

Fast-forward to November 20th: after countless rebases, revisions and fixes by @gents83 spanning nearly 8 months, the pull request is finally merged! They tirelessly maintained this big and complex refactoring, all while the project was constantly changing and improving underneath them!

The Problem

wgpu internally stores all resources (buffers, textures, bind groups, etc.) in big contiguous arrays held by what we call the Hub.

Most of the data stored in these arrays is immutable: once created, it never changes until the resource is destroyed. Inside and outside wgpu, resources are referred to by Ids, which boil down to indices into the resource arrays plus some metadata.

A simplified diagram showing the Hub and resource arrays

This should play well with parallel access to the data from multiple threads, right? Unfortunately, adding and removing resources requires mutable access to these resource arrays, which meant adding locks: locks when adding or removing items, but also locks while reading the data they contain. Locks everywhere, and locks that have to be held for a non-negligible duration. This caused a lot of lock contention and poor performance when wgpu was used on multiple threads.
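To make the problem concrete, here is a minimal, std-only sketch of that storage model, assuming a single `RwLock` around an array of resources stored by value. `Registry`, `Buffer`, and `Id` are hypothetical simplifications, not wgpu_core’s actual types (real Ids also carry metadata, omitted here). The key point is `with_buffer`: because the reference points into the locked array, the read lock must stay held for as long as the caller uses the data.

```rust
use std::sync::RwLock;

/// Hypothetical, simplified Id: an index into the resource array.
/// (Real wgpu Ids also carry metadata, which is omitted here.)
#[derive(Clone, Copy, Debug, PartialEq)]
struct Id(usize);

/// Stand-in for a resource; only holds a size for illustration.
#[derive(Debug)]
struct Buffer {
    size: u64,
}

/// Resources are stored by value in one array behind one lock.
struct Registry {
    slots: RwLock<Vec<Option<Buffer>>>,
}

impl Registry {
    fn new() -> Self {
        Registry { slots: RwLock::new(Vec::new()) }
    }

    /// Adding takes the write lock, blocking every reader meanwhile.
    fn add(&self, buffer: Buffer) -> Id {
        let mut slots = self.slots.write().unwrap();
        slots.push(Some(buffer));
        Id(slots.len() - 1)
    }

    /// Removing also takes the write lock.
    fn remove(&self, id: Id) -> Option<Buffer> {
        let mut slots = self.slots.write().unwrap();
        slots.get_mut(id.0).and_then(|slot| slot.take())
    }

    /// The reference passed to `f` points into the locked array, so the
    /// read lock stays held for as long as `f` runs.
    fn with_buffer<R>(&self, id: Id, f: impl FnOnce(&Buffer) -> R) -> Option<R> {
        let slots = self.slots.read().unwrap();
        let buffer = slots.get(id.0)?.as_ref()?;
        Some(f(buffer))
    }
}

fn main() {
    let registry = Registry::new();
    let id = registry.add(Buffer { size: 256 });
    // While this closure runs, an add or remove on another thread would block.
    let size = registry.with_buffer(id, |buffer| buffer.size);
    println!("size: {:?}", size);
    registry.remove(id);
    println!("after removal: {:?}", registry.with_buffer(id, |buffer| buffer.size));
}
```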

Interestingly, wgpu also had to maintain internal reference counts to resources to keep track of the dependencies between them (for example, a bind group depends on the bindings it refers to). This reference counting was carried out manually and was rather error-prone.

The Solution

“Arcanization”, as its name implies, was the process of moving resources behind atomically reference-counted pointers (Arc<T>). Today the Hub still holds resource arrays, but these contain Arcs instead of the data directly. This lets us hold the locks for much shorter times, in many cases only while cloning the Arc; the resource can then be read safely outside of the critical section. In addition, some areas of the code don’t need to hold locks at all once the reference has been extracted.
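A minimal std-only sketch of this idea, assuming a single `RwLock` around a `Vec` of optional Arcs. `Registry`, `Buffer`, and `Id` are hypothetical simplifications rather than wgpu_core’s actual types. `get` holds the read lock only long enough to clone the Arc; after that, the caller reads the resource without any lock.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified Id: an index into the resource array.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Id(usize);

/// Stand-in for a resource; only holds a size for illustration.
#[derive(Debug)]
struct Buffer {
    size: u64,
}

/// The resource array now holds Arcs instead of the data itself.
struct Registry {
    slots: RwLock<Vec<Option<Arc<Buffer>>>>,
}

impl Registry {
    fn new() -> Self {
        Registry { slots: RwLock::new(Vec::new()) }
    }

    fn add(&self, buffer: Buffer) -> Id {
        let mut slots = self.slots.write().unwrap();
        slots.push(Some(Arc::new(buffer)));
        Id(slots.len() - 1)
    }

    /// Removing only drops the registry's reference; anyone still holding
    /// a clone keeps the resource alive.
    fn remove(&self, id: Id) -> Option<Arc<Buffer>> {
        let mut slots = self.slots.write().unwrap();
        slots.get_mut(id.0).and_then(|slot| slot.take())
    }

    /// The read lock is held only while cloning the Arc; the guard is
    /// released when `slots` goes out of scope at the end of the function.
    fn get(&self, id: Id) -> Option<Arc<Buffer>> {
        let slots = self.slots.read().unwrap();
        slots.get(id.0)?.clone()
    }
}

fn main() {
    let registry = Registry::new();
    let id = registry.add(Buffer { size: 256 });
    let buffer = registry.get(id).expect("buffer exists");
    // No lock is held here, so other threads could add or remove
    // resources while this thread reads `buffer`.
    registry.remove(id);
    println!("still readable: {} bytes", buffer.size);
}
```

Because the caller owns a clone of the Arc, removing the resource from the registry does not invalidate it; the resource is freed only when the last clone is dropped.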

A simplified diagram showing resources stored via Arcs

The result is much lower lock contention. If you use wgpu from multiple threads, this should significantly improve performance. Our friends in the Bevy engine community noted that some very early testing (on an older revision of arcanization) showed that, with arcanization, the encoding of shadow-related commands can run in parallel with the main passes, yielding a 45% frame time reduction on a test scene (the famous Bistro scene) compared to their single-threaded configuration. Without arcanization, lock contention is too high for parallel encoding to significantly improve performance.

In addition, wgpu’s internals are now simpler. This change lifts some restrictions and opens the door to further performance and ergonomics improvements.

wgpu 0.19

The next release featuring this work will be 0.19.0 which we expect to publish around January 17th. We made sure to merge the changes early in the release cycle to give ourselves as much time as we can to catch potential regressions.

This is an absolutely massive change, and while we have been testing it as thoroughly as we can, we need help from everyone else. Please try updating your project to the latest wgpu and running it, and report any issues you find!

What’s next?

Lifting RenderPass<'a> lifetime restrictions

If you have used wgpu, there is a decent chance that you have had to work around the restrictions imposed by the 'a lifetime on many of RenderPass’s methods, such as set_bind_group, set_pipeline, and set_vertex_buffer. The recent changes give us the opportunity to store Arcs where &'a references were previously needed, which should let us remove these lifetime restrictions.
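The difference can be illustrated with hypothetical stand-in types; `BindGroup`, `BorrowingPass`, and `ArcPass` are not wgpu’s API, and the eventual design may differ. A pass that stores `&'a` references forces every bind group to outlive the pass, whereas a pass that stores Arcs can accept a bind group created in a short-lived helper function.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a bind group (not the wgpu type).
#[derive(Debug)]
struct BindGroup {
    label: String,
}

/// Borrowing style: stores `&'a` references, so every bind group must
/// outlive the pass that records it.
struct BorrowingPass<'a> {
    bind_groups: Vec<&'a BindGroup>,
}

impl<'a> BorrowingPass<'a> {
    fn set_bind_group(&mut self, bind_group: &'a BindGroup) {
        self.bind_groups.push(bind_group);
    }
}

/// Arc style: the pass holds its own reference count on each bind group.
struct ArcPass {
    bind_groups: Vec<Arc<BindGroup>>,
}

impl ArcPass {
    fn set_bind_group(&mut self, bind_group: Arc<BindGroup>) {
        self.bind_groups.push(bind_group);
    }
}

/// Creates a bind group in a short-lived scope and records it.
/// The equivalent with `BorrowingPass` would not compile: the bind group
/// would be dropped at the end of this function while still borrowed.
fn record_temporary(pass: &mut ArcPass, label: &str) {
    let bind_group = Arc::new(BindGroup { label: label.to_string() });
    pass.set_bind_group(bind_group);
}

fn main() {
    // Borrowing style: works because `long_lived` outlives `borrowing`.
    let long_lived = BindGroup { label: "long-lived".to_string() };
    let mut borrowing = BorrowingPass { bind_groups: Vec::new() };
    borrowing.set_bind_group(&long_lived);
    println!("borrowing pass: {}", borrowing.bind_groups[0].label);

    // Arc style: the temporary bind group lives as long as the pass needs it.
    let mut pass = ArcPass { bind_groups: Vec::new() };
    record_temporary(&mut pass, "temporary");
    println!("arc pass: {}", pass.bind_groups[0].label);
}
```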

Internal improvements

There is ongoing work to ensure that buffers, textures, and devices can be destroyed safely while their handles are still alive. This is important for Firefox, which uses wgpu_core as the basis for its WebGPU implementation. In the garbage-collected environment of JavaScript, the deallocation of resources is non-deterministic and can happen a long time after the program is done using them. While this in itself does not require arcanization, arcanization gives us a better foundation for improving internal resource lifetime management.

Reference counting at the API level

So resources like buffers and textures are now internally reference counted, but the handles wgpu exposes are not. Could we potentially expose the reference-counted resources more directly, avoiding going through the Hub? Most likely, yes. That would be another fairly large project with important implications for wgpu_core’s recording infrastructure and how it integrates into Firefox. It won’t happen overnight, but it is certainly something the wgpu maintainers would like to move towards.

Closing words

Changes of this scope and complexity take tremendous effort to realize, and take orders of magnitude more effort to push over the finish line. @gents83’s achievement here is truly outstanding. He poured an endless amount of time, effort, and patience into this work, which we now all benefit from, and deserves equally endless amounts of recognition for it.

Thanks @gents83!


Release of wgpu v0.13 and Call for Testing

gfx-rs nuts and bolts, 2022-06-30

The gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is a portable graphics API. It provides safe, accessible, and portable access to the GPU.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

After a long gap between releases, we have just rolled out v0.13 of wgpu and v0.9 of naga! See the wgpu v0.13 changelog and the naga v0.9 changelog for details and a migration guide.

In the meantime, we’ve been hard at work improving both wgpu’s implementation and its user-facing experience.

Performance and Correctness

In this release, we’ve focused on improving both performance and correctness. Resource tracking, one of our biggest bottlenecks, has been significantly improved and is no longer the biggest bottleneck. More performance improvements are coming in the near future.

There have been many bugs fixed in this release on all backends.

naga Improvements

naga, our shader translator, has improved substantially.

All backends and frontends are now even more solidly tested, with a truly massive number of bugs fixed.

Additionally, naga now supports the newest revision of the WGSL spec, bringing it back in line with other WebGPU projects. See the wgpu changelog for transition details.

Presentation and Pipelining

We have focused some of our attention on improving the interface for surface management and presentation. Most importantly, we now allow a greater set of presentation modes (Mailbox, Fifo, FifoRelaxed, and Immediate) and have replaced implicit fallback with explicit “Automatic” modes that have defined fallback paths (AutoVsync and AutoNoVsync). Additionally, surfaces now expose the full set of texture formats that can be used with them, not just their most preferred format. This should pave the way for HDR and more explicit color space support.
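As a rough illustration of explicit fallback, the sketch below resolves a requested mode against the modes a surface reports as supported. The `PresentMode` enum is a local mirror of the variant names above, not the wgpu type, and the fallback orders (FifoRelaxed then Fifo for AutoVsync; Immediate, then Mailbox, then Fifo for AutoNoVsync) are our reading of wgpu’s documentation for those variants. In practice wgpu performs this resolution for you when the surface is configured.

```rust
/// Local mirror of the present modes named above; not the wgpu enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PresentMode {
    AutoVsync,
    AutoNoVsync,
    Fifo,
    FifoRelaxed,
    Immediate,
    Mailbox,
}

/// Assumed fallback order for AutoVsync.
const AUTO_VSYNC_ORDER: &[PresentMode] = &[PresentMode::FifoRelaxed, PresentMode::Fifo];
/// Assumed fallback order for AutoNoVsync.
const AUTO_NO_VSYNC_ORDER: &[PresentMode] =
    &[PresentMode::Immediate, PresentMode::Mailbox, PresentMode::Fifo];

/// Picks the first supported mode from an automatic mode's fallback list.
/// Explicit modes are returned only if supported: no silent fallback.
fn resolve(requested: PresentMode, supported: &[PresentMode]) -> Option<PresentMode> {
    let candidates = match requested {
        PresentMode::AutoVsync => AUTO_VSYNC_ORDER,
        PresentMode::AutoNoVsync => AUTO_NO_VSYNC_ORDER,
        explicit => return supported.contains(&explicit).then_some(explicit),
    };
    candidates.iter().copied().find(|mode| supported.contains(mode))
}

fn main() {
    use PresentMode::*;
    // A hypothetical surface that supports only Fifo and Mailbox.
    let supported = [Fifo, Mailbox];
    println!("{:?}", resolve(AutoVsync, &supported)); // Some(Fifo)
    println!("{:?}", resolve(AutoNoVsync, &supported)); // Some(Mailbox)
    println!("{:?}", resolve(Immediate, &supported)); // None
}
```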

Additionally, we have changed BufferSlice::map_async from returning a future that resolves when the mapping is complete to calling a callback when the mapping is complete. We have received a sizable amount of feedback about how hard the futures-based API was to use and how easily it led to deadlocks or very poor performance. The callback-based API makes it clearer what is actually happening under the hood and discourages the usage patterns that caused issues.
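The sketch below is a std-only toy model of the callback pattern, not wgpu’s API: `ToyDevice`, `map_async`, and `poll` here are hypothetical stand-ins. It illustrates the point above: the callback fires only when the device makes progress (here, when `poll` is called), and forwarding the result through a channel lets the caller decide explicitly when to wait for it, rather than awaiting a future that nothing is driving.

```rust
use std::sync::mpsc;

/// Hypothetical mapping result; real wgpu types differ.
type MapResult = Result<(), String>;

/// Toy device that queues mapping callbacks and only runs them when polled.
struct ToyDevice {
    pending: Vec<Box<dyn FnOnce(MapResult)>>,
}

impl ToyDevice {
    fn new() -> Self {
        ToyDevice { pending: Vec::new() }
    }

    /// Callback-style request: nothing happens until `poll` is called.
    fn map_async(&mut self, callback: impl FnOnce(MapResult) + 'static) {
        self.pending.push(Box::new(callback));
    }

    /// Completes all outstanding mappings and fires their callbacks.
    fn poll(&mut self) {
        for callback in self.pending.drain(..) {
            callback(Ok(()));
        }
    }
}

fn main() {
    let mut device = ToyDevice::new();
    let (sender, receiver) = mpsc::channel();

    device.map_async(move |result| {
        // Forward the result; the receiver decides when to wait for it.
        sender.send(result).expect("receiver alive");
    });

    // Before polling, no result has arrived.
    println!("before poll: {:?}", receiver.try_recv().is_ok());

    device.poll();
    println!("after poll: {:?}", receiver.recv());
}
```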

Call for Testing: DX12

For a variety of performance and stability reasons, we are looking at making DX12 wgpu’s default backend on Windows instead of Vulkan. As part of this push, we need people to test their wgpu 0.13 code on the DX12 backend. The easiest way to do this (for testing purposes) is to pass DX12 as the only available backend when you create your instance:

let instance = wgpu::Instance::new(wgpu::Backends::DX12);

If you find any inconsistencies, bugs, or crashes with this, please file a bug report!

For more information on this change, please see the tracking issue: #2719.

Release Schedule

We’ve slipped significantly from our original cadence of a release every 3 to 4 months, with this release coming nearly 7 months after the last one. As part of the effort to make releases smaller and easier on both us and our users, we’re going to attempt to follow a stricter 3-month (90-day) release cadence. This way, contributors can be sure their changes get released in a timely fashion, and release management is easier on us.

Thank You!

Thanks to the countless contributors that helped out with this release! wgpu and naga’s momentum is truly incredible due to everyone’s contributions, and we look forward to seeing the amazing places wgpu and naga will go. If you are interested in helping, take a look at our good-first-issues, our issues with help wanted, or contact us on our matrix chat; we are always willing to help mentor first-time and returning contributors.

Additionally, thank you to all the users who report new issues, ask for enhancements, or test the git version of wgpu. Keep it coming!

Happy rendering!


This Year in Wgpu - 2021

gfx-rs nuts and bolts, 2021-12-25

The gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is built on top of wgpu-hal and naga. It provides safety, accessibility, and portability for graphics applications.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

As 2021 comes to an end, let’s look back at everything that has been accomplished.

Fredrik Norén's terrain with trees

Wgpu

We moved from gfx-hal to the newly created wgpu-hal and restructured the repository to keep everything together. At the same time, we dropped SPIRV-Cross in favor of naga, reaching a pure-Rust tech stack. Read more in the 0.10 release post. Credit goes to @kvark.

At the same time, @cwfitzgerald revamped our testing infrastructure with Rust integration tests and example snapshots. On top of that, wgpu is now tightly integrated with Deno (thanks to the effort of the Deno team!), opening up the road to testing against the real WebGPU Conformance Test Suite (CTS), which now runs in CI.

One shiny highlight of the year was the WebGL port, which became practically usable. Getting it ready was truly a collaborative effort, kicked off by @zicklag. Today, wgpu-rs examples can be run online with WebGL.

In terms of correctness and portability, @Wumpf landed the titanic work of ensuring all our resources are properly zero-initialized. This proved to be much more involved than it seemed, and users now get consistent behavior across platforms.

Finally, we just released version 0.12 with the fresh and good stuff!

Naga

Naga grew more backends (HLSL, WGSL) and greatly improved support across the board. It went from an experimental prototype in 0.3 to production, shipping in Firefox Nightly. It proved to be 4x faster than SPIRV-Cross at SPIR-V to MSL translation.

One notable improvement, led by @JCapucho with some help from @jimblandy, is the rewrite of SPIR-V control flow processing. This has been a very problematic and complicated area in the past, and now it’s mostly solved.

Things have been busy on the GLSL frontend as well. It got a completely new parser thanks to @JCapucho, which made it easier to improve and maintain.

Validation grew to cover all expressions, types, and everything else. For some time, it was annoying to see rough validation errors without any reference to the source. But @ElectronicRU saved the day by making our errors really nice, similar to how WGSL parser errors were made pretty by @grovesNL’s earlier work.

Last but not least, the SPIR-V and MSL backends have been bullet-proofed by @jimblandy. This includes guarding against out-of-bounds accesses on arrays, buffers, and textures.

Future Work

One big project that hasn’t landed is the removal of “hubs”. This is a purely internal change, but a grand one. It would streamline our policy of locking internal data and allow the whole infrastructure to scale better with more elaborate user workloads. We hope to see it coming in 2022.

Another missing piece is the DX11 backend. We know it’s much needed, and it was the only regression from the wgpu-hal port. This becomes especially important now that Intel has stopped supporting DX12 on its Haswell GPUs.

Overall, there have been a lot of good quality contributions, and this list can by no means capture their depth. We greatly appreciate all the improvements and would love to shout out about your work at the earliest opportunity. Big thanks to everybody involved!


Release of wgpu v0.11 and naga v0.7

gfx-rs nuts and bolts, 2021-10-07

The gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is built on top of wgpu-hal and naga. It provides safety, accessibility, and portability for graphics applications.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

Following our release cadence of every few months, we rolled out v0.11 through all of the gfx-rs projects! See the wgpu v0.11 changelog and the naga v0.7 changelog for details.

This is our second release using our pure-Rust graphics stack. We’ve made significant progress with shader translation and squashed many bugs in both wgpu and the underlying abstraction layer.

WebGL2

Thanks to @Zicklag for spearheading the work on the WebGL2 backend. By adapting our OpenGL ES backend, they got WebGL2 working on the web. The backend is still in beta, so please test it out and file bugs! See the guide to running on the web for more information.

The following shows one of Bevy’s PBR examples running on the web.

bevy running on webgl2

Explicit Presentation

A long standing point of confusion when using wgpu was that dropping the surface frame caused presentation. This was confusing and often happened implicitly. With this new version, presentation is now marked explicitly by calling frame.present(). This makes very clear where the important action of presentation takes place.
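A hypothetical toy model of the API shape, not wgpu’s actual frame type or implementation: `present` takes `self` by value, so presenting is a visible, one-time call, and the `Drop` impl no longer implies presentation.

```rust
/// Hypothetical frame type used only to illustrate the API shape.
struct Frame {
    index: u32,
    presented: bool,
}

impl Frame {
    /// Consuming `self` makes presentation an explicit, one-time action:
    /// the frame cannot be used (or presented again) afterwards.
    fn present(mut self, log: &mut Vec<String>) {
        self.presented = true;
        log.push(format!("presented frame {}", self.index));
    }
}

impl Drop for Frame {
    fn drop(&mut self) {
        // In this toy model, dropping an unpresented frame just discards it;
        // the point is that dropping no longer means "present".
        if !self.presented {
            eprintln!("frame {} dropped without being presented", self.index);
        }
    }
}

fn main() {
    let mut log = Vec::new();
    let frame = Frame { index: 0, presented: false };
    // ... record and submit rendering work for this frame ...
    frame.present(&mut log);
    println!("{:?}", log);

    // This frame is never presented; dropping it just discards it.
    let _skipped = Frame { index: 1, presented: false };
}
```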

More Robust Shader Translation

naga has made progress on all frontends and backends.

The most notable change is that @JCapucho, with the help of @jimb, completely rewrote the parsing of SPIR-V’s control flow. SPIR-V has notably complex control flow with a large number of complicated edge cases. After multiple reworks, we have settled on this new style of control flow graph parsing. If you feed SPIR-V into wgpu, this means that even more SPIR-V, especially optimized SPIR-V, will properly validate and convert.

See the changelog for all the other awesome additions to naga.

Thank You!

Thanks to the countless contributors that helped out with this release! wgpu and naga’s momentum is truly incredible due to everyone’s contributions, and we look forward to seeing the amazing places wgpu and naga will go as projects. If you are interested in helping, take a look at our good-first-issues, our issues with help wanted, or contact us on our matrix chat; we are always willing to help mentor first-time and returning contributors.

Additionally, thank you to all the users who report new issues, ask for enhancements, or test the git version of wgpu. Keep it coming!

Happy rendering!


wgpu alliance with Deno

gfx-rs nuts and bolts, 2021-09-16

The gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is built on top of wgpu-hal and naga. It provides safety, accessibility, and portability for graphics applications.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

wgpu works over native APIs, such as Vulkan, D3D12, Metal, and others. This involves a layer of translation to these APIs, which is generally straightforward. wgpu promises safety and portability, so it’s critical for the library to be well tested. Until now, our testing was a mix of unit tests, examples, and a small number of integration tests. Is this going to be enough? Definitely not!

Fortunately, WebGPU is developed with a proper Conformance Test Suite (CTS), largely contributed by Google to date. It’s a modern test suite covering all of the API parts: API correctness, validation messages, shader functionality, feature support, etc. The only complication is that it’s written in TypeScript against the web-facing WebGPU API, while wgpu exposes a Rust API.

Deno

We want to be sure that the parts working today will keep working tomorrow, and ideally enforce this in continuous integration, so that offending pull requests are detected instantly. Thus, we were looking for the simplest way to bridge wgpu with the TypeScript-based CTS, and we found it.

Back in March, Deno 1.8 shipped with initial WebGPU support, implemented using wgpu. Deno is a secure JS/TS runtime written in Rust. Using Rust from Rust is ❤️! The Deno team went the extra mile to hook up the CTS to Deno WebGPU and run it, and they reported the first-ever CTS results and issues on wgpu.

Thanks to Deno’s modular architecture, the WebGPU implementation is one of the pluggable components. We figured that it can live right inside wgpu repository, together with the CTS harness. This way, our team has full control of the plugin, and can update the JS bindings together with the API changes we bring from the spec.

Today, the WebGPU CTS is fully hooked up to wgpu CI. We can run the whitelisted tests simply by adding the “needs testing” tag to any PR. We are looking to expand the list of passing tests and eventually cover the full CTS. The GPU tests actually run on GitHub CI, using D3D12’s WARP software adapter. In the future, we’ll enable Linux testing with lavapipe for Vulkan and llvmpipe for GLES as well. We are also dreaming of a way to run daemons on our working (and idle) machines that would pull revisions and run the test suite on real GPUs. Please reach out if you are interested in helping with any of this 😉.

Note that Gecko is also going to be running WebGPU CTS on its testing infrastructure, independently. The expectation is that Gecko’s runs will not show any failures on tests enabled on our CI based on Deno, unless the failures are related to Gecko-specific code, thus making the process of updating wgpu in Gecko painless.

We love the work Deno is doing, and greatly appreciate the contribution to wgpu infrastructure and ecosystem! Special thanks to Luca Casonato and Leo K for leading the effort 🎖️.


Release of a Pure-Rust v0.10 and a Call For Testing

gfx-rs nuts and bolts, 2021-08-18

The gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is built on top of wgpu-hal and naga. It provides safety, accessibility, and portability for graphics applications.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

If you’ve been following these releases you’ll notice that gfx-hal is absent from this list. gfx-hal has now been deprecated in favor of a new abstraction layer inside of wgpu called wgpu-hal. To see more information about the deprecation, see the 0.9 release post.

Following our release cadence of every few months, we rolled out 0.10 through all of the gfx-rs projects! See the wgpu v0.10 changelog and the naga v0.6 changelog for details.

Pure-Rust Graphics

wgpu has had many new changes, the most notable of which is the switch to our new Hardware Abstraction Layer, wgpu-hal. This includes completely rebuilt backends which are more efficient, easier to maintain, and significantly leaner. As part of this, we have shed our last C/C++ dependency, SPIRV-Cross. We now rely entirely on naga for all of our shader translation. This is not only a marked achievement for Rust graphics, but has also made wgpu safer and more robust.

The new wgpu-hal:

  • Supports Vulkan, D3D12, Metal, and OpenGL ES with D3D11 to come soon.
  • Has 60% fewer lines of code than gfx-hal (22k LOC vs 55k)
  • Maps better to the wide variety of backends we need to support.

Other notable changes within wgpu:

  • Many api improvements and bug fixes.
  • New automated testing infrastructure.

naga has continued to mature significantly since the last release:

  • HLSL output is now supported and working well.
  • WGSL parsing has had numerous bugs fixed.
  • SPIR-V parsing support continues to be very difficult but is improving steadily.
  • With wgpu-hal now depending on naga, all code paths have gotten significant testing.
  • Validation has gotten more complete and correct.

Call For Testing

This is an extremely big release for us. While we have confidence in our code and have tested it extensively, we need everyone’s help in testing this new release! As such, we ask that you update to the latest wgpu and report any problems or issues you face.

If you aren’t sure if something is an issue, feel free to hop on our matrix chat to discuss.

Thank You!

Thanks to the countless contributors that helped out with this massive release! wgpu’s momentum is truly incredible due to everyone’s contributions, and we look forward to seeing the amazing places wgpu will go as a project. If you are interested in helping, take a look at our good-first-issues, our issues with help wanted, or contact us on our matrix chat; we are always willing to help mentor first-time and returning contributors.

Additionally, thank you to all the users who report new issues, ask for enhancements, or test the git version of wgpu. Keep it coming!

Happy rendering!


Release of v0.9 and the Future of wgpu

gfx-rs nuts and bolts, 2021-07-16

The gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our current main projects are:

  • gfx-rs makes low-level GPU programming portable with low overhead. It’s a single Vulkan-like Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL ES.
  • naga translates shaders between languages, including WGSL. It also provides validation and processing facilities on the intermediate representation.
  • wgpu is built on top of gfx-rs and gpu-alloc/gpu-descriptor. It provides safety, accessibility, and strong portability of applications.

Following our release cadence of every few months, we rolled out the 0.9 version through all of the gfx projects! See the gfx-rs changelog, wgpu changelog, and naga changelog for details.

naga has matured significantly since the last release.

  • WGSL parsing has improved incredibly, targeting an up-to-date spec.
  • SPIR-V parsing support has had numerous bugs fixed.
  • GLSL support is starting to take shape, though still in an alpha state.
  • Validation has gotten more complete and correct.

wgpu validation has continued to improve. Many validation holes were plugged with the last release. Through the combined work in wgpu and naga, more validation holes have been shored up, and new features have been implemented. One such feature is getting the array length of runtime-sized arrays, which is now properly implemented on Metal.

wgpu performance is still a vital target for us, so we have worked on reducing the overhead of resource tracking. We’ve cut unnecessary overhead by only doing stateful tracking for resources that have complex states. These changes were motivated by benchmarks of Gecko’s WebGPU implementation, which showed that tracking was a bottleneck. You can read more about it in #1413.
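A minimal sketch of the idea, assuming two kinds of resources: those without usage transitions, which only need to be recorded as used (a set insertion), and those with complex states, whose previous usage is compared against the new one to decide whether a transition is needed. `Tracker`, `Usage`, and the numeric ids are hypothetical; wgpu’s real trackers are considerably more involved.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical usage states; real wgpu usages are richer.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Usage {
    CopyDst,
    Vertex,
}

/// Toy tracker with a cheap path for stateless resources and a
/// comparison-based path for stateful ones.
#[derive(Default)]
struct Tracker {
    stateless: HashSet<u32>,
    stateful: HashMap<u32, Usage>,
    /// Recorded transitions: (resource id, old usage, new usage).
    barriers: Vec<(u32, Usage, Usage)>,
}

impl Tracker {
    /// Stateless resources only need to be remembered as "used".
    fn use_stateless(&mut self, id: u32) {
        self.stateless.insert(id);
    }

    /// Stateful resources compare the new usage with the previous one and
    /// record a transition when it changes.
    fn use_stateful(&mut self, id: u32, usage: Usage) {
        if let Some(previous) = self.stateful.insert(id, usage) {
            if previous != usage {
                self.barriers.push((id, previous, usage));
            }
        }
    }
}

fn main() {
    let mut tracker = Tracker::default();
    tracker.use_stateless(1);
    tracker.use_stateless(1);
    tracker.use_stateful(2, Usage::CopyDst);
    tracker.use_stateful(2, Usage::Vertex);
    tracker.use_stateful(2, Usage::Vertex);
    println!("used stateless resources: {}", tracker.stateless.len());
    println!("transitions: {:?}", tracker.barriers);
}
```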

wgpu Family Reunion, Relicense, and the Future

wgpu has had a number of large internal changes that are laying the groundwork for wgpu to be a safe, efficient, and portable API for cross-platform graphics.

wgpu has been relicensed from MPL-2.0 to MIT/Apache-2.0. Thank you to all 142 people who replied to the issue and made this happen. This relicense is an important change because it allows the possibility of adding backends targeting APIs which are behind NDAs.

For a while, we had acknowledged that having essential parts of the project living in different repositories was hurting developer productivity. There were objective reasons for this split, but the time had come to change that. Feedback from our friends at the Bevy game engine gave us the final push, and we launched an initiative to make wgpu easier to contribute to. We moved wgpu-rs back into the wgpu repo. This means that changes touching both the core crate and the Rust bindings no longer need multiple PRs that have to be synchronized. We have already heard from collaborators how much easier contributing is now that there is less coordination to do. Read more about the family reunion.

As part of our family reunion, 0.9 is going to be the last release that uses gfx-hal as its hardware abstraction layer. While it has served us well, it has proven not to be at exactly the level of abstraction we need. We have started work on a new abstraction layer called wgpu-hal. Vulkan, Metal, and GLES have already been ported to it, with DX12 landed in an incomplete state, and DX11 to come soon. To learn more about this transition, you can read the whole discussion.

Finally, we have brand new testing infrastructure that allows us to automatically test across all backends and all adapters in the system. Our tests include image comparison tests for all of our examples and the beginnings of feature tests. We hope to expand this to cover a wide variety of features and use cases. We will be able to run these tests in CI on software adapters, and our future goal is to set up a distributed testing network so that we can automatically test on a wide range of adapters. This will be one important layer of our defense in depth, ensuring that wgpu is actually portable and safe. Numerous bugs have already been caught by this new infrastructure, and it will help us prevent regressions in the future. Read more about our testing infrastructure.
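The comparison step of an image test might look like the sketch below. It assumes RGBA8 pixel data, a per-channel tolerance, and a maximum number of failing pixels; these thresholds and the function names are hypothetical, not the exact policy of wgpu’s test harness.

```rust
/// Counts pixels whose channels differ by more than `tolerance` between two
/// RGBA8 images. Returns None if the images cannot be compared.
fn count_mismatched_pixels(actual: &[u8], expected: &[u8], tolerance: u8) -> Option<usize> {
    if actual.len() != expected.len() || actual.len() % 4 != 0 {
        return None; // Different sizes (or non-RGBA8 data) cannot be compared.
    }
    let mismatched = actual
        .chunks_exact(4)
        .zip(expected.chunks_exact(4))
        .filter(|(a, e)| a.iter().zip(e.iter()).any(|(x, y)| x.abs_diff(*y) > tolerance))
        .count();
    Some(mismatched)
}

/// Passes if at most `max_failing` pixels exceed the tolerance.
fn images_match(actual: &[u8], expected: &[u8], tolerance: u8, max_failing: usize) -> bool {
    matches!(count_mismatched_pixels(actual, expected, tolerance), Some(n) if n <= max_failing)
}

fn main() {
    // Two 2x1 images: the second pixel's red channel differs by 3.
    let expected = [255, 0, 0, 255, 10, 20, 30, 255];
    let actual = [255, 0, 0, 255, 13, 20, 30, 255];
    println!("{:?}", count_mismatched_pixels(&actual, &expected, 2)); // Some(1)
    println!("{}", images_match(&actual, &expected, 3, 0)); // true
}
```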

Thank You!

Thank you to the countless contributors that helped out with this release! wgpu’s momentum is only increasing due to everyone’s contributions, and we look forward to seeing the amazing places wgpu will go as a project. If you are interested in helping, take a look at our good-first-issues, our issues with help wanted, or contact us on our matrix chat; we are always willing to help mentor first-time and returning contributors.

Additionally, thank you to all the users who report new issues, ask for enhancements, or test the git version of wgpu. Keep it coming!

Happy rendering!


Shader translation benchmark on Dota2/Metal

gfx-rs nuts and bolts, 2021-05-09

The gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. See The Big Picture for the overview, and release-0.8 for the latest progress. In this post, we are going to share the first performance metrics of our new pure-Rust shader translation library Naga, which is integrated into gfx-rs. For background, see the Javelin announcement (Javelin was the original name of this project).

gfx-portability is a Vulkan Portability implementation in Rust, based on gfx-rs. Previous Dota2 benchmarks showed good potential in our implementation. However, it couldn’t truly be called an alternative to MoltenVK while it relied on SPIRV-Cross. Today, we are able to run Dota2 with a pure-Rust Vulkan Portability implementation, thanks to Naga.

Test

Testing was done on a MacBook Pro (13-inch, 2016), which has a humble dual-core Intel CPU running at 3.3GHz. We created an alias to libMoltenVK.dylib and pointed DYLD_LIBRARY_PATH to it for Dota2 to pick up on boot, thus running on gfx-portability. It was built from the naga-bench-dota tag in release mode. The SPIRV-Cross path was enabled by uncommenting the features = ["cross"] line in libportability-gfx/Cargo.toml.

In-game steps:

  1. launch make dota-release
  2. skip the intro videos
  3. proceed to “Heroes” menu
  4. select “Tide Hunter”
  5. and click on “Demo Hero”
  6. walk the center lane, enable the 2nd and 3rd abilities
  7. use the 3rd ability, then quit

Hero selection screen with Naga (low settings)

The point of this short run is to get a bulk of shaders loaded (about 600 graphics pipelines). We are only interested in the CPU cost of loading shaders and creating pipelines; this isn’t a test of the GPU time spent executing the shaders. The only GPU-related fact that matters here is that the picture looks identical. We don’t expect that fixing any visual issues discovered later would require architectural changes.

Times were collected using the profiling instrumentation integrated into gfx-backend-metal. We added this instrumentation as a temporary dependency to gfx-portability, with the “profile-with-tracy” feature enabled, in order to capture the times in Tracy.

In the Tracy profiles, we found the relevant chunks and clicked on “Statistics” for them. We are interested in the mean (μ) time and the standard deviation (σ).
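For reference, μ and σ are the usual mean and standard deviation of the per-call durations. This sketch computes the population standard deviation; whether Tracy reports the population or the sample form is an assumption we haven’t checked, and the timings in `main` are made up, not the measured data.

```rust
/// Mean (μ) and population standard deviation (σ) of timings in ms.
fn mean_and_std_dev(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Average squared distance from the mean.
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some((mean, variance.sqrt()))
}

fn main() {
    // Hypothetical per-shader timings, not the benchmark's data.
    let timings_ms = [0.4, 0.6, 0.5, 0.5];
    let (mean, std_dev) = mean_and_std_dev(&timings_ms).unwrap();
    println!("μ = {mean:.3} ms, σ = {std_dev:.3} ms");
}
```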

Results

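| Function | Cross μ (ms) | Cross σ (ms) | Naga μ (ms) | Naga σ (ms) |
|---|---|---|---|---|
| SPIR-V parsing | 0.34 | 0.15 | 0.45 | 0.50 |
| MSL generation | 3.94 | 3.5 | 0.56 | 0.38 |
| Total per stage | 4.27 | | 1.01 | |

| Function | Cross μ (ms) | Cross σ (ms) | Naga μ (ms) | Naga σ (ms) |
|---|---|---|---|---|
| create_shader_module | 0.005 | 0.01 | 0.53 | 0.57 |
| create_shader_library | 5.19 | 6.19 | 0.89 | 1.23 |
| create_graphics_pipeline | 10.94 | 12.05 | 2.24 | 5.13 |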

The results are split into two groups: the first shows the time spent purely in the shader translation code of SPIRV-Cross (or just “Cross”) and Naga, and the second shows the combined time of translation plus the Metal runtime doing its part. The latter very much depends on the driver’s shader caches, which we have no control over. We made sure to run the same test multiple times and only take the last result, giving the caches an opportunity to warm up. Interestingly, the number of outliers (shaders that ended up missing the cache) was still higher in the “Cross” path. This may be just noise, or improperly warmed-up caches, but there is a chance it also indicates that “Cross” generates more distinct shaders and/or is non-deterministic.

The total time spent in shader module and pipeline creation is 7s with the Cross path and just 1.29s with Naga. So we basically shaved about 6 seconds off the user (single-core) time needed just to get into the game.
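The headline ratios can be recomputed from the reported numbers. Per stage, 4.27 ms / 1.01 ms ≈ 4.2, which is where the “roughly 4x” figure comes from; end to end, 7 s / 1.29 s ≈ 5.4, and 7 - 1.29 = 5.71 s, which the text rounds to 6 seconds. A trivial sketch of that arithmetic:

```rust
/// How many times faster the new path is than the old one.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

fn main() {
    // Per-stage translation time from the results table, in milliseconds.
    let per_stage = speedup(4.27, 1.01);
    // Total shader module + pipeline creation time, in seconds.
    let total = speedup(7.0, 1.29);
    let saved_s = 7.0 - 1.29;
    println!("per stage: {per_stage:.2}x, total: {total:.2}x, saved: {saved_s:.2} s");
}
```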

In neither case was any pipeline caching involved. One could argue that pipeline caches, when loaded from disk, would essentially solve this problem regardless of the translation times. We have caching support implemented for the Naga path, but we didn't want to make the comparison unfair to Cross, so we excluded caches from the benchmark. We will definitely include them in any future full-game runs of gfx-portability versus MoltenVK.

Conclusions

This benchmark shows Naga being roughly 4x faster than SPIRV-Cross at translating shaders from SPIR-V to MSL. It's still early days for Naga, and we want to optimize the SPIR-V control-flow graph processing, which shows up in the numbers as taking noticeable time. We assume SPIRV-Cross also has a lot of low-hanging fruit to optimize, and we look forward to seeing it improve.
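The "roughly 4x" figure can be checked against the mean values in the Results table. The sketch below (helper names are ours, purely illustrative) adds SPIR-V parsing and MSL generation to get the per-stage translation time: 0.34 + 3.94 = 4.28 ms for Cross (the table lists 4.27 ms, presumably due to rounding of the individual means) versus 0.45 + 0.56 = 1.01 ms for Naga, a ratio of about 4.2.

```rust
/// Per-stage translation time: SPIR-V parsing + MSL generation (means, in ms).
fn translation_ms(parse_ms: f64, generate_ms: f64) -> f64 {
    parse_ms + generate_ms
}

/// How many times faster `new_ms` is compared to `old_ms`.
fn speedup(old_ms: f64, new_ms: f64) -> f64 {
    old_ms / new_ms
}
```

Applied to the create_graphics_pipeline means (10.94 ms versus 2.24 ms), the same helper gives about 4.9x, so the end-to-end pipeline numbers point the same way.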

Previously, we heard multiple requests to allow MSL generation to happen offline. We hope that the lightning-fast translation times (about 1 ms per stage), coupled with pipeline caching, will resolve this need.

The quality and readability of the MSL code generated by Naga is improving, but it's still not at the level of SPIRV-Cross output. Naga also doesn't have the same feature coverage. We are constantly adding new things to Naga, such as interpolation qualifiers, atomics, etc.

Finally, Naga is architected for shader module reuse. It does a lot of work up front and can then produce target-specific shaders quickly, so it works best when many pipelines are created from fewer shader modules. Dota2's ratio appears to be 2 pipelines per shader module. We expect applications that use multiple entry points in SPIR-V modules, or create more variations of pipeline states, to see even bigger gains.
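To make this trade-off concrete, here is a minimal Rust sketch of the "parse once, translate many times" pattern. Everything in it is hypothetical and much simpler than Naga: `parse` stands in for the expensive front-end work (such as SPIR-V parsing and validation), `generate` for the cheap per-pipeline back-end step, and modules are keyed by their source text for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified IR: only the entry point names survive parsing.
struct Module {
    entry_points: Vec<String>,
}

/// Stand-in for the expensive front-end step (e.g. SPIR-V parsing + validation).
/// Here a "source" is just a comma-separated list of entry point names.
fn parse(source: &str) -> Module {
    Module {
        entry_points: source.split(',').map(|s| s.trim().to_string()).collect(),
    }
}

/// Stand-in for the cheap back-end step (e.g. MSL generation for one entry point).
fn generate(module: &Module, entry: &str) -> Option<String> {
    module
        .entry_points
        .iter()
        .find(|e| e.as_str() == entry)
        .map(|e| format!("// target code for `{}`", e))
}

/// Caches parsed modules so each source is parsed at most once,
/// while every pipeline still gets its own generated code.
#[derive(Default)]
struct ShaderCache {
    modules: HashMap<String, Module>,
    parses: usize,
    generations: usize,
}

impl ShaderCache {
    fn create_pipeline(&mut self, source: &str, entry: &str) -> Option<String> {
        if !self.modules.contains_key(source) {
            self.parses += 1;
            self.modules.insert(source.to_string(), parse(source));
        }
        self.generations += 1;
        generate(&self.modules[source], entry)
    }
}
```

With a ratio of about 2 pipelines per module, the up-front parse is amortized over two translations; more entry points per module or more pipeline-state variations would amortize it further.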


Release of v0.8 gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-04-30 00:00:00

The gfx-rs community's goal is to make graphics programming in Rust easy, fast, and reliable. The main projects are:

  • gfx-rs makes low-level GPU programming portable with low overhead. It’s a single Vulkan-like Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL ES.
  • naga translates shaders between languages, including WGSL. It also provides validation and processing facilities on the intermediate representation.
  • wgpu-rs is built on top of gfx-rs and gpu-alloc/gpu-descriptor. It provides safety, accessibility, and strong portability of applications.

Following our regular schedule of releasing every few months, we just rolled out the 0.8 versions across gfx/wgpu projects! See the gfx-rs changelist, wgpu changelist, and naga changelist for details.


Naga-based shader infrastructure has been growing and capturing more ground. It has reached an important point where SPIRV-Cross is not just optional on some platforms, but not even enabled by default. This is now the case for the Metal and OpenGL backends. The Naga path is easier to integrate, share types with, and compile, and it's much faster to run. Early benchmarks suggest roughly a 2.5x performance improvement over SPIRV-Cross for us.

Work on the HLSL and WGSL backends is underway. The former will allow us to deprecate SPIRV-Cross on Direct3D 12/11 and eventually remove this C dependency. The latter will help users port existing shaders to WGSL.

Another big theme of the release is enhanced wgpu validation. The host API side is mostly covered, with occasional small holes discovered by testing. The shader side now validates both statements and expressions. Programming shaders with wgpu is starting to feel closer to Rust than to C: most of the time you fight the validator until your code passes, and then it just works, portably. The error messages are still a bit cryptic, though; we hope to improve them in the next release. Hitting a driver panic or crash is becoming rare, and we are working on eliminating these outcomes entirely. In addition, wgpu now knows when to zero-initialize buffers automatically, bringing the strong portability story a bit closer to reality.
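wgpu's actual implementation is not reproduced here. As a hedged illustration of the idea behind automatic zero-initialization, the sketch below tracks which byte ranges of a buffer have never been written, so that only those ranges need zero-filling before their contents can be observed. The `InitTracker` type and its method are hypothetical.

```rust
use std::ops::Range;

/// Hypothetical tracker of the byte ranges of a buffer that were never written.
/// Invariant: `uninit` is sorted and its ranges do not overlap.
struct InitTracker {
    uninit: Vec<Range<u64>>,
}

impl InitTracker {
    /// A freshly created buffer is entirely uninitialized.
    fn new(size: u64) -> Self {
        let uninit = if size > 0 { vec![0..size] } else { Vec::new() };
        InitTracker { uninit }
    }

    /// Marks `query` as initialized and returns the parts of it that were not,
    /// i.e. the sub-ranges that must be zero-filled before a read of `query`.
    fn take_uninit(&mut self, query: Range<u64>) -> Vec<Range<u64>> {
        let mut to_zero = Vec::new();
        let mut remaining = Vec::new();
        for r in self.uninit.drain(..) {
            let start = r.start.max(query.start);
            let end = r.end.min(query.end);
            if start < end {
                // Overlap: zero it, keep any uncovered head/tail as uninitialized.
                to_zero.push(start..end);
                if r.start < start {
                    remaining.push(r.start..start);
                }
                if end < r.end {
                    remaining.push(end..r.end);
                }
            } else {
                remaining.push(r);
            }
        }
        self.uninit = remaining;
        to_zero
    }
}
```

A write that fully covers a range could call the same method and simply ignore the returned ranges, since they are about to be overwritten anyway.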

We also integrated the profiling crate into wgpu and gfx-backend-metal. Its author was receptive to our needs and ideas, and we are very happy with the results so far. Gathering CPU performance profiles from your applications couldn't be simpler today:

profiling

The main internal improvement in Naga was establishing an association between expressions and statements. It lets backends know exactly whether expression results can be reused, and when they need to be evaluated. Overall, the boundary between statements and expressions became well defined and easy to understand. We also converged on a high-level model where the intermediate representation is compact, accompanied by a bag of derived information. This information is produced by the validator and is required for backends to function. Finally, entry points are real functions now: they can accept parameters from the previous pipeline stages and return results.
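As a rough illustration of these ideas, the Rust sketch below models a function as an arena of expressions plus a list of statements. An `Emit` statement marks where a range of expressions is evaluated, and a separate analysis pass produces derived information (use counts) that a backend consults to decide whether a result can be inlined or must be stored in a temporary. All type and function names are hypothetical and far simpler than Naga's real IR.

```rust
use std::ops::Range;

/// Hypothetical, heavily simplified IR (not Naga's actual types).
/// Expressions live in a per-function arena and refer to each other by index.
enum Expression {
    Constant(f32),
    Load(&'static str),
    Add(usize, usize),
}

/// Statements give expressions their place in control flow.
enum Statement {
    /// Evaluate the expressions in this index range at this point.
    Emit(Range<usize>),
    Store(&'static str, usize),
    Return(usize),
}

struct Function {
    expressions: Vec<Expression>,
    body: Vec<Statement>,
}

/// Derived information, kept outside the compact IR and computed by a
/// validation-like pass; here just how many times each expression is used.
struct FunctionInfo {
    ref_counts: Vec<usize>,
}

fn analyze(f: &Function) -> FunctionInfo {
    let mut ref_counts = vec![0; f.expressions.len()];
    for expr in &f.expressions {
        if let Expression::Add(a, b) = expr {
            ref_counts[*a] += 1;
            ref_counts[*b] += 1;
        }
    }
    for stmt in &f.body {
        match stmt {
            Statement::Store(_, value) | Statement::Return(value) => ref_counts[*value] += 1,
            Statement::Emit(_) => {}
        }
    }
    FunctionInfo { ref_counts }
}

/// A backend can inline a result used once, but should store a result used
/// several times in a temporary so it is evaluated only once.
fn needs_temporary(info: &FunctionInfo, expr: usize) -> bool {
    info.ref_counts[expr] > 1
}
```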

Lastly, we added a few experimental native-only graphics features to wgpu:

  • Buffer descriptor indexing
  • Conservative rasterization

P.S. Overall, we are in the middle of a grand project that builds modern graphics infrastructure in pure Rust, and we appreciate anybody willing to join the fight!


Release of v0.7 gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-02-02 00:00:00

The gfx-rs community's goal is to make graphics programming in Rust easy, fast, and reliable. It governs a wide range of projects:

  • gfx-rs makes low-level GPU programming portable with low overhead. It’s a single Vulkan-like Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL.
  • naga translates shaders between languages, including WGSL. It also provides validation and processing facilities on the intermediate representation.
  • wgpu-rs is built on top of gfx-rs and gfx-extras. It provides safety, accessibility, and even stronger portability of applications.
  • metal-rs and d3d12-rs wrap native graphics APIs on macOS and Windows 10 in Rust.

Today, we are happy to announce the release of 0.7 versions across gfx/wgpu projects!

gfx-hal-0.7

The overall theme of this release is simplification. We cut a lot of experimental cruft that had accumulated over the years, cleaned up the dependencies, and upgraded the API to be more modern.

For example, in the last release we took a step towards more generic bounds by using ExactSizeIterator in our APIs. In this release, we are taking two steps back by removing not just ExactSizeIterator, but also Borrow from the iterator API. We figured out a way to do the stack allocation without extra bounds, using inplace_it.
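The sketch below shows the kind of signature this enables: a function that accepts any plain `Iterator` of references, with no `ExactSizeIterator` or `Borrow` bounds, and still avoids a heap allocation in the common case by copying items into a fixed-size stack array, spilling to a `Vec` only when more items arrive than fit. The names are hypothetical, and this std-only small-buffer approach is a stand-in for illustration; it does not use inplace_it, whose API we don't reproduce here.

```rust
/// Hypothetical stand-in for a native handle, e.g. a descriptor set.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct RawHandle(u64);

/// Number of handles that fit in the stack buffer before spilling to the heap.
const STACK_CAP: usize = 8;

/// Accepts any plain iterator of references (no `ExactSizeIterator`, no `Borrow`)
/// and hands a contiguous slice to `native`, which stands in for the backend call.
fn with_handles<'a, I, F, R>(mut handles: I, native: F) -> R
where
    I: Iterator<Item = &'a RawHandle>,
    F: FnOnce(&[RawHandle]) -> R,
{
    let mut stack = [RawHandle::default(); STACK_CAP];
    let mut len = 0;
    // Fill the stack buffer first; most calls pass only a few handles.
    while len < STACK_CAP {
        match handles.next() {
            Some(&h) => {
                stack[len] = h;
                len += 1;
            }
            None => return native(&stack[..len]),
        }
    }
    // The buffer is full: only allocate if there is actually more to come.
    match handles.next() {
        None => native(&stack[..]),
        Some(&h) => {
            let mut heap = stack.to_vec();
            heap.push(h);
            heap.extend(handles.copied());
            native(&heap[..])
        }
    }
}
```

Because only `Iterator` is required, callers can pass filtered or chained iterators whose length is not known up front.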

Having two distinct swapchain models has also come to an end. We removed the old Vulkan-like model, but also upgraded the new model to match “VK_KHR_imageless_framebuffer”, getting the best of both worlds. It maps to the backends even better than before, and we can expose it directly in gfx-portability now.
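To illustrate the imageless-framebuffer idea, here is a hypothetical Rust sketch (not the gfx-hal API): the framebuffer records only attachment formats and an extent, and the concrete image views, such as the swapchain image acquired for the current frame, are supplied when a render pass begins. A new swapchain image therefore does not require a new framebuffer. Requiring an exact extent match is a simplification of ours.

```rust
/// Hypothetical types; only what is needed to show the idea.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Format {
    Bgra8Srgb,
    Depth32Float,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Extent {
    width: u32,
    height: u32,
}

/// The framebuffer knows what kind of images it expects, not which ones.
struct Framebuffer {
    formats: Vec<Format>,
    extent: Extent,
}

/// A concrete image view, e.g. the swapchain image acquired this frame.
#[derive(Clone, Copy)]
struct ImageView {
    id: u32,
    format: Format,
    extent: Extent,
}

/// Checks that the views supplied at render pass begin match the framebuffer
/// description, and returns the ids of the views that will be rendered to.
fn begin_render_pass(fb: &Framebuffer, views: &[ImageView]) -> Result<Vec<u32>, String> {
    if views.len() != fb.formats.len() {
        return Err(format!("expected {} views, got {}", fb.formats.len(), views.len()));
    }
    for (i, (format, view)) in fb.formats.iter().zip(views).enumerate() {
        if *format != view.format || view.extent != fb.extent {
            return Err(format!("view {} does not match attachment {}", view.id, i));
        }
    }
    Ok(views.iter().map(|v| v.id).collect())
}
```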

There are also many API fixes and improvements. A particularly interesting one is aligning with Vulkan's "external synchronization" requirements, which allows us to do less locking in the backends and makes them more efficient.
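In Rust terms, an "externally synchronized" object maps naturally onto `&mut self`: the borrow checker already guarantees exclusive access, so the backend needs no internal lock, and an application that does share the object across threads chooses its own synchronization. A hypothetical sketch, not the gfx-hal API:

```rust
/// Hypothetical command pool. Like a Vulkan command pool, it must not be used
/// from two threads at once; requiring `&mut self` encodes that in the type
/// system, so no internal Mutex is needed.
struct CommandPool {
    next_id: u32,
    live: Vec<u32>,
}

impl CommandPool {
    fn new() -> Self {
        CommandPool { next_id: 0, live: Vec::new() }
    }

    /// Allocates a (hypothetical) command buffer id; needs exclusive access.
    fn allocate(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.live.push(id);
        id
    }

    /// Frees every command buffer allocated from this pool.
    fn reset(&mut self) {
        self.live.clear();
    }

    fn live_count(&self) -> usize {
        self.live.len()
    }
}
```

Single-threaded code pays nothing for locking; code that shares a pool across threads wraps it itself, for example in `Arc<Mutex<CommandPool>>`.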

Another highlight of the show is the OpenGL ES backend. It's finally taking off, based on EGL for context creation and window system integration. There is still a lot of work to do on the logic, but the API is finally aligned with the rest of the backends (see the 3.5-year-old issue). We are targeting Linux/Android GLES3 and WebGL2 only.

See the full changelog for details.

wgpu-0.7


The list of libraries and applications has grown solidly since the last release. A lot of exciting projects and creative people joined our community.

Our goals were to bring the API closer to a stable point and to improve validation. There are quite a few API changes, in particular to the pipeline descriptors and bind group layouts, but nothing architectural. Validation errors are also much nicer now, hopefully allowing users to iterate without always being confused :)

The highlight of the wgpu work is support for WGSL shaders. WGSL is the emerging shading language developed by the WebGPU group, designed to be modern, safe, and writable by hand. Most of our examples already use the new shaders; check them out! We are excited to finally be able to throw the C dependencies (spirv-cross, shaderc, etc.) out of our projects, and to build and deploy more easily.

See the core changelog and the rust API changelog for details.

naga-0.3

Naga has seen intensive development in all areas. The SPIR-V frontend and backend, the WGSL frontend, the GLSL frontend and backend, the intermediate layer, and validation all got a lot of improvements. It's still not fully robust, but Naga has crossed the threshold of being actually usable, and we are taking advantage of it in wgpu-rs.

We experimented with the testing infrastructure and settled on cargo-insta. This boosted our ability to detect regressions and allowed us to move forward more boldly.
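Snapshot testing works by recording a tool's output once, reviewing it, and then failing whenever later output differs. The std-only sketch below illustrates that technique; it is not cargo-insta's API, and the `check_snapshot` function, the `.snap`/`.snap.new` file naming, and the directory layout are assumptions made for this example.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical snapshot check (not insta's API).
/// - No stored snapshot yet (any read error is treated as "missing"): record `actual` and pass.
/// - Stored snapshot matches: pass.
/// - Mismatch: write `actual` to a `.snap.new` file for review and fail.
fn check_snapshot(dir: &Path, name: &str, actual: &str) -> Result<(), String> {
    let path = dir.join(format!("{}.snap", name));
    match fs::read_to_string(&path) {
        Err(_) => fs::write(&path, actual).map_err(|e| e.to_string()),
        Ok(expected) if expected == actual => Ok(()),
        Ok(_) => {
            let pending = dir.join(format!("{}.snap.new", name));
            fs::write(&pending, actual).map_err(|e| e.to_string())?;
            Err(format!("snapshot `{}` changed; review {}", name, pending.display()))
        }
    }
}
```

For a shader translator, the "output" would typically be the generated code for each test shader, so any unintended change in a backend shows up as a failing snapshot.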

The next steps for us are completing the validation, adding out-of-bounds checks, and replacing SPIRV-Cross completely in applications that have known shaders.

See the changelog for details.

P.S. Overall, we are in the middle of a grand project that builds modern graphics infrastructure in pure Rust, and we'd appreciate anybody willing to join the fight!