Last updated: 2024-03-28 11:46:57

Face cams: the missing guide fasterthanli.me

fasterthanli.me2024-03-02 18:00:00

I try to avoid doing "meta" / "behind the scenes" stuff, because I usually feel like it has to be "earned". How many YouTube channels are channels about making YouTube videos? Too many.


Thoughts on Testing Brandon's Website

Brandon's Website2024-01-04 00:00:00 Today I was thinking about tests.

Improved Multithreading in wgpu - Arcanization Lands on Trunk gfx-rs nuts and bolts

gfx-rs nuts and bolts2023-11-24 00:00:00

Arcanization is a large refactoring of wgpu’s internals aiming at reducing lock contention, and providing better performance when using wgpu on multiple threads. It was just merged into wgpu’s trunk branch and will be published as part of the 0.19 release scheduled for around January 17th.

A Long Journey

Before diving into the technical details, let’s have a quick look at the history of this project. The work started sometime around mid-2021 with significant involvement from @pythonesque, @kvark and @cwfitzgerald. It went through multiple revisions, moving from one person to the next, until @gents83 picked it up and opened a pull request on March 30th 2023.

Fast-forward to November 20th: after countless rebases, revisions and fixes by @gents83 spanning nearly 8 months, the pull request is finally merged! They tirelessly maintained this big and complex refactoring, all while the project was constantly changing and improving underneath them!

The Problem

wgpu internally stores all resources (buffers, textures, bind groups, etc.) in big contiguous arrays held by what we call the Hub.

Most of the data stored in these arrays is immutable: once created, it never changes until the resource is destroyed. Inside and outside wgpu, resources are referred to by Ids, which boil down to indices into the resource arrays plus some metadata.

A simplified diagram showing the Hub and resource arrays

This should play well with parallel access of the data from multiple threads, right? Unfortunately, adding and removing resources requires mutable access to these resource arrays, which meant adding locks. Locks when adding or removing items, but also locks while reading from the data they contain. Locks everywhere, and locks that have to be held for a non-negligible duration. This caused a lot of lock contention and poor performance when wgpu was used on multiple threads.
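As a rough mental model (these are deliberately simplified, hypothetical types for illustration, not wgpu's actual internals), the pre-arcanization storage behaved something like this:

use std::sync::RwLock;

// Hypothetical stand-ins for wgpu's internal storage.
struct Id(usize);

struct Storage<T> {
    items: RwLock<Vec<Option<T>>>, // one big array per resource type
}

impl<T> Storage<T> {
    // Reading a resource means holding the array lock for as long as the
    // borrow is needed, which is where the contention came from.
    fn with_resource<R>(&self, id: &Id, f: impl FnOnce(&T) -> R) -> Option<R> {
        let guard = self.items.read().unwrap();
        guard.get(id.0).and_then(|slot| slot.as_ref()).map(f)
    }
}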

Interestingly, wgpu also had to maintain internal reference counts for resources, to keep track of the dependencies between them (for example, a bind group depends on the bindings it refers to). This reference counting was carried out manually and was rather error-prone.

The solution

“Arcanization”, as its name implies, was the process of moving resources behind atomic reference-counted pointers (Arc<T>). Today the Hub still holds resource arrays, however these contain Arcs instead of the data directly. This lets us hold the locks for much shorter times - in a lot of cases only while cloning the Arc, which can then be read from safely outside of the critical section. In addition, some areas of the code don’t need to hold locks at all once the reference has been extracted.
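Sketching the same hypothetical storage after arcanization, the lock is only held long enough to clone the Arc, and the resource can be used afterwards, outside the critical section:

use std::sync::{Arc, RwLock};

struct Id(usize);

struct Storage<T> {
    items: RwLock<Vec<Option<Arc<T>>>>, // the arrays now hold Arcs
}

impl<T> Storage<T> {
    // The critical section shrinks to cloning the Arc; the caller can then
    // read the resource without holding the array lock.
    fn get(&self, id: &Id) -> Option<Arc<T>> {
        let guard = self.items.read().unwrap();
        guard.get(id.0).and_then(|slot| slot.clone())
    }
}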

A simplified diagram showing resources stored via Arcs

The result is much lower lock contention. If you use wgpu from multiple threads, this should significantly improve performance. Our friends in the Bevy engine community noted that some very early testing (on an older revision of arcanization) showed that, with arcanization, the encoding of shadow-related commands can run in parallel with the main passes, yielding a 45% frame-time reduction on a test scene (the famous Bistro scene) compared to their single-threaded configuration. Without arcanization, lock contention is too high for the parallelism to significantly improve performance.

In addition, wgpu’s internals are now simpler. This change lifts some restrictions and opens the door for further performance and ergonomics improvements.

wgpu 0.19

The next release featuring this work will be 0.19.0 which we expect to publish around January 17th. We made sure to merge the changes early in the release cycle to give ourselves as much time as we can to catch potential regressions.

This is an absolutely massive change and, while we have been and are testing as best we can, we do need help from everyone else. Please try updating your project to the latest wgpu and running it, and please report any issues you find!

What’s next?

Lifting RenderPass<'a> lifetime restrictions

If you have used wgpu, there is a decent chance that you have had to work around the restrictions imposed by the 'rpass lifetime in a lot of RenderPass’s methods, such as set_bind_group, set_pipeline, and set_vertex_buffer. The recent changes give us the opportunity to store Arcs where &'a references were previously needed, which should let us remove these lifetime restrictions.
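The underlying Rust pattern, sketched loosely here and not as wgpu's real API: a pass that stores &'a references forces everything recorded on it to outlive the pass, while a pass that stores Arcs owns its handles and the lifetime parameter goes away.

use std::sync::Arc;

struct Pipeline;

// Borrowing version: everything set on the pass must outlive 'rpass.
struct BorrowingPass<'rpass> {
    pipeline: Option<&'rpass Pipeline>,
}

// Owning version: the pass clones an Arc, so no lifetime is needed.
struct OwningPass {
    pipeline: Option<Arc<Pipeline>>,
}

impl OwningPass {
    fn set_pipeline(&mut self, pipeline: &Arc<Pipeline>) {
        self.pipeline = Some(pipeline.clone());
    }
}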

Internal improvements

There is ongoing work to ensure that buffers, textures, and devices can be destroyed safely while their handles are still alive. This is important for Firefox, which uses wgpu_core as the basis for its WebGPU implementation. In the garbage-collected environment of JavaScript, the deallocation of resources is non-deterministic and can happen a long time after the program is done using them. While this in itself does not require arcanization, arcanization gives us a better foundation for improving internal resource lifetime management.

Reference counting at the API level

So resources like buffers and textures are now internally reference counted, but the handles wgpu exposes are not. Could we expose the reference-counted resources more directly, avoiding the trip through the Hub? Most likely yes. That would be another fairly large project with important implications for wgpu_core’s recording infrastructure and how it integrates into Firefox. It won’t happen overnight, but it’s certainly something the wgpu maintainers would like to move towards.

Closing words

Changes of this scope and complexity take tremendous effort to realize, and orders of magnitude more effort to push over the finish line. @gents83’s achievement here is truly outstanding. He poured an endless amount of time, effort, and patience into this work, which we now all benefit from, and he deserves equally endless amounts of recognition for it.

Thanks @gents83!


Cursed Rust: Printing Things The Wrong Way Matthias Endler

Matthias Endler2023-11-01 00:00:00

There is a famous story about a physicist during an exam at the University of Copenhagen. The candidate was asked to describe how to determine a skyscraper's height using a barometer. The student suggested dangling the barometer from the building's roof using a string and then measuring the length of the string plus the barometer's height. Although technically correct, the examiners were not amused.

After a complaint and a reevaluation, the student offered various physics-based solutions, ranging from dropping the barometer and calculating the building’s height using the time of fall, to using the proportion between the lengths of the building's shadow and that of the barometer to calculate the building's height from the height of the barometer. He even humorously suggested simply asking the caretaker in exchange for the barometer.

The physicist, as the legend goes, was Niels Bohr, who went on to receive a Nobel Prize in 1922. This story is also known as the barometer question.

Why Is This Story Interesting?

The question and its possible answers have an important didactic side effect: they convey to the learner that one can also get to the solution with unconventional methods — and that these methods are often more interesting than the canonical solution because they reveal something about the problem itself.

There is virtue in learning from unconventional answers to conventional questions. To some extent, this fosters new ways of thinking and problem-solving, which is an essential part of innovation.

Applying The Same Principle To Learning Rust

One of the first examples in any book on learning Rust is the "Hello, world!" program.

fn main() {
    println!("Hello, world!");
}

It's an easy way to test that your Rust installation is working correctly.

However, we can also have some fun and turn the task on its head: let's find ways to print "Hello, world!" without using println!.

Let's try to come up with as many unconventional solutions as possible. The weirder, the better! As you go through each of the solutions below, try to understand why they work and what you can learn from them.

This started as a meme, but I decided to turn it into a full article after the post got a lot of attention.

It goes without saying that you should never use any of these solutions in production code. Check out this enterprise-ready version of hello world instead.

Solution 1: Desugaring println!

use std::io::Write;

write!(std::io::stdout().lock(), "Hello, world!");

This solution is interesting, because it shows that println! is just a macro that expands to a call to write! with a newline character appended to the string.

The real code is much weirder. Search for print in this file if you want to be amazed. write! itself desugars to a call to write_fmt, which is a method of the Write trait.

There is a real-world use case for this: if you want to print things really fast, you can lock stdout once and then use write!. This avoids the overhead of locking stdout for each call to println!. See this article on how to write a very fast version of yes with this trick.
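As a small illustration of that pattern, locking stdout once and reusing the locked handle inside a loop might look like this (a sketch, using an arbitrary loop count):

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut lock = stdout.lock(); // lock once instead of once per line
    for _ in 0..1000 {
        writeln!(lock, "Hello, world!")?;
    }
    Ok(())
}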

Solution 2: Iterating Over Characters

"Hello, world!".chars().for_each(|c| print!("{}", c));

This shows that you can implement println! using Rust's powerful iterators. Here we iterate over the characters of the string and print each one of them.

chars() returns an iterator over Unicode scalar values.
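A quick illustration of what "Unicode scalar values" means in practice, comparing the character count to the byte count:

fn main() {
    let s = "héllo";
    assert_eq!(s.chars().count(), 5); // Unicode scalar values
    assert_eq!(s.len(), 6); // bytes: 'é' takes two bytes in UTF-8
}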

Learn more about iterators here.

Solution 3: Impl Display

struct HelloWorld;

impl std::fmt::Display for HelloWorld {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Hello, world!")
    }
}

println!("{HelloWorld}");

This teaches us a little bit about how traits work in Rust: We define a struct that implements the Display trait, which allows us to print it using print!. In general, Display is intended to make more complex types printable, but it is also possible to implement it for a hardcoded string!

Solution 4: Who Needs Display?

How about we create our own trait instead of using Display?

trait Println {
    fn println(&self);
}

impl Println for &str {
    fn println(&self) {
        print!("{}", self);
    }
}

"Hello, world!".println();

We can exploit the fact that we can name our trait methods however we want. In this example, we choose println, making it look like it is part of the standard library.

This completely turns the println! macro on its head. Instead of passing a string as an argument, we call a method on the string itself!

Solution 5: Who Needs println! When You Got panic!?

panic!("Hello, world!");

There are other ways to print things in Rust besides println!. In this case, we use panic!, which prints the message (to stderr, as a side effect) and immediately terminates the program. It works as long as we only want to print a single string...

Solution 6: I ♥︎️ Closures

(|s: &str| print!("{}", s))("hello");

Rust allows you to call a closure directly after its definition. The closure is defined as an anonymous function that takes a string slice as an argument and prints it. The string slice is passed as an argument to the closure.

In practice, this can be useful for defining a closure that is only used once and for which you don't want to come up with a name.

Solution 7: C Style

extern crate libc;
use libc::{c_char, c_int};
use core::ffi::CStr;

extern "C" {
    fn printf(fmt: *const c_char, ...) -> c_int;
}

fn main() {
    const HI: &CStr = match CStr::from_bytes_until_nul(b"hello\n\0") {
        Ok(x) => x,
        Err(_) => panic!(),
    };

    unsafe {
        printf(HI.as_ptr());
    }
}

You don't even need to use Rust's standard library to print things! This example shows how to call the C standard library's printf function from Rust. It's unsafe because we are using a raw pointer to pass the string to the function. This teaches us a little bit about how FFI works in Rust.

Credit goes to /u/pinespear on Reddit and @brk@infosec.exchange.

Solution 8: C++ Style

We're well into psychopath territory now... so let's not stop here. If you try extremely hard, you can bend Rust to your will and make it look like C++.

use std::fmt::Display;
use std::ops::Shl;

#[allow(non_camel_case_types)]
struct cout;
#[allow(non_camel_case_types)]
struct endl;

impl<T: Display> Shl<T> for cout {
    type Output = cout;
    fn shl(self, data: T) -> Self::Output {
        print!("{}", data);
        cout
    }
}
impl Shl<endl> for cout {
    type Output = ();
    fn shl(self, _: endl) -> Self::Output {
        println!("");
    }
}

cout << "Hello World" << endl;

The Shl trait is used to implement the << operator. cout implements Shl for any type that implements Display, which allows us to print any printable type. cout also implements Shl<endl>, which prints the newline character at the end.

Credit goes to Wisha Wanichwecharungruang for this solution.

Solution 9: Unadulterated Control With Assembly

All of these high-level abstractions stand in the way of printing things efficiently. We have to take back control of our CPU. Assembly is the way. No more wasted cycles. No hidden instructions. Pure, unadulterated performance.

use std::arch::asm;

const SYS_WRITE: usize = 1;
const STDOUT: usize = 1;

fn main() {
    #[cfg(not(target_arch = "x86_64"))]
    panic!("This only works on x86_64 machines!");

    let phrase = "Hello, world!";
    let bytes_written: usize;
    unsafe {
        asm! {
            "syscall",
            inout("rax") SYS_WRITE => bytes_written,
            inout("rdi") STDOUT => _,
            in("rsi") phrase.as_ptr(),
            in("rdx") phrase.len(),
            // syscall clobbers these
            out("rcx") _,
            out("r11") _,
        }
    }

    assert_eq!(bytes_written, phrase.len());
}

(Rust Playground)

If you're wondering why we use Rust in the first place if all we do is call assembly code, you're missing the point! This is about way more than just printing things. It is about freedom! Don't tell me how I should use my CPU.

Okaaay, it only works on x86_64 machines, but that's a small sacrifice to make for freedom.

Submitted by isaacthefallenapple.

Solution 10: "Blazing Fast"

Why did we pay a premium for all those CPU cores if we aren't actually using them? Wasn't fearless concurrency one of Rust's promises? Let's put those cores to good use!

use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    let phrase = "hello world";
    let phrase = Arc::new(Mutex::new(phrase.chars().collect::<Vec<_>>()));

    let mut handles = vec![];

    for i in 0..phrase.lock().unwrap().len() {
        let phrase = Arc::clone(&phrase);
        let handle = thread::spawn(move || {
            thread::sleep(Duration::from_millis(((i + 1) * 100) as u64));
            print!("{}", phrase.lock().unwrap()[i]);
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }
    println!();
}

Here, each character is printed in a separate thread. The threads are spawned in a loop, and each thread sleeps for a certain number of milliseconds before printing its character. This uses the full power of your CPU to print a string! It might not always print the characters in the right order (hey, scheduling is hard!), but that's a worthwhile trade-off for all the raw performance gains.

Your Turn!

If you've got more solutions, please send me a message.

Also, if you liked this article, you might also enjoy the yearly obfuscated C code contest. Check out the previous winners here.

If you were actually more intrigued by the barometer story, read Surely You're Joking, Mr. Feynman!, a book by Richard Feynman, another famous physicist and Nobel Prize winner, who was known for his unconventional way of thinking.

We should all strive to think outside the box and come up with unconventional solutions to problems. Who knows, maybe that's the key to a deeper understanding of the problem itself?


Deploy Rust Code Faster Matthias Endler

Matthias Endler2023-10-25 00:00:00

I've come a long way in my tech journey, from dealing with bare-metal servers to exploring the world of cloud computing. Initially, it seemed so straightforward – spin up a server, deploy a container, and you're done. But as I delved deeper, I realized that infrastructure is not as easy as it appears.

Cloud providers offer a multitude of tools, each with its own learning curve:

  • Google Cloud / AWS
  • Kubernetes
  • Helm
  • Docker
  • Terraform
  • GitHub Actions

If you're adventurous, you might even venture into managed Kubernetes services like EKS or GKE. It's tempting: with just a few clicks, your application is ready to roll. But the reality hits when you start juggling monitoring, logging, security, scaling, and more.

Soon, you find yourself unintentionally leading a DevOps team instead of focusing on your product. You hire more staff to manage infrastructure while your competitors are shipping features and growing their user base.

My Frustration

The cloud promised to make infrastructure easy, but the array of tools and services can be overwhelming. Even if you don't use them all, you must be aware of their existence and learn the basics. The result? Your focus on the product diminishes.

I appreciate dealing with infrastructure, but I also love delivering a product. Sadly, many companies waste precious time and money on infrastructure, repeating the same mistakes.

What if there was a way to eliminate infrastructure concerns altogether?

The Allure of Serverless

Serverless architecture seems promising - no servers, no containers, just pure business logic. However, it's not without challenges:

  • Cold start times
  • Lambda size limitations
  • Memory issues
  • Long-running processes
  • Debugging complexities
  • Lack of local testing

Serverless has its merits for certain use cases, but for larger applications, you might still need some servers.

Platform-As-A-Service (PaaS)

Platforms like Heroku and Netlify introduced a third option – managed services that handle all infrastructure for you. No more infrastructure concerns; you simply push code, and it deploys. What's great about these solutions is their deep integration with specific programming language ecosystems.

I was looking for a platform tailored for Rust developers, aiming to provide a top-notch developer experience. I wanted deep integration with the Rust ecosystem (serde, sqlx, axum,...).

A while ago, I came across Shuttle while trying to find ways to make my Rust development workflow a bit smoother. It’s a tool that kind of just fits into the existing Rust ecosystem, letting you use cargo as you normally would, but with some of the infrastructural heavy lifting taken out of the picture.

Now, it’s not a magic wand that solves all problems, but what I appreciate about Shuttle is its simplicity. You’re not thrown into a completely new environment with a steep learning curve. Instead, you stick to your Rust code, and Shuttle is there in the background, helping manage some of the server-side complexities.

So, in essence, it’s about sticking to what you know, while maybe making life a tad easier when it comes to deployment and server management. It’s not about a revolutionary change in how you code, but more about a subtle shift in managing the background processes that can sometimes be a bit of a headache.

My Shuttle Experience So Far

So far, I have built two smaller Rust services with Shuttle: Zerocal and Readable.

Shuttle takes your Rust code and with very few annotations, it can be deployed to the cloud. The developer experience is pretty close to ideal given that provisioning and deployment are usually the most painful parts of building a service.

Instead, it's just a matter of adding a few lines of code. See for yourself. The boilerplate just vanishes. What's left is the business logic.

Zerocal - Stateless Calendar Magic

Zerocal was the first project I deployed on Shuttle. The principle was very simple yet innovative: encode calendar data directly into a URL. This means creating an event was as straightforward as:

curl "https://zerocal.shuttleapp.rs?start=2023-11-04+20:00&duration=3h&title=Birthday&description=paaarty"

This would return an iCal file, that you can add to your calendar. Here's how you create an event in the browser:

I tried building this project on Shuttle when they were still fixing some things and changing their APIs here and there. Even with these small issues, it was a good experience. In just a few minutes, my app was up and running.

Here’s the code to start the service including the axum routes:

#[shuttle_runtime::main]
async fn axum() -> shuttle_axum::ShuttleAxum {
    // just normal axum routes
    let router = Router::new()
        .route("/", get(calendar))
        .route("/", post(calendar));

    Ok(router.into())
}

I don’t really need Zerocal for myself anymore, so I’m hoping someone else might want to take it over. I think it could be really useful for sharing invites on places like GitHub or Discord. If you want to know more about Zerocal, you can read this detailed breakdown.

I would also like to mention that someone else built a similar project inspired by Zerocal: kiwi by Mahesh Sundaram, written in Deno. This is a really cool outcome.

A Reader Mode For My E-Reader

My appreciation for Firefox's reader view sparked the creation of a Reader Mode Proxy for a minimalist, JavaScript-free web reading experience, particularly tailored for e-readers. The intention was to transform verbose websites into a more digestible format for distraction-free reading.

This project deeply reflected my personal preferences, as I like simple apps that solve a problem. With just a sprinkle of annotations, my code adapted smoothly to Shuttle's environment. Initially, I had my own local mode, which allowed me to run the app on my machine for testing, but I found no need to maintain that because Shuttle’s own local mode works just as well.

While developing the app, there were some bumps along the road. Service downtimes required some code revamping. Yet, Shuttle's evolution simplified parts of my process, especially when it introduced native static file handling.

Before it looked like this:

#[shuttle_runtime::main]
async fn axum() -> shuttle_axum::ShuttleAxum {
    let router = Router::new()
        // Previously, I needed to manually serve static files
        .route(
            "/static/Crimson.woff2",
            get(|| async {
                static_content(
                    include_bytes!("../static/fonts/Crimson.woff2"),
                    HeaderValue::from_static("text/woff2"),
                )
            }),
        )
        .route(
            "/static/JetBrainsMono.woff2",
            get(|| async {
                static_content(
                    include_bytes!("../static/fonts/JetBrainsMono.woff2"),
                    HeaderValue::from_static("font/woff2"),
                )
            }),
        )
        .fallback(readable);

    Ok(router.into())
}

Now it’s just

#[shuttle_runtime::main]
async fn axum() -> shuttle_axum::ShuttleAxum {
    let router = Router::new()
        .nest_service("/static", ServeDir::new(PathBuf::from("static")))
        .fallback(readable);
    Ok(router.into())
}

To understand the intricacies of this project, here's a more comprehensive look.

Control and Safety

Initially, I was concerned that annotating my code for infrastructure would cause vendor lock-in. I wanted to retain full control over my project. Want to move away? The Shuttle macros get rid of the boilerplate, so I could just remove the two annotations I’ve added and get the original code back. Shuttle's code is also open source, so I could even set up my own self-hosted instance, although I wouldn't want to.

The True Cost of DIY Infrastructure

Infrastructure may seem easy on the surface, but maintaining it involves various complexities and costs. Updates, deployments, availability – it can be overwhelming. Each hour spent on these tasks carries both a direct and opportunity cost.

Infrastructure can be a maze, and Shuttle seems to fit well for those working with Rust. I'm thinking of trying out a larger project on Shuttle soon, now that I've got a decent understanding of what Shuttle can and can't do. If you’re considering giving it a shot, it's wise to check their pricing to ensure it aligns with your needs.

Be mindful of the real cost of infrastructure!

As I've mentioned before, it's not just server costs, but a lot more. The biggest factor will probably be the human labor for maintaining and debugging infrastructure, and that is expensive. If I were to use infrastructure as code, I'd spend many hours setting up my infrastructure and many more maintaining it, which gets expensive given today's salaries.

Even if it was just for a hobby project, it would not be worth the trouble for me. I’d much rather work on features than the code that runs it all.


Just paying Figma $15/month because nothing else fucking works fasterthanli.me

fasterthanli.me2023-10-19 16:50:00

My family wasn't poor by any stretch of the imagination, but I was raised to avoid spending money whenever possible.

I was also taught "it's a poor craftsman that blames their tools", which apparently means "take responsibility for your fuckups", but, to young-me, definitely sounded more like "you don't deserve nice things".


Cracking Electron apps open fasterthanli.me

fasterthanli.me2023-07-03 16:30:00

I use the draw.io desktop app to make diagrams for my website. I run it on an actual desktop, like Windows or macOS, but the asset pipeline that converts .drawio files to .pdf, to .svg, and then to .svg again (but smaller) runs on Linux.


The RustConf Keynote Fiasco, explained fasterthanli.me

fasterthanli.me2023-05-31 21:00:00

Disclosure: At some point in this article, I discuss The Rust Foundation. I have received a $5000 grant from them in 2023 for making educational articles and videos about Rust.

I have NOT signed any non-disclosure, non-disparagement, or any other sort of agreement that would prevent me from saying exactly how I feel about their track record.


Rust: The wrong people are resigning fasterthanli.me

fasterthanli.me2023-05-28 17:04:00

(Note: this was originally posted as a gist)

Reassuring myself about Rust


Extra credit fasterthanli.me

fasterthanli.me2023-03-05 07:30:12

We've achieved our goals already with this series: we have a web service written in Rust, built into a Docker image with nix, with a nice dev shell, that we can deploy to fly.io.

But there's always room for improvement, and so I wanted to talk about a few things we didn't bother doing in the previous chapters.

Making clash-geoip available in the dev shell


Generating a docker image with nix fasterthanli.me

fasterthanli.me2023-03-05 07:30:11

There it is. The final installment.

Over the course of this series, we've built a very useful Rust web service that shows us colored ASCII art cats, and we've packaged it with docker, and deployed it to https://fly.io.


Making a dev shell with nix flakes fasterthanli.me

fasterthanli.me2023-03-05 07:30:10

In the previous chapter, we've made a nix "dev shell" that contained the fly.io command-line utility, "flyctl".

That said, that's not how I want us to define a dev shell.

Our current solution has issues. I don't like that it has import <nixpkgs>. Which version of nixpkgs is that? The one you're on? Who knows what that is.


Learning Nix from the bottom up fasterthanli.me

fasterthanli.me2023-03-05 07:30:09

Remember the snapshot we made allll the way back in Part 1? Now's the time to use it.

Well, make sure you've committed and pushed all your changes, but when you're ready, let's go back in time to before we installed anything catscii-specific in our VM.


Doing geo-location and keeping analytics fasterthanli.me

fasterthanli.me2023-03-05 07:30:08

I sold you on some additional functionality for catscii last chapter, and we got caught up in private registry / docker shenanigans, so, now, let's resume web development as promised.

Adding geolocation


Using the Shipyard private crate registry with Docker fasterthanli.me

fasterthanli.me2023-03-05 07:30:07

Wait wait wait, so we're not talking about nix yet?

Well, no! The service we have is pretty simple, and I want to complicate things a bit, to show how things would work in both the Dockerfile and the nix scenario.


Deploying catscii to fly.io fasterthanli.me

fasterthanli.me2023-03-05 07:30:06

Disclosure: Because I used to work for fly.io, I still benefit from an employee discount at the time of this writing: I don't have to pay for anything deployed there for now.

fly.io is still sponsoring me for developing hring, but this isn't a sponsored post. It's just a good fit for what we're doing here, with a generous free tier.


Writing a Dockerfile for catscii fasterthanli.me

fasterthanli.me2023-03-05 07:30:05

Now that our service is production-ready, it's time to deploy it somewhere.

There's a lot of ways to approach this: what we are going to do, though, is build a docker image. Or, I should say, an OCI image.


Serving ASCII cats over HTTP fasterthanli.me

fasterthanli.me2023-03-05 07:30:04

Our catscii program does everything we want it to do, except that it's a command-line application rather than a web server. Let's fix that.

Enter axum


Printing ASCII cats to the terminal fasterthanli.me

fasterthanli.me2023-03-05 07:30:03

Now that our development environment is all set up, let's make something useful!

Creating the catscii crate


Developing over SSH fasterthanli.me

fasterthanli.me2023-03-05 07:30:02

With the previous part's VM still running, let's try connecting to our machine over SSH.

Network addresses, loopback and IP nets


Setting up a local Ubuntu Server VM fasterthanli.me

fasterthanli.me2023-03-05 07:30:01

The first step to using Nix to build Rust is to do so without Nix, so that when we finally do, we can feel the difference.

There's many ways to go about this: everyone has their favorite code editor, base Linux distribution (there's even a NixOS distribution, which I won't cover). Some folks like to develop on macOS first, and then build for Linux.


Rust's BufRead, And When To Use It Brandon's Website

Brandon's Website2023-02-28 00:00:00 Rust is a low-level language, and its standard library is careful to give the programmer lots of control over how things will behave and avoid implicit behavior, especially when that behavior impacts performance. But at the same time, it doesn't want to make the programmer's life harder than it needs to be. As a result, Rust's language features and standard library often give you access to really low-level concepts with no assumptions baked in, but then also give you abstractions you can optionally layer on top.

The bottom emoji breaks rust-analyzer fasterthanli.me

fasterthanli.me2023-02-13 14:20:00

Some bugs are merely fun. Others are simply delicious!

Today's pick is the latter.

Reproducing the issue, part 1


Day 18 (Advent of Code 2022) fasterthanli.me

fasterthanli.me2023-01-12 14:00:00

This time around, we're porting a solution from C++ to Rust and seeing how it feels, how it performs, and what we can learn about both languages by doing that.

See Day 17 for the rationale re: porting solutions rather than writing my own from scratch. TL;DR is: it's better than nothing, and we can still focus on learning Rust rather than spending entire days fighting off-by-one errors.


Twitch fell behind fasterthanli.me

fasterthanli.me2023-01-12 13:00:00

So you want to do live streams. Are you sure? Okay. Let's talk about it.

Let's talk numbers


Day 17 (Advent of Code 2022) fasterthanli.me

fasterthanli.me2023-01-11 15:00:00

Advent of Code gets harder and harder, and I'm not getting any smarter. Or any more free time. So, in order to close out this series anyway, I'm going to try and port other people's solutions from "language X" to Rust. That way, they already figured out the hard stuff, and we can just focus on the Rust bits!


Little Helpers Matthias Endler

Matthias Endler2023-01-05 00:00:00

Yesterday I couldn't help but feel a sense of awe at all the conveniences modern life has to offer.

A lot of the chores in our household are taken care of by little helpers: The dishwasher washes the dishes, the washing machine washes the clothes, and the robot vacuum cleaner cleans the floors. The refrigerator keeps our food cold, the microwave heats it up, and the oven cooks it.

We take all of this for granted because the devices rarely fail, but it's really amazing when you think about it. It's only been a few decades since much of this was tedious, time-consuming, manual labor.

I heard stories about how people used to watch the washing machine do its thing, just because it was entertaining to see the machine do their work for them.

Growing up in the 90s and early 2000s, I remember when "smart home" was a buzzword, and now it's a reality. Smart devices control the thermostat and soon the lights and the door locks in our apartment.

Of course there were a bunch of stupid ideas that didn't work out along the way. I remember when they tried to sell those "smart" fridges that would run a web browser and let you order groceries from the fridge. Who would want to do that? It's so much easier to just order groceries online from your phone or computer. On the other hand, of all the people I talked to, I've never met anyone who regrets buying a vacuum robot.

We recently got a cat and quickly automated all the tedious stuff. The litter box cleans itself, there's a water fountain that keeps the water fresh, and soon we'll get a food dispenser. That means we have more time to focus on the fun stuff, like playing with the cat.

And yes, I fully realize that this convenience comes from an incredible position of privilege. A privileged position that we should never take for granted! Instead, we should be grateful for the little helpers that make our lives easier and make them more accessible to everyone.


Day 16 (Advent of Code 2022) fasterthanli.me

fasterthanli.me2022-12-29 15:00:00

Let's tackle the day 16 puzzle!

Parsing

The input looks like this:

Valve AA has flow rate=0; tunnels lead to valves DD, II, BB
Valve BB has flow rate=13; tunnels lead to valves CC, AA
Valve CC has flow rate=2; tunnels lead to valves DD, BB
Valve DD has flow rate=20; tunnels lead to valves CC, AA, EE
Valve EE has flow rate=3; tunnels lead to valves FF, DD
Valve FF has flow rate=0; tunnels lead to valves EE, GG
Valve GG has flow rate=0; tunnels lead to valves FF, HH
Valve HH has flow rate=22; tunnel leads to valve GG
Valve II has flow rate=0; tunnels lead to valves AA, JJ
Valve JJ has flow rate=21; tunnel leads to valve II

Day 15 (Advent of Code 2022) fasterthanli.me

fasterthanli.me2022-12-25 17:00:00

The day 15 puzzle falls into the "math puzzle" territory more than "let's learn something new about Rust", but since several folks asked if I was going to continue... let's continue.


Day 14 (Advent of Code 2022) fasterthanli.me

fasterthanli.me2022-12-15 19:00:00

I like how the day 14 puzzle sounds, because I think it'll give me an opportunity to show off yet another way to have Rust embedded in a web page.

But first...

Let me guess: parsing?


Day 13 (Advent of Code 2022) fasterthanli.me

fasterthanli.me2022-12-14 20:30:00

The day 13 puzzle needs a speech therapist.

???

...because it has an awful lisp!! Ahhhahahahhhh

Are you ok? What is.. what is going on with you?

No but seriously we have what are ostensibly S-expressions, except they use JSON-adjacent notation:

[1,1,3,1,1]
[1,1,5,1,1]

[[1],[2,3,4]]
[[1],4]

[9]
[[8,7,6]]

[[4,4],4,4]
[[4,4],4,4,4]

[7,7,7,7]
[7,7,7]

[]
[3]

[[[]]]
[[]]

[1,[2,[3,[4,[5,6,7]]]],8,9]
[1,[2,[3,[4,[5,6,0]]]],8,9]

A Reader Mode Proxy for the Slow Web Matthias Endler

Matthias Endler2022-11-03 00:00:00
Reader showing an article in light and dark mode.

tl;dr: I built a service that takes any article and creates a pleasant-to-read, printable version. It is similar to Reader View in Firefox/Safari, but it also works on older browsers, can be shared, and has a focus on beautiful typography. Try it here.

The web used to be such a fun place.

Nowadays? Meh. Trackers, ads, bloat, fullscreen popups, autoplaying videos... it's all so exhausting.

I just want to read long-form posts without distractions with a good cup of tea, the cat sleeping on the windowsill and some light snow falling in front of the window.

The Slow Web

I'm a big fan of the Slow Web movement and of little sites that do one thing well.

For reading long-form text clutter-free I use Reader View in Firefox, and while it doesn't always work and it's not the prettiest I like it.

There are reader modes in other browsers as well, but some of them — like Chrome — hide it behind a feature flag. Other browsers, like the one on my eBook reader, don't come with a reader mode at all, which leaves me with a subpar and slow browsing experience on my main device used for reading.

So I built a reader mode as a service with a focus on beautiful typography which works across all browsers. It's very basic, but I use it to read articles on my older devices and it could also make content more accessible in regions with low bandwidth or while travelling.

Building It

Recently I saw a post about circumflex, a Hacker News terminal client. The tool did a solid job at rendering website content, and I wondered if I could retrofit that into a proxy server.

The Golang cleanup code is here:

func GetArticle(url string, title string, width int, indentationSymbol string) (string, error) {
    articleInRawHTML, httpErr := readability.FromURL(url, 5*time.Second)
    if httpErr != nil {
        return "", fmt.Errorf("could not fetch url: %w", httpErr)
    }
    // ...
}

They use go-readability, a port of Mozilla's Readability. The Rust equivalent is readability and it's simple enough to use:

use readability::extractor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = extractor::scrape("https://endler.dev/2022/readable")?;
    println!("{}", response.content);
    Ok(())
}

Before we write a full proxy server, let's write a simple CLI tool that takes a URL and outputs a clean, readable HTML file.

use readability::extractor;
use std::fs::File;
use std::io::Write;

fn main() -> Result<(), Box<dyn std::error::Error>> {
	// read the URL from the command line
	let url = std::env::args().nth(1).expect("Please provide a URL");

	let response = extractor::scrape(&url)?;
	let mut file = File::create("index.html")?;
	file.write_all(response.content.as_bytes())?;
	Ok(())
}

The output already looked surprisingly good. Next I added a simple HTML template to wrap the response content.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Document</title>
    <link rel="stylesheet" href="yue.css" />
    <style type="text/css">
      body {
        margin: 0;
        padding: 0.4em 1em 6em;
        background: #fff;
      }
      .yue {
        max-width: 650px;
        margin: 0 auto;
      }
    </style>
  </head>
  <body>
    <div class="yue">{{content}}</div>
  </body>
</html>

No need to use a full-blown template engine for now; we can just use str::replace to replace the {{content}} placeholder with the actual content. 😉
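Roughly, the glue could be as small as this (the template path and helper name here are made up for illustration):

// Hypothetical glue code: the template is the HTML shown above,
// `content` is the extracted article body.
const TEMPLATE: &str = include_str!("../static/template.html");

fn render(content: &str) -> String {
    TEMPLATE.replace("{{content}}", content)
}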

Proxy Setup

The proxy setup is super simple with shuttle. It's my second project after zerocal, which is hosted on shuttle and I'm very happy with how smooth the process is. 🚀 Let's call the app readable:

cargo shuttle init --axum --name readable

This creates a small Axum app with a simple hello world route.

Roadblock No. 1: reqwest

When I integrated the readability crate into the project I hit a minor roadblock.

I used extractor::scrape just like above and the proxy started locally. However when I wanted to fetch a website from the proxy, I got an error:

thread 'tokio-runtime-worker' panicked at
'Cannot drop a runtime in a context where blocking is not allowed.
This happens when a runtime is dropped from
within an asynchronous context.'

This meant that I started a runtime inside a runtime.

After checking the source code of the readability crate, I found that it builds a reqwest::blocking::Client and uses that to fetch the URL. After that request, the client is dropped which causes the runtime to be shut down.

I fixed this by using a reqwest::Client instead of the reqwest::blocking::Client.

// reqwest::blocking::Client
let client = reqwest::blocking::Client::new();

// reqwest::Client
let client = reqwest::Client::new();

Now I had the content of the article, but I still needed to pass it to readability. Fortunately they provide a function named extractor::extract that takes something that implements Read and returns the extracted content.

However, reqwest::Response doesn't implement Read (in contrast to reqwest::blocking::Response), so I needed to convert it into something that implements Read myself.

Luckily, reqwest::Response has a bytes method that returns a Bytes object. A byte slice borrowed from that Bytes object implements Read, so I can pass it to extractor::extract.

let response = client.get(&url).send().await?;
let bytes = response.bytes().await?;
// a byte slice implements Read, which is what extractor::extract expects
let article = extractor::extract(&mut bytes.as_ref(), &url)?;

Roadblock No. 2: Routing

The app didn't crash anymore, but I still didn't get any response.

My router looked like this:

#[shuttle_service::main]
async fn axum() -> shuttle_service::ShuttleAxum {
    let router = Router::new().route("/:url", get(readable));
    let sync_wrapper = SyncWrapper::new(router);

    Ok(sync_wrapper)
}

Turns out that when I use /:url as the route, it doesn't match the path /https://example.com because : matches only a single segment up to the first slash.

The solution was to use /*url instead, which is a wildcard route that matches all segments until the end.
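So the router definition from above only changes in its route pattern, something like:

let router = Router::new().route("/*url", get(readable));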

Typography and Layout

New York Times website (left) vs reader mode (right)

For my first prototype I used a CSS framework called yue.css because it was the first thing I found which looked nice.

For the final version I ended up mimicking the style of Ruud van Asseldonk's blog because it always reminded me of reading a well-typeset book.

For fonts I chose two of my favorites

Both are licensed under the SIL Open Font License 1.1.

You can even use readable from the terminal.

lynx https://readable.shuttleapp.rs/https://en.wikipedia.org/wiki/Alan_Turing

Caveats

The proxy is far from perfect. It's something I built in a few hours for my personal use.

  • It doesn't always produce valid HTML.
  • JavaScript is not executed, so some websites don't work properly. Some might say that's a feature, not a bug. 😉
  • That is also true for websites with sophisticated paywalls or bot-detection. A workaround would be to use a headless browser like ScrapingBee or Browserless, but I didn't want to add that complexity to the project.
  • The readability library takes a lot of freedom in formatting the document however it pleases. It can sometimes produce weird results. For example, it loves to mangle code blocks.

Credits

I was not the first person to build a readability proxy. I found out about readable-proxy when I did my research, but the project seems to be abandoned. Nevertheless it was nice to see that others had the same need.

Thanks to Ruud van Asseldonk for open sourcing his blog. 🙏 His writing and documentation are always a great source of inspiration to me.

Conclusion

The browser on my old Kobo eBook reader using the readability proxy.

In times when the most popular browser might kill off ad blockers, a little service for reading articles without ads or tracking can come in handy. I'm not saying you should send all your traffic through it, but it's a nice tool to have in your toolbox for a rainy day, a warm drink and a great article. ☕

Feel free to deploy your own instance of readable or use the one I'm hosting. The source code is available on GitHub. Maybe one of you wants to help me maintain it.


zerocal - A Serverless Calendar App in Rust Running on shuttle.rs Matthias Endler

Matthias Endler2022-10-05 00:00:00

Every once in a while my buddies and I meet for dinner. I value these evenings, but the worst part is scheduling these events!

We send out a message to the group.
We wait for a response.
We decide on a date.
Someone sends out a calendar invite.
Things finally happen.

None of that is fun except for the dinner.

Being the reasonable person you are, you would think: "Why don't you just use a scheduling app?".

I have tried many of them. None of them are any good. They are all... too much!

Just let me send out an invite and whoever wants can show up.

  • I don’t want to have to create an account for your calendar/scheduling/whatever app.
  • I don’t want to have to add my friends.
  • I don’t want to have to add my friends’ friends.
  • I don’t want to have to add my friends’ friends’ friends.
  • You get the idea: I just want to send out an invite and get no response from you.

The nerdy, introvert engineer's solution

💡 What we definitely need is yet another calendar app which allows us to create events and send out an invite with a link to that event! You probably didn't see that coming now, did you?

Oh, and I don't want to use Google Calendar to create the event because I don't trust them.

Like any reasonable person, I wanted a way to create calendar entries from my terminal.

That's how I pitched the idea to my buddies last time. The answer was: "I don’t know, sounds like a solution in search of a problem." But you know what they say: Never ask a starfish for directions.

Show, don’t tell

That night I went home and built a website that would create a calendar entry from GET parameters.

It allows you to create a calendar event from the convenience of your command line:

> curl "https://zerocal.shuttleapp.rs?start=2022-11-04+20:00&duration=3h&title=Birthday&description=paaarty"
BEGIN:VCALENDAR
VERSION:2.0
PRODID:ICALENDAR-RS
CALSCALE:GREGORIAN
BEGIN:VEVENT
DTSTAMP:20221002T123149Z
CLASS:CONFIDENTIAL
DESCRIPTION:paaarty
DTEND:20221002T133149Z
DTSTART:20221002T123149Z
SUMMARY:Birthday
UID:c99dd4bb-5c35-4d61-9c46-7a471de0e7f4
END:VEVENT
END:VCALENDAR

You can then save that to a file and open it with your calendar app.

> curl "https://zerocal.shuttleapp.rs?start=2022-11-04+20:00&duration=3h&title=Birthday&description=paaarty" > birthday.ics
> open birthday.ics

In a sense, it's a "serverless calendar app", haha. There is no state on the server, it just generates a calendar event on the fly and returns it.

How I built it

You probably noticed that the URL contains "shuttleapp.rs". That's because I'm using shuttle.rs to host the website.

Shuttle is a hosting service for Rust projects and I wanted to try it out for a long time.

To initialize the project using the awesome axum web framework, I’ve used

cargo install cargo-shuttle
cargo shuttle init --axum --name zerocal zerocal

and I was greeted with everything I needed to get started:

use axum::{routing::get, Router};
use sync_wrapper::SyncWrapper;

async fn hello_world() -> &'static str {
  "Hello, world!"
}

#[shuttle_service::main]
async fn axum() -> shuttle_service::ShuttleAxum {
  let router = Router::new().route("/hello", get(hello_world));
  let sync_wrapper = SyncWrapper::new(router);

  Ok(sync_wrapper)
}

Let's quickly commit the changes:

git add .gitignore Cargo.toml src/
git commit -m "Hello World"

To deploy the code, I needed to sign up for a shuttle account. This can be done over at https://www.shuttle.rs/login.

It will ask you to authorize it to access your GitHub account.

Then:

cargo shuttle login

and finally:

cargo shuttle deploy

Now let's head over to zerocal.shuttleapp.rs:

Hello World!

Deploying the first version took less than 5 minutes. Neat! We're all set for our custom calendar app.

Writing the app

To create the calendar event, I used the icalendar crate (shout out to hoodie for creating this nice library!). iCalendar is a standard for creating calendar events that is supported by most calendar apps.

cargo add icalendar
cargo add chrono # For date and time parsing

Let's create a demo calendar event:

let event = Event::new()
  .summary("test event")
  .description("here I have something really important to do")
  .starts(Utc::now())
  .ends(Utc::now() + Duration::days(1))
  .done();

Simple enough.

How to return a file!?

Now that we have a calendar event, we need to return it to the user. But how do we return it as a file?

There's an example of how to return a file dynamically in axum here.

async fn calendar() -> impl IntoResponse {
  let ical = Calendar::new()
    .push(
      // add an event
      Event::new()
        .summary("It works! 😀")
        .description("Meeting with the Rust community")
        .starts(Utc::now() + Duration::hours(1))
        .ends(Utc::now() + Duration::hours(2))
        .done(),
    )
    .done();

  CalendarResponse(ical)
}

Some interesting things to note here:

  • Every calendar file is a collection of events so we wrap the event in a Calendar object, which represents the collection.
  • impl IntoResponse is a trait that allows us to return any type that implements it.
  • CalendarResponse is a newtype wrapper around Calendar that implements IntoResponse.

Here is the CalendarResponse implementation:

/// Newtype wrapper around Calendar for `IntoResponse` impl
#[derive(Debug)]
pub struct CalendarResponse(pub Calendar);

impl IntoResponse for CalendarResponse {
  fn into_response(self) -> Response {
    let mut res = Response::new(boxed(Full::from(self.0.to_string())));
    res.headers_mut().insert(
      header::CONTENT_TYPE,
      HeaderValue::from_static("text/calendar"),
    );
    res
  }
}

We just create a new Response object and set the Content-Type header to the correct MIME type for iCalendar files: text/calendar. Then we return the response.

Add date parsing

This part is a bit hacky, so feel free to glance over it. We need to parse the date and duration from the query string. I used dateparser, because it supports sooo many different date formats.

async fn calendar(Query(params): Query<HashMap<String, String>>) -> impl IntoResponse {
  let mut event = Event::new();
  event.class(Class::Confidential);

  if let Some(title) = params.get("title") {
    event.summary(title);
  } else {
    event.summary(DEFAULT_EVENT_TITLE);
  }
  if let Some(description) = params.get("description") {
    event.description(description);
  } else {
    event.description("Powered by zerocal.shuttleapp.rs");
  }

  if let Some(start) = params.get("start") {
    let start = dateparser::parse(start).unwrap();
    event.starts(start);
    if let Some(duration) = params.get("duration") {
      let duration = humantime::parse_duration(duration).unwrap();
      let duration = chrono::Duration::from_std(duration).unwrap();
      event.ends(start + duration);
    }
  }

  if let Some(end) = params.get("end") {
    let end = dateparser::parse(end).unwrap();
    event.ends(end);
    if let Some(duration) = params.get("duration") {
      if params.get("start").is_none() {
        let duration = humantime::parse_duration(duration).unwrap();
        let duration = chrono::Duration::from_std(duration).unwrap();
        event.starts(end - duration);
      }
    }
  }

  let ical = Calendar::new().push(event.done()).done();

  CalendarResponse(ical)
}

Would be nice to support more date formats like now and tomorrow, but I'll leave that for another time.

Let's test it:

> cargo shuttle run # This starts a local dev server
> curl "127.0.0.1:8000?start=2022-11-04+20:00&duration=3h&title=Birthday&description=Party"
*🤖 bleep bloop, calendar file created*

Nice, it works!

Opening it in the browser creates a new event in the calendar:

Of course, it also works on Chrome, but you do support the open web (https://contrachrome.com/), right?

And for all the odd people who don't use a terminal to create a calendar event, let's also add a form to the website.

Add a form

<form>
  <table>
    <tr>
      <td>
        <label for="title">Event Title</label>
      </td>
      <td>
        <input type="text" id="title" name="title" value="Birthday" />
      </td>
    </tr>
    <tr>
      <td>
        <label for="desc">Description</label>
      </td>
      <td>
        <input type="text" id="desc" name="desc" value="Party" />
      </td>
    </tr>
    <tr>
      <td><label for="start">Start</label></td>
      <td>
        <input type="datetime-local" id="start" name="start" />
      </td>
    </tr>
    <tr>
      <td><label for="end">End</label></td>
      <td>
        <input type="datetime-local" id="end" name="end" />
      </td>
    </tr>
  </table>
</form>

I modified the calendar function a bit to return the form if the query string is empty:

async fn calendar(Query(params): Query<HashMap<String, String>>) -> impl IntoResponse {
  // if query is empty, show form
  if params.is_empty() {
    return Response::builder()
      .status(200)
      .body(boxed(Full::from(include_str!("../static/index.html"))))
      .unwrap();
  }

  // ...
}

After some more tweaking, we got ourselves a nice little form in all of its web 1.0 glory:

The form

And that's it! We now have a little web app that can create calendar events. Well, almost. We still need to deploy it.

Deploying

cargo shuttle deploy

Right, that's all. It's that easy. Thanks to the folks over at shuttle.rs for making this possible.

The calendar app is now available at zerocal.shuttleapp.rs.

Now I can finally send my friends a link to a calendar event for our next pub crawl. They'll surely appreciate it.

From zero to calendar in 100 lines of Rust

Boy it feels good to be writing some plain HTML again.
Building little apps never gets old.

Check out the source code on GitHub and help me make it better! 🙏

Here are some ideas:

  • ✅ Add location support (e.g. location=Berlin or location=https://zoom.us/test). Thanks to sigaloid.
  • Add support for more human-readable date formats (e.g. now, tomorrow).
  • Add support for recurring events.
  • Add support for timezones.
  • Add Google calendar short-links (https://calendar.google.com/calendar/render?action=TEMPLATE&dates=20221003T224500Z%2F20221003T224500Z&details=&location=&text=).
  • Add example bash command to create a calendar event from the command line.
  • Shorten the URL (e.g. zerocal.shuttleapp.rs/2022-11-04T20:00/3h/Birthday/Party)?

Check out the issue tracker and feel free to open a PR!


Another Update On The Bagel Language Brandon's Website

Brandon's Website2022-07-03 00:00:00 Hi all. It's been about five months since my last Bagel post, so I wanted to give an update for the handful of you who have been interested/following along (thank you!).

Release of wgpu v0.13 and Call for Testing gfx-rs nuts and bolts

gfx-rs nuts and bolts2022-06-30 00:00:00

The gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is a portable graphics api. It provides safe, accessible, and portable access to the GPU.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

After a long gap between releases, we have just rolled out v0.13 of wgpu and v0.9 of naga! See wgpu v0.13 changelog and naga v0.9 changelog for the details and migration guide.

While it’s been a long time between releases, we’ve been hard at work improving both wgpu’s implementation and its user facing experience.

Performance and Correctness

This release we’ve focused on improving both our performance and correctness. One of our biggest bottlenecks, the performance of resource tracking, has been significantly improved and no longer dominates. There are more performance improvements coming in the near future.

There have been many bugs fixed in this release on all backends.

naga Improvements

naga, our shader translator, has improved substantially.

All backends and frontends are now much more thoroughly tested, and a truly massive number of bugs have been fixed.

Additionally, naga now supports the newest revision of the wgsl spec, bringing it back in line with other WebGPU projects. See the wgpu changelog for transition details.

Presentation and Pipelining

We have focused some of our attention on improving the interface for surface management and presentation. Most importantly, we now allow a greater set of presentation modes (Mailbox, Fifo, FifoRelaxed, and Immediate) and have removed implicit fallback in favor of explicit “Automatic” modes with defined fallback paths (AutoVsync and AutoNoVsync). Additionally, surfaces now expose the full set of texture formats that can be used on them, not just their most preferred format. This paves the way for HDR and more explicit color space support.
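
Roughly, configuring a surface now looks like this (a minimal sketch; the exact names used here, such as Surface::get_supported_formats and PresentMode::AutoVsync, are assumptions based on this release):

fn configure_surface(
    surface: &wgpu::Surface,
    adapter: &wgpu::Adapter,
    device: &wgpu::Device,
    (width, height): (u32, u32),
) {
    // Surfaces now report every texture format they support,
    // not just a single "preferred" one.
    let formats = surface.get_supported_formats(adapter);
    let config = wgpu::SurfaceConfiguration {
        usage: wgpu::TextureUsages::RENDER_ATTACHMENT,
        format: formats[0],
        width,
        height,
        // An explicit "Auto" mode with a defined fallback path replaces
        // the old implicit fallback between presentation modes.
        present_mode: wgpu::PresentMode::AutoVsync,
    };
    surface.configure(device, &config);
}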

Additionally, we have changed BufferSlice::map_async from returning a future that resolves when the mapping is complete to calling a callback when the mapping is complete. We received a sizable amount of feedback about how hard the futures-based API was to use and how easily it led to deadlocks or very poor performance. The callback-based API makes it clearer what is actually happening under the hood and discourages the usage patterns that caused issues.
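
In user code, the callback flavor looks roughly like this (a minimal sketch; the exact signatures, map_async taking a closure and Device::poll(Maintain::Wait) driving it, are assumptions based on this release, and the buffer is assumed to have MAP_READ usage):

fn read_buffer_blocking(device: &wgpu::Device, buffer: &wgpu::Buffer) -> Vec<u8> {
    let slice = buffer.slice(..);
    // Instead of awaiting a future, we hand over a closure that runs once the
    // mapping has completed (or failed).
    slice.map_async(wgpu::MapMode::Read, |result| {
        result.expect("failed to map buffer");
    });
    // The mapping only completes when the device makes progress; blocking here
    // is the step the old future-based API hid from the caller.
    device.poll(wgpu::Maintain::Wait);
    let data = slice.get_mapped_range().to_vec();
    buffer.unmap();
    data
}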

Call for Testing: DX12

For a variety of performance and stability reasons, we are looking at making DX12 wgpu’s default backend on Windows instead of Vulkan. As part of this push, we need people to test their wgpu 0.13 code on the DX12 backend. The easiest way to do this (for testing purposes) is to pass in DX12 as the only available backend when you create your instance.

let instance = wgpu::Instance::new(wgpu::Backends::DX12);

If you find any inconsistencies, bugs, or crashes with this, please file a bug report!

For more information on this change, please see the tracking issue: #2719.

Release Schedule

We’ve slipped significantly from our original cadence of a release every 3 to 4 months, with this release coming nearly 7 months after the last one. As part of the effort to make releases less substantial and easier on both us and our users, we’re going to attempt to follow a stricter 3 month (90 day) release cadence. This way contributors can be sure their changes get released in a timely fashion, and release management will be easier on us.

Thank You!

Thanks to the countless contributors that helped out with this release! wgpu and naga’s momentum is truly incredible due to everyone’s contributions and we look forward to seeing the amazing places wgpu and naga will go. If you are interested in helping, take a look at our good-first-issues, our issues with help wanted, or contact us on our matrix chat, we are always willing to help mentor first time and returning contributors.

Additionally, thank you to all the users who report new issues, ask for enhancements, or test the git version of wgpu. Keep it coming!

Happy rendering!


Hiking With Your Dog Posts on elder.dev

Posts on elder.dev2022-05-21 00:00:00 Bobby loves hiking

Bagel Bites: Type Refinement Brandon's Website

Brandon's Website2022-02-13 00:00:00 I'm stuck solving a gnarly problem right now, so I thought I'd switch gears and write about a recent win in Bagel's design/implementation that I'm really excited about.

Grasping React Hooks Brandon's Website

Brandon's Website2022-02-02 00:00:00 Hooks are weird, and can be hard to reason about. They kind of (but don't actually!) establish a new domain-specific language on top of JavaScript, with its own set of rules and behaviors, and they can make it easy to lose track of what's actually really happening in your code.

Bagel Bites 🥯 (Update on the Bagel Language) Brandon's Website

Brandon's Website2022-01-22 00:00:00 It's been about four months since I last posted about Bagel, the new JavaScript-targeted programming language I've been working on. A lot has changed since then, but things are finally crystallizing and getting into a clear-enough space where I feel comfortable sharing some concrete details (and real code!).

Three Kinds of Polymorphism in Rust Brandon's Website

Brandon's Website2022-01-05 00:00:00 This information probably won't be new to you if you've been writing Rust for a bit! But I'm hoping the framing will be useful anyway. It's been useful for me.

This Year in Wgpu - 2021 gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-12-25 00:00:00

gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is built on top of wgpu-hal and naga. It provides safety, accessibility, and portability for graphics applications.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

As 2021 comes to an end, let’s look back at everything that has been accomplished.

Fredrik Norén's terrain with trees

Wgpu

We moved from gfx-hal to the newly created wgpu-hal and restructured the repository to keep everything together. At the same time, we dropped SPIRV-Cross in favor of naga, reaching the pure-rust tech stack. Read more in the 0.10 release post. Credit goes to @kvark.

At the same time, @cwfitzgerald has revamped our testing infrastructure with Rust integration tests and example snapshots. On top of that, wgpu has tightly integrated with Deno (thanks to the effort of Deno team!), opening up the road to testing on a real CTS, which is available in CI now.

One shiny highlight of the year was the WebGL port, which became practically usable. Getting it ready was truly a collaborative effort, kicked off by @zicklag. Today, wgpu-rs examples can be run online with WebGL.

In terms of correctness and portability, @Wumpf landed the titanic work of ensuring all our resources are properly zero-initialized. This has proven to be much more involved than it seems, and now users will get consistent behavior across platforms.

Finally, we just released version 0.12 with the fresh and good stuff!

Naga

Naga grew more backends (HLSL, WGSL) and greatly improved support all around the table. It went from an experimental prototype in 0.3 to production, shipping in Firefox Nightly. It proved to be 4x faster than SPIRV-Cross at SPV->MSL translation.

One notable improvement, led by @JCapucho with some help from @jimblandy, is the rewrite of SPIR-V control flow processing. This has been a very problematic and complicated area in the past, and now it’s mostly solved.

Things have been busy on GLSL frontend as well. It got a completely new parser thanks to @JCapucho, which made it easier to improve and maintain.

Validation grew to cover all the expressions and types and everything. For some time, it was annoying to see rough validation errors without any reference to the source. But @ElectronicRU saved the day by making our errors really nice, similar to how WGSL parser errors were made pretty by @grovesNL's work earlier.

Last but not least, the SPIR-V and MSL backends have been bullet-proofed by @jimblandy. This includes guarding against out-of-bounds accesses on arrays, buffers, and textures.

Future Work

One big project that hasn’t landed is the removal of “hubs”. This is a purely internal change, but a grand one. It would streamline our policy of locking internal data and allow the whole infrastructure to scale better with more elaborate user workloads. We hope to see it coming in 2022.

Another missing piece is DX11 backend. We know it’s much needed, and it was the only regression from the wgpu-hal port. This becomes especially important now as Intel stopped supporting DX12 on its Haswell GPUs.

Overall, there have been a lot of good quality contributions, and this list can by no means describe the depth of them. We greatly appreciate all the improvements and would love to shout out about your work at the earliest opportunity. Big thanks to everybody involved!


Release of wgpu v0.11 and naga v0.7 gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-10-07 00:00:00

gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is built on top of wgpu-hal and naga. It provides safety, accessibility, and portability for graphics applications.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

Following our release cadence of every few months, we rolled out v0.11 through all of the gfx-rs projects! See wgpu v0.11 changelog and naga v0.7 changelog for the details.

This is our second release using our pure-Rust graphics stack. We’ve made significant progress with shader translation and squashed many bugs in both wgpu and the underlying abstraction layer.

WebGL2

Thanks to the help of @Zicklag for spearheading the work on the WebGL2 backend. Through modifying the use of our OpenGL ES backend, they got WebGL2 working on the web. The backend is still in beta, so please test it out and file bugs! See the guide to running on the web for more information.

The following shows one of Bevy’s PBR examples running on the web.

bevy running on webgl2

Explicit Presentation

A long-standing point of confusion when using wgpu was that dropping the surface frame caused presentation. This was confusing and often happened implicitly. With this new version, presentation is marked explicitly by calling frame.present(). This makes it very clear where the important action of presentation takes place.
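
A frame now roughly looks like this (a minimal sketch; names such as Surface::get_current_texture and SurfaceTexture::present are assumptions based on this release):

fn render_and_present(surface: &wgpu::Surface, device: &wgpu::Device, queue: &wgpu::Queue) {
    let frame = surface.get_current_texture().expect("failed to acquire next frame");
    // A view of the frame's texture is what render passes would draw into.
    let _view = frame.texture.create_view(&wgpu::TextureViewDescriptor::default());
    let encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
    // ... record render passes targeting the view here ...
    queue.submit(Some(encoder.finish()));
    // Presentation happens here, explicitly; dropping `frame` no longer presents it.
    frame.present();
}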

More Robust Shader Translation

naga has made progress on all frontends and backends.

The most notable change was that @JCapucho, with the help of @jimb, completely rewrote the parsing of spirv’s control flow. spirv has notably complex control flow which has a large number of complicated edge cases. After multiple reworks, we have settled on this new style of control flow graph parsing. If you input spirv into wgpu, this will mean that even more spirv, especially optimized spirv, will properly validate and convert.

See the changelog for all the other awesome additions to naga.

Thank You!

Thanks to the countless contributors that helped out with this release! wgpu and naga’s momentum is truly incredible due to everyone’s contributions and we look forward to seeing the amazing places wgpu and naga will go as projects. If you are interested in helping, take a look at our good-first-issues, our issues with help wanted, or contact us on our matrix chat, we are always willing to help mentor first time and returning contributors.

Additionally, thank you to all the users who report new issues, ask for enhancements, or test the git version of wgpu. Keep it coming!

Happy rendering!


The Bagel Language 🥯 Brandon's Website

Brandon's Website2021-09-16 00:00:00 I've started working on a programming language.

wgpu alliance with Deno gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-09-16 00:00:00

gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is built on top of wgpu-hal and naga. It provides safety, accessibility, and portability for graphics applications.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

wgpu works over native APIs, such as Vulkan, D3D12, Metal, and others. This involves a layer of translation to these APIs, which is generally straightforward. It promises safety and portability, so it’s critical for this library to be well tested. To this date, our testing was a mix of unit tests, examples, and a small number of integration tests. Is this going to be enough? Definitely no!

Fortunately, WebGPU is developed with a proper Conformance Test Suite (CTS), largely contributed by Google to date. It’s a modern test suite covering all of the API parts: API correctness, validation messages, shader functionality, feature support, etc. The only complication is that it’s written in TypeScript against the web-facing WebGPU API, while wgpu exposes a Rust API.

Deno

We want to be sure that the parts working today will keep working tomorrow, and ideally enforce this in continuous integration, so that offending pull requests are instantly detected. Thus, we were looking for the simplest way to bridge wgpu with TS-based CTS, and we found it.

Back in March Deno 1.8 shipped with initial WebGPU support, using wgpu for implementing it. Deno is a secure JS/TS runtime written in Rust. Using Rust from Rust is :heart:! Deno team walked the extra mile to hook up the CTS to Deno WebGPU and run it, and they reported first CTS results/issues ever on wgpu.

Thanks to Deno’s modular architecture, the WebGPU implementation is one of the pluggable components. We figured that it can live right inside wgpu repository, together with the CTS harness. This way, our team has full control of the plugin, and can update the JS bindings together with the API changes we bring from the spec.

Today, WebGPU CTS is fully hooked up to wgpu CI. We are able to run the white-listed tests by virtue of adding the “needs testing” tag to any PR. We are looking to expand the list of passing tests and eventually cover the full CTS. The GPU tests actually run on github CI, using D3D12’s WARP software adapter. In the future, we’ll enable Linux testing with lavapipe for Vulkan and llvmpipe for GLES as well. We are also dreaming of a way to run daemons on our working (and idle) machines that would pull revisions and run the test suite on real GPUs. Please reach out if you are interested in helping with any of this :wink:.

Note that Gecko is also going to be running WebGPU CTS on its testing infrastructure, independently. The expectation is that Gecko’s runs will not show any failures on tests enabled on our CI based on Deno, unless the failures are related to Gecko-specific code, thus making the process of updating wgpu in Gecko painless.

We love the work Deno is doing, and greatly appreciate the contribution to wgpu infrastructure and ecosystem! Special thanks to Luca Casonato and Leo K for leading the effort :medal_military:.


Release of a Pure-Rust v0.10 and a Call For Testing gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-08-18 00:00:00

gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our main projects are:

  • wgpu is built on top of wgpu-hal and naga. It provides safety, accessibility, and portability for graphics applications.
  • naga translates shader programs between languages, including WGSL. It also provides shader validation and transformation, ensuring user code running on the GPU is safe and efficient.

If you’ve been following these releases you’ll notice that gfx-hal is absent from this list. gfx-hal has now been deprecated in favor of a new abstraction layer inside of wgpu called wgpu-hal. To see more information about the deprecation, see the 0.9 release post.

Following our release cadence every few months, we rolled out 0.10 through all of the gfx-rs projects! See wgpu v0.10 changelog and naga v0.6 changelog for the details.

Pure-Rust Graphics

wgpu has had many new changes, the most notable of which is the switch to our new Hardware Abstraction Layer, wgpu-hal. This includes completely rebuilt backends which are more efficient, easier to maintain, and significantly leaner. As part of this, we have shed our last C/C++ dependency, spirv-cross. We are now entirely based on naga for all of our shader translation. This is not only a marked achievement for Rust graphics, but has also made wgpu safer and more robust.

The new wgpu-hal:

  • Supports Vulkan, D3D12, Metal, and OpenGL ES with D3D11 to come soon.
  • Has 60% fewer lines of code than gfx-hal (22k LOC vs 55k)
  • Maps better to the wide variety of backends we need to support.

Other notable changes within wgpu:

  • Many api improvements and bug fixes.
  • New automated testing infrastructure.

naga has continued to mature significantly since the last release:

  • hlsl output is now supported and working well.
  • wgsl parsing has had numerous bugs fixed.
  • spirv parsing support continues to be very difficult but improving steadily.
  • With wgpu-hal now depending on naga, all code paths have gotten significant testing.
  • Validation has gotten more complete and correct.

Call For Testing

This is an extremely big release for us. While we have confidence in our code and we have tested it extensively, we need everyone’s help in testing this new release! As such, we ask that you update to the latest wgpu and report to us any problems or issues you face.

If you aren’t sure if something is an issue, feel free to hop on our matrix chat to discuss.

Thank You!

Thanks to the countless contributors that helped out with this massive release! wgpu’s momentum is truly incredible due to everyone’s contributions and we look forward to seeing the amazing places wgpu will go as a project. If you are interested in helping, take a look at our good-first-issues, our issues with help wanted, or contact us on our matrix chat, we are always willing to help mentor first time and returning contributors.

Additionally, thank you to all the users who report new issues, ask for enhancements, or test the git version of wgpu. Keep it coming!

Happy rendering!


Casual Parsing in JavaScript Brandon's Website

Brandon's Website2021-08-16 00:00:00 Over the last year and a half I've gotten really into writing parsers and parser-adjacent things like interpreters, transpilers, etc. I've done most of these projects in JavaScript, and I've settled into a nice little pattern that I re-use across projects. I wanted to share it because I think it's neat, and it's brought me joy, and it could be an interesting or entertaining thing for others to follow along with!

Release of v0.9 and the Future of wgpu gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-07-16 00:00:00

gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. Our current main projects are:

  • gfx-rs makes low-level GPU programming portable with low overhead. It’s a single Vulkan-like Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL ES.
  • naga translates the shaders between languages, including WGSL. Also provides validation and processing facilities on the intermediate representation.
  • wgpu is built on top of gfx-rs and gpu-alloc/gpu-descriptor. It provides safety, accessibility, and strong portability of applications.

Following our release cadence every few months, we rolled out the 0.9 version through all of the gfx projects! See gfx-rs changelog, wgpu changelog, and naga changelog for the details.

naga has matured significantly since the last release.

  • wgsl parsing has improved incredibly, targeting an up-to-date spec.
  • spirv parsing support has had numerous bugs fixed.
  • glsl support is starting to take shape, though still in an alpha state.
  • Validation has gotten more complete and correct.

wgpu validation has continued to improve. Many validation holes were plugged with the last release. Through the combined work in wgpu and naga, validation holes have been shored up, and new features have been implemented. One such feature is getting the array length of runtime-sized arrays, which is now properly implemented on metal.

wgpu performance is still a vital target for us, so we have done work on improving the overhead of resource tracking. We’ve reduced unnecessary overhead by only doing stateful tracking for resources that have complex states. These changes were motivated by benchmarks of Gecko’s WebGPU implementation which showed that tracking was a bottleneck. You can read more about it in #1413.

wgpu Family Reunion, Relicense, and the Future

wgpu has had a number of large internal changes which are laying the future for wgpu to be a safe, efficient, and portable api for doing cross-platform graphics.

wgpu has been relicensed from MPL-2.0 to MIT/Apache-2.0. Thank you to all 142 people who replied to the issue and made this happen. This relicense is an important change because it allows the possibility of adding backends targeting APIs which are behind NDAs.

For a while, we acknowledged that having different essential parts of the project living in different repositories was hurting developers' productivity. There were objective reasons for this, but the time has come to change that. Feedback from our friends at the Bevy game engine gave us the final push and we launched an initiative to make wgpu easier to contribute to. We moved wgpu-rs back into the wgpu repo. This means that PRs that touch both the core crate and the rust bindings no longer need multiple PRs that need to be synchronized. We have already heard from collaborators how much easier the contribution is now that there is less coordination to do. Read more about the family reunion.

As a part of our family reunion, 0.9 is going to be the last release that will use gfx-hal as its hardware abstraction layer. While it has served us well, it has proved to not be at the exact level of abstraction we need. We have started work on a new abstraction layer called wgpu-hal. This new abstraction has already had Vulkan, Metal, and GLES ported, with DX12 landed in an incomplete state, and DX11 to come soon. To learn more about this transition, you can read the whole discussion.

Finally, we have brand new testing infrastructure that allows us to automatically test across all backends and all adapters in the system. Included in our tests are image comparison tests for all of our examples and the beginnings of feature tests. We hope to expand this to cover a wide variety of features and use cases. We will be able to run these tests in CI on software adapters, and our future goal is to set up a distributed testing network so that we can automatically test on a wide range of adapters. This will be one important layer of our in-depth defences, ensuring that wgpu is actually portable and safe. Numerous bugs have already been caught by this new infrastructure and it will help us prevent regressions in the future. Read more about our testing infrastructure.

Thank You!

Thank you to the countless contributors that helped out with this release! wgpu’s momentum is only increasing due to everyone’s contributions and we look forward to seeing the amazing places wgpu will go as a project. If you are interested in helping, take a look at our good-first-issues, our issues with help wanted, or contact us on our matrix chat, we are always willing to help mentor first time and returning contributors.

Additionally, thank you to all the users who report new issues, ask for enhancements, or test the git version of wgpu. Keep it coming!

Happy rendering!


The Uber of Poland Matthias Endler

Matthias Endler2021-06-14 00:00:00

A few years ago I visited a friend in Gdańsk, Poland. As we explored the city, one thing I noticed was that cabs were relatively expensive and there was no Uber. Instead, most (young) people used a community-organized service called Night Riders.

I couldn't find anything about that service on the web, so I decided to write about it to preserve its history.

Delightfully Low-Tech

What fascinated me about Night Riders was the way the service operated — completely via WhatsApp: you post a message in a group chat and one of the free riders would reply with a 👍 emoji. With that, your ride was scheduled. You'd pay through PayPal or cash.

In these days of venture-backed startups that need millions in capital before they turn a profit, this approach is decidedly antagonistic. Basically, Night Riders built on top of existing infrastructure instead of maintaining their own ride-hailing platform, sign-up process, or even website.

The service would grow solely by word of mouth. Using existing infrastructure meant that it was extremely cheap to run and there were almost zero upfront costs without a single line of code to write.

It simply solved the customer's problem in the most straightforward way possible. Of course, there are legal issues regarding data protection, labor law or payment processing, but the important bit is that they had paying customers from day one. The rest is easier to solve than a lack of product market fit.

In Defense of Clones

Uber and Lyft can't be everywhere from the start. While they expand their businesses, others have the ability to outpace them. There's an Uber clone in China (DiDi), one in Africa and the Middle East (Careem) and basically one for every country in the world. The tech industry rarely talks about these Ubers of X, but they serve millions of customers. While they start as exact copies of their well-known counterparts, some of them end up offering better service thanks to their understanding of the local market.

People always find a way

With creativity, you can provide great service even without a big budget. The important part is to know which corners you can cut while staying true to your mission. If there's a market, there's a way. The Cubans have a word for it: resolver, which means "we'll figure it out".


How Does The Unix `history` Command Work? Matthias Endler

Matthias Endler2021-05-31 00:00:00
Source: Cozy attic created by vectorpouch and tux created by catalyststuff — freepik.com

As the day is winding down, I have a good hour just to myself. Perfect time to listen to some Billy Joel (it's either Billy Joel or Billie Eilish for me these days) and learn how the Unix history command works. Life is good.

Learning what makes Unix tick is a bit of a hobby of mine.
I covered yes, ls, and cat before. Don't judge.

How does history even work?

Every command is tracked, so I see the last few commands on my machine when I run history.

❯❯❯ history
8680  cd endler.dev
8682  cd content/2021
8683  mkdir history
8684  cd history
8685  vim index.md

Yeah, but how does it do that?

The manpage on my mac is not really helpful — I also couldn't find much in the first place.

I found this article (it's good etiquette nowadays to warn you that this is a Medium link) and it describes a bit of what's going on.

Every command is stored in $HISTFILE, which points to ~/.zsh_history for me.

❯❯❯ tail $HISTFILE
: 1586007759:0;cd endler.dev
: 1586007763:0;cd content/2021
: 1586007771:0;mkdir history
: 1586007772:0;cd history
: 1586007777:0;vim index.md
...

So let's see. We got a : followed by a timestamp followed by :0, then a separator (;) and finally the command itself. Each new command gets appended to the end of the file. Not too hard to recreate.

Hold on, what's that 0 about!?

It turns out it's the command duration, and the entire thing is called the extended history format:

: <beginning time>:<elapsed seconds>;<command>

(Depending on your settings, your file might look different.)
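
Here's a tiny Rust sketch that pulls a line like that apart (assuming exactly the ": <start>:<elapsed>;<command>" layout above; your file might look different):

fn parse_history_line(line: &str) -> Option<(u64, u64, &str)> {
    // Strip the leading ": ", then split off the command and the two numbers.
    let rest = line.strip_prefix(": ")?;
    let (timestamps, command) = rest.split_once(';')?;
    let (start, elapsed) = timestamps.split_once(':')?;
    Some((start.parse().ok()?, elapsed.parse().ok()?, command))
}

fn main() {
    let line = ": 1586007759:0;cd endler.dev";
    assert_eq!(
        parse_history_line(line),
        Some((1586007759, 0, "cd endler.dev"))
    );
}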

Hooking into history

But still, how does history really work.

It must run some code whenever I execute a command — a hook of some sort!

💥 Swoooooosh 💥

Matthias from the future steps out of a blinding ball of light: Waaait! That's not really how it works!

It turns out that shells like bash and zsh don't actually call a hook for history. Why should they? Since history is a shell builtin, they can just track the commands internally.

Thankfully my editor-in-chief and resident Unix neckbeard Simon Brüggen explained that to me — but only after I sent him the first draft for this article. 😓

As such, the next section is a bit like Lord of the Rings: a sympathetic but naive fellow on a questionable mission with no clue of what he's getting himself into.

In my defense, Lord of the Rings is also enjoyed primarily for its entertainment value, not its historical accuracy.... and just like in this epic story, I promise we'll get to the bottom of things in the end.

I found add-zsh-hook and a usage example in atuin's source code.

I might not fully comprehend all of what is written there, but I'm a man of action, and I can take a solid piece of work and tear it apart.

It's not much, but here's what I got:

# Source this in your ~/.zshrc
autoload -U add-zsh-hook

_past_preexec(){
    echo "preexec"
}

_past_precmd(){
    echo "precmd"
}

add-zsh-hook preexec _past_preexec
add-zsh-hook precmd _past_precmd

This sets up two hooks: the first one gets called right before a command gets executed and the second one directly after. (I decided to call my little history replacement past. I like short names.)

Okay, let's tell zsh to totally run this file whenever we execute a command:

source src/shell/past.zsh

...aaaaaand

❯❯❯ date
preexec
Fri May 28 18:53:55 CEST 2021
precmd

It works! ✨ How exciting!

Actually, I just remembered that I did the same thing for my little environment settings manager envy over two years ago, but hey!

So what to do with our newly acquired power?

Let's Run Some Rust Code

Here's the thing: only preexec gets the "real" command. precmd gets nothing:

_past_preexec(){
    echo "preexec $@"
}

_past_precmd(){
    echo "precmd $@"
}

$@ means "show me what you got" and here's what it got:

❯❯❯ date
preexec date date date
Fri May 28 19:02:11 CEST 2021
precmd

Shouldn't one "date" be enough?
Hum... let's look at the zsh documentation for preexec:

If the history mechanism is active [...], the string that the user typed is passed as the first argument, otherwise it is an empty string. The actual command that will be executed (including expanded aliases) is passed in two different forms: the second argument is a single-line, size-limited version of the command (with things like function bodies elided); the third argument contains the full text that is being executed.

I don't know about you, but the third argument should be all we ever need? 🤨

Checking...

❯❯❯ ls -l
preexec ls -l lsd -l lsd -l

(Shout out to lsd, the next-gen ls command)

Alright, good enough. Let's parse $3 with some Rust code and write it to our own history file.

use std::env;
use std::error::Error;
use std::fs::OpenOptions;
use std::io::Write;

const HISTORY_FILE: &str = "lol";

fn main() -> Result<(), Box<dyn Error>> {
    let mut history = OpenOptions::new()
        .create(true)
        .append(true)
        .open(HISTORY_FILE)?;

    if let Some(command) = env::args().nth(3) {
        writeln!(history, "{}", command)?;
    };
    Ok(())
}

❯❯❯ cargo run -- dummy dummy hello
❯❯❯ cargo run -- dummy dummy world
❯❯❯ cat lol
hello
world

We're almost done — at least if we're willing to cheat a bit. 😏 Let's hardcode that format string:

use std::env;
use std::error::Error;
use std::fs::OpenOptions;
use std::io::Write;
use std::time::SystemTime;

const HISTORY_FILE: &str = "lol";

fn timestamp() -> Result<u64, Box<dyn Error>> {
    let n = SystemTime::now().duration_since(SystemTime::UNIX_EPOCH)?;
    Ok(n.as_secs())
}

fn main() -> Result<(), Box<dyn Error>> {
    let mut history = OpenOptions::new()
        .create(true)
        .append(true)
        .open(HISTORY_FILE)?;

    if let Some(command) = env::args().nth(3) {
        writeln!(history, ": {}:0;{}", timestamp()?, command)?;
    };
    Ok(())
}

Now, if we squint a little, it sorta kinda writes our command in my history format. (That part about the Unix timestamp was taken straight from the docs. Zero regrets.)

Remember when I said that precmd gets nothing?

I lied.

In reality, you can read the exit code of the executed command (from $?). That's very helpful, but we just agree to ignore that and never talk about it again.

With this out of the way, our final past.zsh hooks file looks like that:

autoload -U add-zsh-hook

_past_preexec(){
    past $@
}

add-zsh-hook preexec _past_preexec

Now here comes the dangerous part! Step back while I replace the original history command with my own. Never try this at home. (Actually I'm exaggerating a bit. Feel free to try it. Worst thing that will happen is that you'll lose a bit of history, but don't sue me.)

First, let's change the path to the history file to my real one:

// You should read the ${HISTFILE} env var instead ;)
const HISTORY_FILE: &str = "/Users/mendler/.zhistory";

Then let's install past:

❯❯❯ cargo install --path .
# bleep bloop...

After that, it's ready to use. Let's add that bad boy to my ~/.zshrc:

source "/Users/mendler/Code/private/past/src/shell/past.zsh"

And FINALLY we can test it.

We open a new shell and run a few commands followed by history:

❯❯❯  date
...
❯❯❯ ls
...
❯❯❯ it works
...
❯❯❯ history
 1011  date
 1012  ls
 1013  it works

Yay. The source code for past is on Github.

How it really really works

Our experiment was a great success, but I since learned that reality is a bit different.

"In early versions of Unix the history command was a separate program", but most modern shells have history builtin.

zsh tracks the history in its main run loop. Here are the important bits. (Assume all types are in scope.)

Eprog prog;

/* Main zsh run loop */
for (;;)
{
    /* Init history */
    hbegin(1);
    if (!(prog = parse_event(ENDINPUT)))
    {
        /* Couldn't parse command. Stop history */
        hend(NULL);
        continue;
    }
    /* Store command in history */
    if (hend(prog))
    {
        LinkList args;
        args = newlinklist();
        addlinknode(args, hist_ring->node.nam);
        addlinknode(args, dupstring(getjobtext(prog, NULL)));
        addlinknode(args, cmdstr = getpermtext(prog, NULL, 0));

        /* Here's the preexec hook that we used.
        * It gets passed all the args we saw earlier.
        */
        callhookfunc("preexec", args, 1, NULL);

        /* Main routine for executing a command */
        execode(prog);
    }
}

The history lines are kept in a hash, and also in a ring-buffer to prevent the history from getting too big. (See here.)

That's smart! Without the ring-buffer, a malicious user could just thrash the history with random commands until a buffer overflow is triggered. I never thought of that.
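
Here's the ring-buffer idea as a tiny Rust sketch (just the concept of bounding the history, not zsh's actual data structure):

use std::collections::VecDeque;

struct History {
    capacity: usize,
    entries: VecDeque<String>,
}

impl History {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    fn push(&mut self, command: &str) {
        if self.entries.len() == self.capacity {
            // Drop the oldest entry so the history can't grow without bound.
            self.entries.pop_front();
        }
        self.entries.push_back(command.to_string());
    }
}

fn main() {
    let mut history = History::new(3);
    for cmd in ["date", "ls", "cd history", "vim index.md"] {
        history.push(cmd);
    }
    // Only the three most recent commands survive.
    let recent: Vec<&str> = history.entries.iter().map(String::as_str).collect();
    assert_eq!(recent, ["ls", "cd history", "vim index.md"]);
}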

History time (see what I did there?)

The original history command was added to the Unix C shell (csh) in 1978. Here's a link to the paper by Bill Joy (hey, another Bill!). He took inspiration from the REDO command in Interlisp. You can find its specification in the original Interlisp manual in section 8.7.

Lessons learned

  • Rebuild what you don't understand.
  • The history file is human-readable and pretty straightforward.
  • The history command is a shell builtin, but we can use hooks to write our own.
  • Fun fact: Did you know that in zsh, history is actually just an alias for fc -l? More info here or check out the source code.

“What I cannot create, I do not understand” — Richard Feynman


Shader translation benchmark on Dota2/Metal gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-05-09 00:00:00

gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. See The Big Picture for the overview, and release-0.8 for the latest progress. In this post, we are going to share the first performance metrics of our new pure-Rust shader translation library Naga, which is integrated into gfx-rs. Check the Javelin announcement, which was the original name of this project, for the background.

gfx-portability is a Vulkan Portability implementation in Rust, based on gfx-rs. Previous Dota2 benchmarks showed good potential in our implementation. However, it couldn’t truly be called an alternative to MoltenVK while it relied on SPIRV-Cross. Today, we are able to run Dota2 with a purely Rust Vulkan Portability implementation, thanks to Naga.

Test

Testing was done on a MacBook Pro (13-inch, 2016), which has a humble dual-core Intel CPU running at 3.3GHz. We created an alias to libMoltenVK.dylib and pointed DYLD_LIBRARY_PATH to it for Dota2 to pick up on boot, thus running on gfx-portability. It was built from the naga-bench-dota tag in release mode. The SPIRV-Cross path was enabled by uncommenting the features = ["cross"] line in libportability-gfx/Cargo.toml.

In-game steps:

  1. launch make dota-release
  2. skip the intro videos
  3. proceed to “Heroes” menu
  4. select “Tide Hunter”
  5. and click on “Demo Hero”
  6. walk the center lane, enable the 2nd and 3rd abilities
  7. use the 3rd ability, then quit

Hero selection screen with Naga (low settings)

The point of this short run is to get a bulk of shaders loaded (about 600 graphics pipelines). We are only interested in the CPU cost for loading shaders and creating pipelines. This isn’t a test for the GPU time executing the shaders. The only fact about the GPU that matters here is that the picture looks identical. We don’t expect fixing any potential visual issues that are discovered to require architectural changes.

Times were collected using profiling instrumentation, which is integrated into gfx-backend-metal. We added this as a temporary dependency to gfx-portability with “profile-with-tracy” feature enabled in order to capture the times in Tracy.

In tracy profiles, we’d find the relevant chunks and click on the “Statistics” for them. We are interested in the mean (μ) time and the standard deviation (σ).

Results

| Function | Cross μ | Cross σ | Naga μ | Naga σ |
|---|---|---|---|---|
| SPIR-V parsing | 0.34ms | 0.15ms | 0.45ms | 0.50ms |
| MSL generation | 3.94ms | 3.5ms | 0.56ms | 0.38ms |
| Total per stage | 4.27ms | | 1.01ms | |
| create_shader_module | 0.005ms | 0.01ms | 0.53ms | 0.57ms |
| create_shader_library | 5.19ms | 6.19ms | 0.89ms | 1.23ms |
| create_graphics_pipeline | 10.94ms | 12.05ms | 2.24ms | 5.13ms |

The results are split into two groups: one for the time spent purely in the shader translation code of SPIRV-Cross (or just “Cross”) and Naga, and the other for the combined times of the translation plus the Metal runtime doing its part. The latter very much depends on the driver caches of the shaders, which we don’t have any control over. We made sure to run the same test multiple times, and only took the last result, giving the caches an opportunity to warm up. Interestingly, the number of outliers (shaders that ended up missing the cache) was still higher on the “Cross” path. This may be just noise, or improperly warmed up caches, but there is a chance it’s also indicative of “Cross” generating a wider variety of shaders, and/or being non-deterministic.

The total time spent in shader module or pipeline creation is 7s with Cross path and just 1.29s with Naga. So we basically shaved 6 seconds off the user (single-core) time just to get into the game.

In neither case was any pipeline caching involved. One could argue that pipeline caches, when loaded from disk, would essentially solve this problem, regardless of the translation times. We have the support for caching implemented for the Naga path, and we don’t want to make it unfair to Cross, so we excluded the caches from the benchmark. We will definitely include them in any full game runs of gfx-portability versus MoltenVK in the future.

Conclusions

This benchmark shows Naga being roughly 4x faster than SPIRV-Cross at shader translation from SPIR-V to MSL. It’s still early days for Naga, and we want to optimize the SPIR-V control-flow graph processing, which the numbers show is still taking a noticeable share of the time. We assume SPIRV-Cross also has a lot of low-hanging fruit to optimize, and we look forward to seeing its situation improve.

Previously, we heard multiple requests to allow MSL generation to happen off-line. We are hoping that the lightning fast translation times (1ms per stage) coupled with pipeline caching would resolve this need.

The quality and readability of the MSL code generated by Naga is improving, but it’s still not at the level of SPIRV-Cross results. It also doesn’t have the same feature coverage. We are constantly adding new things in Naga, such as interpolation qualifiers, atomics, etc.

Finally, Naga is architected for shader module re-use. It does a lot of work up-front, and can produce target-specific shaders quickly, so it works best when there are many pipelines created using fewer shader modules. Dota2’s ratio appears to be 2 pipelines per 1 shader module. We expect that applications using multiple entry points in SPIR-V modules, or creating more variations of pipeline states, would see even bigger gains.


Release of v0.8 gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-04-30 00:00:00

gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. The main projects are:

  • gfx-rs makes low-level GPU programming portable with low overhead. It’s a single Vulkan-like Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL ES.
  • naga translates the shaders between languages, including WGSL. Also provides validation and processing facilities on the intermediate representation.
  • wgpu-rs is built on top of gfx-rs and gpu-alloc/gpu-descriptor. It provides safety, accessibility, and strong portability of applications.

Following the regular schedule of releasing once every few months, we just rolled out 0.8 versions across the gfx/wgpu projects! See the gfx-rs changelist, wgpu changelist, and naga changelist for the details.

tree

Naga-based shader infrastructure has been growing and capturing more ground. It has reached an important point where SPIRV-Cross is not just optional on some platforms, but even not enabled by default. This is now the case for Metal and OpenGL backends. Naga path is easier to integrate, share types with, compile, and it’s much faster to run. Early benchmarks suggest about 2.5x perf improvement over SPIRV-Cross for us.

The work on HLSL and WGSL backends is underway. The former will allow us to deprecate SPIRV-Cross on Direct3D 12/11 and eventually remove this C dependency. The latter will help users port the existing shaders to WGSL.

Another big theme of the release is enhanced wgpu validation. The host API side is mostly covered, with occasional small holes discovered by testing. The shader side is now validating both statements and expressions. Programming shaders with wgpu starts getting closer to Rust than C: most of the time you fight the validator to pass, and then it just works, portably. The error messages are still a bit cryptic though, hopefully we’ll improve it in the next release. Hitting a driver panic/crash becomes rare, and we are working on eliminating these outcomes entirely. In addition, wgpu now knows when to zero-initialize buffers automatically, bringing the strong portability story a bit closer to reality.

We also integrated profiling into wgpu and gfx-backend-metal. The author was receptive to our needs and ideas, and we are very happy with the results so far. Gathering CPU performance profiles from your applications today can’t be any simpler:

profiling

In Naga internals, the main improvement was establishing an association of expressions to statements. It allows backends to know exactly whether expression results can be re-used, and when they need to be evaluated. Overall, the boundary between statements and expressions became well defined and easy to understand. We also converged on a model, at a high level, where the intermediate representation is compact, but there is a bag of derived information. It is produced by the validator, and is required for backends to function. Finally, entry points are real functions now: they can accept parameters from the previous pipeline stages and return results.

Finally, we added a few experimental graphics features for wgpu on native-only:

  • Buffer descriptor indexing
  • Conservative rasterization

P.S. overall, we are in the middle of a grand project that builds the modern graphics infrastructure in pure Rust, and we appreciate anybody willing to join the fight!


Why Rust strings seem hard Brandon's Website

Brandon's Website2021-04-13 00:00:00 Lately I've been seeing lots of anecdotes from people trying to get into Rust who get really hung up on strings (&str, String, and their relationship). Beyond Rust's usual challenges around ownership, there can be an added layer of frustration because strings are so easy in the great majority of languages. You just add them together, split them, whatever! They're primitives that you can do whatever you want with. For someone who's only ever known this mental model (which is to say, never worked much with C/C++), using strings in Rust can be a rude awakening. They feel very complicated, have all these restrictions and extra steps, and it all just seems so unnecessary.

Your First Business Should Be A Spreadsheet Matthias Endler

Matthias Endler2021-03-10 00:00:00

One of the best decisions I made in 2020 was to open my calendar to everyone. People book appointments to chat about open-source projects, content creation, and business ideas.

When we discuss business ideas, the conversation often leans towards problems suited for startups, such as using artificial intelligence to find clothes that fit or building a crowdfunding platform on the blockchain.

While these are exciting ideas, they require significant persistence and deep pockets. It might be easier and less risky to join an existing startup in that domain.

In reality, most people are simply looking for something cool to work on and to make their customers happy. It turns out you don't need to run a startup to achieve that (and you probably shouldn't). Instead, starting a side project is less risky and can organically grow into a business over time.

Often, the solution is right in front of them: hidden within an Excel spreadsheet on their computer.

I Hate Excel

I spend as little time in Excel as possible, only engaging with it when absolutely necessary. My focus is on getting tasks done quickly, not on layout or design; I'd rather pay someone to do that work for me. And this is precisely my point!

The spreadsheets and lists you create to solve your own problems can also solve someone else's. This represents a business opportunity!

This approach has several advantages:

  • 💪 It solves a real problem.
  • 🥱 It's mundane, so people might pay to avoid doing it themselves.
  • ⚡️ It wastes no time on design or infrastructure, embodying the ultimate MVP.
  • 🐢 It's low-tech: no programming required. You can start with Notion and Super.so.
  • 🐜 It targets a niche market: if there were an established service, you'd already be using it. Big corporations won't compete with you.
  • 🚀 It allows you to spend less time building and more time engaging with potential customers.

Examples

A few years ago, while researching static code analysis tools, I compiled a list, pushed it to GitHub, and moved on. Fast forward, and that side project now generates revenue from sponsors and consulting gigs.

Another example is a person who created a spreadsheet for remote work locations, shared it on Twitter, and then developed a website from it. The website is NomadList, and its creator, Pieter Levels, now earns $300k/year.

"Instead of building a site first, I simply made [a] public Google spreadsheet to collect the first data and see if there’d be interest for this at all." — Pieter Levels on how he created NomadList.

I've left a spot for your story here. Now, refine that spreadsheet (or list), share it with your friends, iterate based on their feedback, and build your first business.


Release of v0.7 gfx-rs nuts and bolts

gfx-rs nuts and bolts2021-02-02 00:00:00

gfx-rs community’s goal is to make graphics programming in Rust easy, fast, and reliable. It governs a wide range of projects:

  • gfx-rs makes low-level GPU programming portable with low overhead. It’s a single Vulkan-like Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL.
  • naga translates the shaders between languages, including WGSL. Also provides validation and processing facilities on the intermediate representation.
  • wgpu-rs is built on top of gfx-rs and gfx-extras. It provides safety, accessibility, and even stronger portability of applications.
  • metal-rs and d3d12-rs wrap native graphics APIs on macOS and Windows 10 in Rust.

Today, we are happy to announce the release of 0.7 versions across gfx/wgpu projects!

gfx-hal-0.7

The overall theme of this release is simplification. We cut out a lot of experimental cruft that had accumulated over the years, cleaned up the dependencies, and upgraded the API to be more modern.

For example, last release we made a step towards more generic bounds with ExactSizeIterator on our APIs. In this release, we are taking two steps back by removing not just ExactSizeIterator, but also Borrow from the iterator API. We figured a way to do the stack allocation without extra bounds, using inplace_it.

Having two distinct swapchain models has also come to an end. We removed the old Vulkan-like model, but also upgraded the new model to match “VK_KHR_imageless_framebuffer”, getting the best of both worlds. It maps to the backends even better than before, and we can expose it directly in gfx-portability now.

There is also a lot of API fixes and improvements, one particularly interesting one is aligning to Vulkan’s “external synchronization” requirements. This allows us to do less locking in the backends, making them more efficient.

Another highlight of the show is the OpenGL ES backend. It’s finally taking off based on EGL context and window system integration. There is still a lot of work to do on the logic, but the API is finally aligned to the rest of the backends (see 3.5 year old issue). We are targeting Linux/Android GLES3 and WebGL2 only.

See the full changelog for details.

wgpu-0.7

spaceship cheese

The list of libraries and applications has grown solidly since the last release. A lot of exciting projects and creative people joined our community.

Our goals were to bring the API closer to the stable point and improve validation. There is quite a bit of API changes, in particular with the pipeline descriptors and bind group layouts, but nothing architectural. We also got much nicer validation errors now, hopefully allowing users to iterate without always being confused :)

The highlight of the wgpu work is support for WGSL shaders. It’s the emerging new shading language developed by the WebGPU group, designed to be modern, safe, and writable by hand. Most of our examples are already using the new shaders, check them out! We are excited to finally be able to throw the C dependencies (spirv-cross, shaderc, etc.) out of our projects, and build and deploy more easily.

See the core changelog and the rust API changelog for details.

naga-0.3

Naga has seen intensive development in all areas. SPIR-V frontend and backend, WGSL frontend, GLSL frontend and backend, intermediate layer, validation - all got a lot of improvements. It’s still not fully robust, but Naga has crossed the threshold of being actually usable, and we are taking advantage of it in wgpu-rs.

We experimented on the testing infrastructure and settled on cargo-insta. This boosted our ability to detect regressions, and allowed us to move forward more boldly.

The next steps for us are completing the validation, adding out-of-bounds checks, and replacing SPIRV-Cross completely in applications that have known shaders.

See the changelog for details.

P.S. overall, we are in the middle of a grand project that builds the modern graphics infrastructure in pure Rust, and we’d appreciate anybody willing to join the fight!


Starting A Print-On-Demand Business As A Software Engineer Matthias Endler

Matthias Endler2021-01-22 00:00:00

One day I had the idea to make a print of my Github timeline. I liked the thought of bringing something "virtual" into the real world. 😄

So I called up my friend Wolfgang and we built codeprints. It's my first "physical" product, so I decided to share my learnings.

Felix Krause of fastlane fame was one of our first customers and we are very thankful for this tweet promoting our service, which gave us a huge traffic boost.

Launching Is Hard, So Launch Early

Even though I knew that launching early was vital, I still didn't want to "commit" to the final design shortly before the planned go-live. There was always that last bug to fix or that little extra feature to implement. For example, I wanted to offer two designs/layouts: the classic Github contribution timeline and a graph-based design for repositories. In cases like that, it helps to have a co-founder. Wolfgang convinced me that multiple layouts were not needed for the MVP and that whatever we'd come up with would probably be wrong anyway without getting early user feedback. He was right. Without Wolfgang, the shop would probably still not be live today. We have a much clearer vision now of what people want to see, thanks to launching early. Turns out users were not really interested in the graph-based design after all, and it would have been a waste of time to create it.

Lesson learned: Even if you know all the rules for building products, it's different when applying them in practice for the first time. We'll probably never be completely happy with the shop functionality, but it's better to launch early and make incremental improvements later.

Software Development Is Easy

When we started, my main concern was software development. The frontend and the backend needed to be coded and work together. We didn't want to run into Github rate-limiting issues in case there were many users on the site. I was also thinking a lot about which web frontend to use. Should we build it in Rust using Yew or better go with Gatsby?

Turns out writing the code is the easy part.

Being software engineers, it didn't take us too long to implement the backend API and we quickly found a decent template for the frontend. Most of our time was spent thinking about the product, the user experience, financing, taxes, the shipping process, marketing, and integrating customer feedback. These were all things I had (and still have) little experience in.

Wolfgang suggested to "just use Shopify and the default template" to get started quickly. In hindsight, it was the absolute right decision. I always thought Shopify was for simple mom-and-pop stores, but it turns out it's highly customizable, integrates well with pretty much anything, and offers excellent tooling like themekit. Payments, refunds, discounts, customer analytics: it's all built into the platform. It saved us sooo much development time.

Lesson learned: There are many unknown unknowns — things we are neither aware of nor understand — when starting a project. Try to get to the root of the problem as soon as possible to save time and avoid the sunk cost fallacy.

Users Expect Great UI/UX

Giants like Amazon, Facebook, and Netflix have raised customer expectations for great UX. They spend millions polishing their websites and getting every detail right. As a result, their sites work just right for millions of customers and on every device.

An indie shop does not have these resources. Nevertheless, many customers expect the same quality user experience as on other sites they use. Being on the other side of the fence for the first time, I learned how hard it is to build a user interface that works for 90% of the people. Every little detail — like the order of form fields — makes a huge difference. Get too many details wrong, and you lose a customer.

Those things can only be found by watching real users use your product. I promise you, it will be eye-opening!

Lesson learned: Watch potential customers use your service. It will be painful at first, but will improve the quality of your product. Use standard frameworks for shops if you can because they get many UI/UX details right out of the box. WooCommerce or Shopify come to mind.

Building Products Means Being Pragmatic

We have many ideas for future products. Many friends and customers tell us about potential features all the time, but the problem is how to prioritize them. Most ideas won't work at scale: It's tricky to find a supplier that has a product on offer, is cheap, ships worldwide, and has a working integration with your shop-system. So we have to regularly scrap product ideas, simply because our suppliers' support is not there. On top of that, we run the business next to our day job and other responsibilities, so we need to make use of our time as efficiently as possible.

Lesson learned: Making services look effortless is hard work. Time is your biggest constraint. You'll have to say "no" more often than you can say "yes".

Due to the pandemic, codeprints was entirely built remotely. More people should give [whereby](https://whereby.com/) a try.

Getting Traction As A Small Business

It has never been easier to launch a shop. Services like Shopify, Stripe, and a host of suppliers make starting out a breeze. On the other hand, there is a lot more competition now that the barrier to entry is so low.

Thousands of services are constantly competing for our attention. On top of that, most customers just default to big platforms like Amazon, AliExpress, or eBay for their shopping needs these days, and search engines send a big chunk of the traffic there.

Since our product is custom-made, we can not offer it on those bigger platforms. As an indie shop, we get most visitors through word of mouth, exceptional customer support, and advertising where developers hang out: Twitter, Reddit, HackerNews, Lobste.rs, and friends. It's essential to focus on providing value on those platforms; a plain marketing post won't get you any attention. Other platforms like LinkedIn, Facebook, ProductHunt, or IndieHackers could also work, but our target audience (OSS developers with an active Github profile) doesn't hang out there that much.

Lesson learned: Always know where your customers are and understand their needs.

Finding A Niche Is Only Half The Job

Common market wisdom is to find a niche and grow from within. With codeprints we definitely found our niche: the audience is very narrow but interested in our geeky products. There are 56 million developers on Github today; that's a big target audience. Most profiles are not very active, though. To make a print look attractive, you'd have to consistently commit code over a long period of time — many years. If we assume that only 1% of devs are active, that limits our target audience to 560.000 users. That's still big, but a much smaller market. Now, if only 1% of these people find the shop and order something (which would be quite a good ratio), we're looking at 5.600 orders total. Not that much!

In order to extend that audience, one could either increase the number of potential customers or focus on getting more of the existing potential customers on the page. In our case, we expanded by offering a one-year layout, reducing the required level of Github activity for a cool print. We are also working on making emptier profiles look more interesting and highlighting the value-producing part of open source contribution. Every contribution counts — no matter how tiny.

Lesson learned: Make sure that your niche market is not too narrow so that you can make a sustainable business out of it.

Early adopters like [Orta Therox](https://orta.io/) are incredibly precious when starting out. Not everybody has a rockstar profile like that, though (and that's fine).

Make User Feedback Actionable

Initial customer feedback is precious. You should focus on every word these customers say as they believe in your product and want you to win. (They voted with their wallet after all.) Feedback from friends is helpful, too, but I usually apply a bigger filter to that. Not all of my friends are software developers, and while they all mean well, what they tell me might be different from what they mean. It's like they are asking for faster horses when what they really want is a car. Feedback on social media can be... snarky at times; be prepared for that! Your job is to find the grain of truth in every statement and focus on constructive advice.

For example, take this feedback we got:

How lazy can someone be to pay €36 for this.

You could turn it around to make it constructive:

Can I get a cheaper version to print myself?

And that is some valuable feedback. We could provide a downloadable version in the future!

Lesson learned: It takes practice to extract actionable feedback from user input and make it fit your product vision.

Summary

2020 was a crazy year. I helped launch two small side-businesses, codeprints and analysis-tools.dev.

Both have an entirely different revenue model, but they have one thing in common: they were super fun to build! 🤩 It's motivating to look back at those achievements sometimes... That print of 2020 pretty much encapsulates those feelings for me. (Note the greener spots in August and September, which is when we launched analysis-tools and the days in December when we built codeprints.)

My coding year in review using our new vertical layout.
Here's to building more products in 2021.

Let me know if you found that post helpful and reach out if you have questions. Oh and if you're looking for a unique way to decorate your home office, why not get your own print from codeprints? 😊

P.S.: If you're a product owner and you're looking for a unique present for your team, get in contact and be the first to get an invite to a private beta.


So You Want To Earn Money With Open Source Matthias Endler

Matthias Endler2021-01-04 00:00:00

I earned 0 Euros from maintaining OSS software for years, and I thought that's the way things are. I finally looked into ways to monetize my projects last year and in this talk I want to share what I learned so far. It didn't make me rich (yet!), but I built my first sustainable side-project with analysis-tools.dev ✨.

I'll talk about this and other projects and the mistakes I made on the road towards sustainability.

Related links and resources:

Find a full transcript of the talk below. (Sorry for the wall of text.)


This is my talk about earning money with Open Source, which I gave at the Web Engineering Meetup Aachen at the end of 2020. The organizers gladly allowed me to share it on my YouTube channel. I'm basically trying to answer the question: "Why am I not making 100k on Github?". I'm talking about finding corporate sponsors for myself and the long road towards sustainability of open-source maintenance.

You might not even want to start. This is a talk for those people that have the mindset that it's probably not worth it to spend that much effort on Open Source if it takes so long until you find success. Now, this talk turned out to be a little grim. I had this very motivational talk in mind, but in reality, it's hard, and by hard, I mean it's really hard.

I just want to get this point across and maybe still motivate you to do it, but first: why am I entitled to talk about this? I've been doing Open Source for over 10 years now. This is a talk dedicated to my former self, maybe 15 years ago. I work at trivago, which is a hotel search company based in Düsseldorf. I have a blog at endler.dev. Like everyone and their mom, I also have a YouTube channel. It's called Hello, Rust! and I'm extremely active with one video every two years. Hence, you definitely want to subscribe to not miss any updates. But today, I want to talk about Open Source, and I have a very sophisticated outline with two points: my journey and revenue models.

Let's go back all the way to 2010. The world definitely looked a bit different back then.

Github in 2010

This was Github, and I was a bit late to the game. I joined in January 2010, and by then, Github was already two years old, so my username was taken. I usually go by the handle mre on platforms, and I noticed that this handle was not actively used by anyone, so I just sent a mail to support and asked if I could have it, and then I got an answer from this guy saying "go for it." It was Chris Wanstrath, who goes by the handle defunkt, and he's the former CEO of Github, and at this point in time, I was hooked. I really liked the platform. I really liked how they worked very hands-on with Open Source. I used it for some projects of mine; you can see in the screenshot that I uploaded my blog, for example, because they host it for free. It was built with Jekyll, and you just push it to their site. Then they statically generate it, and it's done. It goes without saying that nothing has changed in the last 10 years because my blog more or less still looks like that. It's not built with jQuery and Jekyll anymore, but with zola and Cloudflare Workers Sites, but it's more or less the same thing. In preparing for this talk, I wanted to take a step back and see where I was coming from and where I am right now, and probably the best way to do it is to look up some statistics and see if the number of repositories over time would give me some insights. So I queried the Github API for that.

You can see it's pretty much a linear graph from 2010 all the way to 2020. Except for 2018, where I reached peak productivity, it seems, but oh well. In the end, it's more or less a linear thing, and you might say you put some work in and you get some feedback out, but in reality, it's different. There is a compound effect. If we look at my number of stars over time, you can see that more or less it started very slowly, and now it's sort of growing exponentially, so right now, we are at 25.000 stars across all projects. Another way to look at it would be the number of followers. That's kind of a new metric to me, but I did look up some statistics from archive.org (because Github doesn't have that information through their API), and again, it's more or less exponential growth.

You put some work in, but you get a compound effect of your work plus some interest out. This is not luck; it's work. It means you know what you're doing. At the same time, there's the elephant in the room, and that is: it's just a pat on the back. We have earned zero dollars until now, and one question you might have is how to monetize this effort.

First off, is it an effort?

Well, I don't know about you, but I probably spend two or three hours on average per day on Open Source: thinking about Open Source and creating new projects, but also maintaining and code review, so it really is work, and it's a lot of work, and you more or less do that for free.

There's nothing wrong with doing things for free and doing it as a hobby, but in this case, you are supposed to be working on whatever you like. Open Source is not like that; sometimes you have obligations, and you feel responsible for maybe helping people out, which is a big part of it. You do that next to your regular work, so it can really be a burden. If you don't know by now, making this somehow valuable is hard; it's really hard. I want to talk about some ways to build a proper revenue model from Open Source. It goes without saying that this should probably not be your first focus if you saw the graphs before, but once you reach a point where you want to get some revenue, you have a couple of options. I don't want to talk about doing Open Source as part of your business, and I don't want to talk about bigger companies and more significant support here. I want to focus on a couple of things that everyone can do. Sponsoring [on Github] is one. Offer paid learning materials on top of your normal documentation. For example, you might have a video series that you ask money for. Sell merchandising like Mozilla does. Consulting next to your Open Source business. Services and plugins like writing an ADFS plugin or high availability functionality are very common examples of paid features targeting enterprises.

But let's start with the basics. Let's start with point number one, sponsoring. There are two types of sponsoring: the first one is individual donations. Individual sponsoring is what Github Sponsors is all about. If you want to earn money [with that model], you have to think about the funnel, and you have to think about how you capture people's attention and how you monetize that. It starts with a product, [which] can be anything. From there, you generate interest, and this interest creates an audience, and that audience eventually might pay for your service, and this is actually the entire secret. It's how you earn money with any product, and with Open Source, if you want to attract sponsors, you build a product people want.

If you transfer that to Open Source, building a project is maybe a repository, and the stars indicate the interest of the audience. The audience itself is made out of followers (personal followers or followers of a company), and those followers might or might not become sponsors in the end. Now, I know stars are a terrible metric for popularity because some people use stars differently than others. For example, some use it as bookmarks to check out projects later, others want to thank the developers for maybe putting in a lot of effort, and so on, but it's a good first estimation.

Now, think about the following. Think about the number of stars I have and the followers and the number of sponsors. Think about my "funnel" right now. I told you that I have 25.000 stars and roughly 1000 followers, and out of those, I have three sponsors, so the ratio between stars and sponsors is roughly 0.01%. That looks pretty grim. It means you need around 8.000 stars to attract a single supporter. I was wondering: "maybe it's just me?". Maybe the top 1000 Github maintainers did not have that problem. Well, it turns out it's exactly the same pattern. If you take the top 1000 Github maintainers and look at their sponsors, it's again a pretty grim picture. For example, looking at the median, you look at 3421 followers per person and a median of zero sponsors. That's zero percent if my math is correct, and if you look at the average, you even have 5430 followers (because Linus Torvalds pushes that number up). You have 2.8 sponsors out of that on average, and that is about 0.05%, which is a bit more than I have, but it's roughly in the same ballpark. Now think about this: Github has 40 million users, so that means the top 1000 maintainers make up 0.0025% of the entire community. The median income of those maintainers on Github is basically zero.

That in and of itself is maybe not the biggest problem, but keep in mind that Github's revenue in 2019 was 300 million dollars. I read this comment on Hacker News yesterday:

I have sponsors on Github and rake in a cool two dollars per month. It's obviously less after taxes, so I have to have a day job.

So this is clearly not working. You have to think of different ways to monetize Open Source, or you just wait until Github Sponsors becomes more popular -- whichever happens first. One way I just want to quickly touch on is the notion of sponsorware. It's kind of a new concept, and some people haven't heard of it before. I honestly really like it. Generally speaking, you create a project, and you keep it private. You talk about it on Twitter, though, or any other platform, and you say: "hey, I'm building this, but if you want early access, you have to become a sponsor," and once you reach a certain threshold of sponsors, or income, or whatever, you make the project public. The initial example that I showed you, where someone was earning 100k on Open Source, is from someone doing just that. He's building products and services, talks about them, and then makes them open for everyone in the end.

This has some advantages: first off, you get early feedback from people that really believe in your mission. Second, you don't have to work for free all the time, and third, you might also create an audience and hype from your projects. The disadvantage is that if you are a hardcore Open Source or free software believer, this goes against your ethics. You want the software to be open to begin with, without any additional requirements. So you really have to make up your own mind about that. I tried, and I have an early access program, which I only share with sponsors. [My first sponsorware was a] tool for getting Github statistics. [The statistics from this talk were] created with that tool. I think you need a big audience to pull that off. The question is if you want to put that much effort in, or you just want to make it open in the first place and think about other revenue models. However, I still think it's a very interesting concept, and we might see that [more] in the future, so now you know what it looks like, and you have a name for it.

Another one is corporate sponsoring. This is a double-edged sword because corporate sponsoring means that a company gives you money and sometimes wants something. They might want additional support, or they want the bug to be fixed, and more or less it feels like you are somehow beginning to work for them, but nevertheless, those companies put in quite a big amount of money into Open Source these days. Looking at two big companies, Facebook and Google, they invested 177k and 845k respectively into Open Source over their lifetime on Open Collective, a platform for collecting those donations. That's really great. We need more companies doing that, but also, as a little side note and maybe as a little rant, I believe that those companies are doing way too little.

Facebook's revenue last year was 70 billion, and Google had 160 billion, which is nothing to be ashamed of, so I wonder really if this is the most they can do. Of course, Google, for example, also donated to other projects like Mozilla, and they also organize meetups and so on. But do you really think that Facebook and Google would exist today if there was no Python or web server or Linux back in the day when two Stanford students tried to build a search engine? Sometimes I feel that Fortune 500 companies really don't understand how much they depend on Open Source and how many people depend on a few people who maintain critical parts of our infrastructure.

I don't think they invest nearly enough into Open Source. What a lot of people think is that Open Source works like the panel on the left where you have a full room of engineers trying to figure out the best way to build a project, and in reality, it's more or less someone working late at night to fix bugs and doing it because they believe in it. The public perception is probably wrong; in reality, it's a really small group of people who maintain critical infrastructure. Sometimes that can lead to very tricky situations. Two of my childhood heroes talked about it openly: Kenneth Reitz is the core maintainer of requests for Python and antirez is the creator of Redis, a key-value store. So one is from front-end development and the other one from the back end. They both talk about burnout here because the burden of becoming an Open Source maintainer on a big scale can very much and very quickly lead to burnout. The internet never sleeps. You never go to sleep. You always get a ticket, a feature request, a pull request, an issue. You always have something to work on, and on top of that, you have to do all your other responsibilities, so that can lead to burnout really quickly. There was one guy who I also respect deeply. His name is Mark Pilgrim. He is the author of Dive Into Python, and he once pulled a 410 for deleting everything [about him] on the internet. There's actually a term for it: infocide, for "information suicide." He got fed up with the ecosystem, and if you think about the Ruby community, you might remember _why, the author of the Poignant Guide to Ruby. He did kind of the same thing. Focusing on what antirez has said, "once I started to receive money to work at Redis, it was no longer possible for my ethics to have my past pattern, so I started to force myself to work on the normal schedules. This, for me, is a huge struggle for many years. At this point, moreover, I'm sure I'm doing less than I could, because of that, but this is how things work", so it feels like he feels guilty for maybe being forced into that work schedule and maybe not performing well enough. There are some signs of burnout in there for me, and it's that love-hate relationship of Open Source and money. If you accept money, it becomes a job, but you're not writing code most of the time. You're writing talks, reviewing pull requests, you're looking at issues, you're answering questions on StackOverflow, you're discussing on Discord, you're marketing on YouTube or at conferences. When you become popular with Open Source, then it feels like you have a choice between two options: one is depression and the other one is burnout. If your project does not become successful, then suddenly you think you're a failure, you're a mistake. It has zero stars; nobody likes it. But if it becomes a success, then everyone likes it, and you get hugged to death. That's a really unfortunate situation to be in, and you want to stop being overwhelmed with those responsibilities. You have to set clear boundaries and pick your poison. You have to be careful if you accept companies as sponsors. I want to show you one example of how it can work and [point out] some risks. Earlier this year, I started working on a real project that I had been putting off for many years before.

You see, in December 2015, I started a list of static analysis tools on Github. Static analysis tools are just tools that help you improve your code, and it turns out that there's a lot of those tools. Just starting to collect them was the first step. I didn't think much about it, but over time that became really popular. And you can see that this graph is more or less a linear increase in stars over time. In 2018, I started really thinking hard about whether there was more than just a Github project here. I told many people that I had this idea of building something more from that. It really took someone else to maybe push me over the finishing line and convince me that this was worth it, and that is Jakub. He said, "why not build a website from it?" and over the course of maybe two weekends or so, we built a website. It's built with Gatsby, but it really doesn't matter. We just did it, and then we saw what happened to it. We render 500 tools right now, and the initial feedback was really great. People really seem to like that. We got a cool 720.000 requests on the first day, and over the next week or so, it more or less hit 1.5 million. That was great because suddenly people started getting interested in that project. So we started finding some sponsors. Those companies are special because they believe in your mission, but they also know how Open Source works. They don't really expect you to advertise their tool. They want to sell to developers, so they want to be in the developers' minds, saying: "Hey! You are a developer. We built this amazing tool; you might want to check it out!" but they also get seen as an Open Source company. I think that's a win-win. I have to say it doesn't always go that smoothly. Sometimes companies expect you to just have cheap advertising space. Then they jump off the moment they see you don't get that many clicks, but others understand that they invest into something that maybe pays off in a year or two from now. So I'm really thankful that some companies understand that mission. However, what companies want is different from what individuals want. Companies want an invoice. Companies want something tax-deductible. Companies want someone that keeps the lights on and is responsive via email, so you really have those obligations, and one platform that helps with that is Open Collective. They have a 501c6 program for Open Source projects that acts as a fiscal host, which means they will do all the invoicing and officially be the maintainers. If you, as an Open Source maintainer or a contributor to a project, want to get [reimbursed for your work], you have to send an invoice to Open Collective.

I think that's the best of both worlds. Again, because it's a very transparent process, companies are in the loop and don't have to deal with all the financial stuff. But it also means that you have to really polish your public perception. Companies really want to know what they can get out of sponsoring you, and you have to make that very clear. Probably the most important page that you have is not your website; it's your sponsors page on Github where you describe the different tiers and what those tiers mean, so we have three tiers: One is targeted at smaller companies and freelancers. They just get exposure, and they get seen as an Open Source friendly tech company. That is a hundred dollars a month. We have a middle-tier, a company sponsor that maybe is a bigger company. They get the badge, too, but they also get a blog post about a static analysis tool that they want to promote, but we make it transparent that this is really sponsored content. Finally, if you want to go all the way, you go to full content creation, which might be a video workshop, but we don't have video workshop sponsors yet, so I cannot talk about that yet. I have to say I really would like to try it, though, and it's really cheap for what you get.

Anyway, those are things that you can do today. Without really changing how you work on Open Source, you can set that up, and you just see how it goes. Maybe no one reacts, and that's fine. Everything else on that list is kind of advanced. You need an audience, and so you should start with that.

Paid learning material is something that we are doing with analysis tools in the future with a video course. There are companies like Tailwind that do that impressively well, so you can learn from them. For merchandising, you have to have a brand. Hence, it's not something that I could do, but someone like Mozilla or the Coding Train on YouTube could definitely do something like that. Consulting is always an option. Still, it's also a lot more work and probably takes you away from what you really love, so it really becomes a job. You have to think about whether you want to do that or not. Enterprise services are very advanced [and interesting] for maybe the one percent of projects that can be run in a business and where you have special requirements. I have to say start from the top and work your way down. Start to create an audience. It's probably easier to build an audience on Twitter and then funnel it back to Github than the other way around. Oh, by the way, did I tell you it's hard? I really don't want to end on a low note. I really want to emphasize that I would do it again, all of that, if I started today. I think there's no better time to contribute to Open Source than today. Probably tomorrow will be an even better time because suddenly, way more people are interested, it's way easier to set up projects, you have all those free tools like VSCode and Github Actions, free hosting. It's just amazing how much you can pull off with very little money involved. So you can try it. What's the worst thing that can happen? No one cares? Well, okay, then you're as good as me. But I have some tips for you if you want to start today. My first tip is: "do your homework." Many people start with learning, and then they build things, and then they close the circle, but there's one key piece missing here. Some people hate the word, but you learn to love it eventually. It's called marketing. Marketing means a lot of things to a lot of people, but what it means to me is getting the word out because someone else will if you don't. And you are awesome; you just have to realize that. Maybe not everyone knows [about your project] right away, so you should really talk about it more. Maybe at conferences, maybe on Twitter, maybe you can just tell your friends. Maybe you can ask people to contribute and to support you. Somehow it's frowned upon in the community that if you do marketing, you're not doing it for real, but I think that's not true. I think that if smart, patient, and passionate people did marketing, then the world would be a better place; because I'm pretty sure the evil guys do marketing. So do your homework, but rest assured that being an Open Source maintainer means running a business, and you are the product. You have to think about why someone would want to sponsor you because if you don't come up with an answer for that, how should they know? Also, think about the funnel. How will people find you, for example? The best way for people to find you is probably starting a YouTube channel.

There are easier ways, though.

[First,] you can always help out in a different project, and you don't even have to be a coder. If you are good with design, then I can tell you there are so many Open Source projects that need designers. It's crazy. Maybe start creating a logo for a small project and start getting some visibility. Another one is having fun. If you know that earning money is hard in Open Source, then that can also be liberating because it means you can experiment and you can be creative, and yeah, having fun is the most important thing, I guess.

Second, build things you love because it's your free time in the end. The chances that someone will find the project is pretty low, so it better be something that you're really interested in. If you don't believe in that, just move on to the next thing. It's fine if you drop a project that you don't believe in anymore. No one will hold you accountable for that unless they are jerks, and you don't want to be surrounded by jerks.

Third, find friendly people because you really grow with your community. You want people that support your project and maybe eventually become maintainers to ease the burden, and that takes a lot of time, sometimes years, until you find one maintainer, so always be friendly, try to put yourself in their perspective. Go the extra mile if you can. For example, reintegrate the master branch into their pull request. Just do it for them. Say thanks twice if you're unsure.

Fourth is to grow an audience.
Radical marketing is one way, but being approachable and being inclusive is another way. You want to be the guy or the girl that people go to when they have a tough question, or they want to know how to get into Open Source. You want to be the person that helps them out on their first pull request. They will pay it back a thousand times. The most exciting people I have met so far are available for questions, and they don't really ask for anything in return. You hold them very close and dear to your heart. When the time comes, you will remember those people. You will say, like, "this is an amazing person to work with; I can highly recommend them," which is called a lead.

Finally, be in it for the long run. Good things take time. You see, it took me 10 years. Maybe it takes you five or maybe even less, but it's probably not an overnight success. It's really a long-term investment.


Write code. Not too much. Mostly functions. Brandon's Website

Brandon's Website2020-12-15 00:00:00 There's a well-known quote by author Michael Pollan: "Eat food. Not too much. Mostly plants." I like it because it doesn't attempt to be dogmatic: it encapsulates some basic guiding principles that get you 90% of the way there 90% of the time. Wikipedia describes the book the quote is from (emphasis mine):

Lazy Loading YouTube Videos Posts on elder.dev

Posts on elder.dev2020-12-12 00:00:00 ⓘ Note First, let me acknowledge up-front that this is neither a novel problem nor a novel solution. This is simply what I cobbled together to fit my own needs, I thought I’d share about how this went / works. Why Lazy Load? YouTube is a pretty ubiquitous for video hosting and very easy to embed. For most videos you can just open the video on youtube.com, click “share”, click “embed”, and finally copy + paste the generated <iframe> into your page source.

My Favorite Rust Function Signature Brandon's Website

Brandon's Website2020-09-16 00:00:00 I've gotten really into writing parsers lately, and Rust has turned out to be the perfect language for that. In the course of my adventures, I came up with the following:

My Blog Just Got Faster: Cloudflare Workers and AVIF Support Matthias Endler

Matthias Endler2020-09-14 00:00:00

Did I mention that this website is fast? Oh yeah, I did, multiple times.

A few reasons (from ordinary to the first signs of creeping insanity):

  • 📄 Static site
  • ☁️ Cached on Cloudflare CDN
  • 🔗 ️HTTP/2 and HTTP/3 support
  • 🚫 No web fonts (sadly)
  • Edge-worker powered analytics (no Google Analytics)
  • 🌸 Avoiding JavaScript whenever possible; CSS covers 90% of my use-cases.
  • 🖼️ Image width and height specified in HTML to avoid page reflows.
  • 👍🏻 Inlined, optimized SVG graphics and hand-rolled CSS
  • 🚅 Static WASM search (lazy loaded)
  • 🏎️ The entire homepage is <10K (brotli-compressed), including graphics, thus should fit into the first HTTP round-trip.
  • 💟 Heck, even the favicon is optimized for size. Update: I'm using an SVG icon now thanks to this article.

Then again, it's 2020: everyone is optimizing their favicons, right? ...right!?

Well, it turns out most other sites don't think about their user's data plans as much as I do. Actually, that's an understatement: they don't care at all. But to me, lean is beautiful!

Wait, What About Images?

I prefer SVG for diagrams and illustrations. Only if it's a photo, I'll use JPEG or WebP.

To be honest with you, I never really liked WebP. The gist is that it might not even be smaller than JPEGs compressed with MozJPEG. There is a lengthy debate on the Mozilla bug tracker if you want to read more. To this day, Safari doesn't support WebP.

Hello AVIF 👋

Meet AVIF, the new next-gen image compression format. Check this out:

Source: [ReachLightSpeed.com](https://reachlightspeed.com/blog/using-the-new-high-performance-avif-image-format-on-the-web-today/)

It's already supported by Chrome 85 and Firefox 80.
Then it hit me like a hurricane 🌪️:

😲 Holy smokes, AVIF is supported by major browsers now!?
I want this for my blog!

Yes and no.

I'm using Zola for my blog, and AVIF support for Zola is not yet there, but I want it now! So I whipped up an ugly Rust script (as you do) that creates AVIF images from my old JPEG and PNG images. I keep the original raw files around just in case.

Under the hood, it calls cavif by Kornel Lesiński.
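
To give an idea of what such a script can look like, here is a minimal sketch (not my actual script, which is uglier): it assumes cavif is on your PATH and writes the .avif file next to each source image, and the "content" directory name is just a placeholder.

use std::{fs, path::Path, process::Command};

fn convert_dir(dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Recurse into subdirectories
            convert_dir(&path)?;
        } else if matches!(
            path.extension().and_then(|e| e.to_str()),
            Some("jpg") | Some("jpeg") | Some("png")
        ) {
            // Shell out to cavif, which writes `<name>.avif` next to the input
            let status = Command::new("cavif").arg(&path).status()?;
            if !status.success() {
                eprintln!("cavif failed for {}", path.display());
            }
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // "content" is a placeholder for wherever the images live
    convert_dir(Path::new("content"))
}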

Data Savings

The results of AVIF on the blog were nothing short of impressive:

Total image size for [endler.dev/2020/sponsors](https://endler.dev/2020/sponsors)

Check Your Browser

But hold on for a sec... is your browser even capable of showing AVIF?

If that reads "yup," you're all set.
If that reads "nope," then you have a few options:

  • On Firefox: Open about:config from the address bar and search for avif.
  • On Chrome: Make sure to update to the latest version.
  • On Safari: I'm not sure what you're doing with your life. Try a real browser instead. 😏

Workaround I: Fallback For Older Browsers

HTML is great in that your browser ignores unknown new syntax. So I can use the <picture> element to serve the right format to you. (Look ma, no JavaScript!)

<picture>
  <source srcset="fancy_browser.avif" />
  <source srcset="decent_browser.webp" />
  <img src="meh_browser.jpg" />
</picture>

The real thing is a bit more convoluted, but you get the idea.

Workaround II: Wrong Content-Type On Github Pages

There was one ugly problem with Github and AVIF, though: Their server returned a Content-Type: application/octet-stream header.

This meant that the images did not load on Firefox.

There was no way to fix that on my side, as Github was hosting my page. Until now! I wanted to try Cloudflare's Workers Sites for a long time, and this bug finally made me switch. Basically, I run the full website as an edge worker right on the CDN; no web server of my own is needed. What's great about it is that the site is fast everywhere now — even in remote locations — no more roundtrips to a server.

By running an edge worker, I also gained full control over the request- and response objects. I added this gem of a snippet to intercept the worker response:

// `url` is the requested URL (a string); `response` is the asset the worker fetched
if (/\.avif$/.test(url)) {
  response.headers.set("Content-Type", "image/avif");
  response.headers.set("Content-Disposition", "inline");
}

And bam, Bob's your uncle. Firefox is happy. You can read more about modifying response objects here.

Another side-effect of Workers Sites is that a production deployment takes one minute now.

Performance Results After Moving To Cloudflare

Website response time before
Source: KeyCDN
Website response time after
Source: KeyCDN

Page size and rating before
Source: Pingdom.com
Page size and rating after
Source: Pingdom.com

I don't have to hide from a comparison with well-known sites either:

Comparison with some other blogs I read
Source: Speedcurve

Further reading


Launching a Side Project Backed by Github Sponsors Matthias Endler

Matthias Endler2020-08-21 00:00:00

Yesterday we launched analysis-tools.dev, and boy had I underestimated the response.

It's a side project about comparing static code analysis tools. Static analysis helps improve code quality by detecting bugs in source code without even running it.

What's best about the project is that it's completely open-source. We wanted to build a product that wouldn't depend on showing ads or tracking users. Instead, we were asking for sponsors on Github — that's it. We learned a lot in the process, and if you like to do the same, keep reading!

First, Some Stats

Everyone likes business metrics. Here are some of ours:

  • The project started as an awesome list on Github in December 2015.
  • We're currently listing 470 static analysis tools.
  • Traffic grew continuously. Counting 7.5k stars and over 190 contributors at the moment.
  • 500-1000 unique users per week.
  • I had the idea to build a website for years now, but my coworker Jakub joined in May 2020 to finally make it a reality.
Github stars over time. That graph screams BUSINESS OPPORTUNITY.
Source: star-history.t9t.io

"Why did it take five years to build a website!?", I hear you ask. Because I thought the idea was so obvious that others must have tried before and failed.

I put it off, even though nobody stepped in to fill this niche.
I put it off, even though I kept the list up-to-date for five years, just to learn about the tools out there.
You get the gist: don't put things off for too long. When ideas sound obvious, it's probably because they are.

Revenue Model

It took a while to figure out how to support the project financially. We knew what we didn't want: an SEO landfill backed by AdWords. Neither did we want to "sell user data" to trackers.

We owe it to the contributors on Github to keep all data free for everyone. How could we still build a service around it? Initially, we thought about swallowing the infrastructure costs ourselves, but we'd have no incentive to maintain the site or extend it with new features.

Github Sponsors was still quite new at that time. Yet, as soon as we realized that it was an option, it suddenly clicked: Companies that are not afraid of a comparison with the competition have an incentive to support an open platform that facilitates that. Furthermore, we could avoid bias and build a product that makes comparing objective and accessible.

Sponsoring could be the antidote to soulless growth and instead allow us to build a lean, sustainable side business. We don't expect analysis-tools.dev ever to be a full-time job. The market might be too small for that — and that's fine.

Tech

Once we had a revenue model, we could focus on the tech. We're both engineers, which helps with iterating quickly.

Initially, I wanted to build something fancy with Yew. It's a Rust/Webassembly framework and your boy likes Rust/Webassembly...

I'm glad Jakub suggested something else: Gatsby. Now, let me be honest with you: I couldn't care less about Gatsby. And that's what I said to Jakub: "I couldn't care less about Gatsby." But that's precisely the point: not being emotionally attached to something makes us focus on the job and not the tool. We get more stuff done!

From there on, it was pretty much easy going: we used a starter template, Jakub showed me how the GraphQL integration worked, and we even got to use some Rust! The site runs on Cloudflare as an edge worker built on top of Rust. (Yeah, I cheated a bit.)

Count to three, MVP!

Finding Sponsors

So we had our prototype but zero sponsors so far. What started now was (and still is) by far the hardest part: convincing people to support us.

We were smart enough not to send cold e-mails because most companies ignore them. Instead, we turned to our network and realized that developers reached out before to add their company's projects to the old static analysis list on Github.

These were the people we contacted first. We tried to keep the messages short and personal.

What worked best was a medium-sized e-mail with some context and a reminder that they contributed to the project before. We included a link to our sponsors page.

Businesses want reliable partners and a reasonable value proposal, so a prerequisite is that the sponsor page has to be meticulously polished.

Our Github Sponsors page

Just like Star Wars Episode IX, we received mixed reviews: many people never replied, others passed the message on to their managers, who in turn never replied, while still others had no interest in sponsoring open-source projects in general. That's all fair game: people are busy, and sponsorware is quite a new concept.

A little rant: I'm of the opinion that tech businesses don't sponsor nearly enough compared to all the value they get from Open Source. Would your company exist if there hadn't been a free operating system like Linux or a web server like Nginx or Apache when it was founded?

There was, however, a rare breed of respondents, which expressed interest but needed some guidance. For many, it is the first step towards sponsoring any developer through Github Sponsors / OpenCollective.

It helped that we use OpenCollective as our fiscal host, which handles invoicing and donation transfers. Their docs helped us a lot when getting started.

The task of finding sponsors is never done, but it was very reassuring to hear from DeepCode, an AI-based semantic analysis service, that they were willing to take a chance on us.

Thanks to them, we could push the product over the finishing line. Because of them, we can keep the site free for everybody. It also means the website is kept free from ads and trackers.

In turn, DeepCode gets exposed to many great developers that care about code quality and might become loyal customers. Also, they get recognized as an open-source-friendly tech company, which is more important than ever if you're trying to sell dev tools. Win-win!

Marketing

Jakub and I both had started businesses before, but this was the first truly open product we would build.

Phase 1: Ship early 🚀

We decided on a soft launch: deploy the site as early as possible and let the crawlers index it. The fact that the page is statically rendered and follows some basic SEO guidelines sure helped with improving our search engine rankings over time.

Phase 2: Ask for feedback from your target audience 💬

After we got some organic traffic and our first votes, we reached out to our developer friends to test the page and vote on tools they know and love. This served as an early validation, and we got some honest feedback, which helped us catch the most blatant flaws.

Phase 3: Prepare announcement post 📝

We wrote a blog post which, even if clickbaity, got the job done: Static Analysis is Broken — Let's Fix It! It pretty much captures our frustration about the space and why building an open platform is important. We could have done a better job explaining the technical differences between the different analysis tools, but that's for another day.

Phase 4: Announce on social media 🔥

Shortly before the official announcement, we noticed that the search functionality was broken (of course). Turns out, we hit the free quota limit on Algolia a biiit earlier than expected. 😅 No biggie: quick exchange with Algolia's customer support, and they moved us over to the open-source plan (which we didn't know existed). We were back on track!

Side note: Algolia customer support is top-notch: responsive, tech-savvy, and helpful. Using Algolia turned out to be a great fit for our product. Response times are consistently in the low milliseconds, and the integration with Gatsby was quick and easy.

We got quite a bit of buzz from that tweet: 63 retweets, 86 likes and counting

Clearly, everyone knew that we were asking for support here, but we are thankful for every single one that liked and retweeted. It's one of these situations where having a network of like-minded people can help.

As soon as we were confident that the site wasn't completely broken, we set off to announce it on Lobste.rs (2 downvotes), /r/SideProject (3 upvotes) and Hacker News (173 upvotes, 57 comments). Social media is kind of unpredictable. It helps to tailor the message to each audience and stay humble, though.

The response from all of that marketing effort was nuts:

Traffic on launch day

Perhaps unsurprisingly, the Cloudflare edge workers didn't break a sweat.

Edge worker CPU time on Cloudflare

My boss Xoan Vilas even did a quick performance analysis and he approved. (Thanks boss!)

High fives all around!

Now what?

Of course, we'll add new features; of course, we have more plans for the future, yada yada yada. Instead, let's reflect on that milestone: a healthy little business with no ads or trackers, solely carried by sponsors. 🎉

Finally, I want you to look deep inside yourself and find your own little product to work on. It's probably right in front of your nose, and like myself, you've been putting it off for too long. Well, not anymore! The next success story is yours. So go out and build things.

Oh wait! ...before you leave, would you mind checking out analysis-tools.dev and smashing that upvote button for a few tools you like? Hey, and if you feel super generous today (or you have a fabulous employer that cares about open-source), why not check out our sponsorship page?

Jakub and me in Vienna, Austria. I'm not actually that small.

Beware of Async/Await Brandon's Website

Brandon's Website2020-07-29 00:00:00 Can you spot the problem with this piece of code?

What Happened To Programming In The 2010s? Matthias Endler

Matthias Endler2020-07-02 00:00:00

A while ago, I read an article titled "What Happened In The 2010s" by Fred Wilson. The post highlights key changes in technology and business during the last ten years. This inspired me to think about a much more narrow topic: What Happened To Programming In The 2010s?

🚓 I probably forgot like 90% of what actually happened. Please don't sue me. My goal is to reflect on the past so that you can better predict the future.

Where To Start?

From a mile-high perspective, programming is still the same as a decade ago:

  1. Punch program into editor
  2. Feed to compiler (or interpreter)
  3. Bleep Boop 🤖
  4. Receive output

But if we take a closer look, a lot has changed around us. Many things we take for granted today didn't exist a decade ago.

What Happened Before?

Back in 2009, we wrote jQuery plugins, ran websites on shared hosting services, and uploaded content via FTP. Sometimes code was copy-pasted from dubious forums, tutorials on blogs, or even hand-transcribed from books. Stack Overflow (which launched on 15th of September 2008) was still in its infancy. Version control was done with CVS or SVN — or not at all. I signed up for Github on 3rd of January 2010. Nobody had even heard of a Raspberry Pi (which only got released in 2012).

Source: [xkcd #2324](https://xkcd.com/2324/)

An Explosion Of New Programming Languages

The last decade saw the creation of a vast number of new and exciting programming languages.

Crystal, Dart, Elixir, Elm, Go, Julia, Kotlin, Nim, Rust, Swift, TypeScript all released their first stable version!

Even more exciting: all of the above languages are developed in the open now, and the source code is freely available on Github. That means, everyone can contribute to their development — a big testament to Open Source.

Each of those languages introduced new ideas that were not widespread before:

  • Strong Type Systems: Kotlin and Swift made optional null types mainstream, TypeScript brought types to JavaScript, Algebraic datatypes are common in Kotlin, Swift, TypeScript, and Rust.
  • Interoperability: Dart compiles to JavaScript, Elixir interfaces with Erlang, Kotlin with Java, and Swift with Objective-C.
  • Better Performance: Go promoted Goroutines and channels for easier concurrency and impressed with a sub-millisecond Garbage Collector, while Rust avoids Garbage Collector overhead altogether thanks to ownership and borrowing.

This is just a short list, but innovation in the programming language field has greatly accelerated.

More Innovation in Older Languages

Established languages didn't stand still either. A few examples:

C++ woke up from its long winter sleep and released C++11 after its last major release in 1998. It introduced numerous new features like lambdas, auto type deduction, and range-based for loops to the language.

At the beginning of the last decade, the latest PHP version was 5.3. We're at 7.4 now. (We skipped 6.0, but I'm not ready to talk about it yet.) Along the way, it got over twice as fast. PHP is a truly modern programming language now with a thriving ecosystem.

Heck, even Visual Basic has tuples now. (Sorry, I couldn't resist.)

Faster Release Cycles

Most languages adopted a quicker release cycle. Here's a list for some popular languages:

| Language | Current release cycle |
| --- | --- |
| C | irregular |
| C# | ~ 12 months |
| C++ | ~ 3 years |
| Go | 6 months |
| Java | 6 months |
| JavaScript (ECMAScript) | 12 months |
| PHP | 12 months |
| Python | 12 months |
| Ruby | 12 months |
| Rust | 6 weeks (!) |
| Swift | 6 months |
| Visual Basic .NET | ~ 24 months |

The Slow Death Of Null

Close to the end of the last decade, in a talk from the 25th of August 2009, Tony Hoare described the null pointer as his Billion Dollar Mistake.

A study by the Chromium project found that 70% of their serious security bugs were memory safety problems (same for Microsoft). Fortunately, the notion that our memory safety problem isn't bad coders has finally gained some traction.
Many mainstream languages embraced safer alternatives to null: nullable types, Option, and Result types. Languages like Haskell had these features before, but they only gained popularity in the 2010s.
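
To make that concrete, here is a tiny Rust sketch (an illustrative example made up for this post, not taken from any particular codebase): the possible absence of a value is encoded in the type, and the compiler forces the caller to handle it.

fn find_user(id: u32) -> Option<&'static str> {
    match id {
        1 => Some("alice"),
        _ => None, // no null pointer: absence is part of the type
    }
}

fn main() {
    // The compiler won't let us use the result without handling the None case.
    match find_user(42) {
        Some(name) => println!("found {}", name),
        None => println!("no such user"),
    }
}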

Revenge of the Type System

Closely related is the debate about type systems. The past decade has seen type systems make their stage comeback; TypeScript, Python, and PHP (just to name a few) started to embrace type systems.

The trend goes towards type inference: add types to make your intent clearer for other humans and in the face of ambiguity — otherwise, skip them. Java, C++, Go, Kotlin, Swift, and Rust are popular examples with type inference support. I can only speak for myself, but I think writing Java has become a lot more ergonomic in the last few years.
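
A tiny Rust example of that trend (again, made up for illustration): the compiler infers most types, and an annotation is only added where the intent would otherwise be ambiguous.

fn main() {
    let numbers = vec![1, 2, 3]; // inferred as Vec<i32>, no annotation needed

    // Here the annotation carries intent: collect() can build many containers,
    // so we spell out that we want a Vec<u64>.
    let doubled: Vec<u64> = numbers.iter().map(|n| (n * 2) as u64).collect();

    println!("{:?}", doubled);
}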

Exponential Growth Of Libraries and Frameworks

As of today, npm hosts 1,330,634 packages. That's over a million packages that somebody else is maintaining for you. Add another 160,488 Ruby gems, 243,984 Python projects, and top it off with 42,547 Rust crates.

Number of packages for popular programming languages.
Don't ask me what happened to npm in 2019.
Source: Module Counts

Of course, there's the occasional leftpad, but it also means that we have to write less library code ourselves and can focus on business value instead. On the other hand, there are more potential points of failure, and auditing is difficult. There is also a large number of outdated packages. For a more in-depth discussion, I recommend the Census II report by the Linux Foundation & Harvard [PDF].

We also went a bit crazy on frontend frameworks:

No Free Lunch

A review like this wouldn't be complete without taking a peek at Moore's Law. It has held up surprisingly well in the last decade:

Source: [Wikipedia](https://en.wikipedia.org/wiki/Moore%27s_law)

There's a catch, though. Looking at single-core performance, the curve is flattening:

Source: [Stanford University: The Future of Computing (video)](https://www.youtube.com/watch?v=Azt8Nc-mtKM)

The new transistors prophesied by Moore don’t make our CPUs faster but instead add other kinds of processing capabilities like more parallelism or hardware encryption. There is no free lunch anymore. Engineers have to find new ways of making their applications faster, e.g. by embracing concurrent execution.

Callbacks, coroutines, and eventually async/await are becoming industry standards.

GPUs (Graphics Processing Units) became very powerful, allowing for massively parallel computations, which caused a renaissance of Machine Learning for practical use cases:

Deep learning becomes feasible, which leads to machine learning becoming integral to many widely used software services and applications. — Timeline of Machine Learning on Wikipedia

Compute is ubiquitous, so in most cases, energy efficiency plays a more prominent role now than raw performance (at least for consumer devices).

Unlikely Twists Of Fate

Learnings

If you're now thinking: Matthias, you totally forgot X, then I brought that point home. This is not even close to everything that happened. You'd roughly need a decade to talk about all of it.

Personally, I'm excited about the next ten years. Software is eating the world — at an ever-faster pace.


Tips for Faster Rust Compile Times Matthias Endler

Matthias Endler2020-06-21 00:00:00

When it comes to runtime performance, Rust is one of the fastest guns in the west. 🔫 It is on par with the likes of C and C++ and sometimes even surpasses those. Compile times, however? That's another story.

Below is a list of tips and tricks on how to make your Rust project compile faster today. They are roughly ordered by practicality, so start at the top and work your way down until you're happy and your compiler goes brrrrrrr.

Table of Contents

Why Is Rust Compilation Slow?

Wait a sec, slow in comparison to what? For instance, if you compare Rust with Go, the Go compiler is doing a lot less work in general. For example, it lacks support for generics and macros. On top of that, the Go compiler was built from scratch as a monolithic toolchain consisting of both the frontend and the backend (rather than relying on, say, LLVM to take over the backend part, which is the case for Rust or Swift). This has advantages (more flexibility when tweaking the entire compilation process, yay) and disadvantages (higher overall maintenance cost and fewer supported architectures).

In general, comparing across different programming languages makes little sense and overall, the Rust compiler is legitimately doing a great job. That said, above a certain project size, the compile times are... let's just say they could be better.

If you'd like to know what's slowing down your builds, run

cargo build --timings

This will generate a report on how much time was spent on each step involved in compiling your program. Here's the output:

A run of cargo-timings
Source: Mara Bos via Twitter

Why Bother?

According to the Rust 2019 survey, improving compile times is #4 on the Rust wishlist:

Rust Survey results 2019. ([Obligatory xkcd](https://xkcd.com/303/).)

Compile-Time vs Runtime Performance

As is often cautioned in debates among their designers, programming language design is full of tradeoffs. One of those fundamental tradeoffs is runtime performance vs. compile-time performance, and the Rust team nearly always (if not always) chose runtime over compile-time.
Brian Anderson

Overall, there are a few features and design decisions that limit Rust compilation speed:

  • Macros: Code generation with macros can be quite expensive.
  • Type checking
  • Monomorphization: this is the process of generating specialized versions of generic functions. E.g., a function that takes an Into<String> gets converted into one that takes a String and one that takes a &str (see the sketch after this list).
  • LLVM: that's the default compiler backend for Rust, where a lot of the heavy-lifting (like code-optimizations) takes place. LLVM is notorious for being slow.
  • Linking: Strictly speaking, this is not part of compiling but happens right after. It "connects" your Rust binary with the system libraries. cargo does not explicitly mark the linking step, so many people add it to the overall compilation time.

If you're interested in all the gory details, check out this blog post by Brian Anderson.

Update The Rust Compiler And Toolchain

Making the Rust compiler faster is an ongoing process, and many fearless people are working on it. Thanks to their hard work, compiler speed has improved 30-40% across the board year-to-date, with some projects seeing up to 45%+ improvements.

So make sure you use the latest Rust version:

rustup update

On top of that, Rust tracks compile regressions on a website dedicated to performance. Work is also put into optimizing the LLVM backend. Rumor has it that there's still a lot of low-hanging fruit. 🍇

Use cargo check Instead Of cargo build

Most of the time, you don't even have to compile your project at all; you just want to know if you messed up somewhere. Whenever you can, skip compilation altogether. What you need instead is laser-fast code linting, type- and borrow-checking.

For that, cargo has a special treat for you: ✨ cargo check ✨. Consider the differences in the number of instructions between cargo check on the left and a debug build (cargo build) in the middle. (Pay attention to the different scales.)

Speedup factors: check 1, debug 5, opt 20

A sweet trick I use is to run it in the background with cargo watch. This way, it will run cargo check whenever you change a file.

Pro-tip: Use cargo watch -c to clear the screen before every run.
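Putting both together, a minimal setup could look like this (assuming cargo-watch isn't installed yet):

cargo install cargo-watch
# Re-run `cargo check` on every file change, clearing the screen first
cargo watch -c -x check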

Use Rust Analyzer Instead Of Rust Language Server (RLS)

Another quick way to check if you set the codebase on fire is to use a "language server". That's basically a "linter as a service" that runs next to your editor.

For a long time, the default choice here was RLS, but lately folks have moved over to rust-analyzer because it's more feature-complete and way snappier. It supports all major IDEs. Switching to it alone might save your day.

Remove Unused Dependencies

So let's say you tried all of the above and found that compilation is still slow. What now?

Dependencies sometimes become obsolete thanks to refactoring. From time to time, it helps to check whether all of them are still needed, to save compile time.

If this is your own project (or a project you like to contribute to), do a quick check if you can toss anything with cargo-udeps:

cargo install cargo-udeps && cargo +nightly udeps

There also is a newer tool called cargo-machete, which does the same thing but does not require a nightly compiler. It also works better with workspaces.
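If you want to give cargo-machete a spin, the invocation looks roughly like this:

cargo install cargo-machete
cargo machete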

Update Remaining Dependencies

Next, update your dependencies, because they themselves could have tidied up their dependency tree lately.

Take a deep dive with cargo-outdated or cargo tree (built right into cargo itself) to find any outdated dependencies. On top of that, use cargo audit to get notified about any vulnerabilities which need to be addressed, or deprecated crates which need a replacement.

Here's a nice workflow that I learned from /u/oherrala on Reddit:

  1. Run cargo update to update to the latest semver compatible version.
  2. Run cargo outdated -wR to find newer, possibly incompatible dependencies. Update those and fix code as needed.
  3. Find duplicate versions of a dependency and figure out where they come from: cargo tree --duplicates shows dependencies which come in multiple versions.
    (Thanks to /u/dbdr for pointing this out.)

Pro-tip: Step 3 is a great way to contribute back to the community! Clone the repository and execute steps 1 and 2. Finally, send a pull request to the maintainers.
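Put into commands, the workflow above boils down to something like this (cargo-outdated has to be installed first; the flags are the ones mentioned above):

cargo install cargo-outdated
cargo update
cargo outdated -wR
cargo tree --duplicates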

Replace Heavy Dependencies

From time to time, it helps to shop around for more lightweight alternatives to popular crates.

Again, cargo tree is your friend here to help you understand which of your dependencies are quite heavy: they require many other crates, cause excessive network I/O and slow down your build. Then search for lighter alternatives.

Also, cargo-bloat has a --time flag that shows you the per-crate build time. Very handy!
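For example, to get a list of the slowest-to-compile crates (the -j 1 avoids parallel builds skewing the per-crate numbers):

cargo install cargo-bloat
cargo bloat --time -j 1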

Here's an example where switching crates reduced compile times from 2:22min to 26 seconds.

Use Cargo Workspaces

Cargo has that neat feature called workspaces, which allow you to split one big crate into multiple smaller ones. This code-splitting is great for avoiding repetitive compilation because only crates with changes have to be recompiled. Bigger projects like servo and vector are using workspaces heavily to slim down compile times. Learn more about workspaces here.
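As a rough sketch, a workspace is just a root Cargo.toml listing its member crates (the crate names below are made up):

# Cargo.toml at the repository root
[workspace]
members = ["app", "parser", "utils"]

# Each member is a regular crate with its own Cargo.toml;
# only the crates you actually touched get recompiled.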

Use Cargo Nextest For Faster Test Execution

It's nice that cargo comes with its own little test runner, but especially if you have to build multiple test binaries, cargo nextest can be up to 60% faster than cargo test thanks to its parallel execution model. Here are some quick benchmarks:

Project         cargo test (s)   nextest (s)   Difference
meilisearch     41.04            20.62         -49.8%
rust-analyzer   6.76             5.23          -22.6%
tokio           27.16            11.72         -56.8%

You can try it with

cargo install cargo-nextest
cargo nextest run

Combine All Integration Tests In A Single Binary

Have any integration tests? (These are the ones in your tests folder.) Did you know that the Rust compiler will create a binary for every single one of them? And every binary will have to be linked individually. This can take most of your build time because linking is slooow. 🐢 The reason is that many system linkers (like ld) are single threaded.

👨‍🍳️💡‍️ A linker is a tool that combines the output of a compiler and mashes that into one executable you can run.

To make the linker's job a little easier, you can put all your tests in one crate. (Basically create a main.rs in your test folder and add your test files as mod in there.)

Then the linker will go ahead and build a single binary only. Sounds nice, but careful: it's still a trade-off as you'll need to expose your internal types and functions (i.e. make them pub).

Might be worth a try, though, because a recent benchmark revealed a 1.9x speedup for one project.

This tip was brought to you by Luca Palmieri, Lucio Franco, and Azriel Hoh. Thanks!

Disable Unused Features Of Crate Dependencies

Check the feature flags of your dependencies. A lot of library maintainers make the effort to split their crate into separate features that can be toggled off on demand. Maybe you don't need all the default functionality of every crate?

For example, tokio has a ton of features that you can disable if not needed.
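For instance, instead of pulling in tokio with its full feature set, you can enable only the features your code actually uses. The selection below is just an example; which features you need depends on your code:

[dependencies]
# Instead of: tokio = { version = "1", features = ["full"] }
tokio = { version = "1", default-features = false, features = ["rt-multi-thread", "macros", "net"] }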

Another example is bindgen, which enables clap support by default for its binary usage. This isn't needed for library usage, which is the common use-case. Disabling that feature improved compile time of rust-rocksdb by ~13s and ~9s for debug and release builds respectively. Thanks to reader Lilian Anatolie Moraru for mentioning this.

⚠️ Fair warning: it seems that switching off features doesn't always improve compile time. (See tikv's experiences here.) It may still be a good idea for improving security by reducing the code's attack surface.

A quick way to list all features of a crate is cargo-feature-set. More recently, cargo also prints a crate's available features when you add it with cargo add.

If you want to look up the feature flags of a crate, they are listed on docs.rs. E.g. check out tokio's feature flags.

Use A Ramdisk For Compilation

💾 Skip this tip if you're using an SSD.

When starting to compile heavy projects, I noticed that I was throttled on I/O. The reason was that I kept my projects on a measly HDD. A more performant alternative would be an SSD, but if that's not an option, don't throw in the towel just yet.

Ramdisks to the rescue! These are like "virtual hard disks" that live in system memory.

User moschroe_de shared the following snippet over on Reddit, which creates a ramdisk for your current Rust project (on Linux):

mkdir -p target && \
sudo mount -t tmpfs none ./target && \
cat /proc/mounts | grep "$(pwd)" | sudo tee -a /etc/fstab

On macOS, you could probably do something similar with this script. I haven't tried that myself, though.

Cache Dependencies With sccache

Another neat project is sccache by Mozilla, which caches compiled crates to avoid repeated compilation.

I had this running on my laptop for a while, but the benefit was rather negligible, to be honest. It works best if you work on a lot of independent projects that share dependencies (in the same version). A common use-case is shared build servers.
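If you'd like to try it anyway, install it with cargo install sccache and point cargo at it, e.g. in your ~/.cargo/config.toml (setting the RUSTC_WRAPPER environment variable to sccache works as well):

# ~/.cargo/config.toml
[build]
rustc-wrapper = "sccache"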

Cranelift – The Alternative Rust Compiler

Lately, I was excited to hear that the Rust project maintains an alternative codegen backend that gets built alongside rustc in every CI run: Cranelift, also called CG_CLIF.

Here is a comparison between rustc and Cranelift for some popular crates (blue means better):

LLVM compile time comparison between rustc and cranelift in favor of cranelift

Somewhat in disbelief, I tried to compile vector with both backends.

The results were astonishing:

  • Rustc: 5m 45s
  • Cranelift: 3m 13s

I could really notice the difference! What's cool about this is that it creates fully working executable binaries. They won't be optimized as much, but they are great for local development.

A more detailed write-up is on Jason Williams' page, and the project code is on Github.

Switch To A Faster Linker

  • 🐧 Linux users: Try mold
  • 🍎 Apple users: Try zld
  • 🪟 Windows users: 🤷

The thing that nobody seems to target is linking time. For me, when using something with a big dependency tree like Amethyst, for example linking time on my fairly recent Ryzen 7 1700 is ~10s each time, even if I change only some minute detail only in my code. — /u/Almindor on Reddit

You can check how long your linker takes by running the following commands:

cargo clean
cargo +nightly rustc --bin <your_binary_name> -- -Z time-passes

It will output the timings of each step, including link time:

...
time:   0.000   llvm_dump_timing_file
time:   0.001   serialize_work_products
time:   0.002   incr_comp_finalize_session_directory
time:   0.004   link_binary_check_files_are_writeable
time:   0.614   run_linker
time:   0.000   link_binary_remove_temps
time:   0.620   link_binary
time:   0.622   link_crate
time:   0.757   link
time:   3.836   total
    Finished dev [unoptimized + debuginfo] target(s) in 42.75s

If the link steps account for a big percentage of the build time, consider switching over to a different linker. There are quite a few options.

According to the official documentation, "LLD is a linker from the LLVM project that is a drop-in replacement for system linkers and runs much faster than them. [..] When you link a large program on a multicore machine, you can expect that LLD runs more than twice as fast as the GNU gold linker. Your mileage may vary, though."

If you're on Linux, you can switch to lld by adding this to your .cargo/config.toml:

[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-C", "link-arg=-fuse-ld=lld",
]

A word of caution: lld might not be working on all platforms yet. At least on macOS, Rust support seems to be broken at the moment, and the work on fixing it has stalled (see rust-lang/rust#39915).

Update: I recently learned about another linker called mold, which claims a massive 12x performance bump over lld. Compared to GNU gold, it's said to be more than 50x. Would be great if anyone could verify and send me a message.

Update II: Aaand another one called zld, which is a drop-in replacement for Apple's ld linker and is targeting debug builds. [Source]

Update III: zld is deprecated. The author recommends using lld instead. You can read up on the backstory here.

The zld benchmarks are quite impressive.

Which one you want to choose depends on your requirements. Which platforms do you need to support? Is it just for local testing or for production usage?

mold is optimized for Linux, zld only works on macOS. For production use, lld might be the most mature option.
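If you want to experiment with mold, one way is to run your build through it with mold -run cargo build. Alternatively, here's a sketch of a .cargo/config.toml setup; it drives the linker through clang, which needs to be installed, and the exact flags may vary by distribution and mold version:

[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]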

Faster Incremental Debug Builds On macOS

Rust 1.51 added an interesting flag for faster incremental debug builds on macOS. It can make debug builds several seconds faster (depending on your use-case). Just add this to your Cargo.toml:

[profile.dev]
split-debuginfo = "unpacked"

Some engineers report that this flag alone reduces compilation times on macOS by 70%.

The flag might become the standard for macOS soon. It is already the default on nightly.

Tweak More Codegen Options / Compiler Flags

Rust comes with a huge set of settings for code generation. It can help to look through the list and tweak the parameters for your project.

There are many gems in the full list of codegen options. For inspiration, here's bevy's config for faster compilation.
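For instance, one pattern from bevy's fast-compile setup is to keep your own crate at a low opt-level for quick rebuilds while optimizing dependencies (which rarely change) more aggressively. Treat the exact numbers as a starting point, not gospel:

# Cargo.toml
[profile.dev]
opt-level = 1              # a bit of optimization for your own code

[profile.dev.package."*"]
opt-level = 3              # optimize all dependencies; they rarely change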

Profile Compile Times

If you'd like to dig deeper than cargo build --timings, Rust compilation can be profiled with cargo +nightly rustc -- -Z self-profile. The resulting trace file can be visualized with a flamegraph or the Chromium profiler:

Image of Chrome profiler with all crates
Source: Rust Lang Blog

Another golden one is cargo-llvm-lines, which shows the number of lines generated and objects copied in the LLVM backend:

$ cargo llvm-lines | head -20

  Lines        Copies         Function name
  -----        ------         -------------
  30737 (100%)   1107 (100%)  (TOTAL)
   1395 (4.5%)     83 (7.5%)  core::ptr::drop_in_place
    760 (2.5%)      2 (0.2%)  alloc::slice::merge_sort
    734 (2.4%)      2 (0.2%)  alloc::raw_vec::RawVec<T,A>::reserve_internal
    666 (2.2%)      1 (0.1%)  cargo_llvm_lines::count_lines
    490 (1.6%)      1 (0.1%)  <std::process::Command as cargo_llvm_lines::PipeTo>::pipe_to
    476 (1.5%)      6 (0.5%)  core::result::Result<T,E>::map
    440 (1.4%)      1 (0.1%)  cargo_llvm_lines::read_llvm_ir
    422 (1.4%)      2 (0.2%)  alloc::slice::merge
    399 (1.3%)      4 (0.4%)  alloc::vec::Vec<T>::extend_desugared
    388 (1.3%)      2 (0.2%)  alloc::slice::insert_head
    366 (1.2%)      5 (0.5%)  core::option::Option<T>::map
    304 (1.0%)      6 (0.5%)  alloc::alloc::box_free
    296 (1.0%)      4 (0.4%)  core::result::Result<T,E>::map_err
    295 (1.0%)      1 (0.1%)  cargo_llvm_lines::wrap_args
    291 (0.9%)      1 (0.1%)  core::char::methods::<impl char>::encode_utf8
    286 (0.9%)      1 (0.1%)  cargo_llvm_lines::run_cargo_rustc
    284 (0.9%)      4 (0.4%)  core::option::Option<T>::ok_or_else

Avoid Procedural Macro Crates

Procedural macros are the hot sauce of Rust development: they burn through CPU cycles, so use them with care (keyword: monomorphization).

Update: Over on Twitter Manish pointed out that "the reason proc macros are slow is that the (excellent) proc macro infrastructure – syn and friends – are slow to compile. Using proc macros themselves does not have a huge impact on compile times." (This might change in the future.)

Manish goes on to say

This basically means that if you use one proc macro, the marginal compile time cost of adding additional proc macros is insignificant. A lot of people end up needing serde in their deptree anyway, so if you are forced to use serde, you should not care about proc macros.

If you are not forced to use serde, one thing a lot of folks do is have serde be an optional dependency so that their types are still serializable if necessary.

If you heavily use procedural macros in your project (e.g., if you use serde), it might be worth it to play around with opt-levels in your Cargo.toml.

[profile.dev.build-override]
opt-level = 3

As reader jfmontanaro mentioned on Github:

I think the reason it helps with build times is because it only applies to build scripts and proc-macros. Build scripts and proc-macros are unique because during a normal build, they are not only compiled but also executed (and in the case of proc-macros, they can be executed repeatedly). When your project uses a lot of proc-macros, optimizing the macros themselves can in theory save a lot of time.

Another approach is to try and sidestep the macro impact on compile times with watt, a tool that offloads macro compilation to WebAssembly.

From the docs:

By compiling macros ahead-of-time to Wasm, we save all downstream users of the macro from having to compile the macro logic or its dependencies themselves.

Instead, what they compile is a small self-contained Wasm runtime (~3 seconds, shared by all macros) and a tiny proc macro shim for each macro crate to hand off Wasm bytecode into the Watt runtime (~0.3 seconds per proc-macro crate you depend on). This is much less than the 20+ seconds it can take to compile complex procedural macros and their dependencies.

Note that this crate is still experimental.

(Oh, and did I mention that both watt and cargo-llvm-lines were built by David Tolnay, who is a frickin' steamroller of an engineer?)

Get Dedicated Hardware

If you reached this point, the easiest way to improve compile times even more is probably to spend money on top-of-the-line hardware.

Perhaps a bit surprisingly, the fastest machines for Rust compiles seem to be Apple machines with an M1 chip:

Rik Arends on Twitter

The benchmarks for the new Macbook Pro with M1 Max are absolutely ridiculous — even in comparison to the already fast M1:

Project        M1 Max   M1 Air
Deno           6m11s    11m15s
MeiliSearch    1m28s    3m36s
bat            43s      1m23s
hyperfine      23s      42s
ripgrep        16s      37s

That's a solid 2x performance improvement.

But if you'd rather stick to Linux, people have also had great success with a multicore CPU like an AMD Ryzen Threadripper and 32 GB of RAM.

On portable devices, compiling can drain your battery and be slow. To avoid that, I'm using my machine at home, a 6-core AMD FX 6300 with 12GB RAM, as a build machine. I can use it in combination with Visual Studio Code Remote Development.

Compile in the Cloud

If you don't have a dedicated machine yourself, you can offload the compilation process to the cloud instead.
Gitpod.io is superb for testing a cloud build as they provide you with a beefy machine (currently 16 core Intel Xeon 2.80GHz, 60GB RAM) for free during a limited period. Simply add https://gitpod.io/# in front of any Github URL. Here is an example for one of my Hello Rust episodes.

Gitpod has a neat feature called prebuilds. From their docs:

Whenever your code changes (e.g. when new commits are pushed to your repository), Gitpod can prebuild workspaces. Then, when you do create a new workspace on a branch, or Pull/Merge Request, for which a prebuild exists, this workspace will load much faster, because all dependencies will have been already downloaded ahead of time, and your code will be already compiled.

Especially when reviewing pull requests, this could give you a nice speedup. Prebuilds are quite customizable; take a look at the .gitpod.yml config of nushell to get an idea.

Download ALL The Crates

If you have a slow internet connection, a big part of the initial build process is fetching all those shiny crates from crates.io. To mitigate that, you can download all crates in advance to have them cached locally. criner does just that:

git clone https://github.com/the-lean-crate/criner
cd criner
cargo run --release -- mine

The archive size is surprisingly reasonable, with roughly 50GB of required disk space (as of today).

Bonus: Speed Up Rust Docker Builds 🐳

Building Docker images from your Rust code? These can be notoriously slow, because cargo doesn't support building only a project's dependencies yet, invalidating the Docker cache with every build if you don't pay attention. cargo-chef to the rescue! ⚡

cargo-chef can be used to fully leverage Docker layer caching, therefore massively speeding up Docker builds for Rust projects. On our commercial codebase (~14k lines of code, ~500 dependencies) we measured a 5x speed-up: we cut Docker build times from ~10 minutes to ~2 minutes.

Here is an example Dockerfile if you're interested:

# Step 1: Compute a recipe file
FROM rust as planner
WORKDIR app
RUN cargo install cargo-chef
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# Step 2: Cache project dependencies
FROM rust as cacher
WORKDIR app
RUN cargo install cargo-chef
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json

# Step 3: Build the binary
FROM rust as builder
WORKDIR app
COPY . .
# Copy over the cached dependencies from above
COPY --from=cacher /app/target target
COPY --from=cacher /usr/local/cargo /usr/local/cargo
RUN cargo build --release --bin app

# Step 4:
# Create a tiny output image.
# It only contains our final binary.
FROM rust as runtime
WORKDIR app
COPY --from=builder /app/target/release/app /usr/local/bin
ENTRYPOINT ["/usr/local/bin/app"]

cargo-chef can help speed up your continuous integration with Github Actions or your deployment process to Google Cloud.

Drastic Measures: Overclock Your CPU? 🔥

⚠️ Warning: You can damage your hardware if you don't know what you are doing. Proceed at your own risk.

Here's an idea for the desperate. Now I don't recommend that to everyone, but if you have a standalone desktop computer with a decent CPU, this might be a way to squeeze out the last bits of performance.

Even though the Rust compiler executes a lot of steps in parallel, single-threaded performance is still quite relevant.

As a somewhat drastic measure, you can try to overclock your CPU. (I owe you some benchmarks from my machine.)

Speeding Up Your CI Builds

If you collaborate with others on a Rust project, chances are you use some sort of continuous integration like Github Actions. Optimizing the CI build process is a whole subject of its own. Thankfully, Aleksey Kladov (matklad) collected a few tips on his blog. He touches on bors, caching, splitting build steps, disabling compiler features like incremental compilation or debug output, and more. It's a great read and you can find it here.

Upstream Work

As mentioned above, making the Rust compiler faster is an ongoing process, and many fearless people are working on it. Compile regressions are tracked on a website dedicated to performance, and work is also put into optimizing the LLVM backend.

The Rust team is also continuously working to make the compiler faster. Here's an extract of the 2020 survey:

One continuing topic of importance to the Rust community and the Rust team is improving compile times. Progress has already been made with 50.5% of respondents saying they felt compile times have improved. This improvement was particularly pronounced with respondents with large codebases (10,000 lines of code or more) where 62.6% citing improvement and only 2.9% saying they have gotten worse. Improving compile times is likely to be the source of significant effort in 2021, so stay tuned!

Help Others: Upload Leaner Crates For Faster Build Times

cargo-diet helps you build lean crates that significantly reduce download size (sometimes by 98%). It might not directly affect your own build time, but your users will surely be thankful. 😊


What's Next?

My company, corrode, can help you with performance problems and reducing your build times. Reach out here.

Phew! That was a long list. 😅 If you have any additional tips, please let me know.

If compiler performance is something you're interested in, why not collaborate on a tool to see what user code is causing rustc to use lots of time?

Also, once you're done optimizing your build times, how about optimizing runtimes next? My friend Pascal Hertleif has a nice article on that.


Does computing make the world better? Brandon's Website

Brandon's Website2020-06-21 00:00:00 I was reading an article the other day about some recent headway being made in type theory. It was exciting stuff; stuff that may one day make systems faster, more reliable, more expressive. It felt very important in a vague way. But it also got me thinking: is this really going to make the world a better place than it was before?

Gravity Matthias Endler

Matthias Endler2020-05-29 00:00:00

Here's a test to show your age:

Do you still remember that funny JavaScript gravity effect, which Google used on their homepage ten years ago? This one?

I wanted to have some fun and integrated it into a website I was building. Unfortunately, it didn't work out-of-the-box. It choked on some DOM elements whose classes aren't exposed as plain strings (like SVG elements). So, in good hacker fashion, I quickly patched up the script (it's just a three-line change), and now it's back to its former glory.

Test it here! (Caution: you'll have to reload the page after that. 😏)

Apply Gravity

Anyway, feel free to add it to your own sites and have some fun. It's also great to prank your friends. Simply add that single line to any website and weeee!

<script
  type="text/javascript"
  src="https://endler.dev/2020/gravity/gravity.js"
></script>

Sometimes I miss those simple times of the early web...


Write Libraries, Not Frameworks Brandon's Website

Brandon's Website2020-05-08 00:00:00 Normally when I write about something on here I take the time to fully think-out a point, make a case, address all the major sides of the issue that I can think of, etc.

Hackers' Folklore Matthias Endler

Matthias Endler2020-04-24 00:00:00

Some computer terms have a surprising legacy. Many of them are derived from long-obsolete technologies. This post tries to dust off the exciting history of some of these terms that we use every day but aren't quite sure about their origins. Let's jump right in!

Bike-Shedding

Today's meaning: A pointless discussion about trivial issues.

The term bike-shed effect or bike-shedding was coined as a metaphor to illuminate the law of triviality; it was popularised in the Berkeley Software Distribution community by the Danish computer developer Poul-Henning Kamp in 1999 on the FreeBSD mailing list and has spread from there to the whole software industry.

The concept was first presented by C. Northcote Parkinson as a corollary of his broader "Parkinson's law" spoof of management. Parkinson dramatizes this "law of triviality" with the example of a committee's deliberations on an atomic reactor, contrasting it to deliberations on a bicycle shed. As he put it: "The time spent on any item of the agenda will be in inverse proportion to the sum of money involved."

A reactor is so vastly expensive and complicated that an average person cannot understand it, so one assumes that those who work on it understand it. On the other hand, everyone can visualize a cheap, simple bicycle shed, so planning one can result in endless discussions because everyone involved wants to add a touch and show personal contribution.
Reference - Wikipedia: Law of Triviality

Boilerplate

An old machine that bent steel plates for water boilers.
Source: Wikimedia Commons

Today's meaning: A chunk of code that is copied over and over again with little or no changes made to it in the process.

Boiler plate originally referred to the rolled steel used to make water boilers but is used in the media to refer to hackneyed or unoriginal writing. The term refers to the metal printing plates of pre-prepared text such as advertisements or syndicated columns that were distributed to small, local newspapers. These printing plates came to be known as 'boilerplates' by analogy. One large supplier to newspapers of this kind of boilerplate was the Western Newspaper Union, which supplied "ready-to-print stories [which] contained national or international news" to papers with smaller geographic footprints, which could include advertisements pre-printed next to the conventional content.


The man in the foreground is holding a rounded printing plate. Plates like this were provided by companies such as Western Newspaper Union to many smaller newspapers.
Source: Wikimedia Commons

Boot / Reboot / Bootstrapping

Lithograph of Baron Münchhausen pulling himself out of a swamp by his pigtail
Source: Wikimedia

The term boot is used in the context of computers to refer to the process of starting a computer.

In compiler development, the term bootstrapping refers to the process of rewriting a compiler in a new language: The first compiler is written in an existing language. Then it gets rewritten in the new language and compiled by itself.

The saying "to pull oneself up by one's bootstraps" dates back to the 19th century. Tall boots may have a tab, loop or handle at the top to help pull them on. The metaphor spawned additional metaphors for self-sustaining processes that proceed without external help.

According to Wikipedia,

The idiom dates at least to 1834, when it appeared in the Workingman's Advocate: "It is conjectured that Mr. Murphee will now be enabled to hand himself over the Cumberland river or a barn yard fence by the straps of his boots."

There's also a nice summary in Merriam-Webster.

Bug

Today's meaning: A defect in a piece of code or hardware.

The origins are unknown!

Contrary to popular belief, it predates the bug found by Grace Hopper in the Mark II computer.

The term was used by engineers way before that; at least since the 1870s. It predates electronic computers and computer software. Thomas Edison used the term "bug" in his notes. Reference

Bit

The term's invention is credited to John W. Tukey, who in a memo written for Bell Labs on January 9, 1947, had shortened "binary information digit" to "bit". Reference

Byte

The term "byte" was first introduced by Werner Buchholz in June 1956. This was during the initial design stage for the IBM Stretch computer. The computer had a design that enabled addressing down to the individual bit and allowed variable field length instructions, with the size of the byte encoded into the instruction itself. The choice of spelling as "byte" instead of "bite" was intentional to prevent any accidental alteration to "bit".

Carriage Return and Line Feed

Today's meaning: Set the cursor to the beginning of the next line.

These two terms were adopted from typewriters.

The carriage holds the paper and is moving from left to right to advance the typing position as the keys are pressed. It "carries" the paper with it. The carriage return is the operation when the carriage gets moved into its original position on the very left end side of the paper.

Simply returning the carriage to the left is not enough to start a new line, however. The carriage would still be on the same line as before — just at the beginning of the line. To go to a new line, a line feed was needed. It would move the paper inside the typewriter up by one line.

These two operations — carriage return (CR) and line feed (LF) — were commonly done at once by pushing the carriage return lever.

A mechanical typewriter. The lever for the carriage return is on the outer left side.
Source: piqsels
  • On Unix systems (like Linux or macOS), a \n still stands for a
    line feed (ASCII symbol: LF) or newline.
  • On CP/M, DOS, and Windows, \r\n is used, where \r stands for carriage return and \n stands for line feed (CR+LF).
  • Reference

Here is an old video that shows the basic mechanics of carriage return and line-feed:

Command key symbol (⌘)

Today's meaning: A meta-key available on Apple computers to provide additional keyboard combinations.

Directly quoting Wikipedia (emphasis mine):

The ⌘ symbol came into the Macintosh project at a late stage. The development team originally went for their old Apple key, but Steve Jobs found it frustrating when "apples" filled up the Mac's menus next to the key commands, because he felt that this was an over-use of the company logo. He then opted for a different key symbol. With only a few days left before deadline, the team's bitmap artist Susan Kare started researching for the Apple logo's successor. She was browsing through a symbol dictionary when she came across the cloverleaf-like symbol, commonly used in Nordic countries as an indicator of cultural locations and places of interest (it is the official road sign for tourist attraction in Denmark, Finland, Iceland, Norway, and Sweden and the computer key has often been called Fornminne — ancient monument — by Swedish Mac users and Seværdighedstegn by Danish users). When she showed it to the rest of the team, everyone liked it, and so it became the symbol of the 1984 Macintosh command key. Susan Kare states that it has since been told to her that the symbol had been picked for its Scandinavian usage due to its resembling the shape of a square castle with round corner towers as seen from above looking down, notably Borgholm Castle.

Norwegian Severdighet road sign
Source: Wikimedia Commons
Aerial view of Borgholm Castle, which could have been the model for the symbol
Source: Wikimedia Commons

Cookie

Today's meaning: A small piece of data sent from a website and stored in the user's web browser.

The term cookie was coined by 23-year-old web browser programmer Lou Montulli in the fall of 1994. It was inspired by the term magic cookie, which is a packet of data a program receives and sends back unchanged, used by Unix programmers. This term in turn derives from the fortune cookie, which is a cookie with an embedded message.

Montulli used the term cookie to describe the small packets of data that the web browser receives and sends back unchanged to the web server.

"So, yeah, the cookie," Montulli says with a laugh. "It's one week of my life that turned into the most important thing that I ever did." (Reference)

Core Dump

Today's meaning: Retrieving a snapshot of a (crashed) program's state by storing all of its memory for offline analysis.

The name comes from magnetic core memory, which is an early storage mechanism based on a grid of toroid magnets. It has since become obsolete, but the term is still used today for getting a snapshot of a computer process. Reference

A 32 x 32 core memory plane storing 1024 bits (or 128 bytes) of data. The first core dumps were printed on paper, which sounds reasonable given these small amounts of bytes.
Source: Wikimedia Commons

Cursor

Today's meaning: a visual cue (such as a flashing vertical line) on a video display that indicates position (as for data entry). Merriam-Webster

Cursor is Latin for runner. A cursor is the name given to the transparent slide engraved with a hairline that is used for marking a point on a slide rule. The term was then transferred to computers through analogy. Reference

A December 1951 advertisement for the IBM 604 Electronic Calculating Punch, which was first produced in 1948. The advertisement claims the IBM 604 can do the work of 150 engineers with slide rules. The cursor (or runner) is the transparent part in the middle of the slide.

Daemon

In computing, a daemon is a background process that handles requests for services such as print spooling and file transfers, and then terminates. The term was coined by the programmers of MIT's Project MAC (Mathematics and Computation) in 1963. They took the name from Maxwell's demon, a hypothetical creature from a thought experiment that constantly works in the background, sorting molecules.

The MIT programmers thought demon would be an appropriate name for a background process that worked tirelessly to perform system chores. But instead of using the term demon, they used daemon, which is an older form of the word. (Reference)

Dashboard

Today's meaning: A user interface that provides a quick overview of a system's status.

Originally a plank of wood at the front of a horse-drawn carriage to protect the driver from mud 'dashed' backward by the horse's hooves.

When automobiles were manufactured, the board in front of the driver was given the same name. That was the logical place to put the necessary gauges so the driver could see them easily. In time, the term became more associated with the readouts than the protection it offered. Reference

A dashboard of a horse carriage.
Source: Wikimedia Commons

Firewall

Today's meaning: A network security system that establishes a barrier between a trusted internal network and an untrusted external network, such as the Internet.

Fire walls are used mainly in terraced houses, but also in individual residential buildings. They prevent fire and smoke from spreading to another part of the building in the event of a fire. Large fires can thus be prevented. The term has been used in computing since the '80s. Reference

Firewall in residential construction, separating the building into two separate residential units and fire areas.
Source: Wikimedia Commons

Firmware

Today's meaning: A class of computer software that provides low-level control for a device's specific hardware and is closely tied to the hardware it runs on.

Ascher Opler coined the term firmware in a 1967 Datamation article. As originally used, firmware contrasted with hardware (the CPU itself) and software (normal instructions executing on a CPU). It existed on the boundary between hardware and software; thus the name "firmware". The original article is available on the Internet Archive. Reference

Foo and Bar

Today's meaning: Common placeholder variable names.

Originally, the term might come from the military acronym FUBAR. There are a few variations, but a common reading is "f***ed up beyond all recognition".

The use of foo in a programming context is generally credited to the Tech Model Railroad Club (TMRC) of MIT from circa 1960. In the complex model system, there were scram switches located at numerous places around the room that could be thrown if something undesirable was about to occur, such as a train going full-bore at an obstruction.

The way I understood it was that they literally had emergency buttons labeled foo for lack of a better name. Maybe related to the original military meaning of FUBAR to indicate that something is going very very wrong.

A scram switch (button) that could be pressed to prevent inadvertent operation. Maybe the TMRC had buttons labeled foo instead.
Source: Wikimedia Commons


Freelancer

Today's meaning: A self-employed person, which is not committed to a particular employer long-term.

The term first appears in the novel Ivanhoe by Sir Walter Scott. (The novel also had a lasting influence on the Robin Hood legend.)

Cover of a Classic Comics book
Source: Wikimedia Commons

In it, a Lord offers his paid army of 'free lances' to King Richard:

I offered Richard the service of my Free Lances, and he refused them — I will lead them to Hull, seize on shipping, and embark for Flanders; thanks to the bustling times, a man of action will always find employment.

Therefore, a "free lancer" is someone who fights for whoever pays the most. Free does not mean "without pay", but refers to the additional freedom to work for any employer. Reference

Hash

Today's meaning: A hash function is any function that can be used to map data of arbitrary size to fixed-size values.

According to Wikipedia, the use of the word "hash" in hash function "comes by way of analogy with its non-technical meaning, to "chop and mix". Indeed, typical hash functions, like the mod operation, "chop" the input domain into many sub-domains that get "mixed" into the output range to improve the uniformity of the key distribution."


Log / Logfile

Today's meaning: A file that records events of a computer program or system.

Sailors used so-called log lines to measure the speed of their ship. A flat piece of wood (the log) was attached to a long rope. The log had regularly spaced knots in it. As the log would drift away, the sailors would count the number of knots that went out in a fixed time interval, and this would be the ship's speed — in knots.

The ship's speed was important for navigation, so the sailors noted it down in a book, aptly called the log book, together with other information to establish the position of the ship more accurately, like landmark sightings and weather events. Later, additional information, more generally concerning the ship, was added — or logged — such as harbor fees and abnormal provision depletion.

Reference.

Sailors measuring ship speed with a log line
Source: The Pilgrims & Plymouth Colony: 1620 by Duane A. Cline
The parts of a log-line
Source: The Pilgrims & Plymouth Colony: 1620 by Duane A. Cline
Page from the log-file of the British Winchelsea. The second column denotes the number of knots measured with the log-line, which indicates the ship's speed.
Source: Navigation and Logbooks in the Age of Sail by Peter Reaveley

Patch

Today's meaning: A piece of code that can be applied to fix or improve a computer program.

In the early days of computing history, if you made a programming mistake, you'd have to fix a paper tape or a punched card by putting a patch on top of a hole.

A program tape with physical patches used to correct punched holes by covering them.
Source: Smithsonian Archives Center

Ping

Today's meaning: A way to check the availability and response time of a computer over the network.

Ping is a terminal program originally written by Mike Muuss in 1983 that is included in every version of UNIX, Windows, and macOS. He named it "after the sound that a sonar makes, inspired by the whole principle of echo-location. [...] ping uses timed IP/ICMP ECHO_REQUEST and ECHO_REPLY packets to probe the "distance" to the target machine." The reference is well worth a read.

Pixel

Today's meaning: The smallest controllable element of a picture represented on the screen.

The word pixel is a combination of pix (from "pictures", shortened to "pics") and el (for "element"). Similarly, voxel is a volume element and texel is a texture element. Reference

Shell

Today's meaning: An interactive, commonly text-based runtime to interact with a computer system.

The inventor of the term, Louis Pouzin, does not give an explanation for the name in his essay The Origins of the Shell. It can, however, be traced back to Unix's predecessor Multics. It is described in the Multics glossary like so:

[The shell] is passed a command line for execution by the listener.

The New Hacker's Dictionary (also known as the Jargon File) by Eric S. Raymond contains the following:

Historical note: Apparently, the original Multics shell (sense 1) was so called because it was a shell (sense 3);

where sense 3 refers to

A skeleton program, created by hand or by another program (like, say, a parser generator), which provides the necessary incantations to set up some task and the control flow to drive it (the term driver is sometimes used synonymously). The user is meant to fill in whatever code is needed to get real work done. This usage is common in the AI and Microsoft Windows worlds, and confuses Unix hackers.

Unfortunately, the book does not provide any evidence to back up this claim.

I like the (possibly historically incorrect) analogy to a nut with the shell being on the outside, protecting the kernel.

Reference

Slab allocator

Today's meaning: An efficient memory allocation technique, which reuses previous allocations.

Slab allocation was invented by Jeff Bonwick (Note: PDF file) in 1994 and has since been used by Memcached and the Linux kernel, among others.

With slab allocation, a cache for a certain type or size of data object has a number of pre-allocated "slabs" of memory; within each slab there are memory chunks of fixed size suitable for the objects. (Wikipedia)

The name slab comes from a teenage friend of Bonwick. He tells the story on the Oracle blog:

While watching TV together, a commercial by Kellogg's came on with the tag line, "Can you pinch an inch?"

The implication was that you were overweight if you could pinch more than an inch of fat on your waist — and that hoovering a bowl of corn flakes would help.

Without missing a beat, Tommy, who weighed about 250 pounds, reached for his midsection and offered his response: "Hell, I can grab a slab!"

A decade later, Bonwick remembered that term when he was looking for a word to describe the allocation of a larger chunk of memory.

Here is the original Kellogg's advertisement:

Spam

Today's meaning: Unsolicited electronic communications, for example by sending mass-emails or posting in forums and chats.

The term goes back to a sketch by the British comedy group Monty Python from 1970. In the sketch, a cafe is including Spam (a brand of canned cooked pork) in almost every dish. Spam is a portmanteau of spiced and ham. The excessive amount of Spam mentioned is a reference to the ubiquity of it and other imported canned meat products in the UK after World War II (a period of rationing in the UK) as the country struggled to rebuild its agricultural base. Reference

Vintage Ad: Look What You Can Do With One Can of Spam
Source: By user Jamie (jbcurio) on flickr.com

Monty Python's Flying Circus (1974) - SPAM (video on Vimeo).

Radio Button

Today's meaning: A UI element that lets the user choose from a predefined set of mutually exclusive options.

"Radio buttons" are named after the mechanical preset buttons found on radios, which work the same way. The UI concept was later used in tape recorders, cassette recorders and wearable audio players (the famous "Walkman" and similar), and later still in VCRs and video cameras. Reference

An old car radio (left) and CSS radio buttons (right). Only a single option can be selected at any point in time. As a kid, I would push two buttons at once so they would interlock. Good times.
Source: Images by Matt Coady

Uppercase and lowercase

Today's meaning: Distinction between capital letters and small letters on a keyboard.

Back when typesetting was a manual process where single letters made of lead were "type set" to form words and sentences, upper- and lowercase letters were kept in separate containers — or cases — to make this rather tedious process a little faster.

A set of printers cases
Source: From the book 'Printing types, their history, forms, and use; a study in survivals' by Updike, Daniel Berkeley, 1860-1941. Freely available on archive.org.

Honorable mentions

404

Today's meaning: HTTP Status Code for "File not found".

There is a story that the number comes from the server room where the World Wide Web's central database was located. In there, administrators would manually locate the requested files and transfer them, over the network, to the person who made that request. If a file didn't exist, they'd return an error message: "Room 404: file not found".

This, however, seems to be a myth and the status code was chosen rather arbitrarily based on the then well-established FTP status codes. Reference

Programming languages and Abbreviations

The etymology of programming language names and common abbreviations would probably warrant its own article, but I've decided to note down some of my favorites for the time being.

C++

C++ is a programming language based on C by Bjarne Stroustrup. The name is a programmer pun by Rick Mascitti, a coworker of Stroustrup. The ++ refers to the post-increment operator, that is common in many C-like languages. It increases the value of a variable by 1. In that sense, C++ can be seen as the spiritual "successor" of C. Reference

C Sharp

Similarly to C++, C# is a C-like programming language. The name again refers to "incremental" improvements on top of C++. The # in the name looks like four plus signs. Hence C# == (C++)++. But on top of that, the name was also inspired by the musical notation where a sharp indicates that the written note should be made a semitone higher in pitch. Reference

A C-Sharp note.
Source: Wikimedia Commons

PNG

Officially, PNG stands for Portable Network Graphics. It was born out of frustration over a CompuServe announcement in 1994 that programs supporting GIF would have to pay licensing fees from now on. A working group led by hacker Thomas Boutell created the .png file format, a patent-free replacement for GIF. Therefore I prefer the format's unofficial name: PNG's Not GIF. Here's a great article on PNG's history. Reference

Credits

Most of the content comes from sources like Wikipedia (with reference where appropriate), but the explanations are difficult to hunt down if you don't know what you're looking for.
This is a living document, and I'm planning to update it in case of reader submissions.

Conclusion

You have to know the past to understand the present.
— Dr. Carl Sagan (1980)

I hope you enjoyed this trip down memory lane. Now it's your turn!
👉 Do you know any other stories? Send me a message, and I'll add them here.


Open Source Virtual Background Posts on elder.dev

Posts on elder.dev2020-04-09 00:00:00 With many of us around the globe under shelter in place due to COVID-19 video calls have become a lot more common. In particular, ZOOM has controversially become very popular. Arguably Zoom’s most interesting feature is the “Virtual Background” support which allows users to replace the background behind them in their webcam video feed with any image (or video). I’ve been using Zoom for a long time at work for Kubernetes open source meetings, usually from my company laptop.

Three Types of Data Brandon's Website

Brandon's Website2020-02-05 00:00:00 In my work I've developed a mental framework related to data modeling, which has helped greatly both when coming up with a model and when making decisions down the road about how to use that model. Here I will establish three different categories of data in software: Constants, State, and Cached Values. By "data" I generally mean "variables in code", but the same principles could be applied to files on a disk, or tables in a database, or whatever else.

A Timelapse of Timelapse Matthias Endler

Matthias Endler2020-02-04 00:00:00

Timelapse is a little open-source screen recorder for macOS. It takes a screenshot every second and creates a movie in the end.

To celebrate its unlikely 1.0 release today, I present here a "timelapse" of this project's journey. It just took ten years to get here.

2011 - How it all began

To be honest, I don't remember why I initially wrote the tool. I must have had a personal need for a screen recorder, I guess...

In May 2011, when I started the project, I was doing my Master's degree in Computer Science. I might have needed the tool for University; most likely, however, I was just trying to find an excuse for not working on an assignment.

During that time, I wrote a lot of tools like that. Mainly to scratch a personal itch, learn a new programming language, or just have fun.

Among them are tools like a random sandwich generator for Subway (the American fast-food chain), DrawRoom, a keyboard-driven drawing app inspired by WriteRoom, and the obligatory CMS software that I sold to clients. Surprisingly, none of them were a great success.

DrawRoom, a tool that I wrote around the same time, is a real piece of art. To this day it has five commits and a single Github star (by myself, don't judge...).

What I do know for sure is that I was unhappy with all existing screen recorders. They could roughly be categorized into these three groups:

  • Proprietary solutions that cost money or could call home.
  • Tools that didn't work on macOS.
  • Small, fragile, one-off scripts that people passed around in forums or as Github gists. They rarely worked as advertised.

Among the remaining tools were none that provided any timelapse functionality, so I set out to write my own.

This all sounds very epic, but in reality, I worked on it for a day. After five heroic commits on May 11, 2011, it sat there, idle, for seven years...

2018

A lot of time elapsed before anything exciting happened.

In January '18, seemingly out of nowhere, the first user filed a bug report. It was titled hung when creating the avi 😱. Turns out that a game developer from Canada, juul1a, was trying to use the tool to track her progress on an indie game — how cool is that?

To help her out, I decided to do some general cleanup, finally write down some instructions on how to even use the program, add a requirements.txt, and port the tool from mencoder to ffmpeg.

After that, timelapse was ready for prime-time. 🎬 Here is some live action from her videos featuring timelapses:

At that point, the tool was still very wobbly and could only be used from the commandline, but I began to see some potential for building a proper app from it; I just never found the time.

In October '18, I decided to ask for support during Hacktoberfest. I created a few tickets and labeled them with hacktoberfest to try and find contributors.

And then, I waited.

First, Shreya V Prabhu fixed an issue where a new recording was overwriting the previous one by adding a timestamp to the video name. Then Abner Campanha and Shane Creedon (no longer on Github) created a basic test structure. Gbenro Selere added a CI pipeline for Travis CI. It really worked, and the project was in much better shape after that!

2019

One year passes by, and Kyle Jones adds some contribution guidelines, while I move the CI pipeline to the newly released Github actions.

Chaitanya fixed a bug where the program would hang when the recording stopped by moving the video creation from threads to a separate process. He continued to make the codebase more robust and became a core contributor, reviewing pull requests and handling releases.

Thanks to orcutt989, the app now made use of type hints in Python 3.6.

gkpln3 added support for multi-monitor configurations. The screen captured will always be the one with the mouse on it.

2020

Fast forward to today, and after almost ten years, we finally created a true macOS app using the awesome py2app bundler. This should make the tool usable by non-developers.

Back to the Future

We reached the end of our little journey.

A long time has passed until 1.0. This project is a testament to the wonders of open source collaboration, and I am proud to work on it with contributors from around the world. It doesn't have to be a life-changing project to bring people together who have fun building things. If this were the end of the story, I'd be okay with that. I doubt it, though. Here's to the next ten years!

🎬 Download timelapse on Github.

Bonus

The video at the beginning is a timelapse of how I finish this article.
How meta.


Github Stars Matthias Endler

Matthias Endler2020-01-01 00:00:00
Repository                               Stars
analysis-tools-dev/static-analysis       12731 ★
mre/idiomatic-rust                        5844 ★
tinysearch/tinysearch                     2576 ★
mre/the-coding-interview                  1669 ★
lycheeverse/lychee                        1606 ★
analysis-tools-dev/dynamic-analysis        857 ★
ReceiptManager/receipt-parser-legacy       785 ★
mre/hyperjson                              500 ★
mre/cargo-inspect                          383 ★
hello-rust/show                            302 ★
mre/fcat                                   263 ★
lycheeverse/lychee-action                  258 ★
mre/vscode-snippet                         229 ★
mre/kafka-influxdb                         215 ★
mre/timelapse                              213 ★
mre/prettyprint                            201 ★
ReceiptManager/receipt-manager-app         175 ★
mre/zerocal                                162 ★

A Note on Mask Registers Performance Matters

Performance Matters2019-12-05 16:30:00

AVX-512 introduced eight so-called mask registers [1], k0 [2] through k7, which apply to most ALU operations and allow you to apply a zero-masking or merging [3] operation on a per-element basis, speeding up code that would otherwise require extra blending operations in AVX2 and earlier.

If that single sentence doesn’t immediately indoctrinate you into the mask register religion, here’s a copy and paste from Wikipedia that should fill in the gaps and close the deal:

Most AVX-512 instructions may indicate one of 8 opmask registers (k0–k7). For instructions which use a mask register as an opmask, register k0 is special: a hardcoded constant used to indicate unmasked operations. For other operations, such as those that write to an opmask register or perform arithmetic or logical operations, k0 is a functioning, valid register. In most instructions, the opmask is used to control which values are written to the destination. A flag controls the opmask behavior, which can either be “zero”, which zeros everything not selected by the mask, or “merge”, which leaves everything not selected untouched. The merge behavior is identical to the blend instructions.
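
To make the zero-masking vs. merging distinction concrete, here is a small sketch using Rust's AVX-512 intrinsics (not from the original post, purely illustrative; it needs a CPU with AVX-512F and a fairly recent Rust toolchain, older ones required nightly):

// Sketch: the same add, once with merge-masking and once with zero-masking.
// The mask k selects only the low 8 of the 16 i32 lanes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f")]
unsafe fn masking_demo() {
    use std::arch::x86_64::*;

    let src = _mm512_set1_epi32(-1); // pre-existing destination contents
    let a = _mm512_set1_epi32(10);
    let b = _mm512_set1_epi32(20);
    let k: __mmask16 = 0x00FF; // select lanes 0..=7 only

    // Merge-masking: unselected lanes keep their old value from src.
    let merged = _mm512_mask_add_epi32(src, k, a, b); // lanes 0..=7: 30, lanes 8..=15: -1

    // Zero-masking: unselected lanes are set to zero.
    let zeroed = _mm512_maskz_add_epi32(k, a, b); // lanes 0..=7: 30, lanes 8..=15: 0

    let _ = (merged, zeroed);
}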

So mask registers [4] are important, but they are not household names, unlike, say, general purpose registers (eax, rsi and friends) or SIMD registers (xmm0, ymm5, etc). They certainly aren’t going to show up on Intel slides disclosing the size of uarch resources, like these:

Intel Slide


In particular, I don’t think the size of the mask register physical register file (PRF) has ever been reported. Let’s fix that today.

We use an updated version of the ROB size probing tool originally authored and described by Henry Wong [5] (hereafter simply Henry), who used it to probe the size of various documented and undocumented out-of-order structures on earlier architectures. If you haven’t already read that post, stop now and do it. This post will be here when you get back.

You’ve already read Henry’s blog for a full description (right?), but for the naughty among you here’s the fast food version:

Fast Food Method of Operation

We separate two cache miss load instructions [6] by a variable number of filler instructions, which vary based on the CPU resource we are probing. When the number of filler instructions is small enough, the two cache misses execute in parallel and their latencies are overlapped, so the total execution time is roughly [7] as long as a single miss.

However, once the number of filler instructions reaches a critical threshold, all of the targeted resources are consumed and instruction allocation stalls before the second miss is issued, so the cache misses can no longer run in parallel. This causes the runtime to spike to about twice the baseline cache miss latency.

Finally, we ensure that each filler instruction consumes exactly one unit of the resource we are interested in, so that the location of the spike indicates the size of the underlying resource. For example, regular GP instructions usually consume one physical register from the GP PRF, so they are a good choice for measuring the size of that resource.

Mask Register PRF Size

Here, we use instructions that write a mask register, so we can measure the size of the mask register PRF.

To start, we use a series of kaddd k1, k2, k3 instructions, as such (shown for 16 filler instructions):

mov    rcx,QWORD PTR [rcx]  ; first cache miss load
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
kaddd  k1,k2,k3
mov    rdx,QWORD PTR [rdx]  ; second cache miss load
lfence                      ; stop issue until the above block completes
; this block is repeated 16 more times

Each kaddd instruction consumes one physical mask register. If the number of filler instructions is less than or equal to the number of physical mask registers, we expect the misses to happen in parallel; otherwise, the misses will be resolved serially. At that point, we expect to see a large spike in the running time.

That’s exactly what we see:

Test 27 kaddd instructions

Let’s zoom in on the critical region, where the spike occurs:

Test 27 zoomed

Here we clearly see that the transition isn’t sharp – when the filler instruction count is between 130 and 134, the runtime is intermediate, falling between the low and high levels. Henry calls this non-ideal behavior, and I have seen it repeatedly across many but not all of these resource size tests. The idea is that the hardware implementation doesn’t always allow all of the resources to be used as you approach the limit [8] - sometimes you get to use every last resource, but in other cases you may hit the limit a few filler instructions before the theoretical maximum.

Under this assumption, we want to look at the last (rightmost) point which is still faster than the slow performance level, since it indicates that sometimes that many resources are available, implying that at least that many are physically present. Here, we see that final point occurs at 134 filler instructions.

So we conclude that SKX has 134 physical registers available to hold speculative mask register values. As Henry indicates on the original post, it is likely that there are 8 physical registers dedicated to holding the non-speculative architectural state of the 8 mask registers, so our best guess at the total size of the mask register PRF is 142. That’s somewhat smaller than the GP PRF (180 entries) or the SIMD PRF (168 entries), but still quite large (see this table of out of order resource sizes for sizes on other platforms).

In particular, it is definitely large enough that you aren’t likely to run into this limit in practical code: it’s hard to imagine non-contrived code where almost 60% [9] of the instructions write [10] to mask registers, because that’s what you’d need to hit this limit.

Are They Distinct PRFs?

You may have noticed that so far I’m simply assuming that the mask register PRF is distinct from the others. I think this is highly likely, given the way mask registers are used and since they are part of a disjoint renaming domain [11]. It is also supported by the fact that the apparent mask register PRF size doesn’t match either the GP or SIMD PRF sizes, but we can go further and actually test it!

To do that, we use a similar test to the above, but with the filler instructions alternating between the same kaddd instruction as the original test and an instruction that uses either a GP or SIMD register. If the register file is shared, we expect to hit a limit at the size of that shared PRF. If the PRFs are not shared, we expect that neither PRF limit will be hit, and we will instead hit a different limit such as the ROB size.

Test 29 alternates kaddd and scalar add instructions, like this:

mov    rcx,QWORD PTR [rcx]
add    ebx,ebx
kaddd  k1,k2,k3
add    esi,esi
kaddd  k1,k2,k3
add    ebx,ebx
kaddd  k1,k2,k3
add    esi,esi
kaddd  k1,k2,k3
add    ebx,ebx
kaddd  k1,k2,k3
add    esi,esi
kaddd  k1,k2,k3
add    ebx,ebx
kaddd  k1,k2,k3
mov    rdx,QWORD PTR [rdx]
lfence

Here’s the chart:

Test 29: alternating kaddd and scalar add

We see that the spike occurs at a filler count larger than either the GP or the mask PRF size. So we can conclude that the mask and GP PRFs are not shared.

Maybe the mask register PRF is shared with the SIMD PRF? After all, mask registers are more closely associated with SIMD instructions than general purpose ones, so maybe there is some synergy there.

To check, here’s Test 35, which is similar to 29 except that it alternates between kaddd and vxorps, like so:

mov    rcx,QWORD PTR [rcx]
vxorps ymm0,ymm0,ymm1
kaddd  k1,k2,k3
vxorps ymm2,ymm2,ymm3
kaddd  k1,k2,k3
vxorps ymm4,ymm4,ymm5
kaddd  k1,k2,k3
vxorps ymm6,ymm6,ymm7
kaddd  k1,k2,k3
vxorps ymm0,ymm0,ymm1
kaddd  k1,k2,k3
vxorps ymm2,ymm2,ymm3
kaddd  k1,k2,k3
vxorps ymm4,ymm4,ymm5
kaddd  k1,k2,k3
mov    rdx,QWORD PTR [rdx]
lfence

Here’s the corresponding chart:

Test 35: alternating kaddd and SIMD xor

The behavior is basically identical to the prior test, so we conclude that there is no direct sharing between the mask register and SIMD PRFs either.

This turned out not to be the end of the story. The mask registers are shared, just not with the general purpose or SSE/AVX register file. For all the details, see this follow up post.

An Unresolved Puzzle

Something we notice in both of the above tests, however, is that the spike seems to finish around 212 filler instructions. However, the ROB size for this microarchitecture is 224. Is this just the non-ideal behavior we saw earlier? Well, we can test this by comparing against Test 4, which just uses nop instructions as the filler: these shouldn’t consume almost any resources beyond ROB entries. Here’s Test 4 (nop filler) versus Test 29 (alternating kaddd and scalar add):

Test 4 vs 29

The nop-using Test 4 nails the ROB size at exactly 224 (these charts are SVG, so feel free to “View Image” and zoom in to confirm). So it seems that we hit some other limit around 212 when we mix mask and GP registers, or when we mix mask and SIMD registers. In fact, the same limit applies even between GP and SIMD registers, if we compare Test 4 and Test 21 (which mixes GP adds with SIMD vxorps):

Test 4 vs 21

Henry mentions a more extreme version of the same thing in the original blog entry, in the section also headed Unresolved Puzzle:

Sandy Bridge AVX or SSE interleaved with integer instructions seems to be limited to looking ahead ~147 instructions by something other than the ROB. Having tried other combinations (e.g., varying the ordering and proportion of AVX vs. integer instructions, inserting some NOPs into the mix), it seems as though both SSE/AVX and integer instructions consume registers from some form of shared pool, as the instruction window is always limited to around 147 regardless of how many of each type of instruction are used, as long as neither type exhausts its own PRF supply on its own.

Read the full section for all the details. The effect is similar here but smaller: we at least get 95% of the way to the ROB size, but still stop before it. It is possible the shared resource is related to register reclamation, e.g., the PRRT [12] - a table which keeps track of which registers can be reclaimed when a given instruction retires.

Finally, we finish this party off with a few miscellaneous notes on mask registers, checking for parity with some features available to GP and SIMD registers.

Move Elimination

Both GP and SIMD registers are eligible for so-called move elimination. This means that a register to register move like mov eax, edx or vmovdqu ymm1, ymm2 can be eliminated at rename by “simply” [13] pointing the destination register entry in the RAT to the same physical register as the source, without involving the ALU.

Let’s check if something like kmov k1, k2 also qualifies for move elimination. First, we check the chart for Test 28, where the filler instruction is kmovd k1, k2:

Test 28

It looks exactly like Test 27 we saw earlier with kaddd. So we would suspect that physical registers are being consumed, unless we have happened to hit a different move-elimination related limit with exactly the same size and limiting behavior [14].

Additional confirmation comes from uops.info which shows that all variants of mask to mask register kmov take one uop dispatched to p0. If the move is eliminated, we wouldn’t see any dispatched uops.

Therefore I conclude that register to register [15] moves involving mask registers are not eliminated.

Dependency Breaking Idioms

The best way to set a GP register to zero in x86 is via the xor zeroing idiom: xor reg, reg. This works because any value xored with itself is zero. This is smaller (fewer instruction bytes) than the more obvious mov eax, 0, and also faster since the processor recognizes it as a zeroing idiom and performs the necessary work at rename [16], so no ALU is involved and no uop is dispatched.

Furthermore, the idiom is dependency breaking: although xor reg1, reg2 in general depends on the value of both reg1 and reg2, in the special case that reg1 and reg2 are the same, there is no dependency as the result is zero regardless of the inputs. All modern x86 CPUs recognize this [17] special case for xor. The same applies to SIMD versions of xor such as integer vpxor and floating point vxorps and vxorpd.

That background out of the way, a curious person might wonder if the kxor variants are treated the same way. Is kxorb k1, k1, k1 [18] treated as a zeroing idiom?

This is actually two separate questions, since there are two aspects to zeroing idioms:

  • Zero latency execution with no execution unit (elimination)
  • Dependency breaking

Let’s look at each in turn.

Execution Elimination

So are zeroing xors like kxorb k1, k1, k1 executed at rename without latency and without needing an execution unit?

No.

Here, I don’t even have to do any work: uops.info has our back because they’ve performed this exact test and report a latency of 1 cycle and one p0 uop used. So we can conclude that zeroing xors of mask registers are not eliminated.

Dependency Breaking

Well maybe zeroing kxors are dependency breaking, even though they require an execution unit?

In this case, we can’t simply check uops.info. kxor is a one cycle latency instruction that runs only on a single execution port (p0), so we hit the interesting (?) case where a chain of kxor instructions runs at the same speed regardless of whether they are dependent or independent: the throughput bottleneck of 1/cycle is the same as the latency bottleneck of 1/cycle!

Don’t worry, we’ve got other tricks up our sleeve. We can test this by constructing a test which puts a kxor in a carried dependency chain with enough total latency that the chain latency is the bottleneck. If the kxor carries a dependency, the runtime will be equal to the sum of the latencies in the chain. If the instruction is dependency breaking, the chain is broken, the disconnected chains can overlap, and performance will likely be limited by some throughput restriction (e.g., port contention). This could use a good diagram, but I’m not good at diagrams.

All the tests are in uarch bench, but I’ll show the key parts here.

First we get a baseline measurement for the latency of moving from a mask register to a GP register and back:

kmovb k0, eax
kmovb eax, k0
; repeated 127 more times

This pair clocks in [19] at 4 cycles. It’s hard to know how to partition the latency between the two instructions: are they both 2 cycles, or is there a 3-1 split one way or the other [20]? For our purposes it doesn’t matter, because we just care about the latency of the round-trip. Importantly, the port-based throughput limit of this sequence is 1/cycle, 4x faster than the latency limit, because each instruction goes to a different port (p5 and p0, respectively). This means we will be able to tease out latency effects independent of throughput.

Next, we throw a kxor into the chain that we know is not zeroing:

kmovb k0, eax
kxorb k0, k0, k1
kmovb eax, k0
; repeated 127 more times

Since we know kxorb has 1 cycle of latency, we expect the chain latency to increase to 5 cycles, and that’s exactly what we measure (the first two tests shown):

** Running group avx512 : AVX512 stuff **
                               Benchmark    Cycles     Nanos
                kreg-GP rountrip latency      4.00      1.25
    kreg-GP roundtrip + nonzeroing kxorb      5.00      1.57

Finally, the key test:

kmovb k0, eax
kxorb k0, k0, k0
kmovb eax, k0
; repeated 127 more times

This has a zeroing kxorb k0, k0, k0. If it breaks the dependency on k0, it would mean that the kmovb eax, k0 no longer depends on the earlier kmovb k0, eax, and the carried chain is broken and we’d see a lower cycle time.

Drumroll…

We measure this at the exact same 5.0 cycles as the prior example:

** Running group avx512 : AVX512 stuff **
                               Benchmark    Cycles     Nanos
                kreg-GP rountrip latency      4.00      1.25
    kreg-GP roundtrip + nonzeroing kxorb      5.00      1.57
       kreg-GP roundtrip + zeroing kxorb      5.00      1.57

So we tentatively conclude that zeroing idioms aren’t recognized at all when they involve mask registers.

Finally, as a check on our logic, we use the following test which replaces the kxor with a kmov which we know is always dependency breaking:

kmovb k0, eax
kmovb k0, ecx
kmovb eax, k0
; repeated 127 more times

This is the final result shown in the output above, and it runs much more quickly at 2 cycles, bottlenecked on p5 (the two kmov k, r32 instructions both go only to p5):

** Running group avx512 : AVX512 stuff **
                               Benchmark    Cycles     Nanos
                kreg-GP rountrip latency      4.00      1.25
    kreg-GP roundtrip + nonzeroing kxorb      5.00      1.57
       kreg-GP roundtrip + zeroing kxorb      5.00      1.57
         kreg-GP roundtrip + mov from GP      2.00      0.63

So our experiment seems to check out.

Reproduction

You can reproduce these results yourself with the robsize binary on Linux or Windows (using WSL). The specific results for this article are also available as are the scripts used to collect them and generate the plots.

Summary

  • SKX has a separate PRF for mask registers with a speculative size of 134 and an estimated total size of 142
  • This is large enough compared to the other PRF sizes and the ROB to make it unlikely to be a bottleneck
  • Mask registers are not eligible for move elimination
  • Zeroing idioms [21] in mask registers are not recognized for execution elimination or dependency breaking

Part II

I didn’t expect it to happen, but it did: there is a follow up post about mask registers, where we (roughly) confirm the register file size by looking at an image of a SKX CPU captured via microscope, and make an interesting discovery regarding sharing.

Comments

Discussion on Hacker News, Reddit (r/asm and r/programming) or Twitter.

Direct feedback also welcomed by email or as a GitHub issue.

Thanks

Daniel Lemire who provided access to the AVX-512 system I used for testing.

Henry Wong who wrote the original article which introduced me to this technique and graciously shared the code for his tool, which I now host on github.

Jeff Baker, Wojciech Muła for reporting typos.

Image credit: Kellogg’s Special K by Like_the_Grand_Canyon is licensed under CC BY 2.0.

If you liked this post, check out the homepage for others you might enjoy.




  1. These mask registers are often called k registers or simply kregs based on their naming scheme. Rumor has it that this letter was chosen randomly only after a long and bloody naming battle between MFs. 

  2. There is sometimes a misconception (until recently even on the AVX-512 wikipedia article) that k0 is not a normal mask register, but just a hardcoded indicator that no masking should be used. That’s not true: k0 is a valid mask register and you can read and write to it with the k-prefixed instructions and SIMD instructions that write mask registers (e.g., any AVX-512 comparison). However, the encoding that would normally be used for k0 as a writemask register in a SIMD operation indicates instead “no masking”, so the contents of k0 cannot be used for that purpose. 

  3. The distinction being that a zero-masking operation results in zeroed destination elements at positions not selected by the mask, while merging leaves the existing elements in the destination register unchanged at those positions. As a side-effect, this means that with merging, the destination register becomes a type of destructive source-destination register and there is an input dependency on this register. 

  4. I’ll try to use the full term mask register here, but I may also use kreg a common nickname based on the labels k0, k1, etc. So just mentally swap kreg for mask register if and when you see it (or vice-versa). 

  5. H. Wong, Measuring Reorder Buffer Capacity, May, 2013. [Online]. Available: http://blog.stuffedcow.net/2013/05/measuring-rob-capacity/ 

  6. Generally taking 100 to 300 cycles each (latency-wise). The wide range is because the cache miss wall clock time varies by a factor of about 2x, generally between 50 and 100 nanoseconds, depending on platform and uarch details, and the CPU frequency varies by a factor of about 2.5x (say from 2 GHz to 5 GHz). However, on a given host, with equivalent TLB miss/hit behavior, we expect the time to be roughly constant. 

  7. The reason I have to add roughly as a weasel word here is itself interesting. A glance at the charts shows that they are certainly not totally flat in either the fast or slow regions surrounding the spike. Rather there are various noticeable regions with distinct behavior and other artifacts: e.g., in Test 29 a very flat region up to about 104 filler instructions, followed by a bump and then a linearly ramping region up to the spike somewhat after 200 instructions. Some of those features are explicable by mentally (or actually) simulating the pipeline, which reveals that at some point the filler instructions will contribute (although only a cycle or so) to the runtime, but some features are still unexplained (for now). 

  8. For example, a given rename slot may only be able to write a subset of all the RAT entries, and uses the first available. When the RAT is almost full, it is possible that none of the allowed entries are empty, so it is as if the structure is full even though some free entries remain, but accessible only to other uops. Since the allowed entries may be essentially random across iterations, this ends up with a more-or-less linear ramp between the low and high performance levels in the non-ideal region. 

  9. The “60 percent” comes from 134 / 224, i.e., the speculative mask register PRF size divided by the ROB size. The idea is that you’ll hit the ROB size limit no matter what once you have 224 instructions in flight, so you’d need 60% of those instructions to be mask register writes [10] in order to hit the 134 limit first. Of course, you might also hit some other limit first, so even 60% might not be enough, but the ROB size puts a lower bound on this figure since it always applies. 

  10. Importantly, only instructions which write a mask register consume a physical register. Instructions that simply read a mask register (e.g., SIMD instructions using a writemask) do not consume a new physical mask register. 

  11. More renaming domains makes things easier on the renamer for a given number of input registers. That is, it is easier to rename 2 GP and 2 SIMD input registers (separate domains) than 4 GP registers. 

  12. This is either the Physical Register Reclaim Table or Post Retirement Reclaim Table depending on who you ask. 

  13. Of course, it is not actually so simple. For one, you now need to track these “move elimination sets” (sets of registers all pointing to the same physical register) in order to know when the physical register can be released (once the set is empty), and these sets are themselves a limited resource which must be tracked. Flags introduce another complication since flags are apparently stored along with the destination register, so the presence and liveness of the flags must be tracked as well. 

  14. In particular, in the corresponding test for GP registers (Test 7), the chart looks very different, as move elimination reduces the PRF demand to almost zero and we get to the ROB limit. 

  15. Note that I am not restricting my statement to moves between two mask registers only, but any registers. That is, moves between a GP register and a mask register are also not eliminated (the latter fact is obvious if you consider that they use distinct register files, so move elimination seems impossible). 

  16. Probably by pointing the entry in the RAT to a fixed, shared zero register, or setting a flag in the RAT that indicates it is zero. 

  17. Although xor is the most reliable, other idioms may be recognized as zeroing or dependency breaking idioms by some CPUs as well, e.g., sub reg,reg and even sbb reg, reg which is not a zeroing idiom, but rather sets the value of reg to zero or -1 (all bits set) depending on the value of the carry flag. This doesn’t depend on the value of reg but only the carry flag, and some CPUs recognize that and break the dependency. Agner’s microarchitecture guide covers the uarch-dependent support for these idioms very well. 

  18. Note that only the two source registers really need to be the same: if kxorb k1, k1, k1 is treated as zeroing, I would expect the same for kxorb k1, k2, k2. 

  19. Run all the tests in this section using ./uarch-bench.sh --test-name=avx512/*

  20. This is why uops.info reports the latency for both kmov r32, k and kmov k, r32 as <= 3. They know the pair takes 4 cycles in total, and under the assumption that each instruction must take at least one cycle, the only thing you can really say is that each instruction takes at most 3 cycles. 

  21. Technically, I only tested the xor zeroing idiom, but since that’s the ground-zero, most basic idiom, we can be pretty sure nothing else will be recognized as zeroing. I’m open to being proven wrong: the code is public and easy to modify to test whatever idiom you want. 


Digging Into etcd Posts on elder.dev

Posts on elder.dev2019-12-01 00:00:00 What Is etcd? etcd, /ˈɛtsiːdiː/, per the official site is: A distributed, reliable key-value store for the most critical data of a distributed system Per the FAQ etcd’s name means “distributed etc directory”. With etc being a reference to the Unix directory for system-wide configuration /etc, and d being a reference to “distributed” 1. The d is perhaps also a pun on the long history of naming daemons with a d suffix (see: httpd, ntpd, systemd, containerd, …), though I’ve not yet found proof of this.

Self-Driving Debian Posts on elder.dev

Posts on elder.dev2019-12-01 00:00:00 For my home server I’ve come to appreciate using it rather than maintaining it 😏 After replacing some parts starting over I really wanted it to be fully “self-driving” to the extent possible – primarily meaning totally unattended and automatic updates. No manual maintenance. Automated Updates Debian 10 “Buster” 🐶 ships with the unattended-upgrades package installed out of the box, but it needs a little configuring to achieve what we want.

Procedures, Functions, Data Brandon's Website

Brandon's Website2019-11-23 00:00:00 Functional programming is all the rage these days. There are piles of Medium posts out there singing its praises, preaching the good word. And for good reason! It can be a powerful paradigm for reducing code duplication and preventing sprawling side-effects.

A Tiny, Static, Full-Text Search Engine using Rust and WebAssembly Matthias Endler

Matthias Endler2019-10-17 00:00:00

I wrote a basic search module that you can add to a static website. It's very lightweight (50kB-100kB gzipped) and works with Hugo, Zola, and Jekyll. Only searching for entire words is supported. Try the search box on the left for a demo. The code is on Github.

Static site generators are magical. They combine the best of both worlds: dynamic content without sacrificing performance.

Over the years, this blog has been running on Jekyll, Cobalt, and, lately, Zola.

One thing I always disliked, however, was the fact that static websites don't come with "static" search engines, too. Instead, people resort to custom Google searches, external search engines like Algolia, or pure JavaScript-based solutions like lunr.js or elasticlunr.

All of these work fine for most sites, but it never felt like the final answer.

I didn't want to add yet another dependency on Google; neither did I want to use a stand-alone web-backend like Algolia, which adds latency and is proprietary.

On the other hand, I'm not a huge fan of JavaScript-heavy websites. For example, just the search indices that lunr creates can be multiple megabytes in size. That feels lavish - even by today's bandwidth standards. On top of that, parsing JavaScript is still time-consuming.

I wanted a simple, lean, and self-contained search that could be deployed next to my other static content.

As a consequence, I refrained from adding search functionality to my blog at all. That's unfortunate because, with a growing number of articles, it gets harder and harder to find relevant content.

The Idea

Many years ago, in 2013, I read "Writing a full-text search engine using Bloom filters" — and it was a revelation.

The idea was simple: Let's run all my blog articles through a generator that creates a tiny, self-contained search index using this magical data structure called a ✨Bloom Filter ✨.

Wait, what's a Bloom Filter?

A Bloom filter is a space-efficient way to check if an element is in a set.

The trick is that it doesn't store the elements themselves; it just knows with some confidence that they were stored before. In our case, it can say with a certain error rate that a word is in an article.

A Bloom filter stores a 'fingerprint' (a number of hash values) of all input values instead of the raw input. The result is a low-memory-footprint data structure. This is an example of 'hello' as an input.
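
In code, the idea boils down to a bit array plus a handful of hash functions. Here's a minimal Rust sketch of the mechanism (purely illustrative, with made-up names; it's not the filter type tinysearch ends up using):

// A tiny Bloom filter: a bit vector plus `hashes` hash functions per word.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct TinyBloom {
    bits: Vec<bool>,
    hashes: u64, // number of hash functions
}

impl TinyBloom {
    fn new(size: usize, hashes: u64) -> Self {
        TinyBloom { bits: vec![false; size], hashes }
    }

    // Derive the i-th index for a word by seeding the hasher with i.
    fn index(&self, word: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        word.hash(&mut h);
        (h.finish() as usize) % self.bits.len()
    }

    // Set one bit per hash function.
    fn add(&mut self, word: &str) {
        for seed in 0..self.hashes {
            let i = self.index(word, seed);
            self.bits[i] = true;
        }
    }

    // Only claims membership when all bits are set; this is where the
    // (rare) false positives come from, but there are no false negatives.
    fn contains(&self, word: &str) -> bool {
        (0..self.hashes).all(|seed| self.bits[self.index(word, seed)])
    }
}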

Here's the Python code from the original article that generates the Bloom filters for each post (courtesy of Stavros Korokithakis):

filters = {}
for name, words in split_posts.items():
  filters[name] = BloomFilter(capacity=len(words), error_rate=0.1)
  for word in words:
    filters[name].add(word)

The memory footprint is extremely small, thanks to error_rate, which allows for a negligible number of false positives.

I immediately knew that I wanted something like this for my homepage. My idea was to directly ship the Bloom filters and the search engine to the browser. I could finally have a small, static search without the need for a backend!

Headaches

Disillusionment came quickly.

I had no idea how to bundle and minimize the generated Bloom filters, let alone run them on clients. The original article briefly touches on this:

You need to implement a Bloom filter algorithm on the client-side. This will probably not be much longer than the inverted index search algorithm, but it’s still probably a bit more complicated.

I didn't feel confident enough in my JavaScript skills to pull this off. Back in 2013, NPM was a mere three years old, and WebPack just turned one, so I also didn't know where to look for existing solutions.

Unsure what to do next, my idea remained a pipe dream.

A New Hope

Five years later, in 2018, the web had become a different place. Bundlers were ubiquitous, and the Node ecosystem was flourishing. One thing, in particular, revived my dreams about the tiny static search engine: WebAssembly.

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications. [source]

This meant that I could use a language that I was familiar with to write the client-side code — Rust! 🎉

My journey started with a prototype back in January 2018. It was just a direct port of the Python version from above:

let mut filters = HashMap::new();
for (name, words) in articles {
  let mut filter = BloomFilter::with_rate(0.1, words.len() as u32);
  for word in words {
    filter.insert(&word);
  }
  filters.insert(name, filter);
}

While I managed to create the Bloom filters for every article, I still had no clue how to package it for the web... until wasm-pack came along in February 2018.

Whoops! I Shipped Some Rust Code To Your Browser.

Now I had all the pieces of the puzzle:

  • Rust — A language I was comfortable with
  • wasm-pack — A bundler for WebAssembly modules
  • A working prototype that served as a proof-of-concept

The search box you see on the left side of this page is the outcome. It runs fully on Rust using WebAssembly (a.k.a. the RAW stack). Try it now if you like.

There were quite a few obstacles along the way.

Bloom Filter Crates

I looked into a few Rust libraries (crates) that implement Bloom filters.

First, I tried jedisct1's rust-bloom-filter, but the types didn't implement Serialize/Deserialize. This meant that I could not store my generated Bloom filters inside the binary and load them on the client-side.

After trying a few others, I found the cuckoofilter crate, which supported serialization. The behavior is similar to Bloom filters, but if you're interested in the differences, you can look at this summary.

Here's how to use it:

let mut cf = cuckoofilter::CuckooFilter::new();

// Add data to the filter
let value: &str = "hello world";
let success = cf.add(value)?;

// Lookup if data was added before
let success = cf.contains(value);
// success ==> true
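
With serialization available, the overall flow is: build the filters ahead of time, serialize them, embed the bytes in the deliverable, and deserialize on the client. A rough sketch with bincode and hypothetical names (the real project organizes this differently, but the shape is the same):

use std::collections::HashMap;

// Hypothetical index shape: article title -> serialized filter bytes.
// (The real code serializes the filter structs directly via serde.)
type SearchIndex = HashMap<String, Vec<u8>>;

// Build time: write the index to a file next to the crate so it can be embedded.
fn write_index(index: &SearchIndex) -> std::io::Result<()> {
    let bytes = bincode::serialize(index).expect("serialization failed");
    std::fs::write("storage", bytes)
}

// Client side (compiled to Wasm): embed the bytes and deserialize at load time.
static STORAGE: &[u8] = include_bytes!("../storage");

fn load_index() -> SearchIndex {
    bincode::deserialize(STORAGE).expect("corrupt search index")
}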

Let's check the output size when bundling the filters for ten articles on my blog using cuckoo filters:

~/C/p/tinysearch ❯❯❯ l storage
Permissions Size User    Date Modified Name
.rw-r--r--   44k mendler 24 Mar 15:42  storage

44kB doesn't sound too shabby, but these are just the cuckoo filters for ten articles, serialized as a Rust binary. On top of that, we have to add the search functionality and the helper code. In total, the client-side code weighed in at 216kB using vanilla wasm-pack. Too much.

Trimming Binary Size

After the sobering first result of 216kB for our initial prototype, we have a few options to bring the binary size down.

The first is following johnthagen's advice on minimizing Rust binary size.

By setting a few options in our Cargo.toml, we can shave off quite a few bytes:

"opt-level = 'z'" => 249665 bytes
"lto = true"      => 202516 bytes
"opt-level = 's'" => 195950 bytes

Setting opt-level to s means we trade speed for size, but we're primarily interested in minimal size anyway. After all, a small download size also improves performance.

Next, we can try wee_alloc, an alternative Rust allocator producing a small .wasm code size.

It is geared towards code that makes a handful of initial dynamically sized allocations, and then performs its heavy lifting without any further allocations. This scenario requires some allocator to exist, but we are more than happy to trade allocation performance for small code size.

Exactly what we want. Let's try!

"wee_alloc and nightly" => 187560 bytes

We shaved off another 4% from our binary.
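
For reference, opting into wee_alloc is a two-line change; this is essentially the snippet from the crate's README (at the time it also required a nightly toolchain, as noted above):

// Use wee_alloc as the global allocator to shrink the .wasm binary.
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;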

Out of curiosity, I tried to set codegen-units to 1, meaning we only use a single thread for code generation. Surprisingly, this resulted in a slightly smaller binary size.

"codegen-units = 1" => 183294 bytes

Then I got word of a Wasm optimizer called binaryen. On macOS, it's available through homebrew:

brew install binaryen

It ships a binary called wasm-opt and that shaved off another 15%:

"wasm-opt -Oz" => 154413 bytes

Then I removed web-sys as we don't have to bind to the DOM: 152858 bytes.

There's a tool called twiggy to profile the code size of Wasm binaries. It printed the following output:

twiggy top -n 20 pkg/tinysearch_bg.wasm
 Shallow Bytes │ Shallow % │ Item
─────────────┼───────────┼────────────────────────────────
         79256 ┊    44.37% ┊ data[0]
         13886 ┊     7.77% ┊ "function names" subsection
          7289 ┊     4.08% ┊ data[1]
          6888 ┊     3.86% ┊ core::fmt::float::float_to_decimal_common_shortest::hdd201d50dffd0509
          6080 ┊     3.40% ┊ core::fmt::float::float_to_decimal_common_exact::hcb5f56a54ebe7361
          5972 ┊     3.34% ┊ std::sync::once::Once::call_once::{{closure}}::ha520deb2caa7e231
          5869 ┊     3.29% ┊ search

From what I can tell, the biggest chunk of our binary is occupied by the raw data section for our articles. Next up are the function names subsection and some float-to-decimal helper functions, which most likely come from deserialization.

Finally, I tried wasm-snip, which replaces a WebAssembly function's body with an unreachable instruction, like so, but it didn't reduce code size:

wasm-snip --snip-rust-fmt-code --snip-rust-panicking-code -o pkg/tinysearch_bg_snip.wasm pkg/tinysearch_bg_opt.wasm

After tweaking the parameters of the cuckoo filters a bit and removing stop words from the articles, I arrived at 121kB (51kB gzipped) — not bad considering the average image size on the web is around 900kB. On top of that, the search functionality only gets loaded when a user clicks into the search field.

Update

Recently I moved the project from cuckoo filters to XOR filters. I used the awesome xorf project, which comes with built-in serde serialization; that allowed me to remove a lot of custom code.

With that, I could reduce the payload size by another 20-25%. I'm down to 99kB (49kB gzipped) on my blog now. 🎉

The new version is released on crates.io already, if you want to give it a try.

Frontend- and Glue Code

wasm-pack will auto-generate the JavaScript code to talk to Wasm.

For the search UI, I customized a few JavaScript and CSS bits from w3schools. It even has keyboard support! Now when a user enters a search query, we go through the cuckoo filter of each article and try to match the words. The results are scored by the number of hits. Thanks to my dear colleague Jorge Luis Betancourt for adding that part.
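
The scoring itself is simple enough to sketch in a few lines of Rust (a hypothetical, simplified version of the idea; the real code lives in the tinysearch repository):

// Score each article by how many of the query words its filter (probably) contains,
// then return the best-scoring titles first.
fn search<'a, F>(query: &str, articles: &'a [(String, F)], max_results: usize) -> Vec<&'a str>
where
    F: Fn(&str) -> bool, // stand-in for a cuckoo/XOR filter lookup
{
    let words: Vec<&str> = query.split_whitespace().collect();
    let mut scored: Vec<(&'a str, usize)> = articles
        .iter()
        .map(|(title, contains)| {
            let hits = words.iter().filter(|&&w| true && contains(w)).count();
            (title.as_str(), hits)
        })
        .filter(|&(_, hits)| hits > 0)
        .collect();
    scored.sort_by(|a, b| b.1.cmp(&a.1)); // most hits first
    scored.into_iter().take(max_results).map(|(title, _)| title).collect()
}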

Video of the search functionality

(Fun fact: this animation is about the same size as the uncompressed Wasm search itself.)

Caveats

Only whole words are matched. I would love to add prefix-search, but the binary became too big when I tried.

Usage

The standalone binary to create the Wasm file is called tinysearch. It expects a single path to a JSON file as an input:

tinysearch path/to/corpus.json

This corpus.json contains the text you would like to index. The format is pretty straightforward:

[
  {
    "title": "Article 1",
    "url": "https://example.com/article1",
    "body": "This is the body of article 1."
  },
  {
    "title": "Article 2",
    "url": "https://example.com/article2",
    "body": "This is the body of article 2."
  }
]

You can generate this JSON file with any static site generator. Here's my version for Zola:

{% set section = get_section(path="_index.md") %}

[
  {%- for post in section.pages -%}
    {% if not post.draft %}
      {
        "title": {{ post.title | striptags | json_encode | safe }},
        "url": {{ post.permalink | json_encode | safe }},
        "body": {{ post.content | striptags | json_encode | safe }}
      }
      {% if not loop.last %},{% endif %}
    {% endif %}
  {%- endfor -%}
]

I'm pretty sure that the Jekyll version looks quite similar. Here's a starting point. If you get something working for your static site generator, please let me know.

Observations

  • This is still the wild west: unstable features, nightly Rust, documentation gets outdated almost every day.
    Bring your thinking cap!
  • Creating a product out of a good idea is a lot of work. One has to pay attention to many factors: ease-of-use, generality, maintainability, documentation, and so on.
  • Rust is very good at removing dead code, so you usually don't pay for what you don't use. I would still advise you to be very conservative about the dependencies you add to a Wasm binary because it's tempting to add features that you don't need and which will add to the binary size. For example, I used StructOpt during testing, and I had a main() function that was parsing these command-line arguments. This was not necessary for Wasm, so I removed it later.
  • I understand that not everyone wants to write Rust code. It's complicated to get started with, but the cool thing is that you can use almost any other language, too. For example, you can write Go code and transpile to Wasm, or maybe you prefer PHP or Haskell. There is support for many languages already.
  • A lot of people dismiss WebAssembly as a toy technology. They couldn't be further from the truth. In my opinion, WebAssembly will revolutionize the way we build products for the web and beyond. What was very hard just two years ago is now easy: shipping code in any language to every browser. I'm super excited about its future.
  • If you're looking for a standalone, self-hosted search index for your company website, check out sonic. Also check out stork as an alternative.

Wow! This tool is getting quite a bit of traction lately. ✨

I don't run ads on this website, but if you like these kinds of experiments, please consider sponsoring me on Github. This allows me to write more tools like this in the future.

Also, if you're interested in hands-on Rust consulting, pick a date from my calendar and we can talk about how I can help.

Try it!

The code for tinysearch is on Github.

Please be aware of these limitations:

  • Only searches for entire words. There are no search suggestions. The reason is that prefix search blows up binary size like Mentos and Diet Coke.
  • Since we bundle all search indices for all articles into one static binary, I only recommend using it for small to medium-sized websites. Expect around 4kB (uncompressed) per article.
  • The compile times are abysmal at the moment (around 1.5 minutes after a fresh install on my machine), mainly because we're compiling the Rust crate from scratch every time we rebuild the index.
    Update: This is mostly fixed thanks to the awesome work of CephalonRho in PR #13. Thanks again!

The final Wasm code is laser-fast because we save the roundtrips to a search-server. The instant feedback loop feels more like filtering a list than searching through posts. It can even work fully offline, which might be nice if you like to bundle it with an app.


Evolution and Software Development Brandon's Website

Brandon's Website2019-10-09 00:00:00 I was at one point introduced to the recurrent laryngeal nerve. Put simply, this is a nerve that exists in several types of animals, including humans, which instead of taking a very obvious direct path from source to destination, wraps hilariously around a major artery before doubling back and getting on its way:

New Website Brandon's Website

Brandon's Website2019-10-01 00:00:00 I decided to rebuild my website from scratch.

California is Beautiful Posts on elder.dev

Posts on elder.dev2019-08-14 00:00:00 Just a few select photos from a short trip away from it all …

Writing Safer Bash Posts on elder.dev

Posts on elder.dev2019-04-08 00:00:00 Bash scripts are a really convenient way to write simple utilities. Unfortunately many bash scripts in the wild are littered with bugs. Writing reliable bash can be hard. I’ve been reviewing and fixing a lot of bash while working on cleaning up the Kubernetes project’s scripts and wanted to collect some tips for writing more reliable scripts. Use ShellCheck ShellCheck is an excellent open source linter for shell capable of detecting many errors.

Maybe You Don't Need Kubernetes Matthias Endler

Matthias Endler2019-03-21 00:00:00
A woman riding a scooter
Source: Illustration created by freepik, Nomad logo by HashiCorp.

Kubernetes is the 800-pound gorilla of container orchestration.
It powers some of the biggest deployments worldwide, but it comes with a price tag.

Especially for smaller teams, it can be time-consuming to maintain and has a steep learning curve. For what our team of four wanted to achieve at trivago, it added too much overhead. So we looked into alternatives — and fell in love with Nomad.

The Wishlist

Our team runs a number of typical services for monitoring and performance analysis: API endpoints for metrics written in Go, Prometheus exporters, log parsers like Logstash or Gollum, and databases like InfluxDB or Elasticsearch. Each of these services runs in its own container. We needed a simple system to keep those jobs running.

We started with a list of requirements for container orchestration:

  • Run a fleet of services across many machines.
  • Provide an overview of running services.
  • Allow for communication between services.
  • Restart them automatically when they die.
  • Be manageable by a small team.

On top of that, the following things were nice to have but not strictly required:

  • Tag machines by their capabilities (e.g., label machines with fast disks for I/O heavy services.)
  • Be able to run these services independently of any orchestrator (e.g. in development).
  • Have a common place to share configurations and secrets.
  • Provide an endpoint for metrics and logging.

Why Kubernetes Was Not A Good Fit For Us

When creating a prototype with Kubernetes, we noticed that we started adding ever-more complex layers of logic to operate our services. Logic that we implicitly relied on.

As an example, Kubernetes allows embedding service configurations using ConfigMaps. Especially when merging multiple config files or adding more services to a pod, this can get quite confusing quickly. Kubernetes - or helm, for that matter - allows injecting external configs dynamically to ensure separation of concerns. But this can lead to tight, implicit coupling between your project and Kubernetes. Helm and ConfigMaps are optional features so you don’t have to use them. You might as well just copy the config into the Docker image. However, it’s tempting to go down that path and build unnecessary abstractions that can later bite you.

On top of that, the Kubernetes ecosystem is still rapidly evolving. It takes a fair amount of time and energy to stay up-to-date with the best practices and latest tooling. Kubectl, minikube, kubeadm, helm, tiller, kops, oc - the list goes on and on. Not all tools are necessary to get started with Kubernetes, but it’s hard to know which ones are, so you have to be at least aware of them. Because of that, the learning curve is quite steep.

When To Use Kubernetes

At trivago specifically, many teams use Kubernetes and are quite happy with it. These instances are managed by Google or Amazon however, which have the capacity to do so.

Kubernetes comes with amazing features that make container orchestration at scale more manageable:

  • Fine-grained rights management
  • Custom controllers allow getting logic into the cluster. These are just programs that talk to the Kubernetes API.
  • Autoscaling! Kubernetes can scale your services up and down on demand. It uses service metrics to do this without manual intervention.

The question is if you really need all those features. You can't rely on these abstractions to just work; you'll have to learn what's going on under the hood.

Especially in our team, which runs most services on-premise (because of its close connection to trivago's core infrastructure), we didn't want to afford running our own Kubernetes cluster; we wanted to ship services instead.

Nuclear hot take: nobody will care about Kubernetes in five years. -A tweet by Corey Quinn

Batteries Not Included

Nomad is the 20% of service orchestration that gets you 80% of the way. All it does is manage deployments. It takes care of your rollouts and restarts your containers in case of errors, and that's about it.

The entire point of Nomad is that it does less: it doesn’t include fine-grained rights management or advanced network policies, and that’s by design. Those components are provided as enterprise services, by a third-party — or not at all.

I think Nomad hit a sweet-spot between ease of use and expressiveness. It's good for small, mostly independent services. If you need more control, you'll have to build it yourself or use a different approach. Nomad is just an orchestrator.

The best part about Nomad is that it's easy to replace. There is little to no vendor lock-in because the functionality it provides can easily be integrated into any other system that manages services. It just runs as a plain old single binary on every machine in your cluster; that's it!

The Nomad Ecosystem Of Loosely Coupled Components

The real power of Nomad lies within its ecosystem. It integrates very well with other - completely optional - products like Consul (a key-value store) or Vault (for secrets handling). Inside your Nomad file, you can have sections for fetching data from those services:

template {
  data = <<EOH
LOG_LEVEL="{{key "service/geo-api/log-verbosity"}}"
API_KEY="{{with secret "secret/geo-api-key"}}{{.Data.value}}{{end}}"
EOH

  destination = "secrets/file.env"
  env         = true
}

This will read the service/geo-api/log-verbosity key from Consul and expose it as a LOG_LEVEL environment variable inside your job. It's also exposing secret/geo-api-key from Vault as API_KEY. Simple, but powerful!

Because it's so simple, Nomad can also be easily extended with other services through its API. For example, jobs can be tagged for service discovery. At trivago, we tag all services, which expose metrics, with trv-metrics. This way, Prometheus finds the services via Consul and periodically scrapes the /metrics endpoint for new data. The same can be done for logs by integrating Loki for example.

There are many other examples for extensibility:

  • Trigger a Jenkins job using a webhook and Consul watches to redeploy your Nomad job on service config changes.
  • Use Ceph to add a distributed file system to Nomad.
  • Use fabio for load balancing.

All of this allowed us to grow our infrastructure organically without too much up-front commitment.

Fair Warning

No system is perfect. I advise you not to use any fancy new features in production right now. There are bugs and missing features of course - but that's also the case for Kubernetes.

Compared to Kubernetes, there is far less momentum behind Nomad. Kubernetes has seen around 75,000 commits and 2,000 contributors so far, while Nomad sports about 14,000 commits and 300 contributors. It will be hard for Nomad to keep up with the velocity of Kubernetes, but maybe it doesn't have to! The scope is much more narrow and the smaller community could also mean that it'll be easier to get your pull request accepted, in comparison to Kubernetes.

Summary

The takeaway is: don't use Kubernetes just because everyone else does. Carefully evaluate your requirements and check which tool fits the bill.

If you're planning to deploy a fleet of homogenous services on large-scale infrastructure, Kubernetes might be the way to go. Just be aware of the additional complexity and operational costs. Some of these costs can be avoided by using a managed Kubernetes environment like Google Kubernetes Engine or Amazon EKS.

If you're just looking for a reliable orchestrator that is easy to maintain and extendable, why not give Nomad a try? You might be surprised by how far it'll get you.

If Kubernetes were a car, Nomad would be a scooter. Sometimes you prefer one and sometimes the other. Both have their right to exist.


Avoiding Burnout in Open Source Posts on elder.dev

Posts on elder.dev2019-03-17 00:00:00 This post may come off a bit ironic, coming from someone who burned out pretty hard recently, but I received some really good advice and I hope it can help someone else. Some of the advice I received: Set boundaries, reserve time for yourself Don’t feel guilty for not responding right away. Even if you work on Open Source fulltime, don’t let it become a “second job”, take time for yourself.

What Is Rust Doing Behind the Curtains? Matthias Endler

Matthias Endler2018-12-02 00:00:00

Rust allows for a lot of syntactic sugar, that makes it a pleasure to write. It is sometimes hard, however, to look behind the curtain and see what the compiler is really doing with our code.


The Unreasonable Effectiveness of Excel Macros Matthias Endler

Matthias Endler2018-11-05 00:00:00

I never was a big fan of internships, partially because all the exciting companies were far away from my little village in Bavaria and partially because I was too shy to apply.

Only once I applied for an internship in Ireland as part of a school program. Our teacher assigned the jobs and so my friend got one at Apple and I ended up at a medium-sized IT distributor — let's call them PcGo.


Mapping Appalachia Posts on elder.dev

Posts on elder.dev2018-10-31 00:00:00 ✎ Update It’s worth noting that I did not wind up playing Fallout 76 much more. After the B.E.T.A. my interest fell off quickly as the locations and quests failed to be as engaging for me as previous Fallout games. I do not recommend Fallout 76 to anyone. October 30th B.E.T.A. (Break-It Early Test Application) During the first B.E.T.A. some places I discovered along my travels were:

GitOps All The Things! Posts on elder.dev

Posts on elder.dev2018-09-23 00:00:00 You should use GitOps for everything. Everything. GitOps is a recent-ish term for: use declarative configuration for your infrastructure (e.g. Kubernetes) version all of your configuration in source control (I.E. Git) use your source control to drive your infrastructure (I.E. use CI/CD = Ops) GitOps: versioned CI/CD on top of declarative infrastructure. Stop scripting and start shipping. https://t.co/SgUlHgNrnY — Kelsey Hightower (@kelseyhightower) January 17, 2018 Why? - Well, do you like the sound of:

Slackmoji Anywhere Posts on elder.dev

Posts on elder.dev2018-09-09 00:00:00 I use slack a lot to communicate with other Kubernetes contributors, and I’m a fan of the emoji reaction feature for reacting to posts without notifying everyone in the group. Positive emoji responses in particular are a simple way to acknowledge messages and make discussions more friendly and welcoming. slack emoji reaction example (thanks dims!) A particularly fun part of this feature is custom emoji support, commonly known as “slackmoji”, which allows adding arbitrary images (and even gifs!

Switching from a German to a US Keyboard Layout - Is It Worth It? Matthias Endler

Matthias Endler2018-09-02 00:00:00

For the first three decades of my life, I've exclusively used a German keyboard layout for programming. In 2018, I finally switched to a US layout. This post summarizes my thoughts around the topic. I was looking for a similar article before jumping the gun, but I couldn't find one — so I wrote it.

My current keyboard (as of April 2021), the low-profile, tenkeyless Keychron K1 is close to my favorite input device. Yes, I got the RGB version. — Amazon referral link.

Why Switch To the US Layout?

I was reasonably efficient when writing prose, but felt like a lemur on a piano when programming: lots of finger-stretching while trying to reach the special keys like {, ;, or /.

German Keyboard Layout
Source: Image by Wikipedia

Here's Wikipedia's polite explanation why the German keyboard sucks for programming:

Like many other non-American keyboards, German keyboards change the right Alt key into an Alt Gr key to access a third level of key assignments. This is necessary because the umlauts and some other special characters leave no room to have all the special symbols of ASCII, needed by programmers among others, available on the first or second (shifted) levels without unduly increasing the size of the keyboard.

But Why Switch Now?

After many years of using a rubber-dome Logitech Cordless Desktop Wave, I had to get a mechanical keyboard again.

Those rubber domes just feel too mushy to me now. In addition to that, I enjoy the clicky sound of a mechanical keyboard and the noticeable tactile bump. (I'm using Cherry MX Brown Keys with O-Ring dampeners to contain the anger of my coworkers.)

Most mechanical keyboards come with an ANSI US layout only, so I figured, I'd finally make the switch.

My first mechanical keyboard — Durgod Taurus K320 (referral link). They also have a fancy white-pink ISO version now.

How Long Did It Take To Get Accustomed To The New Layout?

Working as a software engineer, my biggest fear was that the switch would slow down my daily work. This turned out not to be true. I was reasonably productive from day one, and nobody even noticed any difference. (That's a good thing, right?)

At first, I didn't like the bar-shaped US-Return key. I preferred the European layout with a vertical enter key. I was afraid that I would hit the key by accident. After a while, I found the US return key to be even more convenient. I never hit it by accident, and it's easy to reach with my pinky from the home position.

Within two weeks, I was back to 100% typing speed.

Did My Programming Speed Improve Noticeably?

Yup. I'd say I can type programs about 30% faster now.

Especially when using special characters (/, ;, {, and so on) I'm much faster now; partly because the key locations feel more intuitive, but mainly because my fingers stay at their dedicated positions now.

Somehow the position of special characters feels just right. I can now understand the reason why Vim is using / for search or why the pipe symbol is |: both are easy to reach! It all makes sense now! (For a fun time, try that on a German keyboard!)

I now understand why Microsoft chose \ as a directory separator: it's easily accessible from a US keyboard. On the German layout, it's… just… awful (Alt Gr+ß on Windows, Shift + Option + 7 on Mac).

The opening curly brace on a German layout Mac is produced with Alt+8, which always made me leave the home row and break my typing flow. Now there are dedicated keys for parentheses. Such a relief!

Update: It also helps greatly when looking up hotkeys for IDEs, text editors, photo editors, etc. because some programs remap shortcuts for the German market, which means that all the English documentation is totally worthless. Now I can just use the shortcuts mentioned and move on with my life.

Am I Slower When Writing German Texts Now?

In the beginning, I was.

Somehow my brain associated the German layout with German texts. First, I used the macOS layout switcher. This turned out to be cumbersome and time-consuming.

Then I found the "US with Umlauts via Option Key Layout". It works perfectly fine for me. It allows me to use a single Keyboard layout but insert German umlauts at will (e.g. ö is Option+o). There is probably a similar layout for other language combinations.

Stefan Imhoff notified me that there's also a Karabiner rule which does the same. Might come in handy in case you already use this tool.

Is Switching Between Keyboards Painful?

US keyboard layout
Source: Wikipedia

My built-in MacBook Pro keyboard layout is still German. I was afraid that switching between the internal German and the external English keyboard would confuse me. This turned out not to be a problem. I rarely look at the print anyway. (Update: I can't remember when I last looked at the print.)

How Often Do You Switch Back To A German Layout Now?

Never. My girlfriend has a German keyboard, and every time I have to use it, I switch to the US layout. It makes her very happy when I do this and then forget to switch back to German when I'm done.

Summary

If you're considering switching, just do it! I don't look back at all, and apart from the initial transition period, I haven't found any downsides.

Since posting this article, many of my friends have made the switch as well and had similar experiences.


fastcat - A Faster `cat` Implementation Using Splice Matthias Endler

Matthias Endler2018-07-31 00:00:00

Lots of people asked me to write another piece about the internals of well-known Unix commands. Well, actually, nobody asked, but it makes for a good intro. I'm sure you’ve read the previous parts about yes and ls — they are epic.

Anyway, today we talk about cat, which is used to concatenate files - or, more commonly, abused to print a file's contents to the screen.

# Concatenate files, the intended purpose
cat input1.txt input2.txt input3.txt > output.txt

# Print file to screen, the most common use-case
cat myfile

Implementing cat

Here's a naive cat in Ruby:

#!/usr/bin/env ruby

def cat(args)
  args.each do |arg|
    IO.foreach(arg) do |line|
      puts line
    end
  end
end

cat(ARGV)

This program goes through each file and prints its contents line by line. Easy peasy! But wait, how fast is this tool?

I quickly created a random 2 GB file for the benchmark.

Let's compare the speed of our naive implementation with the system one using the awesome pv (Pipe Viewer) tool. All tests are averaged over five runs on a warm cache (file in memory).

# Ruby 2.5.1
> ./rubycat myfile | pv -r > /dev/null
[196MiB/s]

Not bad, I guess? How does it compare with my system's cat?

cat myfile | pv -r > /dev/null
[1.90GiB/s]

Uh oh, GNU cat is ten times faster than our little Ruby cat. 💎🐈🐌

Making our Ruby cat a little faster

Our naive Ruby code can be tweaked a bit. Turns out line buffering hurts performance in the end¹:

#!/usr/bin/env ruby

def cat(args)
  args.each do |arg|
    IO.copy_stream(arg, STDOUT)
  end
end

cat(ARGV)

rubycat myfile | pv -r > /dev/null
[1.81GiB/s]

Wow... we didn't really try hard, and we're already approaching the speed of a tool that has been optimized since 1971. 🎉

But before we celebrate too much, let's see if we can go even faster.

Splice

What initially motivated me to write about cat was this comment by user wahern on Hacker News:

I'm surprised that neither GNU yes nor GNU cat uses splice(2).

Could this splice thing make printing files even faster? — I was intrigued.

Splice was first introduced to the Linux Kernel in 2006, and there is a nice summary from Linus Torvalds himself, but I prefer the description from the manpage:

splice() moves data between two file descriptors without copying between kernel address space and user address space. It transfers up to len bytes of data from the file descriptor fd_in to the file descriptor fd_out, where one of the file descriptors must refer to a pipe.

If you really want to dig deeper, here's the corresponding source code from the Linux Kernel, but we don't need to know all the nitty-gritty details for now. Instead, we can just inspect the header from the C implementation:

#include <fcntl.h>

ssize_t splice (int fd_in, loff_t *off_in, int fd_out,
                loff_t *off_out, size_t len,
                unsigned int flags);

To break it down even more, here's how we would copy the entire src file to dst:

const ssize_t r = splice (src, NULL, dst, NULL, size, 0);

The cool thing about this is that all of it happens inside the Linux kernel, which means we won't copy a single byte to userspace (where our program runs). Ideally, splice works by remapping pages and does not actually copy any data, which may improve I/O performance (reference).

File icon by Aleksandr Vector from the Noun Project. Terminal icon by useiconic.com from the Noun Project.

Using splice from Rust

I have to say I'm not a C programmer and I prefer Rust because it offers a safer interface. Here's the same thing in Rust:

#[cfg(any(target_os = "linux", target_os = "android"))]
pub fn splice(
    fd_in: RawFd,
    off_in: Option<&mut libc::loff_t>,
    fd_out: RawFd,
    off_out: Option<&mut libc::loff_t>,
    len: usize,
    flags: SpliceFFlags,
) -> Result<usize>

Now, I didn't implement the Linux bindings myself. Instead, I just used a library called nix, which provides Rust-friendly bindings to *nix APIs.

There is one caveat, though: We cannot really copy the file directly to standard out, because splice requires one file descriptor to be a pipe. The way around that is to create a pipe, which consists of a reader and a writer (rd and wr). We pipe the file into the writer, and then we read from the pipe and push the data to stdout.

You can see that I use a relatively big buffer of 16384 bytes (2¹⁴) to improve performance.

extern crate nix;

use std::env;
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;

use nix::fcntl::{splice, SpliceFFlags};
use nix::unistd::pipe;

const BUF_SIZE: usize = 16384;

fn main() {
    for path in env::args().skip(1) {
        let input = File::open(&path).expect(&format!("fcat: {}: No such file or directory", path));
        let (rd, wr) = pipe().unwrap();
        let stdout = io::stdout();
        let _handle = stdout.lock();

        loop {
            let res = splice(
                input.as_raw_fd(),
                None,
                wr,
                None,
                BUF_SIZE,
                SpliceFFlags::empty(),
            ).unwrap();

            if res == 0 {
                // We read 0 bytes from the input,
                // which means we're done copying.
                break;
            }

            let _res = splice(
                rd,
                None,
                stdout.as_raw_fd(),
                None,
                BUF_SIZE,
                SpliceFFlags::empty(),
            ).unwrap();
        }
    }
}

So, how fast is this?

fcat myfile | pv -r > /dev/null
[5.90GiB/s]

Holy guacamole. That's over three times as fast as system cat.

Operating System support

  • Linux and Android are fully supported.
  • OpenBSD also has some sort of splice implementation called sosplice. I didn't test that, though.
  • On macOS, the closest thing to splice is its bigger brother, sendfile, which can send a file to a socket within the kernel. Unfortunately, it does not support sending from file to file.² There's also copyfile, which has a similar interface, but unfortunately, it is not zero-copy. (I thought so in the beginning, but I was wrong.)
  • Windows doesn't provide zero-copy file-to-file transfer (only file-to-socket transfer using the TransmitFile API).

Nevertheless, in a production-grade implementation, the splice support could be activated on systems that support it, while using a generic implementation as a fallback.
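To make that concrete, here's a minimal sketch of how such a platform-dependent fallback could look in Rust. This is my own illustration, not code from fcat; the function name copy_to_stdout is made up, and the Linux arm only hints at where the splice loop from above would go.

use std::fs::File;
use std::io;

#[cfg(any(target_os = "linux", target_os = "android"))]
fn copy_to_stdout(_file: &mut File) -> io::Result<u64> {
    // On Linux and Android we would run the splice() loop shown above.
    // Omitted here to keep the sketch short.
    unimplemented!()
}

#[cfg(not(any(target_os = "linux", target_os = "android")))]
fn copy_to_stdout(file: &mut File) -> io::Result<u64> {
    // Generic fallback: a plain copy through userspace buffers.
    io::copy(file, &mut io::stdout())
}

fn main() -> io::Result<()> {
    for path in std::env::args().skip(1) {
        let mut file = File::open(&path)?;
        copy_to_stdout(&mut file)?;
    }
    Ok(())
}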

Nice, but why on earth would I want that?

I have no idea. Probably you don't, because your bottleneck is somewhere else. That said, many people use cat for piping data into another process like

# Count all lines in C files
cat *.c | wc -l

or

cat kittens.txt | grep "dog"

In this case, if you notice that cat is the bottleneck, try fcat (but first, try to avoid cat altogether).

With some more work, fcat could also be used to directly route packets from one network card to another, similar to netcat.

Lessons learned

  • The closer we get to bare metal, the more our hard-won abstractions fall apart, and we are back to low-level systems programming.
  • Apart from a fast cat, there's also a use-case for a slow cat: old computers. For that purpose, there's — you guessed it — slowcat.

That said, I still have no idea why GNU cat does not use splice on Linux. 🤔 The source code for fcat is on Github. Contributions welcome!

Footnotes

1. Thanks to reader Freeky for making this code more idiomatic.
2. Thanks to reader masklinn for the hint.


That Octocat on the Wall Matthias Endler

Matthias Endler2018-06-09 00:00:00
Photo of my office with Github's octocat on the wall over my couch

So I'm in a bit of a sentimental mood lately. Github got acquired by Microsoft. While I think the acquisition was well-deserved, I still wish it didn't happen. Let me explain.

My early days

I joined Github on 3rd of January 2010. Since I was a bit late to the game, my usual handle (mre) was already taken. So I naively sent a mail to Github, asking if I could bag the name as it seemed to be abandoned. To my surprise, I got an answer. The response came from a guy named Chris Wanstrath.

All he wrote was "it's yours."

That was the moment I fell in love with Github. I felt encouraged to collaborate on projects and felt that everybody could contribute something valuable. Only later did I find out that Chris was one of the founders and the CEO of the company.

Living on Github

Before Github, there was SourceForge, and I only went there to download binaries. Github showed me that there was an entire community of like-minded people out there who ❤️ to work on code in their free time. To me, Github is much more than a git interface; it's a social network. While other people browse Facebook or Instagram, I browse Github.

I can still vividly remember getting my first star and my first issue on one of my projects coming from a real (!) person other than myself.

After so many years, a pull-request still feels like the most personal gift anyone could give to me.

Github - the culture

After a while, I started to admire some Github employees deeply:

All three developers have since left the company. I can't help but notice that Github has changed. The harassment accusations and letting Zach Holman go are only part of the story.

It has become a company like any other, maintaining a mature product. It doesn't excite me anymore.

An alternative reality

There's still a bitter taste in my mouth when I think that Github has fallen prey to one of the tech giants. I loved Github while it was a small, friendly community of passionate developers. Could this have been sustainable?

Maybe through paid features for project maintainers.

You see, if you do Open Source every day, it can be a lot of work. People start depending on your projects, and you feel responsible for keeping the lights on.

To ease the burden, I'd love to have deeper insights into my project usage: visitor statistics for longer than two weeks, a front page where you could filter and search for events, a better way to handle discussions (which can get out of hand quickly), better CI integration à la Gitlab.

These features would be targeted at the top 10% of Github users, a group of 3 million people. Would this be enough to pay the bills? Probably. Would it be enough to grow? Probably not.

So what?

I don't think the acquisition will kill the culture. Microsoft is a strong partner and Nat Friedman is one of us. On the other side, I'm not as enthusiastic as I used to be. There's room for competitors now and I'm beginning to wonder what will be the next Github. That said, I will keep the Octocat on my office wall, in the hope that the excitement comes back.


Ten Years of Vim Matthias Endler

Matthias Endler2018-05-20 00:00:00

When I opened Vim by accident for the first time, I thought it was broken. My keystrokes changed the screen in unpredictable ways, and I wanted to undo things and quit. Needless to say, it was an unpleasant experience. There was something about it though, that kept me coming back and it became my main editor.

Fast forward ten years (!) and I still use Vim. After all the Textmates and Atoms and PhpStorms I tried, I still find myself at home in Vim. People keep asking me: Why is that?

Why Vim?

Before Vim, I had used many other editors like notepad or nano. They all behaved more or less as expected: you insert text, you move your cursor with the arrow keys or your mouse, and you save with Control + S or by using the menu bar. VI (and Vim, its spiritual successor) is different.

EVERYTHING in Vim is different, and that's why it's so highly effective. Let me explain.

The Zen of Vim

The philosophy behind Vim takes a while to sink in: While other editors focus on writing as the central part of working with text, Vim thinks it's editing.

You see, most of the time I don't spend writing new text; instead, I edit existing text.
I mold text, form it, turn it upside down. Writing text is craftsmanship and hard work. You have to shape your thoughts with your cold, bare hands until they somewhat form a coherent whole. This painful process is what Vim tries to make at least bearable. It helps you keep control. It does that, by providing you sharp, effective tools to modify text. The core of Vim is a language for editing text.

Vim, The Language

The Vim commands are not cryptic; you already know them.

  • To undo, type u.
  • To find the next t, type ft.
  • To delete a word, type daw.
  • To change a sentence, type cas.

More often than not, you can guess the correct command by thinking of an operation you want to execute and an object to execute it on. Then just take the first character of every word. Try it! If anything goes wrong, you can always hit ESC and type u for undo.

Operations: delete, find, change, back, insert, append,...
Objects: word, sentence, parentheses, (html) tag,... (see :help text-objects)

Inserting text is just another editing operation, which can be triggered with i. That's why, by default, you are in normal mode — also called command mode — where all those operations work.

Once you know this, Vim makes a lot more sense, and that's when you start to be productive.

How My Workflow Changed Over The Years

When I was a beginner, I was very interested in how people with more Vim experience would use the editor. Now that I'm a long-time user, here's my answer: there's no secret sauce. I certainly feel less exhausted after editing text for a day, but 90% of the commands I use fit on a post-it note.

That said, throughout the years, my Vim habits changed.
I went through several phases:

Year 1: I'm happy if I can insert text and quit again.
Year 2: That's cool, let's learn more shortcuts.
Year 3-5: Let's add all the features!!!
Year 6-10: My .vimrc is five lines long.

Year three is when I started to learn the Vim ecosystem for real. I tried all sorts of flavors like MacVim and distributions like janus. For a while, I even maintained my own Vim configuration, which was almost 400 lines long.

All of that certainly helped me learn what's out there, but I'm not sure if I would recommend that to a Vim beginner. After all, you don't really need all of that. Start with a vanilla Vim editor which works just fine!

My current Vim setup is pretty minimalistic. I don't use plugins anymore, mostly out of laziness and because built-in Vim commands or macros can replace them.

Here are three concrete examples of how my workflow changed over the years:

  1. In the beginning, I used a lot of "number powered movements". That is, if you have a command like b, which goes back one word in the text, you can also say 5b to go back five words. Nowadays I mostly use / to move to a matching word because it's quicker.

  2. I don't use arrow keys to move around in text anymore but forced myself to use h, j, k, l. Many people say that this is faster. After trying this for a few years, I don't think that is true (at least for me). I now just stick to it out of habit.

  3. On my main working machine I use Vim for quick text editing and Visual Studio Code plus the awesome Vim plugin for projects. This way, I get the best of both worlds.

Workflow Issues I Still Struggle With

After all these years I'm still not a Vim master — far from it. As every other Vim user will tell you, we're all still learning.

Here are a few things I wish I could do better:

  • Jumping around in longer texts: I know the basics, like searching (/), jumping to a matching bracket (%) or jumping to specific lines (for line 10, type 10G), but I still could use symbols more often for navigation.
  • Using visual mode for moving text around: Sometimes it can be quite complicated to type the right combination of letters to cut (delete) the text I want to move around. That's where visual mode (v) shines. It highlights the selected text. I should use it more often.
  • Multiple registers for copy and paste: Right now I only use one register (like a pastebin) for copying text, but Vim supports multiple registers. That's cool if you want to move around more than one thing at the same time. Let's use more of those!
  • Tabs: I know how tabs work, but all the typing feels clunky. That's why I never extensively used them. Instead, I mostly use multiple terminal tabs or an IDE with Vim bindings for bigger projects.

Would I learn Vim again?

That's a tough question to answer.

On one side, I would say no. There's a steep learning curve in Vim, and with modern IDEs getting better at understanding the user's intent, editing text has become way easier and faster in general.

On the other side, Vim is the fastest way for me to write down my thoughts and code. As a bonus, it runs on every machine and might well be around for decades to come. In contrast, I don't know if the IntelliJ shortcuts will be relevant in ten years (note: if you read this in the future and ask yourself "What is IntelliJ?", the answer might be no).

Takeaways

If I can give you one tip, don't learn Vim by memorizing commands. Instead, look at your current workflow and try to make it better, then see how Vim can make that easier. It helps to look at other people using Vim to get inspired (Youtube link with sound).

You will spend a lot of time writing text, so it's well worth the time investment to learn one editor really well — especially if you are a programmer.

After ten years, Vim is somehow ingrained in my mind. I think Vim when I'm editing text. It has become yet another natural language to me. I'm looking forward to the next ten years.


Refactoring Go Code to Avoid File I/O in Unit Tests Matthias Endler

Matthias Endler2018-03-22 00:00:00

At work today, I refactored some simple Go code to make it more testable. The idea was to avoid file handling in unit tests without mocking or using temporary files by separating data input/output and data manipulation.


A Tiny `ls` Clone Written in Rust Matthias Endler

Matthias Endler2018-03-09 00:00:00

In my series of useless Unix tools rewritten in Rust, today I'm going to be covering one of my all-time favorites: ls.

First off, let me say that you probably don't want to use this code as a replacement for ls on your local machine (although you could!). As we will find out, ls is actually quite a powerful tool under the hood. I'm not going to come up with a full rewrite, but instead only cover the very basic output that you would expect from calling ls -l on your command line. What is this output? I'm glad you asked.

Expected output

> ls -l
drwxr-xr-x 2 mendler  staff    13468 Feb  4 11:19 Top Secret
-rwxr--r-- 1 mendler  staff  6323935 Mar  8 21:56 Never Gonna Give You Up - Rick Astley.mp3
-rw-r--r-- 1 mendler  staff        0 Feb 18 23:55 Thoughts on Chess Boxing.doc
-rw-r--r-- 1 mendler  staff   380434 Dec 24 16:00 nobel-prize-speech.txt

Your output may vary, but generally, there are a couple of notable things going on. From left to right, we've got the following fields:

  • The drwx things in the beginning are the file permissions (also called the file mode). If d is set, it's a directory. r means read, w means write and x execute. This rwx pattern gets repeated three times for the current user, the group, and other computer users respectively.
  • Next we got the hardlink count when referring to a file, or the number of contained directory entries when referring to a directory. (Reference)
  • Owner name
  • Group name
  • Number of bytes in the file
  • Date when the file was last modified
  • Finally, the path name

For more in-depth information, I can recommend reading the manpage of ls from the GNU coreutils used in most Linux distributions and the one from Darwin (which powers MacOS).

Whew, that's a lot of information for such a tiny tool. But then again, it can't be so hard to port that to Rust, right? Let's get started!

A very basic ls in Rust

Here is the most bare-bones version of ls, which just prints all files in the current directory:

use std::fs;
use std::path::Path;
use std::error::Error;
use std::process;

fn main() {
	if let Err(ref e) = run(Path::new(".")) {
		println!("{}", e);
		process::exit(1);
	}
}

fn run(dir: &Path) -> Result<(), Box<Error>> {
	if dir.is_dir() {
		for entry in fs::read_dir(dir)? {
				let entry = entry?;
				let file_name = entry
						.file_name()
						.into_string()
						.or_else(|f| Err(format!("Invalid entry: {:?}", f)))?;
				println!("{}", file_name);
		}
	}
	Ok(())
}

We can copy that straight out of the documentation. When we run it, we get the expected output:

> cargo run
Cargo.lock
Cargo.toml
src
target

It prints the files and exits. Simple enough.

We should stop for a moment and celebrate our success, knowing that we just wrote our first little Unix utility from scratch. Pro Tip: You can install the binary with cargo install and call it like any other binary from now on.

But we have higher goals, so let's continue.

Adding a parameter to specify the directory

Usually, if we type ls mydir, we expect to get the file listing of no other directory than mydir. We should add the same functionality to our version.

To do this, we need to accept command line parameters. One Rust crate that I love to use in this case is structopt. It makes argument parsing very easy.

Add it to your Cargo.toml. (You need cargo-edit for the following command).

cargo add structopt

Now we can import it and use it in our project:

#[macro_use]
extern crate structopt;

// use std::...
use structopt::StructOpt;

#[derive(StructOpt, Debug)]
struct Opt {
	/// Output file
	#[structopt(default_value = ".", parse(from_os_str))]
	path: PathBuf,
}

fn main() {
	let opt = Opt::from_args();
	if let Err(ref e) = run(&opt.path) {
			println!("{}", e);
			process::exit(1);
	}
}

fn run(dir: &PathBuf) -> Result<(), Box<Error>> {
	// Same as before
}

By adding the Opt struct, we can define the command line flags, input parameters, and the help output super easily. There are tons of configuration options, so it's worth checking out the project homepage.

Also note that we changed the type of the path variable from Path to PathBuf. The difference is that PathBuf owns the inner path string, while Path simply provides a reference to it. The relationship is similar to String and &str.
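As a small illustration of that relationship (my own snippet, not from the original post):

use std::path::{Path, PathBuf};

fn main() {
    // PathBuf owns its contents, like String.
    let owned: PathBuf = PathBuf::from("./src");

    // Path is a borrowed view into it, like &str.
    let borrowed: &Path = owned.as_path();

    println!("{}", borrowed.display());
}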

Reading the modification time

Now let's deal with the metadata. First, we try to retrieve the modification time from the file. A quick look at the documentation shows us how to do it:

use std::fs;

let metadata = fs::metadata("foo.txt")?;

if let Ok(time) = metadata.modified() {
	println!("{:?}", time);
}

The output might not be what you expect: we receive a SystemTime object, which represents the measurement of the system clock. E.g. this code

println!("{:?}", SystemTime::now());
// Prints: SystemTime { tv_sec: 1520554933, tv_nsec: 610406401 }

But the format that we would like to have is something like this:

Mar  9 01:24

Thankfully, there is a library called chrono, which can read this format and convert it into any human readable output we like:

let current: DateTime<Local> = DateTime::from(SystemTime::now());
println!("{}", current.format("%_d %b %H:%M").to_string());

this prints

9 Mar 01:29

(Yeah, I know it's getting late.)

Armed with that knowledge, we can now read our file modification time.

cargo add chrono

use chrono::{DateTime, Local};

fn run(dir: &PathBuf) -> Result<(), Box<Error>> {
	if dir.is_dir() {
		for entry in fs::read_dir(dir)? {
			let entry = entry?;
			let file_name = ...

			let metadata = entry.metadata()?;
			let size = metadata.len();
			let modified: DateTime<Local> = DateTime::from(metadata.modified()?);

			println!(
				"{:>5} {} {}",
				size,
				modified.format("%_d %b %H:%M").to_string(),
				file_name
			);
		}
	}
	Ok(())
}

This {:>5} might look weird. It's a formatting directive provided by std::fmt. It means "right align this field with a space padding of 5" - just like our bigger brother ls -l is doing it.

Similarly, we retrieved the size in bytes with metadata.len().
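To see the directive in isolation, here's a tiny standalone example (my own, not part of the original code):

fn main() {
    // "{:>5}" right-aligns the value in a field of width 5, padded with spaces.
    println!("{:>5}", 7);     // prints "    7"
    println!("{:>5}", 15618); // prints "15618"
}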

Unix file permissions are a zoo

Reading the file permissions is a bit more tricky. While the rwx notation is very common in Unix derivatives such as *BSD or GNU/Linux, many other operating systems ship their own permission management. There are even differences between the Unix derivatives.

Wikipedia lists a few extensions to the file permissions that you might encounter:

That just goes to show that there are a lot of important details to be considered when implementing this in real life.

Implementing very basic file mode

For now, we just stick to the basics and assume we are on a platform that supports the rwx file mode.

Behind the r, the w and the x are in reality octal numbers. That's easier for computers to work with and many hardcore users even prefer to type the numbers over the symbols. The ruleset behind those octals is as follows. I took that from the chmod manpage.

	Modes may be absolute or symbolic.
	An absolute mode is an octal number constructed
	from the sum of one or more of the following values

	 0400    Allow read by owner.
	 0200    Allow write by owner.
	 0100    For files, allow execution by owner.
	 0040    Allow read by group members.
	 0020    Allow write by group members.
	 0010    For files, allow execution by group members.
	 0004    Allow read by others.
	 0002    Allow write by others.
	 0001    For files, allow execution by others.

For example, setting the permissions for a file so that the owner can read, write, and execute it while nobody else can do anything would be 700 (400 + 200 + 100).
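In code, these absolute modes are just bit flags that get OR-ed together; here's a quick sanity check (my own snippet, not from the post):

fn main() {
    // Owner may read, write, and execute; group and others get nothing.
    let mode = 0o400 | 0o200 | 0o100;
    assert_eq!(mode, 0o700);
    println!("{:o}", mode); // prints "700"
}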

Granted, those numbers are the same since the 70s and are not going to change soon, but it's still a bad idea to compare our file permissions directly with the values; if not for compatibility reasons, then for readability and to avoid magic numbers in our code.

Therefore, we use the libc crate, which provides constants for those magic numbers. As mentioned above, these file permissions are Unix specific, so we need to import the Unix-only trait std::os::unix::fs::PermissionsExt for that.

extern crate libc;

// Examples:
// * `S_IRGRP` stands for "read permission for group",
// * `S_IXUSR` stands for "execution permission for user"
use libc::{S_IRGRP, S_IROTH, S_IRUSR, S_IWGRP, S_IWOTH, S_IWUSR, S_IXGRP, S_IXOTH, S_IXUSR};
use std::os::unix::fs::PermissionsExt;

We can now get the file permissions like so:

let metadata = entry.metadata()?;
let mode = metadata.permissions().mode();
parse_permissions(mode as u16);

parse_permissions() is a little helper function defined as follows:

fn parse_permissions(mode: u16) -> String {
	let user = triplet(mode, S_IRUSR, S_IWUSR, S_IXUSR);
	let group = triplet(mode, S_IRGRP, S_IWGRP, S_IXGRP);
	let other = triplet(mode, S_IROTH, S_IWOTH, S_IXOTH);
	[user, group, other].join("")
}

It takes the file mode as a u16 (simply because the libc constants are u16) and calls triplet on it. For each flag read, write, and execute, it runs a binary & operation on mode. The output is matched exhaustively against all possible permission patterns.

fn triplet(mode: u16, read: u16, write: u16, execute: u16) -> String {
	match (mode & read, mode & write, mode & execute) {
		(0, 0, 0) => "---",
		(_, 0, 0) => "r--",
		(0, _, 0) => "-w-",
		(0, 0, _) => "--x",
		(_, 0, _) => "r-x",
		(_, _, 0) => "rw-",
		(0, _, _) => "-wx",
		(_, _, _) => "rwx",
	}.to_string()
}

Wrapping up

The final output looks like this. Close enough.

> cargo run
rw-r--r--     7  6 Mar 23:10 .gitignore
rw-r--r-- 15618  8 Mar 00:41 Cargo.lock
rw-r--r--   185  8 Mar 00:41 Cargo.toml
rwxr-xr-x   102  5 Mar 21:31 src
rwxr-xr-x   136  6 Mar 23:07 target

That's it! You can find the final version of our toy ls on Github. We are still far away from a full-fledged ls replacement, but at least we learned a thing or two about its internals.

If you're looking for a proper ls replacement written in Rust, go check out lsd. If, instead, you want to read another blog post from the same series, check out A Little Story About the yes Unix Command.


Brewing With Kubernetes Posts on elder.dev

Posts on elder.dev2018-03-04 00:00:00 My coffee pot is now a node in my home Kubernetes cluster, and it’s awesome. More specifically the Raspberry Pi wired to my CoffeePot controller now runs on Kubernetes thanks to kubeadm in a cluster with the node running my site. I set up a public live status page displaying all of the sensor data as well as the last update time, with control restricted to users on my local network.

Migrating My Site to Kubernetes Posts on elder.dev

Posts on elder.dev2018-03-04 00:00:00 Previously when I brought my site back online I briefly mentioned the simple setup I threw together with Caddy running on a tiny GCE VM with a few scripts — Since then I’ve had plenty of time to experience the awesomeness that is managing services with Kubernetes at work while developing Kubernetes’s testing infrastructure (which we run on GKE). So I decided, of course, that it was only natural to migrate my own service(s) to Kubernetes for maximum dog-fooding.

Rust in 2018 Matthias Endler

Matthias Endler2018-01-09 00:00:00

I wrote about the future of Rust before and it seems like nobody stops me from doing it again! Quite the contrary: this time the Rust core team even asked for it. I'm a bit late to the party, but here are my 2 cents about the priorities for Rust in 2018.


Functional Programming for Mathematical Computing Matthias Endler

Matthias Endler2018-01-02 00:00:00

Programming languages help us describe general solutions for problems; the result just happens to be executable by machines. Every programming language comes with a different set of strengths and weaknesses, one reason being that its syntax and semantics heavily influence the range of problems which can easily be tackled with it.

tl;dr: I think that functional programming is better suited for mathematical computations than the more common imperative approach.

Using built-in abstractions for Mathematics

The ideas behind a language (the underlying programming paradigms) are distinctive for the community that builds around it. The developers create a unique ecosystem of ready-to-use libraries and frameworks around the language core. As a consequence, some languages are stronger in areas such as business applications (one could think of Cobol), others work great for systems programming (like C or Rust).

When it comes to solving mathematical and numerical problems with computers, Fortran might come to mind. Although Fortran is a general-purpose language, it is mostly known for scientific computing. Of course, the language was created with that purpose in mind – hence the name, Formula Translation.

One reason for its popularity in this area is that it offers some built-in domain-specific keywords to express mathematical concepts, while keeping an eye on performance. For instance, it has a dedicated datatype for complex numbers – COMPLEX – and a keyword named DIMENSION which is quite similar to the mathematical term and can be used to create arrays and vectors.

Imperative vs functional style

Built-in keywords can help expand the expressiveness of a language into a specific problem space, but this approach is severely limited. It’s not feasible to extend the language core ad infinitum; it would just be harder to maintain and take longer to learn. Therefore, most languages provide other ways of abstraction – like functions, subroutines, classes and objects – to split a routine into smaller, more manageable parts. These mechanisms might help to control the complexity of a program, but especially when dealing with mathematical problems, one has to be careful not to obfuscate the solution with boilerplate code.

Specimen I - Factorial

As an example, the stated problem might be to translate the following formula, which calculates the factorial of a positive number n, into program code:

The mathematical definition of the factorial: n! = 1 * 2 * 3 * ... * n

An implementation of the above formula using imperative style Java might look like this:

public static long fact(final int n) {
    if (n < 0) {
        // Negative numbers not allowed
        return 0;
    }
    long prod = 1;
    for (int i = 1; i <= n; ++i) {
        prod *= i;
    }
    return prod;
}

This is quite a long solution for such a short problem definition. (Note that writing a version with an explicit loop from 1 to n was on purpose; a recursive function would be shorter, but uses a concept which was not introduced by the mathematical formula.)

Also, the program contains many language-specific keywords, such as public, static, and System.err.println(). On top of that, the programmer must explicitly provide all data types for the variables in use – a tiresome obligation.

All of this obfuscates the mathematical definition.

Compare this with the following version written in a functional language, like Haskell.

fact n = product [1..n]

This is an almost direct translation from the problem definition into code. It needs no explicit types, no temporary variables and no access modifiers (such as public).

Specimen II - Dot product

One could argue that the above Haskell program owes its brevity to the fact that the language provides just the right abstractions (namely the product keyword and the [1..n] range syntax) for that specific task. Therefore, let’s examine a simple function which is neither available in Haskell nor in Java: the dot product of two vectors. The mathematical definition is as follows:

The mathematical definition of a vector dot product: a · b = Σᵢ aᵢbᵢ = a₁b₁ + a₂b₂ + ··· + aₙbₙ = a bᵀ

For vectors with three dimensions, it can be written as

Vector dot product for three dimensions: a · b = a₁b₁ + a₂b₂ + a₃b₃

First, a Haskell implementation:

type Scalar a = a
data Vector a = Vector a a a deriving (Show)
dot :: (Num a) => Vector a -> Vector a -> Scalar a
(Vector a1 a2 a3) `dot` (Vector b1 b2 b3) = a1*b1 + a2*b2 + a3*b3

Note that the mathematical types can be defined in one line each. Further note that we define the dot function in infix notation, that is, we place the first argument of dot in front of the function name and the second argument behind it. This way, the code looks more like its mathematical equivalent. An example call of the above function would be

(Vector 1 2 3) `dot` (Vector 3 2 1)

which is short, precise and readable.

Now, a similar implementation in Java.

public static class Vector<T extends Number> {
    private T x, y, z;

    public Vector(T x, T y, T z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    public double dot(Vector<?> v) {
        return (x.doubleValue() * v.x.doubleValue() +
                y.doubleValue() * v.y.doubleValue() +
                z.doubleValue() * v.z.doubleValue());
    }

    public static void main(String[] args) {
        Vector<Integer> a = new Vector<Integer>(3, 2, 1);
        Vector<Integer> b = new Vector<Integer>(1, 2, 3);
        System.out.println(a.dot(b));
    }
}

For a proper textual representation of Vectors, the toString() Method would also need to be overwritten. In Haskell, one can simply derive from the Show typeclass as shown in the code.

Creating new abstractions

If functions and types are not sufficient to write straightforward programs, Haskell also offers simple constructs to create new operators and keywords which extend the language core itself. This makes domain-specific languages feasible and enables the developer to work more directly on the actual problem instead of working around peculiarities of the programming language itself (such as memory management or array iteration). Haskell embraces this concept; Java has no such functionality.

Conclusion

I'm not trying to bash Java or worship Haskell here. Both languages have their place. I merely picked Java, because lots of programmers can read it.

The comparison is more between a functional and an imperative approach for numerical and symbolical programming; and for that, I prefer a functional approach every day. It removes clutter and yields elegant solutions. It provides convenient methods to work on a high level of abstraction and to speak in mathematical terms. And still, these strengths are disregarded by many programmers.

Abraham H. Maslow’s observation in his 1966 book The Psychology of Science seems fitting:

“I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.”


Prow Posts on elder.dev

Posts on elder.dev2017-12-26 00:00:00 Prow - extended nautical metaphor. Go Gopher originally by Renee French, SVG version by Takuya Ueda, modified under the CC BY 3.0 license. Ship's wheel from Kubernetes logo by Tim Hockin. The Kubernetes project does a lot of testing, on the order of 10000 jobs per day covering everything from build and unit tests, to end-to-end testing on real clusters deployed from source all the way up to ~5000 node scalability and performance tests.

Rust for Rubyists Matthias Endler

Matthias Endler2017-12-17 00:00:00

Recently I came across a delightful article on idiomatic Ruby. I'm not a good Ruby developer by any means, but I realized that a lot of the patterns are also quite common in Rust. What follows is a side-by-side comparison of idiomatic code in both languages.

The Ruby code samples are from the original article.

Map and Higher-Order Functions

The first example is a pretty basic iteration over elements of a container using map.

user_ids = users.map { |user| user.id }

The map concept is also pretty standard in Rust. Compared to Ruby, we need to be a little more explicit here: If users is a vector of User objects, we first need to create an iterator from it:

let user_ids = users.iter().map(|user| user.id);

You might say that's quite verbose, but this additional abstraction allows us to express an important concept: will the iterator take ownership of the vector, or will it not?

  • With iter(), you get a "read-only view" into the vector. After the iteration, it will be unchanged.
  • With into_iter(), you take ownership over the vector. After the iteration, the vector will be gone. In Rust terminology, it will have moved.
  • Read some more about the difference between iter() and into_iter() here.
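Here's a small, self-contained sketch of the difference (my own example, assuming a User struct with an id field):

struct User {
    id: u64,
}

fn main() {
    let users = vec![User { id: 1 }, User { id: 2 }];

    // iter() only borrows the vector; `users` stays usable afterwards.
    let borrowed_ids: Vec<u64> = users.iter().map(|user| user.id).collect();
    println!("{:?} ({} users left)", borrowed_ids, users.len());

    // into_iter() takes ownership; `users` has moved and can't be used below.
    let owned_ids: Vec<u64> = users.into_iter().map(|user| user.id).collect();
    println!("{:?}", owned_ids);
}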

The above Ruby code can be simplified like this:

user_ids = users.map(&:id)

In Ruby, higher-order functions (like map) take blocks or procs as an argument and the language provides a convenient shortcut for method invocation — &:id is the same as {|o| o.id()}.

Something similar could be done in Rust:

let id = |u: &User| u.id;
let user_ids = users.iter().map(id);

This is probably not the most idiomatic way to do it, though. What you will see more often is the use of Universal Function Call Syntax in this case¹:

let user_ids = users.iter().map(User::id);

In Rust, higher-order functions take functions as an argument. Therefore users.iter().map(User::id) is more or less equivalent to users.iter().map(|u| u.id()).²

Also, map() in Rust returns another iterator and not a collection. If you want a collection, you would have to run collect() on that, as we'll see later.

Iteration with Each

Speaking of iteration, one pattern that I see a lot in Ruby code is this:

["Ruby", "Rust", "Python", "Cobol"].each do |lang|
  puts "Hello #{lang}!"
end

Since Rust 1.21, this is now also possible:

["Ruby", "Rust", "Python", "Cobol"]
    .iter()
    .for_each(|lang| println!("Hello {lang}!", lang = lang));

Although, more commonly one would write that as a normal for-loop in Rust:

for lang in ["Ruby", "Rust", "Python", "Cobol"].iter() {
    println!("Hello {lang}!", lang = lang);
}

Select and filter

Let's say you want to extract only even numbers from a collection in Ruby.

even_numbers = [1, 2, 3, 4, 5].map { |element| element if element.even? } # [nil, 2, nil, 4, nil]
even_numbers = even_numbers.compact # [2, 4]

In this example, before calling compact, our even_numbers array had nil entries. Well, in Rust there is no concept of nil or Null. You don't need a compact. Also, map doesn't take predicates. You would use filter for that:

let even_numbers = vec![1, 2, 3, 4, 5]
    .iter()
    .filter(|&element| element % 2 == 0);

or, to make a vector out of the result

// Result: [2, 4]
let even_numbers: Vec<i64> = vec![1, 2, 3, 4, 5]
    .into_iter()
    .filter(|element| element % 2 == 0).collect();

Some hints:

  • I'm using the type hint Vec<i64> here because, without it, Rust does not know what collection I want to build when calling collect.
  • vec! is a macro for creating a vector.
  • Instead of iter, I use into_iter. This way, I take ownership of the elements in the vector. With iter() I would get a Vec<&i64> instead.

In Rust, there is no even method on numbers, but that doesn't keep us from defining one!

let even = |x: &i64| x % 2 == 0;
let even_numbers = vec![1, 2, 3, 4, 5].into_iter().filter(even);

In a real-world scenario, you would probably use a third-party package (crate) like num for numerical mathematics:

extern crate num;
use num::Integer;

fn main() {
    let even_numbers: Vec<i64> = vec![1, 2, 3, 4, 5]
        .into_iter()
        .filter(|x| x.is_even()).collect();
}

In general, it's quite common to use crates in Rust for functionality that is not in the standard lib. Part of the reason why this is so well accepted is that cargo is such a rad package manager. (Maybe because it was built by no other than Yehuda Katz of Ruby fame. 😉)

As mentioned before, Rust does not have nil. However, there is still the concept of operations that can fail. The canonical type to express that is called Result.

Let's say you want to convert a vector of strings to integers.

let maybe_numbers = vec!["1", "2", "nah", "nope", "3"];
let numbers: Vec<_> = maybe_numbers
    .into_iter()
    .map(|i| i.parse::<u64>())
    .collect();

That looks nice, but maybe the output is a little unexpected. numbers will also contain the parsing errors:

[Ok(1), Ok(2), Err(ParseIntError { kind: InvalidDigit }), Err(ParseIntError { kind: InvalidDigit }), Ok(3)]

Sometimes you're just interested in the successful operations. An easy way to filter out the errors is to use filter_map:

let maybe_numbers = vec!["1", "2", "nah", "nope", "3"];
let numbers: Vec<_> = maybe_numbers
    .into_iter()
    .filter_map(|i| i.parse::<u64>().ok())
    .collect();

I changed two things here:

  • Instead of map, I'm now using filter_map.
  • parse returns a Result, but filter_map expects an Option. We can convert a Result into an Option by calling ok() on it³.

The return value contains all successfully converted strings:

[1, 2, 3]
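In case the ok() part still looks like magic: it simply turns Ok(value) into Some(value) and drops the error. A tiny standalone check (my own example, not from the article):

fn main() {
    let good: Result<u64, _> = "42".parse::<u64>();
    let bad: Result<u64, _> = "nope".parse::<u64>();

    assert_eq!(good.ok(), Some(42)); // Ok(v)  becomes Some(v)
    assert_eq!(bad.ok(), None);      // Err(e) becomes None; the error is dropped
}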

The filter_map is similar to the select method in Ruby:

[1, 2, 3, 4, 5].select { |element| element.even? }

Random numbers

Here's how to get a random number from an array in Ruby:

[1, 2, 3].sample

That's quite nice and idiomatic! Compare that to Rust:

let mut rng = thread_rng();
rng.choose(&[1, 2, 3, 4, 5])

For the code to work, you need the rand crate. Click on the snippet for a running example.

There are some differences to Ruby. Namely, we need to be more explicit about what random number generator we want exactly. We decide for a lazily-initialized thread-local random number generator, seeded by the system. In this case, I'm using a slice instead of a vector. The main difference is that the slice has a fixed size while the vector does not.

Within the standard library, Rust doesn't have a sample or choose method on the slice itself. That's a design decision: the core of the language is kept small to allow evolving the language in the future.

This doesn't mean that you cannot have a nicer implementation today. For instance, you could define a Choose trait and implement it for [T].

extern crate rand;
use rand::{thread_rng, Rng};

trait Choose<T> {
    fn choose(&self) -> Option<&T>;
}

impl<T> Choose<T> for [T] {
    fn choose(&self) -> Option<&T> {
        let mut rng = thread_rng();
        rng.choose(&self)
    }
}

This boilerplate could be put into a crate to make it reusable for others. With that, we arrive at a solution that rivals Ruby's elegance.

[1, 2, 4, 8, 16, 32].choose()

Implicit returns and expressions

Ruby methods automatically return the result of the last statement.

def get_user_ids(users)
  users.map(&:id)
end

Same for Rust. Note the missing semicolon.

fn get_user_ids(users: &[User]) -> Vec<u64> {
    users.iter().map(|user| user.id).collect()
}

But in Rust, this is just the beginning, because everything is an expression. The following block splits a string into characters, removes the h, and returns the result as a HashSet. This HashSet will be assigned to x.

let x: HashSet<_> = {
    // Get unique chars of a word {'h', 'e', 'l', 'o'}
    let unique = "hello".chars();
    // filter out the 'h'
    unique.filter(|&char| char != 'h').collect()
};

Same works for conditions:

let x = if 1 > 0 { "absolutely!" } else { "no seriously" };

Since a match statement is also an expression, you can assign the result to a variable, too!

enum Unit {
    Meter,
    Yard,
    Angstroem,
    Lightyear,
}

let length_in_meters = match unit {
    Unit::Meter => 1.0,
    Unit::Yard => 0.91,
    Unit::Angstroem => 0.0000000001,
    Unit::Lightyear => 9.461e+15,
};

Multiple Assignments

In Ruby you can assign multiple values to variables in one step:

def values
  [1, 2, 3]
end

one, two, three = values

In Rust, you can only decompose tuples into tuples, but not a vector into a tuple for example. So this will work:

let (one, two, three) = (1, 2, 3);

But this won't:

let (one, two, three) = [1, 2, 3];
//    ^^^^^^^^^^^^^^^^^ expected array of 3 elements, found tuple

Neither will this:

let (one, two, three) = [1, 2, 3].iter().collect();
// a collection of type `(_, _, _)` cannot be built from an iterator over elements of type `&{integer}`

But with nightly Rust, you can now do this:

let [one, two, three] = [1, 2, 3];

On the other hand, there's a lot more you can do with destructuring apart from multiple assignments. You can write beautiful, ergonomic code using pattern syntax.

let x = 4;
let y = false;

match x {
    4 | 5 | 6 if y => println!("yes"),
    _ => println!("no"),
}

To quote The Book:

This prints no since the if condition applies to the whole pattern 4 | 5 | 6, not only to the last value 6.

String interpolation

Ruby has extensive string interpolation support.

programming_language = "Ruby"
"#{programming_language} is a beautiful programming language"

This can be translated like so:

let programming_language = "Rust";
format!("{} is also a beautiful programming language", programming_language);

Named arguments are also possible, albeit much less common:

println!("{language} is also a beautiful programming language", language="Rust");

Rust's println!() syntax is even more extensive than Ruby's. Check the docs if you're curious about what else you can do.

That’s it!

Ruby comes with syntactic sugar for many common usage patterns, which allows for very elegant code. Low-level programming and raw performance are no primary goals of the language.

If you do need that, Rust might be a good fit, because it provides fine-grained hardware control with comparable ergonomics. If in doubt, Rust favors explicitness, though; it eschews magic.

Did I whet your appetite for idiomatic Rust? Have a look at this Github project. I'd be thankful for contributions.

Footnotes

1. Thanks to Florian Gilcher for the hint.
2. Thanks to masklin for pointing out multiple inaccuracies.
3. In the first version, I said that ok() would convert a Result into a boolean, which was wrong. Thanks to isaacg for the correction.


Making Myself Obsolete Matthias Endler

Matthias Endler2017-12-10 00:00:00
The Stegosaurus had better days 150 million years ago.
Source: Paleontologists once thought it had a brain in its butt.

In December 2015 I was looking for static analysis tools to integrate into trivago's CI process. The idea was to detect typical programming mistakes automatically. That's quite a common thing, and there are lots of helpful tools out there which fit the bill.

So I looked for a list of tools...

To my surprise, the only list I found was on Wikipedia — and it was outdated. There was no such project on Github, where most modern static analysis tools were hosted.

Without overthinking it, I opened up my editor and wrote down a few tools I found through my initial research. After that, I pushed the list to Github.

I called the project Awesome Static Analysis.

Fast forward two years and the list has grown quite a bit. So far, it has 75 contributors, 277 forks and received over 2000 stars. (Thanks for all the support!) (Update May 2018: 91 contributors, 363 forks, over 3000 stars)

Around 1000 unique visitors find the list every week. Not much by any means, but I feel obliged to keep it up-to-date because it has become an essential source of information for many people.

It now lists around 300 tools for static analysis. Everything from Ada to TypeScript is on there. What I find particularly motivating is that the authors themselves now create pull requests to add their tools!

There was one problem though: The list of pull requests got longer and longer, as I was busy doing other things.

The list of Github Pull requests for awesome-static-analysis

Adding contributors

I always try to make team members out of regular contributors. My friend and colleague Andy Grunwald as well as Ouroboros Chrysopoeia are both valuable collaborators. They help me weed out new PRs whenever they find the time.

But let's face it: checking the pull requests is a dull, manual task. What needs to be checked for each new tool can be summarized like this:

  • Formatting rules are satisfied
  • Project URL is reachable
  • License annotation is correct
  • Tools of each section are alphabetically ordered
  • Description is not too long

I guess it's obvious what we should do with that checklist: automate it!

A linter for linting linters

So why not write an analysis tool which checks our list of analysis tools! What sounds pretty meta is actually pretty straightforward.

With every pull request, we trigger our bot, which checks the above rules and responds with a result.
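To give a flavor of how simple these checks are, here is a sketch of the alphabetical-order rule. This is illustrative only and not the bot's actual code; the function name and the sample section are made up.

fn is_alphabetical(tools: &[&str]) -> bool {
    // Every neighboring pair must already be in (case-insensitive) order.
    tools
        .windows(2)
        .all(|pair| pair[0].to_lowercase() <= pair[1].to_lowercase())
}

fn main() {
    let section = ["clang-tidy", "cppcheck", "cpplint"];
    assert!(is_alphabetical(&section));
    assert!(!is_alphabetical(&["oclint", "infer"]));
}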

The first step was to read the Github documentation about building a CI server.

Just for fun, I wanted to create the bot in Rust. The two most popular Github clients for Rust were github-rs (now deprecated) and hubcaps. Both looked pretty neat, but then I found afterparty, a "Github webhook server".

The example looked fabulous:

#[macro_use]
extern crate log;
extern crate env_logger;
extern crate afterparty;
extern crate hyper;

use afterparty::{Delivery, Event, Hub};

use hyper::Server;

pub fn main() {
    env_logger::init().unwrap();
    let addr = format!("0.0.0.0:{}", 4567);
    let mut hub = Hub::new();
    hub.handle("pull_request", |delivery: &Delivery| {
        match delivery.payload {
            Event::PullRequest { ref action, ref sender, .. } => {
                // TODO: My code here!
                println!("sender {} action {}", sender.login, action)
            }
            _ => (),
        }
    });
    let srvc = Server::http(&addr[..])
                   .unwrap()
                   .handle(hub);
    println!("listening on {}", addr);
    srvc.unwrap();
}

This allowed me to focus on the actual analysis code, which makes for a pretty boring read. It mechanically checks for the things mentioned above and could be written in any language. If you want to have a look (or even contribute!), check out the repo.

Talking to Github

After the analysis code was done, I had a bot, running locally, waiting for incoming pull requests.

But how could I talk to Github?
I found out that I should use the Status API and send a POST request to /repos/mre/awesome-static-analysis/statuses/:sha
(:sha is the commit ID that points to the HEAD of the pull request):

{
  "state": "success",
  "description": "The build succeeded!"
}

I could have used one of the existing Rust Github clients, but I decided to write a simple function to update the pull request status code.

fn set_status(status: Status, desc: String, repo: &str, sha: &str) -> Result<reqwest::Response> {
    let token = env::var("GITHUB_TOKEN")?;
    let client = reqwest::Client::new();
    let mut params = HashMap::new();
    params.insert("state", format!("{}", status));
    params.insert("description", desc);
    println!("Sending status: {:#?}", params);

    let status_url = format!("https://api.github.com/repos/{}/statuses/{}", repo, sha);
    println!("Status url: {}", status_url);
    Ok(client
        .request(
            reqwest::Method::Post,
            &format!(
                "{}?access_token={}",
                status_url,
                token,
            ),
        )
        .json(&params)
        .send()?)
}

You can see that I pass in a Github token from the environment and then I send the JSON payload as a post request using the reqwest library.

That turned out to become a problem in the end: while afterparty was using version 0.9 of hyper, reqwest was using 0.11. Unfortunately, these two versions depend on different builds of the openssl-sys bindings. That's a well-known problem, and the only way to fix it is to resolve the conflict.

I was stuck for a while, but then I saw that there was an open pull request to upgrade afterparty to hyper 0.10.

So inside my Cargo.toml, I pinned afterparty to the fork from that pull request:

[dependencies]
afterparty = { git = "https://github.com/ms705/afterparty" }

This fixed the build, and I could finally move on.

Deployment

I needed a place to host the bot.

Preferably for free, as it was a non-profit Open Source project. Also, the provider would have to be able to run binaries.

For quite some time, I was following a product named zeit. It runs any Docker container using an intuitive command line interface called now.

I fell in love the first time I saw their demo on the site, so I wanted to give it a try.

So I added a multi-stage Dockerfile to my project:

FROM rust as builder
COPY . /usr/src/app
WORKDIR /usr/src/app
RUN cargo build --release

FROM debian:stretch
RUN apt update \
    && apt install -y libssl1.1 ca-certificates \
    && apt clean -y \
    && apt autoclean -y \
    && apt autoremove -y
COPY --from=builder /usr/src/app/target/release/check .
EXPOSE 4567
ENTRYPOINT ["./check"]
CMD ["--help"]

The first stage builds the release binary; the second stage runs it at container startup. Well, that didn't work, because zeit did not support multi-stage builds yet.

The workaround was to split up the Dockerfile into two and connect them both with a Makefile. Makefiles are pretty powerful, you know?

With that, I had all the parts for deployment together.

# Build Rust binary for Linux
docker run --rm -v $(CURDIR):/usr/src/ci -w /usr/src/ci rust cargo build --release

# Deploy Docker images built from the local Dockerfile
now deploy --force --public -e GITHUB_TOKEN=${GITHUB_TOKEN}

# Set domain name of new build to `check.now.sh`
# (The deployment URL was copied to the clipboard and is retrieved with pbpaste on macOS)
now alias `pbpaste` check.now.sh

Here's the output of the deploy using now:

> Deploying ~/Code/private/awesome-static-analysis-ci/deploy
> Ready! https://deploy-sjbiykfvtx.now.sh (copied to clipboard) [2s]
> Initializing…
> Initializing…
> Building
> ▲ docker build
Sending build context to Docker daemon 2.048 kB
> Step 1 : FROM mre0/ci:latest
> latest: Pulling from mre0/ci
> ...
> Digest: sha256:5ad07c12184755b84ca1b587e91b97c30f7d547e76628645a2c23dc1d9d3fd4b
> Status: Downloaded newer image for mre0/ci:latest
>  ---> 8ee1b20de28b
> Successfully built 8ee1b20de28b
> ▲ Storing image
> ▲ Deploying image
> ▲ Container started
> listening on 0.0.0.0:4567
> Deployment complete!

The last step was to add check.now.sh as a webhook inside the awesome-static-analysis project settings.

Now, whenever a new pull request comes in, you can see that little bot getting active!

A successful pull request, which was checked by the bot

Outcome and future plans

I am very pleased with my choice of tools: afterparty saved me from a lot of manual work, while zeit made deployment really easy.
It feels like AWS Lambda on steroids.

If you look at the code and the commits for my bot, you can see all my little missteps until I got everything just right. Turns out, parsing human-readable text is tedious.
Therefore I was thinking about turning the list of analysis tools into a structured format like YAML. This would greatly simplify the parsing and have the added benefit of a machine-readable list of tools that could be used by other projects.
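To sketch what that could look like (the field names here are made up, and I'm assuming serde and serde_yaml for parsing), a single entry might simply deserialize into a plain struct:

use serde::Deserialize;

// Hypothetical shape of one entry in a YAML-based tools list.
#[derive(Debug, Deserialize)]
struct Tool {
    name: String,
    url: String,
    license: String,
    description: String,
}

fn main() -> Result<(), serde_yaml::Error> {
    let entry = "
name: clippy
url: https://github.com/rust-lang/rust-clippy
license: MIT/Apache-2.0
description: A collection of lints to catch common mistakes.
";
    let tool: Tool = serde_yaml::from_str(entry)?;
    println!("{} ({}): {}", tool.name, tool.license, tool.description);
    Ok(())
}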

Update May 2018

While attending the WeAreDevelopers conference in Vienna (highly recommended), I moved the CI pipeline from zeit.co to Travis CI. The reason was that I wanted the linting code next to the project, which greatly simplified things. First and foremost, I don't need the web request handling code anymore, because Travis takes care of that. If you like, you can compare the old and the new version.


Modern Day Annoyances - Digital Clocks Matthias Endler

Matthias Endler2017-11-07 00:00:00

This morning I woke up to the beeping noise of our oven's alarm clock. The reason was that I tried to correct the oven's local time the day before — and I pushed the wrong buttons. As a result, I didn't set the correct time; instead, I set a cooking timer... and that's what woke me up today.


Learn Some Rust During Hacktoberfest Matthias Endler

Matthias Endler2017-10-15 00:00:00
Dirndl, Lederhose, Brezn, Beer, Rust
Source: Designed by Freepik

October is the perfect time to contribute to Open Source — at least according to Github and DigitalOcean, because that's when they organize Hacktoberfest, a global event where you get a free shirt and lots of street cred for creating pull requests. Read the official announcement here.

Some people think they cannot contribute anything of value. Either because they lack the programming skills or because they don't know where to start.

This guide is trying to change that!

Let me show you how everybody can contribute code to Rust, a safe systems programming language. I was inspired to write this by a tweet from llogiq.

1. Find a great Rust project to work on

We all want our work to be appreciated.
Therefore I suggest starting with medium-sized projects: they have gained some momentum but are still driven by a small number of maintainers, so help is always welcome. By contrast, tiny projects are mostly useful to the original author only, while large projects can be intimidating at first and have stricter guidelines.

For now, let's look at repositories with 5-100 stars, which were updated within this year. Github supports advanced search options based on Lucene syntax.

language:Rust stars:5..100 pushed:>2017-01-01

Here's a list of projects that match this filter.

2. Install the Rust toolchain

To start contributing, we need a working Rust compiler and the cargo package manager. Fortunately, the installation should be straightforward. I recommend rustup for that.

Run the following command in your terminal, then follow the onscreen instructions.

curl https://sh.rustup.rs -sSf | sh

If you're unsure, just accept the defaults. After the installation is done, we also need to get the nightly version of the compiler for later.

rustup install nightly

Questions so far? Find more detailed installation instructions here.

3. Fork the project and clone it to your computer

First, click on the little fork button on the top right of the Github project page. Then clone your fork to your computer.

git clone git@github.com:yourusername/project.git

For more detailed instructions, go here.

4. Does it build?

Before we start modifying the codebase, we should make sure that it is in a workable state. The following commands should work right away from inside the project folder.

cargo build
cargo test

If not, you might want to consult the README for further instructions. (But feel free to choose another project.)

5. The magic sauce

Here's the trick: we use a linter called clippy to show us improvement areas in any Rust codebase.

To get clippy, install it like so:

cargo +nightly install clippy

Afterwards, run it from the project root as often as you like.

rustup run nightly cargo clippy

This should give you actionable information on how to improve the codebase.

Here's some sample output:

warning: useless use of `format!`
   --> src/mach/header.rs:420:49
    |
420 |             let error = error::Error::Malformed(format!("bytes size is smaller than an Mach-o header"));
    |                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: #[warn(useless_format)] on by default
    = help: for further information visit https://rust-lang-nursery.github.io/rust-clippy/v0.0.165/index.html#useless_format

warning: this expression borrows a reference that is immediately dereferenced by the compiler
   --> src/mach/header.rs:423:36
    |
423 |             let magic = mach::peek(&bytes, 0)?;
    |                                    ^^^^^^ help: change this to: `bytes`
    |
    = help: for further information visit https://rust-lang-nursery.github.io/rust-clippy/v0.0.165/index.html#needless_borrow

Just try some of the suggestions and see if the project still compiles and the tests still pass. Check out the links to the documentation in the help section to learn more. Start small to make your changes easier to review.

6. Creating a Pull Request

If you're happy with your changes, now is the time to publish them! It's best to create a new branch for your changes and then push it to your fork.

git checkout -b codestyle
git commit -am "Minor codestyle fixes"
git push --set-upstream origin codestyle

Afterwards, go to the homepage of your fork on Github. There should be a button titled Compare & pull request. Please add a meaningful description and then submit the pull request.

Congratulations! You've contributed to the Rust ecosystem. Thank you! 🎉

Trophy case

Bonus!

If all of the manual fixing and checking sounds too dull, you can automate step number 5 using rustfix by Pascal Hertleif (@killercup):

rustfix --yolo && cargo check

A Little Story About the `yes` Unix Command Matthias Endler

Matthias Endler2017-10-10 00:00:00

What's the simplest Unix command you know?
There's echo, which prints a string to stdout, and true, which always terminates with an exit code of 0.

Among the series of simple Unix commands, there's also yes. If you execute it without arguments, you get an infinite stream of y's, separated by a newline:

y
y
y
y
(...you get the idea)

What seems pointless at first turns out to be pretty helpful:

yes | sh boring_installation.sh

Ever installed a program that required you to type "y" and hit enter to keep going? yes to the rescue! It will carefully fulfill its duty, so you can keep watching Pootie Tang.

Writing yes

Here's a basic version in... uhm... BASIC.

10 PRINT "y"
20 GOTO 10

And here's the same thing in Python:

while True:
    print("y")

Simple, eh? Not so quick!
Turns out, that program is quite slow.

python yes.py | pv -r > /dev/null
[4.17MiB/s]

Compare that with the built-in version on my Mac:

yes | pv -r > /dev/null
[34.2MiB/s]

So I tried to write a quicker version in Rust. Here's my first attempt:

use std::env;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  loop {
    println!("{}", expletive);
  }
}

Some explanations:

  • The string we want to print in a loop is the first command line parameter and is named expletive. I learned this word from the yes manpage.
  • I use unwrap_or to get the expletive from the parameters. In case the parameter is not set, we use "y" as a default.
  • The default parameter gets converted from a string slice (&str) into an owned string on the heap (String) using into().

Let's test it.

cargo run --release | pv -r > /dev/null
   Compiling yes v0.1.0
    Finished release [optimized] target(s) in 1.0 secs
     Running `target/release/yes`
[2.35MiB/s]

Whoops, that doesn't look any better. It's even slower than the Python version! That caught my attention, so I looked around for the source code of a C implementation.

Here's the very first version of the program, released with Version 7 Unix and famously authored by Ken Thompson on Jan 10, 1979:

main(argc, argv)
char **argv;
{
  for (;;)
    printf("%s\n", argc>1? argv[1]: "y");
}

No magic here.

Compare that to the 128-line version from GNU coreutils, which is mirrored on Github. After 25 years, it is still under active development! The last code change happened around a year ago. And it's quite fast:

# brew install coreutils
gyes | pv -r > /dev/null
[854MiB/s]

The important part is at the end:

/* Repeatedly output the buffer until there is a write error; then fail.  */
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
  continue;

Aha! So they simply use a buffer to make write operations faster. The buffer size is defined by a constant named BUFSIZ, which gets chosen on each system so as to make I/O efficient (see here). On my system, that was defined as 1024 bytes. I actually had better performance with 8192 bytes.

I've extended my Rust program:

use std::env;
use std::io::{self, BufWriter, Write};

const BUFSIZE: usize = 8192;

fn main() {
    let expletive = env::args().nth(1).unwrap_or("y".into());
    let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
    loop {
        writeln!(writer, "{}", expletive).unwrap();
    }
}

The important part is that the buffer size is a multiple of four, to ensure memory alignment.

Running that gave me 51.3MiB/s. Faster than the version that comes with my system, but still way slower than the results from this Reddit post I found, where the author talks about 10.2GiB/s.

Update

Once again, the Rust community did not disappoint.
As soon as this post hit the Rust subreddit, user nwydo pointed out a previous discussion on the same topic. Here's their optimized code, which breaks the 3GB/s mark on my machine:

use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;

use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;

pub fn to_bytes(os_str: OsString) -> Vec<u8> {
  use std::os::unix::ffi::OsStringExt;
  os_str.into_vec()
}

fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
  if output.len() > buffer.len() / 2 {
    return output;
  }

  let mut buffer_size = output.len();
  buffer[..buffer_size].clone_from_slice(output);

  while buffer_size < buffer.len() / 2 {
    let (left, right) = buffer.split_at_mut(buffer_size);
    right[..buffer_size].clone_from_slice(left);
    buffer_size *= 2;
  }

  &buffer[..buffer_size]
}

fn write(output: &[u8]) {
  let stdout = io::stdout();
  let mut locked = stdout.lock();
  let mut buffer = [0u8; BUFFER_CAPACITY];

  let filled = fill_up_buffer(&mut buffer, output);
  while locked.write_all(filled).is_ok() {}
}

fn main() {
  write(&env::args_os().nth(1).map(to_bytes).map_or(
    Cow::Borrowed(
      &b"y\n"[..],
    ),
    |mut arg| {
      arg.push(b'\n');
      Cow::Owned(arg)
    },
  ));
  process::exit(1);
}

Now that's a whole different ballgame!

The only thing that I could contribute was removing an unnecessary mut. 😅

Lessons learned

The trivial program yes turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun and makes me appreciate the nifty tricks that make our computers fast.


Lightning Fast Image Previews with Pure CSS and LQIP Matthias Endler

Matthias Endler2017-09-18 00:00:00
Source: Adapted from Freepik

My website is reasonably fast.

I hope that every page load feels snappy, no matter your device or location. That should not come as a surprise. After all, I'm just using plain HTML and CSS. JavaScript is avoided whenever possible.

There was one thing left, which really annoyed me: layout reflow after images got loaded.

The problem is that the image dimensions are not known when the text is ready to be displayed. As a result, the text gets pushed down the screen as soon as an image above it loads.

Also, while an image is loading, there is no preview, just blank space. Here's what that looks like on a slower connection:

Illustration of a flash of unstyled content

I could fix that by hardcoding the image width and height, but that would be tedious and error-prone. And there would still be no preview. So I was wondering what others were doing. 🤔

Tiny image thumbnails

I vaguely remembered that Facebook uses tiny preview thumbnails in their mobile app. They extract the quantization table from the JPEG header to render the preview. This information is stored on the client, so it doesn't need to be downloaded every time. Unfortunately, this approach requires full control over the image encoder. It works for apps, but hardly for websites.

The search continued.

Until my colleague Tobias Baldauf introduced me to LQIP (Low-Quality Image Placeholders).

Here's the idea:

  • Load the page including inlined, low-quality image thumbnails.
  • Once the page is fully loaded (e.g. when the onload event is fired), lazy load full quality images.

Unfortunately, this technique requires JavaScript. Nevertheless, I liked the idea, so I started experimenting with different image sizes and formats. My goal was to create the smallest thumbnails using any standard image format.

Benchmark

Here are 15-pixel-wide thumbnails encoded in different file formats:

Comparison of different image formats when creating thumbnails

I used different tools to create the thumbnails. For JPEG and PNG encoding, I used svgexport.

svgexport img.svg img.png "svg{background:white;}" 15: 1%

For webp, I used cwebp:

cwebp img.png -o img.webp

The gif was converted using an online tool and optimized using gifsicle:

gifsicle -O3 < img.gif > img_mini.gif

Comparison

WebP is the smallest, but it's not supported by all browsers.
Gif was second, but when resizing the image and applying the blur filter, I was not happy with the result.
In the end, I settled for PNG, which provided an excellent tradeoff between size and quality. I optimized the images even further using oxipng, which supports zopfli compression. With that, I end up with thumbnails of around 300-400 bytes in size.

I integrated the thumbnail creation process into my build toolchain for the blog. The actual code to create the images is rather boring. If you really want to have a look, it's on Github.

Avoiding JavaScript

Here is the skeleton HTML for the image previews:

<figure>
  <div class="loader">
    <object data="image.svg" type="image/svg+xml"></object>
    <img class="frozen" src="data:image/png;base64,..." />
  </div>
</figure>

The trick is to wrap both the full-size image and the preview image into a loader div, which gets a width: auto CSS property:

.loader {
  position: relative;
  overflow: hidden;
  width: auto;
}

I wrap the SVG in an object tag instead of using an img element. This has the benefit that I can show a placeholder in case the SVG can't be loaded. I position the object at the top left of the loader div.

.loader object {
  position: absolute;
}

.loader img,
.loader object {
  display: block;
  top: 0;
  left: 0;
  width: 100%;
}

Here's the placeholder hack including some references:

/* https://stackoverflow.com/a/29111371/270334 */
/* https://stackoverflow.com/a/32928240/270334 */
object {
  position: relative;
  float: left;
  display: block;

  &::after {
    position: absolute;
    top: 0;
    left: 0;
    display: block;
    width: 1000px;
    height: 1000px;
    content: "";
    background: #efefef;
  }
}

The last part is the handling of the thumbnails. Like most other sites, I decided to apply a blur filter. In a way, it looks like the image is frozen, so that's what I called the CSS selector. I also applied a scaling transformation to achieve sharp borders.

.frozen {
  -webkit-filter: blur(8px);
  -moz-filter: blur(8px);
  -o-filter: blur(8px);
  -ms-filter: blur(8px);
  filter: blur(8px);
  transform: scale(1.04);
  animation: 0.2s ease-in 0.4s 1 forwards fade;
  width: 100%;
}

@keyframes fade {
  0% {
    opacity: 1;
  }
  100% {
    opacity: 0;
  }
}

I use CSS animations instead of JavaScript.
The duration of the animation is based on the 95th percentile load time of all visitors of the page. Although it's just an approximation, this should work for most readers.

Result

  • No JavaScript needed
  • Works on all modern browsers
  • Supports a fallback in case the main image can't be loaded
  • Tiny overhead

Resources


Go vs Rust? Choose Go. Matthias Endler

Matthias Endler2017-09-15 00:00:00

I wrote this article a long time ago. In the meantime, my opinion on some aspects has changed.

In order to give a more balanced perspective on the pros and cons, I suggest to read this comparison on Go vs Rust instead, which I wrote in collaboration with Shuttle 🚀

Rust vs Go: A Hands-On Comparison

Source: Gopher designed with Gopherize.me. Gears designed by Freepik

"Rust or Go, which one should I choose?" is a question I get quite often. Both languages seem to be competing for the same user base and they both seem to be systems programming languages, so there must be a clear winner, right?

Go: practical, pragmatic, plain

The Golang learning curve over time, a straight line.

I don't think Go is an elegant language. Its biggest feature is simplicity. Go is not even a systems programming language. While it's great for writing microservices and tooling around backend infrastructure, I would not want to write a kernel or a memory allocator with it.

But with Go, you get things done — fast.
Go is one of the most productive languages I've ever worked with. The mantra is: solve real problems today.

Rust's strong guarantees come at a cost

The Rust learning curve over time, a bumpy ride.

Rust in comparison is hard. It took me many months to become somewhat productive. You need to invest a serious amount of time to see any benefit. Rust is already a powerful language and it gets stronger every day. It feels much more like a pragmatic Haskell to me than a safer C.

Don't get me wrong: I love Rust, and it helped me become a better programmer. It is certainly a nice language to learn. The big question is whether it is the right choice for your next major project.

Here's the thing: if you choose Rust, you usually need the guarantees that the language provides:

  • Safety against Null pointers, race conditions and all sorts of low-level threats.
  • Predictable runtime behavior (zero cost abstractions and no garbage collector).
  • (Almost) total control over the hardware (memory layout, processor features).
  • Seamless interoperability with other languages.

If you don't require any of these features, Rust might be a poor choice for your next project. That's because these guarantees come with a cost: ramp-up time. You'll need to unlearn bad habits and learn new concepts. Chances are, you will fight with the borrow checker a lot when you start out.
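To give you an idea of what that fight can look like, here is a tiny, self-contained example of mine (not from any particular project). The commented-out lines show a pattern the borrow checker rejects; the lines below them compile fine:

fn main() {
    let mut names = vec!["alice".to_string(), "bob".to_string()];

    // A classic beginner stumbling block: keeping an immutable borrow alive
    // while mutating the vector is rejected by the compiler:
    //
    //     let first = &names[0];
    //     names.push("carol".to_string()); // error[E0502]: cannot borrow `names` as mutable
    //     println!("{}", first);

    // Restructuring the code so that the borrow ends before the mutation works.
    let first = names[0].clone();
    names.push("carol".to_string());
    println!("{} and {} others", first, names.len() - 1);
}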

Case-study: Primality by trial division

Let's say you want to check if a number is prime. The easiest way is to check if we can divide the number by any smaller natural number greater than 1 (without a remainder). If we can't, we've found a prime number! This approach is called trial division.

Here's how to do that in Golang (courtesy of Rosetta Code):

func IsPrime(n int) bool {
	if n < 0 {
		n = -n
	}
	switch {
	case n < 2:
		return false
	default:
		for i := 2; i < n; i++ {
			if n%i == 0 {
				return false
			}
		}
	}
	return true
}

And here's the same thing in Rust:

pub fn is_prime(n: u64) -> bool {
    match n {
        0...1 => false,
        _ => {
            for d in 2..n {
                if n % d == 0 {
                    return false;
                }
            }
            true
        }
    }
}

At first sight, both solutions look pretty similar. But if we look closer, we can spot some differences.

  • In Go, we use a simple switch-case statement. In Rust, we use a match statement, which is much more powerful.
  • In Go, we use a simple for-loop to iterate over the numbers 2 to n. In Rust, we use a range expression (2..n).
  • In Go, we use two return statements; in Rust, we have one return expression. In general, most things in Rust are expressions that can be returned and assigned to a variable (see the short example below). Read more about expressions here.

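To make the "everything is an expression" point a bit more tangible, here is a tiny standalone snippet (my own, not taken from Rosetta Code):

fn main() {
    let n = 5;

    // `if` is an expression, so its result can be bound directly to a variable.
    let parity = if n % 2 == 0 { "even" } else { "odd" };

    // The same goes for `match`.
    let size = match n {
        0..=9 => "small",
        _ => "large",
    };

    println!("{} is {} and {}", n, parity, size);
}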
In many areas, Rust is more functional than Golang. You could rewrite the is_prime function above using the any method, which is implemented for Range.

fn is_prime(n: u64) -> bool {
    match n {
        0...1 => false,
        _ => !(2..n).any(|d| n % d == 0),
    }
}

It might seem a little alien at first, but it will become second nature after a while.

This was just a quick example, of course. I suggest you browse some code on Rosetta Code to get a better feeling for both languages.

Case study: Finding duplicate words in text files

If you're more of a visual type, here is a video where I write a simple concurrent program in Go and Rust to compare both languages:

Some things I prefer in Go

  • Fast compile times
  • Pragmatic problem-solving approach
  • Nice ecosystem for typical DevOps tasks
  • Batteries-included standard-library
  • IDE support
  • Simple error handling
  • The mascot 😉

Some things I prefer in Rust

  • Safety: No null pointers, no data races,...
  • Fine-grained system control
  • Incredible runtime speed (comparable with C/C++)
  • Zero-cost abstractions
  • Awesome, open-minded community
  • Simple package management with cargo
  • Support for Generics in form of traits
  • C interop and FFI

Conclusion

99% of the time, Go is "good enough", and in the 1% of cases where it isn't, you'll know. Then take a look at Rust, because the two languages complement each other pretty well. If you're interested in hands-on Rust consulting, pick a date from my calendar and we can talk about how I can help.

After all is said and done, Rust and Go are not really competitors.


Afraid of Makefiles? Don't be! Matthias Endler

Matthias Endler2017-08-15 00:00:00
What do clothes have to do with Makefiles? Find out in this post!
Source: Illustration by Anindyanfitri - Freepik.com

In the last few years, I've had the pleasure to work with a lot of talented Software Engineers. One thing that struck me is that many of them did not have any working knowledge of Makefiles and why they are useful.

When faced with the task of automating a build process, they often roll their own shell scripts. Common culprits are called build.sh, run.sh or doall.sh in a project folder.

They implement the same basic functionality over and over again:

  • Parsing input parameters and environment variables.
  • Manually managing dependencies between build steps.
  • Error handling (...maybe).

Along the way, they keep making the same basic mistakes, and these are exactly the issues Makefiles were invented to solve.

Makefiles are scary!

If you think that make is scary, you probably think of complicated build machinery for big software projects. It doesn't need to be that way. Let's hear from the author of make, Stuart Feldman himself:

It began with an elaborate idea of a dependency analyzer, boiled down to something much simpler, and turned into Make that weekend. Use of tools that were still wet was part of the culture. Makefiles were text files, not magically encoded binaries because that was the Unix ethos: printable, debuggable, understandable stuff.

The Art of Unix Programming (2003)

Make was built in one weekend to solve a recurring problem in a simple way.

Makefiles are simple!

Before I leave the house, I need to get dressed. I use the same simple routine every time: Underpants, trousers, shirt, pullover, socks, shoes, jacket. Most likely you also have a routine, even though yours might be different.

Some of these steps depend on each other.
Make is useful for handling dependencies.
Let's try to express my routine as a Makefile.

dress: trousers shoes jacket
	@echo "All done. Let's go outside!"

jacket: pullover
	@echo "Putting on jacket."

pullover: shirt
	@echo "Putting on pullover."

shirt:
	@echo "Putting on shirt."

trousers: underpants
	@echo "Putting on trousers."

underpants:
	@echo "Putting on underpants."

shoes: socks
	@echo "Putting on shoes."

socks: pullover
	@echo "Putting on socks."

If we execute the Makefile, we get the following output:

$ make dress
Putting on underpants.
Putting on trousers.
Putting on shirt.
Putting on pullover.
Putting on socks.
Putting on shoes.
Putting on jacket.
All done. Let's go outside!

What just happened?

Noticed how the steps are in the correct order? By plainly writing down the dependencies between the steps, make helps us to execute them correctly.

Each build step has the following structure:

target: [dependencies]
	<shell command to execute>
	<shell command to execute>
	...
  • The first target in a Makefile will be executed by default when we call make.

  • The order of the targets does not matter.

  • Shell commands must be indented with a tab.

  • Add an @ sign to suppress echoing of the command that is executed.

  • If the target isn't a file you want to build, please add a .PHONY declaration (.PHONY: <target>) for it. Common phony targets are: clean, install, run,... Otherwise, if somebody creates a file or directory named install, make will silently skip the target, because it thinks it is already up to date.

    .PHONY: install
    install:
    	npm install
    

Congratulations! You've learned 90% of what you need to know about make.

Next steps

Real Makefiles can do much more! They will only build the files that have changed instead of doing a full rebuild. And they will do as much as possible in parallel. Just try to keep them simple please.


Of Boxes and Trees - Smart Pointers in Rust Matthias Endler

Matthias Endler2017-08-12 00:00:00

Recently, I tried to implement a binary tree data structure in Rust. Each binary tree has a root value, a left, and a right subtree. I started from this Python implementation, which is quite straightforward.


Why Type Systems Matter Matthias Endler

Matthias Endler2017-07-10 00:00:00

I've written most of my code in dynamically typed languages such as Python or PHP. But ever since dabbling with Rust, I've developed a passion for static type systems.
It began to feel very natural to me; like a totally new way to express myself.


Automata Posts on elder.dev

Posts on elder.dev2017-06-03 00:00:00 I am fascinated by automation, both mechanical and software. A particularly interesting form of automation is Automata. While the earliest usage of the term referred to mechanical devices, in Computer Science ‘automata’ and automata theory include abstract machines instead of physical devices. Where a physical automaton might be constructed of complex gears and clockwork, an abstract automata is constructed of state and rules that define how state is updated in each iteration or ‘generation’ of the automata.

Being a Professional Programmer Matthias Endler

Matthias Endler2017-05-18 00:00:00

When I was around 12, I set myself the goal of becoming a professional programmer.
I can tell, because at that time I made the conscious decision to use my right hand to control the mouse — even though I'm left-handed.

My reasoning was that if I ever had to help out a colleague with a computer problem, I sure did not want to move her mouse to the other side before getting started. That would be awkward. (Of course, I did not foresee the advent of the wireless mouse... As a matter of fact, I still use my right hand out of habit.)

One thing I always wanted to know was what a typical workday of a programmer looks like. Was I wasting my time by pursuing this career? Only later did I find the answer — but I had to become a professional programmer myself first. This article aims to save you a few years of uncertainty.

Before you dig into this, be sure to read the first part of this series titled "Why I love Programming".

What's the difference between "professional" and "hobby" programming?

In one word: accountability.
You are expected to be responsible.

Programming in your free time is like throwing a party without having to clean up: pure fun! If you get bored you're free to move on. Not so in professional programming, where you're expected to get the job done.

Every application requires constant bug fixing, refactoring and sometimes even monkey patching. Maintaining code is no amusement park; especially if it's not your own.

Being a Junior Developer

Fresh out of school you might think you're a pretty kick-ass programmer. Let me tell you: you're not. You wouldn't guess what talented people can do with these blinking machines. You'll have tons of things to learn in the first few years.

Professional software development is a lengthy process. Writing readable, well-tested, well-documented code is a substantial effort. You will need patience, lots of it, both with yourself and with others.

As a junior, you only think in black and white. You look at some code, and it's all wrong. Who in their right mind created this horrible monstrosity?! As you become more experienced, you'll see the shades of grey.

Eventually, you'll understand that those neckbeards were not slower than you, but more careful. You learn how to test your code, how to document it. You even begin to appreciate UML diagrams.

Becoming obsolete

"The world is moving too fast. What you learned today is obsolete tomorrow. Why bother?". I've heard that saying countless times throughout my career. It's both, popular and wrong.

If a skill becomes obsolete, it's not a skill. Throughout your career, you don't want to be known as "the Jenkins guy"; you want to be the expert in Software Quality. Hint: If you don't know what Jenkins is, that's the whole point. You should not narrow down your scope too much. The right skills never become obsolete.

From time to time it happens that, due to some new company policy, your beautiful creation becomes obsolete. As depressing as it sounds, it's a regular part of the software business. You need to adapt. One piece of advice I can give you is not to take it too seriously. Drop the project, keep the wisdom. Embrace change.

Writing software in a non-perfect world

A professional programmer has to deal with deficiencies all the time. The game is called "balancing constraints". Deadlines, budgets, and code quality are just a few competing constraints we have to consider. Elegant designs fade away in the face of reality. In the end you want to earn money with your software, so you have to ship it!

The best developers I know keep the balance between pragmatism and elegance. They know which parts matter and which don't. Those who don't will be replaced when there's a need.

For me, I was always leaning more towards elegance. That's just a nicer way to say I was a perfectionist. I needed to learn the pragmatic part through hard work.

Mentoring less experienced Programmers

The better you become at programming, the less you code.

Instead, you will spend more time thinking about Software Architecture, high-level designs and splitting up the work into smaller chunks for other developers to consume. You will start mentoring Junior Developers. Recruiting will require a lot of your attention. You will spend your time in meetings, discussing project goals with business people. One might say you take on the role of a mediator. Others might call you a manager.

Once you know the ins and outs of the business, you are an essential asset for the company. You might get asked to become a manager, or at least managing projects will slowly feel like a natural extension of your responsibilities. But beware! This slow and gradual process is dangerous. Moving back to being a full-time programmer is not easy. During the time you were busy with project management, others were busy improving their coding skills. You can try to keep up-to-date in your free time but that's hard.

I've seen excellent developers become great managers. At some point in your career it's a decision you need to make for yourself.

However you decide, it pays off to invest some time into learning how to communicate. Empathy plays a prominent role in that. Developing software as a team is so complicated that a lot of time is spent on aligning goals and communicating problems. In fact, communication is what you get paid for. This includes documentation, tests and the code itself.

Talk to others, listen to their problems. Read books about Software Project Management, even though you don't want to be a manager yourself. It will help you understand the role of your boss.

A word about money

There are many good reasons to work in IT, but money is not one of them.

While it can be tempting to base your career decisions on prospective salary, don't do it. You will be very unhappy. You will spend eight hours or more each day sitting in front of a blinking cursor. That's a lot of time, and time is much more valuable than money.

Don't get me wrong. There's plenty of jobs that pay well. You will most likely not get rich, though. If you want to make it big, I can't help you. Maybe look into Real Estate or so... The only way to get rich as a developer is to work on something really hard, put in lots of hours and get lucky. Startups, basically. Keep in mind: One Bill Gates takes a thousand failed attempts. Another way is to stop being a programmer and become a manager instead. I've already shared my opinion on that in the last section.

Final words

While you should learn to read (and maybe write) code, working as a professional programmer is not for everyone. You might ask: "Is it worth it?" For me it was the right decision. Hopefully, I could help you make your own.


The Future of Rust Matthias Endler

Matthias Endler2017-04-27 00:00:00

Let me first point out the obvious: yes, the title is a little sensationalist. Also, you might be asking why I would be entitled to talk about the future of Rust. After all, I'm neither part of the Rust core team nor a major contributor to the Rust ecosystem. To that I answer: why not? It's fun to think about the future of systems programming in general and Rust in particular.

Ferris is the unofficial Rust mascot
Source: Illustration provided by FreePik.com

You might have heard of the near-term goals that the core team has committed itself to. Faster compile times and a gentler learning curve come to mind. This post is not about that. Instead, I want to explore some more exotic areas where Rust could shine five to ten years from now. To make it big, we need both roots and wings.

Data Science

Right now, the most popular languages for Data Science are Python, Java, R, and C++.

Programming language popularity for data science (Source).

We've observed that while prototypes are mostly written in dynamically typed languages like Python and R, once an algorithm reaches production level quality it is often rewritten in faster languages such as C++ for scalability. It is not unthinkable that Rust is going to be some healthy competition for C++ in the near future. The benchmarks of leaf, a machine learning library written in Rust, are already nothing short of impressive.

Blockbuster games

Games are another area where Rust might shine. It's financially attractive for Game Studios to support multiple platforms without much effort. Cargo and rustup make cross-compiling easy. Modern libraries slowly fill the tooling gaps for large-scale game development. Rust's support for the Vulkan 3D graphics API might already be best in class. The killer feature, though, is the unique combination of safety and performance. If you ship a game to a million players and they throw money at you, you'd better make sure that it doesn't crash... right?

That said, the first AAA Rust game might still be far in the future. Here's Blizzard's standpoint on Rust in 2017.

Systems Engineering

Maybe — eventually — we will also see formal verification of the Rust core. Projects like RustBelt would then open new opportunities in safety-focused industries like the Space industry. Wouldn't it be nice to safely land a Spacecraft on Mars that is controlled by Rust? (Or by one of its spiritual successors.) I wonder if SpaceX is experimenting with Rust already...

Integrating with other languages

There are many other areas I haven't even mentioned yet. For example, financial and medical software or Scientific Computing, just to name a few. In all cases, Rust might be a good fit. Right now the biggest barrier to entry is probably the huge amount of legacy code. Many industries maintain large codebases in Cobol, C or Fortran that are not easily rewritten.

Fortunately, Rust has proven to work very nicely with other languages, partly because of its strong C compatibility and partly because there is no runtime or garbage collector. A typical pattern is to optimize some core part of an application in Rust that has hard safety/performance requirements, while leaving the rest untouched. I think this symbiosis will only become stronger in the long run. There are even ambitious projects like Corrode which attempt to translate C code to Rust automatically.

Summary

Overall I see huge potential for Rust in areas where safety, performance or total control over the machine are essential. With languages like Rust and Crystal, a whole class of errors is a thing of the past. No null pointers, no segmentation faults, no memory leaks, no data races. I find it encouraging that future generations of programmers will take all that for granted.


Launching a URL Shortener in Rust using Rocket Matthias Endler

Matthias Endler2017-04-09 00:00:00

One common systems design task in interviews is to sketch the software architecture of a URL shortener (a bit.ly clone, if you may). Since I was playing around with Rocket – a web framework for Rust – why not give it a try?


Hello Again Posts on elder.dev

Posts on elder.dev2017-03-19 00:00:00 Hello World! I am now the proud owner of bentheelder.io. If you are curious, you can find the page source for the new site here, and the scripts used to set up the GCE VM here. The new site is also now on Cloudflare for performance and security. The old site will be redirected to this one soon, and what little content it had has been preserved here. The pages have been reformatted to match the new site, and should be much easier on the eyes.

The Essence of Information Matthias Endler

Matthias Endler2017-03-18 00:00:00

People look confused when I tell them about my passion for algorithms and data structures. Most of them understand what a programmer does, but not what Computer Science is good for. And even if they do, they think it has no practical relevance. Let me show you with a simple example that applied Computer Science can be found everywhere.

Imagine a pile of socks that needs to be sorted. Not exactly the most exciting pastime. You've put off this task for so long that it will inevitably take an hour to finish.

Yes, there is a game about sorting socks.
Source: It's called Sort the Socks and you can get it for free on the App Store.

Considering your options, you decide to get some help. Together with a friend you get to work. You finish in roughly half the time.

A Computer Scientist might call this pile of socks a resource. You and your friend get bluntly degraded to workers. Both of you can work on the problem at the same time — or in parallel. This is the gist of Parallel Computing.

Now, some properties make sock-sorting a good fit for doing in parallel.

  • The work can be nicely split up. It takes about the same time for every worker to find a pair of socks.
  • Finding a different pair is a completely separate task that can happen at the same time.

The more workers you assign to this task, the faster you're done.

  • 1 worker takes 60 minutes.
  • 2 workers take 30 minutes.

How long will 3 workers take? Right! Around 20 minutes. We could write down a simple formula for this relationship:

The formula for the sorting time: sorting time = 60 minutes / number of workers.

Well, that is not quite correct. We forgot to consider the overhead: when Mary tries to pick up a sock, Stephen might reach for the same one. They both smile, and one of them picks another sock. In computing, a worker might do the same. Well, not smiling, but picking another task. When lots of workers share resources, these situations occur quite frequently. And resolving the situation always takes a little extra time. So we are a bit away from our optimal sorting speed because of that.

But it gets worse! Let's say you have 100 workers for 100 socks. In the beginning, every worker might take one sock and try to find a match for it. Here's the problem: As soon as they pick up one sock each, there are no socks left. All workers are in a waiting state. The sorting takes forever. That's a deadlock, and it's one of the most frightening scenarios of parallel computing.

In this case, a simple solution is to put down the sock again and wait for some time before trying to get a new sock. Another way out of the dilemma would be to enforce some kind of "protocol" for sorting. Think of a protocol as a silent agreement between the workers on how to achieve a common goal.

So, in our case, each worker might only be responsible for one color of socks. Worker one takes the green socks, worker two the gray ones and so on. With this simple trick, we can avoid a deadlock, because we work on completely separate tasks.

But there's still a catch. What if there are only four green socks and 4000 gray socks? Worker one would get bored fairly quickly. He would sort the two pairs of socks in no time and then watch worker two sort the rest. That's not really team spirit, is it?

Splitting up the work like this makes the most sense if we can assume that we have around the same number of socks of every color. That way, we achieve roughly the same workload for everyone.

The following histogram gives you an idea of what I mean:

Even piles of socks.

In this case, we have about equally sized piles for each color. Looks like a fair workload for every worker to me.

Uneven piles of socks.

In the second case, we don't have an equal distribution. I don't want to sort the gray socks in this example. We need to think a little harder here.

What can we do?

Most of the time it helps to think of other ways to split up work. For example, we could have two workers sort the big gray pile together. One sorts the large socks; the other one sorts the small ones. We run into another problem, though: Who decides what "large" and "small" means in this case?

So, instead of thinking too hard about a smarter approach, we decide to be pragmatic here. Everyone just grabs an equally sized pile of socks — no matter the color or the size — and gets to work.

Most likely, there will be some remaining socks in each pile, which have no match. That's fine. We just throw them all together, mix the socks, create new piles from that, and sort them again. We do so until we're done. We call that a task queue. It has two advantages: First, you don't need any additional agreements between the workers and second, it scales reasonably well with the number of workers without thinking too hard about the problem domain.
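For the programmers among you, here is a rough sketch of that task queue in Rust (my own illustration; the piles are just vectors of numbers standing in for socks):

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // The shared queue of piles that every worker draws from.
    let piles = vec![vec![1, 2, 1], vec![3, 3, 2], vec![4, 1, 4]];
    let queue = Arc::new(Mutex::new(piles));

    let mut workers = Vec::new();
    for id in 0..3 {
        let queue = Arc::clone(&queue);
        workers.push(thread::spawn(move || loop {
            // Hold the lock only long enough to grab the next pile.
            let pile = queue.lock().unwrap().pop();
            match pile {
                Some(pile) => println!("worker {} sorts pile {:?}", id, pile),
                None => break, // the queue is empty, we're done
            }
        }));
    }

    for worker in workers {
        worker.join().unwrap();
    }
}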

The tricky part about distributed systems is that seemingly straightforward solutions can fail miserably in practice.

What if our small piles look like this?

A random pile of socks.

The number of pairs in each pile is... sobering. What we could do is run a very quick presorting step to increase the number of matches. Or maybe you come up with an even better idea?
The cool thing is, once you have found a faster approach, it works for similar tasks, too.

Problems like this have their roots in Computer Science, and they can be found everywhere. Personally, I don't like the term Computer Science too much. I prefer the German term "Informatik", which I would roughly translate as "Information Science". Because the real essence of what we're doing here is to find a general way to solve a whole class of problems. We think of the nature of objects and their properties. We don't sort socks; we try to answer the fundamental questions of information. Maybe now you can understand why I'm so passionate about this subject.

Oh, and here's a related post about why I love programming.


Why I Love Programming Matthias Endler

Matthias Endler2017-03-15 00:00:00

Programming has many faces. It is the science of structured thinking. It is the art of eloquent expression. It teaches you to be humble when you look at other peoples' fascinating work. Most of all, it teaches you a lot about yourself.
While the syntax may change, the concepts will not.


CreatureBox Posts on elder.dev

Posts on elder.dev2016-01-02 00:00:00 CreatureBox is a simple evolutionary obstacle avoidance demo I wrote inspired by studio otoro’s awesome creatures avoiding planks. I wanted to build something similar for fun and try out golang’s Go mobile project as well, so over the break between semesters I took a little time to write one. gomobile The first thing I did was get gomobile up and running, and create a basic main loop to handle events and draw a quad to the screen during paint events.

Rust Hotswap Posts on elder.dev

Posts on elder.dev2015-01-12 00:00:00 ⚠ Warning This post is old! Rust has changed a lot since this post was written, it may not still be accurate. Mozilla’s Rust language has just reached 1.0 alpha. I’ve been learning it off and on for a while now, and I’m quite happy to see the breaking changes slow down. As part of learning rust I’ve played around implementing things that would be normally done in c or c++; one of those is the old trick of hot-swapping code by reloading a shared library at runtime.

Tools Matthias Endler

Matthias Endler2011-10-30 00:00:00

For as long as I can remember, religious flamewars have infected computer science.

Having arguments about technical topics can be healthy, but flamewars are not. I'm sick of it. I'm fed up with people telling me that their work environment is oh so much better, faster and so on. That's fine, but it doesn't matter. Your equipment only plays a supporting role. You don't even need a computer to do programming. Donald Knuth wrote algorithms on a notepad. Alan Turing wrote the first chess program on a piece of paper. And it worked. Beat that!

For an average user, the next best system is probably good enough. Just a few bucks and you get an excellent piece of hardware which is completely sufficient to surf the web, chat, archive photos, write documents, listen to music and watch movies. You can do that with a Pentium IV, 256 MB RAM and any recent Operating System (you will likely get that one for free). Heck, you can use your old Commodore for most of that. Computers have been mature and reliable enough to do all that for ages. There's no need to upgrade your system for Farmville, just like there's no reason to buy a new car if the old one works perfectly fine. When it comes to software, many of us still use Office 2000 or Photoshop 8 or VisiCalc without feeling the urge to upgrade.

Professionals find themselves in a similar situation. Well, maybe we invest a bit more money, but still, our hardware is incredibly cheap compared to our salary (hopefully). Nothing is perfect, but most of the time it's good enough. That compiler you were using a decade ago? Still does the job. We are still using slightly modified descendants of programming languages from computing stone-age. Even if you're doing numerical computing for NASA, your primary work environment is a black box running a text editor or an IDE.

I don't care what you are using to get things done. Find an environment that suits your needs and be happy with it. Maybe you use Emacs on a Lemote Yeelong netbook (hello Richard Stallman) or Vim on your workstation. It's the same thing: a text editor running on a piece of metal. You're not a worse programmer for using Nano, ed or TextMate. Notepad works just fine, too. It loads files, saves files and lets you edit them in between. That's a hell of a lot more functionality than Bill Gates and Paul Allen had when they wrote a BASIC interpreter for the Altair. If you find something you're happy with, just stick with it, but don't start arguing. It isn't worth your time.

Don't feed the trolls. When it comes to software, don't fall into the old FreeBSD vs. Linux vs. Windows vs. mum cliche. Instead, talk about your code. Let's look at your problem-solving skills. Let's be pragmatic here.

Talk is cheap. Show me the code. - Linus Torvalds

I don't care which programming language you are using. Java? Fine. Visual Basic? Great! Scala, Cobol, PHP, C++? All fine. Write in Assembler or lolcode. Don't moan about the fact that language X is missing feature Y. Write a library or use something different. Stop saying JavaScript is a toy language. It just doesn't fit your needs. Instead, show me your Lisp adventure game. Write an interpreter for Brainfuck. Do something. Move things.

Concerning PHP, nir wrote on Hacker News:

Any idiot can write a snarky comment about PHP. Very few get to write code that has anywhere near the impact it had.

Will you fall off your chair when I admit that I like the PHP syntax? OK, it has its rough edges (do we really need the $ sign?), but what's more important is how much I can get done with it. PHP was my long-time go-to language for quick, one-time scripts. It looks a bit ugly, but it runs on any server and comes with an enormous amount of built-in functionality. It's great for rapid prototyping and gluing things together. In fact, when you write a piece of software, what you should strive for is quite good software, and what you really need to accomplish is software that is good enough to make your users happy.

Zed A. Shaw puts it quite nicely in the afterword to Learn Python the Hard Way:

I have been programming for a very long time. So long that it is incredibly boring to me. At the time that I wrote this book I knew about 20 programming languages and could learn new ones in about a day to a week depending on how weird they were. Eventually though this just became boring and couldn't hold my interest. What I discovered after this journey of learning was that the languages didn't matter, it was what you did with them. Actually, I always knew that, but I'd get distracted by the languages and forget it periodically. The programming language you learn and use does not matter. Do not get sucked into the religion surrounding programming languages as that will only blind you to their real purpose of being your tool for doing interesting things.

Don't get emotional about any tool you use. An iPhone - I'm sorry to disappoint you - is just a phone. No magic. No "think different". "But it's evil!", the ether says, "it's not open source". Well, Android just exists because Google needed to rapidly develop a mobile platform. It's simply part of their business. There is no moral behind that. Google is yet another company, just like Microsoft or Apple.

My MacBook serves me as a solid tool, but if something "better" comes around, I will happily kick it out. I've ditched Firefox after five years just because Chrome is faster and I will get rid of Chrome when I find a worthy successor. Vim is quite good in my opinion but if there's a faster way to do things I'm not afraid to dump it. Instead get your hands dirty and fix the problems or craft something new.


Are you a Programmer? Matthias Endler

Matthias Endler2011-10-20 00:00:00

My geography teacher once told the story of her first lecture at University. As an introduction, her professor asked the class to draw a map of Germany without any help and as accurately as possible. To her surprise, she was not able to fill the map with much detail. Even the shape of the country was a bit vague.

She had seen thousands of images of Germany (her mother country) but wasn't able to reproduce it from her blurry memory. She would have to look it up.

Doesn't this sound familiar? We rely on machines to manage large portions of our knowledge. There's hard work involved in learning something by heart.

Here is a similar test for programmers:

Using a programming language of your choice, write a correct sorting algorithm with an average runtime complexity of O(n*log n) (Heapsort, Quicksort, Bucketsort, you name it) on a piece of paper without the help of any external tools.

And by correct I mean it must be free of bugs without any modifications when you type it in.

You would be surprised by the large percentage of professional software engineers who can't pull this off.

Some might argue that knowledge about details of programming language syntax is unimportant: "Why learn all the little nitpicks when you know how to use a search engine? Why start with a clean slate when you can easily copy, paste and modify an example from a tutorial? Every few years/months I have to completely relearn the syntax for a different language anyway."

But that is a myth. If you know only one programming language really well - even if it is something outdated like Fortran or COBOL - you could easily earn a fortune with that knowledge. Suppose you started with C in 1975. You could still use the same syntax today - almost four decades later. Same for text editors. Emacs and Vim are both decades old. They are battle-hardened. I don't care which one you prefer, but you will spend a large part of your life with your tools so invest the time to master them.

As a side note, it appears that very few people strive for perfection in anything they do. They happily settle for "good enough". This can have many different reasons, and I'm not blaming anybody for not doing their homework, but maybe I'm not alone with this observation.

If you don't know how to use your tools without a manual, you are a lousy craftsman. If you need a dictionary to write a simple letter, you will have a hard time becoming a writer because it would already be challenging for you to form elegant, fluent sentences — let alone engaging and original stories. I don't want to read these books.

What makes a programmer?

  • She has at least one programming language she knows inside out.
  • She can implement standard algorithms (e.g. for sorting and searching) and data structures (e.g. trees, linked lists) that are robust and reasonably fast, on the fly.
  • She has at least a basic understanding of complexity theory and of programming concepts like recursion and pointers.

But, to be a good programmer, you should

  • Be able to code in at least two fundamentally different programming paradigms (e.g. declarative, functional).
  • Have experience with big software architectures.
  • Be familiar with your programming environment, such as the operating system and a sophisticated text editor of your choice, preferably one that is easily extensible.

And that is just the tip of the iceberg. "There's too much to learn!", I hear some of you say. Start slowly. You need only three commands to start with Vim: i, ESC, :wq. That's enough for day one.

I realize that most of these essentials won't be taught during lectures. You have to learn a vast portion on your own. But let's face it: If you don't know this stuff, you are not a programmer, you're a freshman.


On Hard Work Matthias Endler

Matthias Endler2011-10-13 00:00:00

Great people get shaped by their achievements

  • There's Thomas Edison who developed countless prototypes before selling a single light bulb.
  • The unemployed Joanne K. Rowling, writing Harry Potter in a café while caring for her child.
  • Steve Wozniak creating the first personal computer in his spare time while working at HP.

What do they have in common?

They all lived through frustration and contempt but still reached their goals, even though the chances of success were low. Their strong will stems from an intrinsic curiosity.

Dedication

Sure, I love what I do. I want to be a programmer for the rest of my life, but sometimes it seems simply too hard to finish a project. I get scared by the big picture and fear that I won't finish on time. What I need is a different mindset.

Dhanji R. Prasanna, a former Google Wave team member, made this observation:

And this is the essential broader point--as a programmer you must have a series of wins, every single day. It is the Deus Ex Machina of hacker success. It is what makes you eager for the next feature, and the next after that.

While Google Wave was not commercially successful, it sure was a technical breakthrough, and it was a drag to push it out to the public. We always have to see our goal right in front of us as we take a billion baby steps to reach it. This is true for any profession. Winners never give up.

Direction

Today it is easier to accomplish something meaningful than ever before.

If you are reading this, you have access to a powerful instrument — a computer with an Internet connection. We live in a time where a single person can accomplish miracles without hard physical labor. A time where billions of people can grow a business from their desk, get famous in minutes, publish books in seconds and have instant access to large amounts of data. The most potent development over the last 100 years has been the reduction of communication costs. Transferring a bit of information to the other end of the world is virtually free and takes fractions of a second. While proper education was a privilege of a lucky few well into the 20th century, learning new things is now mostly a question of will.

Nevertheless, learning is still a tedious task, requiring patience and determination. As the amount of information has increased, so have the ways of distraction. Losing focus is just a click away.

Devotion

Everybody can start something. Few will finish anything. That's because getting things done is hard, even if you love what you're doing. (Watch the beginnings of There Will Be Blood and Primer for a definition of hard work.)

No matter what they tell you, achieving anything sustainable means hustling. It means making sacrifices. It means pushing through. It means selling something even though it isn't perfect. Your beautiful project might turn into an ugly groundhog in the end. Put makeup on it and get it out the door.

In a report about Quake's 3D engine, developer Michael Abrash says:

By the end of a project, the design is carved in stone, and most of the work involves fixing bugs, or trying to figure out how to shoehorn in yet another feature that was never planned for in the original design. All that is a lot less fun than starting a project, and often very hard work--but it has to be done before the project can ship. As a former manager of mine liked to say, "After you finish the first 90% of a project, you have to finish the other 90%." It's that second 90% that's the key to success.

A lot of programmers get to that second 90%, get tired and bored and frustrated, and change jobs, or lose focus, or find excuses to procrastinate. There are a million ways not to finish a project, but there's only one way to finish: Put your head down and grind it out until it's done. Do that, and I promise you the programming world will be yours.

That last part has influenced me a lot. The dedication, the urgency to reach your aims must come from within you. It's your raw inner voice speaking — don't let it fade away. And when you are close to giving up, stop thinking so hard. Just try to push forward and make a tiny step in the right direction. Ship it!


Overkill – Java as a First Programming Language Matthias Endler

Matthias Endler2010-02-12 00:00:00

I recently talked to a student in my neighborhood about his first programming experiences. They started learning Java at school, and it soon turned out to be horrible.

A lot of us learned to code in languages like BASIC or Pascal. There was no object orientation, no sophisticated file I/O and almost no modularization... and it was great. In BASIC you could just write

PRINT "HELLO WORLD"

and you were done. This was an actual running program solving a basic and recurring problem: output some text on a screen.

If you wanted to do the same thing in Java, you'd have to write:

public class Main {
  public static void main(String[] args) {
    System.out.println("Hello, world!");
  }
}

Do you see how much knowledge about programming you must have to achieve the easiest task one could think of? Describing the program to a novice programmer may sound like this:

Create a Main class containing a main-method returning void expecting a string array as a single argument using the println method of the out object of class PrintStream passing your text as a single argument.

— please just don't forget your brackets. This way your first programming hours are guaranteed to be great fun.

OK. So what are the alternatives? I admit that nobody wants to write BASIC anymore because of its lack of a sophisticated standard library for graphics (Java doesn't have one either) and its weak scalability. The language has to be clean and straightforward. It should be fast enough for numerical tasks but not as wordy as the rigid C-type bracket languages (sorry C++ guys). It should have a smooth learning curve and provide direct feedback (compiled languages often suck at that point). It should encourage clean code and reward best practices. One language that provides all that is Python.

And Python has even more: hundreds of libraries that help you with almost everything, good integration into common IDEs (PyDev in Eclipse, IDLE...), a precise and elegant syntax.

Here is our program from above written in Python:

print("Hello World")

There's no need to know about object orientation, scopes or function arguments at this point. No housekeeping or bookkeeping. Yes, it's an interpreted language, but that's not a deal breaker for beginners.

If you aren't convinced yet: printing and formatting text output in Java is relatively easy for an advanced programmer, but the gruesome stuff begins with file input:

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class FileIO {
    public static void main(String[] args) {
        String filename = "test.txt", line;
        try {
            BufferedReader myFile =
                new BufferedReader(new FileReader(filename));

            while ( ( line = myFile.readLine()) != null) {
                System.out.println(line);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

I hear you say: "Dude, file I/O is pretty complex. It's just the way it is." That's true... internally. But a beginner should get an easy interface. Python shows how it's done:

file = open("test.txt")
text = file.read()
print(text)

The code goes hand in hand with the natural understanding of how the process works: "The computer opens a file, reads it and prints it." Even a five-year-old kid can understand that. Nobody would start to explain: "Before you can read a file you need a BufferedReader that works on a FileReader..." even if this is precisely how it works internally. You want to explain the big picture first: the elementary principles of teaching a computer how to do useful stuff. Otherwise, you will frustrate beginners and fool them into thinking that they are not bright enough for programming. Programming is fun, and getting started is the most crucial step. So don't spoil that experience with layers of unneeded abstraction.


Howto Sort a Vector or a List in C++ using STL Matthias Endler

Matthias Endler2010-01-27 00:00:00

A little code snippet that people need very often.
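
A minimal sketch of the usual approach (assuming C++11 initializer-list syntax): std::sort works on containers with random-access iterators such as std::vector, while std::list only offers bidirectional iterators and therefore ships its own member function sort().

#include <algorithm>
#include <iostream>
#include <list>
#include <vector>

int main() {
    // std::sort requires random-access iterators, which std::vector provides.
    std::vector<int> v = {4, 1, 3, 2};
    std::sort(v.begin(), v.end());

    // std::list only has bidirectional iterators, so it brings its own sort().
    std::list<int> l = {4, 1, 3, 2};
    l.sort();

    for (int x : v) std::cout << x << ' ';  // prints: 1 2 3 4
    std::cout << '\n';
    for (int x : l) std::cout << x << ' ';  // prints: 1 2 3 4
    std::cout << '\n';
}

Both default to operator< and also accept a custom comparator if you need a different ordering.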


Why I Love Text Files Matthias Endler

Matthias Endler2010-01-10 00:00:00

Text files are the single most important way we can communicate with computers. It's no coincidence that they are also the most vital way to interact with other human beings. What we can achieve with text files is invaluable: Write it once and refer to it whenever you want to get the message across in the future. Write a program (it's just text), save it and let the machine execute it whenever you like. Write another text file which contains the rules for the execution of your program and the computer runs your application exactly as you specified (cron files do that on Unix).

Text files can be structured in any way you can imagine. Some flavours are JSON, Markdown and SVG. It's all just text. There are a billion programs and algorithms to access, modify and distribute text files. You can write them with Emacs, print them on a terminal, pipe them through sed and send them via email to a friend who publishes them on the web. Because text files are so important, we have good support for them on any computing system. On Unix, everything is a file, and HTML is just structured text. It's a simple and powerful tool for making a contribution to society that outlasts our lives.

I have a single text file in my Mac's Dock called TODO.txt. I open it every day, and after years of experimenting with different task-management apps, from simple command-line tools to sophisticated online information-storage systems, I always came back to plain text files. The explanation is simple: if humanity is still around a thousand years from now, chances are that plain text will be one of the very few file formats still readable.

They are an integral part of how we can modify our environment without even leaving our desk. They have no overhead and can contain a single thought or the complete knowledge of our species. Distributing textual information is so vital for us that we permanently develop faster distribution networks – currently the fastest being the internet.

On the web, you have instant access to a virtually endless amount of information and data distributed as plain text files. New web services have made accessing the data even easier by offering APIs and feeds. You can pull the data down from their servers and compute statistics with a programming language of your choice. As you may have noticed, my affinity for text files partially comes from my programming background. As Matt Might correctly points out on his blog:

The continued dominance of the command line among experts is a testament to the power of linguistic abstraction: when it comes to computing, a word is worth a thousand pictures.

Whenever you like a text on the web, just link to it and create a wonderful chain of ideas. Want to read it later or recommend it to a friend? Just share the text or print it on paper. The fact that we all take such things for granted is a testament to the power of text files and their importance for the information age.


Running Legacy Code Matthias Endler

Matthias Endler2009-11-08 00:00:00

This short article deals with a severe problem in software development: bit rot. When switching to a new platform (for instance from Windows XP to Windows Vista/7), programmers need to make sure that old bits of code still run flawlessly. There are several ways to achieve this, discussed in the next paragraphs:

Porting the code

This is generally considered a hard path to follow. For non-trivial legacy code blocks, chances are high that they contain side effects and hacks to make them work in different environments. Porting code means replacing the parts of the program that use functions and methods that no longer exist with new ones that make use of the modern libraries and routines of the new platform. The significant advantages are maintainable software and sometimes faster-running programs. But it may be necessary to hack the new platform's libraries in order to preserve the full functionality of an old application, and when an algorithm inside legacy code is changed, the ported version may become unstable. Thus there may be better ways of maintaining obsolete code today.

Emulators

Emulators work much the same way as porting the code: you replace old function calls with new ones to make everything work again.
However, you don't alter the old codebase itself (you may not even have the source code available); instead, you create a compatibility layer that "translates" the communication between the underlying operating system (our new platform) and our old software. Emulation can be very fast and run stably for many years, but writing an emulator can be even harder than porting the code, because an educated guess may be needed to figure out how the program works internally. Additionally, the emulator itself may become obsolete in the future and might eventually be replaced by a new one.

Virtual machines

Over the last few years, a new approach has been gaining popularity. The idea is simple: don't touch anything. Take the whole platform and copy it in order to run the old software. The old software runs on top of the old operating system within a virtual machine that runs on the new platform.

From a sane software developer's point of view, this method is ridiculous. A lot of resources are wasted along the way; the system is busier switching contexts between the old platform and the new one than running the actual legacy program. However, with cheap and capable hardware everywhere, this idea gets more and more interesting. As Jeff Atwood put it:

Always try to spend your way out of a performance problem first by throwing faster hardware at it.

And he's right. The Microsoft developers did the same on their NT 6.x platform (Vista, Windows 7, Windows Server 2008...): Windows XP runs inside a virtual machine. This way, everything behaves just as if the software were running on the old system. And by optimizing the performance bottlenecks (input/output, context switches), one gets a fast, stable, and easy-to-maintain product.

Each method has its major advantages and disadvantages. It's up to the developer to select the appropriate strategy.