Binary Interfaces and Language Interoperability – xlang

How does a program call a function in the operating system? If so many languages can all call operating system functions, why can’t any programming language call functions in any other programming language more easily? How can we make this work?

For any given language, interoperability is usually defined by the language or runtime, and the code that you need to write is unique to that toolset. In almost all cases, the one “other language” that you can call into is “C”, although that can sometimes be used as a bridge to other languages.

Why are language interop solutions all unique?

Every language has its own object management, data management, and feature sets. Enabling full-fidelity access to these features requires domain-specific design for that language. For instance, if a language uses garbage collection, object implementations may need to be able to trace references correctly to participate in cleanup. String data may need to be stored in a global string table, or allocated using the same allocator the runtime uses. Dynamic languages may need the ability to enumerate functionality on the native objects. The models used for language extensibility can be faster and more expressive by being specifically tuned to the design characteristics of the language and runtime.

However, even in this short description, you can see emerging patterns that span many languages and scenarios. xlang doesn’t attempt to replace the native interface layer for any one language, but instead provides a framework for mapping common concepts across many languages.

Why is C so magical?

As a “low level” language, C function definitions can specify their calling convention and exact data in memory. A calling convention specifies how control and data are passed from one routine to another as defined by machine instructions and memory layout, rather than by programming language semantics. While it’s easy to see how this might allow different programming languages to work together, it can also be critical for allowing two pieces of code written in *the same* programming language to work together when using separate toolchains.

Operating systems typically define their functions in C for this reason. A C (or C++) function that specifies its calling convention and uses size-specific types effectively defines the instructions, registers and memory layout needed to make a function call. We call this type of specification a binary-stable function definition.
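To make that concrete, here is a minimal sketch of what a binary-stable definition can look like. Everything in it (ExampleRect, example_rect_area, the EXAMPLE_CALL macro) is a made-up name for illustration, not part of any real OS header. The macro only pins the calling convention on 32-bit MSVC, where more than one convention is in common use; elsewhere it expands to nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical macro: name the calling convention explicitly where the
// toolchain offers a choice (32-bit MSVC); expand to nothing elsewhere.
#if defined(_MSC_VER) && defined(_M_IX86)
#define EXAMPLE_CALL __stdcall
#else
#define EXAMPLE_CALL
#endif

// Hypothetical struct built only from size-specific types, so every
// compiler agrees on the size and offset of each field.
struct ExampleRect {
    int32_t left;
    int32_t top;
    int32_t right;
    int32_t bottom;
};
static_assert(sizeof(ExampleRect) == 16, "size is part of the contract");
static_assert(offsetof(ExampleRect, right) == 8, "offsets are part of the contract");

// extern "C" turns off C++ name mangling, so the exported symbol name is
// predictable. Returns 0 on success and -1 if either pointer is null.
extern "C" int32_t EXAMPLE_CALL example_rect_area(const ExampleRect* rect,
                                                  int64_t* area) {
    if (rect == nullptr || area == nullptr) {
        return -1;
    }
    const int64_t width = static_cast<int64_t>(rect->right) - rect->left;
    const int64_t height = static_cast<int64_t>(rect->bottom) - rect->top;
    *area = width * height;
    return 0;
}
```

A caller in any language only needs to know the symbol name, the convention, and the byte layout of ExampleRect; nothing about C++ semantics leaks across the boundary.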

xlang’s predecessor, WinRT

Windows took this approach a step further in the 90’s, and created a set of conventions that also specified a binary layout for a virtual function dispatch table (often referred to in C++ as a v-table) and a few fundamental operations that enabled developers to use object-oriented programming through binary-stable interfaces to make calls across programming language boundaries. This system was known as COM, and was a fundamental building block that enabled Visual Basic to call objects implemented in C++ and shipped as binary code by the operating system or by other developers. It also provided the foundation for OLE, which allowed applications to embed controls implemented in other applications.
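As a rough illustration of the idea, the sketch below spells out by hand what a binary-stable object looks like: the first field of the object is a pointer to a table of function pointers, and callers reach every method through that table. The names (ExampleCounter and friends) are hypothetical, and this is a simplified stand-in for the pattern, not the actual COM IUnknown contract.

```cpp
#include <cassert>
#include <cstdint>

struct ExampleCounter;  // hypothetical object type

// The dispatch table: a fixed sequence of function pointers. The order of
// the slots is part of the binary contract.
struct ExampleCounterVtbl {
    uint32_t (*add_ref)(ExampleCounter* self);
    uint32_t (*release)(ExampleCounter* self);
    int32_t (*increment)(ExampleCounter* self);
};

// The object: its first field points at the table, which is all a caller
// needs to know about its layout.
struct ExampleCounter {
    const ExampleCounterVtbl* vtbl;
    uint32_t ref_count;
    int32_t value;
};

static uint32_t counter_add_ref(ExampleCounter* self) {
    return ++self->ref_count;
}

static uint32_t counter_release(ExampleCounter* self) {
    assert(self->ref_count > 0);
    const uint32_t remaining = --self->ref_count;
    if (remaining == 0) {
        delete self;  // the object frees itself when the last reference goes
    }
    return remaining;
}

static int32_t counter_increment(ExampleCounter* self) {
    return ++self->value;
}

static const ExampleCounterVtbl k_counter_vtbl = {
    counter_add_ref, counter_release, counter_increment};

// Factory: returns an object holding one reference for the caller.
ExampleCounter* example_counter_create() {
    return new ExampleCounter{&k_counter_vtbl, 1, 0};
}
```

A call looks like counter->vtbl->increment(counter): load the table pointer from the object, load a slot from the table, then call through it. That two-step load is the same shape that shows up in the disassembly in the debugging post further down.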

Around 2010, Windows introduced the Windows Runtime type system. Building on the COM model of binary-stable interfaces, the Windows Runtime added language-independent metadata for describing types and well-defined interface call patterns that could be expressed as common object-oriented constructs like static methods, constructors, properties and inheritance. Today this is used to support calling directly between a wide range of languages including C, C++, C# (and other .NET languages), Rust, JavaScript, Python, and others.

The Windows Runtime model has been very effective for exposing operating system features, but it also has certain characteristics that limit how & where it can be used. It builds on COM, which requires the COM runtime to be initialized & present. It also relies on specialized support built into the operating system. This means that the latest features are only available on the latest Windows OS’s, and it can’t easily be taken to other platforms.

Introducing xlang

A project that my team is developing called xlang seeks to solve this problem using the same approach that Windows uses to make functionality written in one language callable by code written in a wide variety of programming languages.

In simplest terms, the xlang project is a refresh of the Windows Runtime type system without the hard coupling to the Windows operating system. It defines a binary-stable interoperability model for languages that isn’t tied to a particular platform or toolchain. It also represents an evolution of the Windows Runtime, improving scalability & performance based on our experience living with the system for a decade. Because it borrows heavily from Windows Runtime concepts, we’re also continuing to move & recreate some of our Windows Runtime tooling in the xlang repo so that we can contribute support for new languages & features back to the Windows Runtime.

As a recent example of that progress in action, the latest release of C++/WinRT is now built from the xlang repository.

I’m looking forward to writing more about this project, its inspirations, progress, and goals over the coming months.

Thanks for reading along.

Ben Kuhn


… and a cookie, sort of.


It’s probably time for a cookie post, or something along those lines. Being the new year, why not reflect on 2018 a bit? This little trio was from Sanaa at Walt Disney World’s Animal Kingdom Lodge. The food was amazing, and at least as Disney fare goes, it was relatively reasonable for the quality & setting. I don’t remember the other two as well, but the middle one was called a ‘candy bar’, a house made chocolate and nut confection. Yum! The Naan sampler there is also not to be missed. 🙂

So how does this relate to coding? Not at all. But I hope you enjoyed the little break!


Windows Preview SDK (18309) now available

One of the things my team is responsible for is shipping the Windows platform SDK. We regularly flight previews to let developers get a head-start on new features in upcoming Windows releases, and we have a fresh set of bits for you today. If you want to scope out the latest & greatest APIs in the next Windows update, hit up the link below.


Windows 10 passes Windows 7

This showed up in my news summary today.

It’s great to see that we’ve finally beaten ourselves again. Every so often a release is just so robust & well received, it sets a high water mark that’s hard to surpass. It turns out that in software, much like in politics, getting the basics right counts for way more than doing new cool things. Windows XP (SP2+) got the basics right. Vista didn’t. Windows 7 got the basics right, and has been an entrenched release despite getting quite dated by now, and 8 was so bad we skipped 9 just to distance ourselves from it (that’s a joke).

Windows 10 took three years to earn its throne. For reference, Windows 7 took about 18 months, according to this old article I was able to dig up.

Windows 10 has had a few bumps along the way, but on the whole it’s been a great product, so this is a milestone that I’m happy to finally see that we’ve reached.


Racing Object Construction: a debugging tale

Like so many great debugging sessions, this one started with a classic line that I’ve heard many times before, and is almost always wrong (unless it’s coming from Kenny): “Hey, it looks like there’s a code generation issue here. Wanna take a look?” Code generation issues are not unheard of, but they’re very uncommon. Nonetheless, there was a crash, and a memory dump file to look at. I’ve obfuscated a few file & type names to protect the guilty, but in all other respects this represents a real crash.

Aside: Post-mortem debugging is a special genre of debugging where, instead of stepping through a program and watching things change, you have only a snapshot of a program that represents the program state at some moment in time, usually when something bad happened. In this case, the program crashed due to an access violation trying to read from address 0.

I debug these in windbg. For Windows, there’s no better tool for poking around the state of a dump file. Visual Studio is a great debugger with some wonderful features, but too many of them rely on being able to run the application, look at symbols, etc. Windbg is better for looking at what’s happening “on the metal” with or without symbols or code.

So, I opened up the crash dump. The application failed due to an access violation. We can start by looking at the register state in the dump file to see the instruction and data that the program crashed on. For the record, in this case we’re looking at the amd64 / x64 instruction set & machine architecture.

0:040> .ecxr
rax=0000000000000000 rbx=00000158897f70a0 rcx=000001588937aba0
rdx=00007ffb2776ebf8 rsi=0000015889704558 rdi=0000015889704530
00007ffb`4f278fe6 488b00 mov rax,qword ptr [rax] ds:00000000`00000000=?

The RAX register value is 0. That’s certainly bad if you’re trying to dereference it. In most modern operating systems, the first page of memory is reserved but not committed to force a crash if code attempts to dereference it. But is it a codegen issue? Well, we need to understand what the code around this is doing to see whether this instruction is correct. In windbg, we can do this by disassembling the method up to the point of failure, as well as a few instructions past the failure.

0:040> u 00007ffb`4f278fe6-26 00007ffb`4f278fe6 
00007ffb`4f278fc0 4053 push rbx
00007ffb`4f278fda 488b4b10 mov rcx,qword ptr [rbx+10h]
00007ffb`4f278fde 48897c2430 mov qword ptr [rsp+30h],rdi
00007ffb`4f278fe3 488b01 mov rax,qword ptr [rcx]
00007ffb`4f278fe6 488b00 mov rax,qword ptr [rax]

0:040> u
00007ffb`4f278fe9 ff1581000300    call    qword ptr [__guard_dispatch_icall_fptr...]
00007ffb`4f278fef 488b4b10        mov     rcx,qword ptr [rbx+10h]
00007ffb`4f278ff3 8bf8            mov     edi,eax

Without even looking at the code, I can tell a few things from the debugger. First, the two mov instructions leading up to the crash are a double-dereference. They’re followed by a __guard_dispatch_icall_fptr call. That call tells me that I’m looking up a method address. The double-indirect is characteristic of a virtual method table dispatch.

The code happens to be C++ code, which matches what we might expect from the disassembly.

hr = target->QueryInterface(
    riid, reinterpret_cast<void**>(objectReference));

The guard call is a call to a control flow guard (CFG) check function, which validates that the target of the virtual call is a valid code page. The CFG function expects rax to contain the address of the function to be called. In short, this looks like typical code for loading and invoking a virtual method. This probably isn’t a code generation issue. But what is it?

rax was nullptr. We were attempting to read a method address that we could invoke from the v-table address held in rax, and store the function back in rax. It just happens to be the case that we were looking at the first method in the v-table. Otherwise there would have been an offset added to rax before dereferencing. The value in rax was read from rcx. Presumably, rcx should have been a pointer to a valid C++ object, if everything were working as expected. So let’s see what’s hanging out at rcx.
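If it helps to see those two loads outside of assembly, here is a small sketch that performs the same steps by hand. The types and names (SlotObject, try_call_first_slot) are invented for illustration, and the explicit null check is only there so the sketch can report the failure instead of crashing the way the real code did.

```cpp
#include <cassert>

struct SlotObject;  // hypothetical object type
using SlotMethod = int (*)(SlotObject* self);

struct SlotObject {
    const SlotMethod* vtable;  // first field: plays the role of the value at [rcx]
    int value;
};

static int slot_get_value(SlotObject* self) { return self->value; }

// A one-entry table; slot 0 is the method being dispatched.
static const SlotMethod k_slot_table[] = {slot_get_value};

// Performs the dispatch step by step. Returns false when the table pointer
// is still null, which is the state the crashing thread observed.
bool try_call_first_slot(SlotObject* object, int* result) {
    assert(object != nullptr && result != nullptr);
    const SlotMethod* table = object->vtable;  // mov rax, qword ptr [rcx]
    if (table == nullptr) {
        return false;  // the real code had no check and faulted on the next load
    }
    SlotMethod method = table[0];  // mov rax, qword ptr [rax]
    *result = method(object);      // the indirect call, after the CFG check
    return true;
}
```

Calling try_call_first_slot on an object whose vtable field is k_slot_table returns true and produces the stored value; calling it on an object whose vtable field is still null returns false, which corresponds to the access violation in the dump.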

0:040> dqs rcx
00000158`8937aba0  00007ffb`277f85d0 mydll!ViewModel::`vftable'
00000158`8937aba8  00007ffb`277f8590 mydll!ViewModel::`vftable'
00000158`8937abb0  00007ffb`277f84a8 mydll!ViewModel::`vftable'
00000158`8937abb8  00007ffb`277f8478 mydll!ViewModel::`vftable'

Hmm. The machine read the memory at rcx and stored zero into rax, but the dump file says the value in that memory location is 00007ffb`277f85d0, and the debugger tells us that points to a valid v-table. Something’s fishy. So what’s up?

There are a few thoughts that go through my head in this situation:

First off, depending on the process dump, the debugger sometimes fills in data from cached DLL and EXE files rather than reading it from the dump. This can lead to mismatches if the code had been overwritten. That doesn’t apply here, though, since rcx points to a heap address. The only place that value could have come from is the dump file itself. When the snapshot was taken, that address had the non-null value.

Which leads to the second thought: this is likely a race. Dump file process snapshots represent the process state at the time the dump was taken. There are two ways it can get out of sync with what a thread saw at the time that the thread triggered the exception. First, other threads in the process continue to run for a fraction of a second while the exception machinery determines that the exception isn’t handled and a crash dump needs to be taken. The failing thread won’t change any state, but other threads can. Second, even if another thread never got paged in during that time, the dump machinery captures the flushed process state. If there is a race involving a missing memory barrier, the thread may have seen a stale cache line state that is not reflected in the dump file.

There are a couple of ways that a v-table could go from null to non-null, and point to something that looks correct in context. This is either first-time construction, or memory re-use, where another object happened to have a reasonable value, was released, zero’d out, and reallocated. It could even be the same type of object, if that type gets allocated and released frequently.

In this case, now that I’ve found the v-table, I can reason a bit more about the usage patterns. This is basically created once on startup. It’s a view model for an MVVM application and the view is basically created once. So this isn’t memory re-use. That means we need to understand how an attempt was made to call a virtual method prior to construction. What I’m looking for is a place where a reference to the object escapes before its constructor has finished.

class ViewModel : public BaseViewModel {...}

ViewModel::ViewModel(
    Controller^ controller,
    View^ view)
{
    // Listen for presentation ready change events
    m_token =
        someView->ViewChanged +=
            ref new TypedEventHandler<IView^, Object^>(
                this, &ViewModel::OnViewChanged);
}

I’ve abbreviated a bit from the original code, but sure enough, we can see that we’re registering for an event inside the constructor. What’s more, reading a bit more source I found that this event can fire asynchronously at any time after registering.

The details get a little gnarly from here, and could warrant their own entire post, but basically C++/CX objects (like most WinRT objects in any language) support weak references via a separate control block. TypedEventHandler takes a weak reference to ‘this’ and the resulting weak reference, implemented on the control block object, points to the outer type, ViewModel, not BaseViewModel.

To understand what actually happens here, we need to understand a bit about C++ object construction. Constructors in C++ run a few things during construction in a specific order:

  1. First, base classes are constructed transitively, each using the order described here.
  2. Next, the v-tables for the current class are assigned.
  3. Member object constructors run
  4. The current class constructor body is run.

You can see this in action in this simple program:

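#include <iostream>
using namespace std;

struct Member {
    Member() { cout << "member constructor\n"; }
};

struct Base {
    Base() {
        cout << "base constructor\n";
        work();  // Derived's v-table isn't assigned yet, so this runs Base::work()
    }
    virtual void work() { cout << "base work\n"; }
};

struct Derived : public Base {
    Derived() {
        cout << "derived constructor\n";
        work();  // the v-table now points at Derived, so this runs Derived::work()
    }
    virtual void work() override { cout << "derived work\n"; }
    Member m;
};

int main() {
    Derived d;
    return 0;
}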

base constructor
base work
member constructor
derived constructor
derived work

Check line 2 of the output. Even though we’re constructing a ‘Derived’, the constructor for Base called Base::work(), not Derived::work(). v-table pointers on an object are set as the constructors run, and can be uninitialized or overwritten as outer objects override the behavior of inner objects.

The behavior we see here in C++ is one choice. C# fills in the correct v-table before running any constructors at all. While that might sound better at a glance, in practice, it’s just a different choice with different implications. If the code above were converted to C#, it means the Derived class’s work() method would be called before its members were initialized.

Coming back to our original problem, the weak reference implementation in C++/CX uses a placement new model. The control block used for reference counting and weak references allocates memory for the underlying type, then runs the constructor over that memory. The weak reference (basically a pointer to the control block) was handed out during the construction of the base type. When the event fired on another thread before the construction was complete and observable in cache, the control block attempted to call a method on the derived type, but the v-table hadn’t been set yet. Kaboom.

So what’s the takeaway?

In this case, the object enabled a callback against itself before construction was done. It was asynchronous here, but the same can occur with a synchronous callback registered against & triggered during construction. Although this case led to a crash, other variations of this type of coding pattern can lead to obscure behavior bugs as well.
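Here is a small sketch of the synchronous variation in standard C++. The names (ExampleEvent, ExampleBase, ExampleDerived, run_example) are made up for illustration and have nothing to do with the original code. The base constructor registers a callback on itself and the event fires immediately, so the callback runs while only the base part of the object exists.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event source that invokes its handlers synchronously.
struct ExampleEvent {
    std::vector<std::function<void()>> handlers;
    void subscribe(std::function<void()> handler) {
        handlers.push_back(std::move(handler));
    }
    void fire() {
        for (auto& handler : handlers) handler();
    }
};

struct ExampleBase {
    ExampleBase(ExampleEvent& event, std::vector<std::string>& log) : log_(log) {
        // Hands out 'this' before the derived part has been constructed.
        event.subscribe([this] { log_.push_back(name()); });
        event.fire();  // the callback runs now and dispatches to ExampleBase::name()
    }
    virtual ~ExampleBase() = default;
    virtual std::string name() const { return "base"; }
    std::vector<std::string>& log_;
};

struct ExampleDerived : ExampleBase {
    ExampleDerived(ExampleEvent& event, std::vector<std::string>& log)
        : ExampleBase(event, log) {}
    std::string name() const override { return "derived"; }
};

// Fires the event once during construction and once after it completes,
// and returns what the callback observed each time.
std::vector<std::string> run_example() {
    ExampleEvent event;
    std::vector<std::string> log;
    ExampleDerived object(event, log);
    event.fire();  // construction is finished; dispatches to ExampleDerived::name()
    return log;
}
```

run_example() returns "base" followed by "derived": the same callback on the same object gives two different answers depending on whether construction had finished. Nothing crashes here, which is exactly why this flavor tends to surface as an obscure behavior bug rather than an access violation.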

Anytime you see a reference to self being handed out in a constructor in any programming language, be wary.

Using ‘this’ in the constructor may be correct, but only if you can reason thoroughly about the execution order of the class, derived classes, and the recipient. Specific ordering behaviors during construction are language dependent, and if you’re relying on them, ask yourself if you’re being too clever for your own good.
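One common way to sidestep the problem, sketched here in standard C++ rather than C++/CX, is to split creation into two phases: finish constructing the object first, and only then register callbacks. The names (SketchEvent, SketchViewModel) are hypothetical and this is not how the original code was fixed; it is just one shape the pattern can take.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical event source, standing in for something like ViewChanged.
struct SketchEvent {
    std::vector<std::function<void()>> handlers;
    void subscribe(std::function<void()> handler) {
        handlers.push_back(std::move(handler));
    }
    void fire() const {
        for (const auto& handler : handlers) handler();
    }
};

class SketchViewModel {
public:
    // Factory: the only way to create an instance.
    static std::shared_ptr<SketchViewModel> create(SketchEvent& view_changed) {
        // Phase 1: run the constructor to completion.
        std::shared_ptr<SketchViewModel> self(new SketchViewModel());
        // Phase 2: only now hand out a (weak) reference to the object.
        std::weak_ptr<SketchViewModel> weak = self;
        view_changed.subscribe([weak] {
            if (auto strong = weak.lock()) {
                strong->on_view_changed();
            }
        });
        return self;
    }

    int change_count() const { return change_count_; }

private:
    SketchViewModel() = default;  // private so callers must use create()
    void on_view_changed() { ++change_count_; }
    int change_count_ = 0;
};
```

Because the constructor is private, callers can’t get an instance that has registered itself halfway through construction. If the event can fire on another thread, the handler body still needs whatever synchronization the object’s state requires; this sketch only addresses the ordering of construction and registration.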


I’ve been away from technical blogging for a while, and I’m overdue to jump back in. I used to blog on MSDN blogs. You can see some of my older posts about my past work on the Windows printing stack there.

I’ve since moved on to different things, and currently work as an engineering lead at Microsoft focused on the developer experience for Windows. My team helps ensure that programming languages that people love can be used to create great software that runs on the Windows family of devices. You can see (or participate in) one of my team’s latest projects, a cross-language interop system, here:

I’ll use this space to post about various topics related to programming & debugging. I have a love of cookies, Legos and the outdoors. Those things are likely to leak into this blog as well. I also (sporadically) blog about cooking topics on a blog my wife and I maintain.

Ben Kuhn