Are Explicit Location Bindings a Good Idea for a Shading Language?

Probably Not.

Introduction

Both the HLSL and GLSL shading languages support mechanisms for assigning explicit, programmer-selected locations to certain shader parameters. In HLSL, this is done with the register keyword:

// texture/buffer/resource:
Texture2D someTexture          : register(t0); 

// sampler state:
SamplerState linearSampler     : register(s0);

// unordered access view (UAV):
RWStructuredBuffer<T> someData : register(u0);

// constant buffer:
cbuffer PerFrame               : register(b0)    
{
    // offset for values in buffer:
    float4x4 view              : packoffset(c0);
    float4x4 proj              : packoffset(c4);
    // ...
}

When setting shader parameters through the Direct3D API, these explicit locations tell us where to bind data for each parameter. For example, since the cbuffer PerFrame is bound to register b0, we will associate data with it by binding an ID3D11Buffer* to constant buffer slot zero (with, say, PSSetConstantBuffers).
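
For concreteness, here is what the application side might look like; a minimal sketch, assuming a valid ID3D11DeviceContext* named context and an ID3D11Buffer* named perFrameBuffer created elsewhere (both names are hypothetical):

// PerFrame was declared with register(b0), so its data is bound
// to constant buffer slot 0 of the pixel shader stage:
ID3D11Buffer* buffers[1] = { perFrameBuffer };
context->PSSetConstantBuffers(
    0,        // StartSlot: matches register(b0)
    1,        // NumBuffers
    buffers);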

The OpenGL Shading Language did not initially support such mechanisms, but subsequent API revisions have added more and more uses of the layout keyword:

// texture/buffer/resource:
layout(binding = 0) uniform sampler2D someTexture;

// shader storage buffer (SSB):
layout(binding = 0) buffer DataBlock
{
    T someData[];
};

// uniform buffer:
layout(binding = 0) uniform PerFrame
{
    mat4 view;
    mat4 proj;
    // ...
};

// input and output attributes:
layout(location = 2) in  vec3 normal;
layout(location = 0) out vec4 color;

// default block uniforms (not backed by a buffer):
layout(location = 0) uniform mat4 model;
layout(location = 1) uniform mat4 modelInvTranspose;
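
On the application side, these explicit bindings let us bind state without any query step. A minimal sketch, assuming GL 4.2+ and hypothetical object names ubo and tex for objects created elsewhere:

// Uniform block PerFrame was declared with binding = 0:
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);

// Sampler someTexture was declared with binding = 0 (texture unit 0):
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, tex);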

It is clear that location binding was an afterthought in the design of both languages; the syntax is ugly and obtrusive. Using explicit locations can also be error-prone, since it becomes the programmer’s responsibility to avoid conflicting assignments and to keep application and shader code in agreement. The shading languages “want” you to write your code without any explicit locations.

If you talk to game graphics programmers, though, you will find that they use explicit locations almost exclusively. If you try to give them a shading language without this feature (as GLSL did), they will keep demanding that you add it until you relent (as GLSL did).

Why Do We Need These Things?

If the programmer does not assign explicit locations, then it is up to the shader compiler to do so. Unfortunately, there is no particular scheme that the compiler is required to implement, and in particular:

  • The locations assigned to parameters might not reflect their declared order.
  • A parameter might not be assigned a location at all (if it is statically unreferenced in the shader code).
  • Two different GL implementations might (indeed, will) assign locations differently.
  • A single implementation might assign locations differently for two shaders that share parameters in common.

When an application relies on the shader compiler to assign locations, it must then query the resulting assignment through a “reflection” interface before it can go on to bind shader parameters. What used to be a call like PSSetConstantBuffers(0, ...) must now be something like PSSetConstantBuffers(queriedLocations[0], ...). In the case of Direct3D, these locations can be queried once a shader is compiled to bytecode, after which the relevant metadata can be stripped and the overhead of reflection avoided at runtime; this is not an option in OpenGL.
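
In GL terms, the query-then-bind dance looks something like the following sketch (assuming a linked program object named program; the names are hypothetical):

// Ask the compiler where it put things...
GLuint blockIndex = glGetUniformBlockIndex(program, "PerFrame");
GLint  samplerLoc = glGetUniformLocation(program, "someTexture");

// ...then route all binding through the answers:
glUniformBlockBinding(program, blockIndex, 0); // block -> binding point 0
glUseProgram(program);
glUniform1i(samplerLoc, 0);                    // sampler -> texture unit 0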

Even statically querying the compiler-assigned locations does not help us with the issue that two different shaders with identical (or near-identical) parameter lists may end up with completely different location assignments. This makes it impossible to bind “long-lived” parameters (e.g., per-frame or camera-related uniforms) once and re-use that state across many draw calls: every time we change shaders, we would need to re-bind everything, since the locations might have changed. In the context of OpenGL, this issue means that linkage between separately compiled vertex and fragment shaders requires an exact signature match (no attributes dropped), unless explicit locations are used.

As it stands today, you can have clean shader code at the cost of messy application logic (and the loss of some useful mix-and-match functionality), or you can have clean application logic at the cost of uglier shader code.

A Brief Digression

On the face of it, the whole situation is a bit silly. When I declare a function in C, I don’t have to specify explicit “locations” for the parameters lest the compiler reorder them behind my back (and eliminate those I’m not using):

int SomeFunction(
    layout(location = 0) float x,
    layout(location = 1) float A[] );

When I declare a struct, I don’t have to declare the byte offset for each field (or, again, worry about unused fields being optimized away):

struct T {
    layout(offset = 0) int32_t x;
    layout(offset = 4) float y;
};

In practice, most C compilers provide fairly strong guarantees about struct layout, and conform to a platform ABI which guarantees the calling convention for functions, even across binaries generated with different compilers. A high level of interoperability can be achieved, all without the onerous busy-work of assigning locations/offsets manually.
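
Here is a small illustration of that guarantee, checked at compile time (C++ shown; the offsets hold on common ABIs, where int32_t and float are both 4 bytes with 4-byte alignment):

#include <cstddef>  // offsetof
#include <cstdint>

struct T {
    int32_t x;  // offset 0
    float   y;  // offset 4
};

static_assert(offsetof(T, x) == 0, "x is at offset 0");
static_assert(offsetof(T, y) == 4, "y is at offset 4");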

Guarantees, and the Lack Thereof

Why don’t shader compilers provide similar guarantees? For example, why not just assign locations to shader parameters in a well-defined manner based on lexical order: the first texture gets location #0, the next gets #1, and so on? After all, what makes the parameters of a shader any different from the parameters of a C function?

(In fact, the Direct3D system already follows just such an approach for the matching of input and output attributes across stage boundaries. The attributes declared in a shader entry point are assigned locations in the input/output signature in a well-defined fashion based on order of declaration, and unused attributes aren’t skipped.)
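
To make the idea concrete, here is a minimal sketch, with entirely hypothetical types and names, of the kind of deterministic scheme a compiler could implement:

#include <vector>

enum class ResourceKind { Texture, Sampler, ConstantBuffer };

struct ParamDecl {
    ResourceKind kind;
    int          location = -1;
};

// Walk declarations in lexical order and hand out locations per
// resource class; unreferenced parameters are NOT skipped.
void assignLocations(std::vector<ParamDecl>& params) {
    int next[3] = { 0, 0, 0 };  // one counter per resource class
    for (ParamDecl& p : params)
        p.location = next[static_cast<int>(p.kind)]++;
}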

Historically, the rationale for not providing guarantees about layout assignment was so that shader compilers could optimize away unreferenced parameters. By assigning locations only to those textures or constants that are actually used, it might be possible to compile shaders that would otherwise fail due to resource limits. In the case of GLSL, different implementations might perform different optimizations, and thus some might do a better job of eliminating parameters than others; the final number of parameters is thus implementation-specific.

This historical rationale breaks down for two reasons. First is the simple fact that on modern graphics hardware, the limits are much harder to reach. The Direct3D 10/11 limits of 128 resources, 16 samplers, and 15 constant buffers are more than enough for most shaders (the limit of only 8 UAVs is a bit more restrictive). Second, and more important: if a programmer really cares about staying within certain resource bounds, they will carefully declare only the parameters they intend to use, rather than count on implementation-specific optimizations in driver compilers to get them under the limits (at which point they could just as easily use explicit locations).

One wrinkle is that common practice in HLSL is to define several shaders in the same file, and to declare uniform and resource parameters at the global scope. This practice increases the apparent benefit of optimizing away unreferenced parameters. The underlying problem, though, is that the language design forces programmers to use global variables for what are, logically, function parameters. Trying to “fix” this design decision by optimizing away unused parameters is treating the symptoms rather than the disease.

As far as I can tell, there is no particularly compelling reason why a modern shading language should not just assign locations to parameters in a straightforward and deterministic manner. We need an ABI and calling convention for the interface between application and shader code, not a black box.

So Then What About Explicit Locations?

If our compiler assigned locations in a deterministic fashion, would there still be a need for explicit location bindings? Deterministic assignment would serve the same purpose in eliminating the need for a “reflection” API to query parameter bindings (though one could, of course, still be provided).

The remaining benefit of explicit locations is that they allow us to make the parameter signatures of different shaders “match,” as described above. In the simplest case, a deterministic assignment strategy can ensure that if two shaders share some initial subsequence of parameters in common, then those parameters receive matching locations. In the more complex cases (where we want a “gap” in the parameter matching), the answer is right there in C already: unions let us create heterogeneous layouts just fine.
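
In C terms, the sketch below (hypothetical types and names) shows how a shared prefix plus a union-style “gap” could describe two shader signatures that still agree on layout:

// Shared prefix: receives the same locations in every shader that uses it.
struct CameraParams {
    float view[16];
    float proj[16];
};

struct ForwardShaderParams {
    CameraParams camera;    // locations match ShadowShaderParams.camera
    union {                 // the "gap": either variant occupies the same slots
        float pointLight[8];
        float spotLight[8];
    } light;
    float material[4];      // stable location regardless of the variant chosen
};

struct ShadowShaderParams {
    CameraParams camera;    // same locations as in ForwardShaderParams
    float bias[4];
};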

All we need to do, then, is provide a shading language the same kinds of tools that C has for describing layouts (structs and unions), and we should be able to rid ourselves of all these fiddly layout and register declarations.

So What Now?

In the long run, this whole issue is probably moot, as “bindless” resource mechanisms start to supplant the very idea of binding locations as a concept in shading languages and graphics APIs.

In the short term, though, I hope that we can get support for deterministic and predictable layout assignment in near-future versions of HLSL and GLSL. This would allow us to write cleaner and simpler shader declarations, without having to compromise the simplicity of our C/C++ application logic.


4 thoughts on “Are Explicit Location Bindings a Good Idea for a Shading Language?”

  1. Perhaps this is slightly off-topic, but here is a case where I liked the explicit mapping:

    In OpenGL, when you have a vertex buffer with different vertex attributes in it, these need to be mapped to shader input names. Either you need to bind the vertex buffers to different vertex attribute indices depending on what shader you are using, or you need some scheme using explicit/fixed attribute channels for them.

    If you go for different bindings depending on the shader, you could of course optimize this by having a different vertex array object for each shader/mesh combination, but I don’t like to keep that kind of data around.

    I would much rather have static allocations of all the vertex attributes I will ever use, so that I can just switch shaders and it will just work.

    I did not see you cover vertex attributes at all in your post, so I’m not sure if it is relevant to your argument!

  2. “this issue means that linkage between separately compiled vertex and fragment shaders requires an exact signature match (no attributes dropped)” — “attributes” is a bit misleading here, as the specifications use this word for something else. “No varyings dropped”… legacy language; or “no output variables or block members dropped in the subsequent shader input interface”… erm…

    It seems that you suggest that variable elimination optimizations are not needed because the limits are much higher. For some interfaces, that might be true, but at least for uniform samplers and for varying variables I disagree. For the first case, because 16 uniform samplers per stage is not much. OK, that’s just an ARB failure. For the second case, because vertex outputs would be stored in LDS, and cache is always limited.

    I agree that the overall layout and binding qualifier design is ugly, and one of the issues is that Khronos doesn’t want to consider the global picture of the GLSL shader interface. Furthermore, yes, it’s ugly as hell and its history is even worse, but Jakob’s case with vertex attributes can be generalized to all resources.

    The way I see the future of rendering is something like 10 glMultiDraw calls per frame, each draw accessing the resources and code it needs through dynamically uniform indexing, and only switching programs for really significantly different passes: for example, “shadow map generation”, “rendering”, “shading”. However, that doesn’t even mean that we would need to rebind all the resources we need for rendering when we come back to that step. Yes, working at a lower binding rate than once per frame is pretty challenging today, and that’s not because the hardware can’t do it. The goal of that perspective is much reduced CPU overhead and finer-granularity scheduling in the GPU, so that each execution unit would reach full load while working on individual tasks. The large-batch approach is just an ugly quick hack which gives us full GPU load at the price of efficiency. We can’t afford that any longer.

    When blocks were introduced, I had the same feeling: why don’t we use structures instead and capitalize on them? Structures answer every single use case, or at least more of them than blocks do. I wish I had an answer for “cleanness” but I haven’t. I agree that there is something wrong when we design languages that manage to be more convoluted than C++ overload resolution. :p Who is going to figure out all the rules involved? Besides a few ARB members? No one.

    Finally, you said that with an ABI there would be no more need for a reflection API. The only valid use case of the reflection API today is debugging; otherwise, there is this massive rebinding overhead. That would not change with an ABI, because an external debug tool would be incapable, without reflection, of figuring out the interfaces of a shader.

    Great article, I hope you are planning on more!

  3. I try to avoid explicit mapping whenever possible, and let the shader compiler tell me the layout. I like that it can optimize stuff away, and I haven’t run into perf issues yet piecing together the shader constant buffers (although the last time I really did this was on PS3 SPUs, which are fast 🙂).

    For common data that should be lumped together (scene camera stuff), I use structs, where the shader compiler doesn’t optimize away members of the struct, so the layout is guaranteed and I can use the same buffer across shaders.

    I also tend to use a struct between the shader stages so the interpolators line up.

  4. “In practice, most C compilers provide fairly strong guarantees about struct layout, and conform to a platform ABI which guarantees the calling convention for functions, even across binaries generated with different compilers.” — if only this were true on my last big job. 😉 The MSVC compiler for XB1 and the Clang-based compiler for PS4 produce different layouts when you consider structs contained within other structs (or classes). This then required hand-tuning placement with dummy values to ensure identical layouts (which were needed for reasons).
