A Digression on Divergence

Most people who do serious GPU work understand that shaders and other GPU programs are executed in a SIMD or “SIMT” fashion using “warps” or “wavefronts” that group together multiple program instances (vertices, fragments, work-items, “threads”). It is also widely understood that when the condition computed for a dynamic if/else is different across these instances, the warp must execute “both sides.”

Aside: If you weren’t aware of any of that, you might like to look over this presentation from Kayvon Fatahalian before you proceed.

Terminology

Our terminology for GPU programming is kind of a mess, since every hardware vendor and every API/language seems to invent its own nomenclature. In this post, I’ll refer to individual running instances of a program as invocations (you might know these as vertices, fragments, work-items, or “threads”), and the implementation-specific aggregates of these instances that are actually scheduled as warps (you may know these as wavefronts or threads).

Caveat: For those familiar with OpenCL or Compute Shader, a warp is not the same thing as a “work-group” or “thread-group.”

When the different invocations in a warp disagree on the way to go at a conditional branch, we will say they diverge. This is actually a terrible choice of words, since divergence is already used in CS theory to refer to non-termination, but there aren’t many good alternatives.

The whole topic is closely related to the question of when particular expressions/variables are uniform or varying across a warp.

So Why Does This Matter?

Many people don’t really understand what goes on for such “divergent” branches, beyond the “execute both sides” mantra. The various hardware vendors tend not to talk about it, and the API/language specs are surprisingly mum.

Maybe it doesn’t matter. After all, who cares how it is implemented if it Just Works™, right?

Well, the problem is that ever since we put things like global-memory atomics into our GPU languages, the behavior of divergent branches has been semantically observable. This is perhaps most pronounced if you try to get clever and write a per-pixel mutex, as my colleague Marco Salvi did. The following example is in HLSL, but you can write an equivalent program in GLSL, OpenCL, or CUDA:

// We will use each 'uint' entry in this buffer as a lock
RWStructuredBuffer<uint> lockBuffer;

void FeelingClever()
{
    uint pixelID = ...; // compute which lock to use

    // try to acquire per-pixel lock,
    // using compare-and-swap

    [allow_uav_condition] // need this to appease fxc...
    while (true)
    {
        uint oldValue;
        InterlockedCompareExchange(lockBuffer[pixelID],
                                   0, 1, oldValue);

        // if we successfully swapped 0->1,
        // then we got the lock
        if( oldValue == 0 )
            break;
    }

    DoSomethingWhileHoldingLock();

    // release the lock and give everybody else a shot
    lockBuffer[pixelID] = 0;
}

In a world where every Pixel Shader invocation ran as a truly independent thread, this ought to work. If you try to run code like this on current D3D11 GPUs, though, you will find that it seems to work on some implementations, and breaks on others.

The crux of the problem is that a subset of the shader invocations in warp A might grab their mutex right away, while others fail to grab theirs, forcing the entire warp to spin in the while(true) loop. If the locks that A is waiting for are held by a different warp B, which is also in the same situation, then we have a deadlock.

Now, you might decide that this code just isn’t clever enough (as I initially did, when Marco brought this problem to me). If the warps are stuck spinning in that while(true) loop, then the solution might be as simple as rearranging the code so they never have to leave the loop in the first place:

RWStructuredBuffer<uint> lockBuffer;

void FeelingExtraClever()
{
    uint pixelID = ...;
    [allow_uav_condition]
    while (true)
    {
        uint oldValue;
        InterlockedCompareExchange(lockBuffer[pixelID],
                                   0, 1, oldValue);

        // if we successfully swapped 0->1,
        // then we got the lock
        if( oldValue == 0 )
        {
            // rather than break out of the loop,
            // just do the work here
            DoSomethingWhileHoldingLock();

            // release the lock and give
            // everybody else a shot
            lockBuffer[pixelID] = 0;

            // only bail from the loop after
            // we've release the lock
            break;
        }
    }

}

It turns out that this version didn’t work either. I honestly can’t remember which versions failed on which hardware, but nothing we tried worked across the board.

Stepping Back

At this point it is worth stopping to ask: why might we expect one of these two options to work, and not the other? What kind of assumptions are we making about how the HW handles divergent branches? Are those assumptions valid in practice?

It turns out that we naturally make a lot of very strong intuitive assumptions about how GPUs will execute our programs, and most of these assumptions aren’t actually guaranteed by the programming models or the hardware.

Aside: This strikes me as similar to the situation with memory models, where people have a strong tendency to assume sequential consistency, just because it is so natural. Un-learning “common sense” is one of the hardest parts of parallel programming.

What Do We Know?

Let’s start with the stuff that seems to be common across most GPU architectures.

We have our warp, made up of several invocations. Along with that we have an active mask which tells us which of the invocations are currently live. The warp as a whole has a single program counter, from which it reads and executes instructions for all of the invocations in the active mask.

Aside: Some people will insist that GPUs have a distinct program counter for every invocation in the warp. These people are either (A) confused, (B) trying to confuse you, or (C) working with some really odd hardware.

When the warp executes a divergent branch, the set of active invocations divides into two (or more) subsets. The warp must pick one of these subsets for the new active mask, and defer the other(s). It is easiest to think of the deferred invocations as sitting on a “divergence stack,” although particular implementations may vary a lot between vendors.
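
To make this model concrete, here is a minimal sketch (C-style, assuming a 32-wide warp) of the state a warp might track and of what a divergent branch does to it. This is an illustration of the general idea only, not any particular vendor’s design:

#include <cstdint>
#include <vector>

struct DeferredPath {
    uint32_t mask;          // invocations waiting to run the other way
    uint32_t pc;            // where they should resume
    uint32_t reconvergePC;  // where the warp expects to re-converge
};

struct WarpState {
    uint32_t pc;                        // one program counter for the whole warp
    uint32_t activeMask;                // which invocations execute instructions
    std::vector<DeferredPath> stack;    // the "divergence stack"
};

// Executed when the warp hits a conditional branch. 'takenMask' holds the
// invocations whose condition says to take the branch.
void ExecuteBranch(WarpState& w, uint32_t takenMask,
                   uint32_t takenPC, uint32_t fallthroughPC,
                   uint32_t reconvergePC)
{
    uint32_t taken    = w.activeMask & takenMask;
    uint32_t notTaken = w.activeMask & ~takenMask;

    if (taken == 0 || notTaken == 0)
    {
        // no divergence: every active invocation agrees on the direction
        w.pc = taken ? takenPC : fallthroughPC;
        return;
    }

    // divergence: run one subset now, defer the other
    // (which subset goes first is the subject of the sections below)
    w.stack.push_back({ notTaken, fallthroughPC, reconvergePC });
    w.activeMask = taken;
    w.pc         = takenPC;
}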

What Don’t We Know?

At this point I hope some questions are nagging at you, in particular:

  • When we split the active set at a divergent branch, which invocation(s) get “followed” first, and which deferred?
  • When do the invocations that get deferred get picked up and followed?
  • When do the invocations that have diverged “re-converge”?

Because current GPU programming languages do not specify these behaviors, there is a lot of latitude in how different hardware can answer these questions. I can’t easily comment on what particular vendors do (except where information is public), but I can try to illuminate the breadth of the design space.

A particular hardware platform might support structured control-flow instructions (akin to the control flow operations in the D3D shader bytecode), or unstructured branches (c.f. CUDA’s PTX), or both. Things that might seem obvious in the high-level language code can sometimes be subtle when you are looking at a low-level ISA.

Which Path Gets Followed First?

In general, it shouldn’t matter which path gets followed first at a divergent branch, although my per-pixel mutex example clearly depends on the invocation(s) that obtained their lock being followed before those that failed to obtain theirs. There are, however, some cases where an implementer might clearly want to follow one option over another.

If we have a one-sided if:

if (someVaryingCondition)
{
    // inside
}
// after


it seems obvious that we should follow the invocations that go inside the if before those that are just going to skip over it. Of course, remember that “obvious” can be a dangerous word.

Similarly, for the back-edge on a do-while loop:

do
{
    // inside
}
while (someVaryingCondition);


it seems obvious to follow the invocations that continue looping first.

Let’s look at some possible strategies:

Follow the “branch taken” option first

For a simple conditional branch in the hardware ISA, we could prefer to follow the invocations that take the branch over those that continue normally. For the loop example above, this does the obvious thing, but not for the if example.

Follow the “not taken” option first

This option would do the expected thing for the if, but go against intuition for the loop. It might also be better for instruction caching, since the instructions immediately after the branch are likely to be in the cache already.

Follow “taken” first for back edges, “not taken” for forward edges

We can switch between the two strategies above based on whether the destination of the branch is before or after the branch instruction itself. This gives the expected “obvious” behavior for both of the examples above (assuming the compiler lays out basic blocks in the same order as the high-level code).

Follow whichever path has fewer active invocations

This may sound like an odd design decision, but it is precisely what is described in AMD’s Southern Islands (GCN) ISA documentation.

If we always follow the path that has fewer active invocations, then we know that every divergent branch reduces the number of active invocations by at least half. As a result, the maximum depth of our divergence stack is logarithmic, rather than linear, in the size of the warp (6 rather than 64 entries, in the case of GCN).
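
As a toy illustration of the policy (not what GCN hardware literally does), the choice amounts to nothing more than comparing population counts of the two masks:

#include <bit>      // std::popcount, C++20
#include <cstdint>

// Run the smaller subset first: every divergence then at least halves the
// active set, so the divergence stack depth is bounded by log2(warp width).
uint32_t ChooseFirstPath(uint32_t takenMask, uint32_t notTakenMask)
{
    return (std::popcount(takenMask) <= std::popcount(notTakenMask))
         ? takenMask
         : notTakenMask;
}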

Multi-way (switch) and indirect branches (function pointers) add more complexity, but we already have several viable design alternatives given only two-way branching.

Where/when do we re-converge?

After a divergent branch, a warp is executing with reduced SIMD efficiency: every instruction only applies to a subset of the invocations. In order to reclaim that efficiency, we would like to re-converge and continue executing with the original active mask.

This isn’t just a performance concern, though. There are various operations that communicate or synchronize between the invocations in a warp: finite-difference derivatives in fragment shaders, synchronization barriers in compute languages, and horizontal SIMD reduction operations (e.g., CUDA’s warp-granularity “voting” operations). These operations typically require control-flow reconvergence for their semantics.

In practice, the “re-convergence point” for any potentially-divergent branch must be selected at compile time by the driver’s low-level compiler. This ensures that when the micro-architecture encounters a divergent branch at runtime, it already knows where it can expect to re-converge, and can arrange any internal data structures accordingly.

Re-converge at the end of an if or loop

We as programmers tend to assume that after a divergent branch, the invocations re-converge as soon as possible: typically at the end of the associated high-level control-flow construct. Note, though, that current GPU programming systems do not give this as a guarantee, no matter how intuitive it might seem.

The OpenGL Shading Language spec defines a notion of uniform and non-uniform control flow, and specifies that uniform control flow resumes at the end of a high-level control-flow construct. This spec language is meant to capture the essence of divergence and re-convergence, but it should be noted that it is only used to define when certain operations (e.g., partial-difference derivatives in a fragment shader) are valid. No stronger guarantees are made.

When using a language with unstructured control flow (goto in OpenCL or CUDA; PTX), high-level control-flow structures may not be discernible. In these cases, re-convergence rules must be defined on an arbitrary control-flow graph (CFG).

Re-converge at the immediate post-dominator

A simple approach that can be applied to an arbitrary CFG (and is thus applicable to assembly-level interfaces, or in the presence of goto) is to identify the immediate post-dominator of the basic block that has the branch.

Aside: For those unfamiliar with the term, it is simplest to think of the immediate post-dominator as the earliest point in the code that control flow must eventually pass through. For a precise definition, please consult any compiler textbook, or Wikipedia.

On paper this sounds like a good approach, but it fails to match our intuitive expectations for several cases of structured control flow:

while( X )
{
    if( A )
    {
        if( Y )
            break;
    }
    B;
}

In this case, we expect any invocations that diverged at the branch on A, but did not subsequently break out of the loop, to re-converge before executing B. However, this isn’t guaranteed by immediate-post-dominator reconvergence, since the immediate post dominator of the A branch is at the end of the while loop, and not at the end of the if.

The same basic problem arises for switch statements with fall-through:

switch( A )
{
case 0:
    X;
    // fall through
case 1:
    B;
    break;
default:
    break;
}

Here again, an immediate post-dominator rule tells us to re-converge after the whole switch, and does not tell us to re-converge the case 0 invocations with the case 1 invocations before executing B.

As a reminder, re-convergence isn’t just about performance. If B in these examples included a Compute Shader barrier operation, then failure to re-converge at the right place could cause a deadlock.

Re-converge at “thread frontiers”

This paper describes a refined approach to re-convergence that addresses some of the shortcomings of immediate post-dominator reconvergence. They define “thread frontiers” that capture the intuitive notion of re-converging as soon as possible after a branch. Achieving thread-frontier reconvergence can require adding new conservative control flow edges (forcing the warp to visit basic blocks it would otherwise skip). The paper also describes new hardware requirements to implement this scheme efficiently.

Maximal convergence

The ISPC system, which implements a GPU-style programming model on SIMD CPU cores, defines a notion of “maximal convergence.” Intuitively, this can be thought of as re-convergence at each and every sequence point as defined by the C language. If operation X happens before Y from the point of view of one invocation, then X for all invocations appears to happen before Y for any invocation in the warp.

This very strict model of convergence is often what people think of when they talk about “warp-synchronous programming,” or about invocations in a warp executing in “lock-step.” This kind of semantic guarantee can be beneficial to users (it would allow us to write a working per-pixel mutex), but it comes at the cost of flexibility for compiler optimizations. For example, if an implementation wants to emulate 16-wide warps on 8-wide SIMD hardware by doubling up math instructions, it is very restricted in how it can schedule those independent 8-wide operations.

Other than ISPC, I am not aware of any GPU-like programming models that give this strong guarantee. CUDA documentation used to give general advice for programmers attempting to use warp-synchronous approaches, but more recent documentation seems to advise against the practice.

When Are Deferred Paths Followed?

When control flow for a warp arrives at a re-convergence point, it needs to run any invocations for the “other” path(s) that had been deferred before it continues execution past the re-convergence point. The simplest way this might be achieved is by popping those deferred invocations off of the “divergence stack” (or alternative hardware mechanism).

This is one case where it seems like there really is an obvious Right Thing to do, and I’m not aware of a lot of design alternatives. Of course, if you don’t know which path is followed first, or where re-convergence points will be placed, there still isn’t much you can do.

Conclusion

If you find yourself trying to write “clever” GPU code that depends on order of execution or memory operations between parallel invocations within a single warp, please be aware that you are stepping into a minefield of implementation-specific behavior. For those of you targeting fixed hardware platforms, you might be able to find out what your GPU is doing under the hood and exploit this knowledge.

For those of you working on more open platforms, I would invite you to carefully read through the available API and language documentation. As a general rule, if the API/language spec doesn’t explicitly guarantee the behavior you are hoping for, you can bet that some hardware out there violates your assumptions.

I hope I’ve given you a sense of just how deep this semantic rabbit-hole goes.

Are Explicit Location Bindings a Good Idea for a Shading Language?

Probably Not.

Introduction

Both the HLSL and GLSL shading languages support mechanisms for assigning explicit, programmer-selected locations to certain shader parameters. In HLSL, this is done with the register keyword:

// texture/buffer/resource:
Texture2D someTexture          : register(t0); 

// sampler state:
SamplerState linearSampler     : register(s0);

// unordered access view (UAV):
RWStructuredBuffer<T> someData : register(u0);

// constant buffer:
cbuffer PerFrame               : register(b0)    
{
    // offset for values in buffer:
    float4x4 view              : packoffset(c0);
    float4x4 proj              : packoffset(c4);
    // ...
}

When setting shader parameters through the Direct3D API, these explicit locations tell us where to bind data for each parameter. For example, since the cbuffer PerFrame is bound to register b0, we will associate data with it by binding an ID3D11Buffer* to constant buffer slot zero (with, say, PSSetConstantBuffers).
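
For concreteness, the application-side half of that contract looks something like the following minimal sketch, assuming an ID3D11DeviceContext* named context and an ID3D11Buffer* named perFrameBuffer created elsewhere:

// register(b0) in the shader corresponds to StartSlot 0 here.
ID3D11Buffer* buffers[1] = { perFrameBuffer };
context->PSSetConstantBuffers(0, 1, buffers);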

The OpenGL Shading Language did not initially support such mechanisms, but subsequent API revisions have added more and more uses of the layout keyword:

// texture/buffer/resource
layout(binding = 0) uniform sampler2D someTexture;

// shader storage buffer (SSB)
layout(binding = 0) buffer SomeData { T someData[]; };

// uniform buffer:
layout(binding = 0) uniform PerFrame
{
    mat4 view;
    mat4 proj;
    // ...
}

// input and output attributes:
layout(location = 2) in  vec3 normal;
layout(location = 0) out vec4 color;

// default block uniforms (not backed by buffer)
layout(location = 0) uniform mat4 model;
layout(location = 1) uniform mat4 modelInvTranspose;

It is clear that location binding was an afterthought in the design of both languages; the syntax is ugly and obtrusive. Using explicit locations can also be error-prone, since it becomes the programmer’s responsibility to avoid conflicts, etc., and ensure a match between application and shader code. The shading languages “want” you to write your code without any explicit locations.

If you talk to game graphics programmers, though, you will find that they use explicit locations almost exclusively. If you try to give them a shading language without this feature (as GLSL did), they will keep demanding that you add it until you relent (as GLSL did).

Why Do We Need These Things?

If the programmer does not assign explicit locations, then it is up to the shader compiler to do so. Unfortunately, there is no particular scheme that the compiler is required to implement, and in particular:

  • The locations assigned to parameters might not reflect their declared order.
  • A parameter might not be assigned a location at all (if it is statically unreferenced in the shader code).
  • Two different GL implementations might (indeed, will) assign locations differently.
  • A single implementation might assign locations differently for two shaders that share parameters in common.

When an application relies on the shader compiler to assign locations, it must then query the resulting assignment through a “reflection” interface before it can go on to bind shader parameters. What used to be a call like PSSetConstantBuffers(0, ...) must now be something like PSSetConstantBuffers(queriedLocations[0], ...). In the case of Direct3D, these locations can be queried once a shader is compiled to bytecode, after which the relevant meta-data can be stripped, and the overhead of reflection can be avoided at runtime; this is not an option in OpenGL.
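
As a hedged sketch of what that reflection step looks like on the OpenGL side (assuming a linked program object, a uniform buffer object perFrameUBO, and matrix data modelMatrix created elsewhere):

// Ask the compiler where it decided to put things...
GLint  modelLoc      = glGetUniformLocation(program, "model");
GLuint perFrameIndex = glGetUniformBlockIndex(program, "PerFrame");

// ...and only then bind data to those queried locations.
glUniformMatrix4fv(modelLoc, 1, GL_FALSE, modelMatrix);
glUniformBlockBinding(program, perFrameIndex, 0);
glBindBufferBase(GL_UNIFORM_BUFFER, 0, perFrameUBO);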

Even statically querying the compiler-assigned locations does not help us with the issue that two different shaders with identical (or near-identical) parameter lists may end up with completely different location assignments. This makes it impossible to bind “long-lived” parameters (e.g., per-frame or camera-related uniforms) once and re-use that state across many draw calls. Every time we change shaders, we would need to re-bind everything since the locations might have changed. In the context of OpenGL, this issue means that linkage between separately compiled vertex and fragment shaders requires an exact signature match (no attributes dropped), unless explicit locations are used.

As it stands today, you can have clean shader code at the cost of messy application logic (and the loss of some useful mix-and-match functionality), or you can have clean application logic at the cost of uglier shader code.

A Brief Digression

On the face of it, the whole situation is a bit silly. When I declare a function in C, I don’t have to specify explicit “locations” for the parameters lest the compiler reorder them behind my back (and eliminate those I’m not using):

int SomeFunction(
    layout(location = 0) float x,
    layout(location = 1) float A[] );

When I declare a struct, I don’t have to declare the byte offset for each field (or, again, worry about unused fields being optimized away):

struct T {
    layout(offset = 0) int32_t x;
    layout(offset = 4) float y;
};

In practice, most C compilers provide fairly strong guarantees about struct layout, and conform to a platform ABI which guarantees the calling convention for functions, even across binaries generated with different compilers. A high level of interoperability can be achieved, all without the onerous busy-work of assigning locations/offsets manually.

Guarantees, and the Lack Thereof

Why don’t shader compilers provide similar guarantees? For example, why not just assign locations to shader parameters in a well-defined manner based on lexical order: the first texture gets location #0, the next gets #1, and so on? After all, what makes the parameters of a shader any different from the parameters of a C function?

(In fact, the Direct3D system already follows just such an approach for the matching of input and output attributes across stage boundaries. The attributes declared in a shader entry point are assigned locations in the input/output signature in a well-defined fashion based on order of declaration, and unused attributes aren’t skipped.)
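
To make the idea concrete, here is a sketch of the sort of deterministic rule being argued for; the data structures are hypothetical, not taken from any existing compiler:

#include <map>
#include <string>
#include <vector>

enum class RegisterClass { Texture, Sampler, ConstantBuffer, UAV };

struct ParamDecl { std::string name; RegisterClass regClass; };

// Deterministic, order-based assignment: the first texture gets t0, the
// next gets t1, and so on, with nothing skipped for "unused" parameters.
std::map<std::string, int> AssignLocations(const std::vector<ParamDecl>& params)
{
    std::map<RegisterClass, int> nextSlot;
    std::map<std::string, int>   locations;
    for (const ParamDecl& p : params)
        locations[p.name] = nextSlot[p.regClass]++;
    return locations;
}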

Historically, the rationale for not providing guarantees about layout assignment was so that shader compilers could optimize away unreferenced parameters. By assigning locations only to those textures or constants that are actually used, it might be possible to compile shaders that would otherwise fail due to resource limits. In the case of GLSL, different implementations might perform different optimizations, and thus some might do a better job of eliminating parameters than others; the final number of parameters is thus implementation-specific.

This historical rationale breaks down for two reasons. First is the simple fact that on modern graphics hardware, the limits are much harder to reach. The Direct3D 10/11 limits of 128 resources, 16 samplers, and 15 constant buffers are more than enough for most shaders (the limit of only 8 UAVs is a bit more restrictive). Second, and more important, is that if a programmer really cares about staying within certain resource bounds, they will carefully declare only the parameters they intend to use rather than count on implementation-specific optimizations in driver compilers to get them under the limits (at which point they could just as easily use explicit locations).

One wrinkle is that common practice in HLSL is to define several shaders in the same file, and to declare uniform and resource parameters at the global scope. This practice increases the apparent benefit of optimizing away unreferenced parameters. The underlying problem, though, is that the language design forces programmers to use global variables for what are, logically, function parameters. Trying to “fix” this design decision by optimizing away unused parameters is treating the symptoms rather than the disease.

As far as I can tell, there is no particularly compelling reason why a modern shading language should not just assign locations to parameters in a straightforward and deterministic manner. We need an ABI and calling convention for interface between the application and shader code, not a black box.

So Then What About Explicit Locations?

If our compiler assigned locations in a deterministic fashion, would there still be a need for explicit location bindings? Deterministic assignment would serve the same purpose in eliminating the need for a “reflection” API to query parameter bindings (though one could, of course, still be provided).

The remaining benefit of explicit locations is that they allow for us to make the parameter signatures of different shaders “match,” as described above. In the simplest case, a deterministic assignment strategy can ensure that if two shaders share some initial subsequence of parameters in common, then they assign matching locations to those parameters. In the more complex cases (where we want a “gap” in the parameter matching), it seems like the answer is right there in C already: unions allow us to create heterogeneous layouts just fine.
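
A hypothetical illustration of that idea, using nothing but C-style layout tools (the handle types here are placeholders, not part of any real shading language):

// Placeholder handle types, for illustration only.
struct Texture {};
struct Sampler {};
struct Buffer  {};

struct CommonParams {        // identical prefix => identical locations
    Texture someTexture;     // -> location 0
    Sampler linearSampler;   // -> location 1
};

union VariantSlot {          // both alternatives occupy location 2
    Texture envMap;          // used by one shader
    Buffer  lightList;       // used by another
};

struct ShaderAParams { CommonParams common; VariantSlot v; };  // uses v.envMap
struct ShaderBParams { CommonParams common; VariantSlot v; };  // uses v.lightList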

All we need to do, then, is provide a shading language the same kinds of tools that C has for describing layouts (structs and unions), and we should be able to rid ourselves of all these fiddly layout and register declarations.

So What Now?

In the long run, this whole issue is probably moot, as “bindless” resource mechanisms start to supplant the very idea of binding locations as a concept in shading languages and graphics APIs.

In the short term, though, I hope that we can get support for deterministic and predictable layout assignment in near-future versions of HLSL and GLSL. This would allow us to write cleaner and simpler shader declarations, without having to compromise the simplicity of our C/C++ application logic.

Spark is on GitHub

The source code for the Spark shading language is now hosted on GitHub, under a commercial-friendly open-source license:

github.com/spark-shading-language/spark

There is also a binary/source package available for download, and for those who just want more details on the Spark language and Direct3D 11 interface, you can look at the User’s Guide. If you missed the SIGGRAPH paper, though, it is probably the best starting point.

If you’d like to give Spark a try – either as part of a research project, or to get ideas for your next in-house shader system – contact me. I need to gather feedback on Spark from users, as part of my continuing research at Stanford, and I’d be happy to provide support to anybody who uses Spark.

Even if you just want to talk about how to integrate some of the more down-to-earth ideas from Spark into your in-house shader system, I’m game. I’ve already gotten feedback from one game developer, who rearchitected their shader-stitching system based on ideas from Spark, and was very happy with the result.

Emulating Classic Games in HD – Part 2

Last time, we learned the basics of the Game Boy hardware, and tried a simple strategy for rendering games with high-res graphics. Sometimes this simple approach worked well:

but other times it didn’t work at all:

In this part we’ll look at the timing of the Game Boy display hardware, and see how this interferes with our simple scheme.

Horizontal Blank

Computing the appropriate color for a pixel, given the background and sprites, takes time. As such, the graphics hardware does not compute colors for the entire screen simultaneously, but instead renders pixels one at a time via raster scanning.

If we imagine how an old-fashioned CRT display scans out an image, there is an “electron gun” that scans each row of pixels from left to right, and proceeds down the screen through the rows. While the GameBoy uses an LCD screen rather than a CRT, it employs a similar notion of how the screen is scanned. Pixels are scanned from left to right in each horizontal line, and lines are scanned from the top down.

After each line has been scanned, it takes time for the “electron gun” to move back to the start of the next line; this time is called the horizontal blank. At the end of the last line, it takes an even longer time to scan back to the start of the first line; this time is called the vertical blank.

During the two blanking periods, no rendering takes place. Because the rendering hardware is idle during this time, it is possible for software to safely change the values of hardware registers and affect how the next line will be drawn. For example, in Super Mario Land, the level scrolls from right to left as the player progresses, but the status/score display does not. In order to achieve this effect, the game changes the horizontal background-scroll register at the end of line 15.

A Less Simple Renderer

Now it should be clear why our simple rendering approach has problems: it only looks at the values of the hardware registers at the end of a frame (vertical blank), but those registers might have changed on each and every scan line (horizontal blank). Once we realize this, though, the problem is really quite simple to work around.

First, we need to “snapshot” the state of the graphics registers at the end of each scan line. A clever implementation might only take a new snapshot when one or more registers actually change, or might store compact deltas, but my implementation takes a pretty naive snapshot at the end of each line.

Next, instead of rendering a textured quad for every tile, we render a one-scan-line-tall textured quad for every tile on every scan line. On each line we render the tile using the state snapshot for that line.
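
In rough outline, the renderer now looks something like the sketch below; the names are illustrative, and the actual code is organized differently:

#include <cstdint>

struct GfxRegisters { uint8_t scrollX = 0, scrollY = 0; /* palettes, OAM, ... */ };

constexpr int kVisibleLines = 144;
constexpr int kTilesPerLine = 21;   // 160 pixels / 8, plus one extra for scrolling

GfxRegisters lineState[kVisibleLines];  // one (naive) snapshot per scan line

// Hypothetical hook, called by the emulator core at each horizontal blank.
void OnHBlank(int line, const GfxRegisters& current)
{
    lineState[line] = current;
}

// Hypothetical draw call into whatever GPU back end is in use.
void DrawOneLineTallQuad(int line, int tile, const GfxRegisters& regs);

void RenderFrame()
{
    for (int line = 0; line < kVisibleLines; ++line)
        for (int tile = 0; tile < kTilesPerLine; ++tile)
            DrawOneLineTallQuad(line, tile, lineState[line]);
}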

This approach is somewhat wasteful of CPU and GPU time, but it rectifies the problems we saw before:

Try It Out!

I’ve gone ahead and thrown the code (such as it is) up on GitHub, just to prove I have nothing up my sleeves. You can also download a pre-built binary of the emulator for Mac (I haven’t had time to work on a Windows build yet, sorry).

Keep in mind that this is only a proof of concept; the controls are hard-wired, there is no support for save states or save slots, and it only supports a handful of Game Boy games.

Next Steps

My goal here was to show that this kind of graphics replacement was possible in theory, and I hope I’ve achieved that. I don’t have the kind of time it would take to turn this project into a real, usable emulator, so if you want to fork the code and run with it, feel free.

Scaling this approach up to a more powerful console would bring some interesting challenges, so I’ll throw out a few notes for anybody who might be interested in trying. Readers should feel free to bail out now if none of this makes sense:

  • I dealt with palettes in the easiest possible fashion: I use a 4-channel image to store weights for each of the colors in a 4-color palette (effectively treating the palette as a basis; see the sketch after this list). For 16-color palettes this would involve 16-channel images, which is still pretty tractable. Beyond that, you’ll probably need a new technique.
  • Because the GameBoy is pure grayscale, it was easy to “project” arbitrary grayscale images into this basis by expressing them as a weighted sum of colors from the palette. For more complex palettes this runs the risk of being over- or under-constrained, so you’d likely want a tool to directly edit replacement images in a paletted fashion.
  • More accurate emulation is often required for more interesting consoles. In order to handle mid-scan-line effects on things like the NES, you might need to draw the replacement graphics on a per-pixel basis, rather than the per-line approach I use. In that case, it would help to detect when the graphics registers change, and only do the more expensive rendering when/where required.
  • I’ve ignored relative sprite priorities in my emulator, but I know that subtle priority details often have big effects (e.g., the clipping for item blocks in Mario 3). The key to getting priority right while still using replacement graphics is to cast the pixel-compositing operations usually applied into something like the Porter-Duff algebra, which will naturally scale to images and alpha transparency.
  • Many SNES games use HDMA to make rounded shapes (e.g., the Power Bomb in Super Metroid) or wavy distortion effects (e.g., Phantoon). If these kind of scanline effects get scaled up, they will become blocky. Ideally, an emulator should be able to interpolate between the state of registers at different scanlines to smooth out these shapes. The challenge then is figuring out when not to interpolate.
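
To illustrate the “palette as a basis” idea from the first bullet above, here is a minimal sketch: each replacement pixel stores one weight per palette entry, and the displayed color is the weighted sum of whatever the palette currently holds. (This is a simplified CPU-side illustration; in a real renderer this math would presumably live in a shader.)

struct Color   { float r, g, b; };
struct Weights { float w[4]; };           // one channel per palette entry

// Resolve one replacement-image pixel against the current 4-color palette.
Color ResolvePixel(const Weights& p, const Color palette[4])
{
    Color out = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < 4; ++i)
    {
        out.r += p.w[i] * palette[i].r;
        out.g += p.w[i] * palette[i].g;
        out.b += p.w[i] * palette[i].b;
    }
    return out;
}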

Emulating Classic 2D Games in HD – Part 1

Many emulators for 3D consoles support rendering games at higher resolutions than supported by the original hardware. Some emulators for older 3D consoles even support replacing the texture maps used in games with higher-resolution versions. Emulators for 2D consoles, such as the venerable Super NES, have no such functionality. Frames are rendered at the original native resolution, and then may be upsampled to higher resolutions using a handful of simple filters.

Why can’t we play classic 2D games in HD? Couldn’t we take the same concept of replacing low-resolution texture maps and apply it to inserting high-resolution sprites or background tiles into 2D games?

I’ve seen comments from emulator creators that this just isn’t possible, but I hadn’t really seen a solid technical explanation. So I decided to find out the answer for myself. Just as a little teaser, though, I’ll say that I was pleasantly surprised by what I could achieve (my own poorly-drawn programmer art aside):



Super Mario Land, now at 4 times the resolution!

The Game Boy

In order to understand both the opportunities and challenges, we need to cover some background on sprite-based 2D graphics hardware. To keep things simple, both for my implementation and for this article, I’ve chosen to focus on the original Nintendo Game Boy.

The Game Boy has a 160×144 pixel LCD screen, and uses 4 grayscale colors (2-bit color). Rather than store the image to be displayed in a frame buffer, the hardware renders the image by compositing three kinds of elements: the background, sprites, and window. For simplicity, I’ll only talk about the first two.

Background Tile Map

The background is a 32×32 grid of tiles, where each tile is an 8×8 pixel image. Thus the background defines a conceptual 256×256 pixel image. We can represent the background as a 32×32 array of tile indices (the tile map) along with an array of the corresponding tile images.

uint8_t tileMap[32][32];
TileImage tileImages[256];

Since the background is larger than the screen, two 8-bit hardware registers are used to scroll the background and select what part will be visible. As we scroll past the edge of the background, it repeats. A game can replace tile indices that are out of view to implement a continuously-scrolling map.
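
Here is a sketch of how the scroll registers select (and wrap) the visible portion of the background, building on the declarations above:

#include <cstdint>

extern uint8_t tileMap[32][32];   // declared above
uint8_t scrollX, scrollY;         // the two background-scroll registers

// Which tile, and which pixel within it, covers a given screen pixel?
// The '& 255' wrap is what makes the 256×256 background repeat.
void LookUpBackground(int screenX, int screenY,
                      int& tileIndex, int& pixelX, int& pixelY)
{
    int bgX = (screenX + scrollX) & 255;
    int bgY = (screenY + scrollY) & 255;
    tileIndex = tileMap[bgY / 8][bgX / 8];
    pixelX    = bgX % 8;
    pixelY    = bgY % 8;
}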

Sprites

The sprites are defined by a structure called the Object Attribute Memory (OAM). The Game Boy supports 40 sprites, and each sprite has an X and Y position, flags for horizontal and vertical flip, and the index of the tile image to use:

struct Sprite {
    uint8_t x;
    uint8_t y;
    bool flipX;
    bool flipY;
    uint8_t tileIndex;
};
Sprite oam[40];

Tiles

Both the background and sprites use tiles to define their image data. Each tile is a 2-bit-per-pixel paletted image (yes, even though the Game Boy only supports 2-bit color, it uses paletted images). An 8×8 pixel tile takes up 8×8×2 = 128 bits (16 bytes). There are distinct color palettes for sprites and the background:

Color bgPalette[4];
Color spritePalette[4];

In the sprite palette, color index zero is special: it indicates a transparent pixel.
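
For reference, decoding a single row of a tile into palette indices looks roughly like this, assuming the standard Game Boy bit-plane layout (two bytes per row, low plane first, bit 7 being the leftmost pixel):

#include <cstdint>

// Convert one row of a 16-byte tile into 8 palette indices (0..3).
void DecodeTileRow(const uint8_t tileData[16], int row, uint8_t out[8])
{
    uint8_t lo = tileData[row * 2 + 0];    // low-bit plane
    uint8_t hi = tileData[row * 2 + 1];    // high-bit plane
    for (int x = 0; x < 8; ++x)
    {
        int bit = 7 - x;                   // bit 7 is the leftmost pixel
        out[x] = uint8_t(((lo >> bit) & 1) | (((hi >> bit) & 1) << 1));
    }
}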

Take-Away

There are, of course, a lot more details to the Game Boy rendering hardware, and other 2D platforms will of course have different designs. Still, these basic concepts – tile maps, sprites, and paletted tile images – are shared by many different 2D consoles.

Tile Replacement

In order to render a Game Boy game at higher resolution, we must find a way to map the existing 8×8 tiles to high-res replacements. A simple approach would be to hash the contents of the tile image, and use the resulting hash as a key for a table of replacements. In the case of the Game Boy, tile images are only 16 bytes, so we can just use the tile data itself as our hash key. Of course we also need the replacement images; for now I’ll be using my own programmer art:

On the left we have an original sprite from Super Mario Land, and on the right we have my replacement. The use of paletted images complicates this process somewhat, but let’s ignore those details for right now.

A Simple Scheme

At first blush it seems like we could accomplish our goal with a relatively simple renderer. Whenever a frame needs to be drawn we can:

  • Iterate over the background tile map, and draw one textured quad per background tile.
  • Iterate over the sprites, and draw one textured quad per sprite.

We need a cache that maps tile images to texture maps, but that can easily be folded into our tile-replacement step.
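
That cache might look something like the sketch below; the texture-handle type and the loader are hypothetical stand-ins for whatever the rendering back end provides:

#include <array>
#include <cstdint>
#include <map>

using TileKey = std::array<uint8_t, 16>;  // the raw 16-byte tile data is the key

struct TextureHandle { unsigned id = 0; };                    // hypothetical GPU handle
TextureHandle LoadOrGenerateReplacement(const TileKey& key);  // hypothetical loader

std::map<TileKey, TextureHandle> replacementCache;

// Return the high-res replacement for a tile, creating it on first use.
TextureHandle GetReplacement(const TileKey& key)
{
    auto it = replacementCache.find(key);
    if (it != replacementCache.end())
        return it->second;

    TextureHandle tex = LoadOrGenerateReplacement(key);
    replacementCache[key] = tex;
    return tex;
}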

Let’s try this scheme out on a few games (note, I’m only replacing a subset of tiles right now – even programmer art takes time to create):




Link's Awakening

Those results seem good (my terrible art skills aside). Suspiciously good, in fact, if I’m claiming that this problem merits a whole article. And yes, if we look a little harder we can find examples where the results of our simple renderer are not as satisfying:




Super Mario Land again, not looking as good




Castlevania II looks like a complete mess

What is going on in these images that makes them fail? In Part 2 of this series, we will discuss the timing of sprite hardware, and see how to resolve these graphical issues.

Metroid: Other M

When I find that my feelings towards a game are ambiguous, I like to look at its design decisions (whether gameplay, technology, art direction, or narrative) and categorize them into “what worked” and “what didn’t work.” In the case of Other M, though, I found that my “what worked” pile was pretty thin.

Other M is a competently made game (though not, obviously, without bugs). As with almost any Nintendo-published game, it has that layer of polish and an overall cohesiveness that make it stand out from the pack. But these are not enough to make a game good – competence and cohesiveness should be the least we demand from our games.

The best thing I can say about Other M is that it is very aware that it is a Metroid game. Its overall structure (both in narrative and world design) is an obvious homage to Metroid Fusion. Samus’ powers are precisely those she acquired in Super Metroid. The morph-ball sequences show a clear influence from the Prime games. The best enemy designs are those lifted from prior games, and the best boss fights are effectively cameos. Other M even acknowledges some of the more silly aspects of the series mythology (e.g., the “evolved” forms of Metroids) which had been quietly ignored by the Prime games (a series over the course of which the story has increasingly been sanitized into generic sci-fi pablum).

Put that kind of pastiche before a dedicated Metroid fan, and they might play through the whole thing on auto-pilot. Samus is a fun character to control; her full suite of abilities allow most situations to be tackled in a variety of ways, and so mastering her moveset lends a sense of fluency to interactions that most characters lack.

Once Other M steps out of these narrow confines of emulation and homage, though, things get rockier. The modifications and extensions to the Metroid formula all serve to compromise the fluency of the interface between the player and Samus. Perhaps this was intentional – calling attention to and disrupting the interface between the player and Samus as avatar to serve the narrative goals of portraying Samus as Other.

I could almost believe that if the storytelling on display wasn’t so painfully clumsy and ham-fisted at every turn.

I’m definitely in the crowd that doesn’t feel that the Samus we see in Other M is the same woman we saw in the other games. It isn’t that the other games were laden with characterization, but what was there (even in the story-heavy Fusion) is almost entirely contradictory with Other M.

My wife, for her part, sees the portrayal of Samus in Other M as part of an overall downward trend in Nintendo’s handling of the character, one that started with the introduction of the “Zero Suit.” Samus has always stood out as gaming’s first, and for a long time best, female role-model character; now she is portrayed in-canon as walking around in a skin-tight leotard and desperately seeking male approval. One can’t really expect female gamers to respond any better to this than to Aya Brea’s revamped in-game portrayal.

(I’ll leave aside the issue of the Justin Bailey code and the uncomfortable strip-tease that has become a hallmark of Metroid endings. Maybe objectification of Samus has been with us from the start, but it was easier to overlook when it was restricted to low-fi graphics in an “easter egg” or special ending.)

I’ve played enough games with embarrassing or poorly translated stories in my time that I might have been able to give Other M a pass if the gameplay had held up. Unfortunately, once you get out of the “auto-pilot” situation I describe above and look at things carefully, it doesn’t hold up.

Moment to moment, playing Other M gives you the sense that a really great 2.5D Metroidvania could be built on this engine. The same could be said of Chair’s Shadow Complex. The problem for both games is that the world map and player progression just don’t live up to the promise of the moment-to-moment gameplay.

Like Metroid Fusion, Other M actively thwarts attempts by the player to explore out of the prescribed sequence. Arbitrary locked doors that only open on storyline triggers (the so-called “broken bridge” trope) are a distinctly un-Metroid concept. While Fusion also succumbed to this problem, it at least used the design trick of letting the player think they were going “off the rails” even when they were following the designated path (an approach used to great effect in Portal). In contrast, Other M never really tries to give the player this sense of subversion (with one late-game exception: using the power bomb to break out of a stalled elevator), and as a result the player is left with the impression that the designers are leading them on a short leash from one expository cut-scene to another.

Other M has innumerable other gameplay foibles that further limit the potential for enjoyment:

  • The first-person view is tedious and clumsy to use in battle (leading to many cheap hits from enemies while waiting for the Wii remote sensor to register a perspective change), and leads to the most frustrating and horribly un-Metroid gameplay bottlenecks with the mandatory pixel-hunting sequences. The battle sequences that played out in first-person were a frustrating exercise, as the player had no way to avoid damage but to expertly use the clumsy controls. That these unnecessary and unusable first-person elements were used to advertise the game, in a world where Metroid Prime had already shown just how well a first-person Metroid could play, was unbelievable.
  • The fact that most of the hidden items can only be acquired in the late- or post-game as part of a fetch-quest “cleanup” pass is irritating for fans of the classic Metroid formula (although Zero Mission commits this same sin). Putting the last few power-ups behind power-bomb-able panels that trigger pointless button-mashing QTE-laden fights was also a kick in the teeth.
  • For that matter, all of Other M‘s combat “innovations” were pretty lackluster. The idea of giving Samus more melee options might have sounded good on paper, but the implementation basically boiled down to QTE finishers plus a “mash me to not die” button. An even more egregious sin was that the game would sometimes expect QTE input during what looked to the player like a cutscene (without any on-screen prompt), so that a player would never know which threats Samus faced in a cut-scene needed to be dodged, and which were just scripted choreography.
  • This final point leads to the most important realization that led me to conclude that Other M is not a good game (whether interpreted as a Metroid game or not): Other M sets up a contract with the player and then violates its own rules. I’ll get to how in a second, but let’s talk about the “authorization” system:

    The presentation of authorization is patently ridiculous (which negates its purported benefit of “finally” explaining how Samus loses all her powers between games). It also breaks the important link between exploration and discovery that is the core of the Metroidvania genre. Other M doesn’t reward exploration with new abilities for Samus – at best it awards her with a few more missiles, a longer health bar, or shorter charge times – and so exploration is no longer the driving force behind the player’s progression. Instead, abilities are tied to story progression, and so the player’s first priority is (nominally) to advance the story. Unfortunately, the way that new powers are almost always given out right after they would be useful leads players to feel resentful when a new ability is authorized, rather than excited to try out the new tool.

    This is the opposite of how a traditional Metroidvania works. Usually you’d notice ledges you can’t reach or walls you can’t break through, and upon getting the item that would render those impediments surmountable you’d be excited to go back and see where they lead. In Other M you are just annoyed at Adam; his stinginess has forced you to backtrack again.

    For all its faults, though, the authorization system sets up one important and immutable rule – a contract with the player: a new ability can only be used when the requisite “authorization” cut-scene has played, highlighting the new item in Samus’ inventory. As a player you can count on the fact that any puzzle and any battle that you are expected to tackle along the main story-line can be bested with just those abilities that have been explicitly enabled.

    That is, until the very end of the game, where you are suddenly required to make use of Power Bombs, without any prompting, within a limited timeframe, or else Samus will die. Before this point you have probably already seen Samus die rather than use her Varia suit, so why are players expected to know that Samus should now use her Power Bombs without permission rather than die a pointlessly noble death? Once players do figure out this ridiculous “press X to not die” QTE they are rewarded with a post-facto authorization cut scene, but the damage has already been done: the game has violated its contract with the player.

    And for what?

    This scene was probably meant to be a cathartic one for the player: the culmination of the marriage of story and mechanics that the “authorization” system had represented. Unfortunately, the game designers crafted an ending that relied on players feeling a particular way: they wanted players to think that they had nothing to lose and could thus finally cut loose and use the most destructive weapon in their arsenal without concern – Samus finally acting like the independent and resourceful bounty hunter we had always thought her to be.

    Speaking for myself, though, this wasn’t how the game made me feel. I had spent hours of time with the game, in which Other M struck down my every attempt to break out of the strictures of its narrow world design, and had listened to its plodding narrative tell me that Samus was incapable or unwilling to act on her own. Other M had left me feeling powerless and without direction – it had turned me (and Samus) into the kind of person who would sit patiently in the stomach of a monster, waiting for death, not even thinking to use the one weapon in my arsenal that might have saved my life, because the arbitrary fiat of my commanding officer (or rather, of the game designer, finally revealed once the mask of Adam had been cast aside) had not explicitly sanctioned it.

    So there’s another “positive” thing to say about Other M: it would make a good tool for inculcation of recruits into a regimented military organization or totalitarian regime in which independent thought must be suppressed, even to the point of death.

It’s called “Stereo,” not “3D”

When speaking or writing about the entertainment industry’s current gimmick du jour, please refer to it correctly as “stereoscopic 3D” or “stereo 3D.” If there is no chance of confusion with stereo audio (and honestly, how often do we talk about stereo audio any more?) you may simply use “stereo.”

What you may not, however, call it is “3D.”

The term “3D” can, obviously, refer to anything three-dimensional. In the case of films, TV, and video games, it has usually referred to “3D graphics”: the rendering of three-dimensional objects and scenes as viewed through a virtual camera. We’ve had numerous successful 3D animated films (with Toy Story being the first feature-length example), and almost every big-budget live-action movie now uses 3D graphics for special effects. Since the advent of the PSX/N64 generation, a majority of video games have used 3D graphics.

Thus, when I hear otherwise-intelligent people asking whether we will soon be seeing “3D video games,” I cringe. We already have 3D video games – or are games like Half-Life and Mario Galaxy now “2D” simply by virtue of the movie industry’s latest marketing campaign?

Those of us who understand what words mean have a responsibility to use them correctly. This is the only way we can hope to stem the tide of those who use words they don’t understand, incorrectly.

Special Addendum for Pedants

Some readers might want to split hairs and point out that the “3D” graphics I refer to above are always projected onto a 2D screen. It would seem, then, that stereo graphics is “more 3D” by virtue of adding a sense of depth to an image that already has height and width.

This argument is entertaining (as most pedantic arguments are), but ultimately misses the point … or rather, two points:

  • The “3D” in 3D graphics doesn’t refer to the final projected image, but to the object or scene that is being rendered. The dimensionality of the content and the final projected image don’t need to match. You can have a 2D projection of 3D content, just as you could have a stereo projection of a 2D image.
  • Projected stereo graphics are not, in the mathematical sense, three-dimensional. They are merely the sum of two two-dimensional projections. Adding a depth component to a flat image doesn’t change its dimensionality either. Crumple up a piece of paper and it is still a 2D surface.

In the end, the only method of projection that could reasonably lay claim to the name “3D” would be some sort of omnidirectional holographic projection (and researchers are working on various experimental display technologies of this sort). Still, if such displays ever became commonplace, we’d be better off calling them “holographic.”

So there you have it, stereo “3D” apologists (if there are any); your nomenclature doesn’t have a leg to stand on – not even a pedantic and wobbly one.

Final Fantasy XIII

It should no longer be surprising when a new Final Fantasy game leads to violent disagreement among fans of the series. The characteristic marriage of over-the-top production values and highly experimental game design remains a winning formula in spite of – or perhaps because of – the controversy surrounding each new release.

I still haven’t made up my mind about FF13; both the supporters and detractors make valid points. There are some things the game does brilliantly, and other areas where it just falls flat. Given this situation, I thought it would be appropriate to look at some of the design choices made in this game, talking about which of them worked, and which of them didn’t.

What Worked

Automatic Recovery after Battles

Halo brought automatic health recovery to the FPS genre, and now FF13 might bring it to RPGs.

I have maintained (even before FF13) that this is just the Right Way to build an RPG in the modern era. If you look at the trend over the entire Final Fantasy series (which is in many ways representative of the entire JRPG genre), you would see that post-battle recovery has become easier with each installment. FF13 simply jumps to the end-point of this trajectory.

Some hardened RPG vets might complain that automatic recovery robs the game of its challenge. Such a complaint, though, completely misses the point. In older RPGs, surviving to the end of a massive dungeon was always a matter of attrition. Would the little nuisance encounters along the way drain your health, MP and stock of items enough to make the boss fight unwinnable? FF13 does abandon this kind of attrition challenge, but in return it adds a new challenge:

More Difficult Battles

Even in the early/mid game, the no-longer-random battles between boss fights can pose a real challenge in FF13. Enemies hit for a lot more and soak up a lot more damage than in any recent Final Fantasy. Coming into battle with the wrong strategy, or holding to a damage-dealing Paradigm just a bit too long, can quickly bring up the Game Over screen.

The battles are never overly cheap, though, as the more challenging random battles in the earliest JRPGs tended to be. You are unlikely to fall victim to a one-hit KO in the first round of battle, and almost every battle in the story progression can be won on a second or third try by bringing a better strategy to the table. And speaking of second tries…

No Penalty for Failure

Losing a battle in FF13 simply resets your party to a point right before that battle. At most you have to contend with a (skippable) cutscene before you can dive right back into the encounter with your new strategy.

Once again, the old-fashioned RPG vanguard might balk, but FF13 is simply subscribing to a more modern design ethos. Today’s players (if they are anything like me) have less free time, and thus have far less patience for being forced to replay an hour-long slog through a dungeon because of a single mistake made in a boss encounter.

Boss encounters in FF13 can be truly intimidating. Failure is never far away (see More Difficult Battles above), and victory usually depends on an understanding of the new battle mechanics that FF13 brings to the table. This brings us to:

    Chaining, Stagger and Launch

    The Chain Gauge is the key to victory in FF13’s hardest encounters, and adds a welcome layer of complexity to the battle system.

    Every enemy has a Chain Gauge, initially empty. Attacks from a Ravager (aka Black Mage) fill the gauge, but it will quickly drain. Attacks from a Commando (aka Fighter) do little to fill the gauge, but slow its rate of decay. Once the gauge is filled to the top the enemy temporarily enters a Stagger state in which it is more susceptible to attacks of all kinds. Some enemies, when staggered, can be Launched into the air and juggled by successive attacks, preventing them from acting.

    The stagger system is a great addition, and really forms the meat of FF13’s combat. Many enemies are effectively impervious to attack until staggered, while others can have their devastating offensive abilities completely shut down by a timely Launch. Mages and fighters feel balanced, without being interchangeable, because both play essential roles in the stagger mechanic.

    Roles and Character Differentiation

    A big problem looming over the 3D Final Fantasy games has been how to allow players to personalize their character development while still giving each character unique strengths and weaknesses. Final Fantasy VII and VIII occupy one extreme, where characters are nothing more than their base stats and a place to junction materia or spells. Final Fantasy IX stands at the other extreme, with each character belonging to a single designer-selected class (thief, knight, black mage, etc.) and only able to learn abilities appropriate to that class.

    On this spectrum, FF13 occupies a nice middle ground. Every character is allowed to use any class, but has inherent aptitudes that make them most effective in certain roles. These aptitudes aren’t just a matter of their base stats, but are also reflected in the relative cost to upgrade them in each class, as well as the order in which they receive each class’s abilities (if at all). Because of these aptitudes, and because a character can only use one class at a time in battle, the game encourages players to specialize each character in a role that suits them.

    What Didn’t Work

    The Story

    Final Fantasy XIII has an interesting setting (the dual worlds of Cocoon and Pulse) and a somewhat compelling scenario: ordinary people pressed into the service of warring deities, forced to fulfill their prescribed task or suffer a terrible curse. What it lacks, however, is any kind of story arc. After the initial early-game gauntlet is cleared, the protagonists blunder forward without any clear purpose. They don’t even know what task they are supposed to perform, let alone whether they intend to defy it. Once the antagonist steps forward, things only get worse. Even at the end of the game, it is unclear precisely how the protagonists’ ambitions differ from the antagonist’s.

    All of this might not be so bad if the dialogue didn’t consist entirely of the various characters taking turns suffering crises of faith; questioning the morality/feasibility of their nebulously-defined mission. These crises are inevitably resolved with meaningless anime-inspired platitudes (e.g. “We can do it – we just need to believe in each other!”), and do nothing to further the plot or characterization.

    The only visible purpose these ridiculous dialogues serve is to set up the reworked summons:

    The Summons

    There really isn’t anything about FF13’s “eidolons” that was executed well. First, they are clumsily shoehorned into the narrative; despite characters talking about how important they are, they could be completely excised from the story without any impact. The only time the eidolons appear outside of the scenes that introduce them is in a few style-over-substance cutscenes.

    The (non-optional) battles to acquire each eidolon are tedious, and come at the end of some of the most insipid dialogue scenes in the game (see “The Story” above). The idea of a “puzzle battle” probably sounded good on paper, but in practice it is just boring. The conditions for victory are so obscure that the only way to find them is with a guide or the “Libra” technique. At that point, however, you are just applying a set formula, and your intelligence isn’t really being tested.

    Your reward for acquiring each eidolon is underwhelming. Each can only be summoned by one character, and then only when that character is the party leader. The summons are barely powerful enough to take down ordinary enemy mobs, but they do have the side-effect of recovering the health and status of the party. As a result, the only reason to use a summon in practice is for its ancillary healing effect in major encounters.

    The final nail in the coffin for FF13’s eidolons, though, is the patently ridiculous “Transformers” thing they have going. From a bright red race car to a disturbingly misogynistic motorcycle, these alternate forms for the summons look comically out of place in the game’s setting, and completely shatter suspension of disbelief.

    Battle System Design

    This might seem confusing after I talked up the whole Chain/Stagger/Launch thing earlier. While FF13 did make some interesting strategic additions to the battle system, it also took several missteps.

    FF13 encourages players to lean on the “Auto-Battle” feature, letting the AI select which commands to use and on which targets, and reserving player control for “Paradigm Shifts.” Most battles can be won without any manual command entry, and the game’s creators have even commented about how manual command entry is neither practical (because of the speed of battles) nor desirable (since the AI can exploit nuances of the battle system that are not explained to the player).

    Why, then, do I have to sit there and repeatedly press the X button to tell the party leader’s AI to do its job? Why, if I decide to input a command manually, can’t I pause the action while I buffer up commands? Why do I get a Game Over if my party leader falls in battle? Why can’t the non-leader characters use items? Why can’t I switch leaders in battle, or at least manually enter commands for non-leader characters? Why can’t I instruct the AI to favor reviving fallen characters over healing only moderately-wounded ones? Why can I only switch character classes in bulk, instead of for each character?

    It may sound like I am just picking nits, but these are all things that Final Fantasy XII – its immediate predecessor – got right! FF13 inherits the emphasis on AI-controlled characters in battle, but robs the system of the extra degree of control (and hence of user-friendliness) that was present in FF12.

    End-Game Tedium

    In the early game, FF13 seemed to breathe new life into stagnant JRPG conventions. Minor battles can be cleared in tens of seconds (or less), and even boss encounters can be dealt with in mere minutes with the right strategy. It is unfortunate that by the end game, things have changed for the worse.

    Random enemy mobs in the late game can take several minutes to clear (which makes grinding for experience more tedious than it needs to be), and boss battles can take 20 minutes or more if your team is even slightly under-leveled. Once your party has gained access to the most important abilities in the Synergist and Saboteur roles, most battles fall into a predictable and tired cycle of (de)buffing, attacking, and healing.

    By this point, strategic decision-making is largely absent from the picture. The player is responsible for switching Paradigms at the right time, but the set of Paradigms used in each enemy encounter rarely changes – only the particular attacks, buffs and debuffs used by the characters vary. And since the AI characters automatically “learn” what elements and status effects work best in each encounter, the player is left with very little to do during these 10-minute-plus battles.

    Conclusion

    In the end, lucky number XIII isn’t going to be remembered as one of the best Final Fantasies. It takes a bold stand on where JRPGs might go in the future – aspiring for accessibility with streamlined mechanics – but it stumbles in just as many places as it succeeds.

    Learning to Write

    The recent slew of changes that Apple has made to the secret iDevice developer agreement has finally pushed me to write up these thoughts, which have been nagging at me for a while now.

    In the new world of computing, “devices” are king. And nobody sells a device these days without also having the forethought to build a walled garden around it. Apple is now the poster child for the “app store” model, but the video game industry had already proven the value of controlling both the hardware platform and software distribution.

    It would be disingenuous of me to decry the practice outright. I’ve owned a variety of these locked-down devices, and have purchased software through the “approved” online stores. Yes, I have been frustrated by the consequences of restrictive copy-protection – re-buying games that I had already purchased when my Xbox 360 came up with a Red Ring of Death. Yes, I have often pondered “jailbreaking” these devices, and occasionally tried it out on those that I was willing to risk “bricking.”

    In all of this, I have never stood up and “voted with my wallet” – passing up on a device because I didn’t approve of its software development and distribution model. Simply put, the value I got out of these devices – the latest Nintendo game, or the ease-of-use of the iPhone – surpassed the cost – the inability to load my own software and fully customize the device. Or at least, that was the only cost I could perceive at the time.

    Now that I have a daughter, my perception is a bit different.

    My generation has already formed a powerful attachment to our mobile devices – I often joke that given the choice between air and my iPhone I would have to think carefully. My daughter is going to grow up thinking that it is normal to be able to stream live video from across the world while riding in the back seat of a car. We could debate whether this pervasive access to technology will be harmful for the next generation, but this misses the point. The presence or absence of technology is not what is important. What is important is how future generations will relate to the technology of their time.

    Remember that written language was once a “technology.” Those of you reading this have grown up in a world immersed in that technology; most of us are within sight of written words every moment of every day. The spread of literacy over millennia has changed human society, human history. We know that mastery of this technology – the ability to both consume and produce – is necessary for success in our society.

    Those in my generation who consider themselves “computer literate” can often trace their learning process back to a handful of software systems: “turtle graphics” in Logo, BASIC on the Apple II or DOS, HyperCard on the early Mac. All of these systems allowed anyone with a computer to experiment with programming, and allowed even young children to use the technology to create and not just consume. The Scratch project provides a similar exploratory programming environment for today’s children. My wife teaches a technology course that, among other things, has 7th and 8th grade students build their own computer games using Scratch.

    In case you hadn’t noticed, all of those software systems have another feature in common – none of them are allowed in Apple’s App Store. Children who want to use an iPad to create animations or games in Scratch can no longer do so. If they are especially motivated, and have helpful parents, I suppose they could pay to join Apple’s developer program, go out and buy dense technical books to teach them C, Objective-C and Apple’s proprietary APIs, and spend months or years creating a project that would have taken mere minutes in Scratch.

    That hardly sounds fair, though. It’s like telling kids they can have their Dr. Seuss book after they finish a thesis on Finnegans Wake. There is a reason these simple, intuitive programming environments exist, and the companies selling these devices shouldn’t just ignore the programmers of the future – the authors of tomorrow’s technology – in the name of platform lock-in.

    I guess I’ve already nailed the point into the ground, but just to throw in a few closing comments: There are two technologies that shaped me most in my formative years – books, and desktop computers. What really frightens me is that both of these are brilliant ideas that would never be invented in today’s world. Books, which can be read, shared, bought and sold by anyone in any country – that don’t need to be bought separately for at-home and on-the-go use – would never even be considered by today’s media companies. The earliest PCs/Macs, which allowed anybody to buy, sell, install and develop any software they wanted, without diverting revenue back to the hardware manufacturer, would be seen as a squandered opportunity.

    These technologies – books and computers – changed the world for the better, and now we are making haste to bury them in favor of their more profitable successors.

    In Defense of Explosive Barrels

    I’m always a little confused when people talk about “realism” in games. Maybe it started with Old Man Murray’s famous screed about the medium’s overreliance on crates.

    As comedy, this kind of analysis works. More worrying, though, is when players, critics and designers decide that the conceits of the medium, or of a particular genre, should be excised in the name of “realism.”

    The classic elemental environments (fire level, ice level, etc.) have already fallen out of favor. Characters being obstructed by knee-high fences is unforgivable in the era of open-world games. RPG loot drops must be rationalized because a celebrity game designer noticed that birds don’t carry swords. And the explosive barrel – stalwart veteran of FPS design – is now seen as a ridiculous affront to realism.

    What the detractors fail to acknowledge is that explosive barrels are fun. Very fun. I’d go so far as to say that the explosive barrel is one of the fundamental elements in the Periodic Table of Interactivity.

    Just trying to enumerate the possible uses for a well-realized explosive barrel (like the version in Half-Life 2) reveals how much interactivity is packed into each:

    The barrel is a physics object, and thus can be stacked or climbed upon. It can be thrown with the game’s gravity gun, whereupon it explodes on impact, providing an impromptu grenade launcher. It can be detonated with a burst of gunfire, but a few careful shots with the pistol will start it burning, creating a timed explosive used by player and enemy alike. Finally, the barrel can be triggered by other explosions, leading to the possibility of chain reactions engineered by the level designer, or by savvy players.

    The explosive barrel is a powerful tool of player expression. It elevates the vocabulary afforded to the FPS player beyond just “point” and “shoot.”

    I could go on raving about the merits of the explosive barrel, or describe how other “unrealistic” conceits like elemental levels/abilities or double-jumping benefit their respective genres, but I hope my point is clear.

    These “cliches” persist because they are more fun than the monotonous grey “realism” of so many of today’s games.

    [Note: I do give the TV Tropes site credit for understanding that tropes are not bad. Now we just need to get the players out there to understand it.]