Difference between revisions of "Geometry Shader"
(→Output limitations: Formatting.) 
Nguillemot (talk  contribs) m (→Output limitations: more math formatting) 

Revision as of 16:36, 4 August 2015



A Geometry Shader (GS) is a Shader program written in GLSL that governs the processing of Primitives. Geometry shaders reside between the Vertex Shaders (or the optional Tessellation stage) and the fixed-function Vertex Post-Processing stage.
A geometry shader is optional and does not have to be used.
Geometry shader invocations take a single Primitive as input and may output zero or more primitives. There are implementation-defined limits on how many primitives can be generated from a single GS invocation. GS's are written to accept a specific input primitive type and to output a specific primitive type.
While the GS can be used to amplify geometry, thus implementing a crude form of tessellation, this is generally not a good use of a GS. The main reasons to use a GS are:
 Layered rendering: taking one primitive and rendering it to multiple images without having to change bound rendertargets and so forth.
 Transform Feedback: This is often employed for doing computational tasks on the GPU (obviously pre-Compute Shader).
In OpenGL 4.0, GS's gained two new features. One was the ability to write to multiple output streams. This is used exclusively with transform feedback, such that different feedback buffer sets can get different transform feedback data.
The other feature was GS instancing, which allows multiple invocations to operate over the same input primitive. This makes layered rendering easier to implement and possibly faster performing, as each layer's primitive(s) can be computed by a separate GS instance.
Primitive in/out specification
Each geometry shader is designed to accept a specific Primitive type as input and to output a specific primitive type. The accepted input primitive type is defined in the shader:
layout(input_primitive) in;
The input_primitive type must match the primitive type for the vertex stream provided to the GS. If Tessellation is enabled, then the primitive type is specified by the Tessellation Evaluation Shader's output qualifiers. If Tessellation is not enabled, then the primitive type is provided by the drawing command that renders with this shader program. The valid values for input_primitive, along with the valid OpenGL primitive types or tessellation forms, are:
GS input | OpenGL primitives | TES parameter | vertex count
points | GL_POINTS | point_mode | 1
lines | GL_LINES, GL_LINE_STRIP, GL_LINE_LOOP | isolines | 2
lines_adjacency | GL_LINES_ADJACENCY, GL_LINE_STRIP_ADJACENCY | N/A | 4
triangles | GL_TRIANGLES, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN | triangles, quads | 3
triangles_adjacency | GL_TRIANGLES_ADJACENCY, GL_TRIANGLE_STRIP_ADJACENCY | N/A | 6
The vertex count is the number of vertices that the GS receives per input primitive.
The output primitive type is defined as follows:
layout(output_primitive, max_vertices = vert_count) out;
The output_primitive must be one of the following:
 points
 line_strip
 triangle_strip
These work exactly the same way their counterpart OpenGL rendering modes do. To output individual triangles or lines, simply use EndPrimitive (see below) after emitting each set of 3 or 2 vertices.
There must be a max_vertices declaration for the output. The number must be a compile-time constant, and it defines the maximum number of vertices that will be written by a single invocation of the GS. It may be no larger than the implementation-defined limit of GL_MAX_GEOMETRY_OUTPUT_VERTICES. The minimum value for this limit is 256. See the limitations below.
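As a minimal sketch of how the two layout declarations fit together, the following pass-through shader accepts triangles and re-emits each one unchanged. The choice of GLSL version 3.30 is an assumption; any version with geometry shader support should work similarly.

```glsl
#version 330 core

// Input: whole triangles, so gl_in[] has exactly 3 elements.
layout(triangles) in;

// Output: one triangle strip of at most 3 vertices per invocation.
layout(triangle_strip, max_vertices = 3) out;

void main()
{
    // Copy each input vertex position to the output and emit it.
    for (int i = 0; i < 3; ++i)
    {
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }

    // Close the strip; with 3 vertices this yields a single triangle.
    EndPrimitive();
}
```

Because max_vertices is 3, this shader can never amplify geometry; raising it is what allows a single invocation to emit more primitives.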
Instancing
Core in version: 4.6
Core since version: 4.0
Core ARB extension: ARB_gpu_shader5
The GS can also be instanced (this is separate from instanced rendering, as this is localized to the GS). This causes the GS to execute multiple times for the same input primitive. Each invocation of the GS for a particular input primitive will get a different gl_InvocationID value. This is useful for layered rendering and outputs to multiple streams (see below).
To use instancing, there must be an input layout qualifier:
layout(invocations = num_instances) in;
The value of num_instances must not be larger than GL_MAX_GEOMETRY_SHADER_INVOCATIONS (this will be at least 32). The built-in value gl_InvocationID specifies the particular instance of this shader; it will be on the half-open range [0, num_instances).
The output primitives from instances are ordered by the gl_InvocationID. So if the user renders two primitives, and has num_instances set to 3, then the GS will be called effectively in this order: (prim0, inst0), (prim0, inst1), (prim0, inst2), (prim1, inst0), ... The output primitives from the GS's will be ordered based on that input sequence. All invocations of the first input primitive will execute before any invocations from the second primitive.
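A minimal sketch of GS instancing, assuming a hypothetical uniform array u_transform that holds one matrix per invocation (the name and the count of 2 are illustrative only):

```glsl
#version 400 core

// Run this shader twice for every input triangle.
layout(triangles, invocations = 2) in;
layout(triangle_strip, max_vertices = 3) out;

// Hypothetical uniform: one transform per GS invocation.
uniform mat4 u_transform[2];

void main()
{
    // gl_InvocationID is 0 for the first invocation and 1 for the second.
    for (int i = 0; i < 3; ++i)
    {
        gl_Position = u_transform[gl_InvocationID] * gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
```

Each invocation still declares max_vertices = 3, since that limit applies per invocation rather than per input primitive.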
Inputs
Geometry shaders take a primitive as input; each primitive is composed of some number of vertices, as defined by the input primitive type in the shader.
The outputs of the vertex shader (or Tessellation Stage, as appropriate) are thus fed to the GS as arrays of variables. These can be organized as individual values or as part of an interface block. Each user-defined input will be an array of the length of the primitive's vertex count. The order of vertices in the input arrays corresponds to the order of the vertices specified by prior shader stages.
Geometry shader inputs may have interpolation qualifiers on them. If they do, then the prior stage's outputs must use the same qualifier.
Geometry Shaders provide the following built-in input variables:
in gl_PerVertex
{
vec4 gl_Position;
float gl_PointSize;
float gl_ClipDistance[];
} gl_in[];
These variables have only the meaning the prior shader stage(s) that passed them gave them.
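The array form of the inputs can be sketched as follows. The names v_color, VertexData, texCoord, g_color and g_texCoord are hypothetical; the sketch assumes a vertex shader that declares a matching "out vec3 v_color;" and a matching "out VertexData { vec2 texCoord; }" block.

```glsl
#version 330 core

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

// A loose input: one array element per vertex of the input primitive.
in vec3 v_color[];

// An interface block input: the block instance itself is the array.
in VertexData
{
    vec2 texCoord;
} v_in[];

// Outputs are not arrays; they describe the vertex about to be emitted.
out vec3 g_color;
out vec2 g_texCoord;

void main()
{
    for (int i = 0; i < 3; ++i)
    {
        g_color     = v_color[i];
        g_texCoord  = v_in[i].texCoord;
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
```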
There are some GS input values that are based on primitives, not vertices. These are not aggregated into arrays. These are:
in int gl_PrimitiveIDIn;
in int gl_InvocationID; // Requires GLSL 4.0 or ARB_gpu_shader5
 gl_PrimitiveIDIn
 the current input primitive's ID, based on the number of primitives processed by the GS since the current drawing command started.
 gl_InvocationID
 the current instance, as defined when instancing geometry shaders.
Outputs
Geometry shaders can output as many vertices as they wish (up to the maximum specified by the max_vertices layout specification). To provide this, output values in geometry shaders are not arrays. Instead, a function-based interface is used.
GS code writes all of the output values for a vertex, then calls EmitVertex(). This tells the system to write those output values to wherever it is that output vertices get written. After calling this function, all output variables contain undefined values. So you will need to write to them all again before emitting the next vertex (if there is a next vertex).
The GS defines what kind of primitive these vertex outputs represent. The GS can also end a primitive and start a new one, by calling the EndPrimitive() function. This does not emit a vertex.
In order to write two independent triangles from a GS, you must write three separate vertices with EmitVertex() for the first three vertices, then call EndPrimitive() to end the strip and start a new one. Then you write three more vertices with EmitVertex().
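The sequence described above can be sketched as a shader that expands each input point into two independent triangles. The half-size h and the choice to offset directly in clip space are assumptions made for illustration.

```glsl
#version 330 core

layout(points) in;
layout(triangle_strip, max_vertices = 6) out;

void main()
{
    vec4 c = gl_in[0].gl_Position;
    const float h = 0.1; // hypothetical half-size, applied in clip space

    // First triangle: three vertices, each written and then emitted.
    gl_Position = c + vec4(-h, -h, 0.0, 0.0); EmitVertex();
    gl_Position = c + vec4( h, -h, 0.0, 0.0); EmitVertex();
    gl_Position = c + vec4(-h,  h, 0.0, 0.0); EmitVertex();
    EndPrimitive(); // end the strip so the next vertices start a new one

    // Second triangle: outputs are undefined after EmitVertex(),
    // so every vertex is written in full again.
    gl_Position = c + vec4( h, -h, 0.0, 0.0); EmitVertex();
    gl_Position = c + vec4( h,  h, 0.0, 0.0); EmitVertex();
    gl_Position = c + vec4(-h,  h, 0.0, 0.0); EmitVertex();
    EndPrimitive();
}
```

Without the first EndPrimitive() call, the six vertices would form a single strip of four triangles rather than two separate ones.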
Output variables are defined as normal for GLSL. They can be grouped into interface blocks or be single values, as appropriate. Output variables can be defined with interpolation qualifiers. The Fragment Shader equivalent interface variables should define the same variables with the same qualifiers.
Geometry Shaders have the following built-in outputs.
out gl_PerVertex
{
vec4 gl_Position;
float gl_PointSize;
float gl_ClipDistance[];
};
gl_PerVertex defines an interface block for outputs. The block is defined without an instance name, so that prefixing the names is not required.
The GS is the final Vertex Processing stage. Therefore, unless rasterization is being turned off, you must write to some of these values. These outputs are always associated with stream 0. So if you're emitting vertices to a different stream, you don't have to write to them.
 gl_Position
 the clip-space output position of the current vertex. This value must be written if you are emitting a vertex to stream 0, unless rasterization is off.
 gl_PointSize
 the pixel width/height of the point being rasterized. It is only necessary to write to this when outputting point primitives.
 gl_ClipDistance
 allows the shader to set the distance from the vertex to each User-Defined Clip Plane. A positive distance means that the vertex is inside/behind the clip plane, and a negative distance means it is outside/in front of the clip plane. In order to use this variable, the user must manually redeclare it (and therefore the interface block) with an explicit size.
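A minimal sketch of the redeclaration for gl_ClipDistance, assuming GLSL 4.10 and a hypothetical uniform u_clipPlane whose coefficients are expressed in the same space as the incoming gl_Position:

```glsl
#version 410 core

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

// Redeclare the built-in output block so gl_ClipDistance has an explicit size.
out gl_PerVertex
{
    vec4  gl_Position;
    float gl_ClipDistance[1];
};

// Hypothetical plane equation (a, b, c, d); an assumption of this sketch.
uniform vec4 u_clipPlane;

void main()
{
    for (int i = 0; i < 3; ++i)
    {
        vec4 p = gl_in[i].gl_Position;
        gl_Position = p;
        // Positive values are kept, negative values are clipped.
        gl_ClipDistance[0] = dot(p, u_clipPlane);
        EmitVertex();
    }
    EndPrimitive();
}
```

The application would also need to enable the corresponding clip distance (GL_CLIP_DISTANCE0) for the written value to take effect.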
Certain predefined outputs have special meaning and semantics.
out int gl_PrimitiveID;
The primitive ID will be passed to the fragment shader. The primitive ID for a particular line/triangle will be taken from the provoking vertex of that line/triangle, so make sure that you are writing the correct value for the right provoking vertex.
The meaning for this value is whatever you want it to be. However, if you want to match the standard OpenGL meaning (ie: what the Fragment Shader would get if no GS were used), you must do this for each vertex before emitting it:
gl_PrimitiveID = gl_PrimitiveIDIn;
This naturally assumes that the number of primitives output by the GS equals the number of primitives received by the GS.
Layered rendering
Layered rendering is the process of having the GS send specific primitives to different layers of a layered framebuffer. This can be useful for doing cube-based shadow mapping, or even for rendering cube environment maps without having to render the entire scene multiple times.
Layered rendering in the GS works via two special output variables:
out int gl_Layer;
out int gl_ViewportIndex; // Requires GL 4.1 or ARB_viewport_array.
The gl_Layer output defines which layer in the layered image the primitive goes to. Each vertex in the primitive must get the same layer index. Note that when rendering to cubemap arrays, the gl_Layer value represents layer-faces (the faces within a layer), not the layers of cubemaps.
gl_ViewportIndex, which requires GL 4.1 or ARB_viewport_array, specifies which viewport index to use with this primitive.
Layered rendering can be more efficient with GS instancing, as different GS invocations can process instances in parallel. However, while ARB_viewport_array is often implemented in 3.3 hardware, no 3.3 hardware provides ARB_gpu_shader5 support.
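A minimal sketch of layered rendering to the six faces of a cubemap without GS instancing, so that it does not depend on ARB_gpu_shader5. The uniform u_faceViewProj is hypothetical, and the sketch assumes the vertex shader passes world-space positions through gl_Position.

```glsl
#version 330 core

layout(triangles) in;
layout(triangle_strip, max_vertices = 18) out; // 6 faces * 3 vertices

// Hypothetical uniform: one view-projection matrix per cube face.
uniform mat4 u_faceViewProj[6];

void main()
{
    for (int face = 0; face < 6; ++face)
    {
        for (int i = 0; i < 3; ++i)
        {
            // Write the same layer for every vertex of the primitive.
            gl_Layer    = face;
            gl_Position = u_faceViewProj[face] * gl_in[i].gl_Position;
            EmitVertex();
        }
        // One triangle per face; start a new strip for the next layer.
        EndPrimitive();
    }
}
```

With instancing available, the outer loop could instead be replaced by invocations = 6 and gl_InvocationID, reducing max_vertices to 3.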
Which vertex
gl_Layer and gl_ViewportIndex are per-vertex parameters, but they specify a property that applies to the entire primitive. Therefore, a question arises: which vertex in a particular primitive defines that primitive's layer and viewport index?
The answer is that it is implementation-dependent. However, OpenGL does have two queries to determine which one the current implementation uses: GL_LAYER_PROVOKING_VERTEX and GL_VIEWPORT_INDEX_PROVOKING_VERTEX.
The value returned from glGetIntegerv will be one of the following enumerators:
 GL_PROVOKING_VERTEX: The vertex used will track the current provoking vertex convention.
 GL_LAST_VERTEX_CONVENTION: The vertex used will be the one defined by the last-vertex provoking vertex convention.
 GL_FIRST_VERTEX_CONVENTION: The vertex used will be the one defined by the first-vertex provoking vertex convention.
 GL_UNDEFINED_VERTEX: The implementation isn't saying.
For maximum portability, you will have to provide the same layer and viewport index to every vertex of a primitive. So if you wanted to output a triangle strip, where different triangles had different indices, too bad. You have to split it into different primitives.
Output streams
Core in version: 4.6
Core since version: 4.0
Core ARB extension: ARB_transform_feedback3
When using Transform Feedback to compute values, it is often useful to be able to send different sets of vertices to different buffers at different rates. For example, GS's can send vertex data to one stream, while building per-instance data in another stream. The vertex data and per-instance data will be of different lengths, written at different speeds.
Multiple stream output requires that the output primitive type be points. You can still take whatever input you prefer.
To provide this, output variables can be given a stream index with a layout qualifier:
layout(stream = stream_index) out vec4 some_output;
The stream_index ranges from 0 to GL_MAX_VERTEX_STREAMS - 1.
A default value for the stream can be set with:
layout(stream = 2) out;
All following out variables will use stream 2 unless they specify a stream. The default can be changed later. The initial default is 0.
To write a vertex to a particular stream, the function EmitStreamVertex is used. This function takes a stream index; only the output variables assigned to that stream are written. Similarly, EndStreamPrimitive ends a particular stream's primitive. However, since multiple stream output requires using points primitives, the latter function is not very useful.
Only primitives emitted to stream 0 will actually be passed along to Vertex Post-Processing and rendered; the rest of the streams will only matter if transform feedback is being used. Calling EmitVertex or EndPrimitive is equivalent to calling their stream counterparts with stream 0.
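A minimal sketch of writing to two streams from one invocation. The output names s0_position and s1_length are hypothetical, and the sketch assumes the application has set up transform feedback to capture them.

```glsl
#version 400 core

layout(points) in;
layout(points, max_vertices = 2) out; // multi-stream output requires points

// Hypothetical outputs, one per stream.
layout(stream = 0) out vec4  s0_position;
layout(stream = 1) out float s1_length;

void main()
{
    // Stream 0: also feeds rasterization, so gl_Position is written too.
    gl_Position = gl_in[0].gl_Position;
    s0_position = gl_in[0].gl_Position;
    EmitStreamVertex(0);

    // Stream 1: only visible to transform feedback.
    s1_length = length(gl_in[0].gl_Position.xyz);
    EmitStreamVertex(1);
}
```

A shader could emit to stream 1 only on some invocations, which is how the two streams end up with different lengths.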
Output limitations
There are two competing limitations on the output of a geometry shader:
 The maximum number of vertices that a single invocation of a GS can output.
 The total maximum number of output components that a single invocation of a GS can output.
The first limit, defined by GL_MAX_GEOMETRY_OUTPUT_VERTICES, is the maximum number that can be provided to the max_vertices output layout qualifier. No single geometry shader invocation can exceed this number.
The other limit, defined by GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS, is, in layman's terms, the total amount of stuff that a single GS invocation can write. It is the total number of output values that a single GS invocation can write. (A component, in GLSL terms, is a component of a vector: a float is one component; a vec3 is 3 components.) This is different from GL_MAX_GEOMETRY_OUTPUT_COMPONENTS (the maximum allowed number of components in out variables). The total output component count is the number of components written per vertex multiplied by the number of vertices written.
For example, if the total output component count is 1024 (the smallest maximum value from GL 4.3), and the output stream writes to 12 components, the total number of vertices that can be written is <math>floor(\tfrac{1024}{12}) = 85</math>. This is the absolute hard limit to the number of vertices that can be written. Even if GL_MAX_GEOMETRY_OUTPUT_VERTICES is larger than 85, because this geometry shader writes 12 components per vertex, the true maximum that this geometry shader can write is 85 vertices. If the geometry shader instead wrote only 8 components per vertex, then it could write 128 (subject to the output vertices limit, of course).
Note that even built-in outputs like gl_Layer count towards GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS. For example, a geometry shader with a total output component count of 1024 which outputs vec4 gl_Position and int gl_Layer supports a maximum of <math>floor(\tfrac{1024}{4 + 1}) = 204</math> vertices.
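The interaction of the two limits can be summarized in one expression. The symbols are introduced here for illustration only: V stands for GL_MAX_GEOMETRY_OUTPUT_VERTICES, T for GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS, and c for the number of components the shader writes per vertex.

```latex
N_{\max} = \min\!\left(V,\ \left\lfloor \frac{T}{c} \right\rfloor\right)

% Worked cases with T = 1024:
%   c = 12:    \lfloor 1024 / 12 \rfloor = 85
%   c = 8:     \lfloor 1024 / 8  \rfloor = 128
%   c = 4 + 1: \lfloor 1024 / 5  \rfloor = 204
```

Assuming V is at its minimum of 256, the component limit is the binding one in all three cases, since 85, 128 and 204 are each below 256.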