OpenSG: Basic Concepts

Dirk Reiners OpenSG Forum [email protected]

Gerrit Voß Camtech [email protected]

Johannes Behr ZGDV [email protected]

15th February 2002

Abstract

One of the main shortcomings of current scenegraphs is their inability to support multi-thread safe data. Another area that leaves much to be desired is extensibility. This work describes a system that allows multiple asynchronous threads to manipulate the scenegraph independently without interfering with each other. This demands replication of data. As scenegraph data can get very big, a distinction between structural and content data is introduced, together with a method to replicate the latter only if necessary. To make the whole concept generic and easily extensible, Reflectivity is introduced to the system. Besides managing multiple independent threads, the described approaches can also be used for generic scenegraph access, e.g. for loaders/writers and GUIs. An extension also allows the developed methods to be used for cluster support.

1 Introduction

Scenegraphs have been used in computer graphics to structure scene data for quite a while now. The first commercial scenegraphs date back to the early nineties, and graph structures have been used in computer science for much longer than that. The concept of a scenegraph has proven useful for a wide range of applications.

Different scenegraph systems have been written, each focusing on different application areas. Open Inventor[4] has been used primarily for highly interactive mouse-based applications and as a training system. DirectModel[1] and OpenGL Optimizer[3] are targeted at applications working with CAD data due to their ability to handle free-form surfaces very efficiently. OpenGL Performer[2] is the most widely used scenegraph system for visual simulation and Virtual Reality applications due to its strong focus on speed.

There are some features that currently available scenegraphs are missing. One of them is multi-thread safe data handling. Performer was the first scenegraph to introduce multi-threading, but only in a very specialised sense: it is used to separate the application from the rendering task, and to split the rendering task into culling and drawing. Later versions introduced the concept of a database thread for paging and a compute thread for slow simulations, as in general Performer’s threading is not easy to extend. It also doesn’t completely prevent the threads from interfering with each other. Only manipulations of the scenegraph’s structure are safe; changing the external data used by the nodes, e.g. vertex positions or normals, directly influences all other threads. The application can work around this for the fixed Performer APP-CULL-DRAW pipeline, but other multi-threading setups are not as lenient.

VR applications in particular can have multiple asynchronous threads that need access to consistent data. Haptic simulation, for example, needs to run at kHz rates to generate acceptable results. This contrasts strongly with physical simulation, which, due to the complexity of the calculations involved, can often only run at a small number of iterations per second. Rendering usually lies somewhere in the middle, with framerates around 30. All of these threads depend on their data being consistent. In the visual case an inconsistency would just produce a strange frame, but in the other cases the results depend on the previous frame’s results, and thus one error can destroy the integrity of the system until a restart.


Multi-threading is becoming more and more important, as the trend in processor architectures is towards Simultaneous Multi-Threading (SMT) to combat the discrepancy between processor core and memory access speeds. Current processors can easily be stalled for hundreds of cycles if they have to access memory that is not in the cache. To utilise this unused capacity, modern processor designs keep multiple threads in the core concurrently, and as soon as one thread stalls due to a memory access, another is run. To fully utilise the power of these processors, multi-threaded applications are necessary.

Thus, when designing the OpenSG scenegraph system, provisions for multi-thread safe data were one of the prime goals. The approach taken to create thread-safe data is described in section 2. A feature that is needed to make the synchronisation generic and extensible is reflectivity, the ability of the system to describe itself, which is described in section 3.

Extensibility is another area where current scenegraphs don’t fulfil all the needs of an application. Reflectivity already helps a lot in simplifying extensions of the system. Further means of making the system extensible are described in section 4. Some applications of the concepts that go beyond simple multi-threading are given in section 5, followed by conclusions and future work in section 6.

2 Multi-Thread Safe Data

To allow every thread not only to access but also to change the scenegraph, every thread needs a private copy of the data. As the scenegraph data can get very big, it is not possible to simply replicate everything. It would also not be efficient, as a synchronisation step would have to copy everything from one thread to the other.

2.1 Data Distribution

The amount of data in the scenegraph is usually not evenly distributed between the different parts of the graph. A typical scenegraph for a medium complexity model consists of 4,000 nodes (interior and leaf nodes). Assuming a node takes up 200 byte, this amounts to 800 kbyte of data. A typical number of triangles for this kind of model would be around 200,000. At 40 byte per triangle this amounts to 8 mbyte of data, 10 times as much. For other examples the ratio will vary, but typically the geometric data will be 10 to 20 times larger than the structural data of the scenegraph. As a consequence, replicating the geometric data would severely impact the memory consumption of the system, while replicating the structural data is much less of a problem.
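The estimate above is a plain product of item counts and per-item sizes. The following minimal sketch recomputes it, taking the node count, triangle count and per-item byte sizes quoted in the text as inputs; the function name is chosen for illustration only and is not part of OpenSG.

```python
def scenegraph_footprint(nodes, bytes_per_node, triangles, bytes_per_triangle):
    """Return (structural_bytes, geometric_bytes, ratio) for one model."""
    structural = nodes * bytes_per_node
    geometric = triangles * bytes_per_triangle
    return structural, geometric, geometric / structural


# Inputs as quoted in the text; the per-item sizes are the text's assumptions.
structural, geometric, ratio = scenegraph_footprint(4_000, 200, 200_000, 40)
print(f"structure: {structural / 1e3:.0f} kbyte")
print(f"geometry:  {geometric / 1e6:.1f} mbyte")
print(f"ratio:     {ratio:.0f}x")
```

Changing the per-triangle size (for example when normals or texture coordinates are stored per vertex) shifts the ratio, which is why the text gives a range rather than a single factor.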

2.2 OpenSG Data Replication

To fit the above results into a general framework, OpenSG defines three basic classes that are used as base classes for all the data in the system: SingleFields, MultiFields and FieldContainers.

SingleFields

SingleFields are the equivalent of simple member variables. They store just a single value of their respective type, and that value is stored directly in the field. These fields are replicated directly; their values are assumed to be small enough that replicating them does not tax the memory too much.

MultiFields

MultiFields are equivalent to STL vectors and in fact use STL vectors for their data management. They store a dynamically sized 1D array of values, and the data is only referenced by the field but stored outside it. MultiFields are expected to contain the main part of the data. Thus only the references to the data are replicated, not the data itself. The data itself is only replicated if necessary, i.e. when a thread starts writing the data it receives a private copy. When two threads with private copies synchronise their data, one of the copies is thrown away and both threads reference the same data again, until one of them starts writing to it.
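The sharing behaviour of MultiFields can be illustrated with a small sketch. It is written in Python for brevity and is not OpenSG code: the class and the method names replicate, begin_write and sync_from are hypothetical, and a Python list stands in for the externally stored STL vector.

```python
class MultiField:
    """One thread's view of a value list; storage is shared until written."""

    def __init__(self, values):
        # The values live outside the field; the field only references them.
        self._data = list(values)

    def replicate(self):
        """Create another thread's view: copies the reference, not the data."""
        view = MultiField.__new__(MultiField)
        view._data = self._data
        return view

    def begin_write(self):
        """Give this view a private copy before it is modified."""
        self._data = list(self._data)

    def sync_from(self, source):
        """Throw the private copy away and reference the source's data again."""
        self._data = source._data

    def shares_data_with(self, other):
        return self._data is other._data

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        self._data[i] = value


app_view = MultiField([0.0, 1.0, 2.0])
draw_view = app_view.replicate()
shared_at_start = draw_view.shares_data_with(app_view)

app_view.begin_write()          # the application thread starts writing
app_view[0] = 9.0
private_while_writing = not draw_view.shares_data_with(app_view)
draw_value_before_sync = draw_view[0]

draw_view.sync_from(app_view)   # synchronisation: one copy is discarded
shared_after_sync = draw_view.shares_data_with(app_view)
```

Until the synchronisation step, the drawing view keeps seeing the old values, which is the consistency guarantee the replication is meant to provide.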


[Figure 1 diagram labels: FCPtr (Base, Size); SField; MField; MField Data]

Figure 1: OpenSG Data Structure

FieldContainers

FieldContainers are the wrappers used to encapsulate the replication. There are different possibilities for replicating the data. Conceptually, replicating the fields, i.e. each field keeping an array of values instead of a single one, is the simplest approach, as the replication can be kept very local. The disadvantage is that the data relevant for a thread is interleaved with irrelevant data used by other threads. This would reduce the cache hit rate and thus lead to bad performance.

An alternative approach, and the one taken by OpenSG, is replicating the FieldContainers instead of the Fields (see fig. 1). Multiple copies of the FieldContainer are located consecutively in memory. As most FieldContainers will be larger than the typical cache line, the cache impact should be minimal.

From the FieldContainer’s point of view, access to the data inside its methods is absolutely normal: all Fields are just members. From the outside, things look different. As all the copies start at different addresses, which is the right one for the current thread? Furthermore, it would be very desirable to be able to pass pointers between threads without having to map the pointers whenever they are used in another thread. This is also important when pointers are used inside the system itself.

The solution lies in a new pointer type. Standard C pointers cannot be used to access FieldContainers; instead, specific pointer objects called FCPtrs are used. They know the start address of the first copy and the size of the FieldContainer. To access the correct copy for the active thread, only the index associated with the current thread needs to be known. This index is associated with the thread and is the same for all FieldContainers. Thus it can be stored in thread-local storage and doesn’t have to be passed around by the application.

A consequence of this data structure is the restriction that instances of FieldContainers cannot be created using standard methods like new, to ensure that the consistency of the structures is maintained. Thus another creation method has to be used. The Prototype pattern was chosen, as it fits best with the extensibility goals (see section 4).
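The address calculation behind the FCPtr can be sketched as follows. This is an illustration under stated assumptions, not the OpenSG implementation: a bytearray stands in for raw memory, the class layout and the helper names set_copy_index and copy_index are hypothetical, and Python's threading.local plays the role of the thread-local storage mentioned above.

```python
import threading

_thread_state = threading.local()   # holds the copy index of each thread


def set_copy_index(index):
    _thread_state.index = index


def copy_index():
    return getattr(_thread_state, "index", 0)


class FCPtr:
    """Hypothetical pointer object: start of the first copy plus copy size."""

    def __init__(self, memory, base, size):
        self.memory, self.base, self.size = memory, base, size

    def address(self):
        # Copies lie consecutively, so the current thread's copy starts here.
        return self.base + copy_index() * self.size

    def read(self):
        start = self.address()
        return bytes(self.memory[start:start + self.size])

    def write(self, payload):
        if len(payload) != self.size:
            raise ValueError("payload must fill exactly one copy")
        start = self.address()
        self.memory[start:start + self.size] = payload


memory = bytearray(64)
ptr = FCPtr(memory, base=16, size=8)    # two copies occupy bytes 16..31


def draw_thread():
    set_copy_index(1)                   # this thread works on the second copy
    ptr.write(b"DRAWDATA")


ptr.write(b"APPDATA!")                  # main thread uses copy index 0
worker = threading.Thread(target=draw_thread)
worker.start()
worker.join()
```

The same FCPtr object is used by both threads without any mapping step; only the thread-local index differs, so each thread lands on its own copy.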

2.3 Copies

The copies of the MFields (MultiFields), in contrast to those of the SFields (SingleFields), have to be made explicitly. In general a copy-on-write paradigm needs to be realised. Operating systems support a copy-on-write paradigm, but only on the page level, which is much too coarse for the application here. Thus the scenegraph needs to implement its own version.

The decision when a write is about to happen, and thus a copy has to be made, can be hidden inside the scenegraph or be left to the application. The hidden case would demand a check whether a private copy has already been made in all methods that could change the MField’s data. The disadvantage lies in the fact that this test has to be made every time the data is changed, no matter if a copy has already been made or not. As quite often all the data of a field will be changed, a lot of unnecessary tests would have to be made. To get around this inefficiency the second option was realised: the application has to explicitly notify the scenegraph when it starts changing data. For more efficient synchronisation (see section 2.4), this has to be done not only for MFields, but also for SFields.
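The trade-off between the two options can be made concrete by counting the tests each one performs. The sketch below is illustrative only: both classes and the begin_edit method are hypothetical names, and the counter exists purely to show the difference.

```python
class CheckedField:
    """Hidden variant: every write tests whether a private copy exists."""

    def __init__(self, shared):
        self.data, self.private, self.checks = shared, False, 0

    def set(self, i, value):
        self.checks += 1
        if not self.private:
            self.data, self.private = list(self.data), True
        self.data[i] = value


class NotifiedField:
    """Explicit variant: the application announces the edit once."""

    def __init__(self, shared):
        self.data, self.private, self.checks = shared, False, 0

    def begin_edit(self):
        self.checks += 1
        if not self.private:
            self.data, self.private = list(self.data), True

    def set(self, i, value):
        self.data[i] = value    # no test; begin_edit() must have been called


shared = [0.0] * 1000
hidden, explicit = CheckedField(shared), NotifiedField(shared)

explicit.begin_edit()
for i in range(len(shared)):
    hidden.set(i, 1.0)
    explicit.set(i, 1.0)

print(hidden.checks, explicit.checks)   # prints "1000 1"
```

The explicit variant shifts responsibility to the caller: forgetting the notification would write into shared data, which is the price paid for dropping the per-write test.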

2.4 Synchronisation

Separating the different threads’ copies of the scenegraph is one part, but sooner or later they will have to be synchronised, so that the results of one thread have an influence on the other thread(s). To do this, one thread’s data has to be copied to another thread. Copying all the data of a thread is going to be quite a lot of work, given that the data is distributed across a large number of FieldContainer structures.

In practice most of the FieldContainers will not change every frame, as extremely dynamic applications are not a good match for a scenegraph. Most applications will have a static part that doesn’t change at all, some dynamic objects that move and maybe some dynamic objects that change their attributes. In general only a small part of the scenegraph will change, thus it is more efficient to just synchronise this part.

To allow that, the changed FieldContainers together with a bit mask indicating the changed fields are stored in a thread-specific change list. When two threads want to synchronise their data, only the FieldContainers stored in this list are synchronised, and only the fields indicated in the bit mask are copied. Thus only the minimum amount of data has to be copied. This change list can also be used for other purposes, see section 5.
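A change list of this kind might look as follows. The sketch assumes, for illustration, that containers are identified by integer ids, that each thread's copies are held in a dictionary, and that each field has one bit in the mask; the mask constants and method names are hypothetical.

```python
import copy

POSITION_MASK = 1 << 0   # hypothetical per-field bits
COLOR_MASK = 1 << 1
FIELD_BITS = {"position": POSITION_MASK, "color": COLOR_MASK}


class ChangeList:
    """Per-thread record of which fields of which containers changed."""

    def __init__(self):
        self.entries = {}   # container id -> OR-ed mask of changed fields

    def add_changed(self, container_id, mask):
        self.entries[container_id] = self.entries.get(container_id, 0) | mask

    def apply(self, source, target):
        """Copy only the recorded fields from source copies to target copies."""
        copied = 0
        for container_id, mask in self.entries.items():
            for name, bit in FIELD_BITS.items():
                if mask & bit:
                    target[container_id][name] = source[container_id][name]
                    copied += 1
        self.entries.clear()
        return copied


app = {
    0: {"position": (0.0, 0.0, 0.0), "color": "red"},
    1: {"position": (5.0, 0.0, 0.0), "color": "green"},
}
draw = copy.deepcopy(app)       # the drawing thread's copies
changes = ChangeList()

app[1]["position"] = (1.0, 2.0, 3.0)
changes.add_changed(1, POSITION_MASK)
app[0]["color"] = "blue"
changes.add_changed(0, COLOR_MASK)

copied = changes.apply(app, draw)
```

Two of the four fields are copied here; containers that never appear in the list are not touched at all, which is what keeps synchronisation cheap for mostly static scenes.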

3 Reflectivity

To create a system whose extensibility is actually used, adding new classes and extending existing classes has to be easy. This also means that it shouldn’t be necessary to touch different places all over the system when something is changed. One aspect of this that has already been mentioned is the synchronisation. It would be very desirable if generic code could be written to synchronise all FieldContainers, including ones added by applications, without the application having to write specific code for synchronisation.

To allow that, the FieldContainer needs to know which Fields it has and needs methods to access and synchronise the contents of the Fields. This introspective capability is called Reflectivity. Some languages provide it as a part of the language; C++ does not, thus it was necessary to add the capability to the scenegraph. This also allows extensions specific to the scenegraph to be added, like the ones used by Clustering (see section 5).

Types

The Types are classes used to keep information about the FieldContainer and all its Fields.

FieldContainerType

The FieldContainerType is the place where information about the FieldContainer that cannot be accessed using standard C++ methods is stored. This includes the name of the FieldContainer and the name of its parent class (to walk the class tree), the prototype instance (see section 4) and the list of FieldTypes for the fields of this container.

FieldType

The FieldType keeps the information about the field. This includes the name of the field, the type of the field and access methods to get a reference to the field, which is used for reading and writing the field’s value.

The actual synchronisation is done by the Field itself. This allows a generic implementation, which mainly has to distinguish between Single and MultiFields, and which is extensible for new applications and types of Fields.
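The following sketch shows how such type descriptions make a generic synchronisation routine possible. It is a simplification in Python rather than the C++ design: the attribute names, the multi flag and the generic_sync function are hypothetical, the parent is held as an object instead of a name, and the per-field synchronisation is attached to the FieldType here instead of the Field.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import List, Optional


@dataclass
class FieldType:
    name: str          # name of the field
    value_type: type   # type of the stored value(s)
    multi: bool        # True for a MultiField, False for a SingleField

    def sync(self, source, target):
        value = getattr(source, self.name)
        if self.multi:
            setattr(target, self.name, value)                    # share the reference
        else:
            setattr(target, self.name, self.value_type(value))   # copy the value


@dataclass
class FieldContainerType:
    name: str
    parent: Optional["FieldContainerType"]
    field_types: List[FieldType]

    def all_field_types(self):
        """Walk up the class tree so that inherited fields are included."""
        inherited = self.parent.all_field_types() if self.parent else []
        return inherited + self.field_types


def generic_sync(container_type, source, target):
    """Synchronise any container using nothing but its type description."""
    for field_type in container_type.all_field_types():
        field_type.sync(source, target)


node_type = FieldContainerType("Node", None, [FieldType("name", str, False)])
geometry_type = FieldContainerType(
    "Geometry", node_type, [FieldType("positions", list, True)]
)

source = SimpleNamespace(name="wheel", positions=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
target = SimpleNamespace(name="", positions=[])
generic_sync(geometry_type, source, target)
```

A container type added by an application only has to supply its own description; generic_sync itself does not change.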


4 Extensibility

The graphics landscape has become more and more dynamic recently. 3D rendering acceleration is becoming ubiquitous, thanks to the sales volume of companies like ATI and nVidia, which produce very powerful graphics boards at prices acceptable for home users. With the wider availability of 3D graphics, the realm of potential applications is expanding rapidly. This complicates the situation for a scenegraph system, as the variety of applications expands and it is practically impossible to guess what people will want to use the system for.

At the same time the variability in the feature set is expanding, too. To differentiate themselves from the competition, manufacturers add proprietary extensions to OpenGL, which are often necessary to fully utilise the hardware. As these change from generation to generation, an application written some time ago will have to be adapted to new hardware. To do that, it has to be possible to change the types of objects created at runtime, so that a new or additional scenegraph library can be used without disturbing the application.

The main tool to support this extensibility is the use of the Prototype creation pattern. The Prototype is an instance of the class to be created, which is kept in a centrally accessible place. In the case of OpenSG, the FieldContainerType is a class that already keeps global type-specific data, and as such is suitable for keeping the prototype, too. When a new instance of a FieldContainer is needed, the Prototype is cloned. This allows replacing the Prototype at runtime with an instance of another class that features the same interface. Thus new classes that are better adapted to current hardware or newer developments will be used whenever an object of the given type is needed. This affects system-internal as well as application-created objects and thus ensures consistency.

Another advantage of the Prototype pattern is the ability to store and change default values in the Prototype. As all objects are clones of the prototype, new instances will have the values set automatically.
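Both uses of the Prototype, replacing the created class at runtime and changing default values, can be sketched briefly. The class names, the create method and the line_width attribute are hypothetical and chosen for illustration; copy.deepcopy stands in for the clone operation.

```python
import copy


class FieldContainerType:
    """Keeps the prototype from which all instances of a type are cloned."""

    def __init__(self, name, prototype):
        self.name, self.prototype = name, prototype

    def create(self):
        return copy.deepcopy(self.prototype)


class Geometry:
    def __init__(self):
        self.line_width = 1.0

    def draw(self):
        return "generic path"


class ExtensionGeometry(Geometry):
    """Same interface, but adapted to a newer hardware feature."""

    def draw(self):
        return "vendor extension path"


geometry_type = FieldContainerType("Geometry", Geometry())
first = geometry_type.create()

# Replace the prototype at runtime and change a default value on it.
geometry_type.prototype = ExtensionGeometry()
geometry_type.prototype.line_width = 2.0
second = geometry_type.create()
```

Code that asks geometry_type for a new object does not need to know which class it receives, so the application and the system internals pick up the replacement consistently.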

5 Applications

The main application of the methods described in this paper is multi-thread safe data handling. But the Reflectivity and Synchronisation infrastructure can also be used for other tasks.

Generic Loader/Writer

Using the Reflectivity, any system structure can be converted to and from an ASCII format that can be written to or read from a file. As the conversion to and from ASCII is left to the FieldType, new types can be added transparently. Not only arbitrary scenegraphs can be written this way, but also non-scenegraph classes containing configurations or similar data.

Generic GUI

Similarly, it is possible to use the Type information to display the current state of the system at runtime (see fig. 2). This GUI can display every FieldContainer inside the system, including its values. It is also possible to add value changing, thus creating a generic editor for any OpenSG-based structure.

Clustering

As common off-the-shelf PC systems are getting more and more powerful, one trend is to use many of them instead of single large systems. When multiple independent systems are used to render one or more consistent images, each of them needs to have the same data. The problem is very similar to the multi-thread synchronisation described in section 2.4. In addition to the information about which data has changed, the changed data values need to be transmitted to each node of the cluster and integrated into its local copy of the data. Using this mechanism, any OpenSG application can easily be used on a cluster system. Support for a cluster was added to the OpenSG-based Avalon system within two hours.
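As an illustration of the generic loader/writer idea, the sketch below converts a container to and from a line-based ASCII form using only type descriptions. The file layout, the type labels Int32, Real32 and String, the Material description and both function names are assumptions made for this example and do not describe the actual OpenSG file format.

```python
# Per-type converters: (value -> text, text -> value). Hypothetical labels.
CONVERTERS = {
    "Int32": (str, int),
    "Real32": (repr, float),
    "String": (str, str),
}

# Type descriptions: container type name -> list of (field name, field type).
DESCRIPTIONS = {
    "Material": [("name", "String"), ("shininess", "Real32"), ("passes", "Int32")],
}


def write_ascii(type_name, values):
    """Write one container as text: type name first, then one field per line."""
    lines = [type_name]
    for field_name, field_type in DESCRIPTIONS[type_name]:
        to_text, _ = CONVERTERS[field_type]
        lines.append(f"{field_name} {to_text(values[field_name])}")
    return "\n".join(lines)


def read_ascii(text):
    """Rebuild the container values from the text produced by write_ascii."""
    type_name, *rows = text.split("\n")
    field_kinds = dict(DESCRIPTIONS[type_name])
    values = {}
    for row in rows:
        field_name, raw = row.split(" ", 1)
        _, from_text = CONVERTERS[field_kinds[field_name]]
        values[field_name] = from_text(raw)
    return type_name, values


material = {"name": "brushed steel", "shininess": 0.25, "passes": 2}
text = write_ascii("Material", material)
restored = read_ascii(text)
```

Supporting a new value type only requires one more entry in CONVERTERS, and a new container type one more entry in DESCRIPTIONS; neither function has to be touched. The same descriptions could drive a generic GUI or the transmission of changed fields to cluster nodes.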

6 Results and Future Work

The concepts described in this work have been realised in the OpenSG initiative and have proven to be a viable approach for supporting multi-thread safe data in a scenegraph system. The developed Reflectivity is also a good basis for extensibility, together with the use of the Prototype pattern. Besides supporting multi-thread safe data, the described concepts have also been used to create generic loaders and writers as well as a generic GUI and cluster distribution support.

One problem of the synchronisation is that every thread has a full copy of all the data. Not all threads will need everything, thus a lot of data will be copied that is never used. A filter concept would allow each thread to specify the data it’s interested in and copy only that part. This is especially useful for clusters, as transferring the data over a network is expensive.

Figure 2: Generic GUI displaying a part of the scenegraph

Figure 3: VW Beetle on a tiled screen driven by a cluster (Model courtesy Volkswagen, Image taken at NCSA using their Display Wall-In-A-Box implementation)

References

[1] HP. DirectModel, 2000.

[2] John Rohlf and James Helman. IRIS Performer: A high performance multiprocessing toolkit for real-time 3D graphics. In Andrew Glassner, editor, Proceedings of SIGGRAPH ’94 (Orlando, Florida, July 24–29, 1994), Computer Graphics Proceedings, Annual Conference Series, pages 381–395. ACM SIGGRAPH, ACM Press, July 1994. ISBN 0-89791-667-0.

[3] sgi. Unleashing the power of sgi’s next generation visualization technology. http://www.sgi.com/software/optimizer/whitepaper.html, 2001.

[4] Paul S. Strauss and Rikk Carey. An object-oriented 3D graphics toolkit. In Edwin E. Catmull, editor, Computer Graphics (SIGGRAPH ’92 Proceedings), volume 26, pages 341–349, July 1992.
