Real-time Ray Tracing Technology: A Game-Changer for Mobile Graphics

By Kristof Beets

VP of Technology Insights

Imagination Technologies

November 17, 2021


Real-time Ray Tracing Technology: A Game-Changer for Mobile Graphics

When was the last time you played a video game on a PC or console? If it was one of the most popular titles, you may have noticed how incredibly realistic the graphics are. Today, the graphics in the most cutting-edge PC and console games are approaching the photorealism we see in films.

One of the most important elements in creating photorealistic real-time computer-generated graphics is the lighting of a scene. The traditional way to render 3D graphics – rasterization – is not the most capable technology when it comes to simulating realistic lighting and shadows. This is because rasterization works by meshing virtual polygons into 3D models which are then broken down into pixels that must be individually colored, and this process requires an extremely heavy amount of effort and computational resources, and frequently complicated developer techniques.

Movie animators have long known this, and instead use ray tracing technology to produce effects such as global illumination, shadows, and reflections. Ray tracing is a way of easily and accurately simulating the behaviour of light, resulting in a scene that looks more realistic and is created with less work. The technology is more aligned with how the human eye sees the world, with light bouncing off objects in the environment and absorbing/reflecting that light based on an object’s material composition.

Because ray tracing can enable more realistic effects, developers can use it to create a more immersive experience for gamers. And since these effects are easier to create using ray tracing than by using traditional rasterization, it frees up developers’ time to work on other aspects of the game.

However, the downside of using ray tracing technology for gaming is that to make a genuinely immersive experience, ray tracing must be done in real time. This means that images must be rendered in fractions of seconds, and this makes it almost prohibitively expensive in terms of processing and resources.

Today, hardware in the market can enable real-time ray tracing effects in console and PC games, but it’s used selectively to light only the most critical objects in a scene, and in some cases, a user can decide how to use it. Given performance constraints, they can decide whether they want to use it to generate reflections versus shadows, or to use it at all – they may opt to turn it off to optimise the framerate instead. The bottom line is that there are performance tradeoffs.

Current ray tracing-capable hardware increases performance by providing more ray tracing processing than previous generations, but it requires more silicon area and more power. Today it is extremely challenging to implement real-time ray tracing in power-constrained devices, and thus the technology has yet to grace the screens of mobile gamers. However, since mobile gaming will represent 52% of the global gaming market in 2021, this is a graphics demographic that shouldn’t be ignored.

Graphics in Mobile Devices Today

Mobile graphics quality continues to improve despite not yet having the assistance of ray tracing technology. As graphics become ever more realistic, the ability to produce them using traditional techniques is becoming more complex, and this is pushing performance, bandwidth and power requirements to the limits of mobile processors.

The generation of shadows is a good example of this. Shadow generation is traditionally done through cascaded shadow maps – a process that requires an extreme amount of geometry processing, the allocation of massive buffers, the processing of multiple render targets at high resolution, and expensive shader operations. The costs are huge in terms of processing cycles, power consumption, and bandwidth. And, after all this investment, the results are still not as realistic as those that could be created in a much easier fashion by using ray tracing.

With ray tracing technology, rays are sent from individual pixels to a light source, and if the ray hits something, that region is in shadow. One ray per pixel makes this a simple, inexpensive, and straightforward process, especially with dedicated hardware in place. Many of the resolution problems, artifacts, and other challenges encountered with traditional rasterization – such as the biasing needed to avoid floating geometry – don’t exist in ray tracing. Creating lighting and shadow effects becomes a trivial challenge as long as the dedicated hardware can be implemented efficiently.

Today, the industry is at a crossover point where the approximations being done using shadow maps and other traditional techniques are becoming so expensive that real-time ray tracing is actually becoming a more efficient option. In addition, as we approach the end of Moore’s Law, we can no longer rely on seeing programmable hardware become exponentially more powerful every two years. Therefore, to continue to speed the progress of ray tracing technology, particularly for power-constrained mobile platforms, it will be essential for the industry to look to efficiently designed fixed-function accelerator solutions.

The key to making continually better graphics in mobile devices lies in the efficiency of both hardware and developer optimization techniques.

The State of Ray Tracing Hardware

Real-time ray tracing technology has long been the holy grail of 3D graphics. Ray tracing can be performed with varying levels of performance and efficiency, and to highlight this Imagination established the Ray Tracing Levels System (RTLS), identifying six levels of ray tracing, from Level 0 to Level 5. Initial hardware-accelerated ray tracing efforts – what we call Level 0 on our RTLS scale, weren’t optimal. These solutions had limited functionality and needed custom hardware and application programming interfaces (APIs). Of course, these solutions weren’t very interesting for developers. The market then progressed to ‘Level 1’ solutions which used software-based compute on traditional GPUs. Such solutions provided more flexibility but were still a long way from ideal.

The majority of ray tracing technologies we see in the market today – used in PCs and consoles for example – are what we call Level 2 RTLS solutions. These feature dedicated hardware for the most fundamental and extensive operations of ray tracing: the testing of rays against boxes and triangles for intersections. Putting this into fixed-function hardware achieves better power efficiency, but it’s still not good enough for mobile. There are still multiple additional levels of the ray tracing process that need to be done in software shader programs on GPU compute cores, and worse, those processing stages are not friendly to the parallel execution engines within the GPU. This also leads to lower performance for traditional graphics effects, and it’s not as efficient in terms of using the arithmetic logic unit (ALU) pipelines, restricting overall throughput.

The most sophisticated ray tracing solutions in the market today are at Level 3. These solutions implement even more ray tracing functionality in dedicated hardware, offloading the shader cores for higher efficiency. At this level, the full ray intersection processing, which involves traversing the bounding volume hierarchy (BVH) – the principal data structure of ray tracing – is implemented in dedicated hardware. This increases ray tracing efficiency for scenes with more complexity and better offloads the ray tracing functionality, reducing the impact on traditional graphics performance.

However, these solutions still lack an essential component needed to make ray tracing a possibility in mobile devices: coherency gathering. Since rays tend to scatter in many different directions, if the coherency problem isn’t solved, many of the parallelism benefits that typically come with GPUs are lost. This leads to lower bandwidth utilization, complex data access patterns, and reduced shader pipeline processing efficiency. While these solutions may make claims about achieving high gigarays per second, they typically have low efficiency. This can be linked to low utilization of GPU processing resources or memory access limitations due to non-coherent memory access patterns resulting from the scattering of rays throughout a scene.

For real-time ray tracing to be a reality in mobile devices, it’s about efficiency, taking advantage of the inherent parallelism in GPUs, and developing smart algorithms for hardware optimization. What’s needed is a smarter hardware solution.

Smarter Ray Tracing Hardware

The industry must surpass Level 3 hardware to make real-time ray tracing a possibility in mobile devices. At Imagination, we have already delivered the 3D graphics technology for more than 10 billion mobile devices to date. We know how to deliver stunning graphics in highly-efficient hardware. In 2016, Imagination’s ray tracing development board was already more sophisticated than the solutions in the market today. Now we’re bringing to market Level 4 RTLS ray tracing solutions.

At Level 4, the ray traversal of the BVH is done in dedicated hardware as it is in Level 3, but more crucially, so is the coherency sorting of rays. This is a process in which we group rays that are travelling in the same direction, allowing for the processing of large batches to take full advantage of the parallel compute approaches that form the foundation of every GPU.

We’ve long taken advantage of the inherent parallelism in GPUs with technologies like tile-based rendering, through which we increase efficiency through spatial locality sorting of tiles. Today, this is generally accepted practice. Now we are bringing the same idea to ray tracing with the sorting of rays. In this way, we increase the overall utilization of the wide ALUs and increase testing efficiency significantly. There is also minimal impact on traditional graphics performance because we’re almost completely offloading the ray tracing processing into dedicated hardware, leaving the shader cores available for all other non-ray tracing processing.

Our Level 4 RTLS solution represents the highest ray tracing available today, and implementations will soon come to market with this technology inside. With Imagination’s Level 4 ray tracing IP, IMG CXT, companies can build SoCs that reach up to 9TFLOPS of FP32 rasterized performance and more than 7.2GRay/s of ray traced performance, while at the same time achieving up to 2.5x greater power efficiency compared to today’s Level 2 or 3 solutions.

Developer Optimisation Techniques

Even the most efficient hardware can still use some assistance in delivering the best graphics quality on a mobile platform. Optimization for mobile devices is more essential than it is for consoles or PCs that, to some degree, tolerate inefficiency by brute forcing the problem with excessive bandwidth and power budget availability. Phones do not have this brute force tolerance because everything must work in a tiny form factor and run based on the juice provided by a small battery. The characteristics of a phone mean hardware must be driven efficiently by the game engine to ensure the mobile phone won’t overheat, throttle down the clock, and drop the framerate. This means that one of the keys to creating hardware for mobile is by optimally managing the thermal effects.

The first thing we tell developers to consider is that less is more: not every pixel in a scene needs ray tracing. We recommend that developers use their ray tracing budget to achieve the biggest visual impact and game experience value. Similarly, it’s far too complex to test every ray against every triangle. By building an efficient hierarchy, developers can improve the hardware efficiency when determining which triangles to test. Building the best acceleration structure is often best done offline.

Developers should also avoid all brute force algorithms. Effects like soft shadows can be achieved with smarter sampler patterns across pixels and combining them with a spatial filter rather than issuing many rays per pixel. In addition, most ray tracing effects can be rendered at lower resolution – for example, processed at a quarter of the resolution – and then scaled up. Sparse sampled ray tracing noise can then be mitigated with de-noising filters using temporal and spatial properties of the image. Many of these stages can be folded together and can run efficiently on traditional GPU compute hardware. It may also be possible to take advantage of the dedicated neural network acceleration engines that are already available in almost every phone.

There are many other examples of techniques developers can use to optimize for mobile devices – such as dynamic probes for global illumination, thoughtful use of APIs, and many others. The bottom line is that highly-efficient ray tracing hardware combined with smart developer optimizations will be the key to making real-time ray tracing a reality for mobile.

A Game Changer for Mobile Devices

There is the possibility – through new highly efficient dedicated hardware and design techniques – to bring ray tracing technology to mobile devices. This is a genuine game-changer. Not only will it make the world’s largest gaming demographic available to developers, but it can also enable mobile device manufacturers to build further differentiation into their products and allow developers to create new immersive user experiences on mobile devices.

I’m a GPU expert with 25 years of experience across almost the whole breadth of the domain, including hardware and software architecture, performance and competitive analysis, developer support, total solution selling, sales and technical marketing, and requirements analysis. I’ve got a wealth of experience in the whole problem space, think quickly and clearly in front of external parties, and I’m still in love with the field even as I hit 2.5 decades working on it.

More from Kristof