Sony to Release Two Types of Stacked Event-Based Vision Sensors with 4.86μm Pixel Size for Detecting Subject Changes Only

By Tiera Oliver

Associate Editor

Embedded Computing Design

September 09, 2021



Sony Semiconductor Solutions Corporation announced the upcoming release of two types of stacked event-based vision sensors. These sensors, designed for industrial equipment, detect only changes in the subject.

Event-based vision sensors asynchronously detect luminance changes for each pixel and output the changed data only, combining it with information on pixel position (xy coordinates) and time, thereby enabling high-speed, low latency data output.

The two new sensors employ stacking technology leveraging Sony’s proprietary Cu-Cu connection*2 to achieve the industry’s smallest*1 pixel size of 4.86μm. In addition to operating with low power consumption and delivering high-speed, low-latency, high-temporal-resolution data output, the new sensors also feature a high resolution for their small size. All of these advantages combine to ensure the detection of moving subjects in diverse environments and situations.

These two sensors were made possible through a collaboration between Sony and Prophesee, combining Sony's CMOS image sensor technology with Prophesee's unique event-based vision sensing technology. According to the company, this enables high-speed, high-precision data acquisition and contributes to improving the productivity of industrial equipment.

*1: Among stacked event-based vision sensors. According to Sony research (as of announcement on September 9, 2021).

*2: Technology that provides electrical continuity via connected Cu (copper) pads when stacking the pixel section (top chip) and logic circuits (bottom chip). Compared with through-silicon via (TSV) wiring, where the connection is achieved by electrodes that penetrate the chip around the circumference of the pixel area, this method gives more freedom in design, improves productivity, allows for a more compact size, and increases performance.

Stacked event-based vision sensors (left: IMX636, right: IMX637)


IMX636: 1/2.5-type (7.137mm diagonal) approx. 0.92-effective-megapixel*3 stacked event-based vision sensor. Sample ship date (planned): October 2021

IMX637: 1/4.5-type (3.983mm diagonal) approx. 0.33-effective-megapixel*3 stacked event-based vision sensor. Sample ship date (planned): October 2021

*3: Based on image sensor effective pixel specification method.

The new Sony sensors use an event-based method that asynchronously detects pixel luminance changes and outputs data with pixel position (xy coordinates) and time information. They employ a proprietary stacked structure that uses Cu-Cu connection to achieve conduction between the pixel chip and the logic chip, which is equipped with a signal processing circuit for detecting luminance changes for each pixel. With this design, only the pixels that have detected a change in luminance output data, allowing the sensor to detect luminance changes with high speed, low latency, and high temporal resolution while operating with low power consumption.
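To make the event-based method more concrete, the following Python sketch emulates it by comparing two luminance snapshots and emitting an event for each pixel whose log-luminance change crosses a contrast threshold. It is only an illustration: the real sensors detect changes asynchronously in per-pixel circuitry rather than by differencing frames, and the `Event` type, its `polarity` field, the function name, and the 0.25 threshold are assumptions made for this example, not values published by Sony or Prophesee.

```python
import math
from typing import NamedTuple


class Event(NamedTuple):
    x: int         # pixel column
    y: int         # pixel row
    t_us: int      # timestamp in microseconds
    polarity: int  # +1 = brighter, -1 = darker (assumed field, not from the article)


def events_from_frames(prev, curr, t_us, threshold=0.25):
    """Emit an event for every pixel whose log-luminance change reaches threshold.

    prev and curr are lists of rows of positive luminance values.
    The threshold is a hypothetical contrast threshold in natural-log units.
    """
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            delta = math.log(c) - math.log(p)  # assumes luminance > 0
            if abs(delta) >= threshold:
                events.append(Event(x, y, t_us, 1 if delta > 0 else -1))
    return events


# A 4 x 4 scene in which only two pixels change enough to trigger an event.
prev = [[100.0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[1][2] = 150.0  # ln(1.5) is about 0.41: brighter, above threshold
curr[3][0] = 60.0   # ln(0.6) is about -0.51: darker, above threshold
curr[0][0] = 105.0  # ln(1.05) is about 0.05: below threshold, no event

events = events_from_frames(prev, curr, t_us=1000)
for event in events:
    print(event)
```

Of the 16 pixels in this toy scene, only the two with a significant change produce output, which is the property that keeps data volume and power consumption low.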

These products can sense slight changes in vibration, detecting abnormalities for use in predictive maintenance of equipment. They can also sense changes in sparks produced during welding and metal cutting, information that can be used to notify workers of the optimal time to change tools, among other possibilities. Likewise, they can contribute to improved production in various applications that frame-based image sensors have trouble with, and can support work processes that rely on human experience.

Imaging example 1: Vibration detection (left: frame-based image, right: event-based sensing)

Imaging example 2: Spark detection during metal cutting (left: frame-based image, right: event-based sensing)

Main Features

■    Compact design with high resolution

The pixel unit and signal processing circuit are typically placed on the same board. With these products, however, the stacked structure delivers a small pixel size while maintaining a high aperture ratio*4 for the pixel unit. The result is both a compact form factor and high resolution, providing high-precision recognition capabilities.

*4: Ratio of aperture section (section other than light-shielding portion) as viewed from the light incident side, per pixel.

■    High-speed, low-latency, high-temporal-resolution event data extraction capability despite low power consumption

The proprietary stacked construction allowed Sony to opt for an event-based method. Unlike the frame-based method, the event-based method reads out only the required data, which saves power while delivering microsecond-level speed and low latency and enabling high-temporal-resolution output.

■    Event filter function for streamlined information acquisition

These sensors are equipped with event filtering functions developed by Prophesee for eliminating unnecessary event data, making them ready for various applications. Using these filters helps eliminate events that are unnecessary for the recognition task at hand, such as the LED flickering that can occur at certain frequencies (anti-flicker), as well as events that are highly unlikely to be the outline of a moving subject (event filter). The filters also make it possible to adjust the volume of data when necessary to ensure it falls below the event rate that can be processed in downstream systems (event rate control).
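As a rough illustration of what an event filter and event rate control might do, the Python sketch below keeps an event only if a neighbouring pixel fired shortly before it, then caps the number of events passed on per time window. This is not Prophesee's implementation, which the article does not describe: the neighbour-support rule, the function names, and all parameter values are assumptions chosen for the example, and anti-flicker filtering is not covered.

```python
def has_recent_neighbor(last_seen, x, y, t, window_us):
    """Return True if any of the 8 neighbouring pixels fired within window_us."""
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            t_prev = last_seen.get((x + dx, y + dy))
            if t_prev is not None and t - t_prev <= window_us:
                return True
    return False


def neighbor_filter(events, window_us=1000):
    """Drop isolated events; events are (x, y, t_us) tuples sorted by time."""
    last_seen = {}  # (x, y) -> timestamp of the latest event at that pixel
    kept = []
    for x, y, t in events:
        if has_recent_neighbor(last_seen, x, y, t, window_us):
            kept.append((x, y, t))
        last_seen[(x, y)] = t
    return kept


def limit_event_rate(events, max_events, window_us):
    """Pass at most max_events per window; windows restart at the next event."""
    kept = []
    window_start = None
    count = 0
    for x, y, t in events:
        if window_start is None or t - window_start >= window_us:
            window_start, count = t, 0
        if count < max_events:
            kept.append((x, y, t))
            count += 1
    return kept


# An edge moving right along row 5, plus one isolated noise event at (40, 40).
raw = [(5, 5, 0), (6, 5, 100), (7, 5, 200), (40, 40, 250), (8, 5, 300)]
filtered = neighbor_filter(raw)
limited = limit_event_rate(filtered, max_events=2, window_us=1000)
print(filtered)  # the noise event and the unsupported first edge event are gone
print(limited)   # only the first two surviving events fit the rate budget
```

One consequence of this simple rule is that the first event of any moving edge is discarded because nothing precedes it; a production filter would likely make a more careful trade-off.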


Image with event data accumulated for an equivalent of a single frame at 30 fps (approx. 33 ms)

(Left: event filter off; right: event filter on, approx. 92% data volume reduction compared with the left)
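An image like the one described in this caption can be produced by counting, per pixel, the events that fall within one 30 fps frame period. The Python sketch below shows the idea under simple assumptions: events are plain `(x, y, t_us)` tuples, and the function name and default window of 33,333 microseconds (1/30 s) are chosen for illustration rather than taken from Sony's or Prophesee's tooling.

```python
def accumulate(events, width, height, t_start_us, duration_us=33_333):
    """Count events per pixel inside [t_start_us, t_start_us + duration_us)."""
    frame = [[0] * width for _ in range(height)]
    t_end_us = t_start_us + duration_us
    for x, y, t in events:
        if t_start_us <= t < t_end_us:
            frame[y][x] += 1
    return frame


# Four events on a 3 x 2 pixel array; the last one falls outside the window.
events = [(1, 0, 10), (1, 0, 20_000), (2, 1, 33_000), (0, 0, 40_000)]
frame = accumulate(events, width=3, height=2, t_start_us=0)
for row in frame:
    print(row)
```

Pixels with a count of zero saw no luminance change during the window, which is why a static background appears empty in event-based imagery.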

As part of the collaboration between Sony and Prophesee on these products, Metavision Intelligence Suite, event signal processing software optimized for sensor performance, is available from Prophesee. Combining Sony’s event-based vision sensors with this software is designed to enable efficient application development and provide solutions for various use cases.

Key Specifications




Effective pixels

IMX636: Approx. 0.92 megapixels (1280 × 720 [H × V])

IMX637: Approx. 0.33 megapixels (640 × 512 [H × V])

Unit cell size

4.86μm × 4.86μm

Array size

IMX636: Diagonal 7.137 mm

IMX637: Diagonal 3.983 mm


Color filter

Black and white


Ceramic LGA package

(Exterior: 13 × 13 mm)



Power supply


3.0 V


1.1 V


1.8 V

Nominal contrast threshold (ln)






Less than 100 microseconds @ 1 Klux,

Less than 1000 microseconds @ 5 lux

Thinned read-out*7

Less than 220 microseconds @ 1 Klux,

Less than 1000 microseconds @ 5 lux

Maximum event rate

1.06 Geps (giga events per second)

Dynamic range

86 dB or more (5-100,000 lux)*8

Background rate

0.1 Hz @ 1 Klux / 10 Hz @ 5 lux


Input: I2C (400 kHz / 1 MHz), 4-wire SPI

Output: MIPI D-PHY (1.5 Gbps/lane) 2 lanes, SLVS (800 Mbps/lane) 4 lanes

Event signal processing function


Event filter

Event rate control

*5: When using the event signal processing function.

*6: Region of Interest, 9 × 9 = 81 pixels.

*7: Thinned read-out, 1/5 (horizontal direction).

*8: 5 lux is the minimum light condition that guarantees imaging characteristics. The low-light cutoff (not guaranteed) is 0.08 lux, with 50% of pixels responding to linear 100% contrast.
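Several of the published figures can be cross-checked against one another with simple arithmetic. The Python sketch below derives the array diagonal from the pixel count and the 4.86μm unit cell size, the megapixel counts from the resolutions, and the dynamic range from the 5 to 100,000 lux span. The 20 × log10 ratio used for dynamic range is the convention commonly applied to image sensors and is assumed here, since the article does not state how the 86 dB figure was calculated; the function names are illustrative.

```python
import math

PIXEL_PITCH_UM = 4.86  # unit cell size from the specifications


def diagonal_mm(width_px, height_px, pitch_um=PIXEL_PITCH_UM):
    """Diagonal of the pixel array in millimetres."""
    return math.hypot(width_px, height_px) * pitch_um / 1000.0


def megapixels(width_px, height_px):
    """Pixel count in millions."""
    return width_px * height_px / 1_000_000


def dynamic_range_db(min_lux, max_lux):
    """Dynamic range as 20 * log10(max / min); an assumed convention."""
    return 20.0 * math.log10(max_lux / min_lux)


imx636_diag = round(diagonal_mm(1280, 720), 3)  # 7.137
imx637_diag = round(diagonal_mm(640, 512), 3)   # 3.983
imx636_mp = megapixels(1280, 720)               # 0.9216
imx637_mp = megapixels(640, 512)                # 0.32768
dr_db = round(dynamic_range_db(5, 100_000), 2)  # 86.02

print(imx636_diag, imx637_diag, imx636_mp, imx637_mp, dr_db)
```

The results agree with the stated 7.137 mm and 3.983 mm diagonals, the approximate 0.92 and 0.33 megapixel counts, and the "86 dB or more" dynamic range.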


Tiera Oliver, Associate Editor for Embedded Computing Design, is responsible for web content edits, product news, and constructing stories. She also assists with newsletter updates as well as contributing and editing content for ECD podcasts and the ECD YouTube channel. Before working at ECD, Tiera graduated from Northern Arizona University where she received her B.S. in journalism and political science and worked as a news reporter for the university’s student-led newspaper, The Lumberjack.
