In-sensor optoelectronic computing using electrostatically doped silicon

Complementary metal–oxide–semiconductor (CMOS) image sensors allow machines to interact with the visual world. In these sensors, image capture in front-end silicon photodiode arrays is separated from back-end image processing. To reduce the energy cost associated with transferring data between the sensing and computing units, in-sensor computing approaches are being developed where images are processed within the photodiode arrays. However, such methods require electrostatically doped photodiodes where photocurrents can be electrically modulated or programmed, and this is challenging in current CMOS image sensors that use chemically doped silicon photodiodes. Here we report in-sensor computing using electrostatically doped silicon photodiodes. We fabricate thousands of dual-gate silicon p–i–n photodiodes, which can be integrated into CMOS image sensors, at the wafer scale. With a 3 × 3 network of the electrostatically doped photodiodes, we demonstrate in-sensor image processing using seven different convolutional filters electrically programmed into the photodiode network. A network of dual-gate silicon p–i–n photodiodes, which are compatible with complementary metal–oxide–semiconductor fabrication processes, can perform in-sensor image processing by being electrically programmed into convolutional filters.

integrated with the remainder of the CMOS image sensor electronics, while replacing the chemically doped silicon photodiode array. Such a silicon-based approach could expedite the real-world application of in-sensor computing because of its compatibility with the mainstream CMOS electronics industry 3,4,15,16.
Concretely, we first demonstrate large-scale device production by fabricating thousands of dual-gate silicon p-i-n photodiodes at the wafer scale. We then perform in-sensor computing on serially presented optical images using a 3 × 3 network of these electrostatically doped photodiodes, electrically programming the network into 7 different convolutional filters.

Electrostatically doped silicon photodiodes
The photocurrent of a diode, I_ph, grows with the power of the incident light, P, with the responsivity R as the proportionality constant 17, i.e., I_ph = R · P. A conventional, chemically doped photodiode has a constant responsivity, because the parameters that determine R, especially the doping densities of the p and n regions, are fixed. In an electrostatically doped photodiode, by contrast, the doping densities can be modified by gate biasing, so R is electrically programmable. The electrostatically doped photodiode can thus perform analog multiplication between the incident light power P and the electrically programmed responsivity R. This programmable optoelectronic analog multiplication is the key to in-sensor image processing.
Our electrostatically doped photodiode is built on an intrinsic silicon wafer. It contains two contact electrodes (electrodes 1 and 2) that provide the current path, and two top gate metals which, when biased with voltages of equal magnitude and opposite sign, create electrostatically doped p and n regions in the silicon (Figs. 1a and b). The part of the silicon without any overlying gate metal is an intrinsic (i) region that acts as the channel of the device. This channel region is directly exposed to light from above. The contact and gate electrodes are interdigitated to give a high channel width/length ratio of 5,576 µm/5 µm. Detailed fabrication steps are described in Methods and Supplementary Fig. 1.
The resulting p-i-n diode exhibits standard rectifying behavior (Supplementary Fig. 2), which confirms the electrostatic doping. When we swap the signs of the two gate biases, the rectifying behavior flips its polarity (Supplementary Fig. 2), further verifying the electrostatic doping.
Illuminating the intrinsic channel region with light whose photon energy exceeds the silicon bandgap (~1.12 eV, i.e., wavelengths shorter than ~1,100 nm) generates a photocurrent. For this photocurrent generation mode, throughout this work, we bias both contact electrodes at zero voltage and define current flowing from electrode 1 to electrode 2 as positive. The photocurrent originates from light-excited electrons and holes, which are swept in opposite directions by the built-in potential (V_bi) of the diode; V_bi is determined by the doping densities of the p and n regions (Fig. 1b). Electrostatically altering the doping densities via the gate voltages changes V_bi, which in turn can modify the magnitude and direction of the photocurrent for a given incident light power. In other words, the gate biases tune the responsivity R.
We demonstrate this dependence of R on the gate biases by measuring the photocurrent under a fixed-power, red-filtered halogen lamp that is periodically shuttered on and off, while the voltage on the gate above electrode 1 (V_G,1) is stepped up from -3 V to 3 V in 0.5 V steps and, simultaneously, the voltage on the gate above electrode 2 (V_G,2) is stepped down from 3 V to -3 V in 0.5 V steps (Fig. 1c, inset). The optical power of the light source, P_source, is 15 µW; it differs from, but is proportional to, the power P of the light incident on the device, scaled according to the device and/or beam area. The measured photocurrent, shown in Fig. 1c, exhibits the expected modulation of R by the gate voltages. Repeating this gate-controlled photocurrent modulation for ~50 min shows that the programmed R is stable (Supplementary Fig. 3).
COMSOL Multiphysics simulations also confirm the operating principle of the electrostatically doped p-i-n diode. The gating clearly creates p and n regions, with band bending across the channel (Supplementary Fig. 4a-c), and the responsivity R changes with the gating as expected (Supplementary Fig. 4d; discussed further below in connection with Fig. 2c).
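The bandgap-to-wavelength conversion quoted above follows from E = hc/λ. The minimal Python sketch below (the function names are illustrative, not from this work) checks which light sources used here carry photon energies above the 1.12 eV silicon bandgap; the 1,300 nm case is a hypothetical infrared counter-example.

```python
# Photon energy E = h*c / wavelength, using h*c ≈ 1239.84 eV·nm
HC_EV_NM = 1239.84
SI_BANDGAP_EV = 1.12  # silicon bandgap at room temperature, as quoted in the text


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy (eV) of light with the given vacuum wavelength (nm)."""
    return HC_EV_NM / wavelength_nm


def cutoff_wavelength_nm(bandgap_ev: float = SI_BANDGAP_EV) -> float:
    """Longest wavelength (nm) whose photons can still bridge the bandgap."""
    return HC_EV_NM / bandgap_ev


def exceeds_bandgap(wavelength_nm: float, bandgap_ev: float = SI_BANDGAP_EV) -> bool:
    """True if a photon at this wavelength carries more energy than the bandgap."""
    return photon_energy_ev(wavelength_nm) > bandgap_ev


print(f"Si cutoff wavelength: {cutoff_wavelength_nm():.0f} nm")
for wl in (400, 473, 1300):  # 400 nm LED, 473 nm blue laser, hypothetical infrared source
    print(f"{wl} nm -> {photon_energy_ev(wl):.2f} eV, exceeds bandgap: {exceeds_bandgap(wl)}")
```

The cutoff evaluates to about 1,107 nm, consistent with the ~1,100 nm figure in the text; both the 400 nm LED and the 473 nm laser lie comfortably above the bandgap.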

Programmable optoelectronic multiplication
We further investigate the dependence of the photoresponse on the gate voltages and, now, also on the light power (Fig. 2a). Figure 2b shows the photocurrent map with the two gate voltages independently swept, each from -5 V to 5 V in 0.1 V steps, while the photodiode is illuminated by blue laser light (473 nm) with a fixed P_source of 125 µW. When the two gate voltages are identical, i.e., V_G,1 = V_G,2, whether positive (n-n doping) or negative (p-p doping), no overall potential gradient develops, and thus no photocurrent should be produced. The corresponding zero-current line from p-p to n-n indeed lies close to the ideal positive diagonal; its slight deviation is possibly due to charge carrier trapping at defects formed during fabrication. On the other hand, when we sweep the two gate voltages with the same magnitude but opposite signs, along the negative diagonal, the photocurrent increases monotonically from its negative maximum to its positive maximum, consistent with the monotonic change of V_bi from its negative maximum to its positive maximum (Fig. 1b). Figure 2c plots this photocurrent along the negative diagonal as a function of V_G,1 = -V_G,2, which we denote as the programming voltage V_p. This measured dependence of the photocurrent on V_p is also qualitatively consistent with the COMSOL simulation (Supplementary Fig. 4d). From here on, all gate biasing is configured as V_p = V_G,1 = -V_G,2.
Moreover, we demonstrate the linear dependence of I_ph on P_source, and therefore on P, for any given R programmed by tuning V_p. This linearity is important for high-fidelity analog multiplication between P and a given R. Figure 2d shows the measured I_ph as a function of P_source (red-filtered halogen lamp) for various V_p (and thus R) values. A simple linear fit yields a high coefficient of determination (0.996 averaged across all V_p values), confirming the linear dependence of I_ph on P_source, and thus on P, for each programmed value of R. Linearity is also confirmed for other wavelengths of incident light (Supplementary Fig. 5).
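The linearity check amounts to an ordinary least-squares fit of I_ph against P_source for each programmed V_p, followed by the coefficient of determination (written r2 below to avoid a clash with the responsivity R). The pure-Python sketch below shows that calculation; the function name and the sample currents are made-up placeholders, not measured data.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Least-squares fit y ≈ slope * x + intercept; returns (slope, intercept, r2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    # r2 is undefined when y is constant (ss_tot == 0), e.g. a trace with zero photocurrent
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical sweeps: P_source in µW, I_ph in nA, one list per programmed V_p (in V)
p_source = [0.0, 5.0, 10.0, 15.0]
sweeps = {
    -2.0: [0.3, -20.1, -40.6, -59.8],
    2.0: [-0.2, 24.9, 50.3, 74.6],
}
r2_values = []
for v_p, i_ph in sweeps.items():
    slope, _, r2 = linear_fit_r2(p_source, i_ph)
    r2_values.append(r2)
    print(f"V_p = {v_p:+.1f} V: slope = {slope:.2f} nA/µW, r2 = {r2:.4f}")
print(f"mean r2 across V_p: {mean(r2_values):.4f}")
```

The fitted slope plays the role of the programmed responsivity (up to the P/P_source scaling), and averaging r2 over the V_p values mirrors how the 0.996 figure is reported.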

Wafer-scale characterization of electrostatically doped silicon photodiodes
Electrostatically doped silicon photodiodes may accelerate the real-world realization of in-sensor computing because they are suited to large-scale integration with CMOS electronics. To demonstrate this, we fabricated, in-house, 4,900 dual-gate p-i-n silicon photodiodes on a 4-inch silicon wafer (Fig. 3a, left) using a CMOS-compatible fabrication process (see Methods). The wafer features 7 × 7 = 49 reticles, each containing 10 × 10 = 100 photodiodes (Fig. 3a, right). Figure 3b shows photocurrent maps obtained by illuminating an example reticle of 100 photodiodes serially, diode by diode, with 400 nm LED light at a fixed P_source of 170 µW, for various V_p values (-5 V to 5 V in 2 V steps, clockwise from the top right). These maps show high device-to-device uniformity of the responsivity programming within the reticle. In a wafer-scale photocurrent measurement of a 5 × 5 reticle array (2,500 photodiodes) using an automated probe station, with V_p varied from -5 V to 5 V in 0.1 V steps, 2,372 devices showed programmable responsivity (~95% yield).
Concretely, as we sweep V_p, the photocurrents of the 2,372 devices in response to the 400 nm LED light (fixed P_source of 170 µW) vary from -380 ± 50 nA to 430 ± 47 nA (Fig. 3c). Figure 3d shows the distribution of the 2,372 photocurrents for selected V_p values (-5 V to 5 V in 1 V steps); device-to-device variations are more pronounced than within the single reticle, which is typical at the wafer scale.
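The yield and mean ± standard deviation figures above come down to simple bookkeeping over per-device sweeps. In the sketch below, the "programmable" criterion (a minimum photocurrent swing across the V_p sweep) is a hypothetical stand-in, since the exact criterion behind the 2,372 working devices is not stated here; the function name and device traces are placeholders.

```python
import statistics


def summarize_sweep(traces_na, min_swing_na):
    """traces_na: {device_id: [I_ph at each V_p step, nA]}.

    Returns (yield_fraction, per-step [(mean, sample std)]) over the devices whose
    photocurrent swing across the V_p sweep exceeds min_swing_na (hypothetical criterion).
    """
    working = [t for t in traces_na.values() if max(t) - min(t) > min_swing_na]
    if len(working) < 2:
        raise ValueError("need at least two working devices for a standard deviation")
    yield_fraction = len(working) / len(traces_na)
    # zip(*working) regroups the data into one tuple per V_p step across all working devices
    per_step = [(statistics.mean(col), statistics.stdev(col)) for col in zip(*working)]
    return yield_fraction, per_step


# Placeholder traces at V_p = -5, 0, +5 V; "d3" barely responds and is excluded
traces = {"d1": [-370.0, 5.0, 420.0], "d2": [-390.0, -5.0, 440.0], "d3": [0.0, 0.0, 2.0]}
y, stats = summarize_sweep(traces, min_swing_na=100.0)
print(f"yield = {y:.1%}")
for v_p, (m, s) in zip((-5, 0, 5), stats):
    print(f"V_p = {v_p:+d} V: {m:.0f} ± {s:.0f} nA")
print(f"reported yield: {2372 / 2500:.1%}")
```

With the reported counts, 2,372/2,500 evaluates to 94.9%, which rounds to the ~95% quoted above.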

Optoelectronic convolutional image processing in a photodiode network
We connect 9 photodiodes as shown in Fig. 4a to perform analog multiplication between the incident light power and the programmed responsivity in each photodiode, and to sum, or accumulate, the resulting 9 photocurrents via Kirchhoff's current law. The photocurrent sum resulting from this analog multiply-accumulate (MAC) operation is the dot product between the 1 × 9 incident light power vector and the 1 × 9 vector of programmed responsivities. Consequently, the 9-photodiode network of Fig. 4a serves as an optoelectronic convolutional processor, with the 1 × 9 vector of programmed responsivities (or, equivalently, the 3 × 3 map of responsivities programmed across the photodiode array) serving as the image filter kernel. The accumulated photocurrent is converted to an output voltage (V_out) by a transimpedance amplifier on a printed circuit board (PCB). Our measurement system is detailed in Supplementary Fig. 6.
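The multiply-accumulate step can be written as a plain dot product. The minimal sketch below treats each photodiode as ideal (I_ph = R · P, with no noise or offset) and sums the 9 currents as Kirchhoff's current law does at the shared node; the helper names, the Laplacian-style kernel and the light patterns are hypothetical illustrations in arbitrary units, not one of the filters used in this work, and the transimpedance amplifier is omitted.

```python
def flatten(grid):
    """Row-major flattening of a 3 x 3 map into a 1 x 9 vector."""
    return [value for row in grid for value in row]


def network_current(powers, responsivities):
    """Accumulated photocurrent of a 3 x 3 photodiode network.

    Each diode contributes R_i * P_i; Kirchhoff's current law sums the contributions
    at the shared output node, which is a dot product of the two 1 x 9 vectors.
    """
    if len(powers) != 9 or len(responsivities) != 9:
        raise ValueError("expected 9 powers and 9 responsivities")
    return sum(r * p for r, p in zip(responsivities, powers))


# Hypothetical Laplacian-style kernel (not one of the seven filters in this work)
kernel = [[0, -1, 0],
          [-1, 4, -1],
          [0, -1, 0]]
uniform = [[1, 1, 1]] * 3                  # evenly lit patch
spot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # only the center diode is lit
print(network_current(flatten(uniform), flatten(kernel)))  # 0
print(network_current(flatten(spot), flatten(kernel)))     # 4
```

Because both positive and negative responsivities are available, contributions of opposite sign cancel at the summing node, which is what allows a single network to implement difference-type kernels.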
With an image filter kernel programmed, the 9-photodiode network not only captures an input scene but also processes it simultaneously. Figures 4b-d show an example in which the network finds the edges of a moving light spot. We program the photodiode network with the specific responsivity map of Fig. 4b by independently tuning the V_p of each photodiode (see Methods). This filter kernel is designed for edge detection along the x-axis: it yields positive and negative photocurrents when the photodiode network is at the right and left edges of the light spot, respectively, and negligible photocurrents otherwise. Figure 4c shows V_out monitored while a light spot from an LCD projector (power set to 255 out of 255, green channel only) moves from left to right at a frequency of 4 Hz, with consistent positive (6 V) and negative (-6 V) responses as the spot passes over the array. We have evaluated this dynamic processing up to a spot movement frequency of 500 Hz (Fig. 4d), the maximum frequency of our optical setup.
Expanding on this simple example, we perform in-sensor processing of a 256 × 256 pixel image (Fig. 5a; greyscale, 8-bit depth) with the contrast inversion filter kernel (Fig. 5b) programmed into the 9-photodiode network (see Methods for programming). Projecting a 3 × 3 patch of the image onto the photodiode network with an LCD projector (green channel only) yields an accumulated photocurrent as the outcome of the optoelectronic convolution. By sliding the 3 × 3 patch across the 256 × 256 image and repeating the optoelectronic convolution, we generate a 254 × 254 matrix of accumulated photocurrents (64,516 in total), which represents the image (Fig. 5c) processed with the contrast inversion filter kernel.
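Scanning the 3 × 3 patch across the image is a "valid" sliding-window operation, which is why a 256 × 256 input yields (256 - 2) × (256 - 2) = 254 × 254 = 64,516 outputs. The sketch below emulates this with ideal devices; it computes a cross-correlation (the kernel is not flipped), which matches a setup where each patch is projected directly onto the programmed map. The function name, the x-edge kernel and the test scene are hypothetical, chosen only to reproduce the sign behavior described for Fig. 4.

```python
def slide_3x3(image, kernel):
    """Valid sliding-window MAC of a 3 x 3 kernel over a 2D list `image`."""
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows - 2):
        out_row = []
        for j in range(cols - 2):
            # Same multiply-accumulate as one exposure of the photodiode network
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * image[i + di][j + dj]
            out_row.append(acc)
        out.append(out_row)
    return out


x_edge = [[1, 0, -1]] * 3           # hypothetical x-edge kernel: + left column, - right column
spot = [[0, 0, 1, 1, 1, 0, 0]] * 3  # bright spot in the middle of a 3 x 7 scene
print(slide_3x3(spot, x_edge))      # [[-3.0, -3.0, 0.0, 3.0, 3.0]]

blank = [[0] * 256 for _ in range(256)]
result = slide_3x3(blank, x_edge)
print(len(result), len(result[0]), len(result) * len(result[0]))  # 254 254 64516
```

The negative outputs appear where the window straddles the spot's left edge and the positive ones where it straddles the right edge, while the fully lit and fully dark windows give zero, mirroring the behavior described for the edge-detection kernel.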
Besides contrast inversion filtering, we repeated in-sensor image processing using 6 other widely used filter kernels 11,18-22: difference of Gaussians (DoG), Gaussian blurring, image sharpening, box blurring, horizontal Sobel, and vertical Sobel filters (Supplementary Fig. 7). We compared the 63 photocurrent values programmed with a fixed P_source from the LCD projector (green channel only; 9 values per filter for 7 filters), which correspond to the 63 programmed R values, with their target values, which range from -2 µA to 4 µA; the maximum error was 18 nA. Since the ratio of the maximum error to the target range, 18 nA/6 µA = 1/333, lies between 1/2^9 and 1/2^8, the programming accuracy is 8 bits. The images projected with the LCD projector (green channel only) and processed with these filter kernels are shown in Fig. 5e, bottom; the Sobel-filtered image there is a composite formed by the root sum of squares of the horizontal and vertical Sobel-filtered images (Supplementary Fig. 8) 20. The close agreement between these images, processed in the analog domain within the photodiode array (Fig. 5e, bottom), and those computed digitally (Fig. 5e, top) verifies our in-sensor computing scheme.
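Both calculations in this paragraph are short enough to check directly. The sketch below restates the 8-bit accuracy argument as floor(log2(range/error)) and the Sobel composite as an element-wise root sum of squares; the helper names are illustrative and the Sobel inputs are placeholder values.

```python
import math


def programming_bits(target_min, target_max, max_error):
    """Effective bits: largest n such that max_error <= (target range) / 2**n."""
    return math.floor(math.log2((target_max - target_min) / max_error))


def sobel_composite(gx, gy):
    """Root sum of squares of horizontal and vertical Sobel outputs, element by element."""
    return [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# Numbers from the text, in nA: targets from -2 µA to 4 µA, worst-case error 18 nA
print(programming_bits(-2000, 4000, 18))            # 8 (6000 / 18 ≈ 333, between 2**8 and 2**9)
print(sobel_composite([[3.0, 0.0]], [[4.0, 2.0]]))  # [[5.0, 2.0]]
```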

Conclusion
Bio-inspired computing has nucleated intense worldwide research efforts in recent years, with in-memory computing motivated by the co-location of memory and computing in the brain, and the more recent in-sensor computing inspired by the brain's sensory periphery, where sensing is accompanied by early information processing 3,4,10,11,23-28. In this work, we have demonstrated analog image convolution as a form of in-sensor computing, developing and using an electrostatically doped silicon photodiode array. Because it is based on silicon devices, this approach suggests a way to accelerate the practical realization of in-sensor computing by taking advantage of the mainstream CMOS electronics infrastructure. The next research step our work suggests is the monolithic integration of the electrostatically doped silicon photodiode array with conventional CMOS image sensor electronics, replacing the chemically doped silicon photodiode array. The electrostatically doped photodiode we fabricated in-house for this demonstration occupies an area orders of magnitude larger than a chemically doped photodiode in a state-of-the-art CMOS image sensor. Thus, increasing the density of the electrostatically doped photodiode array through substantial device miniaturization, while maintaining per-pixel gate and contact electrodes and their control by CMOS electronics, will be both a key direction and a key challenge for this line of investigation.

Methods
Iterative programming. While all the pixels are exposed to constant, maximum light from the LCD projector (255 for the 8-bit range, green channel only), V_p for each pixel is set to V_k, the value calculated for the k-th iteration by the following equation:
where I_target is the target current and I_k is the measured photocurrent at the k-th iteration cycle, for each of the 9 pixels. We keep updating V_p until the difference between I_target and I_k is less than the allowed error, i.e., 23 nA for 8-bit accuracy over the full range of 6 µA (6 µA/2^8 ≈ 23 nA).
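The closed-loop procedure can be sketched as follows. In place of the update equation used in this work, the sketch assumes a simple proportional update, V_{k+1} = V_k + g · (I_target - I_k), clamped to the ±5 V programming range; this rule, the gain g, the tanh-shaped device response and all target currents are hypothetical and only illustrate the idea. The stopping tolerance follows the text: 6 µA/2^8 ≈ 23 nA.

```python
import math

V_MIN, V_MAX = -5.0, 5.0             # V_p range used in the sweeps
FULL_RANGE_NA = 6000.0               # target currents span -2 µA to 4 µA
TOLERANCE_NA = FULL_RANGE_NA / 2**8  # ≈ 23.4 nA for 8-bit accuracy


def simulated_pixel_na(v_p):
    """Hypothetical monotonic I_ph(V_p) response (nA) under fixed maximum illumination."""
    return 4000.0 * math.tanh(v_p / 2.5)


def program_pixel(i_target_na, measure=simulated_pixel_na, gain=5e-4, v_start=0.0, max_iter=200):
    """Closed-loop programming with an ASSUMED update V_{k+1} = V_k + gain * (I_target - I_k).

    Returns (V_p, last measured current, number of updates, converged flag).
    """
    v = v_start
    for k in range(max_iter):
        i_k = measure(v)
        error = i_target_na - i_k
        if abs(error) < TOLERANCE_NA:
            return v, i_k, k, True
        v = min(V_MAX, max(V_MIN, v + gain * error))  # stay inside the programming range
    return v, measure(v), max_iter, False


# Placeholder targets (nA) for the 9 pixels of a 3 x 3 kernel
targets = [1500.0, -1800.0, 0.0, 3000.0, -500.0, 250.0, 3500.0, -2000.0, 1000.0]
for t in targets:
    v, i, n, ok = program_pixel(t)
    print(f"target {t:+7.0f} nA -> V_p = {v:+.3f} V, I = {i:+7.1f} nA after {n} updates, converged={ok}")
```

On real hardware, `simulated_pixel_na` would be replaced by an actual measurement of the pixel under maximum projector brightness, and the gain would presumably be chosen from the measured slope of I_ph versus V_p so that the loop neither overshoots nor stalls.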

Declarations
Code availability
Experimental code is available from the corresponding authors on reasonable request.

Data availability
The data that support the findings of this study are available from the corresponding authors upon reasonable request.

Figure 1
Electrostatically doped silicon p-i-n photodiode. a, Optical microscope (top), scanning electron microscope (SEM, middle) and atomic force microscope (AFM, bottom) images of an electrostatically doped p-i-n photodiode prototype. Contact electrodes 1 and 2 and the two gate electrodes above them (false-colored with blue and red shades in the SEM and AFM images) are interdigitated. b, Part of the device SEM image (top) and the corresponding schematic illustration (bottom) of the photodiode cross-section, gate-biased to form p-i-n and n-i-p configurations. For a more realistic spatial profile of the electron concentration under gate biasing, see Supplementary Fig. 4. c, Measured photocurrent with pulsed light (blue) and stepped gate voltages (V_G,1, red; V_G,2, not shown). A red-filtered halogen lamp (P_source = 15 µW) is used as the light source; V_G,1 is stepped up from -3 V to 3 V in 0.5 V steps, while V_G,2 is simultaneously stepped down from 3 V to -3 V in 0.5 V steps.

Figure 2
Programmable photoresponse of the dual-gate silicon p-i-n photodiode. a, Schematic illustration of the measurement setup. Incident light with power P (i.e., P_source scaled according to the device and/or beam area) is converted to the photocurrent I_ph, which is modulated by the two gate voltages, V_G,1 and V_G,2. b, Photocurrent map measured with each gate voltage independently swept from -5 V to 5 V in 0.1 V steps. c, Photocurrent response with V_p = V_G,1 = -V_G,2 swept from -5 V to 5 V in 0.1 V steps. The light source for b and c is a blue laser (473 nm) with a P_source of 125 µW. d, Measured photocurrent vs. P_source with V_p as a parameter, varied from -4 V (blue) to 4 V (red) in 1 V steps. The light source is a red-filtered halogen lamp.

Figure 3
Wafer-scale array of the dual-gate silicon p-i-n photodiodes. a, Optical images of a fabricated wafer containing 7 × 7 reticles (left) and of a reticle containing 10 × 10 photodiodes (top right), and an SEM image of 9 example photodiodes. b, Photocurrent maps of a single reticle (10 × 10 photodiodes) with V_p varied from -5 V to 5 V in 2 V steps (clockwise from top right). c, Average (solid line) and standard deviation (shading) of the photocurrents measured from the 2,372 working photodiodes in 5 × 5 reticles containing 2,500 photodiodes, with V_p varied from -5 V to 5 V in 0.1 V steps. d, Histograms of the photocurrent data collected in c, with V_p from -5 V (blue) to 5 V (red) shown in 1 V increments for clarity. All measurements in this figure were performed with 400 nm LED light with a P_source of 170 µW.

Figure 4
A 3 × 3 photodiode network for analog multiply-accumulate (MAC) computation. a, Schematic illustration of the network comprising 9 dual-gate silicon p-i-n photodiodes. The accumulated photocurrent resulting from the analog optoelectronic MAC operation is converted to a voltage (V_out) by a transimpedance amplifier on a printed circuit board. b, A photocurrent map programmed with constant light power, i.e., a responsivity map, representing a filter kernel for edge detection along the x-axis. c, Measured V_out of the photodiode network programmed with the filter kernel of b, with a light spot moving from left to right at 4 Hz. d, Repetition of c, with the light spot moving at 10, 50, 100, 250 and 500 Hz. For all experiments in this figure, an LCD projector (green channel only) at power 255 out of 255 is used as the light source.

Figure 5
In-sensor image processing using the 3 × 3 dual-gate p-i-n photodiode network. a, A 256 × 256 input image (left) and an example portion of it (top right). The bottom right shows an example 3 × 3 patch from this input image, which is projected onto the photodiode network. b, A photocurrent map programmed with a fixed light power (i.e., a responsivity map) for contrast inversion filtering. The maximum LCD projector brightness (255 out of 255, green channel only) is used for this programming. c, The 254 × 254 map of accumulated photocurrents with the 9-photodiode network programmed as in b, where the 64,516 accumulated photocurrents are obtained serially by using the LCD projector (green channel only) to illuminate the photodiode network with a 3 × 3 patch sliding through the 256 × 256 input image. d, Various filtered images obtained with digital computing (top) and in-sensor computing (bottom).