An LO phase shifter with frequency tripling and phase detection in 28 nm FD-SOI CMOS for mm-wave 5G transceivers

This paper presents an LO phase shifter with frequency tripling for 28-GHz 5G transceivers. The phase shifting and frequency tripling are achieved using an injection-locked oscillator and injection-locked frequency tripler, respectively. A phase detector based on third harmonic mixing is also implemented and is used to detect the applied phase shift, supporting automatic calibration of the phase shifter. Additionally, an algorithm to automatically tune the oscillators to their respective locking frequency is presented. To test the phase shifter, a 24–30-GHz sliding-IF receiver is implemented. Simulations show that a > 360° tuning range over the full 24–30 GHz span is achieved, with a gain variation of 0.11 dB or less, and that the phase detector has an rms phase error of < 2.5°.
The circuit is implemented in a 28 nm FD-SOI CMOS process and the entire chip measures 1080 μm × 1080 μm, including pads, and consumes 27–29 mW from a 1 V supply.


Introduction
To meet the ever-increasing demand for mobile data, the fifth generation of cellular network technology (5G) has enabled the use of mm-wave frequencies for mobile communication, i.e., frequencies between 30 and 300 GHz. Since the mm-wave spectrum is mostly unoccupied, very large bandwidths can be allocated to each user [1]. For instance, the first commercial bands operating at 24-30 GHz and 37-43.5 GHz offer up to 400 MHz of bandwidth per user, enabling unprecedented data rates in mobile communication [2]. However, mm-wave communication suffers from significantly higher path loss than its sub-6 GHz counterpart [3].
This necessitates the use of antenna arrays, see Fig. 1, in which tens to hundreds of antenna elements are used to focus the transmitted or received power in a certain direction, in a process called beamforming, thus greatly improving the achievable communication distance [4]. The direction of the main beam, or lobe, can be controlled by applying an appropriate phase shift to each antenna element signal. In addition to the main lobe, there will be nulls, where the signal is completely canceled, and sidelobes, see Fig. 1.
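As a rough illustration of how per-element phase shifts steer the main lobe, the following sketch assumes a uniform linear array with half-wavelength element spacing and a narrowband signal. The array geometry, the function name, and the sign convention are assumptions made for this example; the text does not specify them.

```python
import math

def steering_phases_deg(num_elements, steer_angle_deg, spacing_wavelengths=0.5):
    """Per-element phase shifts (degrees, wrapped to [-180, 180)) that steer
    the main lobe of a uniform linear array towards steer_angle_deg.

    Assumes a uniform linear array and a narrowband signal. The default
    half-wavelength spacing is an illustrative assumption, not a value
    taken from the text.
    """
    # Progressive phase step between neighbouring elements
    step = -360.0 * spacing_wavelengths * math.sin(math.radians(steer_angle_deg))
    phases = []
    for n in range(num_elements):
        phase = n * step
        # Wrap into [-180, 180) so each value maps onto a phase-shifter setting
        phases.append((phase + 180.0) % 360.0 - 180.0)
    return phases

phases = steering_phases_deg(num_elements=4, steer_angle_deg=30.0)
print([round(p, 1) for p in phases])
```

Because the required phases wrap around, each element needs a phase shifter that covers a full 360° range, which is the requirement the rest of the paper addresses.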
The phase shift can either be implemented in the digital domain, referred to as digital beamforming, in the analog domain, referred to as analog beamforming, or in both domains, referred to as hybrid beamforming [5]. Digital beamforming results in the highest system capacity, since all degrees of freedom can be utilized in the channel, but it also requires a full RF chain and a data converter for each antenna element, causing high power consumption [1]. On top of this, it requires extremely fast digital signal processing due to the huge amount of data generated, further increasing the power consumption.
Analog beamforming, on the other hand, only requires a single data converter, significantly reducing the power consumption. However, this means that only a single beam can be created at a time, resulting in poor utilization of the frequency spectrum resources [5].
Hybrid beamforming has proven to be an efficient middle ground for mm-wave communication, almost reaching the system capacity of digital beamforming, while consuming less power [5,6]. Thus, analog phase shifters are typically required for mm-wave communication. For simplicity of discussion, the rest of this paper will focus on phase shifting in receivers. However, in general, the same concepts apply equally well to transmitters.
The analog phase shift can be applied to the received signal at baseband (BB), at mm-wave frequencies, or to the local oscillator (LO) signal of the mixers [7]. The major benefit of mm-wave phase shifting is that the signals are combined before they are down-converted. This means that only a single mixer is required, making the LO signal routing very simple. However, the phase shifter and combiner must operate at mm-wave frequencies, resulting in high power consumption and/or losses, and any amplitude variation with phase setting will directly affect the signal. IF/BB phase shifting also suffers from this direct relation between phase-shifter gain and signal amplitude, but is typically easier to implement due to the lower operating frequency. LO phase shifting has, due to the weak relation between mixer gain and LO amplitude, a much lower sensitivity to amplitude variations of the phase shifter. Both IF/BB and LO phase shifting require one mixer per antenna element, complicating the LO distribution, especially for very large arrays. However, much research has recently focused on combining a large number of smaller integrated circuits (ICs), each with a limited number of antenna elements, in a so-called tiled approach, thereby improving yield and modularity while reducing cost [4]. This means that the penalty of using LO or IF/BB phase shifting in terms of LO distribution becomes less significant compared to mm-wave phase shifting and combining.
An important aspect of an analog phase shifter is phase-amplitude control orthogonality, that is, it should be possible to control the amplitude and phase of each antenna element independently [8]. If the amplitude of each antenna element can be controlled prior to combining the signals, tapering can be used to reduce the amplitude of the sidelobes, at the cost of widening the main beam [9]. However, the sidelobe suppression will be limited by the phase resolution of each phase shifter. In [8], it is shown that for an 8 × 8 antenna array, the sidelobe suppression will degrade by about 5 dB if the phase shifters have a resolution of 22.5°, compared to using phase shifters with infinite resolution. Phase shifters with 5° resolution, on the other hand, only degrade the sidelobe suppression by about 1 dB. Interestingly, in both cases, the beam direction resolution is less than 1° when non-uniform phase settings are used.
Phase-amplitude control orthogonality is also important for the achievable peak-to-null ratio. If the amplitude varies with varying phase setting, or vice versa, the signals will not perfectly cancel in the null direction. The same is true if the actual phase shift deviates from the desired phase setting. In [10], it is shown that to achieve a 30 dB peak-to-null ratio in a four-element array, the rms phase error and amplitude variation must be less than 2° and ±1.5 dB, respectively.
Due to process, voltage, and temperature variations, this kind of performance is typically only achieved with a time-consuming and costly manual calibration. To speed up that process or even completely circumvent it, several designs with either built-in self-tests (BIST), automatic calibration schemes, or very robust design, requiring little to no calibration, have been proposed [10-15]. Wu et al. [10,11] achieve excellent phase accuracy and amplitude stability with automatic calibration, but the phase resolution is limited to 22.5°. On the other hand, Inac et al. [12] use a BIST and achieve a phase resolution of 11.25°, but the rms phase error is about 4°. Yin et al. [13] implement phase shifters with a resolution of 5.6° that only require calibration at one frequency to cover all of their intended frequencies (23.5-29.5 GHz), but the amplitude variation and phase error are still 1.1 dB and 4.8°, respectively. The design in [14] is claimed to be robust enough to not require any calibration. However, while the design does achieve an uncalibrated phase resolution and amplitude variation of less than 6.1° and ±0.8 dB, respectively, the phase error is significant. While not explicitly stated, based on the presented plots it appears to be several degrees. Lastly, [15] achieves an extraordinary rms phase error of 0.08° and rms amplitude error of 0.01 dB after automatic calibration, with a phase resolution of 0.05°. However, the calibration is based on connecting each transmitter output to each receiver input through switches, degrading noise performance and potentially causing cross-talk. For some frequencies, the noise figure (NF) is as high as 11 dB with a gain of −3 dB, meaning that any circuitry added at the baseband will severely degrade the NF. Additionally, it severely complicates the layout.
In this work, a 24-30 GHz LO phase shifter intended for a hybrid beamforming array is presented, see Fig. 2. The phase shift is accomplished using an injection-locked oscillator (ILO) followed by an injection-locked frequency tripler (ILFT), similar to the work in [10,11]. A phase detector (PD) is added for built-in measurements of the phase shift and to automatically find the frequency control settings to lock the ILO and ILFT. Additionally, a 28-GHz receiver is implemented to verify the performance of the phase shifter. This paper is an extended version of the work presented in [16].

Injection-locked phase shifter and phase detector
If an oscillator with a free-running oscillation frequency f_0 is injected with a signal at frequency f_inj, the oscillator can be forced to oscillate at frequency f_inj [17]. The oscillator is then said to be injection-locked. Injection-locking will occur if the difference between f_0 and f_inj is smaller than the one-sided locking range f_L, given by Eq. 1 [18], where Q is the quality factor of the resonator in the oscillator, and I_inj and I_osc are the magnitudes of the injected and free-running oscillation currents, respectively. Since the injection-locked oscillator (ILO) will not oscillate at the resonance frequency of its tank, the ILO output must be phase-shifted relative to the injected signal in order to sustain a 360° phase shift when going through the oscillator loop [18]. This phase shift can be approximated by Eq. 2 [18]. Thus, by changing the free-running frequency of the ILO, for instance by using a varactor, this phase shift can be controlled electronically. An issue with this approach is that Eq. 2 is limited to phase shifts of up to ±90°, while for a phased array, phase shifts of up to ±180° are required. This can be solved by using a frequency tripler [10,19], which also triples the phase shift, extending the achievable phase shift to ±270°. The frequency tripler can also be implemented as an injection-locked oscillator, but with a resonance frequency three times that of the phase-shifting ILO, so that it locks to the third harmonic of the injected signal.
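For reference, the locking range and phase shift relations cited from [18] are commonly written in the weak-injection form below. The display equations are not reproduced in this text, so these expressions are an assumed reconstruction based on the standard injection-locking analysis, not a verbatim copy of the authors' Eqs. 1 and 2. The symbol φ for the ILO phase shift is introduced here for illustration.

```latex
% Assumed weak-injection form (I_inj << I_osc); phi is a symbol introduced here.
\begin{align}
  f_L &\approx \frac{f_0}{2Q}\,\frac{I_{inj}}{I_{osc}} \tag{1}\\
  \sin\varphi &\approx \frac{2Q}{f_0}\,\frac{I_{osc}}{I_{inj}}\,\left(f_0 - f_{inj}\right) \tag{2}
\end{align}
% Within lock, |f_0 - f_inj| <= f_L, so |sin(phi)| <= 1 and phi spans at most +/-90 degrees.
% After tripling, the output phase 3*phi spans up to +/-270 degrees.
```

In this form the two expressions are mutually consistent: the edge of the locking range, |f_0 − f_inj| = f_L, corresponds to sin φ = ±1, which is where the ±90° limit mentioned in the text comes from.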
In addition to improving the phase shifting range, using a tripler has two added benefits, both related to the fact that the central PLL seen in Fig. 2 only needs to operate at 1/3 of the final frequency. Firstly, it makes the frequency distribution more power efficient, since a lower frequency means lower losses [20], thus requiring less buffering. The buffers themselves will also be more power-efficient at lower frequencies. Secondly, the phase noise will improve. This is because the phase noise of the ILO will ideally follow the phase noise of the PLL, while the frequency tripler will follow the phase noise of the ILO, multiplied by a factor of 3² = 9, corresponding to an addition of 9.5 dB [10]. Since VCOs, due to poor varactor quality factor, typically have a worse figure-of-merit at higher frequencies, a PLL operating at the final frequency would most likely have worse phase noise performance than a PLL operating at one-third of the frequency followed by a frequency tripler.
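The 9.5 dB figure follows from the fact that an ideal frequency multiplier scales phase deviations by the multiplication factor, so the phase noise power scales by its square. A minimal check (the function name is hypothetical):

```python
import math

def multiplication_penalty_db(n):
    """Ideal phase-noise increase, in dB, when a frequency is multiplied by n.

    Phase deviations scale by n, so the noise power scales by n**2,
    i.e. 10*log10(n**2) = 20*log10(n).
    """
    return 20.0 * math.log10(n)

print(round(multiplication_penalty_db(3), 2))
```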
If a single PLL is used for multiple ILOs and ILFTs, as in Fig. 2, the phase noise in each antenna element path will be correlated inside the injection-locked bandwidth of the oscillators. While correlated noise is typically something to be avoided in circuit design, it may actually be an advantage in multi-user beamforming applications [21]. This is because correlated phase noise will affect the phase of each antenna element signal the same, causing the relative phase difference between antenna elements to be unaffected, thus not impacting the shape of the beams and nulls. On the other hand, uncorrelated phase noise will affect the phase of each antenna element signal differently, thus distorting the shape of the beams and nulls.
Figure 3 shows our proposed architecture for the LO phase shifter with frequency tripling and phase detection. It comprises an ILO, an ILFT, a polyphase filter (PPF), a phase detector, two ADCs, two DACs, and a DSP. Note that the converters and DSP are not implemented in this work. A 6-7.5-GHz external clock is injected into the ILO, which, assuming that injection-locking occurs (more on that later), adds a phase shift. This signal is then injected into the ILFT, which outputs an 18-22.5-GHz signal with three times that phase shift.
The phase detector is implemented as two Gilbert mixers, one for the I-signal and one for the Q-signal, see Fig. 4. The output from the ILFT is fed to the common-source transistors M_1 and M_2, while the output from the PPF is fed to the commutating pairs, M_3-M_6. The commutating pairs are sized so that the current is completely steered from one branch to the other, creating a large third harmonic. This third harmonic mixes with the ILFT output and, assuming that the current in each commutating pair is a perfect square wave and only accounting for the third harmonic, results in the outputs given by (3) and (4), where g_m1 is the transconductance of M_1 and M_2, R_L is the load resistor, and V_FT is the output amplitude of the frequency tripler.
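The phase detector output expressions are not reproduced in this text. Under the stated assumptions (perfect square-wave commutation, third harmonic only), a plausible form after low-pass filtering is sketched below. It is an assumed reconstruction rather than the authors' exact equations. The angle θ, denoting the ILFT output phase relative to the third harmonic of the quadrature clock, is a symbol introduced here. Signs depend on the differential output polarity.

```latex
% Assumed reconstruction. A unit-amplitude square wave has a third harmonic of
% amplitude 4/(3*pi); multiplying it with V_FT*cos(3*omega*t + theta) and keeping
% only the DC term gives half the product of the two amplitudes.
\begin{align}
  V_{PD,I} &\approx \frac{2}{3\pi}\, g_{m1} R_L V_{FT} \cos\theta \\
  V_{PD,Q} &\approx \frac{2}{3\pi}\, g_{m1} R_L V_{FT} \sin\theta \\
  \sqrt{V_{PD,I}^2 + V_{PD,Q}^2} &\approx \frac{2}{3\pi}\, g_{m1} R_L V_{FT}
\end{align}
```

The last line uses cos²θ + sin²θ = 1, so the magnitude of the output vector depends on V_FT but not on the phase, while the ratio of the two outputs depends on the phase but not on V_FT.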
After low-pass filtering the two outputs and converting them to digital signals using the ADCs, the phase can be calculated with simple digital processing, as the arctangent of the ratio of the Q and I outputs. Based on this phase measurement, the ILO can then be tuned with a DAC to achieve the desired phase shift without any further calibration, see Fig. 3.
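A minimal sketch of this digital processing step is shown below. The function names are hypothetical, and the use of `atan2` instead of a plain arctangent of the ratio is a choice made here so that all four quadrants are resolved; the text only states that the phase follows from the two filtered outputs.

```python
import math

def detected_phase_deg(v_pd_i, v_pd_q):
    """Phase in degrees, in [0, 360), from the low-pass filtered I and Q outputs.

    atan2 resolves all four quadrants and avoids a division by zero
    when v_pd_i is 0.
    """
    return math.degrees(math.atan2(v_pd_q, v_pd_i)) % 360.0

def pd_magnitude(v_pd_i, v_pd_q):
    """Magnitude of the PD output vector, which does not depend on the phase."""
    return math.hypot(v_pd_i, v_pd_q)

# Synthetic, normalized PD outputs for a 200 degree phase (illustrative values only)
i_out = math.cos(math.radians(200.0))
q_out = math.sin(math.radians(200.0))
print(round(detected_phase_deg(i_out, q_out), 3), round(pd_magnitude(i_out, q_out), 3))
```

Since the detector observes the ILFT output, the phase obtained this way corresponds to the tripled phase shift, not the phase shift of the ILO alone.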
This only works if the ILO and ILFT are injection-locked. However, the phase detector can also be used to automatically tune the oscillators to achieve lock. The following example, in which the oscillators are tuned to lock to an injected signal with frequency ω_clk, illustrates this.
1. Initially, the free-running frequencies of the ILO (ω_ILO) and the ILFT (ω_ILFT) are tuned to their lowest settings, i.e. the varactor voltages V_tune,ILO and V_tune,ILFT are set to 0 V. This means that ω_clk > ω_ILO and 3ω_clk > ω_ILFT, see the top part of Fig. 5(a). In the phase detector, the quadrature clock signals (ω_clk,I and ω_clk,Q) are therefore mixed with ω_ILFT. Since ω_ILFT is not a harmonic of ω_clk, no DC output will be generated in the PD and thus v_PD,I - v_PD,Q ≈ 0 V, see Fig. 5(a).
2. Next, V_tune,ILO is increased, shifting ω_ILO up in frequency, see Fig. 5(a). With V_tune,ILO = 0.2 V, locking of the ILO has yet to occur, and the PD still outputs close to 0 V. But when V_tune,ILO reaches about 0.3 V, the ILO locks, see Fig. 5(b). This drastically increases the amplitude of the signal at ω_clk that is injected into the ILFT, which in turn means that a significant portion of this signal leaks through the ILFT to the PD, creating a DC output when mixed with ω_clk,I and ω_clk,Q, see the bottom part of Fig. 5(b). Thus, when v_PD,I - v_PD,Q starts to diverge significantly from 0 V, the ILO is locked.
3. Lastly, the ILFT needs to be tuned. For optimum performance, 3ω_clk should be exactly equal to ω_ILFT, since this gives the highest amplitude. Combining the low-pass filtered results from (3) and (4) with the Pythagorean identity shows that the magnitude of the PD output, i.e. the root sum of squares of v_PD,I and v_PD,Q, is proportional to V_FT, which, as noted above, reaches its peak value when 3ω_clk = ω_ILFT. The ILFT is thus tuned until this magnitude is maximized, see Fig. 5(c).
Since the phase can be tuned continuously with a varactor, the phase resolution will only be limited by the resolution of the ILO DAC. The phase detector ADCs will similarly cause a phase error due to the quantization noise. This leads to the question: what ADC and DAC resolutions are required to make the quantization effects negligible, and what power consumption can be expected for the converters? We start by analyzing the DAC controlling the ILO. Simulations show that the steepest slope for phase vs V_tune,ILO is about 1.5°/mV.
Since the phase gets tripled in the ILFT, the resulting slope will be 4.5°/mV. Assuming a full-range voltage of 1 V, this corresponds to a minimum DAC resolution of 12 bits to achieve about 1° resolution. Since only a DC voltage is required to control the phase, the sampling rate of the DAC only has to be high enough for the beam to track a moving target, which requires phase update intervals on the order of milliseconds [22]. In [23], a 12-bit DAC with a 112 kS/s sample rate is presented. The DAC consumes 50.8 μW, while occupying an area of only 270 μm². Thus, the ILO DAC can be implemented with negligible impact on the total power consumption and area of the LO generation circuit, while achieving a worst-case phase resolution of about 1° and a more than sufficient phase update rate. The ILFT DAC needs a resolution of about 1 mV, or 10 bits, to find the optimal tuning voltage of the tripler. The speed requirement of the ILFT DAC is more relaxed than for the ILO DAC, since this voltage will not change with phase setting, and thus only has to be fast enough to counteract any drift in free-running oscillation frequency. Thus, the power and area consumption will be even smaller than for the ILO DAC.
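The DAC numbers above can be checked with a few lines of arithmetic. The sketch below assumes an ideal DAC whose LSB equals the full-scale voltage divided by 2^bits; the function names are hypothetical, and the 1 V range, 4.5°/mV slope, and 1 mV step are the values quoted in the text.

```python
import math

def dac_bits_for_step(full_scale_v, max_step_v):
    """Smallest number of bits whose LSB (full_scale / 2**bits) is <= max_step_v."""
    return math.ceil(math.log2(full_scale_v / max_step_v))

def phase_step_deg(bits, full_scale_v=1.0, slope_deg_per_mv=4.5):
    """Worst-case output phase step for one DAC LSB at the steepest tuning slope."""
    lsb_mv = full_scale_v / 2 ** bits * 1e3
    return lsb_mv * slope_deg_per_mv

print(round(phase_step_deg(12), 2))   # ILO DAC: 12 bits, 1 V range, 4.5 deg/mV
print(dac_bits_for_step(1.0, 1e-3))   # ILFT DAC: about 1 mV steps over 1 V
```

With 12 bits the LSB is about 0.244 mV, giving a worst-case step of roughly 1.1°, in line with the "about 1°" stated in the text; 11 bits would roughly double that step.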
The ADCs will cause phase errors due to quantization noise, which can easily be investigated using MATLAB or similar software, by simply quantizing the ideal V_PD,I and V_PD,Q signals for phases varying from 0° to 360°. Then, by comparing arctan(V_PD,Q,quant / V_PD,I,quant) to the initial phase, the phase error for various ADC resolutions can be found. Figure 6 shows the rms phase error versus the number of bits of the ADC. As seen, for resolutions above 9 bits, the rms error will be less than 0.1°, assuming full-swing input to the ADC. The sampling rate should be on the same order as that of the DAC. As an example of an ADC that can be used, in [24], a 10 MS/s calibration-free ADC with an effective number of bits (ENOB) of 11 is presented, consuming 0.41 mW and occupying 0.04 mm². Reducing the rate to kS/s should provide significant power reductions, making the impact of the ADCs on the total power consumption negligible.
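The quantization experiment described above can be sketched in a few lines. The version below is an illustration under stated assumptions: an ideal mid-rise quantizer, full-swing sinusoidal I and Q inputs, and a uniform phase sweep. The quantizer model and function names are assumptions, so the numbers it prints are not expected to reproduce Fig. 6 exactly.

```python
import math

def quantize(x, bits, full_scale=1.0):
    """Ideal mid-rise quantizer for inputs in [-full_scale, +full_scale]."""
    lsb = 2.0 * full_scale / 2 ** bits
    code = math.floor(x / lsb)
    # Clip to the available codes
    code = max(-2 ** (bits - 1), min(2 ** (bits - 1) - 1, code))
    return (code + 0.5) * lsb

def rms_phase_error_deg(bits, num_points=3600):
    """RMS error (degrees) of atan2 applied to quantized full-swing I/Q signals."""
    total = 0.0
    for k in range(num_points):
        phase = 2.0 * math.pi * k / num_points
        i_q = quantize(math.cos(phase), bits)
        q_q = quantize(math.sin(phase), bits)
        err = math.atan2(q_q, i_q) - phase
        # Wrap the error into [-pi, pi) before squaring
        err = (err + math.pi) % (2.0 * math.pi) - math.pi
        total += err * err
    return math.degrees(math.sqrt(total / num_points))

for bits in (6, 8, 10, 12):
    print(bits, round(rms_phase_error_deg(bits), 4))
```

Each additional bit halves the LSB, so the rms phase error is expected to fall by roughly a factor of two per bit.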
Another source of phase error is mismatch in the PPF and between the I- and Q-parts of the phase detector. To investigate this, a 500-sample Monte Carlo simulation was performed. Figure 7 shows the resulting histogram of the phase offset. As seen, the rms phase error is about 0.45°. This error is uncorrelated with the phase error due to the ADCs, and will dominate the total random phase error, which will be about 0.5°. There is, however, also a deterministic phase error due to unwanted mixing that will dominate the total phase error of the circuit, as will be seen in Sect. 4.
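Stepping back to the three-step lock-acquisition procedure described earlier in this section, its control flow could be sketched as follows. Everything in this example is hypothetical: `read_pd` stands in for the ADC readout of the two PD outputs, the lock test uses the PD output magnitude against an assumed threshold, and `toy_pd` is a made-up behavioural model used only to exercise the loop. It is not a model of the simulated circuit.

```python
import math

def acquire_lock(read_pd, v_step=0.01, v_max=1.0, threshold=0.05):
    """Sketch of the three-step lock search.

    read_pd(v_ilo, v_ilft) is a hypothetical callable returning the low-pass
    filtered PD outputs (v_pd_i, v_pd_q). Testing the output magnitude
    against a fixed threshold is a simplification assumed here.
    """
    steps = int(round(v_max / v_step))
    # Step 1: start with both varactor voltages at 0 V.
    v_ilo, v_ilft = 0.0, 0.0
    # Step 2: raise the ILO tuning voltage until the PD output leaves 0 V.
    for k in range(steps + 1):
        v_ilo = k * v_step
        if math.hypot(*read_pd(v_ilo, v_ilft)) > threshold:
            break
    else:
        raise RuntimeError("ILO lock not detected")
    # Step 3: sweep the ILFT tuning voltage and keep the magnitude maximum.
    best_mag = -1.0
    for k in range(steps + 1):
        candidate = k * v_step
        mag = math.hypot(*read_pd(v_ilo, candidate))
        if mag > best_mag:
            best_mag, v_ilft = mag, candidate
    return v_ilo, v_ilft

def toy_pd(v_ilo, v_ilft):
    """Made-up behavioural model: ILO locks at 0.3 V, ILFT peaks at 0.6 V."""
    if v_ilo < 0.3 - 1e-9:
        return 0.0, 0.0
    amplitude = 0.1 + 0.9 / (1.0 + ((v_ilft - 0.6) / 0.05) ** 2)
    return amplitude * math.cos(1.0), amplitude * math.sin(1.0)

print(acquire_lock(toy_pd))
```

In a real system the threshold and step size would have to be chosen against the PD noise and offset, and the search could be repeated periodically to track drift in the free-running frequencies.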

Circuit implementation
The LO phase shifter is implemented in STMicroelectronics' 28 nm FD-SOI CMOS process and uses a 1 V supply. To properly load the phase shifter and test its capabilities, a 24-30 GHz sliding-IF receiver is also implemented. Figure 8 shows the block schematic of the full circuit.

LO phase shifter
The implemented LO phase shifter consists of an ILO, a peak detector, an ILFT, a polyphase filter followed by digital logic to generate 25% duty cycle pulses, a PD, and a buffer, as seen in Fig. 8. The ILO is implemented as a regular differential cross-coupled LC oscillator, but with two additional injection transistors, see Fig. 9. Using five unary-weighted switched capacitor cells and a varactor, a tuning range between 5.6 and 8.4 GHz is achieved. The varactor is sized so that its tuning capacitance is considerably larger than the capacitance step of one switched capacitor cell. This is done so that a ±60° phase shift can be achieved using only the varactor, no matter the frequency. If the gates of the current sources M_3 and M_6 were connected to fixed DC voltages, the ILO output amplitude would vary significantly with phase setting, since a large phase shift would correspond to the oscillator operating far from its resonance frequency. This would in turn cause the ILFT amplitude and therefore also the RX gain and noise figure to vary with phase setting. To counteract this, M_3 and M_6 are instead connected to the output of the peak detector, V_PEAK, which regulates the ILO amplitude to be almost constant. While it would be enough to only connect M_3 to the peak detector to obtain constant amplitude, the I_inj/I_osc ratio, and thereby also f_L, is kept approximately constant by also connecting M_6 to V_PEAK. Otherwise, there would be a risk of losing injection lock for certain phase settings. The peak detector is based on the design presented in [25] and is shown in Fig. 10. Figure 11 shows the output amplitude of the ILFT versus phase shift at 21 GHz, with and without the peak detector feedback. The variation in amplitude is reduced from 5.7% when not using the peak detector to 0.7% when using the peak detector.
Fig. 9 Schematic of the injection-locked oscillator
The ILO and peak detector consume between 1.2 and 1.7 mW combined, depending on frequency and phase setting.
The ILFT uses the same architecture as the ILO. However, to maximize the third harmonic current, the injection transistors are biased in weak inversion. Additionally, since the ILFT free-running oscillation frequency is not changed once lock is achieved, there is no need for peak detection feedback, so the current-source transistors are biased with fixed DC gate voltages. A tuning range between 17.4 and 23 GHz is achieved with three unary-weighted switched capacitor cells and a varactor. An inverter-based buffer is also implemented between the ILFT and the receiver. This is done to reduce the capacitive load of the ILFT, and to reduce frequency pulling of the ILFT when a large interferer is present in the RX. The ILFT and buffer combined consume 6-8 mW, depending on frequency.
The phase detector schematic was shown in Fig. 4. The transistors should be as large as possible to minimize the effect of mismatch and flicker noise, but the size must be limited so as not to load the ILFT too much and to prevent roll-off before 22.5 GHz, the highest output frequency of the ILFT. The total power consumption of both mixers is 1.2 mW.
Lastly, to generate the quadrature signals, a two-stage PPF is used, see Fig. 12. The resistors used are polysilicon resistors, while the capacitors are metal-oxide-metal capacitors. The PPF is followed by a simple digital circuit to generate pulse waves with 25% duty cycle, which consumes 7 mW.

Receiver
The full schematic of the receiver is shown in Fig. 13. It comprises an on-chip balun, an LNA, an active mixer, and quadrature passive mixers. As can be seen, the LNA is a typical inductively source-degenerated cascode LNA, with the load tuned to 28 GHz. The active mixer is double-balanced, implemented as a Gilbert cell, and is driven by the ILFT, with an LC load tuned to 7 GHz. The image frequency is situated around 14 GHz, far from the desired signal, and will be heavily filtered by the antenna, input matching, and LNA output. Thus, no explicit image filter is required. The LNA and Gilbert mixer combined consume 10 mW.
The two quadrature mixers are implemented with a double-balanced architecture and are driven by the 25% duty-cycle quadrature clock, which provides isolation between the mixer I and Q outputs and yields quadrature baseband signals.
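The frequency plan of the sliding-IF receiver can be summarized with simple arithmetic. The sketch below is inferred from the numbers given in the text (6-7.5 GHz clock, 18-22.5 GHz ILFT output, 28 GHz LNA load, 7 GHz mixer load, image around 14 GHz). The assumption that the second down-conversion is clocked at the external clock frequency and the function name are not stated explicitly in the text.

```python
def sliding_if_plan(f_clk_ghz):
    """Frequency plan implied by the text for a clock at f_clk (all values in GHz)."""
    f_lo = 3.0 * f_clk_ghz          # ILFT output driving the active mixer
    f_if = f_clk_ghz                # assumed: quadrature mixers clocked at f_clk
    f_rf = f_lo + f_if              # wanted RF channel (low-side LO injection)
    f_image = f_lo - f_if           # image of the first down-conversion
    return {"lo": f_lo, "if": f_if, "rf": f_rf, "image": f_image}

for f_clk in (6.0, 7.0, 7.5):
    print(f_clk, sliding_if_plan(f_clk))
```

With this plan the RF, LO, and IF all scale with the clock, so the 24-30 GHz RF range maps onto the 6-7.5 GHz clock range, and the image stays at half the RF frequency.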

Post-layout simulation results
The following simulations were performed after post-layout parasitic extractions using Cadence QRC. Inductors, transformers, and longer interconnects were modeled using Keysight Momentum. The full layout is shown in Fig. 14, and measures 1080 μm × 1080 μm, including pads. The entire IC consumes 27-29 mW, depending on phase setting and frequency.
To simulate the phase noise of the LO chain, a phase noise profile based on the 7 GHz integer-N PLL in [26] was added to the external clock signal. Figure 15 plots the phase noise of the external clock, and the outputs of the ILO and ILFT. Also plotted are the phase noise profiles for the ILO and ILFT when free-running. As can be seen, the ILO perfectly follows the external clock for low offset frequencies, and the phase noise of the ILFT is about 9.5 dB above the clock phase noise, matching the theory. However, for large offset frequencies, the ILO and ILFT phase noise profiles start to deviate from the external clock phase noise and instead start to follow the free-running phase noise. This occurs when the frequency offset exceeds the one-sided locking range. The phase noise at these frequency offsets will then be uncorrelated. However, this should not cause any significant beam distortion due to the low phase noise levels at these offsets. In [18], it is noted that the phase noise performance of an injection-locked oscillator will be worse when operating far from the free-running oscillation frequency due to the change in tank impedance, which is the case when applying a phase shift of ±60°. However, the simulated phase noise varies by less than 1 dB across all phase settings.
Figure 16(a)-(c) show the phase measured by the phase detector versus the actual BB output phase, at RF input frequencies of 24 GHz, 26.8 GHz, and 30 GHz, respectively. Also shown is the phase error, i.e. the difference between the detected and the actual phase. For all three frequencies, the phase shifter achieves more than 360° of phase shift range. The rms phase error remains below 2.5° at all three frequencies. This is not a random phase error, but rather a deterministic one due to unwanted mixing in the phase detector. Part of the ILO signal injected into the ILFT will leak through to the PD and mix with the PPF output. Since these signals are at the same frequency, this will generate a DC component, distorting the desired DC component.
Figure 17(a)-(c) show the receiver gain versus phase shift for the same simulations. As can be seen in the plots, the receiver gain varies between 16.3 and 18.3 dB versus input frequency, but the maximum gain variation versus phase shift for a given frequency is only 0.11 dB, proving the usefulness of the peak detector feedback. The noise figures at these frequencies are 5.6 dB, 4.8 dB, and 6.1 dB, respectively, with negligible variation with phase setting. The performance of the circuit is summarized in Table 1, where it is also compared with other receivers with either automatic phase calibration, built-in self-test, or limited phase calibration. While this work should not be directly compared to the other works presented in Table 1, given that this work is only simulated, the table still gives an indication of the proposed LO phase shifter's ability to drive a mm-wave receiver with competitive performance.

Conclusion
An LO phase shifter for 28-GHz 5G transceivers is presented. It features a frequency tripler and a phase detector based on third harmonic mixing to support automatic phase tuning. An algorithm to automatically detect injection-locking using the output of the phase detector is also presented.

The performance of the phase shifter was tested using a 24-30 GHz sliding-IF receiver. The receiver achieves a gain of 16.3-18.3 dB and a noise figure between 4.8 and 6.1 dB, proving the driving capabilities of the LO circuit. Owing to a peak detector-based feedback in the phase shifter, the gain variation of the receiver is only 0.11 dB across all phase settings. The rms difference between the output phase of the receiver and the phase detector, i.e. the phase error, is 2.4°.