Interpretable physics-based models compared with symbolic regression for hopping transport in organic eld-effect transistors

The past few decades have seen an uptick in the scope and range of device applications of organic semiconductors, such as organic field-effect transistors, organic photovoltaics and light-emitting diodes. Several researchers have studied electrical transport in these materials and proposed physical models to describe charge transport with different material parameters, with most disordered semiconductors exhibiting hopping transport. However, there exists a lack of a consensus among the different models to describe hopping transport accurately and uniformly. In this work, we first evaluate the efficacy of using a purely data-driven approach, i.e., symbolic regression, in unravelling the relationship between the measured field-effect mobility and the controllable inputs of temperature and gate voltage. While the regressor is able to capture the scaled mobility well with mean absolute error (MAE) ~ O (10 -2 ), better than the traditionally used hopping transport model, it is unable to derive physically interpretable input-output relationships. We then examine a physics-inspired renormalization approach to describe the scaled mobility with respect to a scale-invariant reference temperature. We observe that the renormalization approach offers more generality and interpretability with a MAE of the ~ O (10 -1 ), still better than the traditionally used hopping model, but less accurate as compared to the symbolic regression approach. Our work shows that physics-based approaches are powerful compared to purely data-driven modelling, providing an intuitive understanding of data with extrapolative ability.


Introduction
Several classes of organic semiconductors-conjugated polymers and small molecules-have been discovered and explored in a wide range of electronic applications, such as flexible displays 1 , self-healing materials for electronic skin 2 , energy harvesting 3,4 , chemical and biological sensors 5 . The electrical transport of these semiconductors have been studied using time-of-flight (TOF) or charge extraction by linearly increasing voltage (CELIV) measurements 6 and most commonly by fabricating organic field-effect transistors (OFETs) employed in Hall measurements [7][8][9] . The electrical transport in these materials is complex and no specific transport mechanism has been established due to their weak interactions between molecules and strong dependence on processing conditions and device morphologies. Such weak van der Waals interactions lead to different degrees of disorder ranging from amorphous to crystalline materials, which affects the electrical transport. Small molecules and single crystals in which π-π interactions dominate possess less degree of disorder and are expected to follow band-like transport. 10 Alternatively, polymers that lack order in packing are dominated by thermal disorder, and are thereby expected to follow hopping transport 11 . Thus, thermally activated Arrhenius-type hopping usually describes the electrical transport in conjugated polymers with the assumption that a fraction of the total number of available energy states is occupied. While physical models like the Miller-Abrahams formalism 12 , explaining hopping transport in material systems are complex, involving many parameters that are defined empirically, it provides an intuitive understanding of the physics of transport and hence can be predictive in nature. In essence, this boils down to the dependence of charge hopping upon the thermodynamic temperature (T) and the modulation of additional carriers in the polymeric channel via the applied gate voltage (Vg) in the OFET configuration.
Similar efforts at understanding physical principles in other materials and physical systems have been employed in recent times with simulated physics datasets. "AI Feynman" (AI stands for Artificial Intelligence) is a recursive symbolic regression algorithm, which connects neural networks with physics-inspired methods such as dimensionality analysis, generalized symmetries, multiplicative separability, and inversion to discover physical equations that explain the data 13,14 . The use of symbolic neural networks to learn the dynamics of data with a Partial Differential Equation (PDE-NET) fixes the functional form but is able to capture the underlying parameters 15 . Physics-informed Neural Networks (PINNs) respect the laws of physics while solving both differentiable (data-efficient approximators) and discrete datasets and have been used to solve problems in fluids 16 , quantum mechanics and more 17 . In this work, we investigate the efficacy of a powerful mathematical approach, namely symbolic regression (SR), in recapturing underlying physics after fitting transport data of three different OFETs based on organic molecules of p-type pentacene, and n-type buckminsterfullerene (C60) as well as conjugated polymer poly(3-hexylthiophene-2,5-diyl) (P3HT). While the regressor can fit the ground truth data after carefully tuning the hyperparameters, physical interpretability is not easy and guaranteed in all cases. The regressor does succeed in identifying the physically meaningful and essential terms to describe the transport. However, it is unable to decouple the effect of the activation energy modulated by varying gate voltage from the magnitude of the thermal energy window that is directly proportional to temperature. We see that the regressor treats the whole dataset as one entity, as opposed to a system of five curves following the same bi-parametric equation, where the data in each curve is generated by only varying the value of one of the parameters, while keeping the other parameter constant. The resulting equation fits the entire dataset at once instead of providing a solution that is repeatable curve-by-curve by varying the value of the second variable/parameter (gate voltage in this case). To overcome this missing physical insight, we compare this with a renormalization approach to describe the transport by defining dimensionless quantities from the temperature dependent mobility data.
Derivation of dimensionless quantities captures both the effects of temperature as well as gate voltage embedded in a single parameter, rather than in multiple parameters as seen in the existing models 18 , and provides us with a one-shot approach in describing and characterising hopping transport in organic semiconductors. We demonstrate that this approach is powerful as it also helps to differentiate between processing conditions for the same molecules by comparing the extent of activation energy that is accessed over the range of the applied gate voltages. Figure 1 depicts various approaches for extracting input-output relationships involved in hopping transport phenomena.
As a benchmark, we use the model derived by Bassler et al. for the field-effect mobility (μe) using the effective medium approximation and the concept of effective transport energy 11,19 , while considering Gaussian density of states (DOS) and Miller-Abrahams jump rates for thermally activated Arrhenius-type hopping. Here, according to the so-called Bassler model, As presented in Figure 2, σ indicates the broadening of the Gaussian DOS, a/b is the ratio of inter-site distance and localization radius. Further, n/N is the ratio of occupied and total number of transport states. EMN is the Meyer-Neldel energy; T0 is the isokinetic temperature at which all curves of varying Ea converge; n is the number of occupied transport states, which can be tuned by the applied gate voltage (Vg) since it is directly proportional to Vg. The total number of transport states, N, is a constant of the order 10 23 cm -3 , which relates to the lattice constant (a) as = /) for a cubic lattice. In the equation above, the hopping process is affected by both spatial (a, b) and energetic terms ( # ). The spatial terms indicate the extent of hopping (given by a) and the localization radius between two sites (given by b), while the energetic terms specify the activation required for the phonon-assisted excitation, spatial tunnelling, and phonon emission -all together considered as one hopping event -to take place from one site to another. The tunnelling strongly depends on the extent of wave function overlap between neighbouring molecular orbitals. This results in an increase in mobility when the wave functions overlap is higher, which thereby decreases the activation energy for the hopping event 20,21 . The activation energy also relates to the structure and morphology of the semiconducting layer through molecular structure and processing conditions.

Figure 2: The hopping picture in a disordered polymeric semiconductor represented in terms of both energy and distance coordinates: the electronic Density of States (DOS) is characterized by the value of its energy broadening (σ). On the right, an illustrative single hopping event between two sites i and j, is shown, where sites i and j are characterized by electronic wavefunctions ψi and ψj respectively,
and energies εi and εj respectively. Here, "a" represents the extent of wavefunction localization and "b" represents the distance between sites i and j.

Data description
The temperature-dependent mobility data for pentacene, P3HT, C60 OFETs is obtained from literature [22][23][24] . While pentacene and P3HT were fabricated through solution processing methods on Si/SiO2 substrates with Au contacts, C60 film was fabricated through standard flush evaporation or hot-wall epitaxy 25 on ITO/divinyltetramethyldisiloxanebis(benzocyclobutane) (BCB) substrates with LiF/Al top contacts. The pentacene film can be regarded as polycrystalline, while the P3HT film has ordered regions that alternate with disordered regions. The C60 film is of greater crystallinity compared to both P3HT and pentacene films. The field-effect mobility (output) data recorded as a function of the change in two variables namely temperature and gate voltage (inputs), is shown in Figure 3a for pentacene OFET and Figure S1 for P3HT and C60 OFETs. Typical of materials science dataset sizes, each OFET data is comprised of ~100 points, stacking input-output information from five temperature-dependent curves at five different gate voltages that were used to record the data (as seen in Figure 3a). The activation energy can be regulated by varying the applied gate voltage, i.e., a larger gate voltage signifies the need for smaller activation energy.

Symbolic Regression: an approach for data-driven learning of OFET behaviour
First, we introduce Symbolic Regression (SR) and discuss the fundamentals of genetic programming and how it is employed specifically in this work to extract the underlying inputoutput relationship in the form of comprehensive mathematical expressions 26,27 . Although other machine learning methods are useful in extracting input-output relations, they employ predetermined functional forms to fit the data (for instance, polynomial regression). On the contrary, SR attempts to solve such problems without requiring any prior knowledge of the functional forms, by utilizing a finite set of operators and functions (like arithmetic, trigonometric, exponential). Such a symbolic input-output mapping is beneficial as it offers interpretability in applied physics and material science, which is the motivation behind using this technique over other machine learning models [27][28][29] .
For characterizing transport phenomena in OFETs, several models have been proposed to relate the field-effect mobility with inputs like temperature and back-gate voltage. However, these models differ in both the functional forms as well as the nature and number of the transport parameters used. In the present work, SR is applied on each OFET dataset where the input variables are defined as " = # and % = . and the output variable as ,. The activation energy Ea for each Vg curve is obtained from the slope of the respective curve by expressing the logarithmic mobility as a function of the reciprocal temperature. The thermal energy associated at temperature T is calculated as a product of the temperature T and the Boltzmann constant, kB. This choice of the two input variables, " = # and % = . is so that both the variables are in the same units (eV) and hence guide the SR outcome expressions into a meaningful and interpretable form, since the range of values for both Ea and kBT were similar. This is of special significance because a choice of Ea >> kBT would mean that the hopping energy is too large, and transport does not take place. On the other hand, Ea << kBT would result in a transport mechanism that is different from hopping, akin to metallic conduction, which is unlikely in organic thin films. Further, the usage of dimensionless input variables obtained by normalising the existing X0 and X1 parameters with the thermal energy at room temperature, i.e., . (300 ), was also explored and discarded as it did not help in accuracy or interpretability.
The output variable Y, denoting revised mobility, is a dimensionless quantity. The Bassler model expression can be written in log10 scale as, where, is the dimensionless revised mobility.
The framework applied in this paper seeks to find out a mapping = ( " , % ). We use the gplearn package in Python to perform SR 26 .
GP minimizes the Mean Absolute Error (MAE) between the experimental data and the predictions to produce symbolic expressions. In addition, upon evaluating the best values of SR parameters, specifically, the parsimony coefficient and the tournament size (details can be found in Supplementary Information) are determined using grid search, and the resulting trees render mathematical expressions for the three different OFETs as shown in Table 1. After inspecting the results from SR, we find that the outcome expressions fit the ground truth data with an MAE of ~0.02-0.03 with a fit better than that offered by the Bassler model; a comparison of the results between the SR outcome and the Bassler Model, is shown in Figure   3b.  accuracy with sparse data, which can be mitigated by using more densely populated data 13 .

Molecule
In order to extract underlying physical equations, we next consider a simple yet robust physicsdriven approach that makes use of fewer parameters to characterize the underlying transport phenomena in OFETs. In solid-state physics, the concept of renormalization theory is utilized to renormalize the inputs into dimensionless quantities in such a way that certain critical points are identified as scale-invariant, which captures the physics behind the observed data. This has been employed in the study of second-order magnetic phase transitions, for instance. 31,32 We consider the same OFET data as used for the SR and the Bassler Model. We notice that the carrier mobility, the average velocity of the charge carrier to the electric field, of these disordered systems strongly depends on the carrier concentration, i.e., on applied gate bias

Figure 4: (a) Scaled mobility data at different gate voltages (coloured circles) for pentacene OFET and the corresponding exponential curve that fits the scaled data (black dashed line) (data extracted from Ref < 19 >); (b) juxtaposition of scaled mobility data at different gate voltages for five different molecules (coloured triangles) resulting in all of them following the same exponential function (shown as black dashed line) as observed in (a).
Furthermore, the knowledge of the Tr value enables us to calculate the value of the activation energy for that temperature-dependent mobility curve. Figure 5a shows the obtained/calculated range of activation energies, which are accessible to five different molecules by varying the gate voltage. In general, at low gating (low energy state filling) charge carriers require more energy for the hopping to occur, and this activation energy decreases as the gating voltage is increased (more negative values). The strength of this approach, in comparison to SR and the Bassler Model described earlier, is that a general rule to fix Tr as a function of Vg, and the values of Ea, are sufficient to capture the trends for five different organic systems. This, therefore, can be regarded as a generic fitting approach wherein the only descriptor required to convert the µ-T data to the µ'-θ' data is the value of Tr for each Vg curve.

Discussion
The activation energy Ea can be expressed in terms of Tp and Tr as # = . ln(2)  This approach is not only useful for studying the effect of gate voltage on the activation energy, but also that of the processing conditions such as the choice of solvent, polymer weight, and physical growth temperature, to name a few. This approach could also be extended to other techniques of controlling the extent of energy state filling such as chemical (or vacuum dedoping).
Another salient feature is that the general fit shown in Figure 4b can also act as a screening A summary of the different data-fitting approaches is available in Table 2.

Conclusions
In this work, we use gate voltage and temperature-dependent OFET data from various small molecules/polymers and show that symbolic regression outperforms physics-inspired fitting techniques such as the Bassler model and the renormalization approach in fitting the OFET data, by one order of magnitude in the corresponding MAE values, although not offering generality achieved through physics-inspired fitting approaches. Through simple graphical SR fitting Accuracy and interpretability, but no generality. Considers nonlinear terms.

Bassler Model
Physics-based fitting, but material-and range-specific. Not accurate beyond the bounds selected above, so not general.

Fitting
General and interpretable approach but requires rescaling depending on the physics at play. Reconstruction is less accurate compared to Bassler and SR.
value extraction, we demonstrate that the hopping mechanism in OFETs can be described by a general equation that relates normalized mobility with a scaled, dimensionless temperature descriptor. It is challenging to come up with a single model that accurately captures mobility characteristics in both low and high temperature regions; however, the SR outcome and physics-inspired model prove to be better than the baseline in this regard. Although powerful mathematical approaches like symbolic regression may offer reasonably good performance in terms of accuracy of fits, the physical interpretability is often incomplete and not consistent with the established know-how. We also proffer that a similar methodology can be extended to other transport mechanisms wherein the mobility is normalized with respect to the highest observed value and the temperature is then reduced to a dimensionless quantity that encompasses the dependency of the mobility on both the temperature and another experimental variable such as the gate effect or other processing conditions.