A Method of Joining Piecewise Functions to Produce Continuous Functions of Difficult Data

Abstract: This article describes a method of modelling data that involves splitting the curve into two (or more) and creating separate piecewise functions for each part; these functions are then concatenated via a linking function to create one overall continuous function that better describes the original data than is otherwise achievable. The linking function is able to do this by separating the original two (or more) subfunctions so that they are each active in only the relevant portion of the overall curve without the use of dummy variables. The final result is a continuous function in which it is straightforward to smooth the transition at the knot between the piecewise subfunctions. In addition, the piecewise subfunctions do not need to align at the knot since the degree of smoothing is readily controlled. All types of functions may be concatenated so that the method is flexible and relatively simple to apply.


INTRODUCTION
The purpose of data modelling is to find a function that best describes the data. Specialist packages such as Minitab have many functions available for curve-fitting, as indeed do several websites and other packages. Nevertheless, there are occasions when a good fit is difficult to find. Spline regression or other piecewise functions are then often employed to obtain a sensible model, but these are not continuous functions and often employ dummy variables to 'join' the separate functions together. For instance, Huang et al. [1] split a curve in two or three portions and derived functions for each portion. A computer program was used to analyse the curve and suggest the inflection points for each portion. Each of the portions was treated separately; they were not combined into a single function as is described in this article.
Other authors (e.g. Singh [2]; Chandra [3]) have described the technique of spline regressions where the curve is divided into several sections which are joined together at a "knot". These are computer techniques in which the original curve is divided into sections via a conditional statement, and they are combined into a single function with the use of dummy variables. These two blogs describe the advantages of splines, one being that they produce good fits without having to increase the degree of the polynomials.
In another recent article Caglar et al. [4] describe some naturally derived sigmoid and double-sigmoid curves that are commonly observed in biology. They discuss the difficulty of finding adequate models for double-sigmoid curves. To solve this, one approach is to obtain functions for each sigmoid curve and multiply these together to derive a third function. Computer optimisation is then used to find the best fit, though sometimes convergence is not achieved. In the example given below in this article the functions for each sigmoid curve are concatenated rather than multiplied together, and this produces a single function which fits the original double sigmoid curve.
Another blog (Gelman [5]) describes a continuous hinge function in which two straight lines are joined together, including the option of smoothing the join. The method has some similarities to that described in this article in that it uses a sigmoid derivative to join the functions. However, the method described in the blog only applies to straight lines that converge at the hinge, whereas the method described in this article, in principle, applies to a wide range of functions which do not necessarily need to join at the hinge, and therefore is more widely applicable.

METHOD
Where a set of data is difficult to model, the dataset is split in two and a function is derived for each portion (f1 and f2) using standard regression methods. The two functions are concatenated via a link function (f3):
$f_3(x) = 1 - \dfrac{1}{1 + (x/a)^{b}}$ (1)
where (a) is the change-over point between the two functions (the knot), and (b) is the slope at that point.
The function f3 is itself a sigmoid function, mostly returning a value of 0 or 1. (Other sigmoid functions may also be used.) There are only two parameters in the link function: (a) is the position of the knot, and (b) sets the slope at that point. At the knot (point (a)) in (1) a value of 0.5 is returned. Here both functions (f1 and f2) contribute equally to the calculation. At all other values of (x) either f1 or f2 predominates. In the examples given below (b) is mostly set high (≥ 300), resulting in a very steep slope. Thus, the transition from f1 to f2 is very sudden. If a more gradual transition is required, a lower value of (b) may be chosen.
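The behaviour of the link function described above can be checked numerically. The sketch below assumes the form that appears in (9), generalised to a knot (a) and a slope parameter (b), that is 1 − 1/(1 + (x/a)^b), and it assumes x > 0. The name `link` and the overflow guard are illustrative choices and are not part of the original method.

```python
import math

def link(x, a, b):
    """Sigmoid link 1 - 1/(1 + (x/a)**b); assumes x > 0 and a > 0."""
    # Work with t = ln((x/a)**b) so that a large b cannot overflow the power.
    t = b * math.log(x / a)
    if t > 700.0:
        # (x/a)**b is near the limit of float range; the link is 1 to machine precision.
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(t))

# Knot at x = 9 with a steep transition (b = 300), as in (9).
for x in (8.0, 9.0, 10.0):
    print(x, link(x, a=9.0, b=300))

# A lower b gives a more gradual change-over around the same knot.
for x in (8.0, 9.0, 10.0):
    print(x, link(x, a=9.0, b=10))
```

At x = a the power term equals 1, so the function returns exactly 0.5 whatever the value of (b). For this assumed form the derivative at the knot works out to b/(4a), so (b) scales the slope there rather than being numerically equal to it.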
The final model, f4, is put together with the use of f1, f2, and f3 as follows:

WORKED EXAMPLES

Example taken from Huang et al. [1]
Huang et al. [1] provide an example of a curve which is divided into two subregions (Fig. 1). The first portion of the curve is fitted to the function:
whilst the second portion is linear with the equation:
The linking function for this example is:
Therefore, in the new method described here the curve in Fig. 1 can be described by the single function f8:

Example of a double sigmoid plot taken from Caglar et al. [4]
Caglar et al. [4] provide a diagram of a double sigmoid curve as an example without giving exact details. Therefore, the following sigmoid equations have been generated as representing the type of double-sigmoid curve discussed in that paper, as shown in Fig. 2.
The two functions are linked at position x = 9, therefore the linking function is:
$f_{11}(x) = 1 - \dfrac{1}{1 + (x/9)^{300}}$ (9)
and the curve shown in Fig. 2 can be fully described by the single function f12:
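A minimal sketch of how two subfunctions might be joined at the knot x = 9 with b = 300 is given below. Two points are assumptions rather than details taken from the text. First, the combined function is taken to be f_left·(1 − w) + f_right·w, where w is the link value; this is consistent with the link returning 0 before the knot, 1 after it, and 0.5 at the knot. Second, the two logistic curves (`first_sigmoid`, `second_sigmoid`) are hypothetical stand-ins and are not the equations used for Fig. 2.

```python
import math

def link(x, a, b):
    """Sigmoid link 1 - 1/(1 + (x/a)**b); assumes x > 0 and a > 0."""
    t = b * math.log(x / a)          # ln((x/a)**b), avoids overflow for large b
    if t > 700.0:
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(t))

def join(f_left, f_right, a, b):
    """Assumed combination rule: weight f_left by (1 - w) and f_right by w."""
    def joined(x):
        w = link(x, a, b)
        return f_left(x) * (1.0 - w) + f_right(x) * w
    return joined

# Hypothetical subfunctions: two logistic steps, each of height 5.
def first_sigmoid(x):
    return 5.0 / (1.0 + math.exp(-2.0 * (x - 4.0)))

def second_sigmoid(x):
    return 5.0 + 5.0 / (1.0 + math.exp(-2.0 * (x - 14.0)))

double_sigmoid = join(first_sigmoid, second_sigmoid, a=9.0, b=300)

for x in (1.0, 4.0, 9.0, 14.0, 18.0):
    print(x, round(double_sigmoid(x), 4))
```

Both subfunctions are evaluated at every x and the link only weights them, so under this rule each subfunction needs to return a finite value over the whole range, including the region where it is inactive.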

Example with three subregions
In their paper Huang et al. [1] separated some curves into three subregions. No details are given, so in order to demonstrate three subregions the curve in Fig. 1 has been extended with an exponential term beyond the point x = 8. This is shown in Fig. 3.

Fig. 3. Example of a curve with three subregions
Worked example 1 above demonstrated that the first two functions (x = 1 to 8) can be combined into one function (f8). The last part of the curve (the third subregion), from x = 8 to 13, is given in (11).
( ) = 42.07 . (11)
The linking function for this subregion is:
and the three subregions shown in Fig. 3 can be expressed by the single function f15:
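The three-subregion case can be sketched by applying the joining step twice: the first two pieces are joined, and the result is then joined to the third piece at the second knot. Everything specific in the code is an assumption made for illustration: the combination rule f_left·(1 − w) + f_right·w, the first knot at x = 4, and the three subfunctions `piece_a`, `piece_b` and `piece_c`, which are hypothetical and are not the fitted equations behind Fig. 3. Only the second knot at x = 8 is taken from the text.

```python
import math

def link(x, a, b):
    """Sigmoid link 1 - 1/(1 + (x/a)**b); assumes x > 0 and a > 0."""
    t = b * math.log(x / a)          # ln((x/a)**b), avoids overflow for large b
    if t > 700.0:
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(t))

def join(f_left, f_right, a, b):
    """Assumed combination rule: weight f_left by (1 - w) and f_right by w."""
    def joined(x):
        w = link(x, a, b)
        return f_left(x) * (1.0 - w) + f_right(x) * w
    return joined

# Hypothetical subfunctions, chosen so that neighbours meet at the knots.
def piece_a(x):                      # curved first portion
    return x ** 2

def piece_b(x):                      # linear second portion
    return 16.0 + 2.0 * (x - 4.0)

def piece_c(x):                      # exponential third portion beyond x = 8
    return 24.0 * math.exp(0.3 * (x - 8.0))

first_two = join(piece_a, piece_b, a=4.0, b=300)    # plays the role of f8
all_three = join(first_two, piece_c, a=8.0, b=300)  # plays the role of f15

for x in (2.0, 6.0, 8.0, 10.0, 13.0):
    print(x, round(all_three(x), 4))
```

Because `join` returns an ordinary function, the same step could be repeated for further subregions, one knot at a time.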

Example demonstrating smoothing at the knot
In the examples given above the curves in the different subregions have given the same value at the knot, and therefore no smoothing of the curve was required (though it could be applied if so desired). However, it may be the case that the lines either side of the knot do not meet exactly. An example is shown in Fig. 4.
where b is the slope at the knot (x = 12) and the overall curve can be expressed by the function f19:
In order to demonstrate how the two lines may be joined at the knot, different values for (b) in (16) have been used, which affects the degree of smoothing. With the slope (b) equal to 1000 a very close match to the original curve can be found. This is shown in Fig. 5, which also shows the curves achieved with values of (b) equal to 100 and 10, resulting in progressively greater smoothing of the curve.
Fig. 5 shows that, using the method outlined in this article, it is possible to find a single function to model complex data without the aid of dummy variables, whilst easily changing the degree of smoothing at the knot where different subregions are joined.
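The effect of (b) on the join can be illustrated with two lines that do not meet at the knot x = 12. The sketch below is not the example of Fig. 4: `left_line` and `right_line` are hypothetical, with a deliberate gap of 6 units at the knot, and the combination rule f_left·(1 − w) + f_right·w is an assumption consistent with the description of the link function. The values of (b) are those quoted in the text (1000, 100 and 10).

```python
import math

def link(x, a, b):
    """Sigmoid link 1 - 1/(1 + (x/a)**b); assumes x > 0 and a > 0."""
    t = b * math.log(x / a)          # ln((x/a)**b), avoids overflow for large b
    if t > 700.0:
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(t))

def left_line(x):                    # hypothetical: reaches 24 at the knot
    return 2.0 * x

def right_line(x):                   # hypothetical: starts from 30 at the knot
    return 30.0 + 0.5 * (x - 12.0)

def joined(x, b, knot=12.0):
    """Assumed combination rule with the link evaluated at the given b."""
    w = link(x, knot, b)
    return left_line(x) * (1.0 - w) + right_line(x) * w

# Tabulate the joined curve around the knot for each slope setting.
for b in (1000, 100, 10):
    row = [round(joined(x, b), 3) for x in (11.0, 11.5, 12.0, 12.5, 13.0)]
    print(b, row)
```

At the knot itself the joined value is the mean of the two lines for every (b). Away from the knot, a lower (b) lets the other line contribute over a wider interval, which is what rounds off the step.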

SUMMARY
Where complex curves have been difficult to model with a single function, one solution has been to subdivide the curve into subregions and develop a function for each subregion. This avoids the need for polynomials of ever higher degree to explain complex curves, but has the disadvantage of leaving a separate equation for each subregion, so that dummy variables are required to model the entire curve.
The method described in this article provides a simple procedure for joining together functions of subregions to form one continuous function. The subfunctions are joined via a linking function which ensures that each subfunction is active only in the appropriate portion of the overall curve. At the junction of each subregion (the knot) the linking function has a value of 0.5 and thus the subfunctions have equal weight at that point. The degree of smoothing can be controlled by altering the slope of the linking function (i.e. altering (b) in (1)). This results in a smoothing of the curve at the junction, whether or not the two subfunctions are properly aligned at that point.
The approach described in this article may find wider application where a single function is required to model accurately a complex curve that is difficult to model using standard methods.