Two useful Python tools and their application in Physics


 Python has become a popular programming language among physicists as well as students and researchers in other fields. However, Python still has room for improvement in ease of use for physical problems. One such case is Python's list array, which is more powerful than conventional arrays in languages like C, C++, Fortran, or Java, but which becomes tedious and complicated to construct with many dimensions in certain physical problems. Another such case is reading column-wise data from a data-file consisting of multiple columns, which may be simplified by introducing a Python class in a package. This article discusses the differences and limitations of array variables in various languages and introduces a new Python tool that makes the construction of N-dimensional list arrays easier. It also introduces a simpler way to handle data-files in Python, with some analytical features. Finally, it shows how these two tools may improve our experience of using Python in physics problems.


Introduction
The use of computer programs to deal with problems of physics and other branches of science is inevitable. Python has become popular among scientists for its simplicity and flexibility to enhance functionality by adding open-source packages in the program. Python is new as compared to other languages and still evolving.
However, there are various tools contributed by Python developers in the PyPI repository that can be added to a Python program to enhance its performance. There are certain applications where user experience may be improved by introducing new classes or functions in the form of open-source packages. For example, N-dimensional arrays with variable parameters are frequently required in physics applications to solve problems programmatically. However, this is a bit complicated in Python in some cases. Those who are proficient in languages like Fortran, C, C++, Java, VB.NET, etc. may know how to create arrays of any number of dimensions of a given variable type (i.e. float, int, or str) in a more or less similar manner. An array in Python is in general defined by a list, but while constructing a list in Python, one is required to give some initial values to all its elements, which is not mandatory in the above languages. If the number of dimensions is very large, it may be problematic to give values to individual elements while declaring a list variable in Python. One may instead use the NumPy package (Harris et al. 2020, Oliphant et al. 2007, Cai et al. 2005), the SymPy package (Meurer et al. 2017), Awkward arrays in Python (Pivarski et al. 2020), or the array class of Python itself, which can create arrays similar to those of other languages, but none of these give the output arrays in the form of the inbuilt 'list' class of Python, which may be advantageous in many cases (elaborated in the next section). To achieve this goal and to make 'list' type arrays of multiple dimensions in a way similar to arrays in other languages, the Python tool dimpy has been developed. Its installation, use, and application in physics are discussed with examples in this article.
Apart from the above, this paper discusses another tool that may be useful to students and researchers of physics who work with tabulated data in a text file. It is often required to read data from a data-file in which data are presented in a tabulated format consisting of a set of columns. The columns are usually separated by a delimiter, which may be a comma, tab, blank space, or any other character. Users of the Fortran language can read such data very easily because the language has been designed to do so. Although reading a file is simpler in Python than in other languages, there are still some issues that must be addressed for ease of application and to make it popular among scientists and researchers. With this aim, the Python tool tablefile was developed, which reads data from a data-file in a more convenient manner. In addition, it performs some elementary analytical tasks on the data columns, such as averaging, summation, standard deviation, and finding the maximum and minimum values, to assist data analysts in their work. This paper elaborates on the advantages, working, and use of the above two open-source Python tools, with some applications to show how these tools may be advantageous in solving problems of physics.

Scope of improvement
(i) Arrays in Python: The ways to create a real-valued array variable in different languages are summarized in Table-1. Other types (i.e. integer or string) of arrays can be created similarly. In Python, there are several ways to generate an array. Most of the methods require the programmer to give each element a value when the array is created. The closest approach in Python that is analogous to the other languages shown in Table-1 is the numpy.empty() function, which creates an array A of a given shape and 'dtype'. Here 'dtype' may be float, int, or str depending on the requirement. It can be seen that the type of array A is not 'list' but 'numpy.ndarray'. Therefore, the operations that are generally acceptable on a list variable of Python are not applicable in this case. For example, if we have two list variables x and y, one consisting of 3 elements and the other of 5 elements, the operation x+y gives a new list consisting of 3+5=8 elements, which includes all the members of x and y. This operation cannot be conducted on 'numpy.ndarray' variables. In this case, the summation operator (+) is applicable only between arrays of similar dimensions, and it performs arithmetic addition between the numbers at corresponding index values. If 'dtype' is str, then this operation is not valid. Similarly, if during the runtime of the program it is required to add an element to the array, this is not possible in place with a 'numpy.ndarray' type variable, but can be done with the append() function if the array type is 'list'. Another advantage of list arrays over 'numpy.ndarray' is that they can include elements of mixed data types. That is, an array consisting of some string elements, some integers, and some float variables is allowed in a list class variable, which is not the case in 'numpy.ndarray'.
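The contrast described above can be checked with a few lines of Python. This is only a minimal sketch, assuming NumPy is installed; the variable names and values are made up for illustration.

```python
import numpy as np

# Plain Python lists: '+' concatenates, append() grows the list in place,
# and elements of different types can sit side by side.
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0, 7.0, 8.0]
joined = x + y              # 3 + 5 = 8 elements
x.append("label")           # a mixed float/str list is allowed

# NumPy arrays: '+' is element-wise arithmetic and needs compatible shapes.
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])
elementwise = a + b

try:
    np.array([1.0, 2.0, 3.0]) + np.array([4.0, 5.0, 6.0, 7.0, 8.0])
    shapes_ok = True
except ValueError:
    shapes_ok = False       # shapes (3,) and (5,) cannot be combined

print(len(joined), elementwise.tolist(), shapes_ok)
```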
While 'numpy.ndarray' may be advantageous in many cases, list type arrays may be preferable in many other applications. However, the individual elements of a list type variable must be given some initial values when the list is created. If the array has many dimensions, it becomes a tiresome task to give each initial value individually and to construct many complicated nested lists.
Hence, to create a list array in a way similar to most programming languages and to the numpy.empty() function in Python, the dimpy package was developed. Its use and working are described in the subsequent section.
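To make the difficulty concrete, the following plain-Python sketch builds a nested list of a given shape and shows a common pitfall of the shortcut syntax. The helper make_nd_list is hypothetical and written only for illustration; it is not the dimpy implementation.

```python
def make_nd_list(shape, fill=0):
    """Return a nested list with the given shape, every element set to `fill`.

    Hypothetical helper for illustration only; it is not dimpy's code.
    """
    if len(shape) == 0:
        return fill
    # Build each sub-list separately so that rows never share storage.
    return [make_nd_list(shape[1:], fill) for _ in range(shape[0])]


A = make_nd_list((2, 3, 4))      # 2 x 3 x 4 list of zeros
A[1][2][3] = "last"              # mixed types are fine in a list

# The tempting shortcut below repeats ONE inner list three times,
# so changing one row changes all of them.
aliased = [[0] * 3] * 3
aliased[0][0] = 9

print(A[1][2], aliased)
```

The recursion creates every inner list afresh, which is what avoids the aliasing seen in the shortcut.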
(ii) Reading data from a file: A basic Fortran code to read data from a data-file may read the data fields from some file 'FILE.TXT', with the 1st and 3rd columns read as float and the 2nd column read as a string of maximum length 3. Such code is simple, but it leads to a runtime error if the data format in the file does not match what is being read (as in the file shown in Fig.1).

Fig.1. Screenshot of the file 'data.txt' shown in Notepad. The first line is a header, and below this values are separated by tabs.

Fig.1 shows an input data-file where the fields are separated by \t (i.e. tab). In between the numeric data, there are strings too. In Python, it is easy to read such data-files as a list. A Python code to read data from this file may begin with
>>> f1=open("c:/..../data.txt", "+r")
It may be noticed that the resulting 'dat' is a 'list' in which each line is a member in the form of a string. Data analysts would like to split the fields and store them as float values in a more sequential format, so that line and column values can be easily accessed. That would require writing a few more lines of Python code to get the final result. To simplify this task, the tablefile package was developed, which is described in the next section.

Installation and Use
(i) The dimpy package: The default value assigned to the elements is 0, which can be changed to anything (whether an integer, float, or string) by the use of the function dfv(). It should be noted that the type of the array A is 'list', and therefore it allows mixed variable types within it.
If one wishes to convert A from type 'list' to a new array C of type 'numpy.ndarray', one may use the C=npary(A) function, provided the numpy package is already installed in the system.
The point to note here is that, as one of the elements in A is a string, all the elements will be converted to string type in C. Further, as the intrinsic data type of C is now 'str', one can no longer assign a 'float' or 'int' value to a particular element in C as we did in the case of A. In this case, the float or int value will be converted to 'str' and then stored in C. But remember that the converse is not true and will throw a ValueError if tried.
That means if C is of 'int' or 'float' type, then one cannot assign arbitrary 'str' data to a given element, because a non-numeric 'str' cannot be converted to 'int' or 'float'.
(ii) The tablefile package: To install from cmd or terminal, enter
$ python3 -m pip install tablefile
An example code can read column-wise data from 'data.txt' as shown in Fig.1. If the data-file separator is one or more blank spaces, then one need not specify it as the 2nd argument of the file() function. That is, in this case, we may write
>>> f1=file("C:/.../data.txt") # If column separator is a blank-space
It can be seen that the output of 'lines' is a list that is already divided into lines and columns. The fields that cannot be converted to float remain strings, and lines starting with '#' are considered comment lines and are therefore skipped.
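The behaviour described above (rows split into fields, numeric fields stored as float, other fields kept as strings, '#' lines skipped) can be sketched in plain Python as follows. The helpers read_table and to_number and the sample data are hypothetical; this is not the tablefile source code, only an illustration of the idea under those assumptions.

```python
import os
import tempfile


def to_number(field):
    """Return float(field) when possible, otherwise the stripped string."""
    try:
        return float(field)
    except ValueError:
        return field.strip()


def read_table(path, sep=None):
    """Read a delimited text file into a list of rows (hypothetical helper).

    sep=None splits on any run of whitespace; lines starting with '#'
    and blank lines are skipped.
    """
    rows = []
    with open(path, "r") as handle:
        for line in handle:
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            rows.append([to_number(f) for f in line.rstrip("\n").split(sep)])
    return rows


# Small made-up sample standing in for a tab-separated data-file.
sample = "# comment line\n1\t2.5\tabc\n2\t3.5\t4.0\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

lines = read_table(tmp.name, "\t")
cols = [list(c) for c in zip(*lines)]   # column-wise view of the same data
os.remove(tmp.name)

print(lines)
print(cols)
```

Transposing with zip(*lines) assumes every row has the same number of fields; ragged rows would be truncated to the shortest one.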
If we want to do the statistical calculations above on a particular list, whether it is a line, a column, or any other list containing some numbers, we can do that using tablefile functions. Note that in all the above cases the string elements in List1 were neglected during calculation and they did not throw any error. Python's inbuilt functions sum(), max(), min() and NumPy's functions such as numpy.std() and numpy.average() can do the same task, but all of these will throw errors due to the presence of string elements in the list.
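The idea of neglecting string elements during a calculation can be sketched in plain Python as below. The function names, the sample values, and the choice of the sample (N-1) standard deviation are assumptions made for illustration; they are not taken from the tablefile source.

```python
import math


def numeric_only(values):
    """Keep only int/float entries (bool excluded), dropping strings and the like."""
    return [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]


def average(values):
    nums = numeric_only(values)
    return sum(nums) / len(nums)


def sample_sd(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    nums = numeric_only(values)
    mean = sum(nums) / len(nums)
    return math.sqrt(sum((v - mean) ** 2 for v in nums) / (len(nums) - 1))


List1 = [2.0, "err", 4.0, 6.0, "n/a", 8.0]   # made-up values

print(average(List1), sample_sd(List1))

try:
    sum(List1)                # the built-in fails at the first string
    builtin_ok = True
except TypeError:
    builtin_ok = False
```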

Some Physical Examples
In this section, some physical examples are shown which demonstrate how we can use the above packages to solve various problems of physics.
(i) Hessian matrix: In principle, a matrix with variable elements can be defined with the help of the Array function of the SymPy package, but it is not practically useful in cases like this one. Let us see why: suppose we have a function of five variables, so we would require a Hessian matrix of 5×5=25 elements.
Our logical approach would be to first define an array of 5×5=25 elements with dummy values and then assign to each of them the corresponding double derivative in two nested loops, starting from
    from sympy import *
    from math import pi
    n=5
A straightforward approach to construct the 5×5 Hessian matrix using the Array function would be to write out all 25 elements of the array explicitly. But one would hardly prefer to follow this approach. Instead, the problem can be easily handled by the use of dimpy as follows
    from sympy import *
    from math import pi
    from dimpy import *
    x1,x2,x3,x4,x5=symbols('x1,x2,x3,x4,x5') # Defines the symbolic variables
    x=[x1,x2,x3,x4,x5]
    f=x1**2+x1*x2+cos(x1*x2)+x3+x4*x5**0.5 # Defines the function
    n=5 # Number of rows or columns
    H=dim(n,n)
    for i in range(n):
        for j in range(n):
            H[i][j]=diff(diff(f,x[j]),x[i]) # Second derivative with respect to x[j] and x[i]
    print(H)
When we run the above program, the 5×5 Hessian matrix is printed as a nested list of symbolic expressions.
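As a cross-check on that output, the entries of the Hessian for the function f defined above can be worked out by hand. The block below is a hand derivation, not a transcript of the program's output, and it writes x5**0.5 as a square root.

```latex
H_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j},
\qquad
f = x_1^2 + x_1 x_2 + \cos(x_1 x_2) + x_3 + x_4 \sqrt{x_5}

H_{11} = 2 - x_2^2 \cos(x_1 x_2), \qquad
H_{22} = -x_1^2 \cos(x_1 x_2)

H_{12} = H_{21} = 1 - \sin(x_1 x_2) - x_1 x_2 \cos(x_1 x_2)

H_{45} = H_{54} = \frac{1}{2\sqrt{x_5}}, \qquad
H_{55} = -\frac{x_4}{4\, x_5^{3/2}}

\text{all other } H_{ij} = 0
```

Every entry involving x3 vanishes because f is linear in x3, and H_44 vanishes because f is linear in x4.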

(ii) Test for Symplectic Condition (Canonical Transformation):
In classical mechanics, we use the 'symplectic condition' to test whether a transformation of coordinates in phase space is canonical or not (Goldstein 1998).
The mathematical approach to test the canonicality of an n-particle system in phase space demands that the following condition be satisfied:
$M J \tilde{M} = J$
where M is the Jacobian matrix of the transformation, $\tilde{M}$ is the transpose of M, and J is a 2n×2n anti-symmetric matrix given by
$J = \begin{pmatrix} O & I \\ -I & O \end{pmatrix}$
where O is an n×n null matrix or zero matrix (i.e. a matrix all of whose elements are zero) and I is an n×n unit matrix. For a two- or three-particle system the test is not difficult to carry out manually, but if the number of particles in the system is large, we must do it programmatically. Let us solve a problem with a Python program for a two-particle system, which can be extended to a system of any number of particles with little modification.
Problem: Prove that the following transformation is canonical
    M=dim(2*n,2*n)
    J=dim(2*n,2*n)
    for i in range(2*n):
        for j in range(2*n):
When we run this program, we obtain the message "The symplectic condition is satisfied" at the output. Although the above program deals with a two-particle system, it can easily be used for a system of any number of particles by introducing the necessary parameters and equations in the first few lines.
The Array() function cannot be used in the above program for the same reason as in the Hessian matrix case, and therefore dimpy is the only option.
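The symplectic test itself can be sketched numerically with nested lists and no external packages. The two transformations below are assumed examples chosen for illustration (they are not the problem treated in the paper), and the check uses the form M J M^T = J with the variable ordering (q1, q2, p1, p2).

```python
def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def symplectic_J(n):
    """2n x 2n matrix [[O, I], [-I, O]] built as a nested list."""
    J = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = 1
        J[n + i][i] = -1
    return J


def is_symplectic(M, n):
    J = symplectic_J(n)
    return matmul(matmul(M, J), transpose(M)) == J


n = 2
# Assumed example, ordering (q1, q2, p1, p2): Q1=q1, Q2=p2, P1=p1, P2=-q2
M_exchange = [[1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0],
              [0, -1, 0, 0]]
# Counter-example: Q1 = 2*q1 with everything else unchanged
M_scaled = [[2, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

for name, M in (("exchange", M_exchange), ("scaled", M_scaled)):
    if is_symplectic(M, n):
        print(name, ": The symplectic condition is satisfied")
    else:
        print(name, ": The symplectic condition is not satisfied")
```

Exact equality works here because the Jacobians contain only integers; for symbolic or floating-point entries the comparison would need simplification or a tolerance.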
(iii) Analysis of Astronomical data: The Hipparcos catalog (Perryman et al. 1997) is an astronomical database in the form of an ASCII table (i.e. it can be opened by any text editor like Notepad). It contains various astronomical data for 118218 stars, one star per line, tabulated in 77 columns called Fields (excluding Field 0). Deb and Chakraborty 2014 used a FORTRAN program to read 13 of these 77 columns in their work to identify stars with incorrect spectral classification. As the first step to investigate those wrongly classified stars further (Table-2, Deb and Chakraborty 2014), we might be interested in listing the following information: (1) the star identifier number, (2) the location of the stars (equatorial coordinates), (3) their distance from our sun, (4) the color temperature, (5) the uncertainty in the color temperature data, and (6) the absolute magnitude M_v (which represents the luminosity of the star relative to our sun). On the other hand, if we import the tablefile package into the code, the same task can be performed with the following code
    from tablefile import *
    f1=file("D:/data_deb.txt",'\t')
    lines_deb=f1.read() # reads 'data_deb.txt' in default line/column format
    f2=file("D:/hip_main.dat", "|")
    cols_hip=f2.read("c/l") # reads 'hip_main.
If we compare the above two codes, it can be seen that the use of the tablefile package makes our program simpler as well as more concise. Both codes are properly commented so that readers can easily understand the steps. (See the last paragraph for file availability information.)

(iv) Analysis of Experimental Data:
The temperature dependence of the resistance of an ohmic conductor is given by
$R_T = R_o [1 + \alpha (T - T_o)]$
where $R_T$ is the resistance at temperature T, $R_o$ is the resistance at a reference temperature $T_o$, which is generally 0 °C, and α is the temperature coefficient of resistance for the material. The experimental determination of α needs temperature versus resistance data for a wide range of temperatures. Fig.2 shows a computerized experimental setup for the determination of α. With the help of the respective sensors, the Arduino microcontroller automatically collects data for the temperature of the oil bath (which is in thermal equilibrium with the resistance), the current through the circuit, and the voltage across the resistance R. It sends the data to the computer through a USB connection, where they can be logged to a text file using serial data reading software like Terminal or PuTTY. Fig.3 shows a screenshot of a typical output data-set from the microcontroller, which is programmed to send 50 observations at intervals of 20 milliseconds, then wait 5 minutes for the temperature to change, and then repeat the process. Here it can be seen that some entries are due to serial data read errors; they cannot be converted to floating-point numbers during calculation and would therefore lead to errors unless they are handled separately in the program. Complete removal of the lines containing errors is not advisable because it would delete some correct data too. But if we use the inbuilt functions of the tablefile package, this limitation can be overcome without adding extra steps to the code. Two codes producing identical results, with and without importing tablefile, are shown below. It can be seen that the use of tablefile makes the code simpler and shorter. The output is shown in Fig.4.
Working and theory behind the codes: In the codes, we first read the four columns from the data-file and place them in a two-dimensional list array called 'cols', where the first index indicates the column number and the second indicates the serial number of the data. In the data-file, the 1st column gives the serial number of the observations, the 2nd column gives the temperature data, the 3rd the voltage data, and the 4th the current data; the corresponding data can be accessed by setting the first index of 'cols' equal to 0, 1, 2, and 3 respectively. We divide each column into groups of 50 data points and then take their averages and sample standard deviations. This gives the average temperature ($T_{av}$), voltage ($V_{av}$), and current ($I_{av}$) at different observation times and their corresponding sample standard deviations $T_{sds}$, $V_{sds}$, and $I_{sds}$ respectively. The corresponding standard errors can be calculated from $T_{err} = T_{sds}/\sqrt{N}$, $V_{err} = V_{sds}/\sqrt{N}$, and $I_{err} = I_{sds}/\sqrt{N}$, where N is the sample size (which is 50 in this case). Now the resistance R can be computed from $R = V_{av}/I_{av}$, and the uncertainty in R can be obtained from the well-known relation
$\delta R = R \sqrt{(\delta V / V_{av})^2 + (\delta I / I_{av})^2}$
Here $\delta V = V_{err}$ and $\delta I = I_{err}$ are the standard errors in the V and I measurements respectively. The temperature and resistance data and their uncertainties are stored in four separate lists, which are used to plot a graph as shown in Fig.4. A least-squares fit obtained with the help of the 'numpy.polyfit()' function is also shown in Fig.4. The (slope/intercept) ratio of this line gives the value of α.
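The last two steps, propagating the uncertainty into R and extracting α from a straight-line fit, can be sketched in plain Python. The data here are synthetic and noise-free, generated from assumed values of R_o and α with T_o = 0 °C, and the fit is a simple unweighted least-squares line written by hand rather than a call to numpy.polyfit().

```python
import math


def resistance_with_error(v_av, v_err, i_av, i_err):
    """R = V/I and its uncertainty from relative errors added in quadrature."""
    r = v_av / i_av
    dr = r * math.sqrt((v_err / v_av) ** 2 + (i_err / i_av) ** 2)
    return r, dr


def linear_fit(xs, ys):
    """Unweighted least-squares line y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic, noise-free data generated from R = R0*(1 + alpha*T), T in deg C
R0, alpha_true = 10.0, 0.004
temps = [20.0, 40.0, 60.0, 80.0]
resist = [R0 * (1 + alpha_true * t) for t in temps]

slope, intercept = linear_fit(temps, resist)
alpha = slope / intercept        # slope = R0*alpha, intercept = R0

r, dr = resistance_with_error(5.0, 0.05, 0.5, 0.005)
print(alpha, r, dr)
```

The ratio slope/intercept equals α only because the reference temperature is taken as 0 °C, so that the intercept of the line is R_o itself.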
Codes for all the four examples described in this section and the related files are available for download at https://github.com/DwaipayanDeb/dimpy-tablefileexamples.git

Discussion
This paper has discussed the limitations of array formation and of reading data from files in the present form of Python, and has introduced two new tools which address these issues and improve the user experience for scientific calculations and analysis. The tablefile package simplifies reading tabulated data from a file with any kind of field separator, and also provides some inbuilt functions to perform basic calculations like summation, averaging, standard deviation, etc. On the other hand, the dimpy package can generate Python 'list' type arrays of any number of dimensions with any number of elements. Two physical examples in each case are given to show how these tools may simplify Python programming in physics.
With the use of dimpy we may greatly simplify calculations within arrays of two or more dimensions, especially for symbolic operations like differentiation and integration. These examples also demonstrate how calculus can be applied to matrix elements within nested loops, which would be a difficult or time-consuming process in a Python program without the help of dimpy. Although the given examples deal with two-dimensional arrays, a similar procedure may be applied to perform calculus operations on arrays of any number of dimensions (as in tensors). On the other hand, we see that reading data from huge files like the Hipparcos catalog, or from a file containing experimental data, is simplified with the use of tablefile. Also, inbuilt functions like convert(), av(), sd(), etc. are more robust in the sense that they do not throw a runtime error if the input list argument contains one or more string elements. Students and researchers of physics may find these two tools useful in their work.

Declarations:
Funding: No funding was received for this work

Conflict of interest/Competing interests: No conflict of interest/ competing interest applicable
Availability of data and material: Data reported in this work are available to be used by anyone without restrictions.