The study has two phases. Because EEG analysis involves enormous amounts of data, hundreds of hypotheses are unavoidable; we will see later that there are in fact thousands of hypotheses to test. Therefore, to avoid HARKing (hypothesizing after the results are known), the research must be divided into a hypothesis-generating phase and a hypothesis-testing phase, run on completely different subjects.
The subjects must be divided randomly between the two phases. We need EEG recordings from 100 subjects diagnosed with schizophrenia and 100 healthy individuals (or at least non-psychotic patients) for hypothesis generation, and then another set of recordings from 100 schizophrenia patients and 100 healthy individuals for hypothesis testing. The subjects are randomized to the two phases from a pool of 200 schizophrenia-patient EEGs and 200 control EEGs.
Hypothesis generating phase
(This is a stalled research protocol. The author has neither the physical nor the mental resources to continue the idea, owing to poor mental health, so it seems better to share it with the community than to hold it with little chance of continuation. If you are interested in the idea or have any questions, please contact the author at [email protected].)
- Determine the x-axis and y-axis. Sagittal derivations in a normal EEG reading, such as F7-T7, T7-P7, or F4-C4, are considered y-axes. Since a normal EEG reading has no x-axis (coronal) derivations, we generate them by approximation from the average-reference (Avg) montage.
For example, to generate Fp1-Fp2, subtract (Fp2-Avg) from (Fp1-Avg); to generate F7-F3, subtract (F3-Avg) from (F7-Avg).
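The derivation above can be sketched as follows. This is a minimal illustration, not the full analysis pipeline; the channel names and sample values are hypothetical, and real data would come from the recording system.

```python
# Sketch: derive an approximate coronal (x-axis) channel from
# average-referenced recordings, using X-Y = (X-Avg) - (Y-Avg).
# The toy potentials (in mV) below are made up for illustration.

def derive_bipolar(avg_ref, a, b):
    """Approximate the bipolar derivation a-b as (a-Avg) - (b-Avg)."""
    return [xa - xb for xa, xb in zip(avg_ref[a], avg_ref[b])]

avg_ref = {
    "Fp1": [1.0, 2.0, 3.0],
    "Fp2": [0.5, 1.0, 4.0],
}

fp1_fp2 = derive_bipolar(avg_ref, "Fp1", "Fp2")
print(fp1_fp2)  # [0.5, 1.0, -1.0]
```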
- If we compare the potential change in every time frame (every 0.04 s, or 0.01 s for finer detail) on an x-axis with the change on an adjacent y-axis, we can obtain the direction of the potential vector, in degrees, from
f(x) = arctan(dy/dx)
arctan(x) = inverse tangent, the operation that recovers the angle whose tangent is x
dy = change of potential on the y-axis during a very short time frame (for example, 0.01 s)
dx = change of potential on the x-axis during the same time frame
For example, to find the direction of the vector in plane F3-F7-Fp1 between t = 0.01 s and t = 0.02 s, we observe the potential change on axes F3-F7 and Fp1-F7 over that interval. This can be done with a program, or by manual observation, for example by overlaying extra horizontal and vertical grid lines on the EEG recording image. If, over the observed interval, Fp1-F7 changes by +7 mV while F3-F7 changes by +1 mV, the direction is arctan((+7)/(+1)) = arctan(7) = 81.87 degrees or 261.87 degrees.
Since the formula yields two possible values, the value in the upper half-plane (0-180 degrees) is used when dy is positive, and the value in the lower half-plane when dy is negative. The resulting sequence of angles is what will be analyzed. Ideally we would use objective data of the potential change every millisecond, but if such data are not available, we can enlarge an image of the EEG recording and read the potential change visually every 10 milliseconds.
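The sign rule above is exactly what the quadrant-aware arctangent (`atan2`) computes, so the ambiguity can be resolved in one call. A minimal sketch:

```python
import math

# Sketch: direction of the potential-change vector from dy and dx.
# atan2 picks the correct quadrant, matching the rule in the text
# (angle in 0-180 degrees when dy > 0, in 180-360 degrees when dy < 0).

def vector_degree(dy, dx):
    """Direction of the potential-change vector in degrees, 0-360."""
    return math.degrees(math.atan2(dy, dx)) % 360

print(round(vector_degree(7, 1), 2))    # 81.87
print(round(vector_degree(-7, -1), 2))  # 261.87
```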
- Matching the 16 y-axes with adjacent x-axes yields 48 different planes to analyze. Some planes are obviously counted more than once; for example, F7-F3-C3-T3 can be considered a single square plane. However, to account for possible mistakes, I believe it is better to analyze F3-F7-T3, F7-F3-C3, F3-C3-T3, and F7-T3-C3 separately.
The resulting observable planes are:
Fp1-F7-F3, F7-F3-Fp1, Fp1-F3-Fz, F3-Fz-Fp1, Fp2-Fp1-Fz,
Fp1-Fp2-Fz, Fp2-Fz-F4, Fz-F4-Fp2, Fp2-F4-F8, F4-F8-Fp2,
A1-T3-F7, F3-F7-T3, F7-T3-C3, T3-C3-F3, F7-F3-C3, Fz-F3-C3,
F3-C3-Cz, C3-Cz-Fz, F3-Fz-Cz, F4-Fz-Cz, Fz-Cz-C4, Cz-C4-F4,
Fz-F4-C4, F8-F4-C4, F4-C4-T4, C4-T4-F8, F4-F8-T4, F8-T4-A2,
A1-T3-T5, C3-T3-T5, T3-T5-P3, T5-P3-C3, T3-C3-P3, Cz-C3-P3,
C3-P3-Pz, P3-Pz-Cz, C3-Cz-Pz, C4-Cz-Pz, Cz-Pz-P4, Pz-P4-C4,
Cz-C4-P4, T4-C4-P4, C4-P4-T6, P4-T6-T4, C4-T4-T6, A2-T4-T6,
P3-T5-O1, T5-P3-O1, Pz-P3-O1, P3-Pz-O1, Pz-O1-O2, O1-O2-Pz, P4-Pz-O2,
Pz-P4-O2, T6-P4-O2, P4-T6-O2
Although most of these planes are not strictly perpendicular, the approximate vector direction should still have values worth observing.
- Obviously, the pattern gathered in step 2 will be extremely chaotic if analyzed as is, since there may be no exact repetitions. We need an algorithm that accounts for tolerable errors: the possibility that any segment is lengthened or shortened, and tolerable changes in vector angle. It is easier to treat these as inserted or deleted segments.
- Find a pattern to test. Once we identify a pattern suspected of repeating (call it the "tested pattern"), we need to formulate all of its tolerable alterations. Instead of the exact angle from the arctan function, we regroup angles into units by range: >0-18 degrees is 1 unit, >18-36 degrees is 2 units, >36-54 degrees is 3, ..., >162-180 degrees is 10, >180-198 degrees is -9, ..., and >342-360 degrees is 0. That way, even if there is a small angular difference, it can still be considered the same pattern.
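The 18-degree bucketing above can be sketched as a small quantization function; this is one possible implementation of the scheme as described, mapping (0, 180] to units 1..10 and (180, 360] to units -9..0.

```python
import math

# Sketch of the 18-degree bucketing: each bin is (k*18, (k+1)*18].
# An angle of exactly 360 (or 0) falls in the >342-360 bin, unit 0.

def degree_to_unit(deg):
    """Quantize an angle in degrees into the 20 units described above."""
    deg = deg % 360 or 360          # treat 0 as the top of the >342-360 bin
    unit = math.ceil(deg / 18)      # 1..20
    return unit if unit <= 10 else unit - 20   # fold 11..20 to -9..0

print(degree_to_unit(81.87))   # 5
print(degree_to_unit(261.87))  # -5
```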
Up to this point, we have analyzed and converted the EEG sequence into a number sequence; the method up to this part is already a novel approach. For example, we might find that the first second of plane F3-F7-T3 is recorded as 100 consecutive values such as...
1 2 3 2 1 -1 -4 8 -8 7 10 8 -8 7 10 8 -8 7 10 3 2 1 -1 -4 8 -8 7 1 3 2 1 -1 -4 8 -8 7 1 ..... and so on.
There are multiple possible ways to proceed from this point, since there is no exact repetition and there are various ways to interpret tolerable error. We discuss only one possible method in this protocol draft.
- We need to set an objective threshold for tolerable alteration. I believe a difference of up to 20% from the original pattern can be considered tolerable, such that the altered pattern still counts as a repetition.
- If we observe the pattern every 0.01 s, a pattern of 1 s length consists of 100 segments. For an alternate pattern of the same length (without insertion or deletion), we calculate the "difference" with the formula:
Difference from reference (%) =
[1 x (number of segments with 1 unit difference) + 2 x (number of segments with 2 units difference) + 3 x (number of segments with 3 units difference) + ... + 10 x (number of segments with 10 units difference)] / (10 x number of segments) x 100%
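The formula can be sketched as follows. One detail the text leaves open is how to measure the distance between two units; since the 20 units wrap around the circle (unit 10 borders unit -9), the sketch below assumes a circular distance, which also caps the per-segment difference at 10 as the formula requires.

```python
# Sketch of the difference formula. The circular unit distance is an
# assumption not stated in the text: units -9..10 wrap around, so the
# distance between 10 and -9 is 1, not 19.

def unit_distance(a, b):
    d = abs(a - b) % 20
    return min(d, 20 - d)

def difference_percent(pattern, reference):
    """Weighted difference between two equal-length unit sequences, in %."""
    assert len(pattern) == len(reference)
    total = sum(unit_distance(p, r) for p, r in zip(pattern, reference))
    return 100.0 * total / (10 * len(pattern))

ref = [1, 2, 3, 2, 1]
pat = [1, 4, 3, 2, 0]        # two segments differ, by 2 units and 1 unit
print(difference_percent(pat, ref))  # 6.0
```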
- For patterns of different length, i.e. those considered to have insertions or deletions, we need to create a new reference pattern. If we want to "insert" 10% extra length between two adjacent segments, it is easiest to define each inserted segment to have the mean angle of the two previously adjacent segments. The resulting reference pattern is then considered to differ by 10% from the original pattern. The new reference pattern can now be used to measure other patterns of the same length. Thus,
Difference (%) = (percentage of inserted/deleted segments) + (difference from the new reference pattern, in %)
Note that it would be very confusing to use multiple reference patterns directly to check whether other patterns are repetitions of the original tested pattern. Instead, we use them to generate patterns that still fall within 20% difference of the original tested pattern, and then simply count how many of the remaining patterns are tolerably a repeat of the tested pattern. This phase would be exhaustive for humans, but current computer analysis should be able to do it, and there are methods to distribute the work across multiple computers. Folding@home (which famously ran on Sony's PlayStation 3) is one such distributed-computing project, but our task need not be that complicated: it could be cut into mini-tasks solved by 10-50 separate computers (we do not need tens of thousands of computers for this).
For example, for a tested pattern of 1 s length (100 segments), we need to generate reference patterns of 80-120 segments and then create "patterns of tolerable errors" to find out how many times the tested pattern repeats over a certain period (for example, the average number of repetitions in 5 minutes). This becomes complicated because a 1 s pattern can be "inserted" or "deleted" in at most 20 different places, each "insertion" can be of variable length up to a total of 20 segments, and a tested pattern can even be "inserted" and "deleted" at the same time in different areas to create a new reference pattern.
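The insertion step above can be sketched as a small helper. This is only an illustration of one insertion at one position; the full procedure would enumerate positions and lengths as the text describes. The function name and the rounding of the mean to an integer unit are assumptions.

```python
# Sketch: build a new reference pattern by "inserting" one segment
# between two adjacent segments, giving it the mean unit of its
# neighbours as the text suggests (rounded back to an integer unit).

def insert_segment(pattern, pos):
    """Insert one segment between pattern[pos-1] and pattern[pos]."""
    mean = (pattern[pos - 1] + pattern[pos]) / 2
    return pattern[:pos] + [round(mean)] + pattern[pos:]

base = [1, 3, 5, 7]
print(insert_segment(base, 2))  # [1, 3, 4, 5, 7]
```

Inserting one segment into a 100-segment pattern would count as a 1% difference under the formula above, leaving 19% of the tolerance budget for unit differences against the new reference.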
- Now, from every "tested pattern," we can derive its "patterns of tolerable errors" and how many times the pattern repeats on average in 5 minutes. We can then use a mean-comparison analysis such as the t-test to compare the number of repetitions in schizophrenia patients with that in normal (non-psychotic) patients. Patterns with significant results will be carried forward to hypothesis testing. If we want to limit the patterns passed to the hypothesis-testing phase, we might choose the 10 patterns with the lowest p-values. This limitation is important because there may be tens of thousands of hypotheses, which means that roughly 500 hypotheses would be mistakenly considered significant.
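The group comparison can be sketched with a Welch-type test. To keep the sketch self-contained it approximates the p-value with a normal distribution rather than the t distribution, which is reasonable at the protocol's n = 100 per group; the repetition counts below are made-up toy data, not results.

```python
from statistics import NormalDist, mean, variance

# Sketch: compare mean repetition counts between groups. The normal
# approximation to the t distribution is an assumption for brevity;
# a real analysis would use a proper t-test with Welch's df.

def welch_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

schizo = [12, 15, 11, 14, 13, 16, 12, 15]   # toy repetition counts
control = [8, 9, 10, 7, 9, 8, 10, 9]
print(welch_p(schizo, control) < 0.05)  # True
```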
- Step 5 obviously skipped one explanation: which pattern is tested? Since this is a new method, we have no idea which patterns are likely to repeat, or in which planes, so the easiest approach is to test them all. For example, begin by testing segments of 1 s length in all subjects (segments 1-100, segments 2-101, segments 3-102, etc.). Since a pattern may be unique to one subject, we also need to check all patterns from the second subject, the third subject, and so on, to find patterns that repeat relatively often across all subjects. If fewer than 2% of all tested patterns are significant, we can test shorter patterns, for example 50 segments; if there are still too few significantly different patterns, we cut the length further to 25 segments. Conversely, if too many 1 s patterns are significant (for example, more than 15% of all patterns differ significantly between schizophrenia patients and normal patients), we can consider longer patterns, for example 200 segments. Another possibility is to test 1000 random patterns, but this approach is less favored because it discards a significant amount of data. For simplicity, it is better to compare only data from the same plane, but it would be an interesting prospect to compare data across planes or combinations of data from multiple planes (for example, segment 1 from plane 1, segment 2 from adjacent plane 2, and so on).
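The exhaustive enumeration above (segments 1-100, 2-101, 3-102, ...) is a sliding window over the unit sequence, and can be sketched as:

```python
# Sketch: enumerate every candidate pattern of a fixed length with a
# sliding window over a unit sequence. With 0.01 s segments, length=100
# corresponds to the 1 s patterns discussed in the text.

def sliding_patterns(units, length):
    """All contiguous sub-sequences of the given length."""
    return [tuple(units[i:i + length]) for i in range(len(units) - length + 1)]

units = [1, 2, 3, 2, 1, -1]   # toy unit sequence
print(sliding_patterns(units, 4))
# [(1, 2, 3, 2), (2, 3, 2, 1), (3, 2, 1, -1)]
```

A 5-minute recording sampled every 0.01 s gives 30,000 segments, hence about 29,901 candidate 1 s patterns per plane per subject, which is where the "thousands of hypotheses" in the introduction come from.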