Convolutional Neural Network with embedded Fourier Transform for EEG
classification
Hubert Cecotti, Axel Graeser
Institute of Automation (IAT), University of Bremen, Germany {cecotti;ag}@iat.uni-bremen.de
Abstract

In BCI (Brain-Computer Interface) systems, brain signals must be processed to identify distinct activities that convey different mental states. We propose a new technique for the classification of electroencephalographic (EEG) Steady-State Visual Evoked Potential (SSVEP) activity for non-invasive BCI. The proposed method is based on a Convolutional Neural Network that includes a Fourier transform between hidden layers in order to switch from time domain analysis to frequency domain analysis within the network. The first step creates the different channels. The second step transforms the signal into the frequency domain. The last step is the classification. It uses a hybrid rejection strategy that combines a junk class for the mental transition states with thresholds on the confidence values. The presented results, obtained with offline processing, use 6 electrodes on 2 subjects with a time segment of 1 s. With the rejection criterion, the reliability of the system is over 95% for both subjects.

1 Introduction

Brain-computer interface (BCI) systems allow people to communicate through direct measures of brain activity [1, 3]. Unlike all other means of communication, BCIs require no movement [8]. BCIs have primarily been used to enable communication for persons with severe disabilities who are unable to communicate through any classical device. Usually, a BCI is decomposed into four parts: the signal acquisition, the signal processing/classification, the output device components and the operating protocol that links the three previous components. This work deals with the signal classification component. This part includes two main steps: the extraction of brain signal features and the translation of these signals into device commands. To classify different brain signals, the knowledge of the stimuli drives the solution to some specific pattern recognition methods. When a visual stimulus with a constant frequency is presented to the user, a continuous brain response at the same frequency is present in the visual cortical area. This paper deals with the classification of such responses, which are called steady-state visual evoked potentials (SSVEP). One of the current challenges in BCI is to improve the reliability of the commands, i.e. the EEG classification in a short time period. In the following sections, we propose a neural network that uses a priori knowledge of the problem and that can adapt to each subject. The second section presents the EEG signals from SSVEP. The main system is described in the third section. The experiments and the results are detailed in the fourth section.

2 EEG signals

The classification of EEG signals is one of the current challenges for real BCI applications [4]. Different types of classifiers have been used for EEG classification, like neural networks [2, 7] and Hidden Markov Models [12]. In the following, we consider $N_{elec}$ electrodes for the EEG signal acquisition. The signal corresponds to the voltage measured between a reference electrode and one of the $N_{elec}$ electrodes. Let $SF$ be the sampling frequency of the acquisition and $TS$ the time segment, in seconds, attributed to the analysis of the signal. For SSVEP stimuli, we consider $N_{freq}$ visual stimulations on an LCD screen with boxes flickering at different frequencies.

Usually different channels are created to perform the classification. A channel is a linear combination of the signals measured by the $N_{elec}$ electrodes. A channel $c$ is defined by:

\[ c_j = \sum_{i=0}^{N_{elec}-1} w_i \, X_{i,j} \]

The creation of channels allows the analysis of a set of spatially independent vectors instead of a matrix. The
information from the electrodes is summarized in one scalar at each time $j$. For the EEG signal pre-processing, the first step is to find an optimal set of weights $w_{i,k}$, $0 \le i < N_{elec}$, where the number of channels $k$ is finite and as small as possible. Several solutions exist for the creation of one or several channels: the average combination, the native combination [5], the bipolar combination [10], the Laplacian combination, the minimum energy combination and the maximum contrast combination [6].

We propose the creation of channels that are tailored together, according to their discriminant power once they are combined. The goal is to determine the optimal set of weights for a finite number of channels, which can help the final classification. A solution for creating such channels is proposed in the next section.

3 System overview

The model is based on a convolutional neural network (CNN), which is a multi-layer perceptron with a special topology. This kind of model is widely used in handwritten character recognition [9]. Its interest is to classify the raw signal directly and to integrate the signal processing functions within the discriminant steps. Indeed, it is not always possible to know which kind of features to extract. However, some knowledge of the problem may be included in the network topology. This is why we propose to add some signal processing methods between 2 hidden layers. The inputs are the EEG signal values from the electrodes during a time segment, $X_{i,j}$, $0 \le i < N_{elec}$, $0 \le j < SF \cdot TS$. The output corresponds to the $N_{freq}$ SSVEP frequencies and a class for rejection. Thus, for the classification task, there are $N_{freq} + 1$ classes.

The process within the neural network is composed of the signal normalization and denoising, the creation of the channels, the selection of a pool of frequencies and their harmonics, and then their classification.

Before processing the signal, data are normalized, $X_{i,j} \leftarrow (X_{i,j} - \bar{X}_{i,j}) / \sigma_{i,j}$, where $\bar{X}_{i,j}$ and $\sigma_{i,j}$ are respectively the average value and the standard deviation of the electrode $i$ at the time $j$ in a time segment $TS$.

3.1 Neural network topology

The network is composed of 4 layers, which are composed of one or several maps. We define a map as a layer entity that has a specific semantic: each map of the first hidden layer is a channel. The first hidden layer is dedicated to the denoising of the input data and the creation of the different channels.

The network topology is described as follows:

• Layer 0 ($L_0$): the input layer. $X_{i,j}$ with $0 \le i < N_{elec}$ and $0 \le j < SF \cdot TS$. $SF \cdot TS$ corresponds to the number of samples in $TS$ seconds.

• Layer 1 ($L_1$): the first hidden layer. $L_1$ is composed of 10 maps. We denote by $L_1 M_m$ the map number $m$. Each map of $L_1$ has the size $1 \times (SF \cdot TS - 4)$. (The reduction of 4 is due to the border effects of the kernel used for filtering in the time domain.)

• Layer 1' ($L_1'$): $L_1'$ is the result of $L_1$ after the Fourier Transform. Each map of $L_1'$ has the size $1 \times 22$, a selection of the frequencies that play the role of features.

• Layer 2 ($L_2$): the second hidden layer. $L_2$ is composed of 1 map of 100 neurons. This map is fully connected to its corresponding map on $L_1'$.

• Layer 3 ($L_3$): the output layer. This layer has only one map of 6 neurons, which represent the 5 frequencies to detect and the transition state. This layer is fully connected to each map of $L_2$.

3.2 Propagation

The first hidden layer is dedicated to the automatic creation of the channels (their weights) and the automatic linear filtering of the signal in time. This step may be useful to cancel some artifacts due to phase differences in the signal between electrodes. Each map of $L_1$ is processed as follows.

First, a convolution is applied on the input layer to create the channels. The value of a neuron $n$ of $L_1 M_m$ is defined by:

\[ L_1 M_m(n) = f(\sigma) \]

where

\[ \sigma = \sum_{i=0}^{N_{elec}-1} \sum_{j=0}^{4} X_{i,\,j+n} \, W_{m,i,j} + W_m \]

where $0 \le m < 10$, $0 \le n < SF \cdot TS - 4$, the column index $j + n$ of $X$ stays within $0 \le j + n < SF \cdot TS$, $f$ is a tanh function and $W$ is the set of weights.

Note that each neuron of the map shares the same set of weights and is only connected to a window of size $5 \times N_{elec}$. This window allows filtering the signal in the space and time domains at the same time. Instead of learning one set of weights for each neuron, dependent on the neuron position, the weights are learnt independently of their corresponding output neuron.

Once the channels are created, the Fourier Transform is applied on the neuron values to pass into the frequency domain. Then the frequencies between 12.5 Hz and 17.5 Hz with a step of 0.5 Hz, and their first harmonics, are selected as they represent the frequency band of the stimuli. This gives 11 frequencies and 11 harmonics, so each map $L_1' M_m$ of the layer $L_1'$ has 22 neurons. Between $L_1'$ and $L_3$, the network corresponds to a classical multi-layer perceptron. $L_2$ and $L_3$ are fully connected, respectively to $L_1'$ and $L_2$.

3.3 Backpropagation

The backpropagation for $L_3$ and $L_2$ is done by gradient descent, minimizing the least mean square error. The error must be transferred back into the time domain by using the Inverse Fourier Transform in order to correct the weights in the first hidden layer. As the errors in this layer are complex numbers, only the real part is used for updating the weights.

3.4 Rejection strategy

The system must be very reliable. Therefore we combine 2 decisions for the rejection. First, we dedicate one class to the transition states as described previously. Then there are 2 thresholds for each class. They are determined on the validation database, which is also used to find the best epoch for generalization. For each class $C_i$, we define the 2 thresholds:

\[ \zeta_i = \frac{2\,Max(P(x \in C_i)) + Ave(P(x \in C_i))}{3} \]

\[ \psi_i = \frac{2\,Max(DP(x \in C_i)) + Ave(DP(x \in C_i))}{3} \]

where $Max(P(x \in C_i))$ and $Ave(P(x \in C_i))$ are respectively the maximum and average probability values for a pattern to be accepted as belonging to the class $C_i$, i.e. the signal at $(13 + i)$ Hz. $Max(DP(x \in C_i))$ and $Ave(DP(x \in C_i))$ are respectively the maximum and average distance between the first and second best answers for a pattern to be accepted as belonging to the class $C_i$. A signal $X$ is attributed to the class $C_i$ by the classifier $E$, $0 \le i < N_{freq}$, when:

• $i = \mathrm{argmax}_n \, L_3(n)$

• $P(x \in C_i) > \zeta_i$

• $DP(x \in C_i) > \psi_i$

otherwise the class is rejected, $E(X) = R$.

4 Experiments

The experiments are made on a particular BCI type: the system is non-invasive. It does not require surgery to implant electrodes. It only uses sensors in contact with the surface of the scalp via 8 standard EEG electrodes. They are placed on $AF_Z$ for ground, $C_Z$ for the reference and $PO_3$, $PO_4$, $P_Z$, $O_1$, $O_2$, $O_Z$ for the input electrodes, following the international 10-20 system of measurement [11]. The stimuli are flickering lights and their responses should correspond to an SSVEP response. The system must reflect the user's attention to a fast oscillating stimulus. EEG signals were recorded on 2 subjects. For each subject, we have 5 different trials of about 3 minutes. Each trial is composed of a sequence of events. The sequence simulates the actions that can occur during online processing, where the user has to shift his gaze between the different visual stimuli. During Event(i), the subject has to look for 4 s at a box flickering at $(13 + i)$ Hz, $0 \le i < N_{freq}$, $N_{freq} = 5$. The sequence is defined as:

    for i = 0 to N_freq − 1
        for j = 0 to N_freq − 1
        {
            Event(i)
            Event(j)
        }

Table 1. Results on the test database.

    Subject                       A        B
    Recognition (learning)      93.44    76.39
    Recognition (validation)    56.57    63.52
    Recognition (test)          53.47    49.95
    Rejection (test)            43.77    48.59
    Reliability (test)          95.08    97.17

The learning, validation and test databases are composed respectively of 6392, 2130 and 2134 patterns for each subject. The classes are equally distributed in the three databases. For each subject, the first three trials are dedicated to the training of the system, the fourth trial is used for the validation and the fifth for the test.

The recognition, error, rejection and reliability rates ($\tau_{rec}$, $\tau_{err}$, $\tau_{rej}$, $\tau_{rel}$) are defined by:

\[ \tau_{rec} = \frac{\sum_{X \in DB} \big( (E(X) = C_i) \text{ and } (X \in C_i) \big)}{\sum_{X \in DB} (X \in C_i)} \]

\[ \tau_{rej} = \frac{\sum_{X \in DB} \big( (E(X) = R) \text{ and } (X \in C_i) \big)}{\sum_{X \in DB} (X \in C_i)} \]

where $DB$ is the considered database, and

\[ \tau_{err} = 1 - \tau_{rec} - \tau_{rej}, \qquad \tau_{rel} = \frac{\tau_{rec}}{\tau_{rec} + \tau_{err}}. \]

Table 1 presents the results (in %) obtained with the CNN on the 2 subjects. The results correspond to the epoch giving the best recognition rate on the validation database. For other epochs, we observe that the recognition rate reaches about 97% on the learning database, which supports the choice of the topology. However, the generalization remains average and the system must reject a lot of patterns to achieve a high reliability. Such generalization can be explained by the choice of the time segment and maybe by the difficulty for the subjects to focus on the stimuli.

The proposed approach is well suited for the channel creation, as it creates channels that enable efficient results in spite of the very noisy signals. These results support the idea of switching from the time domain to the frequency domain, and the need to adapt the BCI to the user.

The two subjects give good SSVEP responses. The rejection strategy offers the same kind of results for both subjects. A time segment of 1 s is quite challenging for observing a peak at the expected frequency. Thus almost half of the patterns are rejected. Contrary to other pattern recognition applications where the ground truth can be checked easily, the expected brain signals are difficult to verify during the experiment. Such uncertainty naturally leads to rejecting a lot of patterns. The subject can shift his gaze and produce unwanted signals during the experiment. Therefore there is an inevitable risk that the data used for training contain some parts that do not correspond to an expected SSVEP response, even if we consider the transitions between events in a junk class. However, the proposed strategy allows a good reliability while keeping a short time segment, which is a key factor for obtaining a high information transfer rate during online processing.

5 Conclusion

A new model based on a Convolutional Neural Network has been presented. The network integrates the Fourier transform between 2 hidden layers, which changes the layer semantic from time domain analysis to frequency domain analysis. This method also allows the automatic creation of channels and linear time filters adapted to the user and to their discriminant power for the signal classification. The proposed rejection criteria are successful and allow a high reliability. The model has been validated for the classification of EEG signals in a BCI system based on SSVEP responses.

Acknowledgment

This research was supported by a Marie Curie European Transfer of Knowledge grant Brainrobot, MTKD-CT-2004-014211, within the 6th European Community Framework Program.

References

[1] B. Z. Allison, E. W. Wolpaw, and A. Wolpaw. Brain-computer interface systems: progress and prospects. Expert Review of Medical Devices, vol. 4, no. 4, pages 463–474, 2007.
[2] C. Anderson, S. Devulapalli, and E. Stolz. Determining mental state from EEG signals using parallel implementations of neural networks. IEEE Workshop on Neural Networks for Signal Processing, Cambridge, MA, USA, pages 475–483, 1995.
[3] N. Birbaumer and L. G. Cohen. Brain-computer interfaces: communication and restoration of movement in paralysis. Journal of Physiology-London, vol. 579, no. 3, pages 621–636, 2007.
[4] B. Blankertz, G. Dornhege, S. Lemm, M. Krauledat, G. Curio, and K. Muller. The Berlin brain-computer interface: EEG-based communication without subject training. IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 14, no. 2, pages 147–152, 2006.
[5] G. Burkitt, R. Silberstein, P. Cadush, and A. Wood. Steady-state visual evoked potentials and travelling waves. Clin. Neurophysiol, vol. 111, no. 2, pages 246–258, 2000.
[6] O. Friman, I. Volosyak, and A. Graser. Multiple channel detection of steady-state visual evoked potentials for brain-computer interfaces. IEEE Trans. on Biomedical Engineering, vol. 54, no. 4, pages 742–750, 2007.
[7] E. Haselsteiner and G. Pfurtscheller. Using time dependent neural networks for EEG classification. IEEE Trans. on Rehabilitation Engineering, vol. 8, no. 4, pages 457–463, 2000.
[8] A. Kostov and M. Polak. Parallel man-machine training in development of EEG-based cursor control. IEEE Trans Rehabil Eng, vol. 8, no. 2, pages 203–205, 2000.
[9] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, vol. 86, no. 11, pages 2278–2324, 1998.
[10] G. Muller-Putz, R. Scherer, C. Brauneis, and G. Pfurtscheller. Steady state visual evoked potential (SSVEP)-based communication: impact of harmonic frequency components. J. Neural Eng., vol. 2, no. 4, pages 123–130, 2005.
[11] G.-E. Sharbrough, R. Chatrian, H. Lesser, M. Lüders, T. Nuwer, and P. TW. American electroencephalographic society guidelines for standard electrode position nomenclature. J. Clin. Neurophysiol, vol. 8, pages 200–202, 1991.
[12] S. Zhong and G. Joydeep. HMMs and coupled HMMs for multi-channel EEG classification. In Proc. IEEE Int. Joint Conf. on Neural Networks, vol. 2, pages 1154–1159, 2002.
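As an illustration of the propagation step of Section 3.2, the following NumPy sketch goes from the input layer to the 22 frequency features of each map. It is not the authors' implementation. The sampling frequency (128 Hz), the zero-padded FFT length chosen to obtain bins every 0.5 Hz, the use of the spectrum magnitude, the normalization per electrode over the segment, and the names `forward_to_l1_prime`, `w` and `b` are all assumptions made for the example.

```python
import numpy as np


def forward_to_l1_prime(x, w, b, sf=128.0, n_fft=256):
    """Sketch of L0 -> L1 -> L1' (all sizes and names are assumptions).

    x: (n_elec, n_samples) raw segment; w: (n_maps, n_elec, 5) shared
    kernels; b: (n_maps,) additive term W_m of each map.
    """
    # Normalization, assumed here to be per electrode over the time segment.
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

    # Windows of 5 consecutive samples: shape (n_elec, n_samples - 4, 5).
    k = w.shape[2]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)

    # sigma[m, n] = sum_i sum_j X[i, j + n] * W[m, i, j] + W_m, then tanh.
    sigma = np.einsum("ink,mik->mn", windows, w) + b[:, None]
    l1 = np.tanh(sigma)

    # Fourier transform of each map (magnitude only, an assumption).
    spectrum = np.abs(np.fft.rfft(l1, n=n_fft, axis=1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sf)

    # 12.5 Hz to 17.5 Hz in steps of 0.5 Hz, plus their first harmonics.
    fundamentals = np.arange(12.5, 17.75, 0.5)
    targets = np.concatenate([fundamentals, 2.0 * fundamentals])
    bins = np.array([np.argmin(np.abs(freqs - f)) for f in targets])
    return l1, spectrum[:, bins]
```

With 6 electrodes, 10 maps and 128 samples, `l1` has shape (10, 124) and the returned features have shape (10, 22), matching the map sizes given in Section 3.1 for a 1 s segment under the assumed sampling frequency.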
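The rejection strategy of Section 3.4 can be sketched as follows. This is a hedged illustration only: the paper does not state exactly which validation patterns enter the maximum and the average, so the sketch assumes they are the validation patterns whose best output is class i. The names `fit_thresholds`, `classify` and `REJECT`, and the convention that the junk class is stored after the frequency classes, are hypothetical.

```python
import numpy as np

REJECT = -1  # value used in this sketch for the rejection decision R


def top_two(outputs):
    """Return the best output P and the gap DP between best and second best."""
    ordered = np.sort(outputs, axis=1)
    return ordered[:, -1], ordered[:, -1] - ordered[:, -2]


def fit_thresholds(val_outputs, n_freq):
    """Compute zeta_i and psi_i = (2 * max + average) / 3 per frequency class.

    Assumption: statistics are taken over validation patterns whose
    winning output is class i.
    """
    winners = np.argmax(val_outputs, axis=1)
    p, dp = top_two(val_outputs)
    zeta = np.full(n_freq, np.inf)  # classes never seen are always rejected
    psi = np.full(n_freq, np.inf)
    for i in range(n_freq):
        mask = winners == i
        if mask.any():
            zeta[i] = (2.0 * p[mask].max() + p[mask].mean()) / 3.0
            psi[i] = (2.0 * dp[mask].max() + dp[mask].mean()) / 3.0
    return zeta, psi


def classify(output, zeta, psi):
    """Return the class index, or REJECT for the junk class or low confidence."""
    output = np.asarray(output, dtype=float)
    i = int(np.argmax(output))
    if i >= len(zeta):  # the junk (transition) class won
        return REJECT
    p, dp = top_two(output[None, :])
    if p[0] > zeta[i] and dp[0] > psi[i]:
        return i
    return REJECT
```

Because each threshold lies between the average and the maximum observed on the validation database, many patterns fall below it, which is consistent with the high rejection rates reported in Table 1.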
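The rates defined in Section 4 can be checked numerically. The sketch below computes the four rates from lists of decisions and applies the reliability formula to the test values of Table 1. The function names and the use of -1 for the rejection decision are assumptions of the example, and the rates are computed over all patterns of the database rather than per class.

```python
def rates(predicted, actual, reject=-1):
    """Return (tau_rec, tau_rej, tau_err, tau_rel) as fractions in [0, 1]."""
    n = len(actual)
    rec = sum(p == a for p, a in zip(predicted, actual)) / n
    rej = sum(p == reject for p in predicted) / n
    err = 1.0 - rec - rej
    rel = rec / (rec + err) if rec + err > 0 else 0.0
    return rec, rej, err, rel


def reliability(rec, rej):
    """tau_rel = tau_rec / (tau_rec + tau_err), with tau_err = 1 - rec - rej."""
    err = 1.0 - rec - rej
    return rec / (rec + err)


# Test rates of Table 1, converted from percentages to fractions.
subject_a = 100.0 * reliability(0.5347, 0.4377)
subject_b = 100.0 * reliability(0.4995, 0.4859)
```

For subject A the error rate is 100 − 53.47 − 43.77 = 2.76%, and for subject B it is 1.46%. The computed reliabilities agree with the reported 95.08% and 97.17% to within the rounding of the table entries.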