
Accepted Manuscript

A study of time-frequency features for CNN-based automatic heart sound classification for pathology detection

Baris Bozkurt, Ioannis Germanakis, Yannis Stylianou

PII: S0010-4825(18)30174-4
DOI: 10.1016/j.compbiomed.2018.06.026
Reference: CBM 3006

To appear in: Computers in Biology and Medicine

Received Date: 29 August 2017


Revised Date: 24 June 2018
Accepted Date: 24 June 2018

Please cite this article as: B. Bozkurt, I. Germanakis, Y. Stylianou, A study of time-frequency features
for CNN-based automatic heart sound classification for pathology detection, Computers in Biology and
Medicine (2018), doi: 10.1016/j.compbiomed.2018.06.026.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.

A study of time-frequency features for CNN-based automatic heart sound classification for pathology detection

Baris Bozkurt1, Ioannis Germanakis2, Yannis Stylianou3

barisbozkurt0@gmail.com, germjohn@med.uoc.gr, yannis@csd.uoc.gr

1 Electrical and Electronics Engineering Department, Izmir Democracy University, Turkey

2 Faculty of Medicine, University of Crete, Greece

3 Computer Science Department, University of Crete, Greece

Corresponding author contact information:

Baris Bozkurt1, Electrical and Electronics Engineering Department, Izmir Democracy University, Üçkuyular Mahallesi, Gürsel Aksel Bulvarı, No:14 35140 Karabağlar/İZMİR,

Phone: +90 232 260 1001, Fax: +90 232 260 1004

Abstract

This study concerns the task of automatic structural heart abnormality risk detection from digital phonocardiogram (PCG) signals, aiming at pediatric heart disease screening applications. Recently, various systems based on convolutional neural networks (CNN) trained on time-frequency representations of segmental PCG frames have been presented that outperform systems using hand-crafted features. This study focuses on the segmentation and time-frequency representation components of the CNN-based designs. We consider the features most commonly used in state-of-the-art systems (MFCC and Mel-Spectrogram) and, as an alternative feature, a time-frequency representation influenced by domain knowledge, namely sub-band envelopes. Via tests carried out on two high-quality databases with a large set of possible settings, we show that sub-band envelopes are preferable to the most commonly used features and that period-synchronous windowing is preferable to asynchronous windowing.

1 This study was carried out during a research visit of Baris Bozkurt to the Computer Science Department of the University of Crete from January to July 2017.

Keywords: heart disease screening, heart sound classification, phonocardiogram analysis, automated cardiac auscultation, time-frequency features

1. Introduction

The recording and analysis of acoustic vibrations recorded at the chest of a patient using a microphone-transducer is referred to as phonocardiography (PCG). Several heart conditions can be successfully studied via analysis of phonocardiogram signals: murmurs of mitral and aortic regurgitation, murmurs of mitral and aortic stenosis, and rheumatic valvular lesions [1]. Clinicians listen to the heart sounds of a patient to monitor the functioning of the heart tissues, especially the opening and closing of the valves. Such evaluation of heart sounds targeted towards diagnosis is referred to as auscultation and typically involves analysis of the time and frequency characteristics of heart sounds and murmurs, the overall periodicity characteristics and the quality of sounds.

Heart disease represents a major health issue with significant costs worldwide2. Although coronary heart disease and hypertension predominate in adults, structural heart disease, including various heart malformations, may be present from birth (congenital heart disease (CHD)) and is a considerable cause of pediatric morbidity and mortality. Up to 1% of newborn children are considered to be affected by some form of CHD [2], with a wide spectrum of clinical presentations depending on the severity of the underlying heart malformation. Experienced cardiac auscultation is among the most important first-line clinical screening tools to detect individuals with CHD risk. Although early CHD screening offers considerable health advantages, the primary health care physician is confronted with the difficult clinical task of differentiating the (innocent) murmurs often present among healthy children from those associated with abnormal hemodynamics indicative of CHD (abnormal murmurs) [3]. Referring all children with a murmur for expensive diagnostic tests (such as echocardiography) is not a cost-effective approach [4]. Still, expert auscultation is frequently recommended as a first-line screening tool prior to the application of diagnostic echocardiography [5]. In order to address the observed decline in clinical skills in cardiac auscultation, several approaches have been applied, including multimedia teaching interventions, tele-medical applications and other computer-based clinical decision support systems [6].

2 http://www.who.int/mediacentre/factsheets/fs317/en/

One important resource that can support pediatric structural heart disease (CHD) screening is the use of automatic heart sound classification technologies. Efficient screening has high potential both to lower the financial costs and to allow more effective use of expert resources. A low-cost, non-invasive and fast screening method would also provide the opportunity to screen a large number of people at an early age, resulting in timely diagnosis of some of the pathological cases. Thanks to recent advances in machine learning and computing, close-to-human performance has been reached in many audio classification tasks, including heart sound classification. Our study targets improving the performance of automated heart sound classification technology aimed at screening for CHD risk detection.

Focus and contributions of our study

In the present study, we followed the approach common to the design of most of the recently developed high-performance systems: convolutional neural networks trained on time-frequency representations of segmental PCG frames. While we present a functional system for automatic PCG classification that has been tested and shown to perform at the level of the state of the art, our main focus is to study various strategies for feature extraction in a CNN-based approach. The novel contributions of this study are as follows: we present results of extensive comparative tests carried out with a multitude of settings for segmentation (period-synchronous and asynchronous, with various sizes), time-frequency representations and neural network models on two large databases: i) our proprietary PCG database, which is composed of PCG recordings from patients referred to a cardiology expert by pediatricians, and ii) a recent challenge dataset used for comparing the performance of state-of-the-art systems. As a result of these tests, we show that sub-band envelopes, comparatively rarely used as a time-frequency feature in this domain, are preferable to commonly used time-frequency representations such as Mel Frequency Cepstral Coefficients (MFCC). For reproducibility of our study, we share our code for one of the settings, which can be tested using the publicly available data.

This manuscript is organized as follows. In section 2, we first review the basics of cardiac auscultation and the literature on automatic heart sound classification, with a specific focus on CNN-based approaches. The proposed method is presented in section 3, with subsections on feature extraction (in which segmentation is also covered) and the machine learning models. We explain the test design (together with the specification of the test data) in section 4, followed by test results in section 5. Section 6 is dedicated to conclusions drawn from the test results. Finally, discussions and future work are presented in section 7.

2. Automatic PCG classification, a short review

Basics of cardiac auscultation

Before developing an automated heart murmur classification system, a basic understanding of heart sound generation is needed. Heart sounds are acoustic signals created within the heart through blood flow and the motion of the heart apparatus (mainly the valves) and transmitted to the chest surface, where they can be heard either through direct ear placement (historical auscultation since the time of Hippocrates) or by using a stethoscope as a transmitter to the ears (modern auscultation since the invention of the stethoscope by Laennec) [7]. Portable electronic stethoscopes are widely used today for recording, which facilitates further analysis on computer-based systems.


Proper cardiac auscultation typically involves analysis of the time and frequency characteristics of the normal fundamental heart sounds (S1 and S2, corresponding to inflow and outflow valve closure, respectively) and, if present, of murmurs or extra heart sounds (such as clicks, abnormal splitting of S2, etc.). The pitch, duration, location and shape characteristics of heart sounds and murmurs are the main components investigated. Most CHD cases are associated with abnormal blood flow patterns due to the presence of intracavity communications (shunt lesions) and/or valvular abnormalities (narrowing or leakage). The clinical classification of heart murmurs as innocent or abnormal is mainly based on their temporal classification within the heart cycle (with innocent murmurs being exclusively early to mid-systolic), their spatial classification based on the punctum maximum (the location where the murmur is best heard) and, finally, a very subjective assessment of murmur sound quality (with innocent murmurs often described as having “musical” or “vibratory” properties) [8, 9, 10].

Cardiac auscultation, especially pediatric cardiac auscultation, remains a challenging clinical task. Not only does it require long-term practice and experience, but there are also perceptual difficulties: heart sounds and murmurs involve low-frequency components (carrying discriminative characteristics for the detection of abnormalities) which are hardly audible, and a high degree of noise (environmental noise, breathing, scratches due to microphone movement) is present in many cases, especially in pediatric cardiac auscultation.

A thorough cardiologic analysis involves, in addition to auscultation and phonocardiography, the use of a multitude of techniques and technologies such as echocardiography, ultrasound imaging, angiography, electrocardiograms, chest X-rays, etc., some of which are costly. A very large percentage of the cases referred to a cardiology expert for such costly analysis have no serious problems [11]. Automatic heart sound classification technology has high potential to support the screening process and lower costs.

Automatic heart sound classification

Cardiac abnormality detection via automatic heart sound (HS)/phonocardiogram (PCG) classification is a very widely studied research domain. Manuscripts providing in-depth reviews of PCG processing methods and comparisons of large lists of state-of-the-art methods (such as [12]) are already available. Leng et al. [13] and Noponen et al. [14] presented a detailed overview of the problem, discussing types of abnormalities and related PCG characteristics. Shen [15] reviewed the use of PCG signals for diagnosis from its early days to today. Abbas and Bassam [16] gave a detailed overview of the signal processing steps involved in processing PCG signals. Leng et al. [13] further reviewed the state-of-the-art hardware systems used in electronic stethoscopes for recording PCG signals and recent techniques used in automatic PCG classification. Marascio and Modesti [17] and Liu et al. [18] presented detailed reviews of the trends in feature selection and automatic classification strategies.

As for many other automatic classification problems, the crucial components of an automatic classification system are: (training) data, segmentation, feature extraction and machine learning. Each of these components (and their interrelation) has been discussed in various dimensions within this very large literature (comprising a few hundred papers). Our particular interest in this work is the feature extraction and machine learning components.

Features used for automatic PCG classification can be grouped as follows: time domain, frequency domain, statistical domain and time-frequency domain features [18]. Studies using time domain features typically include in the feature vector duration measures (for S1, S2, diastole, systole, R-to-R) and their ratios (for example, the ratio of the systolic interval to the heart beat), relative amplitude/energy measures of heart sound components, and other common time-domain features such as the zero-crossing rate. An open-source system involving such features is presented in [18] as the baseline system for the PhysioNet-2016 challenge [12].

Spectral (frequency domain) features involve a large variety of spectral representations and/or measures, as is the case for other automatic sound classification tasks. Schmidt et al. [19] considered a wide range of spectral features for automatic PCG classification, such as parametric models of the spectra, instantaneous frequency and amplitude (IFA) and power in octave bands, concluding that the low-frequency bands carry important information that can be effectively modeled for designing discriminative features for PCG signals. In our previous study [20], we used the reassigned spectrogram for feature computation. Signal complexity can also be modeled/computed and used as a feature, for example via sample entropy, simplicity and spectral entropy [19].

Here, due to the availability of these numerous reviews on the topic, we limit our discussion to a specific methodology which has been in use since the early days of automatic PCG classification and has recently been attracting more attention due to advances in machine learning technology: the use of time-frequency representations and neural networks to build effective automatic systems for cardiac abnormality detection.

Time-frequency features and neural networks for automatic heart sound classification

Time-frequency representation of a sound signal often refers to stacking (some representation of) spectra computed from windowed segments of the signal to form a two-dimensional representation. The basic assumption in using time-frequency representations in audio classification is that, for specific classes of sounds, some patterns exist within these image-like representations. While a multitude of options exist for computing such representations, the most common technique is to map a linear-frequency representation such as the Short-Time Fourier Transform (STFT) onto a logarithmic scale mimicking the human auditory response, such as Bark or Mel. Introduced by Davis and Mermelstein in 1980 [21], Mel Frequency Cepstral Coefficients (MFCC) are possibly the most frequently used feature in all automatic sound classification domains. MFCC are also very commonly and effectively used in automatic PCG classification studies for pathology detection [22-26]. A large number of automatic PCG classification studies use wavelet-based features as a time-frequency representation (for example [23] and [27]), as wavelets have certain advantages over the STFT in terms of resolution.

Following advances in deep learning, a large number of recent studies have considered various sound classification tasks using time-frequency representations and deep neural network architectures, reporting high success rates [28, 29]. For the automatic heart sound classification task, this approach is also gaining popularity, and systems based on it rank among the best in recent challenges. In the PhysioNet-2016 challenge [12], half of the systems [24-26, 30] among the top 8 (selected out of 48 systems) use such an approach. Except for [30], for which we could not find a detailed description/documentation, the other three systems make use of MFCC and some other time-frequency features with fixed settings (13-14 MFCC coefficients computed on 25 millisecond (ms) windows with a hop size of 10 ms, etc.) that are fed into a neural network classifier. Some of the parameter choices seem to be highly influenced by other audio classification tasks. For example, the window and hop sizes used in speech processing tasks are often chosen to be multiples of some commonly accepted average period length (for example, a fundamental frequency of 100 Hz corresponds to a 10 ms period). The use of a window length of a few periods (25-30 ms) and a hop length of about one period (10 ms) is common in speech processing. Applying the same lengths (25-30 ms window length and 10 ms hop length) in PCG classification is an interesting choice given that the maximum (heart cycle) frequency would not exceed 5 Hz (200 ms period), yet these lengths appear to be preferred in some state-of-the-art systems (for example [25]).
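The mismatch in time scales can be made concrete with a small calculation. The sketch below is purely illustrative; the function name periods_per_window is ours and does not come from any of the cited systems.

```python
def periods_per_window(f0_hz, window_ms):
    """Number of signal periods covered by one analysis window."""
    period_ms = 1000.0 / f0_hz
    return window_ms / period_ms

# Speech: 100 Hz fundamental frequency (10 ms period), 25 ms window
speech = periods_per_window(100.0, 25.0)
# PCG: heart cycle frequency of at most 5 Hz (200 ms period), same window
pcg = periods_per_window(5.0, 25.0)
print(speech, pcg)
```

With these numbers a 25 ms window covers 2.5 periods of the speech signal but only one eighth of the shortest heart cycle considered.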

3. Proposed method

Our study follows the Convolutional Neural Network (CNN)-based classification approach with the following difference: while most studies present a single best selected configuration, there exist various options for segmentation (period-synchronous or asynchronous, with different sizes close to average PCG period lengths), various features (MFCC, Mel-Spectrogram, Spectrogram, etc., with different sizes, i.e. time resolution and number of frequency coefficients) and various CNN architectures. Hence, a large number of different settings are worth testing. We have designed tests to compare various settings of common time-frequency representations used directly as the input to a CNN classifier. The features considered for this study are: Mel-Spectrogram, MFCC and sub-band envelopes. With tests on two high-quality datasets, we show that sub-band envelopes are preferable to the other options in many settings and that a system with a relatively simple architecture built using this feature achieves high performance. Next, we discuss the computation of these features.


3.1 Feature extraction

Feature extraction can be performed at the whole-signal level or at the frame/segment level, where multiple frames are extracted via windowing. As explained in the introduction, within this study we limit ourselves to frame-level time-frequency feature extraction. Further, to classify individual files, frame-level decisions are fused.

Frame-level feature extraction necessitates automatic segmentation of a signal into frames. Segmentation can be performed period-synchronously or asynchronously, and we include both of these strategies in our comparative tests. For period-synchronous segmentation, period marking is necessary.

Period marking and segmentation

Period marking (segmentation into heart cycles, or marking of cycle start instants) can be carried out directly on PCG signals; there is a large literature on this task and state-of-the-art tools are publicly available [31]. As the location of the PCG recording (on the body of the patient) influences the relative energies of the S1 and S2 components, and a high amount of noise may be present, performing marking reliably on PCG signals is a challenging task. When electrocardiogram (ECG) signals recorded in parallel are available, automatic marking can be performed more reliably, since ECG signals are less noisy and include a main peak which can be tracked for reliable period marking. Our database (explained in the test design part) includes ECG signals recorded simultaneously with the PCG signals; hence, an algorithm with the following steps was implemented and used for extracting period marks from the ECG signals (detecting the R-peak locations of the ECG, which correspond to the S1 onset of the PCG [31]):

● High-pass filtering of the ECG signal to remove very low frequency variations

● Element-wise multiplication of the pre-emphasized ECG with the original ECG to obtain a more impulsive version of the signal

● Computation of the energy signal and amplitude normalization of the energy signal

● Autocorrelation-based period detection from the energy signal

● Estimation of the number of heart cycles in the signal

● Signal peak detection by applying a threshold: the threshold (with an initial value of 0.5) is lowered incrementally until the peak count is larger than four times the estimated number of cycles. This choice aims at coping with possible octave errors in the period estimation. In addition, a secondary peak may be prominent within the cycle.

● Removal of spurious peaks using peak amplitude comparison and distance to surrounding peaks
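The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the moving-average high-pass filter (taken here as the pre-emphasized signal), the smoothing length, the 0.05 threshold step, the search range for the period and the pruning rule (minimum distance of 0.6 period, minimum amplitude of 0.3 times the median) are all our own choices, and every function name is hypothetical.

```python
import numpy as np

def ecg_energy(ecg, fs, hp_win_s=0.2, smooth_s=0.05):
    """Steps 1-3: high-pass, multiply with the original, normalized energy."""
    w = max(1, int(hp_win_s * fs))
    # Moving-average subtraction stands in for the unspecified high-pass filter.
    hp = ecg - np.convolve(ecg, np.ones(w) / w, mode="same")
    impulsive = hp * ecg
    s = max(1, int(smooth_s * fs)) | 1          # odd smoothing length
    energy = np.convolve(impulsive ** 2, np.ones(s) / s, mode="same")
    return energy / np.max(energy)

def estimate_period(energy, fs, min_s=0.3, max_s=1.5):
    """Step 4: lag (in samples) of the autocorrelation maximum."""
    x = energy - np.mean(energy)
    n = len(x)
    spec = np.fft.rfft(x, 2 * n)                # zero-padded: linear autocorrelation
    acf = np.fft.irfft(spec * np.conj(spec))[:n]
    lo, hi = int(min_s * fs), min(int(max_s * fs), n - 1)
    return lo + int(np.argmax(acf[lo:hi + 1]))

def candidate_peaks(energy, period, start_thr=0.5, step=0.05, factor=4):
    """Steps 5-6: lower the threshold until enough local maxima are found."""
    n_cycles = max(1, len(energy) // period)
    inner = (energy[1:-1] > energy[:-2]) & (energy[1:-1] >= energy[2:])
    is_max = np.r_[False, inner, False]
    thr = start_thr
    cand = np.flatnonzero(is_max & (energy >= thr))
    while len(cand) <= factor * n_cycles and thr > step:
        thr -= step
        cand = np.flatnonzero(is_max & (energy >= thr))
    return cand

def prune_peaks(energy, cand, period, min_dist=0.6, min_amp=0.3):
    """Step 7: keep strong peaks that are far enough from stronger ones."""
    kept = []
    for i in sorted(cand, key=lambda k: -energy[k]):
        if all(abs(i - j) >= min_dist * period for j in kept):
            kept.append(int(i))
    kept = np.array(sorted(kept), dtype=int)
    if kept.size == 0:
        return kept
    return kept[energy[kept] >= min_amp * np.median(energy[kept])]

# Synthetic check: narrow pulses every 0.8 s (160 samples at 200 Hz) plus slow drift.
fs = 200
t = np.arange(10 * fs)
centers = np.arange(80, len(t), 160)
ecg = sum(np.exp(-0.5 * ((t - c) / 2.0) ** 2) for c in centers)
ecg = ecg + 0.1 * np.sin(2 * np.pi * 0.2 * t / fs)
energy = ecg_energy(ecg, fs)
period = estimate_period(energy, fs)
marks = prune_peaks(energy, candidate_peaks(energy, period), period)
print(period, marks)
```

The synthetic signal is far cleaner than a real ECG, so the sketch only shows how the stages connect; thresholds would need tuning on actual recordings.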

We did not carry out formal testing of this algorithm, but visually checked most of the samples in our database to observe potential problems. The method provides high-quality period marking for almost all cases. In Figure 1, two samples from our database are presented. The top plots show the ECG signals, where period marks are represented with red dots together with the fixed-length frames obtained, and the bottom plots show the corresponding PCG signals with period marks. For the windowing operation to extract frames that also include the S1 component of the heart sound, period marks are shifted to the left by 75 milliseconds.


Figure 1: Automatic period marking and period-synchronous segmentation on ECG signals into 0.5 second frames. Period marks are indicated with red dots and the fixed-length frames obtained with black rectangles.

Once the period marks are available, different strategies can be used for segmentation to obtain PCG frames. The following segmentation strategies are worth testing:

● Period-synchronous segmentation with segment length defined proportionally to the local period (half a period, one period, two periods, etc.). For segment lengths longer than one period, overlap is inherent.

● Period-synchronous segmentation with fixed segment length (0.5 sec., 1 sec., 2 sec., etc.). Wherever the segment length exceeds the period length, overlap is inherent.

● Period-asynchronous segmentation with fixed segment length (0.5 sec., 1 sec., 2 sec., etc.) with or without overlap. This is exemplified in Figure 2, where a sample is depicted together with frame boundaries for a fixed length of 2 sec. and a hop size of 1 sec. (i.e. 50% overlap).
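The three strategies can be illustrated with a short sketch that returns frame boundaries as (start, end) sample indices. The function names, the sampling rate and the period marks below are hypothetical; for simplicity, the proportional variant only supports whole numbers of periods.

```python
def sync_proportional(marks, n_periods=1):
    """Frames spanning n_periods local periods, one frame per period mark."""
    return [(marks[i], marks[i + n_periods]) for i in range(len(marks) - n_periods)]

def sync_fixed(marks, n_samples, frame_len, shift=0):
    """Fixed-length frames starting `shift` samples before each period mark."""
    frames = [(m - shift, m - shift + frame_len) for m in marks]
    return [(s, e) for s, e in frames if s >= 0 and e <= n_samples]

def async_fixed(n_samples, frame_len, hop):
    """Fixed-length frames on a regular grid, ignoring period marks."""
    return [(s, s + frame_len) for s in range(0, n_samples - frame_len + 1, hop)]

fs = 1000                                   # hypothetical sampling rate (Hz)
marks = [100, 900, 1700, 2500]              # hypothetical period marks (samples)
two_period = sync_proportional(marks, 2)    # consecutive frames share one period
half_second = sync_fixed(marks, 3000, fs // 2, shift=75)   # 75 ms left shift
grid = async_fixed(8 * fs, 2 * fs, 1 * fs)  # 2 sec. frames, 1 sec. hop
print(two_period, half_second, len(grid))
```

The shift argument mirrors the 75 millisecond left shift applied to the period marks so that frames include the S1 component.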

Figure 2: Asynchronous segmentation example with a length of 2 sec. and a hop size of 1 sec. Frame boundaries are indicated with solid black lines.

A number of such options are considered in the tests carried out. This of course adds a multiplicative factor to the number of tests to be performed.

Computing time-frequency features


Considering automatic audio/sound classification systems with CNN architectures, the following time-frequency representations were selected as being among the most common (they are also often included in audio processing software libraries):

● Spectrogram

● MFCC (with or without delta coefficients)

● Mel-Spectrogram (with or without delta coefficients)

In addition to these common representations, we included sub-band envelopes as a time-frequency representation. Here, the sub-band envelopes of a given PCG signal (in the form of a time-frequency feature) are formed by stacking the temporal modulation envelopes of sub-bands obtained by band-pass filtering the PCG signal (discussed in detail in the next subsection). For all features, various time and frequency resolutions were tested, as explained in the test design section. A Tukey window (with r=0.08) is applied to signal segments just before the feature computation.
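For reference, a Tukey (tapered cosine) window can be written in a few lines. The sketch below assumes that r denotes the fraction of the window occupied by the two cosine tapers, which is the usual convention; the function name and the random test segment are ours.

```python
import numpy as np

def tukey_window(n, r=0.08):
    """Tapered-cosine window: flat in the middle, cosine tapers of total fraction r."""
    if n < 2 or r <= 0:
        return np.ones(n)
    r = min(r, 1.0)
    x = np.arange(n) / (n - 1)                  # normalized position in [0, 1]
    w = np.ones(n)
    rise, fall = x < r / 2, x > 1 - r / 2
    w[rise] = 0.5 * (1 + np.cos(np.pi * (2 * x[rise] / r - 1)))
    w[fall] = 0.5 * (1 + np.cos(np.pi * (2 * x[fall] / r - 2 / r + 1)))
    return w

frame = np.random.randn(1000)                   # stand-in for a PCG segment
w = tukey_window(len(frame), r=0.08)
windowed = frame * w
```

With r=0.08 only 4% of the samples at each end of the segment are attenuated, so the window mainly suppresses edge discontinuities while leaving the heart sounds untouched.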

Sub-band envelopes as a time-frequency feature

One of the important steps in PCG analysis by a cardiology expert is the investigation of the signal shapes of murmurs and heart sounds. Experts often use dedicated software tools to apply band-pass filters (with flexible settings they can control) and check the shape and localization of signal components. Using their experience with prior cases, they check for patterns in the shapes of the signals and signal envelopes. In the development of automatic classification systems, this practice can be imitated by building time-frequency representations as a stack of sub-band signal envelopes.


The sub-band (temporal) envelope has been successfully used as a time-frequency feature in the automatic speech recognition domain [32, 33]. In the automatic PCG analysis literature, the use of the envelope signal is more common for segmentation purposes. Liu et al. [18] presented a detailed review of envelope-based methods used for automatic segmentation of PCG signals. While envelope signals are successfully used in segmentation tasks (for example [34]), they have also been used directly as features fed into neural network classifiers, although rarely, since the early days of automatic PCG classification [35]. Some of the wavelet-based features, when the extracted coefficients are down-sampled, can also be interpreted as sub-band envelope features. A recent study following that approach is presented by Deng & Han [36], where sub-band envelopes are calculated from discrete wavelet transform (DWT) decomposition coefficients. The system in [24] uses median powers of sub-band signals, which can also be considered a similar representation in which very few samples are used for each sub-band envelope.

Sub-band envelopes can be computed in various ways. We have chosen the following steps (also depicted in Figure 3) for computing the sub-band envelopes of a PCG segment:

● Band-pass filtering applied to the PCG signal using Gammatone filter banks3

● Envelope detection by computing the analytic signal using the Hilbert transform

● Envelopes are resampled to a specific time resolution (this inherently involves low-pass filtering that removes high-frequency components)

● Logarithmic compression applied to the final envelope signals

● All envelopes are stacked to obtain an image-like time-frequency representation

● The matrix obtained is processed to have zero mean and normalized amplitude
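The pipeline above can be sketched with NumPy alone. Note the substitutions: an ideal FFT-domain band-pass filter with log-spaced band edges replaces the Gammatone filter bank, block averaging replaces proper resampling, and the frequency range (25 to 400 Hz), the compression offset and the peak normalization are assumptions of ours rather than settings reported in the study.

```python
import numpy as np

def subband_envelopes(x, fs, n_bands=8, n_bins=128, f_min=25.0, f_max=400.0, eps=1e-8):
    """Return an (n_bands, n_bins) matrix of log sub-band envelopes."""
    n = len(x)
    spec = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    edges = np.geomspace(f_min, f_max, n_bands + 1)      # assumed band edges
    feats = np.empty((n_bands, n_bins))
    for b in range(n_bands):
        # Band-pass and analytic signal in one step: keep only the positive
        # frequencies of the band and double them (FFT-based Hilbert transform).
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        envelope = np.abs(np.fft.ifft(2.0 * spec * mask))
        # Resample to n_bins points by block averaging (a crude low-pass).
        resampled = np.array([c.mean() for c in np.array_split(envelope, n_bins)])
        feats[b] = np.log(resampled + eps)               # logarithmic compression
    feats -= feats.mean()                                # zero mean
    feats /= np.max(np.abs(feats)) + eps                 # normalized amplitude
    return feats

fs = 2000
t = np.arange(fs) / fs
pcg_like = np.sin(2 * np.pi * 120.0 * t)                 # synthetic test tone
feature = subband_envelopes(pcg_like, fs)
print(feature.shape)                                     # (8, 128)
```

For the 120 Hz test tone, the energy falls in the fifth band (100 to about 141 Hz), which therefore shows the largest values in the resulting matrix.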


In Figure 3, we present the flow diagram for the process and an example of feature extraction, depicting the sub-band signal envelopes computed and the final feature obtained in matrix form. The top sub-plots include 8 sub-band signals and their resampled versions extracted from the original PCG signal (shown in blue). Considering this particular example, after stacking 8 vectors (corresponding to 8 sub-bands) of size 128 (number of time bins), an 8×128 image-like representation is derived and plotted with color-coded element values, resulting in the bottom sub-plot, which is the main feature used as input to the classifier.

3 Filter banks designed using the Python library by Jason Heeris: https://github.com/detly/gammatone

Figure 3: Sub-band envelope feature computation. The original PCG signal is depicted in blue. The sub-band envelope feature is the matrix obtained at the output of the feature extraction process, which is depicted as a colored image by mapping matrix coefficients to a color code (low values: dark, high values: bright).

3.2 Machine learning

A large variety of neural network models may be used for the PCG classification task. Our tests are limited to the use of feed-forward CNN models applied to frame-level features, one of the most popular approaches in recent state-of-the-art systems in this domain.

To keep the number of tests limited (so that tests can be repeated in a reasonable amount of time), we have considered three similar models which include a common sequence of layers used for similar tasks in the literature: 2D convolutional layers (with kernel size 3 by 3 and rectified linear unit activation) followed by max-pooling and drop-out layers. The input dimension is equal to the feature dimension and the output dimension is two (number of categories: normal and pathological). The models are implemented using Keras4 with TensorFlow5 as the backend. The Keras models and all other design parameters are available from the accompanying repository6. Since PCG database sizes are often relatively small (compared to other automatic classification tasks with CNNs), complex models with high capacity tend to memorize the training data. For this reason, the number of layers was kept small and L1 regularisation is applied to avoid over-fitting. The numbers of 2D convolutional layers in the three models are 1, 2 and 4.
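A model of this family could be sketched in Keras as shown below. This is not the architecture from the accompanying repository: the number of filters, the pooling size, the drop-out rate, the regularisation weight, the optimizer and the use of "same" padding are all placeholder assumptions, and the function names are ours. TensorFlow is imported inside the function so that the small shape helper can be used without it.

```python
import math

def pooled_shape(input_hw, n_pools, pool=2):
    """Feature-map height and width after n_pools 'same'-padded poolings."""
    h, w = input_hw
    for _ in range(n_pools):
        h, w = math.ceil(h / pool), math.ceil(w / pool)
    return h, w

def build_cnn(input_shape=(8, 128, 1), n_conv=2, n_filters=16,
              l1_weight=1e-4, drop_rate=0.25):
    """Stack of Conv2D (3x3, ReLU) + max-pooling + drop-out, two-class output."""
    # Imported lazily so that pooled_shape() works without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    for _ in range(n_conv):
        model.add(layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu",
                                kernel_regularizer=regularizers.l1(l1_weight)))
        # 'same' padding keeps the 8-band axis from collapsing to zero
        # when four pooling stages are stacked.
        model.add(layers.MaxPooling2D((2, 2), padding="same"))
        model.add(layers.Dropout(drop_rate))
    model.add(layers.Flatten())
    model.add(layers.Dense(2, activation="softmax"))   # normal vs. pathological
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Size of the last feature map for an 8x128 input with 1, 2 and 4 conv blocks.
shapes = [pooled_shape((8, 128), n) for n in (1, 2, 4)]
print(shapes)
```

Calling build_cnn(n_conv=1), build_cnn(n_conv=2) and build_cnn(n_conv=4) would give three models that differ only in depth, in the spirit of the three models compared here.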

Each model is designed to compute the probability that a segment belongs to a recording of a patient with pathology. Further, to compute the probability that a file/patient belongs to the pathological class, all frame probabilities are sorted, the 15% lowest and 15% highest values are discarded and, finally, the probability for the file is computed as the mean probability of the remaining frames.
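This fusion rule is a trimmed mean and can be sketched in a few lines. How the 15% count is rounded for a given number of frames is not stated in the text; rounding down is an assumption here, as is the fallback to a plain mean when too few frames are available. The function name is ours.

```python
def file_probability(frame_probs, trim=0.15):
    """Trimmed mean of frame-level probabilities for one recording."""
    if not frame_probs:
        raise ValueError("at least one frame probability is required")
    p = sorted(frame_probs)
    k = int(len(p) * trim)                 # frames discarded at each end (rounded down)
    kept = p[k:len(p) - k] if len(p) > 2 * k else p
    return sum(kept) / len(kept)

# Ten frames: the single lowest (0.1) and highest (0.9) values are discarded.
probs = [0.9, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
score = file_probability(probs)
print(score)
```

Discarding the extremes makes the file-level decision less sensitive to a few frames dominated by noise or artifacts.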


4. Test design

Here, we first explain the data used in the tests and then discuss the various dimensions of the automatic system considered in the test design process.


Databases:

Two databases (with large differences in patient ages and pathologies) were considered for the comparative tests. The first database is a proprietary database representing real-life scenarios for PCG-based pediatric cardiology screening. It is one of the largest exclusively pediatric digital phonocardiogram databases. The second database is a publicly available one, involving mainly adult heart sound recordings; it represents the largest phonocardiogram database available worldwide to date and is used for comparing various classification studies in the literature.

4 https://keras.io
5 https://www.tensorflow.org
6 https://github.com/barisbozkurt/AutomaticPCGclassification (script models.py)

University of Crete, PCGs with murmur (UoC-murmur) database:

Our database is composed of anonymized digital phonocardiograms (4 to 10 seconds in length, including 4 to 18 PCG cycles with an average of 8 cycles), obtained from pediatric cardiology outpatients as standard of care (time permitting) and from a pilot pediatric cardiology screening program for school-age children (8-year-olds), approved by the Greek Ministry of Education and local Health Authorities, which includes the digital phonocardiogram as a component of pediatric heart disease screening (Cretan Pediatric Cardiology Screening program, CPCS). The database includes abnormal murmurs associated with various types and severity levels of CHD. This database is proprietary and is not publicly available.

Each recording was labeled as normal (i.e. having an innocent murmur) or abnormal by a single expert in pediatric cardiac auscultation (I.G., the second author) based on clinical auscultation and a final confirmatory echocardiography study. Our database therefore involves samples with abnormal murmurs obtained from children of various ages, often under suboptimal recording conditions, and innocent murmurs which were either difficult for their pediatricians to classify as such or were recorded during primary school visits (associated with a high probability of external noise). The available database is representative of the assessment of pediatric patients of various ages in real-life conditions.

19
ACCEPTED MANUSCRIPT

Due to the high cost of confirmatory echocardiography analysis, only part of this database (83 PCG samples) has been cross-validated blindly by two pediatric cardiology experts independently [37]. The database includes samples with various levels of CHD considered “difficult to classify”, even for experts. It is a good representative of real-life daily clinical challenges. Selected recordings from the same database have also been used for teaching purposes [6, 38]. Representative digital phonocardiograms from this database, along with extended introductory web lectures in pediatric cardiac auscultation, are freely available as open-source material on the institutional web server (footnote 7).

The database contains 336 recordings from 327 healthy children with innocent murmurs and 130 recordings from 117 children with various forms of CHD, of various ages (infants to adolescents). The technique of digital phonocardiogram recording has been standardized and described previously [37]. Briefly, a sensor-based electronic stethoscope with an incorporated 3-lead ECG (footnote 8) was used. Four recordings were performed for each patient, corresponding to the apical, lower left (fourth intercostal space), and upper left and right (second intercostal space) parasternal locations. Digital acoustic data (with a sampling rate of 44100 Hz, 16-bit dynamic resolution) and ECG signals were transferred and stored as wave files on a personal laptop computer using the designated software (footnote 9). Any personal identification data was removed and replaced by a random ID prior to data analysis.

For each patient, one or two recordings were selected by the expert as having the highest quality for murmur detection, and all other recordings were removed from the set. The following steps were applied for pre-processing of the original data: i) ECG data was downsampled to 882 Hz and PCG data was downsampled to 4410 Hz, ii) both ECG and PCG signals were amplitude normalized to have a maximum level of 0.9.

7 https://opencourses.uoc.gr/courses/enrol/index.php?id=367 Password for the video lectures is available upon request from the authors.
8 TheStethoscope®; Welch Allyn-Meditron, Welch Allyn Inc., NY, USA
9 Meditron Analyzer 4®
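The two pre-processing steps can be sketched as follows. The resampling method is not stated in the text, so block averaging is used here purely as an assumed stand-in for a proper anti-aliased resampler, and "maximum level" is interpreted as peak absolute amplitude. All function names are hypothetical.

```python
import math

def downsample_by_averaging(x, factor):
    """Crude decimation: average each block of `factor` consecutive samples."""
    n_blocks = len(x) // factor
    return [sum(x[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]

def normalize_peak(x, peak=0.9):
    """Scale the signal so that its maximum absolute value equals `peak`."""
    top = max(abs(v) for v in x)
    return list(x) if top == 0 else [v * peak / top for v in x]

def preprocess(pcg, ecg, fs=44100, pcg_fs=4410, ecg_fs=882):
    """Downsample PCG and ECG by integer factors, then peak-normalize both."""
    pcg_out = normalize_peak(downsample_by_averaging(pcg, fs // pcg_fs))
    ecg_out = normalize_peak(downsample_by_averaging(ecg, fs // ecg_fs))
    return pcg_out, ecg_out

# One second of synthetic signals standing in for a real recording.
fs = 44100
pcg = [math.sin(2 * math.pi * 50 * n / fs) for n in range(fs)]
ecg = [0.3 * math.sin(2 * math.pi * 2 * n / fs) for n in range(fs)]
pcg_out, ecg_out = preprocess(pcg, ecg)
print(len(pcg_out), len(ecg_out))
```

Both target rates divide 44100 Hz exactly (factors of 10 and 50), which is why integer-factor decimation is sufficient in this sketch.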

PhysioNet-2016 database:

Recently, a large open PCG database has been announced: Classification of Normal/Abnormal Heart Sound Recordings: the PhysioNet/Computing in Cardiology Challenge 2016 (footnote 10) [12]. This database is a compilation of various other databases and is a very good resource for comparing a specific system with various state-of-the-art algorithms without the need to implement them and run experiments. The PhysioNet-2016 data includes some very noisy data (even some non-PCG samples) and does not include ECG channels. Detailed profiles of the 9 included databases are provided in Section 2 and Table 1 of [12], reaching a total of 2435 PCG recordings.


Deciding system settings



An automatic PCG classification system involves the following basic blocks: segmentation, feature extraction and machine learning. Our test design started with considering cross-combinational settings for these blocks. As the first step, an initial list of combinations was created:

Segmentation strategies (11 options):

● Period synchronous segmentation with 0.5 period, 1 period or 2 period sized segments, or with fixed sizes of 0.5, 1, 2 or 3 second segments. Overlap exists if the size exceeds the local period.

● Period asynchronous segmentation with fixed-size segments of 0.5, 1, 2 or 3 seconds with an overlap of 1 second.

Features (48 options):

● Spectrogram, Mel-spectrogram, MFCC and sub-band envelopes

● Time resolutions: 32, 64, 128 (points)

● Frequency bands: 8, 16, 24, 32 (bands)

Machine learning models (3 options):

● Models with 1, 2 or 4 2D convolutional layers

Databases (2 options):

● UoC-murmur database: innocent murmur versus pathological murmur

● PhysioNet-2016 database: normal versus pathological

10 https://www.physionet.org/challenge/2016/

This initial list refers to 1584 systems (with asynchronous systems to be tested on two databases), where each test would also need to be repeated several times to remove the bias of the random splitting applied to the databases (explained further in this section).
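The figure of 1584 follows from the cross-combination of the listed options (11 × 48 × 3). The short enumeration below reproduces this count; the tuple labels such as `sync_periods` are hypothetical names used only for this illustration.

```python
from itertools import product

# 3 period-relative + 4 fixed synchronous + 4 fixed asynchronous = 11 options
segmentations = (
    [("sync_periods", p) for p in (0.5, 1, 2)]
    + [("sync_seconds", s) for s in (0.5, 1, 2, 3)]
    + [("async_seconds", s) for s in (0.5, 1, 2, 3)]
)
# 4 feature types x 3 time resolutions x 4 band counts = 48 options
features = list(product(
    ("Spectrogram", "MelSpec", "MFCC", "SubEnv"),
    (32, 64, 128),
    (8, 16, 24, 32),
))
models = (1, 2, 4)  # number of 2D convolutional layers

grid = list(product(segmentations, features, models))
print(len(segmentations), len(features), len(models), len(grid))  # 11 48 3 1584
```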

While we think it is worthwhile to consider all these settings, due to this high number of tests, several additional options (such as using other machine learning models (LSTM, RNN, etc.) or other file-level features from the literature) have been left out. For practical reasons of computation time, some dimensions were considered in isolation without test repetition. Leaving out the worst cases in these preliminary tests, the list was reduced to a total of 90 systems for the final tests. Due to space considerations, we only mention here the observations that led us to leave out some options.

In isolated and reduced combinational tests without repetition, we sorted systems with respect to their F1 measures and observed that systems using the spectrogram had the lowest scores. Hence, the spectrogram was removed from the feature list. Segment lengths defined relative to the local period length did not appear more advantageous than fixed sizes and were also removed. For tests with period asynchronous segments, 0.5 and 1 second lengths were too short: learning could not converge in those cases. Using more than 16 frequency bands did not bring improvement (the PCG spectrum is limited to 2.2 kHz) and the performances observed for 8 and 16 frequency bands were similar. Sorting all systems, machine learning models with 2 and 4 convolutional layers ranked higher than systems with a single convolutional layer. Using delta coefficients with the Mel-spectrogram and MFCC was also tested and observed to bring no improvement.


Tested system settings



We finally arrived at the following reduced list of 90 systems, for which the tests can be re-run/repeated with a single GPU in a few days for one of the databases.

Segmentation strategies (5 options):

● Period synchronous segmentation with 0.5, 1 or 2 second lengths

● Period asynchronous segmentation with 2 or 3 second lengths

Features (9 options):

● Mel-spectrogram, MFCC and sub-band envelopes

● Time resolutions: 32, 64, 128

● Frequency bands: 16

Machine learning models (2 options):

● Models with 2 or 4 2D convolutional layers

Our tests involve performing repeated experiments for 90 systems (54 period synchronous and 36 asynchronous) on the UoC-murmur database and then picking a high-performance period asynchronous system and repeating the tests for this system on PhysioNet-2016 data. In the tests with the UoC-murmur database, the following options were tested for each segmentation strategy: three different features (Mel-spectrogram, MFCC and sub-band envelopes) with three different time resolutions (32, 64, 128) and two different CNN models. Our shared repository includes the implementation of this period asynchronous system and the testing scripts. Readers can reproduce our results with PhysioNet-2016 data by simply running our shared test script.
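The reduced list can be enumerated in the same way. The sketch below builds labels in the style used in Table 1 of the results (for example M1SubEnv32by16_nASyn2000); the function `system_names` is hypothetical, and no assumption is made about which layer count the labels M1 and M2 denote.

```python
from itertools import product

def system_names():
    """Enumerate labels for the reduced grid of tested systems."""
    segs = [f"eSyn{ms}" for ms in (500, 1000, 2000)]    # period synchronous
    segs += [f"nASyn{ms}" for ms in (2000, 3000)]       # period asynchronous
    feats = ("MelSpec", "MFCC", "SubEnv")
    times = (32, 64, 128)
    bands = 16
    models = ("M1", "M2")
    return [f"{m}{f}{t}by{bands}_{s}"
            for m, f, t, s in product(models, feats, times, segs)]

names = system_names()
sync = [n for n in names if "_eSyn" in n]
print(len(names), len(sync), len(names) - len(sync))  # 90 54 36
```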

Data splitting, augmentation and balancing



For the learning experiments, the data needs to be split into three subsets: train, validation/development and test. In our tests, the validation set is used to observe how accuracies and losses vary during learning, to alter the model parameters (manually) based on these observations (to avoid overfitting) and to (automatically) save the best model learned in a learning test (when the highest accuracy is achieved on the validation set). The split ratios used for train, validation and test are 65%, 15% and 20%.

An important detail in random splitting is to ensure that each set is composed of completely independent samples. Splitting is performed at file level, keeping a similar distribution of sample numbers in categories per set (i.e. the train, validation and test sets include similar distributions of normal and pathological cases).
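A file-level stratified split with the stated 65%/15%/20% ratios could be sketched as below. The function name, the rounding of subset sizes and the file counts in the usage example are assumptions for illustration, not a description of the shared scripts.

```python
import random

def stratified_file_split(files, labels, ratios=(0.65, 0.15, 0.20), seed=0):
    """Split files into train/validation/test, class by class."""
    rng = random.Random(seed)
    split = {"train": [], "validation": [], "test": []}
    for cls in sorted(set(labels)):
        members = [f for f, y in zip(files, labels) if y == cls]
        rng.shuffle(members)
        n_train = round(ratios[0] * len(members))
        n_val = round(ratios[1] * len(members))
        split["train"] += members[:n_train]
        split["validation"] += members[n_train:n_train + n_val]
        split["test"] += members[n_train + n_val:]  # remainder
    return split

# Hypothetical file lists: 100 normal and 40 pathological recordings.
files = [f"normal_{i}.wav" for i in range(100)] + [f"patho_{i}.wav" for i in range(40)]
labels = [0] * 100 + [1] * 40
split = stratified_file_split(files, labels)
print({name: len(items) for name, items in split.items()})
```

Because each file is assigned to exactly one subset before any frames are extracted, frames of the same recording cannot appear in two subsets.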

Data augmentation refers to creating new samples in an artificial/automatic way to increase the size of the database for training and has been shown to be beneficial in many applications [39]. One straightforward way of adding new samples is to create new copies of existing samples by applying transformations to which the system should be invariant. For our problem, we would like our system to be invariant to minor or moderate variations in heart rate and murmur frequency band. One easy way to create new samples with varied heart rate and murmur frequency band is to resample existing samples and save them as if the sampling rate were not altered. This compresses/expands the spectrum, which corresponds to a modification of the murmur frequency band and the heart rate.



Data augmentation is performed by changing the sampling rate by a random value in the range 10%-20% on randomly selected samples. In all tests, an augmentation ratio of 2 is used (i.e. the size of the data is doubled). Data augmentation is applied only to the train set.
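A minimal sketch of this augmentation step is given below. Linear interpolation is an assumed, simplified resampler, and the random choice between stretching and compressing is also an assumption; the function names are hypothetical.

```python
import math
import random

def resample_linear(x, factor):
    """Resample x to about len(x) * factor points by linear interpolation."""
    n_out = max(2, round(len(x) * factor))
    step = (len(x) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(x) - 2)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[lo + 1] * frac)
    return out

def augment(train_signals, ratio=2, seed=0):
    """Return the train set plus resampled copies so its size grows by `ratio`."""
    rng = random.Random(seed)
    n_new = round((ratio - 1) * len(train_signals))
    new = []
    for _ in range(n_new):
        source = rng.choice(train_signals)
        # Change of 10%-20% in either direction (direction is an assumption).
        change = rng.uniform(0.10, 0.20) * rng.choice((-1, 1))
        new.append(resample_linear(source, 1 + change))
    return list(train_signals) + new

# Four synthetic 1000-sample signals standing in for training recordings.
train = [[math.sin(0.01 * k * n) for n in range(1000)] for k in range(1, 5)]
augmented = augment(train)
print(len(train), len(augmented))
```

Since the new samples are treated as if the sampling rate were unchanged, a longer copy corresponds to a slower heart rate and a lower murmur band, and a shorter copy to the opposite.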

Our original database is unbalanced in terms of the number of samples in each category: samples in the pathological category are fewer in number. Balancing the data could easily be performed by leaving out samples from the more populated category. However, we cannot afford to leave out samples due to the small database size. We have followed an alternative path: creating new samples for the category with fewer samples using resampling. The procedure used is the same as in the data augmentation step. The balancing operation, via creating new transformed samples of original files, is applied to the train and validation sets but not to the test set.
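Balancing by creating transformed copies for the smaller category might look like the following sketch. The helper names are hypothetical, and the nearest-neighbour resampler is only a compact stand-in for whichever resampling routine is actually used.

```python
import random

def resample_nearest(signal, rng):
    """Stand-in transform: nearest-neighbour resampling by 10%-20%."""
    factor = 1 + rng.uniform(0.10, 0.20) * rng.choice((-1, 1))
    n_out = max(1, round(len(signal) * factor))
    return [signal[min(len(signal) - 1, int(i / factor))] for i in range(n_out)]

def balance(samples_by_class, transform, seed=0):
    """Grow every class to the size of the largest one using `transform`."""
    rng = random.Random(seed)
    target = max(len(v) for v in samples_by_class.values())
    balanced = {}
    for cls, samples in samples_by_class.items():
        extra = [transform(rng.choice(samples), rng)
                 for _ in range(target - len(samples))]
        balanced[cls] = list(samples) + extra
    return balanced

# Hypothetical unbalanced subset: 6 normal and 2 pathological signals.
data = {
    "normal": [[0.0] * 100 for _ in range(6)],
    "pathological": [[1.0] * 100 for _ in range(2)],
}
balanced = balance(data, resample_nearest)
print({cls: len(v) for cls, v in balanced.items()})
```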


5. Test results

While the number of systems to be tested and compared is reduced to 90, there is still a need for a way to sort the systems in terms of performance. For our application of screening, we would like our automatic systems to detect as many pathological cases as possible (i.e. we want to increase the true positive rate (TPR)), while we can tolerate some normal cases being labeled as pathological (i.e. we can tolerate some increase in the false positive rate (FPR)). In a real-life scenario, this would correspond to labeling a high number of samples as pathological, referring some extra normal cases to an expert for consultation. The output of the automatic classification system for each sample is the probability of belonging to a category. The straightforward class assignment is performed by using 0.5 as the probability threshold in a binary classification task. Reducing the threshold for pathology detection, more cases will be labeled as pathological. This would increase both the true and false positive rates. To find an optimum point of operation, TPR versus FPR values are plotted for different threshold values and the commonly used Receiver Operating Characteristic (ROC) curves are obtained. The area under the ROC curve is considered the main measure of performance for the ranking. Following the sorting, we also provide other performance measures for a selected system.
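The threshold sweep and the area computation can be illustrated with a short sketch. The trapezoidal rule and the function names are assumptions; the code expects at least one sample of each class.

```python
def roc_points(labels, probs):
    """(FPR, TPR) pairs obtained by lowering the decision threshold step by step."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(probs), reverse=True):
        tp = sum(1 for y, p in zip(labels, probs) if p >= thr and y == 1)
        fp = sum(1 for y, p in zip(labels, probs) if p >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_roc(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical file-level outputs: 1 = pathological, 0 = normal.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
points = roc_points(labels, probs)
print(points, area_under_roc(points))
```

In this toy example one pathological file scores below one normal file, so three of the four pathological/normal pairs are ordered correctly and the area is 0.75.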

Test results with the UoC-murmur database:

To start our comparison of various features with an example, below we present three ROC curves obtained for three different features while keeping all other settings the same: a time resolution of 32, 16 frequency bands, ECG synchronous segmentation with a fixed length of 500 milliseconds, and the CNN model with 2 convolutional layers. The ROC curves are obtained by averaging 5 repeated random experiments on the UoC-murmur database.

Figure 4: ROC curves for systems using three different features while keeping all other design parameters fixed

In Figure 4, the best system among the three is the one using sub-band envelopes, since the ROC curve for that system is closest to the upper left corner (high TPR, low FPR) and its area under the ROC is the largest. Following the intuition of this example, for the comparison of the 90 systems we have used the area under the ROC as a single measure to sort all system performances. Since random splitting is involved, tests are repeated 5 times and average ROC curves are used for each system.
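The text does not specify how the ROC curves of repeated runs are averaged. One common option, assumed here only for illustration, is vertical averaging: the TPR of each run is read off at a common grid of FPR values and the mean is taken. The function names are hypothetical.

```python
def tpr_at(points, fpr):
    """Highest TPR of a piecewise-linear ROC curve at the given FPR."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 == fpr:
            best = max(best, y0)
        if x1 == fpr:
            best = max(best, y1)
        if x0 < fpr < x1:
            best = max(best, y0 + (y1 - y0) * (fpr - x0) / (x1 - x0))
    return best

def average_roc(curves, grid):
    """Vertically average several ROC curves on a shared FPR grid."""
    return [(f, sum(tpr_at(c, f) for c in curves) / len(curves)) for f in grid]

# Two hypothetical runs: a perfect classifier and a chance-level one.
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
chance = [(0.0, 0.0), (1.0, 1.0)]
mean_curve = average_roc([perfect, chance], grid=[0.0, 0.5, 1.0])
print(mean_curve)
```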

Table 1: Sorted list of systems with respect to area under the ROC. Naming convention: M1/2: CNN model number, eSyn: ECG synchronous, nASyn: asynchronous. The rightmost number refers to the fixed length of the frame in milliseconds. The table includes the best and worst 25 systems. Please refer to the github repository for the table with all 90 systems: /results4allSystems_UocDba/sortingWRTareaUnderROC.txt

Rank  Area under ROC  System setting (Rank: 1-25)    Rank  Area under ROC  System setting (Rank: 66-90)
1     0.8772          M2SubEnv128by16_eSyn500        66    0.6945          M2MelSpec128by16_nASyn2000
2     0.8764          M2SubEnv32by16_eSyn500         67    0.6897          M2MelSpec64by16_nASyn3000
3     0.8736          M2SubEnv64by16_eSyn1000        68    0.6878          M2MelSpec32by16_eSyn500
4     0.8716          M1SubEnv32by16_nASyn2000       69    0.6873          M2MelSpec32by16_eSyn1000
5     0.8705          M2SubEnv32by16_eSyn2000        70    0.6808          M2MelSpec32by16_nASyn2000
6     0.8691          M2SubEnv64by16_eSyn500         71    0.6719          M2MFCC64by16_eSyn500
7     0.8679          M2SubEnv64by16_eSyn2000        72    0.6662          M2MelSpec128by16_nASyn3000
8     0.8648          M2SubEnv32by16_nASyn2000       73    0.6657          M1MFCC128by16_nASyn2000
9     0.8646          M2SubEnv128by16_eSyn2000       74    0.6552          M2MFCC64by16_eSyn1000
10    0.8640          M1SubEnv32by16_eSyn500         75    0.6544          M2MelSpec64by16_nASyn2000
11    0.8636          M2SubEnv32by16_eSyn1000        76    0.6516          M1MFCC32by16_nASyn3000
12    0.8617          M1SubEnv32by16_eSyn1000        77    0.6509          M1MFCC64by16_nASyn2000
13    0.8595          M2SubEnv128by16_eSyn1000       78    0.6502          M1MFCC32by16_nASyn2000
14    0.8522          M1SubEnv32by16_nASyn3000       79    0.6482          M1MFCC64by16_nASyn3000
15    0.8497          M1SubEnv32by16_eSyn2000        80    0.6452          M1MFCC128by16_nASyn3000
16    0.8437          M1SubEnv64by16_eSyn1000        81    0.6334          M2MFCC32by16_eSyn500
17    0.8404          M1SubEnv64by16_eSyn2000        82    0.6209          M2MFCC64by16_eSyn2000
18    0.8362          M1SubEnv64by16_eSyn500         83    0.6205          M2MFCC32by16_eSyn2000
19    0.8352          M1SubEnv128by16_eSyn500        84    0.6039          M2MFCC128by16_eSyn2000
20    0.8350          M2SubEnv32by16_nASyn3000       85    0.5928          M2MFCC128by16_nASyn2000
21    0.8258          M1SubEnv64by16_nASyn3000       86    0.5917          M2MFCC64by16_nASyn2000
22    0.8255          M1SubEnv128by16_eSyn2000       87    0.5844          M2MFCC128by16_nASyn3000
23    0.8245          M1SubEnv128by16_eSyn1000       88    0.5835          M2MFCC32by16_nASyn3000
24    0.8238          M1MelSpec32by16_eSyn1000       89    0.5762          M2MFCC32by16_nASyn2000
25    0.8227          M1MelSpec32by16_eSyn2000       90    0.5604          M2MFCC64by16_nASyn3000
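For readers who wish to process the full results file, the naming convention of Table 1 can be decoded with a small parser. The regular expression and the dictionary keys below are assumptions made for this sketch; an optional underscore after the model label is accepted because both spellings occur in the text.

```python
import re
from collections import Counter

PATTERN = re.compile(
    r"^(M\d)_?(SubEnv|MelSpec|MFCC)(\d+)by(\d+)_(eSyn|nASyn)(\d+)$"
)

def parse_system(name):
    """Decode a system label such as 'M1SubEnv32by16_nASyn2000'."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognised system name: {name}")
    model, feature, time_res, bands, seg, ms = m.groups()
    return {
        "model": model,
        "feature": feature,
        "time_resolution": int(time_res),
        "bands": int(bands),
        "synchronous": seg == "eSyn",
        "frame_ms": int(ms),
    }

# The five highest-ranked systems of Table 1.
top5 = [
    "M2SubEnv128by16_eSyn500",
    "M2SubEnv32by16_eSyn500",
    "M2SubEnv64by16_eSyn1000",
    "M1SubEnv32by16_nASyn2000",
    "M2SubEnv32by16_eSyn2000",
]
feature_counts = Counter(parse_system(n)["feature"] for n in top5)
print(feature_counts)
```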

ROC curves for these best and worst 20 systems are presented below.

[Figure 5 panels: Best 20 systems; Worst 20 systems]

Figure 5: ROC curves of best and worst 20 systems tested on the UoC-murmur database

In the figure below, we present ROC curves of systems using specific features.

[Figure 6 panels: Systems using sub-band envelopes; Systems using MFCC; Systems using Mel-Spectrogram]

Figure 6: ROC curves grouped in terms of features

The test results show that systems using sub-band envelopes are ranked higher than those using MFCC and Mel-Spectrogram features: 23 systems out of the best 25 use sub-band envelopes as the feature. The ROC curves also support this observation: ROC curves of systems using sub-band envelopes are closer to the top left corner compared to the other ROC curves. One interesting observation is that a system using asynchronous frames has high enough performance to be ranked fourth. This is particularly important since period marking, and therefore the ECG channel, is not needed in the design of such systems.

To compare the performances of synchronous and asynchronous systems, in Figure 7 we split the ROC curves for systems using sub-band envelopes into two groups: one that uses synchronous frames and the other asynchronous frames.

[Figure 7 panels: Synchronous systems (sub-band env.); Asynchronous systems (sub-band env.)]

Figure 7: Break-down of Figure 6a (ROC curves of systems using sub-band envelopes) into two groups: systems applying synchronous segmentation and systems applying asynchronous segmentation

From Figure 7, we observe that synchronous system performances are in general higher, but a few of the asynchronous system performances are comparable to the highest performances of the synchronous systems. This is also reflected in Table 1: the asynchronous system ranked fourth has a ROC area of 0.8716, whereas the best system (synchronous) has a ROC area of 0.8772. This observation suggests that an asynchronous system (designed with successful parameter optimization) can have a performance very close to that of synchronous systems (as also reported by Zabihi et al. [25]).

Test results with PhysioNet-2016 data:

Thanks to the authors of the PhysioNet-2016 data [18], it serves as an excellent resource for comparing new proposals with recent state-of-the-art systems without the need to implement these systems and re-run the experiments of the challenge, since the performances of these systems are already reported in [12]. Here, we present our tests carried out with this openly available data and report our system performance, which can be contrasted with the results in [12].

For PhysioNet-2016 data, ECG channels are not available. Recently, Zabihi et al. [25] have shown that high performance can also be achieved using asynchronous frames (systems without segmentation), which is also supported by our observation above. We have run experiments for the best-performing system using asynchronous frames, which was ranked fourth in the experiments with the UoC-murmur database.


Using a hop size of 1 second, and balancing via the creation of new samples, the number of frames extracted from the PhysioNet-2016 database was 103228. As the number of segments is relatively very high, data augmentation is not applied in these tests. Each test is repeated 5 times and the results are averaged. In Table 2 we present the confusion matrix and other performance measures for the system M1SubEnv32by16_nASyn2000.
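Asynchronous framing with 2 second frames and a 1 second hop can be sketched as follows. The function name, the handling of recordings shorter than one frame (no frame is produced) and the sampling rate in the usage example are assumptions for illustration.

```python
def frame_starts(n_samples, fs, frame_s=2.0, hop_s=1.0):
    """Start indices of fixed-length frames taken every `hop_s` seconds."""
    frame = round(frame_s * fs)
    hop = round(hop_s * fs)
    if n_samples < frame:
        return []  # assumption: recordings shorter than one frame yield none
    return list(range(0, n_samples - frame + 1, hop))

# A hypothetical 10 second recording sampled at 1000 Hz.
starts = frame_starts(n_samples=10_000, fs=1000)
print(len(starts), starts[:3], starts[-1])  # 9 [0, 1000, 2000] 8000
```

Consecutive frames overlap by 1 second, so a recording of T seconds (T ≥ 2) contributes about T − 1 frames.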

Table 2: Confusion matrix for CHD risk detection (after averaging results of 5 experiments, using 0.5 as probability threshold for class assignment). System: M1SubEnv32by16_nASyn2000

n = 301 (test set size)   Pathological (predicted)   Normal (predicted)   Total
Pathological (actual)     127.6                      23.4                 151
Normal (actual)           32.2                       117.8                150
Total                     159.8                      141.2

Sensitivity = 0.845, Specificity = 0.785, Accuracy = 0.815
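The three measures follow directly from the averaged confusion matrix of Table 2. The short check below uses the standard definitions; the function name is hypothetical.

```python
def screening_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# Averaged counts from Table 2 (pathological is the positive class).
se, sp, acc = screening_metrics(tp=127.6, fn=23.4, fp=32.2, tn=117.8)
print(round(se, 3), round(sp, 3), round(acc, 3))  # 0.845 0.785 0.815
```

Because the two classes are almost equally represented in this test set (151 versus 150), the mean of sensitivity and specificity also rounds to 0.815.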

The openly accessible PhysioNet-2016 data contains a train set and a validation set (shared with the aim of pre-testing the functionality of a submission to the challenge), which is actually a subset of the train set. Since the main aim is to run an open challenge, the test set is not available. To facilitate comparison of our results with tests in other studies, we decided to use the provided validation set as the test set, removed its copies from the train set and further split the train set into train and validation subsets (this validation set referring to the subset in a machine learning experiment). The implementation of this system, the testing scripts (that download the PhysioNet-2016 data, perform splitting and run the experiments) and more detailed results involving other evaluation measures have been shared openly on github (footnote 11) to facilitate reproducibility of our test results.

In Table 3 of [12], the performances of the top 8 systems (out of 48 submitted systems) are listed, with sensitivity values ranging from 0.7120 to 0.9424, specificity values ranging from 0.7569 to 0.9521 and mean accuracy ranging from 0.7057 to 0.8602. These values are computed by applying weighting with respect to signal quality to classification results obtained on the test data, which is not openly available. In the tests we have carried out (with the train, validation and test set split explained above), the following scores were obtained for our shared system: 0.845 sensitivity, 0.785 specificity and 0.815 mean accuracy. While these results cannot be directly compared with the results in [12] (since they are not computed on the same test subset and weighting is not applied), they show that our system performs similarly to the top-ranked state-of-the-art systems. The reader can refer to the complete table in [12] for details regarding the performances of the best systems in the challenge.

11 https://github.com/barisbozkurt/AutomaticPCGclassification

6. Conclusions

This study aimed to compare various features and segmentation strategies in the context of automatic PCG classification for screening purposes, based on feedforward convolutional neural networks trained on time-frequency representations of segmental PCG frames. To arrive at an optimum design, 90 different system settings were tested on a challenging dataset containing innocent and abnormal murmur cases (UoC-murmur database), and a system selected for its high performance in these tests was also tested on the PhysioNet-2016 data containing normal and pathological cases. The code (of this specific system and the test scripts) has been openly shared with the community for reproducibility of our study and to facilitate comparisons with the state of the art. We should stress here that our main contribution with this manuscript is in comparing various segmentation and feature computation strategies, not in proposing a single best system that is more performant than the state of the art. Our analysis with PhysioNet data supports the fact that the comparative tests have been carried out using system architectures as performant as state-of-the-art systems.


For ranking the 90 distinct systems, ROC curves are obtained by applying different threshold levels for the final categorization from the probabilities of pathology, and the area under the ROC has been used as the single measure representing the potential of each system for screening applications. All systems are sorted in terms of area under the ROC for comparison. Further, we have provided other performance measures for a selected system. Sensitivity and specificity are critical measures for screening applications. Together with accuracy, these evaluation metrics are the most common in comparative studies such as [12].

As presented in Table 1, the systems using sub-band envelopes have the highest ranks in the sorted list of systems (with respect to area under the ROC): the 23 highest-ranked systems out of 90 use sub-band envelopes as the feature. Considering that most state-of-the-art systems prefer MFCCs as the time-frequency representation, this is an important observation. The ROC curves of the systems using sub-band envelopes are in general closer to the top left corner than those of systems using MFCC or Mel-Spectrogram (Figure 6).

The UoC-murmur database includes PCG samples with murmur which were recorded from patients who were referred to a cardiology expert. This means that the pediatricians considered all cases in this data set to have a potential heart malfunction/defect risk; hence this is indeed a challenging set for an automatic classification task. Our database consists exclusively of pediatric digital phonocardiograms, corresponding to various levels of CHD (with the mildest forms also being misclassified by experienced pediatric cardiologists), as well as to innocent murmurs, most of which were misclassified as abnormal by treating physicians. Compared to adult auscultation, specific challenges also exist for the auscultation of young children. Obtaining clean recordings free of scratch noise is in some cases difficult. Heart rate is often higher compared to adults (up to double the adult norm), which leads to further challenges in period marking and selection.

The best system developed through tests on our data (UoC-murmur database) is M1_SubEnv64by16_eSyn1000: a CNN with 2 convolutional layers using, as the feature, sub-band envelopes with a time resolution of 64 and 16 frequency bands computed on period synchronous 1 second frames. This system has not been tested on the PhysioNet-2016 data due to the unavailability of the ECG channel. Following the tests with the UoC-murmur database, a system using period asynchronous frames has been tested on the PhysioNet-2016 challenge data: the highest performing asynchronous system (ranked 4th in the tests with the UoC-murmur database). We have shown that our asynchronous system performs similarly to the top-ranked state-of-the-art systems reported in [12], with 0.845 sensitivity, 0.785 specificity and 0.815 mean accuracy.


7. Discussions and future work



Our study involves some processes requiring further in-depth analysis, which we consider as challenges for further studies. The first is gaining a better understanding of the effectiveness of the data augmentation step applied and of alternatives to it. While the transformation applied is mild (expanding/contracting by 10-20% of the original duration), uniform resampling does not reflect the variability of the cardiac cycles governed by the physical constraints of the heart. Data augmentation strategies respecting the physical settings of the heart should be developed.

For sub-band envelope computation, we have only considered one specific setting of Gammatone filter banks: simply setting the number of bands to 8, 16, 24, etc. Our study lacks an in-depth analysis of the sub-band filtering process. Gammatone filter banks have been preferred as they reflect some of the auditory response characteristics (although not all, such as the loudness-related non-linear auditory behavior). A study of the optimization of filter-bank computation may potentially lead to improved performance.

We have applied frame-level classification, the results of which were later fused via averaging to deduce the probability of the whole PCG signal belonging to a category. Many other options exist for such a step (for example, majority voting). We have not tested other strategies, to avoid including one more dimension of complexity in our tests.
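To make the difference between the two fusion options concrete, the toy sketch below contrasts averaging with majority voting. Plain (untrimmed) averaging is used for brevity, majority voting was not evaluated in this study, and both function names are hypothetical.

```python
def fuse_by_mean(frame_probs, threshold=0.5):
    """Label the file pathological if the mean frame probability reaches the threshold."""
    return sum(frame_probs) / len(frame_probs) >= threshold

def fuse_by_majority(frame_probs, threshold=0.5):
    """Label the file pathological if more than half of the frames vote pathological."""
    votes = sum(1 for p in frame_probs if p >= threshold)
    return votes * 2 > len(frame_probs)

# Hypothetical file: three borderline-normal frames and two confident pathological ones.
frames = [0.45, 0.45, 0.45, 0.95, 0.95]
print(fuse_by_mean(frames), fuse_by_majority(frames))  # True False
```

Averaging lets a few confident frames outweigh many borderline ones, whereas majority voting ignores how confident each frame is.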

Design of automatic PCG classification systems requires optimization of a large number of settings. Improving system performance through parameter optimization is one option for future studies. Another direction for further studies is the use of multi-sensor signal processing techniques to lower the need for experienced operators in screening applications. In [40], the authors propose noise cancellation using multi-channel PCG recordings, which would result in more robust PCG analysis systems. Joint analysis of various modalities such as Ballistocardiography (BCG), ECG and PCG recorded using multisensor systems [41,42] also has the potential to lead to improved performance for screening applications. Building end-products and testing them in real-life scenarios is an important future direction our research community should consider.

Acknowledgements

This project has been funded by the Special Account for Research of the University of Crete (code number 4305). We would like to thank the Greek Ministry of Education and the local Health Authorities (7th Health Region Crete) for their support of the CPCS program, and the University of Crete for the support of innovative cardiac auscultation teaching approaches (including web-lecture hosting). We would like to thank Vassilis Tsiaras for his valuable help and assistance throughout the study and the fruitful discussions that led to the final designs, and Alena Burianova Bagaki for the valuable assistance in digital phonocardiogram recordings.

References

[1] Rangayyan, R. M., & Lehner, R. J. (1986). Phonocardiogram signal analysis: a review. Critical reviews in biomedical engineering, 15(3), 211-236.
[2] Ferencz, C., Rubin, J. D., Mccarter, R. J., Brenner, J. I., Neill, C. A., Perry, L. W., ... & Downing, J. W. (1985). Congenital heart disease: prevalence at livebirth: the Baltimore-Washington Infant Study. American journal of epidemiology, 121(1), 31-36.
[3] Van Oort, A., Le Blanc-Botden, M., De Boo, T., Van Der Werf, T., Rohmer, J., & Daniels, O. (1994). The vibratory innocent heart murmur in schoolchildren: difference in auscultatory findings between school medical officers and a pediatric cardiologist. Pediatric cardiology, 15(6), 282-287.
[4] Michael, S. Y., Kimball, T. R., Tsevat, J., Mrus, J. M., & Kotagal, U. R. (2002). Evaluation of heart murmurs in children: cost-effectiveness and practical implications. The Journal of pediatrics, 141(4), 504-511.
[5] Cheitlin, M. D., Armstrong, W. F., Aurigemma, G. P., Beller, G. A., Bierman, F. Z., Davis, J. L., ... & Kussmaul, W. G. (2003). ACC/AHA/ASE 2003 guideline update for the clinical application of echocardiography: summary article. Circulation, 108(9), 1146-1162.
[6] Germanakis, I., Petridou, E. T., Varlamis, G., Matsoukis, I. L., Papadopoulou-Legbelou, K., & Kalmanti, M. (2013). Skills of primary healthcare physicians in paediatric cardiac auscultation. Acta Paediatrica, International Journal of Paediatrics, 102(2), 74–78. http://doi.org/10.1111/apa.12062

[7] Hanna, I. R., & Silverman, M. E. (2002). A history of cardiac auscultation and some of its contributors. The American journal of cardiology, 90(3), 259-267.
[8] Newburger, J. W., Rosenthal, A., Williams, R. G., Fellows, K., & Miettinen, O. S. (1983). Noninvasive tests in the initial evaluation of heart murmurs in children. New England Journal of Medicine, 308(2), 61-64.
[9] Smythe, J. F., Teixeira, O. H., Vlad, P., Demers, P. P., & Feldman, W. (1990). Initial evaluation of heart murmurs: are laboratory tests necessary?. Pediatrics, 86(4), 497-500.
[10] Geva, T., Hegesh, J., & Frand, M. (1988). Reappraisal of the approach to the child with heart murmurs: is echocardiography mandatory?. International journal of cardiology, 19(1), 107-113.
[11] Telatar, Z., & Erogul, O. (2003, September). Heart sounds modification for the diagnosis of cardiac disorders. In IJCI Proceedings of International Conference on Signal Processing (Vol. 1, No. 2, pp. 100-105).
[12] Clifford, G. D., Liu, C., Moody, B., Springer, D., Silva, I., Li, Q., & Mark, R. G. (2016). Classification of Normal/Abnormal Heart Sound Recordings: the PhysioNet/Computing in Cardiology Challenge 2016. Computing in Cardiology, 3–6. http://doi.org/10.22489/CinC.2016.179-154
[13] Leng, S., Tan, R. S., Chai, K. T. C., Wang, C., Ghista, D., & Zhong, L. (2015). The electronic stethoscope. Biomedical Engineering Online, 14(66). http://doi.org/10.1186/s12938-015-0056-y
[14] Noponen, A.-L., Lukkarinen, S., Angerla, A., & Sepponen, R. (2007). Phono-spectrographic analysis of heart murmur in children. BMC Pediatrics, 7(1), 23. http://doi.org/10.1186/1471-2431-7-23


[15] Shen, C.-H. (2012). Acoustic based condition monitoring. University of Akron.

[16] Abbas, A. K., & Bassam, R. (2009). Phonocardiography Signal Processing.

Synthesis Lectures on Biomedical Engineering (Vol. 4).

http://doi.org/10.2200/S00187ED1V01Y200904BME031

[17] Marascio, G., & Modesti, P. A. (2013). Current trends and perspectives for

automated screening of cardiac murmurs. Heart Asia, 5(1), 213–218.

http://doi.org/10.1136/heartasia-2013-010392

[18] Liu, C., Springer, D., Li, Q., Moody, B., Juan, R. A., Chorro, F. J., … Clifford, G.

D. (2016). An open access database for the evaluation of heart sound algorithms.

Physiological Measurement, 37(12), 2181–2213. http://doi.org/10.1088/0967-3334/37/12/2181

[19] Schmidt, S. E., Holst-Hansen, C., Hansen, J., Toft, E., & Struijk, J. J. (2015).

Acoustic features for the identification of coronary artery disease. IEEE Transactions on

Biomedical Engineering, 62(11), 2611–2619.

http://doi.org/10.1109/TBME.2015.2432129

[20] Markaki, M., Germanakis, I., & Stylianou, Y. (2013, May). Automatic

classification of systolic heart murmurs. In Acoustics, Speech and Signal Processing

(ICASSP), 2013 IEEE International Conference on (pp. 1301-1305).



[21] Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for

monosyllabic word recognition in continuously spoken sentences. IEEE transactions on

acoustics, speech, and signal processing, 28(4), 357-366.

[22] Chauhan, S., Wang, P., Sing Lim, C., & Anantharaman, V. (2008). A computer-

aided MFCC-based HMM system for automatic auscultation. Computers in Biology and

Medicine, 38(2), 221–233. http://doi.org/10.1016/j.compbiomed.2007.10.006


[23] Vepa, J. (2009). Classification of heart murmurs using cepstral features and support

vector machines. Proceedings of the 31st Annual International Conference of the IEEE

Engineering in Medicine and Biology Society: Engineering the Future of Biomedicine,

EMBC 2009, 2539–2542. http://doi.org/10.1109/IEMBS.2009.5334810

[24] Potes, C., Parvaneh, S., Rahman, A., & Conroy, B. (2016). Ensemble of Feature-

based and Deep learning-based Classifiers for Detection of Abnormal Heart Sounds.

Computing in Cardiology. http://doi.org/10.22489/CinC.2016.182-399

[25] Zabihi, M., Rad, A. B., Kiranyaz, S., Gabbouj, M., & Katsaggelos, A. K. (2016).

Heart Sound Anomaly and Quality Detection using Ensemble of Neural Networks

without Segmentation. Computing in Cardiology.
http://doi.org/10.22489/CinC.2016.180-213

[26] Rubin, J., Abreu, R., Ganguli, A., Nelaturi, S., Matei, I., & Sricharan, K. (2016).

Classifying Heart Sound Recordings using Deep Convolutional Neural Networks and

Mel-Frequency Cepstral Coefficients. Computing in Cardiology, 1–4.

http://doi.org/10.22489/CinC.2016.236-175

[27] Ergen, B., Tatar, Y., & Gulcur, H. O. (2012). Time-frequency analysis of

phonocardiogram signals using wavelet transform: a comparative study. Computer

Methods in Biomechanics and Biomedical Engineering, 15(4), 371–81.



http://doi.org/10.1080/10255842.2010.538386

[28] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A.-R., Jaitly, N., … Kingsbury,

B. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition. Signal

Processing Magazine, IEEE, 29(6), 82–97. http://doi.org/10.1109/MSP.2012.2205597


[29] Piczak, K. J. (2015). Environmental sound classification with convolutional neural

networks. In 2015 IEEE 25th International Workshop on Machine Learning for Signal

Processing (MLSP).

[30] Kay, E., & Agarwal, A. (2017). DropConnected neural networks trained on time-frequency

and inter-beat features for classifying heart sounds. Physiological

measurement, 38(8), 1645.

[31] Springer, D. B., Tarassenko, L., & Clifford, G. D. (2016). Logistic regression-HSMM-based

heart sound segmentation. IEEE Transactions on Biomedical

Engineering, 63(4), 822–832. http://doi.org/10.1109/TBME.2015.2475278

[32] Lu, X., Unoki, M., & Nakamura, S. (2011). Sub-band temporal modulation
envelopes and their normalization for automatic speech recognition in reverberant

environments. Computer Speech and Language, 25(3), 571–584.



http://doi.org/10.1016/j.csl.2010.10.002

[33] Mitra, V., Wang, W., & Franco, H. (2014, December). Deep convolutional nets and

robust features for reverberation-robust speech recognition. In Spoken Language



Technology Workshop (SLT), 2014 IEEE (pp. 548-553).



[34] Schmidt, S. E., Toft, E., Holst-Hansen, C., Graff, C., & Struijk, J. J. (2008).

Segmentation of heart sound recordings from an electronic stethoscope by a duration



dependent Hidden-Markov model. Computers in Cardiology, 35, 345–348.



http://doi.org/10.1109/CIC.2008.4749049

[35] Barschdorff, D., Bothe, A., & Rengshausen, U. (1989). Heart sound analysis using

neural and statistical classifiers: a comparison. Computers in Cardiology, 415–418.


[36] Deng, S. W., & Han, J. Q. (2016). Towards heart sound classification without

segmentation via autocorrelation feature and diffusion maps. Future Generation

Computer Systems, 60, 13–21. http://doi.org/10.1016/j.future.2016.01.010

[37] Germanakis, I., Dittrich, S., Perakaki, R., & Kalmanti, M. (2008). Digital

phonocardiography as a screening tool for heart disease in childhood. Acta Paediatrica,

International Journal of Paediatrics, 97(4), 470–473. http://doi.org/10.1111/j.1651-2227.2008.00697.x

[38] Germanakis, I., & Kalmanti, M. (2009). Paediatric cardiac auscultation teaching

based on digital phonocardiography. Medical education, 43(5), 489-489.

[39] Wong, S. C., Gatt, A., Stamatescu, V., & McDonnell, M. D. (2016, November).
Understanding data augmentation for classification: when to warp? In Digital Image

Computing: Techniques and Applications (DICTA), 2016 International Conference on



(pp. 1-6).

[40] Nunes, D., Leal, A., Couceiro, R., Henriques, J., Mendes, L., Carvalho, P., &

Teixeira, C. (2015, August). A low-complex multi-channel methodology for noise

detection in phonocardiogram signals. In Engineering in Medicine and Biology Society



(EMBC), 2015 37th Annual International Conference of the IEEE (pp. 5936-5939).

IEEE.

[41] Nedoma, J., Fajkus, M., Martinek, R., Kepak, S., Cubik, J., Zabka, S., & Vasinek,

V. (2017). Comparison of BCG, PCG and ECG signals in application of heart rate

monitoring of the human body. In Telecommunications and Signal Processing (TSP),

2017 40th International Conference on (pp. 420-424). IEEE.


[42] Marcelli, E., Capucci, A., Minardi, G., & Cercenelli, L. (2017). Multi-sense

cardiopatch: A wearable patch for remote monitoring of electro-mechanical cardiac

activity. Asaio Journal, 63(1), 73-79.

Highlights:

● In a CNN-based PCG classification scheme using time-frequency
representations, sub-band envelope features are preferable to the most
commonly used features: MFCC and Mel-Spectrum
● While deep learning based methods are considered end-to-end with little use
of domain knowledge, domain knowledge (how humans perform a particular
classification task) can be beneficial in designing the input representation (in
our case sub-band envelopes) that leads to more performant systems

● Automatic PCG classification technology is sufficiently developed to start
supporting real-life screening applications

Conflict of interest statement for manuscript:

Title: A study of time-frequency features for CNN-based automatic heart sound
classification for pathology detection
Authors: Baris Bozkurt, Ioannis Germanakis, Yannis Stylianou

The authors of this manuscript declare no conflict of interest with any person or institutional
body.

