Audio Chat For Pretty Good Concealing (AC4PGC) - Part I - CodeProject

22/07/2019 Audio Chat for Pretty Good Concealing (AC4PGC) - Part I - CodeProject
Audio Chat for Pretty Good Concealing (AC4PGC) - Part I
Clark Fieseln
9 Jul 2019 CPOL
Low rate data exchange (e.g., chat) using audio steganography, ensuring privacy, anonymity and cybersecurity
Introduction
With increasing levels of online data surveillance, user activity tracking and user profiling, threats to online data security and user
privacy continue to worry most of "us", normal users. But can we do anything to protect our privacy when even the most
widespread tools seem to fail at this goal? The answer is "yes". The following video inspired me to try to contribute:
ProtonMail, Min. 8:50
After some months of hard work, the result is the Android App shown below, a solution which actually works!
And to be honest, it works really well, for almost no money besides a few bucks needed for cables and adapters:
AC4PGC (Audio Chat for Pretty Good Concealing) - "SECRET Audio Chatting"
While Videoconferencing!
AC4PGC (Audio Chat for Pretty Good Concealing) - Screenshot of Loop Test
Steganography, a potential "game changer" in this context, is not yet in widespread use. The practice of steganography (hiding data in an
innocuous carrier medium) has a lot of potential to contribute in a meaningful way towards mitigating some of these concerns. The
present idea uses audio steganography methods to embed a covert message within a digital audio signal transmitted
over an analog channel used as a data-diode. Two devices, A and B, are connected only over their speaker and microphone
interfaces (speaker1 --> mic2, mic1 <-- speaker2). This results in two separate physical channels, each allowing controlled
unidirectional data transmission and behaving like a data-diode. They provide the basis for independent transmission and reception
channels which do not allow any infection or leakage of data.
Figure 1: Device Connection
Warning: Connecting audio interfaces as shown in Figure 1 may damage your device! Although I experienced no
problems when doing all sorts of things with several different devices, take care to respect the allowed input
levels of the microphone/line-in interfaces. Modern devices adapt their impedance automatically depending on the
detected audio signal level, so in the general case no problems will occur.
The Idea in Short

Low-rate, full-duplex communication over a covert channel is established by transmitting data superposed on
voice, the carrier signal. While data is superposed on voice in the time domain, in the frequency domain it is allocated its own
frequency range, complementary to (non-overlapping with) the frequencies used by voice. This constructs a side channel with its own
characteristics, e.g., low signal power (which allows placing the coding signal at noise level). The following arrangement shows the
general idea in a basic configuration:
Figure 2: Basic Configuration
Devices B and C, possibly separated by a long distance, communicate with each other by transmitting bidirectional audio information
over an open network (a public switched telephone network (PSTN), the internet, or any other network) using standard
communication protocols like VoIP. Devices B and C could be two laptops holding a videoconference or teleconference using
Skype or, even better, qTox. Devices B and C could also be mobile phones or landline telephones. Devices A and D could be
two smartphones. The main assumption is that all these devices make use of standard interfaces and technologies and are
vulnerable to exploits of all sorts. Concealed communication between devices A and D is achieved by “plugging” them between the
audio peripherals (e.g., speaker and microphone) and devices B and C, as shown in Figure 2, so that the audio carrier can be used
as a transmission medium for secret messages. In addition, the secret messages can be encrypted. Devices A and D each have a
display to show the secret messages and an input interface to enter them (e.g., a smartphone touch-screen combining both). Before
connection establishment, devices A and D shall be put offline in order to prevent any information leakage. This can be done, e.g.,
by turning off the following interfaces if available:
WLAN
LAN
Bluetooth
USB
While offline, devices A and D offer “almost” air-gap conditions, which increases security. These devices are connected to B and C
only via analog audio interfaces, thus implementing two data-diodes which prevent any attacks from outside or leakage of information.
That is, during session establishment, the generated or typed keys cannot be sent to an attacker. This is a kind of “enhanced”
end-to-end encryption, which is one of the major advantages of this method.
Further Features
Alternatively, assuming it is possible to connect other “trusted networks or devices”, devices A and D can be used to receive and
send the message from/to other sources, offering for example the following services:
Repeater (digital audio with embedded message converted to analog audio).
Gateway (data adapter: socket communication with the message converted to analog audio with the embedded message.
This could be used to extend existing chat applications).
Tunnel (protocol embedder: low-rate protocol to analog audio. In this case, the low-rate protocol is the message).
Adapter (proprietary protocol to convert digital files stored on media into analog audio.
In this case, the binary file is the message).
Each of these alternatives has several advantages, listed in the same order:
Repeater: increased anonymity, geographical range extension
Gateway: increased anonymity, geographical range extension, reuse of existing applications (chat)
Tunnel: increased anonymity, geographical range, reuse of existing solutions
Adapter: increased anonymity, reuse of existing solutions (file system)
The following diagram shows an example:
Figure 3: Extended Configuration
In the example presented in Figure 3, the stego-device D offers services (repeater, gateway, tunnel, adapter) to device E located in a
trusted network. Depending on the service used, the “enhanced” end-to-end encryption is realized between devices A and E or
between devices A and D. In this example, the source and recipient of secret messages are devices A and E, or the “users” working
on these devices.
In AC4PGC, the "Gateway Mode" is already implemented, allowing messages received over sockets to be forwarded over audio and vice
versa. That is, the following connections are supported:
audio <--> audio
audio <-- (gateway) --> sockets (as shown in Figure 3 between devices A, D and E)
sockets <-- (gateway) -- (gateway) --> sockets
sockets <--> sockets
Communication Protocol
Due to possible loss, corruption or repetition of data during transmission and reception, as well as the need to exchange a public
key in each session, a simple communication protocol is required. Besides the data field, the telegrams of the stego-protocol consist
at least of the following fields:
sequence-number
CRC
telegram type (types: key-exchange, chat, acknowledge,..)
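The field list above can be made concrete with a small framing sketch. The layout below (2-bit type, 2-bit sequence number, 8-bit data, 4-bit CRC, 16 bits in total) and the CRC polynomial are assumptions chosen for illustration only; the article does not specify the field widths used by AC4PGC.

```python
# Hypothetical 16-bit telegram: [type:2][seq:2][data:8][crc:4].
# Field widths, type codes and the CRC-4 polynomial are illustrative
# assumptions, not the layout used by AC4PGC.
TYPE_KEY_EXCHANGE, TYPE_CHAT, TYPE_ACK, TYPE_DUMMY = range(4)
CRC_POLY = 0b10011  # x^4 + x + 1


def crc4(value: int, nbits: int) -> int:
    """Bitwise CRC-4 over an nbits-wide integer, most significant bit first."""
    reg = value << 4  # append four zero bits for the remainder
    for shift in range(nbits - 1, -1, -1):
        if reg & (1 << (shift + 4)):
            reg ^= CRC_POLY << shift
    return reg & 0xF


def pack(ttype: int, seq: int, data: int) -> int:
    """Build a 16-bit telegram from type, sequence number and one data byte."""
    header = (ttype & 0x3) << 10 | (seq & 0x3) << 8 | (data & 0xFF)
    return header << 4 | crc4(header, 12)


def unpack(telegram: int):
    """Return (type, seq, data), or None if the CRC check fails."""
    header, crc = telegram >> 4, telegram & 0xF
    if crc4(header, 12) != crc:
        return None
    return (header >> 10) & 0x3, (header >> 8) & 0x3, header & 0xFF


telegram = pack(TYPE_CHAT, 1, ord("A"))
```

A receiver would call unpack() on every decoded telegram and answer a None result with a negative acknowledge, or simply stay silent until the sender's retransmission timer expires.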
When required, the chat information will be retransmitted. In idle mode, when no data is transmitted, the protocol instead transmits
dummy telegrams with pseudo-random bits as a countermeasure against steganalysis. On correct reception of data, the receiver
replies with a positive acknowledge. On timeout of a retransmission timer or on reception of a “negative” acknowledge, the sender
retransmits the last telegram. The complete communication protocol is transmitted as embedded steganographic data, hidden in
the carrier. In order to increase the defense against steganalysis, the telegram bits can be “scrambled” every time according to a
pseudo-random generator, which is initialized with the session key calculated after exchanging the public keys at startup. This
feature keeps the statistical distribution of the audio signal at “normal” levels. The public cryptographic keys can be exchanged
during connection establishment using Diffie-Hellman. Both sides agree beforehand on a prime (p) and a generator (g). Each side
generates a private key (a or b) and calculates its public key (g^a mod p or g^b mod p). The session key K, which is the same on
each side, is obtained as a function of the peer's public key and the own private key:
K = (g^b)^a mod p = (g^a)^b mod p = g^(ab) mod p
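The key agreement and the key-seeded scrambling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the prime P and base G are toy values, the XOR-with-keystream scrambler is only one possible reading of “scrambled according to a pseudo-random generator”, and the per-telegram counter is a hypothetical addition. Python's random module is not cryptographically secure, so this shows the mechanism and nothing more.

```python
import hashlib
import random
import secrets

# Toy parameters for illustration only (assumptions, far too small for real use).
P = 2**127 - 1  # a Mersenne prime
G = 3           # base agreed on beforehand


def dh_keypair():
    """Return (private, public) with public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def session_key(own_private: int, peer_public: int) -> bytes:
    """K = peer_public^own_private mod P, hashed to a fixed-size key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def keystream(key: bytes, counter: int, nbits: int) -> int:
    """Pseudo-random bits derived from the session key and a telegram counter."""
    seed = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(seed, "big"))  # NOT a secure generator
    return rng.getrandbits(nbits)


def scramble(telegram: int, key: bytes, counter: int, nbits: int = 16) -> int:
    """XOR the telegram with the keystream; applying it twice restores the input."""
    return telegram ^ keystream(key, counter, nbits)


a_private, a_public = dh_keypair()
b_private, b_public = dh_keypair()
key_a = session_key(a_private, b_public)
key_b = session_key(b_private, a_public)
scrambled = scramble(0xBEEF, key_a, counter=7)
restored = scramble(scrambled, key_b, counter=7)
```

Because both sides derive the same key, the receiver can undo the scrambling with the same call, as long as both sides count telegrams identically.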
Message Embedding
The following diagram shows the details of message embedding:
Figure 4: Architecture Overview
Figure 4 presents a rough overview of the stego-embedding and encryption. The input message mA is entered with a
keyboard or touch-screen and then immediately encrypted with the encryption key (Key 4). The result is mAe. The encrypted
message mAe is then encoded by applying the channel/stego-algorithm, which uses Key 3. Key 3 consists of the specific “settings”
used for stego-embedding (like threshold values). The values of Key 3 and Key 4 may be derived from the session key exchanged
during connection establishment. A simple steganographic algorithm based on multiple-FSK (Frequency Shift Keying) converts
each of the bits in the input message (mAe) to different frequencies within the bandwidth of the carrier signal. In order to avoid
distortions introduced by the carrier signal, a “dedicated/exclusive” and sufficiently small frequency range can be used which does
not overlap with the frequencies used by the voice. Because the voice occupies the complete bandwidth, this requires that the
embedding frequency range is removed from the carrier signal with a band-stop filter FcA. Advanced embedding techniques
may take the carrier signal cA into consideration in the process of embedding. That is, the embedding process may depend on the
current value of the carrier signal. The correction factor SA (Stego-Amplitude) is selected according to the expected or measured
channel conditions, especially depending on the Signal-to-Noise ratio (SNR).
The SA value is selected so that the embedded stego-signal is close to the noise level. Depending on the technique used, the
embedded stego-signal can even be below the noise level. Then, with the help of a correlation function, the message can be recovered.
For standard applications without “a-priori” knowledge of the channel conditions, a value of SA = MAX_AMP/1000 shows good
results. That is, with this value, the embedding is not perceptible to the human ear and it can still be recovered out of the noise
present in the carrier signal. As explained before, the carrier signal cA is filtered with a “band-stop” filter FcA which removes the
frequency range used for coding the stego message. The filtered carrier cAf is then multiplied by the factor (1-SA), which scales the
signal to such an amplitude that, when added to the stego-signal, it can never exceed the maximum level and saturate. With this, and
with mAeS denoting the stego-encoded signal, the signal output to the speaker interface of the device is:
XA = cAf*(1-SA) + mAeS*SA
When considering in addition some noise added in the communication channel, we have:
YA = cAf*(1-SA) + mAeS*SA + nA
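The mixing rule XA = cAf*(1-SA) + mAeS*SA can be illustrated with a short sketch. Everything concrete here is an assumption made for the example: amplitudes are normalized so that MAX_AMP = 1.0, the sample rate, bit duration and the two tone frequencies are arbitrary, a plain binary FSK stands in for the multiple-FSK scheme, and a 440 Hz sine stands in for an already band-stop-filtered voice signal cAf.

```python
import math

SAMPLE_RATE = 8000               # Hz (assumed)
SAMPLES_PER_BIT = 160            # 20 ms per bit (assumed)
FREQ_0, FREQ_1 = 3200.0, 3400.0  # tone for bit 0 / bit 1 (assumed)
MAX_AMP = 1.0                    # normalized full-scale amplitude
SA = MAX_AMP / 1000              # stego amplitude suggested in the text


def fsk_signal(bits):
    """mAeS: one full-scale tone burst per bit (binary FSK as a stand-in)."""
    out = []
    for i, bit in enumerate(bits):
        freq = FREQ_1 if bit else FREQ_0
        for n in range(SAMPLES_PER_BIT):
            t = (i * SAMPLES_PER_BIT + n) / SAMPLE_RATE
            out.append(MAX_AMP * math.sin(2 * math.pi * freq * t))
    return out


def embed(carrier_filtered, stego):
    """XA = cAf*(1-SA) + mAeS*SA, computed sample by sample."""
    return [c * (1 - SA) + s * SA for c, s in zip(carrier_filtered, stego)]


bits = [1, 0, 1, 1]
n_samples = len(bits) * SAMPLES_PER_BIT
# Stand-in for the filtered voice cAf: a full-scale 440 Hz sine.
carrier = [math.sin(2 * math.pi * 440.0 * n / SAMPLE_RATE) for n in range(n_samples)]
xa = embed(carrier, fsk_signal(bits))
peak = max(abs(x) for x in xa)
```

Since both inputs are bounded by MAX_AMP and the two weights sum to 1, peak cannot exceed MAX_AMP (up to floating-point rounding), which is the saturation argument made in the text.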
It is important to note that the channel noise is an advantage and a disadvantage at the same time. If well implemented, the
steganographic modifications embedded at noise level will survive the transmission and will be detected correctly at the recipient.
In that case, the channel noise offers good concealment, so the attacker is not able to distinguish between natural noise and
embedding noise. On the other hand, the channel noise may be too high or there may be other channel disturbances which affect
the hidden communication.
In that case, we rely on the steganographic protocol described above, which takes care of retransmitting data if required. We must
not forget that all carrier signals will inevitably have some added noise, the most usual being “pink noise”.
This “indistinguishable” noise has not been explicitly shown in Figure 4 and is just considered to be part of cA. At the top of Figure
4, the inverse process (decoding and extraction) is shown, which demonstrates how the message can be recovered.
The “band-pass” filter FYB-pass yields YBf, which contains only the frequency range of the input signal YB in which the
other device transmitted the embedded information:
YB = cBf*(1-SA) + mBeS*SA + nB
YBf = mBeS*SA + nB’
Then, YBf is multiplied by the factor 1/SA, giving back mBeS’, a “very approximate” version of mBeS. mBeS’ can then be
stego-decoded and decrypted to give back the original message mB. As mentioned before, the actual implementation will not transmit mB
directly, as shown in the simplified overview, but will instead transmit a telegram containing mB. The telegram supports error
detection and correction, assuring consistency, data integrity and timeliness. On error detection, the recipient can send back a
“negative acknowledge” or simply do nothing and wait for the retransmission timeout on the sender side to expire. Finally, the
“optional” use of filter FYB-stop results in YB’ ~ Voice B, which is output to the speaker. Even without the filter FYB-stop, a human
user is not able to perceive the embedded signal or the missing frequencies in the voice B.
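The extraction side can be sketched in the same spirit. This toy example correlates each bit window with reference tones at the two coding frequencies, which plays the role of the band-pass filter and the decision step at once. All parameters are assumptions: binary FSK instead of multiple-FSK, a 400 Hz sine as stand-in voice (chosen so that it is exactly orthogonal to the coding tones over one bit window), and Gaussian noise at half the stego amplitude. Real voice would need a proper band-pass filter FYB-pass before the decision.

```python
import math
import random

SAMPLE_RATE = 8000               # Hz (assumed)
SAMPLES_PER_BIT = 160            # 20 ms per bit (assumed)
FREQ_0, FREQ_1 = 3200.0, 3400.0  # tone for bit 0 / bit 1 (assumed)
SA = 0.001                       # stego amplitude, MAX_AMP normalized to 1.0


def tone(freq, n_samples):
    """Full-scale sine starting at phase zero."""
    return [math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(n_samples)]


def tone_energy(window, freq):
    """Squared correlation of one bit window with a reference tone at freq."""
    re = sum(x * math.cos(2 * math.pi * freq * n / SAMPLE_RATE) for n, x in enumerate(window))
    im = sum(x * math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n, x in enumerate(window))
    return re * re + im * im


def extract(received, n_bits):
    """Decide each bit by comparing the energy at the two coding frequencies.

    The 1/SA rescaling from the text is omitted because it scales both
    energies equally and does not change the comparison.
    """
    bits = []
    for i in range(n_bits):
        window = received[i * SAMPLES_PER_BIT:(i + 1) * SAMPLES_PER_BIT]
        bits.append(1 if tone_energy(window, FREQ_1) > tone_energy(window, FREQ_0) else 0)
    return bits


sent = [1, 0, 1, 1, 0, 0, 1, 0]
rng = random.Random(0)  # fixed seed so the demo is repeatable
voice = tone(400.0, len(sent) * SAMPLES_PER_BIT)
stego = [s for bit in sent for s in tone(FREQ_1 if bit else FREQ_0, SAMPLES_PER_BIT)]
# YB = cBf*(1-SA) + mBeS*SA + nB
received = [v * (1 - SA) + s * SA + rng.gauss(0.0, 0.0005) for v, s in zip(voice, stego)]
recovered = extract(received, len(sent))
```

The correlation sums over 160 samples, so the weak tone accumulates coherently while the noise does not. This is the reason a signal embedded at noise level can still be recovered.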
Data Throughput vs. Channel Capacity

This idea considers a full-duplex audio communication over VoIP with 64 kbps in each direction, which is the typical communication
link used by most systems based on VoIP. As a reference, the current App version works with 16-bit
telegrams transmitted in 341 ms -> 16/0.341 ~ 47 bps. This results in a fraction of 1/1361 of the total bandwidth/capacity being used
for steganographic information, which is a usual figure for stego-applications. In fact, the stego-capacity is even lower: with
64000/2722, only about 23 bps of data payload are actually transmitted, that is, 1 bit of stego data for every 2722
bits of audio data. This is an inevitable characteristic of this method, which is why it is most appropriate for applications where data
transmission has a low rate.
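The figures above can be checked with a few lines of arithmetic. The constants are taken from the text; the last line is only an inference from those numbers, not a statement about the actual telegram layout.

```python
AUDIO_BITRATE = 64_000     # bps per direction, from the text
TELEGRAM_BITS = 16         # bits per telegram, from the text
TELEGRAM_DURATION = 0.341  # seconds per telegram, from the text

raw_rate = TELEGRAM_BITS / TELEGRAM_DURATION  # ~46.9 bps, rounded to 47 in the text
ratio_rounded = AUDIO_BITRATE / 47            # ~1361.7, the 1/1361 fraction in the text
ratio_exact = AUDIO_BITRATE / raw_rate        # ~1364 when the rate is not rounded first
payload_rate = AUDIO_BITRATE / 2722           # ~23.5 bps of payload

# Inference only: payload bits carried by one telegram at that payload rate.
payload_bits_per_telegram = payload_rate * TELEGRAM_DURATION  # ~8.0
```

The payload rate is about half the raw rate, which would be consistent with roughly 8 of the 16 telegram bits carrying data and the rest being protocol overhead.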
Cybersecurity Considerations
As explained above, in the stego-devices we shall avoid the use of standard digital interfaces like WLAN, LAN, Bluetooth and USB,
all of which are known to be vulnerable to exploits. This dramatically reduces the number of measures required when
compared with devices which are online. Ironically, in the era of "digitalization", a solution based on the "good old analog
communication" seems to overcome many of the security problems we face when dealing with digital communication. The
following figure presents the layers involved in this idea and shows some of its advantages. Layers 1 up to 7 are realized in the
stego-device, offering a multiplicity of keys. Layers 7 to 9 are in the host device, which is unaware of all layers "below". As far as
the host device is concerned, it only transports the voice (audio carrier).
Figure 5: Overview of Layers
As usual, we shall assume that all protocols, even the "proprietary" stego-protocol, are open and only the “keys” are secret.
That is, security shall only depend on the keys and not on the algorithms.
Key 1: end-to-end encryption (example: Skype)

Key 2: stego bits are scrambled according to a pseudo-random generator
Key 3: stego embedding according to “fine-tuned” settings agreed on communication setup (which is a kind of "key")
Key 4: “enhanced” end-to-end encryption of message
Summary
In short, some of the main points of the method presented in this article are:
Additional device “plugged” between audio peripherals and “unsafe” communication device
Simple communication protocol with error detection and correction
Bits of telegrams of communication protocol scrambled as a measure against steganalysis
Stego/channel-encoding based on multi-FSK in a reduced frequency range:
Audio carrier pre-filtered to remove components in coding-frequency-range
Audio carrier added to stego-signal under consideration of relative amplitudes (depending on SNR)
The proposed embedding technique is certainly not the "strongest" against steganalysis, but it works well, providing a good
compromise between complexity, real-time behavior, audio quality and robustness.
This and many other aspects of this idea can be improved in the future.
Why Do We Need Something like AC4PGC ?

Providers of chat applications offer “end-to-end” encryption as the ultimate measure against violation of privacy. Unfortunately,
end-to-end encryption is only as secure as the end-nodes, and most of the end-nodes suffer from massive vulnerability problems.
Infections with simple exploits like “key-loggers”, or even tools which take periodic “screenshots”, give easy access to the initial
encryption key, thus making the end-to-end encrypted communication useless. By combining audio steganography, “enhanced”
end-to-end encryption based on the use of additional hardware, and two separate physical audio interfaces hosting a
communication protocol, a concealed data exchange is achieved which protects not only the information itself but also the fact that
a communication is taking place.
Advantages
The following cannot be compromised during a session:
Geolocation of users
IP of users
Identity of users
Quantity of communication
Message length
The fact that “this technique” is being used
The solution is based on “ubiquitous” and cheap technologies (audio interfaces), making it accessible to everyone.
The solution can be used with virtually any communication device which has an audio interface, e.g., telephone, mobile
phone, laptop, desktop PC, tablet.
In Germany, a large number of unused and outdated smartphones (from a total of over 100 million) are available for use as
additional hardware. Therefore, besides the low-cost audio cable and the chat application, no investment is needed. The
user will surely be happy to give the nice old device a meaningful use.
Additional features like, e.g., Gateway functionality allow increased flexibility, reuse of existing infrastructure and
extension of current chat applications.
Using the Code

This information will be available soon, together with the code, in Part II of this article.
Points of Interest
Borrowing some words from Andy Yen (see link above):
"What we have here is just the first step, but it shows that with improving technology privacy doesn't have to be difficult, it
doesn't have to be disruptive. Ultimately, privacy depends on each and everyone of us. And we have to protect it now
because our online data is more than just a fractions of ones and zeros. It's actually a lot more than that. It's our lives, our
personal stories, our friends, our families, and in some ways also our hopes and aspirations. So, now it's the time for us to
stand up and say: yes, we do want to live in a world with online privacy. And yes, we can work together to turn this vision
into reality!".
History
2019.07.08: Part I of article posted
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
About the Author

Clark Fieseln
No Biography provided
Germany
Comments and Discussions

6 messages have been posted for this article. Visit
https://www.codeproject.com/Articles/5161775/Audio-Chat-for-Pretty-Good-Concealing-AC4PGC-Part to post and view comments on this article.
Permalink Article Copyright 2019 by Clark Fieseln

