
© Peter B.L. Meijer 2010. All rights reserved.

The vOICe MIDlet


For information about installing the software, please first visit the web page

http://www.seeingwithsound.com/midlet.htm

This manual assumes that you have already succeeded in installing [1] and
running [2][3] The vOICe seeing-with-sound MIDlet. The purpose of this manual
is mostly to discuss the available key commands in more detail. You may wish
to use a headset in order not to annoy other nearby people with the rather
unusual and hence attention-drawing sounds from your phone. Moreover, a
stereo headset is advised for use with phones that offer stereo sound
capabilities [4]. The vOICe’s
options menu - called up with the phone’s Options
key - contains a “Channels” entry that allows you
to select mono (default), stereo or 3D audio
channels. The stereo and 3D audio options ease
the perception of The vOICe’s visual left-to-right scanning, but they may give
severely distorted mono sound on phones that lack stereo capabilities. Just
check what works for your phone. Also, some phones contain built-in stereo
speakers on their side [5], and the “View rotation” entry in the options menu can
be used to adapt the camera view accordingly when holding the phone
with its left and right speakers on the left and right, respectively.
Note that while using a screen reader it may be necessary to first mute The vOICe (with key "0") to
hear the screen reader speak all menu and submenu items under the phone’s Options key. After
changing settings, you can then unmute The vOICe again with key "0".

Once started, The vOICe MIDlet continuously grabs and sounds live
snapshots from your phone camera. There are no connection costs while
using it, because The vOICe MIDlet runs off-line. Each camera snapshot is
sounded via a left-to-right scan through the view, while associating height with
pitch and brightness with loudness. By default, a black-and-white camera view
is sounded in just one second. For example, a bright rising line on a dark
background sounds as a rising pitch sweep, a small bright spot sounds as a
short beep, and a bright filled rectangle sounds as a noise burst. The vOICe’s
simplest application is as a light probe, but it is actually far more powerful
because its changing polyphonic visual sounds or “soundscapes” also
track the position and shape of objects, even with multiple objects within your
camera view. Thus it allows you to locate light sources, recognize basic image
patterns such as stripes and various textures, find borders, identify shapes,
and so on. In addition, The vOICe MIDlet offers a number of color detection
features, and includes a talking color identifier.

[1] The vOICe MIDlet runs on MIDP-2.0 and MMAPI compliant camera phones. Note that on many phones you must
have the "Warning tones" setting in your active profile turned On, or else The vOICe may not sound properly, if at all.
Also, on many Nokia phones you should, after installation (including after an upgrade), select Tools | (App) Manager | The
vOICe | Options | (Suite) Settings | Multimedia | Ask first time, or else user permission may be asked for every single
camera snapshot. Other phones may have similar settings. Depending on the phone, the application Manager may be
found in the main application Menu or under Menu | Tools.
[2] Some phones automatically launch their built-in camera application upon opening the camera cover, and in that
case you first need to close that other camera application before The vOICe can access the camera. This is because
only one application can access the camera at any given time. Of course you should not forget to slide open any lens
cover in the first place, or else the view will remain “black” because no light can enter the camera.
[3] On some phones, user permission to control (and turn off) the camera shutter sound will be requested after startup
of The vOICe.
[4] For example, the Nokia 6620, Nokia 6630 and Nokia 9500 camera phones have stereo sound capabilities.
[5] For example, the Nokia N82. The N82 may be held at head level with the two speakers facing upwards.

Many settings are accessible through the application’s options menu if you
have a phone screen reader, but there is also built-in speech support for the
main features. For compatibility with phone screen readers, The vOICe
supports two menu styles: the "Textual" style for the submenus is only advised
for use with the old Talks 1.40.1 (to avoid crashes), while the "Normal" style is
advised with Mobile Speak as well as Talks 2.0 and later. In addition, a
number of keyboard shortcuts exist for direct access to various features, and
we begin with a brief overview of the main key commands. At the end of this
document a table overview is given.

The "0" key toggles the muted state. Pressing this key twice in rapid
succession, like “00”, toggles a muted paused state that minimizes CPU load
while releasing the camera resource for best responsiveness when accessing
the menus with a phone screen reader. The "1" key toggles the negative video
mode, which can help to see/find small or thin dark items on a bright
background. The "3" key toggles the built-in speech off and on. The "7" key
toggles a mode that helps prevent visual sound stuttering and buzzing on
devices that cannot handle simultaneous visual sound rendering and playing.
You can try it to find out what works best with your phone. The "9" key cycles
over different contrast enhancement modes. The "*" (star, asterisk) key
toggles the talking color identifier on and off. The "#" (pound, hash) key cycles
over different sound volume levels when not muted. Other settings are
controlled with the joystick. The default audio sample rate is 16 kHz, but lower
sample rates can be selected by using the "DOWN" key (joystick down), and
higher sample rates can be selected by using the "UP" key (joystick up).
Available sample rates are 8 kHz, 11 kHz, 16 kHz and 22 kHz, but phones
need not support all of these sample rates. Lower sample rates give lower
sound quality, but may make the phone more responsive. The "RIGHT" key
doubles the visual sound duration to at most two seconds, while the "LEFT"
key halves the visual sound duration to at least half a second. Note that on
some phones the "UP", "DOWN", "LEFT" and "RIGHT" keys may be mapped
through the "2", "8", "4" and "6" numeric keys, respectively. Many of the
program settings persist across multiple runs of The vOICe MIDlet.
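
The joystick behavior described above can be pictured as stepping through two short lists of allowed values. The following Python fragment is only a minimal sketch of that idea, not The vOICe's actual code: the class name `Settings`, the `press` method and the assumption that each key press moves exactly one step and stops at the end of the list are all hypothetical.

```python
# Hypothetical model of the joystick settings; values are taken from the manual.
SAMPLE_RATES_KHZ = [8, 11, 16, 22]   # available sample rates
DURATIONS_S = [0.5, 1.0, 2.0]        # visual sound durations (halving/doubling)


class Settings:
    """Tracks sample rate and visual sound duration, starting at the defaults."""

    def __init__(self):
        self.rate_index = SAMPLE_RATES_KHZ.index(16)   # default 16 kHz
        self.duration_index = DURATIONS_S.index(1.0)   # default 1 second

    def press(self, key):
        """Apply one joystick key press; assumed to stop at the list ends."""
        if key == "UP":
            self.rate_index = min(self.rate_index + 1, len(SAMPLE_RATES_KHZ) - 1)
        elif key == "DOWN":
            self.rate_index = max(self.rate_index - 1, 0)
        elif key == "RIGHT":
            self.duration_index = min(self.duration_index + 1, len(DURATIONS_S) - 1)
        elif key == "LEFT":
            self.duration_index = max(self.duration_index - 1, 0)
        return self

    @property
    def sample_rate_khz(self):
        return SAMPLE_RATES_KHZ[self.rate_index]

    @property
    def duration_s(self):
        return DURATIONS_S[self.duration_index]
```

In this sketch, pressing "RIGHT" twice from the defaults leaves the duration at two seconds, matching the stated upper limit.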

Now more about the color detection features. As was stated above, the "*"
(star) key toggles the talking color identifier on and off. This mobile color
recognizer speaks the color of whatever shows at the center of your camera
view, while alternating with the visual sound of the camera view that tells you
about the shape and brightness of items in your view. If you prefer to only
hear the talking color identifier, simply press the "*" (star) key twice in rapid
succession, much like a double-click, and you will then only get to hear the
color names. So “*” toggles color identification alternating with visual sounds,
while “**” toggles color identification without the visual sounds. Pressing the
joystick “Fire” button will speak the color name once, even if The vOICe was
muted, and on suitable phones [6] it will use the built-in flash. In any case,
recognized colors include (dark, normal, and light) red, green, blue, cyan,
yellow, orange and magenta, as well as combination colors such as red-
orange. Black, grey and white are also identified, bringing the total number of
identified colors and shades to 47. Beware that the choice of color names can
be culturally biased: cyan is a color in between green and blue, while magenta
is basically the same as the color purple. Also, light-magenta and light-red
make for the color pink or very similar colors, while dark-red-orange, dark-
orange and dark-orange-yellow appear as various shades of brown. Dark
yellow-green makes for olive-green.

Results of color recognition inevitably depend on ambient light and camera
quality. Try to use good lighting whenever possible, preferably broad daylight.
Still, under relatively low light conditions, better results may be obtained by
first calibrating The vOICe for the given visual environment. To do this, point
the camera at a known white surface (such as a white sheet of paper) near
the object whose color you want to identify, and apply the “Calibrate white”
entry in The vOICe’s options menu [7], which will basically tell
The vOICe that this surface really is white or light
grey rather than its actual grey or dark grey
appearance. In fact it will also correct for the
yellowish colors from incandescent lighting and
many other sources of color bias. Next you can
point the camera at other items of interest to
identify their colors. Apply the calibration option
with care: only apply it when you are certain that
the full camera view is indeed white and relatively bright, or else you may get
very poor color identification results due to a badly skewed color calibration!
Calibration settings do not persist across runs to avoid unintended continued
use of a calibration that would no longer match changing ambient light
conditions. The vOICe does not normally need calibration in broad daylight
conditions, but if applied with care, it can yield significantly more accurate
color recognition results under relatively low light conditions. The calibration
process takes only about a second and applies for the duration of the run
unless you recalibrate or reset The vOICe via its menus.
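
The manual does not say how The vOICe performs white calibration internally. As a rough illustration of the general idea only, the Python sketch below uses a common per-channel gain approach: the measured color of the white reference is scaled so that it becomes neutral white, and the same gains are then applied to other readings. The function names `calibration_gains` and `apply_gains` and the sample RGB values are hypothetical.

```python
def calibration_gains(white_rgb, target=255.0):
    """Per-channel gains that map the measured 'white' reading to neutral white.

    Generic white-balance sketch; not The vOICe's documented method.
    """
    if min(white_rgb) <= 0:
        raise ValueError("white reference must be non-black in every channel")
    return tuple(target / channel for channel in white_rgb)


def apply_gains(rgb, gains):
    """Apply the calibration gains to a reading, clipping at 255."""
    return tuple(min(255, round(c * g)) for c, g in zip(rgb, gains))


# Hypothetical reading: a white sheet looks dim and yellowish under lamp light.
gains = calibration_gains((120, 100, 60))
corrected_white = apply_gains((120, 100, 60), gains)   # becomes neutral white
corrected_grey = apply_gains((60, 50, 30), gains)      # becomes a neutral grey
```

The sketch also shows why the manual warns against calibrating on a surface that is not really white: every later reading is scaled by gains derived from that one reference.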

The color identifier tells you the color at the center of the camera view, but
sometimes you may wish to know where items of a given color are. Rather
than pointing the camera around until the color identifier finally “hits” the
object with the color of interest, you can tell The vOICe to sound the entire
camera view but only sound items of the color that you specified. This is done
either via the color filter options in the menus or by keying the first letter of the
supported color name, being “r” for red, “g” for green, “b” for blue, “c” for cyan,
“y” for yellow, “o” for orange and “m” for magenta. Now you need to know how
to enter these letters, unless your phone includes a QWERTY keyboard [8]. As
you may know, letters are associated with keys 2 through 9 on your phone. In
particular, key “2” holds the associated letters “a”, “b” and “c”, or “abc” for
short. If you press key “2” once in The vOICe, you specify the digit “2”, but if
you press key “2” multiple times in rapid succession, you get to the letters “a”,
“b” and “c”. Pressing key “2” twice means “a”, pressing key “2” three times
means “b” (which toggles the blue-only color filter), and pressing key “2” four
times means “c” (which toggles the cyan-only color filter). The same principle
applies to the other numeric keys. Key “3” holds “def”, key “4” holds “ghi”, key
“5” holds “jkl”, key “6” holds “mno”, key “7” holds “pqrs”, key “8” holds “tuv”,
and key “9” holds “wxyz”. Therefore, if you want to see and find only green
items in your view, you press key “4” twice to specify “g” for green, or if you
want to see and find only red items in your view, you press key “7” four times
to specify “r” for red. These functions act like a toggle, so applying the same
one another time turns the color filter off to return to the normal mode of
operation. (Alternatively, you may also press key “9” twice to apply “w” for
white, which is equivalent to having no color filter.)

[6] E.g., the Nokia N70, N90 and N91 support the required Advanced Multimedia Supplements (AMMS, JSR-234).
[7] You can also long-press the "*" (star) key as a shortcut.
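
The letter entry scheme described above (one press gives the digit, each further rapid press steps through the letters on that key) can be summarized in a few lines of Python. This is a sketch for illustration only; the function name `decode_taps` is hypothetical, and returning `None` for more presses than the key has letters is an assumption, since the manual does not describe that case.

```python
# Letters printed on the numeric keys, as listed in the manual.
KEY_LETTERS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def decode_taps(key, presses):
    """Return the character selected by pressing `key` `presses` times rapidly.

    One press selects the digit itself; two presses select the first letter,
    three presses the second letter, and so on.
    """
    if presses < 1:
        raise ValueError("at least one press is required")
    if presses == 1:
        return key
    letters = KEY_LETTERS.get(key, "")
    index = presses - 2
    if index >= len(letters):
        return None  # assumption: behavior beyond the last letter is undocumented
    return letters[index]
```

For example, `decode_taps("4", 2)` gives "g" for the green-only filter and `decode_taps("7", 4)` gives "r" for the red-only filter, in line with the examples in the text.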

If you want to run a more complete analysis of what items of what shape and
of what color show where in your camera view, you can press key “2” twice to
toggle “a” for “Analyze”, which will then cycle over all available color filters for
finding any objects and shapes that are red, green, blue, cyan, yellow, orange
or magenta.

The combination of color filters with the visual sound bitmaps implies that over
4000 (namely 64×64) different locations for colored items can be represented,
while at the same time including shape, shading and texture information - in
just one or two seconds of sound. In the general image-to-sound mapping,
the top left gives a high pitch early in the visual sound, the bottom left gives
a low pitch early in the visual sound, the top right gives a high pitch late in
the visual sound, and the bottom right gives a low pitch late in the visual
sound, with other positions giving intermediate pitches and times.
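
To make the mapping concrete, the Python sketch below converts a pixel position in a 64×64 view into an onset time and a tone frequency. The manual only states that horizontal position maps to time and height maps to pitch; the frequency range of 500 to 5000 Hz, the exponential spacing and the function name `pixel_to_sound` are assumptions chosen for illustration.

```python
def pixel_to_sound(row, col, size=64, duration_s=1.0,
                   f_low_hz=500.0, f_high_hz=5000.0):
    """Map a pixel (row 0 = top, col 0 = left) to (onset time, frequency).

    The frequency range and exponential spacing are hypothetical parameters.
    """
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError("pixel lies outside the view")
    # Left-to-right scan: the column decides when the pixel is heard.
    onset_s = col / size * duration_s
    # Height decides pitch: 0.0 at the bottom row, 1.0 at the top row.
    height = (size - 1 - row) / (size - 1)
    freq_hz = f_low_hz * (f_high_hz / f_low_hz) ** height
    return onset_s, freq_hz
```

Under these assumptions the top left pixel sounds at the start of the scan with the highest pitch, and the bottom right pixel sounds at the end of the scan with the lowest pitch. The brightness of the pixel, which the manual maps to loudness, would scale the amplitude of that tone.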

Let’s consider an example where the general scanning of the visual sounds is
combined with color filters to solve a practical problem. Suppose you want to
know the color of something small or thin, say a thin electrical wire. Then it is
extremely difficult to orient the camera such that the center of the camera
view points exactly at this item of interest to get the color identification right.
However, by using the visual sounds of the full view along with the "Analyze"
submenu option for filtering colors (keyboard shortcut "a"), The vOICe will
filter for each color in turn, such that at some point it only sounds any red
items in the view along with saying the color name "red", and any red wire will
appear as a single tone going up or down in pitch depending on its visual
orientation. Of course such advanced uses may require some practice
depending on the exact nature of what you are trying to accomplish.

[8] The Nokia 9500 camera phone includes a QWERTY keyboard.

On suitable phones [9], pressing key “p” will save a snapshot picture to the
memory card. The resulting JPEG [10] format image file contains a numeric
timestamp [11] in the filename, e.g., as in "vOICe_1155477325843.jpg". The
timestamp ensures that each snapshot automatically receives a unique
filename. You may have to give several permissions while saving, depending
on the phone’s security limitations. The saved image file may subsequently be
used for many purposes, such as OCR (optical character recognition), or for
sharing with friends. The file location is either the Images folder or the root of
the memory card, depending on the type of phone. After saving the image,
which may take several seconds, normal operation resumes.
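
As an alternative to an online converter, the timestamp in a snapshot filename can be decoded with a few lines of Python. This is a minimal sketch that assumes the value is in milliseconds since January 1, 1970 (as the 13-digit example suggests), that the result should be shown in UTC, and that PNG files follow the same naming pattern; the function name `snapshot_time` is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone


def snapshot_time(filename):
    """Return the UTC date and time encoded in a snapshot filename."""
    # The ".png" alternative is an assumption about the PNG fallback naming.
    match = re.fullmatch(r"vOICe_(\d+)\.(?:jpg|png)", filename)
    if match is None:
        raise ValueError("not a recognized snapshot filename: " + filename)
    millis = int(match.group(1))
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer milliseconds keep the conversion exact.
    return epoch + timedelta(milliseconds=millis)


print(snapshot_time("vOICe_1155477325843.jpg"))
```

For the example filename from the text, this yields a moment on 13 August 2006 in UTC.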

When using the phone camera outside in the sunshine, color readings can be
badly affected by glare if there is direct sunlight on the phone. In such
situations, try to cup one hand over the phone without blocking the camera
view, such that your hand acts like a sunshade - much like a hat can keep
your face out of direct sunlight.

Finally, there is support for an additional very special color: skin. Pressing key
“1” twice, or “11”, will toggle the skin-only color filter. This will in principle only
sound any exposed skin in your view, such as faces and hands, which might
find uses in for instance determining how many people are nearby or locating
empty chairs in a conference room. The skin color filter also takes into
account typical racial differences. However, certain materials such as wood
can have a color that is very similar to skin, in which case you need to also
take into account apparent shape and size in the visual sounds to try and
determine for yourself if results of the skin-only filter only show skin. The best
way to start and learn is to experiment. There should be a difference with and
without clothes on.

In all uses, please stay aware that pointing the phone’s camera at people who
do not know you or The vOICe, in public places or elsewhere, might trigger
hostile reactions, for instance because people may think that you are taking
their photograph without their permission or otherwise invading their privacy.
Similar issues may apply when pointing the camera at certain properties.

Have fun!

[9] The file I/O standard JSR-75 must be supported, e.g., as with the Nokia 6630 and Nokia 6680. Otherwise the
screen will show a “File I/O not supported” error message (which may not be detectable with a screen reader).
[10] On phones that do not support JPEG, The vOICe will try to save in PNG format.
[11] Timestamped filenames can if desired be converted to human-readable dates and times using an online
timestamp converter, because The vOICe timestamps are in fact the number of (milli)seconds since January 1, 1970.

Key        Action                                   Default

0          Toggles muted state                      Off
1          Toggles negative video                   Off
3          Toggles speech feedback                  On
7          Toggles "anti-stutter/buzzing" mode      Off
9          Cycles contrast enhancement              100%
*          Toggles talking color identifier         Off
#          Cycles sound volume levels               50%
UP         Higher sample rate, up to 22 kHz         16 kHz
DOWN       Lower sample rate, down to 8 kHz         16 kHz
LEFT       0.5 or 1 second visual sound             1 second
RIGHT      1 or 2 second visual sound               1 second
FIRE       [Flash and] say color                    Off
00         Mute and pause (low CPU load)            Off
**         Color identifier, no visual sounds       Off
##         Toggle blinders (narrow view)            Off
r          Red-only color filter                    Off
g          Green-only color filter                  Off
b          Blue-only color filter                   Off
c          Cyan-only color filter                   Off
y          Yellow-only color filter                 Off
o          Orange-only color filter                 Off
m          Magenta-only color filter                Off
11 or s    Skin-only color filter                   Off
a          Analyze colors by cycling filters        Off
p          Save snapshot picture to memory card

Overview of The vOICe key commands

Quirks mode

You can also experiment with a “bat call” quirks mode toggled by a long-press
of the FIRE button: this gives you two loud but very brief high-pitched chirps in
rapid succession, much like an audible version of the clicks or sound flashes
emitted by bats during echolocation. The double sound flashes may thus be
used with the phone’s built-in speaker to detect nearby obstacles from any
echoes that you hear. The sound flash patterns are repeated with the same
interval used for the visual sounds, every second by default. If you prefer, you
can toggle use of single sound flashes by pressing the “1” key while in the bat
call mode. You can also independently cycle the audio volume of the bat calls
by pressing the "#" (pound, hash) key while in the bat call mode. If necessary,
use your hand to form a cone over the phone’s speaker for improved
directionality of the sound flashes, and hold the phone in a position that is
consistently aligned with your ears, for instance in front of your face.

Please note that in general palate clicks made with your tongue may work
better, because you can more readily adapt these to your current situation.
Since these tongue clicks originate from your mouth they are also always
consistently aligned with your ears, such that you can better train for very
subtle changes as needed for echolocation purposes. On the other hand, your
mouth may run dry after a few minutes of intensive tongue clicking, so use
whatever suits you best.
