Abstract: Insights extracted from spatial queries in geodatabase systems introduce significant opportunities for business intelligence. However, geodatabases are unable to keep up with the required performance due to the massive (and sky-rocketing) amounts of data generated from embedded location-enabled devices. In this paper, we focus on geographic information systems that make use of geohash; specifically, we tackle the kernel of converting geohash codes to and from longitude/latitude pairs. We present the first hardware implementation of a geohash conversion engine operating at wire speed. The presented geohash converter is further enhanced with runtime flexibility with respect to characteristics of the data it can process; furthermore, the architecture allows the user to compromise on performance when limited by hardware resources (design time flexibility). Experimental results of the geohash conversion engine on a Xilinx XC7K325T FPGA show >13X (end-to-end) speedup compared to optimized industry-grade software running on 16 CPU hardware threads.

Keywords: geospatial analysis; spatial databases; data conversion; geohash; FPGA; reconfigurable architectures; logic design; accelerator architectures; parallel architectures

I. INTRODUCTION

The ubiquitous availability of location sensing devices embedded in smartphones, cars, taxis, and other devices, combined with the ability to collect data at scale, enables the fine-grained monitoring and modeling of human movement, both at the individual level and at the group level. Using trajectory data harvested by GPS, RFID and mobile devices, complex pattern queries can be posed against objects moving in both time and space. Answering these queries introduces opportunities for business intelligence, where prediction regarding future patterns can be used to solve challenging problems such as traffic congestion prediction, crime pattern analysis and prediction, epidemic spread characterization and alerting, insurance pricing, and targeted advertising. This work focuses on the spatial aspect of spatiotemporal queries.

One of the most significant challenges posed by processing spatial queries is the sheer amount of available spatial data; the volume of such data is increasing at an unprecedented rate, especially thanks to the widespread use of GPS-enabled smart-phones. In this context, high-performance techniques are needed to process spatial queries in a reasonable amount of time.

Geohash is a hierarchical geocoding system that is often used for spatial indexing. Geohash presents several advantages over the traditional latitude/longitude geographic coordinate system, such as efficient indexing, support for hierarchical regions, arbitrary precision, and simple proximity estimation. However, geohashes do not replace lat./long. coordinates, primarily due to the format of disseminated data, as well as the dependency of several spatial algorithms on lat./long. coordinates.

This work tackles the (bit-serial) conversion of geohash codes to/from lat./long. coordinate pairs, a frequent problem in geohash-based spatial querying frameworks. Handling geohash codes can be computationally demanding, and entails bit-granular operations, a class of computations where general purpose processors are not known to shine.

We present a novel and parallel hardware bi-directional geohash conversion engine, converting geohash codes at wire speed (no stalls), at the constant throughput rate of one geohash code per hardware cycle. The novel geohash converter is enhanced through the incorporation of runtime flexibility, where the length of the geohash codes can be varying, whilst minimizing misspent input/output bandwidth. Furthermore, if lower performance can be tolerated, the geohash conversion engine can be made to consume fewer hardware resources. The proposed hardware converter is shown to outperform optimized industry-grade software, demonstrating end-to-end throughput of 100 million conversions per second, while occupying 30% of a Xilinx Kintex 7 XC7K325T, a mid- to small-sized FPGA.

The contributions of this work can be summarized as follows:
• The first hardware geohash converter engine (to the best of our knowledge), exploiting hardware pipelining techniques to maximize conversion throughput.
• A user-controlled flexible compute-based bi-directional converter architecture that operates on any size geohashes (with architected max), whilst minimizing wasted bandwidth.
• A memory-bound lookup-based converter architecture for converting geohash codes to lat./long. pairs.
• An extensive design space exploration studying the resulting resource utilization and end-to-end throughput of the hardware converter on a Xilinx XC7K325T FPGA.
• A performance study comparing state-of-the-art CPU-based conversion to the proposed novel hardware conversion engine.

The rest of the paper is organized as follows: the geohash hierarchical geocoding system is reviewed in Section II. The proposed hardware geohash conversion engine is then
Figure 2. Basic conversion block for carrying out a single step: producing one geohash bit when converting from latitude (or longitude), or updating the
latitude (or longitude) value (i.e. interval mid) by processing one geohash bit when converting from geohash code to latitude (or longitude).
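As a software illustration of the single step captured in Figure 2, the following Python sketch halves an interval once per geohash bit. The function names, the use of floating-point midpoints, and the convention that a value at or above the midpoint yields a 1 bit are assumptions made for this sketch; the internals of the hardware block are not reproduced here.

```python
def ll_to_geo_step(value, lo, hi):
    """One ll-to-geo step: emit one bit and keep the half-interval holding value."""
    mid = (lo + hi) / 2.0
    if value >= mid:          # assumed convention: upper half encodes a 1
        return 1, mid, hi
    return 0, lo, mid


def geo_to_ll_step(bit, lo, hi):
    """One geo-to-ll step: narrow the interval according to one geohash bit."""
    mid = (lo + hi) / 2.0
    return (mid, hi) if bit else (lo, mid)


def encode(value, lo, hi, nbits):
    """Chain nbits single steps, as a pipeline of single step blocks would."""
    bits = []
    for _ in range(nbits):
        bit, lo, hi = ll_to_geo_step(value, lo, hi)
        bits.append(bit)
    return bits


def decode(bits, lo, hi):
    """Process the bits in order and return the midpoint of the final interval."""
    for bit in bits:
        lo, hi = geo_to_ll_step(bit, lo, hi)
    return (lo + hi) / 2.0


print(encode(-5.6, -180.0, 180.0, 8))                    # [0, 1, 1, 1, 1, 1, 0, 0]
print(decode([0, 1, 1, 1, 1, 1, 0, 0], -180.0, 180.0))   # -4.921875
```

Each loop iteration corresponds to one single step block; the same step is applied to latitude with the starting interval [-90, 90].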
lat./long. values and geohash codes. Metadata would indicate
the conversion mode, as well as the number of bits to process
(i.e. geohash precision). In the case of geo-to-ll conversion,
bits of the input geohash code are de-interleaved and passed
to the latitude and longitude pipelines respectively. Similarly,
in the case of ll-to-geo conversion, the geohash codes at
the respective output interfaces of the latitude and longitude
pipelines are interleaved before being pushed to the output
stream.
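The de-interleaving and interleaving steps described above can be sketched in Python as follows. This is a minimal sketch that assumes the usual geohash convention in which the first (most significant) bit belongs to longitude and bits then alternate with latitude; the function names and the integer representation of the codes are illustrative only.

```python
def deinterleave(code, nbits):
    """Split an nbits-wide geohash integer (MSB first) into (lon_bits, lat_bits)."""
    lon = lat = 0
    for i in range(nbits):
        bit = (code >> (nbits - 1 - i)) & 1
        if i % 2 == 0:        # assumed: even positions carry longitude bits
            lon = (lon << 1) | bit
        else:                 # odd positions carry latitude bits
            lat = (lat << 1) | bit
    return lon, lat


def interleave(lon, lat, nbits):
    """Merge longitude and latitude bit strings back into one geohash integer."""
    n_lon = (nbits + 1) // 2  # longitude receives the extra bit when nbits is odd
    n_lat = nbits // 2
    code = 0
    for i in range(nbits):
        if i % 2 == 0:
            bit = (lon >> (n_lon - 1 - i // 2)) & 1
        else:
            bit = (lat >> (n_lat - 1 - i // 2)) & 1
        code = (code << 1) | bit
    return code


# 25-bit example: the bit pattern of the base-32 geohash "ezs42".
code = 0b0110111111110000010000010
lon, lat = deinterleave(code, 25)
print(bin(lon), bin(lat))
print(interleave(lon, lat, 25) == code)   # True
```

In hardware the same split is pure wiring; the loop form is used here only to make the bit positions explicit.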
In order to increase efficiency in transferring data to the geohash conversion engine, the input and output controllers support batch transfers, where a batch header specifies characteristics (metadata) of the data following it. Information enclosed in the header includes (1) the batch size (number of conversions to be performed), (2) the conversion mode (geo-to-ll, ll-to-geo), (3) the number of bits to process per conversion, and (4) the size of the geohash on the wire, i.e. during transfer to/from the converter (this field will be described below).

A geohash converter engine has the following design-time architectural attributes:
• Max geohash size: this field affects the size of the geohash code passed across single step blocks (Section III-A), as well as the size of the mask and num bits remaining signals. The max geohash size is an architected maximum geohash code that can be processed (in either mode). Hence, an N-bit geohash converter pipeline can support geohashes of size N or any smaller size.
• Number of deployed stages: this field refers to the total number of single step blocks deployed. If the number of stages is smaller than the max geohash size, then a loopback connection should be made from the last stage back to the first (see Figure 3). The number of stages determines the number of passes through the pipeline for a given conversion to complete, hence affecting performance. When limited by hardware resources, the converter designer may decide to deploy fewer stages rather than matching the max geohash size, if the performance hit is deemed tolerable.
• The supported transfer-side geohash sizes: given an architected max geohash size N, any geohash of smaller size can be processed through the converter. However, when transferring geohashes of size G to and from the converter pipeline (where G is specified in the metadata using the num bits remaining field, Section III-A), there are two available options: (1) transfer each geohash using N bits; this option wastes N − G bits for each conversion performed, hence is bandwidth wasteful, especially for small geohashes. (2) transfer geohashes contiguously using G bits each. While (2) is more bandwidth-efficient, it requires the I/O interface controllers to implement order of N shifters, which is generally not feasible. A middle-ground between the aforementioned 2 options is supporting a subset of N, such as all powers of two up to N. Here, geohashes of size G would be transferred on the wire using geohashes of the next closest power of two, which may still waste bandwidth, though much less than option (1). For example, a geohash of size 56 bits can be transferred using 64 bits, essentially wasting 8 bits, while a geohash of size 30 bits can be transferred using 32 bits, essentially wasting 2 bits. Furthermore, only log2(N) shifters are deployed. In our implemented converter pipelines, geohashes transferred are of size any power of two less than or equal to the max geohash size.

Once the above three attributes are set, the geohash converter pipeline can be developed. We implemented a (C++) utility to generate the HDL of the converter pipeline, using certain parameter inputs. These include the max geohash size, the number of deployed stages, the supported transfer-side geohash sizes, the bit-width of the I/O interfaces, as well as other lower-level options such as extra buffering to meet timing.

C. Lookup-Based Conversion

In this section a method is presented to perform the geohash code to lat./long. conversion using a lookup into a table of conversions pre-computed offline. This approach is in contrast to the online compute method described earlier. Note that converting from lat./long. to a geohash code using lookup is not feasible, except for the caching of common values. The lookup method is practical when faced with limited hardware resources for compute and compromising on performance is not an option.

Figure 4 encapsulates a subset of the pre-computed lookup table with respect to the longitude conversion. Given a longitude geohash code of length L, the address for the conversion is computed as: longitude geohash code + offset, where the offset is 0 when L is 1, otherwise $\text{offset} = \sum_{i=1}^{L-1} 2^i$. The offset need not be computed at runtime, rather it can be hard-wired in the address computation logic for different values of L (L is generally kept small to limit memory size).

Figure 4. Subset of the pre-meditated lookup table respective to the longitude conversion.

The lookup conversion method can be combined with the compute method in order to eliminate some hardware stages, as depicted by Figure 5. For generality, geohash codes of
length 2N bits as drawn, alongside two pre-computed geo-to-ll lookup tables that convert at most X bits each (one for each of the latitude and longitude conversions). The remainder of this discussion applies to either of the latitude or longitude portions of a geohash code. If N is less than X, then the conversion can be achieved fully by lookup (no need to compute). Otherwise, X bits are first converted by lookup, providing an output interval. Then, the remaining N-X bits are converted by compute, using the output of the lookup table as a starting interval. Note that loopback can be applied from the last to first compute stages (omitted for simplicity in Figure 5).

Figure 5. Combining the lookup method with the compute method. Geohash codes of length 2N bits as drawn, alongside two pre-computed geo-to-ll lookup tables that convert at most X bits each (one for each of the latitude and longitude conversions). If N is less than X, then the conversion can be achieved fully by lookup. Otherwise, X bits are first converted by lookup, and the remaining N-X bits are converted by compute, using the output of the lookup table as a starting interval.

Implementation results regarding the lookup method are not offered in Section IV, as the end-to-end performance of the lookup method is the same as that of the compute-based conversion method, and the memory requirements of the lookup method can be trivially derived. Furthermore, a thorough design space exploration regarding resource utilization is provided for the compute method, which helps determine whether the lookup method should be used given specific platform constraints.

IV. EXPERIMENTS AND ANALYSIS

This section presents an extensive experimental evaluation of the proposed geohash conversion engine.

A. Experimental Framework

1) Hardware Framework: Several versions of the proposed hardware geohash conversion engine were implemented on a Pico M-505 board connected to an Intel Xeon processor via 8 lanes of PCI-e Gen. 2 [1]. The M-505 board includes a Xilinx Kintex 7 XC7K325T FPGA [2], a mid- to small-size FPGA by today's standards. Xilinx Vivado 2014.2 is used for synthesis and implementation, with default settings. The PCIe hardware interface and software drivers are provided as part of the Pico framework. The hardware engines communicate with the I/O PCIe interfaces through one stream each way, with dual-clock BRAM FIFOs in between the converter logic and the PCI-e interfaces. The RAM on the FPGA board does not reside in the same virtual address space as the CPU RAM, and data is streamed from the CPU RAM to the FPGA. Since the proposed solution does not require memory offloading, RAM on the FPGA board is not used. All performance numbers are reported end-to-end, including streaming the data from the host CPU RAM to the FPGA and back to the host CPU RAM.

2) Software Framework: We compare our hardware converter engine to the software converter engine developed in [3]. The highly optimized software converter is distributed as part of the IBM Streams [4] and SPSS [5] products, as a component of the spatiotemporal toolkit. While we did not gain access to the source code, the authors provided us with performance measurements. Software experiments were run on a single socket 8-core x2 Hyper-Threads Intel Xeon Processor running at 2.5GHz, with 20MB L3 cache and 32GB RAM.

3) Datasets Description: Synthetic datasets were generated for varying geohash sizes, namely geo 8, geo 16, geo 32, geo 64 and geo 128. Each geohash code file is associated with a lat./long. file represented with double precision floating point. Note that for a given geohash code size, modifying the geohash codes typically has no effect on the performance of either of the software and hardware converters.

B. FPGA Resource Utilization

We first perform a design space exploration regarding the resource utilization of the proposed converter. To that end, several 128-bit converters were developed, while varying the number of deployed stages (single step blocks) from 8 to 128. Each of these converters can process geohash codes of any size up to 128 bits (with varying performance), and run at 250MHz. Figure 6 summarizes the various resources consumed by all converters. Generally, resource utilization increases linearly with the number of hardware stages. Furthermore, the converters are LUT-dominated, and the largest (128-stage) converter consumes slightly less than 50% of available hard-wired DSPs (3 per stage for the double precision adder), as well as 71% of available LUTs.

The 64-stage converter is relatively light-weight and occupies around 33% of the FPGA; it is able to process 64-bit geohashes through a single pass. Note that 64-bit geohashes are most common as they achieve cm precision on the surface of the earth, and higher precision is rarely required.
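The remark that 64-bit geohashes reach centimeter precision can be checked with a short calculation. The Python sketch below assumes that the geohash bits are split evenly between longitude and latitude (with longitude receiving the extra bit for odd sizes) and uses approximate values for the Earth's equatorial and meridional circumferences; both the constants and the function name are assumptions of this sketch, not values taken from the paper.

```python
EQUATORIAL_CIRCUMFERENCE_M = 40_075_017.0   # approximate
MERIDIONAL_CIRCUMFERENCE_M = 40_007_863.0   # approximate


def cell_size_m(geohash_bits):
    """Return (width, height) in meters of one geohash cell at the equator."""
    lon_bits = (geohash_bits + 1) // 2      # assumed: longitude gets the extra bit
    lat_bits = geohash_bits // 2
    lon_deg = 360.0 / 2**lon_bits           # longitude span of one cell, in degrees
    lat_deg = 180.0 / 2**lat_bits           # latitude span of one cell, in degrees
    width = EQUATORIAL_CIRCUMFERENCE_M / 360.0 * lon_deg
    height = MERIDIONAL_CIRCUMFERENCE_M / 360.0 * lat_deg
    return width, height


for bits in (32, 64, 128):
    width, height = cell_size_m(bits)
    print(f"{bits:3d}-bit geohash: {width:.3e} m x {height:.3e} m")
```

Under these assumptions a 64-bit geohash cell is a little under 1 cm wide and about 0.5 cm tall at the equator, which agrees with the centimeter figure quoted above.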
Figure 6. Various resources consumed by several 128-bit converters. The number of deployed stages is varied. Note that all converters can process any geohash size up to 128 bits (with varying performance).

Figure 8. Performance impact of deploying fewer stages in hardware. Results are shown for 128-bit converters (running at 250MHz) and a batch of 100M conversions of 64-bit geohashes. The red line represents the deterministic throughput of the isolated converter core on the FPGA, without taking into consideration the PCI-e transfers. In the case of less than 64 stages, several passes are required through the deployed stages for each conversion.
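The relation between deployed stages, passes, and isolated core throughput mentioned in the Figure 8 caption can be approximated with a simple model. The Python sketch below assumes that each stage processes one geohash bit per pass and that the loopback shares the pipeline evenly among in-flight conversions, so the core throughput is the clock rate divided by the number of passes. This is an idealized assumption for illustration, not the measured data behind Figure 8, and the function names are hypothetical.

```python
import math


def passes_needed(geohash_bits, deployed_stages):
    """Passes through the pipeline, assuming one geohash bit per stage per pass."""
    return math.ceil(geohash_bits / deployed_stages)


def core_throughput(clock_hz, geohash_bits, deployed_stages):
    """Idealized conversions per second of the isolated core (no PCI-e effects)."""
    return clock_hz / passes_needed(geohash_bits, deployed_stages)


for stages in (8, 16, 32, 64, 128):
    rate = core_throughput(250e6, 64, stages)
    print(f"{stages:3d} stages: {passes_needed(64, stages)} pass(es), "
          f"{rate / 1e6:.2f} M conversions/s")
```

In this model, deploying 64 or more stages gives one conversion per cycle for 64-bit geohashes, while halving the stage count below 64 halves the core throughput. End-to-end numbers additionally depend on the PCI-e transfers, which the model ignores.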
Figure 7. Throughput achieved by a 128-bit 128-stage hardware converter (running at 250MHz) with respective modes (a) geo-to-ll and (b) ll-to-geo.

Figure 9. Speedup of the hardware converter versus the single-threaded software converter, for conversion modes (a) geo-to-ll and (c) ll-to-geo. Figures (b) and (d) provide a zoomed-in view of Figures (a) and (c) respectively, for batches of size 10, 100 and 1K. Slowdown is represented below the red (horizontal) line, where speedup < 1.

Figure 10. Speedup of the hardware converter against the software converter, while doubling the number of software threads. Geohashes of size 64-bits are used with batches containing 100 Million conversions.

E. Hardware Converter vs. Multi-Threaded Software: a Socket-to-Socket Comparison

By assigning separate subsets of a batch to different threads, software conversion can be accelerated (data-level parallelism). The next set of experiments showcases the speedup of the hardware converter against a multi-threaded version of the software converter. We limit the CPU to a single socket for a fair (CPU)socket-to-(FPGA)socket comparison. The available CPU socket comprises 8 cores with 2-way Hyper-Threading, for a total of 16 hardware threads. Figure 10 depicts the speedup of the hardware converter against the software converter, while doubling the number of software threads. Geohashes of size 64-bits are used with batches containing 100 Million conversions. As expected, the comparative speedup is halved as the number of threads is doubled, up to the number of available cores (8). When Hyper-Threading is used (at 16 threads), we see a 30% reduction in speedup instead of the previous 50%. Increasing the number of software threads beyond the number of hardware threads results in a slight deterioration of software performance. In summary, the hardware converter achieves 16.1X (geo-to-ll) and 13X (ll-to-geo) speedup versus the
best run of the software converter on the CPU socket. As noted in Section IV-C, the speedup shown here can be potentially more than doubled simply by attaching the hardware converter to a higher bandwidth PCI-e platform (no modifications to the converter core required).

V. PRIOR ART

The geohash geocode system [6] has been introduced fairly recently (2008) and is witnessing a rapidly increasing adoption. Several database management systems and geographic information systems make use of geohash for indexing and efficient querying. These include MongoDB [7], MySQL [8], IBM Streams and SPSS [3][4][5], as well as Apache Accumulo (through third-party research) [9]. MongoDB uses geohash for indexing where conversion is a frequent problem, as input queries process spatial data in a lat./long. format. The work in [3] proposes a lightweight scalable spatial index based on geohash, and extends their approach for graph data structures.

There exists an extensive body of work pertaining to the FPGA acceleration of certain spatial queries (a small subset is referenced here). Acceleration of point-in-polygon algorithms is demonstrated in [10][11][12], while speeding up k-Nearest Neighbors queries is tackled in [13][14].

The works in [15][16][17] describe the hardware acceleration of spatio-temporal analytics, where queries in the form of regular-expressions are posed on moving objects. The spatial history of the moving objects is represented as regions. Region information is derived from lat./long. coordinates through methods such as point-in-polygon.

To the best of our knowledge, this paper is the first to describe a hardware architecture for the conversion of geohash codes to/from lat./long. coordinates.

VI. CONCLUSIONS

We present, to the best of our knowledge, the first hardware implementation of a geohash conversion engine operating at wire speed. The proposed geohash converter is enhanced with runtime flexibility with respect to characteristics of the data it can process (no restrictions on geohash sizes, bi-directional conversion, etc). Moreover, the architecture allows the user to compromise on performance when limited by hardware resources (design time flexibility). A thorough experimental evaluation of the geohash conversion engine on a Xilinx XC7K325T FPGA shows >13X (end-to-end) speedup compared to optimized industry-grade software running on 16 CPU hardware threads. We show that higher (more than double) speedup can be attained with no changes to the converter engine, using higher PCI-e bandwidth (Gen. 3 and/or more lanes).

Future work will focus on the acceleration of more complex spatial queries in the geohash domain, using the proposed converter as a constituent.

REFERENCES

[1] M-505-K325T Embedded, http://picocomputing.com/products/embedded-modules/m-505-k325t-embedded-2/.

[2] KINTEX-7 FPGAS, http://www.xilinx.com.

[3] K. Lee, R. K. Ganti, M. Srivatsa, and L. Liu, "Efficient Spatial Query Processing for Big Data," SIGSPATIAL, vol. 7, no. 11, pp. 4–12, 2014.

[4] IBM InfoSphere Streams, http://www-03.ibm.com/software/products/en/infosphere-streams.

[5] IBM SPSS: Predictive Analysis Software and Solutions, http://www-01.ibm.com/software/analytics/spss/.

[6] Geohash, http://www.geohash.org.

[7] MongoDB Manual 2.6: Geospatial Indexes and Queries, http://docs.mongodb.org/manual/core/geospatial-indexes/.

[8] MySQL 5.7 Reference Manual: Spatial Geohash Functions, http://dev.mysql.com/doc/refman/5.7/en/spatial-geohash-functions.html.

[9] A. Fox, C. Eichelberger, J. Hughes, and S. Lyon, "Spatio-Temporal Indexing in Non-Relational Distributed Databases," in Big Data, 2013 IEEE International Conference on. IEEE, 2013, pp. 291–299.

[10] J. Fender and J. Rose, "A High-Speed Ray Tracing Engine Built on a Field-Programmable System," in Field-Programmable Technology (FPT), 2003. Proceedings. 2003 IEEE International Conference on. IEEE, 2003, pp. 188–195.

[11] J. Schmittler, S. Woop, D. Wagner, W. J. Paul, and P. Slusallek, "Realtime Ray Tracing of Dynamic Scenes on an FPGA Chip," in Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware. ACM, 2004, pp. 95–106.

[12] M. Woulfe, M. Manzke, and J. L. Dingliana, "Hardware Accelerated Broad Phase Collision Detection for Realtime Simulations," 2007.

[13] H. Hussain, K. Benkrid, C. Hong, and H. Seker, "An adaptive FPGA Implementation of Multi-Core K-Nearest Neighbour Ensemble Classifier Using Dynamic Partial Reconfiguration," in Field Programmable Logic and Applications (FPL), 2012 22nd International Conference on. IEEE, 2012, pp. 627–630.

[14] E. S. Manolakos and I. Stamoulias, "Flexible IP Cores for the k-NN Classification Problem and Their FPGA Implementation," in Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on. IEEE, 2010, pp. 1–4.

[15] L. Woods, J. Teubner, and G. Alonso, "Complex Event Detection at Wire Speed With FPGAs," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 660–669, 2010.

[16] R. Moussalli, M. R. Vieira, W. Najjar, and V. J. Tsotras, "Stream-Mode FPGA Acceleration of Complex Pattern Trajectory Querying," in Advances in Spatial and Temporal Databases. Springer, 2013, pp. 201–222.

[17] R. Moussalli, I. Absalyamov, M. R. Vieira, W. Najjar, and V. J. Tsotras, "High Performance FPGA and GPU Complex Pattern Matching Over Spatio-Temporal Streams," GeoInformatica, pp. 1–30, 2014.