Configuring and Securing A SPARQL Endpoint

2012 VIVO Implementation Fest
Welcome & Who am I?

Vincent Sposato, University of Florida
Enterprise Software Engineering
Primarily focused on VIVO operations and reproducible harvests
John Fereira, Cornell University
Mann Library Information Technology Services (ITS)
Programmer / Analyst / Technology Strategist
Goals of this session

Provide you with an overview of SPARQL endpoints and their uses
Provide you with a process for installing and configuring a SPARQL endpoint (Fuseki specifically)
Outline the possibilities for securing such an endpoint
Answer questions
SPARQL Endpoint
Overview
What is a SPARQL endpoint?

A SPARQL endpoint enables users to query a knowledge base via the SPARQL language
Results are normally returned in a machine-readable format, as the primary purpose of the endpoint is information exchange
Current Implementations
Joseki / Fuseki
Virtuoso
Many others depending on needs
Why use a SPARQL endpoint?

To provide querying services for your dataset
To provide your semantic data to other applications through machine-readable interfaces
Public SPARQL endpoints

US Government
Data.gov (http://semantic.data.gov/sparql)
University of Florida
VIVO (http://sparql.vivo.ufl.edu/sparql.html)
Bio2RDF
PubMed SPARQL (http://pubmed.bio2rdf.org/sparql)
Data Reuse Example from Cornell

Data as it appears in VIVO for:
Abruña, Héctor D
Data as it appears at the Cornell Department of Chemistry and Biology for:
Abruña, Héctor D
Why Fuseki and not Joseki?

Fuseki is the successor to Joseki, and is based upon SPARQL 1.1
Joseki has database connection timeout issues that Fuseki is able to resolve with an additional library
Fuseki has true update support, and the ability to define specific graphs
Fuseki Installation
Requirements for Fuseki

Oracle/Sun Java 1.6+
OpenJDK would work
Latest Fuseki package
Download the distribution package, as it is a complete environment
https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/0.2.2-incubating-SNAPSHOT/
Apache Web Server
Only if you want to redirect output by way of AJP
Ability to remove the :3030 from the end of the URL of the SPARQL endpoint
JAVA 6 JDK
Can I use OpenJDK?
Yes, you can. However, if you are installing it on the same server as your VIVO, you need to make sure it is configured correctly so that it does not interfere with Sun Java and the VIVO application
What is Java?
"Write once, run anywhere" (popular quote about Java)
Installation
Debian/Ubuntu
apt-get install sun-java6-jdk
apt-get install openjdk-6-jre
Centos/Redhat
yum install java (need to configure alternatives)
yum install java-1.6.0-openjdk
Windows: download and install
Apache
Why do I need Apache too?
Allows for AJP for redirecting 3030 to a standard web port (80, 443)
What is Apache?
"a secure, efficient and extensible server that provides HTTP services in sync with current HTTP standards" (httpd.apache.org)
Installation
Debian/Ubuntu: apt-get install apache2
Centos/Redhat: yum install httpd
Windows: download and follow the instructions
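The redirect to a standard web port can be sketched as an Apache virtual host. This is a minimal sketch under stated assumptions, not the presenters' configuration: it uses plain HTTP proxying (mod_proxy and mod_proxy_http) rather than AJP, assumes Fuseki listens on localhost port 3030, and uses a placeholder hostname, sparql.example.com. The function name and output path are hypothetical.

```shell
#!/bin/bash
# Sketch only: write a reverse-proxy virtual host so clients can use port 80
# instead of the Fuseki port. Assumes mod_proxy and mod_proxy_http are enabled.
# The hostname, port, and output path are illustrative assumptions.
write_fuseki_proxy_conf() {
    local server_name="$1"   # e.g. sparql.example.com (placeholder)
    local fuseki_port="$2"   # port Fuseki listens on, e.g. 3030
    local out_file="$3"      # where to write the Apache config fragment
    cat > "$out_file" <<EOF
<VirtualHost *:80>
    ServerName ${server_name}
    # Act as a reverse proxy only, never as an open forward proxy
    ProxyRequests Off
    ProxyPass / http://localhost:${fuseki_port}/
    ProxyPassReverse / http://localhost:${fuseki_port}/
</VirtualHost>
EOF
}

# Example: write the fragment to a scratch file for review before installing it
conf_file=$(mktemp)
write_fuseki_proxy_conf "sparql.example.com" 3030 "$conf_file"
```

Where the fragment is installed depends on the distribution, so the sketch only writes it to a scratch file.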
Fuseki
Download Fuseki (tar/zip)
wget https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/0.2.2-incubating-SNAPSHOT/jena-fuseki-0.2.2-incubating-20120506.050243-16-distribution.tar.gz
Extract contents of the file
tar xzvf jena-fuseki-0.2.2-incubating-20120506.050243-16-distribution.tar.gz
Create a Fuseki directory
mkdir /usr/local/fuseki
Copy extracted contents to new directory
cp -R jena-fuseki-0.2.2-incubating-SNAPSHOT/* /usr/local/fuseki
Make fuseki-server executable
chmod 755 /usr/local/fuseki/fuseki-server
Supporting Libraries
Download Jena-ARQ-2.9.0
wget http://www.apache.org/dist/incubator/jena/jena-arq-2.9.0-incubating/jena-arq-2.9.0-incubating.jar
Download Jena-IRI-0.9.0
wget http://www.apache.org/dist/incubator/jena/jena-iri-0.9.0-incubating/jena-iri-0.9.0-incubating.jar
Download Jena-SDB-1.3.4
wget http://sourceforge.net/projects/jena/files/SDB/SDB-1.3.4/sdb-1.3.4.zip/download
cp download sdb-1.3.4.zip
Download MySQL-Connector-Java-5.1.19
wget http://mirrors.ibiblio.org/pub/mirrors/maven2/mysql/mysql-connector-java/5.1.19/mysql-connector-java-5.1.19.jar
Fuseki Configuration
Prepare supporting libraries

Make a lib directory under /usr/local/fuseki
mkdir /usr/local/fuseki/lib
Copy all jar files into new lib directory
Make sure that you unzip the SDB-1.3.4 file, and extract the jar file from it
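The library preparation steps above might be scripted roughly as follows. This is a sketch under assumptions: the downloads all sit in one directory, unzip is installed, and the SDB jar inside the zip matches the pattern sdb-*.jar (its exact location in the archive is not assumed, so it is located with find). The function name is hypothetical.

```shell
#!/bin/bash
# Sketch only: gather the downloaded jars into Fuseki's lib directory.
# Assumes the jars and sdb-1.3.4.zip sit in one download directory.
collect_fuseki_libs() {
    local download_dir="$1"   # where the wget downloads were saved
    local lib_dir="$2"        # e.g. /usr/local/fuseki/lib
    mkdir -p "$lib_dir"
    # Copy the jars that were downloaded directly (ARQ, IRI, MySQL connector)
    cp "$download_dir"/*.jar "$lib_dir"/
    # Unpack the SDB zip into a scratch directory and pull out the SDB jar(s);
    # the sdb-*.jar name pattern is an assumption about the archive contents
    if [ -f "$download_dir/sdb-1.3.4.zip" ]; then
        local scratch
        scratch=$(mktemp -d)
        unzip -q "$download_dir/sdb-1.3.4.zip" -d "$scratch"
        find "$scratch" -name 'sdb-*.jar' -exec cp {} "$lib_dir"/ \;
        rm -rf "$scratch"
    fi
}
```

A call such as `collect_fuseki_libs ~/downloads /usr/local/fuseki/lib` would then populate the lib directory in one step; the download path here is only an example.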
Create configuration file

Create a new file in the /usr/local/fuseki directory
nano /usr/local/fuseki/fuseki-vivo.ttl
This file will hold Fuseki's:
Server Service definitions
RDF Dataset definitions
Graph definitions
Add namespaces to the file

# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
@prefix :       <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix jumble: <http://rootdev.net/vocab/jumble#> .
@prefix sdb:    <http://jena.hpl.hp.com/2007/sdb#> .
This section defines the namespaces we will be utilizing throughout the configuration file. The Fuseki configuration file is written in N3/Turtle.
Define the Fuseki server

[] rdf:type fuseki:Server ;
    # Timeout - server-wide default: milliseconds.
    # Format 1: "1000" -- 1 second timeout
    # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for the rest of the query.
    # See java doc for ARQ.queryTimeout
    ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000,60000" ] ;
    fuseki:services (
        <#service_VIVO_read_only>
    ) .
This section tells the Fuseki server which of the services defined later should be enabled. Services that are not listed here will be ignored, even if they are defined later in the file.
Define the connection libraries

# SDB
[] ja:loadClass "net.rootdev.fusekisdbconnect.SDBConnect" .
jumble:SDBConnect rdfs:subClassOf ja:RDFDataset .
This section specifically defines the connection classes you will be using. The one needed for VIVO 1.2+ will be SDB.
Define the service

<#service_VIVO_read_only> rdf:type fuseki:Service ;
    rdfs:label                        "UF VIVO Service (R)" ;
    fuseki:name                       "VIVO" ;
    fuseki:serviceQuery               "query" ;
    fuseki:serviceQuery               "sparql" ;
    fuseki:serviceUpdate              "update" ;
    fuseki:serviceUpload              "upload" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    # A separate read-only graph store endpoint:
    fuseki:serviceReadGraphStore      "get" ;
    fuseki:dataset                    <#ufvivo_dataset_read> ;
    .
This section defines the name of the service, and the different functionality that this service will provide. It also has a link to the dataset that is backing this service.
Define the dataset

<#ufvivo_dataset_read> rdf:type sdb:DatasetStore ;
    sdb:store <#VIVOStore>
    .
Here the dataset that will be served by your services is defined. You can add named graphs if you want to restrict access to a specific graph. There is also a link to the actual store in which this data resides.
Define the data store

<#VIVOStore> rdf:type jumble:SDBConnect ;
    rdfs:label               "UF VIVO SDB Store" ;
    sdb:layout               "layout2" ;
    jumble:defaultUnionGraph "true" ;
    sdb:engine               "InnoDB" ;
    sdb:connection
    [ rdf:type sdb:SDBConnection ;
      sdb:sdbHost     "localhost" ;
      sdb:sdbType     "mysql" ;
      sdb:sdbName     "vitrodb" ;
      sdb:sdbUser     "vitro" ;
      sdb:sdbPassword "vitro123" ;
      sdb:driver      "com.mysql.jdbc.Driver" ;
    ]
    .
We define the actual database connection information required to allow the service to query the database. Here we are assuming you are using MySQL; other servers may be configured differently.
Create Fuseki launch script

Create a new file in the /usr/local/fuseki directory
nano /usr/local/fuseki/launchFuseki.sh
This file will:
Set some environment variables
Execute the Java jar file for Fuseki
Output results to a log
Define the environment

#!/bin/bash
export FusekiInstallDir=/usr/local/fuseki
export FusekiPort=3030
export FusekiJVMArgs="-cp $FusekiInstallDir/fuseki-server.jar:$FusekiInstallDir/lib/* -Xmx1200M"
export Date=`date +%Y-%m-%d`
export FusekiLogFile=$FusekiInstallDir/FusekiLog-$Date.log
export FusekiConfigFile=$FusekiInstallDir/fuseki-vivo.ttl
export FusekiServiceName=/VIVO
These items are needed in order to properly call the remainder of the tasks associated with initiating Fuseki.
Initiate Java & Fuseki

# Check to see if logfile exists
if [ ! -f "$FusekiLogFile" ]; then
    touch "$FusekiLogFile"
fi
# Check to see if config file exists
if [ ! -f "$FusekiConfigFile" ]; then
    echo "ERROR - Fuseki failed to start - no configuration file - $FusekiConfigFile" >> "$FusekiLogFile"
    exit 1
fi
# Execute Java calling the package for Fuseki
# ($FusekiJVMArgs is left unquoted so that it splits into separate JVM arguments)
java $FusekiJVMArgs org.apache.jena.fuseki.FusekiCmd --desc "$FusekiConfigFile" --port=$FusekiPort $FusekiServiceName >> "$FusekiLogFile" 2>&1 &
We do some basic checks and then start the Fuseki server, passing it the configuration needed.
Get Fuseki started

Change permissions on launchFuseki.sh to allow for execution
chmod 755 launchFuseki.sh
Run launchFuseki.sh
./launchFuseki.sh
Tail the log to ensure that all is running correctly
tail -f FusekiLog-$(date +%Y-%m-%d).log
Last line should appear as:
17:42:24 INFO Server :: Started 2012/05/08 17:42:24 EDT on port 3030
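Instead of watching the log by hand, the startup check could be scripted. The sketch below polls the log for the "Started" line shown in the sample output; the function name, the default 30-second timeout, and the one-second polling interval are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: wait until the Fuseki log reports that the server has started.
# The ":: Started" text is taken from the sample log line; the timeout and
# polling interval are arbitrary assumptions.
wait_for_fuseki_start() {
    local log_file="$1"
    local timeout_secs="${2:-30}"
    local waited=0
    while [ "$waited" -lt "$timeout_secs" ]; do
        if grep -q ':: Started' "$log_file" 2>/dev/null; then
            return 0    # startup line found
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1            # gave up: no startup line within the timeout
}
```

For example, `wait_for_fuseki_start /usr/local/fuseki/FusekiLog-$(date +%Y-%m-%d).log 60 || echo "Fuseki did not start"` could follow the call to launchFuseki.sh.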
Test your Fuseki
Go to http://www.example.com:3030
Select Control Panel from the Server Management area
Select /VIVO from the dropdown that appears, and click Select
Let's enter a SPARQL query to test:
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://aims.fao.org/aos/geopolitical.owl#>
PREFIX core: <http://vivoweb.org/ontology/core#>
#
# This example query gets 50 geographic locations
# and (if available) their labels
#
SELECT ?countryName ?iso3
WHERE {
?country rdf:type core:Country
OPTIONAL { ?country geo:nameListEN ?countryName }
OPTIONAL { ?country geo:codeISO3 ?iso3 }
}
LIMIT 50
Select Text from the Output dropdown
Click Get Results
If the result returned 50 lines, then you now have a working endpoint. CONGRATULATIONS!
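The same test can also be run from the command line, which is closer to how other applications would use the endpoint. This is a minimal sketch assuming curl is installed and that the query service is reachable at /VIVO/query, as implied by the service name and the "query" setting in the configuration; www.example.com is a placeholder host and the function name is hypothetical.

```shell
#!/bin/bash
# Sketch only: send a SPARQL query to the endpoint over HTTP with curl.
# Assumes the service is named /VIVO and exposes "query", as configured,
# and that curl is installed. www.example.com is a placeholder host.
run_sparql_query() {
    local endpoint="$1"   # e.g. http://www.example.com:3030/VIVO/query
    local query="$2"      # SPARQL query text
    local accept="${3:-application/sparql-results+json}"
    # -G sends the URL-encoded "query" parameter as part of a GET request
    curl -s -G "$endpoint" \
        --data-urlencode "query=$query" \
        -H "Accept: $accept"
}

# Example call (commented out so the sketch has no side effects when sourced):
# run_sparql_query "http://www.example.com:3030/VIVO/query" \
#     "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
```

The third argument selects the result format through the HTTP Accept header; JSON is used as the default here only because it is convenient for other programs to parse.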
Securing Fuseki
Basic - Firewall
The easiest method of protecting your SPARQL endpoint would be a firewall
You can block access to the specific ports that Fuseki is running on
This is more akin to using a machete when a scalpel might be better suited
Works well if you have no interest in sharing data with the outside world
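As one illustration of the firewall approach, the sketch below prints a pair of iptables rules that let a single trusted network reach the Fuseki port and drop all other traffic to it. It assumes a Linux host using iptables; the function name and the 10.0.0.0/8 range are made-up examples. The rules are only printed, so they can be reviewed before anyone applies them.

```shell
#!/bin/bash
# Sketch only: print iptables rules that allow one trusted network to reach
# the Fuseki port and drop everyone else. The rules are printed, not applied;
# the network range is a made-up example.
print_fuseki_firewall_rules() {
    local port="$1"           # port Fuseki listens on, e.g. 3030
    local trusted_cidr="$2"   # network allowed to query, e.g. 10.0.0.0/8
    echo "iptables -A INPUT -p tcp --dport ${port} -s ${trusted_cidr} -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport ${port} -j DROP"
}

print_fuseki_firewall_rules 3030 10.0.0.0/8
```

If a web server on the same host proxies requests to Fuseki, loopback traffic to the port would need its own ACCEPT rule ahead of the DROP rule.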
Intermediate - Fuseki Config
If you want people to be able to read data, but not update data, through your endpoint, the Fuseki config file is a good start.
If you do not define an update service, no one will be able to update your dataset, PERIOD.
Even if you happen to leave in the update configuration, unless you start the Fuseki server with --update it will not allow updates to happen either.
Intermediate level of configuration, although the controls are still pretty broad: on or off
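Before exposing an endpoint, it may help to check which write-capable services a configuration file still defines. The sketch below is a hypothetical helper, not part of Fuseki: it simply greps a config file for the update, upload, and read-write graph store properties used in the service definition shown earlier, ignoring commented-out lines.

```shell
#!/bin/bash
# Sketch only: list the write-capable service properties in a Fuseki config,
# so they can be removed for a read-only endpoint. The property names are the
# ones used in the service definition in this deck.
list_write_services() {
    local config_file="$1"
    # Skip comment lines, then match update, upload, and read-write graph store
    grep -v '^[[:space:]]*#' "$config_file" \
        | grep -E 'fuseki:service(Update|Upload|ReadWriteGraphStore)'
}
```

The function exits with a non-zero status when nothing matches, so `list_write_services fuseki-vivo.ttl || echo "no write services defined"` reports a clean read-only configuration.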
Advanced - Fuseki Partitions
Partition into 2+ separate Fuseki configs that allow different levels of access and/or access to different datasets.
Grant access to the different Fuseki servers based upon the ports being used.
Also possibly add authentication at this point, to allow for some sort of external authentication.
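One way to picture the partitioning is as two Fuseki instances, each with its own config file and port. The sketch below only builds and prints the launch commands, reusing the command line from the launch script; the function name, the config file names fuseki-public.ttl and fuseki-internal.ttl, and port 3031 are hypothetical.

```shell
#!/bin/bash
# Sketch only: build the launch command for one Fuseki instance, following the
# launch script in this deck. Config names and ports below are hypothetical.
fuseki_launch_command() {
    local install_dir="$1"   # e.g. /usr/local/fuseki
    local config_file="$2"   # Turtle config for this instance
    local port="$3"          # port for this instance
    local service_name="$4"  # e.g. /VIVO
    echo "java -cp ${install_dir}/fuseki-server.jar:${install_dir}/lib/* -Xmx1200M" \
         "org.apache.jena.fuseki.FusekiCmd --desc ${config_file}" \
         "--port=${port} ${service_name}"
}

# A public read-only instance and an internal instance on separate ports;
# a firewall can then expose only the public port.
fuseki_launch_command /usr/local/fuseki /usr/local/fuseki/fuseki-public.ttl 3030 /VIVO
fuseki_launch_command /usr/local/fuseki /usr/local/fuseki/fuseki-internal.ttl 3031 /VIVO
```

Keeping each instance on its own port lets the port-based access rules do the partitioning, with authentication layered on top where needed.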
Questions?
