Date: 2013-01-25 08:44:41
From: katja.pfeifer01@sap.com
Hello,
I have a question about using the Lucene index.
Assume the data in the RDF store is as follows:
ns:Object1 rdf:type ns:Type1 .
ns:Object2 rdf:type ns:Type1 .
ns:Object3 rdf:type ns:Type1 .
...
ns:Object1 ns:prop1 "v11" .
ns:Object1 ns:prop2 "v12" .
ns:Object2 ns:prop1 "v21" .
ns:Object2 ns:prop2 "v22" .
ns:Object3 ns:prop1 "v31" .
ns:Object3 ns:prop2 "v32" .
...
I would like to have an index for searching only within the values of the property ns:prop1 (i.e., "v11", "v21", "v31", ...). I thought that setting the parameter luc:includePredicates to ns:prop1 would restrict the index to the values of that property. Unfortunately, this does not work: the resulting index contained all literals regardless of their property (i.e., "v11", "v12", ...).
Here are the ASK statements I executed. Where ns:prop1 appears below, I actually used the complete URI, since I read that prefixed names do not work here:
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:include luc:setParam " literal" . }
ASK { luc:includePredicates luc:setParam "ns:prop1" . }
ASK { luc:index luc:setParam "literal " . }
ASK { luc:myIndex luc:createIndex "true" . }
Afterwards (all ASK statements returned true), I wanted to get all values of ns:prop1 matching a certain search string (e.g., "v*") by executing:
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?s
WHERE {
?s luc:myIndex "v*" .
}
Best regards,
Katja

asked 03 Apr '13, 12:21



Date: 2013-01-25 11:21:44
From: damyan@sirma.bg
Hi Katja,
There is no direct way to index only those literals that appear as objects of statements with a certain set of predicates.
However, you could index the subjects instead, by building a molecule of size 1 that consists only of literals and is built by following only that exact predicate.
I will give you an example using the same data from your email.
First, insert the data:
PREFIX ns:<http://example.org/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT DATA {
ns:Object1 rdf:type ns:Type1 .
ns:Object2 rdf:type ns:Type1 .
ns:Object3 rdf:type ns:Type1 .
ns:Object1 ns:prop1 "v11" .
ns:Object1 ns:prop2 "v12" .
ns:Object2 ns:prop1 "v21" .
ns:Object2 ns:prop2 "v22" .
ns:Object3 ns:prop1 "v31" .
ns:Object3 ns:prop2 "v32" .
}
Then prepare and create the index by choosing:
- a molecule of size 1 (one-hop traversal from the originating node),
- only URIs as the nodes to be indexed,
- only ns:prop1 as the predicate to be traversed,
- only nodes of type "literal" to be included in the molecule (neither bNodes nor URIs).
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
INSERT DATA {
luc:moleculeSize luc:setParam "1" .
luc:index luc:setParam "uris" .
luc:includePredicates luc:setParam "http://example.org/prop1" .
luc:include luc:setParam " literal" .
luc:myIndex luc:createIndex "true" .
}
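The effect of these settings can be pictured with a toy Python model. It is a simplified sketch, not OWLIM's implementation: it assumes that the molecule of each indexed node is the node itself plus, for size 1, the objects reached over the allowed predicates, and that only literals contribute searchable text (as with luc:include "literal"). The function build_molecules and its parameter values are hypothetical. The model also shows why the settings in the question indexed every literal: assuming the documented default luc:moleculeSize of 0, no predicate is ever traversed, so luc:includePredicates has no effect.

```python
from typing import NamedTuple


class Lit(NamedTuple):
    """A literal node; plain Python strings stand for URIs."""
    value: str


# The example data from the thread; prefixed names stand in for full URIs.
TRIPLES = [
    ("ns:Object1", "rdf:type", "ns:Type1"),
    ("ns:Object2", "rdf:type", "ns:Type1"),
    ("ns:Object3", "rdf:type", "ns:Type1"),
    ("ns:Object1", "ns:prop1", Lit("v11")),
    ("ns:Object1", "ns:prop2", Lit("v12")),
    ("ns:Object2", "ns:prop1", Lit("v21")),
    ("ns:Object2", "ns:prop2", Lit("v22")),
    ("ns:Object3", "ns:prop1", Lit("v31")),
    ("ns:Object3", "ns:prop2", Lit("v32")),
]


def build_molecules(triples, index, molecule_size=0, include_predicates=None):
    """Hypothetical toy model of molecule-based indexing (not OWLIM code).

    index: "uris" or "literals", the kind of node that gets an index entry.
    molecule_size: 0 keeps only the node itself; 1 adds its one-hop objects.
    include_predicates: if given, only these predicates are traversed.
    Only literals contribute text, mirroring luc:include "literal".
    """
    kind = str if index == "uris" else Lit
    nodes = {n for s, _, o in triples for n in (s, o) if isinstance(n, kind)}
    molecules = {}
    for node in nodes:
        members = [node]
        if molecule_size >= 1:
            # One hop: follow outgoing statements, restricted to allowed predicates.
            members += [o for s, p, o in triples
                        if s == node and (include_predicates is None
                                          or p in include_predicates)]
        text = sorted(m.value for m in members if isinstance(m, Lit))
        if text:  # nodes whose molecule has no literal text are not indexed
            molecules[node] = text
    return molecules


# Settings from this answer: size 1, index URIs, traverse only ns:prop1.
print(sorted(build_molecules(TRIPLES, "uris", 1, {"ns:prop1"}).items()))
# [('ns:Object1', ['v11']), ('ns:Object2', ['v21']), ('ns:Object3', ['v31'])]

# Settings from the question: size left at 0, so includePredicates is never used.
literal_index = build_molecules(TRIPLES, "literals", 0, {"ns:prop1"})
print(sorted(t for texts in literal_index.values() for t in texts))
# ['v11', 'v12', 'v21', 'v22', 'v31', 'v32']
```

In this model each subject ends up with exactly one short document containing only its ns:prop1 literal, which is what lets the Lucene query select subjects by that property alone.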
The same kind of query should then return only those nodes that are subjects of a statement with ns:prop1 and whose literals match the Lucene query, for instance:
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
select * where {
?s luc:myIndex "v?1" .
}
should return  [ns:Object1, ns:Object2 and ns:Object3]
while this one
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
select * where {
?s luc:myIndex "v?2" .
}
should return an empty result set.
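The expected results can be checked against a small Python sketch of wildcard matching. It assumes the index holds one document per subject whose text is only its ns:prop1 literal (what the configuration above is meant to produce) and approximates Lucene wildcards with the standard library's fnmatchcase; the INDEX dictionary and the search function are hypothetical illustrations, not OWLIM API.

```python
from fnmatch import fnmatchcase

# Assumed index contents: one document per subject, ns:prop1 text only.
INDEX = {
    "ns:Object1": ["v11"],
    "ns:Object2": ["v21"],
    "ns:Object3": ["v31"],
}


def search(index, pattern):
    """Return the subjects whose document text matches a wildcard pattern.

    Like Lucene wildcards, fnmatchcase treats ? as exactly one character and
    * as any run of characters. Unlike Lucene, it also gives [...] a special
    meaning and performs no text analysis, so this is only an approximation.
    """
    return sorted(subject for subject, texts in index.items()
                  if any(fnmatchcase(text, pattern) for text in texts))


print(search(INDEX, "v?1"))  # ['ns:Object1', 'ns:Object2', 'ns:Object3']
print(search(INDEX, "v?2"))  # []
```

"v?2" finds nothing because the ns:prop2 literals ("v12", "v22", "v32") were never traversed into any molecule.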
HTH

answered 03 Apr '13, 12:21


From: katja.pfeifer01@sap.com
Hi Damyan,
Thanks for the answer. It works, but it does not really fit our use case.
We have a lot of data and need a fast way to get all available values of a certain predicate (often there is only a manageable number of distinct values). In numbers: we have, for example, 2,500,000 triples with the predicate ns:prop1, but only 20 different values. With the approach you proposed, retrieving these few values (or even fewer, once a search string narrows them down) blows up the amount of data that has to be traversed and slows down the response time. I used the following query to retrieve the distinct values:
PREFIX ns: <http://example.org/>
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
select ?l where {
?s ns:prop1 ?l .
?s luc:myIndex "*" .
}
Clearly, this approach does not work for us. Is there another way to quickly retrieve all possible values of a predicate in OWLIM (ideally in alphabetical order)? Are there other indexing options that could be used?
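The size mismatch described above can be made concrete with a scaled-down toy calculation in Python. The data is hypothetical (2,500 subjects instead of 2,500,000, still 20 distinct values) and the sketch ignores how OWLIM actually evaluates the query; it only contrasts the number of subject hits that a "*" search over the subject index yields with the sorted list of distinct values that is actually wanted. At full scale the ratio is 2,500,000 / 20 = 125,000 hits per distinct value.

```python
# Hypothetical data, scaled down 1000x from the numbers above:
# 2,500 subjects share only 20 distinct ns:prop1 values.
N_SUBJECTS = 2_500
N_VALUES = 20
triples = [(f"ns:Object{i}", "ns:prop1", f"value{i % N_VALUES:02d}")
           for i in range(N_SUBJECTS)]

# A "*" search over the subject index yields one hit per subject ...
subject_hits = [s for s, p, _ in triples if p == "ns:prop1"]
# ... while the result actually needed is the small, sorted set of values.
distinct_values = sorted({o for _, p, o in triples if p == "ns:prop1"})

print(len(subject_hits), len(distinct_values))    # 2500 20
print(len(subject_hits) // len(distinct_values))  # 125
print(distinct_values[:3])  # ['value00', 'value01', 'value02']
```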
Regards,
Katja

answered 03 Apr '13, 12:21




Asked: 03 Apr '13, 12:21

Seen: 2,021 times

Last updated: 21 Dec '13, 01:22
