This is the second part of my tutorial about querying options useful for searching Exif data. You can find the first one here.
In this post we’ll play with DateTime fields, learn how to display only selected fields, and use Fuzzy Search to find a device model whose name we don’t know or remember exactly. In the final section, I’ll explain the main difference between the q and fq parameters.
DateTime ranges
When it comes to narrowing results by range, we should also talk about the Solr type that stores date and time information. MD provides three fields of DateRangeField type, extracted from particular directories:
- Exif IFD0 directory – md_exif_ifd0_datetime
- Exif SubIFD directory – md_exif_subifd_datetime_original
- ICC Profile directory – md_icc_profile_profile_datetime
You can filter documents by date and time using general limits like a year:
md_exif_ifd0_datetime:[2001 TO 2002]
or months:
md_exif_ifd0_datetime:[2001-04 TO 2002-02]
The same works for a day, hour, minute and second. You can also use a more precise value for only one end of the range. The example below uses a full date-time lower limit and a more general upper limit:
md_exif_ifd0_datetime:[2001-05-01T18:22:42 TO 2002]
The left part consists of the date and (after T) the time. You can use exclusive ranges as well. There is also one interesting thing about the DateTime format: if you need the current time, you can use the NOW keyword, like this:
md_exif_ifd0_datetime:[2001-05-01 TO NOW]
Read more information about date time queries here.
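The range clauses above all follow the same pattern, so they are easy to generate. Here is a minimal Python sketch of that pattern; the helper name date_range and its arguments are my own, not part of Solr. It assumes the usual Solr range syntax, where square brackets mean an inclusive bound and curly braces an exclusive one.

```python
def date_range(field, start, end, include_start=True, include_end=True):
    """Build a Solr range clause such as field:[2001 TO 2002].

    Hypothetical helper: square brackets make a bound inclusive,
    curly braces make it exclusive.
    """
    left = "[" if include_start else "{"
    right = "]" if include_end else "}"
    return f"{field}:{left}{start} TO {end}{right}"


# Full date-time lower limit, more general upper limit
print(date_range("md_exif_ifd0_datetime", "2001-05-01T18:22:42", "2002"))

# Everything from a given day up to (but excluding) the current time
print(date_range("md_exif_ifd0_datetime", "2001-05-01", "NOW", include_end=False))
```

The resulting string can be pasted into the q or fq field as it is.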
Dealing with Date/Time Facets
In the post explaining Exif data indexing, you were shown how to use the field md_exif_ifd0_model with faceting to count occurrences of the same value. If you want to gather some statistics, say how many photos were taken with an “iPhone 6” in particular months of 2001 (you can of course add more complex criteria), use the following query:
q=md_exif_ifd0_model:("iPhone 6")
and in Raw Query Parameters add the following:
facet.range=md_exif_ifd0_datetime&facet.range.gap=+1MONTH&facet.range.start=2001-01-01T00:00:00Z&facet.range.end=NOW/YEAR-18YEAR
Let’s split it into a list of particular params to understand it better:
- As we don’t want to count by each different second, we have to define a range (the size of the buckets) for our statistics. First, set the field to facet on: facet.range=md_exif_ifd0_datetime
- Define the gap, which is the size of a bucket/bin: facet.range.gap=+1MONTH
- Set the start time: facet.range.start=2001-01-01T00:00:00Z. You can use a fixed value or a date math expression, as for facet.range.end (below).
- Set the end time: facet.range.end=NOW/YEAR-18YEAR. It means the current time rounded down to the start of the year, minus 18 years.
You should have such results:
Read more about Range Faceting in the official Solr guide.
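If you send the request from code instead of the admin UI, the parameters need to be URL-encoded. The Python sketch below shows one way to do it under a few assumptions: the helper range_facet_params is my own name, and facet=true is added because faceting has to be enabled in the request (in the admin UI the facet checkbox does that for you). The important detail is the gap: a literal + in a URL is decoded as a space, so +1MONTH has to be sent as %2B1MONTH.

```python
from urllib.parse import urlencode


def range_facet_params(query, field, start, end, gap):
    """Collect the parameters of a range facet request (hypothetical helper)."""
    return {
        "q": query,
        "facet": "true",  # assumption: faceting is switched on in the request itself
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
    }


params = range_facet_params(
    'md_exif_ifd0_model:("iPhone 6")',
    "md_exif_ifd0_datetime",
    "2001-01-01T00:00:00Z",
    "NOW/YEAR-18YEAR",
    "+1MONTH",
)

# urlencode escapes the plus sign, the slash, quotes and spaces for us
query_string = urlencode(params)
print(query_string)
```

Append the printed string after the ? of your core's select URL.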
Specifying list of fields for results
As you have probably noticed, some images contain many meta tags. If you want to look through hundreds of results and check only a specific field, that’s not easy. You can set a list of fields to be displayed (instead of the whole document). To do that, pass all field names, separated with commas, to the fl parameter as shown below:
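As an illustration, here is a small Python sketch that builds such a request. The helper select_params is hypothetical, and the id field is an assumption about your schema; the two md_ fields are the ones used earlier in this post.

```python
from urllib.parse import urlencode


def select_params(query, fields):
    """Return query parameters that limit the response to the given fields."""
    return {"q": query, "fl": ",".join(fields)}


params = select_params(
    "md_exif_ifd0_datetime:[2001 TO 2002]",
    ["id", "md_exif_ifd0_model", "md_exif_ifd0_datetime"],  # "id" is an assumed field name
)

print(params["fl"])       # the comma-separated field list
print(urlencode(params))  # the same parameters, ready to append to a select URL
```

Each document in the response then contains only the listed fields.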
Fuzzy Search
Suppose you don’t remember the exact name of a device model but want to find a picture taken with it. You remember it was “Canon EOS 25D” or something like that. In such a case you can use the following query:
md_exif_ifd0_model:("Canon EOS" AND 25D~2)
It means that Solr will search for “Canon EOS” as an exact phrase and for 25D, but the last term may differ by up to 2 edits, so results such as 35D, 20D or 60D will match as well. It’s not a mathematical subtraction, just the maximum number of changes (inserting, deleting, replacing or swapping a character) that can be made to a word to produce another one (Damerau-Levenshtein distance).
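To get a feeling for what counts as an edit, you can compute the distance yourself. The Python sketch below implements the optimal string alignment variant of the Damerau-Levenshtein distance. It only illustrates the idea and is not Solr's own implementation; the function name edit_distance is mine.

```python
def edit_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant).

    Counts insertions, deletions, substitutions and swaps of two
    adjacent characters needed to turn a into b.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # swap of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for candidate in ["35D", "20D", "52D", "60D", "400D"]:
    distance = edit_distance("25D", candidate)
    print(candidate, distance, "within ~2" if distance <= 2 else "too far for ~2")
```

Candidates with a distance of 1 or 2 fall within 25D~2; anything needing three or more edits does not.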
Query vs. Filter Query (scoring)
In the previous post I put the spatial search query in the fq parameter instead of q. It’s time for a short explanation. When you put something in q, it’s not only used for filtering. Solr also has to work out the order of the results. To do that, each document gets a score (based on many things) and then all documents are sorted by it. Basically, thanks to the scoring mechanism you receive the more relevant results at the top of the list. It’s a complex topic and I won’t cover it today, but the most important point from the performance perspective is that scoring is an additional computation cost. If you put something in the q parameter, it will be considered in scoring (with some exceptions). The final algorithm depends on the parser settings and a couple of other things. If you just want to narrow the results using specific criteria, it’s better to put them in fq: they won’t be used in the scoring process, so the query will be faster.
Spatial search is quite a specific case, because if you really want it to affect the scores of particular documents (let’s say: “I want images with a geolocation closer to the point at the top of the results”), you also have to specify an additional parameter (score). Let’s modify the query from the previous post:
{!geofilt sfield=md_gps_md_location pt=52.507,13.285 d=5000 score=recipDistance}
We won’t go into details right now but if you want to learn more about it, read this page.
If you want to check the score of a particular document, you can add the score field to the fl parameter and it will be displayed in the results.
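Putting the pieces of this section together, a request could look like the Python sketch below. The split is my own example: the model query goes to q and is scored, while the date range and the spatial filter go to two separate fq parameters and only narrow the results. The id field is an assumption about the schema.

```python
from urllib.parse import urlencode

# A list of pairs is used instead of a dict because fq appears more than once
params = [
    ("q", 'md_exif_ifd0_model:("Canon EOS")'),   # scored part
    ("fq", "md_exif_ifd0_datetime:[2001 TO 2002]"),  # filter only, no scoring
    ("fq", "{!geofilt sfield=md_gps_md_location pt=52.507,13.285 d=5000}"),
    ("fl", "id,md_exif_ifd0_model,score"),       # "id" is an assumed field name
]

query_string = urlencode(params)
print(query_string)
```

Because score is listed in fl, each returned document shows the value calculated from the q part alone.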
Next steps
You were just shown more advanced search options, but they are only a subset of Solr’s great features. I hope you will find them useful in your research.
In the following posts we are going to focus more on the visual side of our results. Stay alert (follow me on Twitter – @jca3s)!