Solr on OS X El Capitan

 

STEP 1: On OS X, Solr can be installed with Homebrew:

$ brew install solr


 

STEP 2: To launch Solr, run the following (from inside the bin directory if solr is not on your PATH):

$ solr start -e cloud -noprompt


 

STEP 3: Then open http://localhost:8983/solr in a browser. You will see the Solr Admin UI.


 

STEP 4: INDEXING DATA – The Solr server is now up and running, but it doesn’t contain any data yet. The Solr bin directory includes the post tool, which makes it easy to get various types of documents into Solr. We’ll use it to index the example documents as shown below (brew info solr shows where Homebrew installed Solr).

$ brew info solr
$ cd /usr/local/Cellar/solr/6.1.0/bin
$ post -c gettingstarted /usr/local/Cellar/solr/6.1.0/example/exampledocs/


The command-line breaks down as follows:

  • -c gettingstarted: name of the collection to index into
  • /usr/local/Cellar/solr/6.1.0/example/exampledocs/: the absolute path of the directory to be indexed

NOTE: You can browse the documents indexed at http://localhost:8983/solr/gettingstarted/browse.

Indexing a CSV: 

$ post -c gettingstarted /usr/local/Cellar/solr/6.1.0/example/exampledocs/books.csv

Other indexing techniques: 

  • Import records from a database using the Data Import Handler (DIH).
  • Use SolrJ from JVM-based languages or other Solr clients to programmatically create documents to send to Solr.
  • Use the Admin UI core-specific Documents tab to paste in a document to be indexed, or select Document Builder from the Document Type dropdown to build a document.


Updating Data: You will notice that even if you index content more than once, the results are not duplicated. This is because the example schema.xml specifies a “uniqueKey” field called “id”. Whenever you POST commands to Solr to add a document with the same uniqueKey value as an existing document, Solr automatically replaces the existing one. You can confirm this by looking at the values for numDocs and maxDoc in the core-specific Overview section of the Solr Admin UI.

numDocs is the number of searchable documents in the index (it will be larger than the number of XML, JSON, or CSV files, since some files contain more than one document). maxDoc may be larger still, because it also counts logically deleted documents that have not yet been physically removed from the index. You can re-post the sample files as often as you like and numDocs will never increase, because the new documents keep replacing the old ones. Go ahead and edit any of the existing example data files, change some of the data, and re-run the post command. You’ll see your changes reflected in subsequent searches.
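One way to check this from the command line is to ask Solr only for the number of matching documents. The sketch below is an illustration, not part of the original walkthrough: the helper name count_url and the SOLR_URL variable are made up for this example, and it assumes Solr is listening on the default localhost:8983. Setting rows=0 asks for no documents, so the response carries just the numFound count.

```shell
# Base URL of the Solr server (assumption: default local install).
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr}

# Print a select URL that matches all documents in a collection
# but returns none of them (rows=0), so only numFound is reported.
# usage: count_url COLLECTION
count_url() (
  printf '%s/%s/select?q=*:*&rows=0&wt=json\n' "$SOLR_URL" "$1"
)
```

Running curl "$(count_url gettingstarted)" before and after re-posting the same files should report the same numFound.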

Deleting Data: You can delete data by POSTing a delete command to the update URL and specifying the value of the document’s unique key field, or a query that matches multiple documents. Execute the following command to delete a specific document:

$ bin/post -c gettingstarted -d "<delete><id>SP2514N</id></delete>"
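The paragraph above mentions deleting by unique key or by query. As a rough sketch, the two XML bodies can be built with small helpers. The function names delete_by_id and delete_by_query are hypothetical, and the values are inserted as-is, so this assumes they contain no characters that need XML escaping (such as < or &).

```shell
# Build the XML body that deletes one document by its uniqueKey value.
# usage: delete_by_id ID
delete_by_id() {
  printf '<delete><id>%s</id></delete>\n' "$1"
}

# Build the XML body that deletes every document matching a query.
# usage: delete_by_query QUERY
delete_by_query() {
  printf '<delete><query>%s</query></delete>\n' "$1"
}
```

For example, bin/post -c gettingstarted -d "$(delete_by_query 'name:foundation')" would remove all documents whose name field matches foundation.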

Searching: Solr can be queried via REST clients, cURL, wget, Chrome POSTMAN, etc., as well as via the native clients available for many programming languages. The Solr Admin UI includes a query builder interface – see the gettingstarted query tab at http://localhost:8983/solr/#/gettingstarted_shard1_replica1/query. If you click the Execute Query button without changing anything in the form, you’ll get 10 documents in JSON format (*:* in the q param matches all documents):

[Screenshot: Solr Quick Start, gettingstarted Query tab]

The URL sent by the Admin UI to Solr is shown in light grey near the top right of the above screenshot – if you click on it, your browser will show you the raw response. To use cURL, give the same URL in quotes on the curl command line:

$ curl "http://localhost:8983/solr/gettingstarted/select?q=*%3A*&wt=json&indent=true"

In the above URL, the “:” in “q=*:*” has been URL-encoded as “%3A”, but since “:” has no reserved purpose in the query component of the URL (after the “?”), you don’t need to URL-encode it. So the following also works:

$ curl "http://localhost:8983/solr/gettingstarted/select?q=*:*&wt=json&indent=true"

 

Basics: Search for a single term. To search for a term, give it as the q param value: in the core-specific Solr Admin UI Query section, replace *:* with the term you want to find. To search for “foundation”:

$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation"

You’ll see:


$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation"
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "indent":"true",
      "q":"foundation",
      "wt":"json"}},
  "response":{"numFound":2812,"start":0,"docs":[
      {
        "id":"0553293354",
        "cat":["book"],
        "name":"Foundation",
...

The response indicates that there are 2812 hits ("numFound":2812), of which the first 10 were returned, since by default start=0 and rows=10. You can specify these params to page through results, where start is the (zero-based) position of the first result to return, and rows is the page size.
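To make the start/rows arithmetic concrete, here is a minimal sketch that turns a 1-based page number into the matching start offset. The helper name page_url and the SOLR_URL variable are assumptions made for this example, and the query is expected to be URL-encoded already.

```shell
# Base URL of the Solr server (assumption: default local install).
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr}

# Print the select URL for a given 1-based page of results.
# usage: page_url COLLECTION QUERY PAGE [ROWS]   (ROWS defaults to 10)
page_url() (
  collection=$1
  query=$2
  page=$3
  rows=${4:-10}
  # start is zero-based: page 1 starts at 0, page 2 at rows, and so on.
  start=$(( (page - 1) * rows ))
  printf '%s/%s/select?wt=json&indent=true&q=%s&start=%d&rows=%d\n' \
    "$SOLR_URL" "$collection" "$query" "$start" "$rows"
)
```

For example, curl "$(page_url gettingstarted foundation 2)" requests results 11 through 20 (start=10, rows=10).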

To restrict fields returned in the response, use the fl param, which takes a comma-separated list of field names. E.g. to only return the id field:

$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation&fl=id"

q=foundation matches nearly all of the docs we’ve indexed, since most of the files under docs/ contain “The Apache Software Foundation”. To restrict the search to a particular field, use the syntax “q=field:value”, e.g. to search for foundation only in the name field:

$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation"

The above request returns only one document ("numFound":1) – from the response:

...
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"0553293354",
        "cat":["book"],
        "name":"Foundation",
...

 

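Phrase search: To search for a multi-term phrase, enclose it in double quotes. In a URL, the space between the terms is encoded as “+”. To search for “CAS latency”:

$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=\"CAS+latency\""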

You’ll get back:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "indent":"true",
      "q":"\"CAS latency\"",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"VDBDB1A16",
        "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM",
        "manu":"A-DATA Technology Inc.",
        "manu_id_s":"corsair",
        "cat":["electronics", "memory"],
        "features":["CAS latency 3,\t 2.7v"],
...

 

Combining searches: By default, when you search for multiple terms and/or phrases in a single query, Solr only requires that one of them is present for a document to match. Documents containing more of the terms are sorted higher in the results list. You can require that a term or phrase is present by prefixing it with a “+”; conversely, to disallow a term or phrase, prefix it with a “-”. To find documents that contain both terms “one” and “three”, enter +one +three in the q param in the core-specific Admin UI Query tab. Because the “+” character has a reserved purpose in URLs (encoding the space character), you must URL-encode it for curl as “%2B”:

$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=%2Bone+%2Bthree"

To search for documents that contain the term “two” but don’t contain the term “one”, enter +two -one in the q param in the Admin UI. Again, URL-encode “+” as “%2B”:

$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=%2Btwo+-one"
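Encoding queries by hand gets tedious, so a small helper can do it. The following is only a sketch: the function name urlencode is made up for this example, it assumes plain ASCII input, and it follows the convention used above of writing a space as “+” and a literal “+” as “%2B”.

```shell
# Percent-encode a query value for the query string of a URL.
# Assumes ASCII input. Letters, digits and . _ ~ * : - pass through,
# a space becomes "+", everything else becomes %XX.
# usage: urlencode STRING
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~*:-]) out=$out$c ;;
      ' ') out=$out+ ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
)
```

With this in place, the query above could be written as curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=$(urlencode '+two -one')".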

For more Solr search options, see the Solr Reference Guide’s Searching section.

 

Spatial: Solr has sophisticated geospatial support, including searching within a specified distance range of a given location (or within a bounding box), sorting by distance, or even boosting results by distance. Some of the example tech products documents in example/exampledocs/*.xml have locations associated with them to illustrate the spatial capabilities. To run the tech products example, see the tech products example section. To learn more about Solr’s spatial capabilities, see the Solr Reference Guide’s Spatial Search section.

Here are all the commands that we have run so far in this post:

date ;
bin/solr start -e cloud -noprompt ;
  open http://localhost:8983/solr ;
  bin/post -c gettingstarted docs/ ;
  open http://localhost:8983/solr/gettingstarted/browse ;
  bin/post -c gettingstarted example/exampledocs/*.xml ;
  bin/post -c gettingstarted example/exampledocs/books.json ;
  bin/post -c gettingstarted example/exampledocs/books.csv ;
  bin/post -c gettingstarted -d "<delete><id>SP2514N</id></delete>" ;
bin/solr healthcheck -c gettingstarted ;
date ;

Cleanup: As you work through this guide, you may want to stop Solr and reset the environment back to the starting point. The following command line will stop Solr and remove the directories for each of the two nodes that the start script created:

$ bin/solr stop -all ; rm -Rf example/cloud/

For more information on Solr, check out the following:

 
