Wednesday, 21 May 2014

Using post.jar to post JSON, CSV and XML data to Solr

In my last few posts, I introduced the Apache Solr dashboard, showed how to post data through its dashboard screen, and gave several examples of doing so. With that approach I could post only one record at a time, and I could not post data from files holding records in different formats such as JSON, XML and CSV.

Agenda for this post

  1. How to post XML data in the form of an XML file using post.jar
  2. How to post CSV data in the form of a CSV file using post.jar
  3. How to post JSON data in the form of a JSON file using post.jar
The schema for this post is the same as in my last post:
http://versatileankur.blogspot.in/2014/05/how-to-query-to-apache-solr.html

How to post XML data in the form of an XML file using post.jar
Apache Solr comes with a built-in jar file, post.jar, for posting documents. This file is located at
<parent-directory>/solr-4.7.2/example/exampledocs
The exampledocs directory also contains many XML files for demo purposes.
To post XML documents with this jar file, first create an XML file with the following records.

<add>
<doc>
   <field name="id">Solr105</field>
   <field name="name">Solr 105</field>
   <field name="address">House No - 100, LR Apache, 40702</field>
   <field name="comments">Apache Solr comment 1</field>
   <field name="popularity">101</field>
   <field name="counts">1</field>    
</doc>
<doc>
   <field name="id">Solr106</field>
   <field name="name">Solr 106</field>
   <field name="address">House No - 100, LR Apache, 40702</field>
   <field name="comments">Apache Solr comment 2</field>
   <field name="popularity">100</field>
   <field name="counts">2</field>
   <field name="dynamicField_i">It is a dynamically generated field.</field>
</doc>
<doc>
   <field name="id">Solr107</field>
   <field name="name">Solr 107</field>
   <field name="address">House No - 100, LR Apache, 40702</field>
   <field name="comments">Apache Solr It's Cool.</field>
   <field name="popularity">109</field>
   <field name="counts">3</field>
   <field name="dynamicField_i">It is a dynamically generated field.</field>
</doc>
</add>

Save this file as dummy.xml in the <solr>/example/exampledocs directory.
Go to the exampledocs directory in a command prompt and execute:
java -jar post.jar dummy.xml

For multiple XML files, use:
java -jar post.jar dummy.xml dummy1.xml

For all XML files in the working directory, use:
java -jar post.jar *.xml

You will get a success message on the console like this:
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type application/xml..
POSTing file dummy.xml
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
Time spent: 0:00:00.547

This means your XML document has been indexed in Apache Solr. To check, go to your dashboard screen,
select collection1 -> Query and click the Execute Query button.
The results should include the records you just posted.
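
The same check can be scripted instead of clicking through the dashboard. Below is a minimal Python sketch, assuming the default example setup (Solr running on localhost:8983, a core named collection1 and the standard /select handler). The function names build_select_url and fetch_docs are my own and are not part of Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed defaults of the Solr example setup; adjust for your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "collection1"


def build_select_url(query="*:*", rows=10):
    """Build a /select URL that asks for JSON results."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{SOLR_BASE}/{CORE}/select?{params}"


def fetch_docs(query="*:*", rows=10):
    """Run the query against a running Solr and return (numFound, docs)."""
    with urlopen(build_select_url(query, rows)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]["numFound"], body["response"]["docs"]


if __name__ == "__main__":
    # Only prints the URL; call fetch_docs() when Solr is actually running.
    print(build_select_url("id:Solr105"))
```

Calling fetch_docs("id:Solr105") against a running instance should return the single record posted from dummy.xml, provided the post succeeded.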

Syntax of the XML file

<add></add> is the root element, i.e. the parent of all the records/entities.
<doc></doc> denotes one record/entity to be added to Apache Solr.
<field></field> denotes one property of a record/entity.

"All required fields mentioned in schema.xml must be present in every <doc> element in the file."

For example, if your second <doc></doc> element does not fulfil this restriction, the first record is updated and nothing is done with any of the other records in that file, i.e. after the exception Solr stops reading your document. So be careful with your required fields and with the documents you give to Apache Solr for updating.
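
Because one incomplete <doc> can stop the rest of the file from being indexed, it can help to check the file before posting it. Here is a small Python sketch of such a pre-check. The REQUIRED_FIELDS set is only an assumption for illustration (replace it with the fields marked required in your own schema.xml), and find_incomplete_docs is a helper name I made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical list: replace with the fields marked required="true"
# in your own schema.xml.
REQUIRED_FIELDS = {"id", "name"}


def find_incomplete_docs(xml_text, required=REQUIRED_FIELDS):
    """Return (position, missing_fields) for every <doc> lacking a required field."""
    root = ET.fromstring(xml_text)
    problems = []
    for position, doc in enumerate(root.iter("doc"), start=1):
        # Collect the name attribute of every <field> inside this <doc>.
        present = {field.get("name") for field in doc.findall("field")}
        missing = sorted(required - present)
        if missing:
            problems.append((position, missing))
    return problems


if __name__ == "__main__":
    sample = """<add>
      <doc><field name="id">Solr105</field><field name="name">Solr 105</field></doc>
      <doc><field name="id">Solr106</field></doc>
    </add>"""
    for position, missing in find_incomplete_docs(sample):
        print(f"doc #{position} is missing: {', '.join(missing)}")
```

To check a real file, pass its contents, for example find_incomplete_docs(open("dummy.xml").read()), and post the file only when the returned list is empty.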

How to post CSV data in the form of a CSV file using post.jar
First create a CSV file in the /example/exampledocs/ directory with these records:

id,name,address,comments,popularity,counts,dynamicField_i
"Solr110","Solr 110","House No - 100, LR Apache","Apache Solr comment 1",110,110,"dynamic solr 110"
"Solr111","Solr 111","House No - 100, LR Apache","Apache Solr comment 1",111,111,"dynamic solr 111"
"Solr112","Solr 112","House No - 100, LR Apache","Apache Solr comment 1",112,112,"dynamic solr 112"
"Solr113","Solr 113","House No - 100, LR Apache","Apache Solr comment 1",113,113,"dynamic solr 113"

Save this file as dummy.csv.
Go to the /example/exampledocs directory in a command prompt and execute:

java -Durl=http://localhost:8983/solr/update/csv -Dtype=text/csv -jar post.jar dummy.csv

For multiple CSV files, use:
java -Durl=http://localhost:8983/solr/update/csv -Dtype=text/csv -jar post.jar dummy.csv dummy1.csv

For all CSV files in the working directory, use:
java -Durl=http://localhost:8983/solr/update/csv -Dtype=text/csv -jar post.jar *.csv

You will get a success message on the console like this:
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update/csv using content-type text/csv..
POSTing file dummy.csv
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update/csv..
Time spent: 0:00:00.577

This means your CSV document has been indexed in Apache Solr. To check, go to your dashboard screen,
select collection1 -> Query and click the Execute Query button.
The results should now include the records from dummy.csv.

Congrats, your CSV document has been posted successfully.
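
For more than a handful of records, the CSV file from this section can be generated rather than typed by hand. The sketch below uses Python's csv module to produce rows in the same shape as dummy.csv, quoting text values so that the comma inside the address does not split the column. The helper names make_rows and to_csv are mine, and the values simply mirror the sample records.

```python
import csv
import io

# Column order matches the header line of dummy.csv.
COLUMNS = ["id", "name", "address", "comments",
           "popularity", "counts", "dynamicField_i"]


def make_rows(start, end):
    """Yield one record per number, mirroring the Solr110..Solr113 sample rows."""
    for n in range(start, end + 1):
        yield {
            "id": f"Solr{n}",
            "name": f"Solr {n}",
            "address": "House No - 100, LR Apache",
            "comments": "Apache Solr comment 1",
            "popularity": n,
            "counts": n,
            "dynamicField_i": f"dynamic solr {n}",
        }


def to_csv(rows):
    """Serialize rows to CSV text; strings are quoted, numbers are left bare."""
    buffer = io.StringIO()
    # The header is written by hand so it stays unquoted, like the sample file.
    buffer.write(",".join(COLUMNS) + "\n")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(make_rows(110, 113)), end="")
```

Redirect the output to a file such as dummy.csv in the exampledocs directory and post it with the same java command as above.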

How to post JSON data in the form of a JSON file using post.jar
First create a JSON file in the /example/exampledocs/ directory with these records:
[{
"id":"Solr115",
"name":"Solr 115",
"address":"House No - 100, LR Apache, 40702",
"comments":"Apache Solr comment 1",
"popularity":115,
"counts":115
},
{
"id":"Solr116",
"name":"Solr 116",
"address":"House No - 100, LR Apache, 40702",
"comments":"Apache Solr comment 1",
"popularity":116,
"counts":116
},
{
"id":"Solr117",
"name":"Solr 117",
"address":"House No - 100, LR Apache, 40702",
"comments":"Apache Solr comment 1",
"popularity":117,
"counts":117
}]

Save this file as dummy.json.
Go to the /example/exampledocs directory in a command prompt and execute the given command:
java -Durl=http://localhost:8983/solr/update/json -Dtype=application/json -jar post.jar dummy.json

For multiple JSON files, use:
java -Durl=http://localhost:8983/solr/update/json -Dtype=application/json -jar post.jar d1.json d2.json

For all JSON files in the working directory, use:
java -Durl=http://localhost:8983/solr/update/json -Dtype=application/json -jar post.jar *.json

You will get a success message on the console like this:
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update/json using content-type application/json..
POSTing file dummy.json
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update/json..
Time spent: 0:00:00.535

This means your JSON document has been indexed in Apache Solr. To check, go to your dashboard screen,
select collection1 -> Query and click the Execute Query button.
The results should now include the records from dummy.json.
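
The -Durl and -Dtype properties in the commands above boil down to two things: the URL that receives the HTTP POST and the Content-Type header sent with the file. Here is a rough Python sketch of an equivalent request. It is not how post.jar is implemented, the function names are my own, and I am assuming the update handler accepts a commit=true parameter on the URL.

```python
from urllib.request import Request, urlopen


def build_update_request(url, content_type, payload, commit=True):
    """Build (but do not send) the POST request for a Solr update handler."""
    if commit:
        # Assumption: commit=true asks Solr to commit right after this update.
        # post.jar reports its commit as a separate step in the console output.
        url = url + "?commit=true"
    return Request(url, data=payload, method="POST",
                   headers={"Content-Type": content_type})


def post_file(path, url, content_type):
    """Send one file to a running Solr instance and return the HTTP status."""
    with open(path, "rb") as handle:
        request = build_update_request(url, content_type, handle.read())
    with urlopen(request) as response:
        return response.status
```

With Solr running, post_file("dummy.json", "http://localhost:8983/solr/update/json", "application/json") would send the same file as the first java command in this section. Swapping in the /update/csv URL with text/csv gives the CSV variant.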

The post.jar file also provides some more parameters for use with the <add> tag in an XML file. I will discuss them in my later posts.

Namah Shivay