Registry Collection Overview

PDS Registry stores its data in an Apache Solr collection called "registry". Registry Manager comes with a default "configset" consisting of two files, managed-schema and solrconfig.xml, located in the REGISTRY_MANAGER_HOME/solr/collections/registry folder. REGISTRY_MANAGER_HOME is the directory where you installed Registry Manager, for example /home/pds/registry.

The default managed-schema file defines a few common fields, such as lid, vid, lidvid, title, product_class, internal references, and basic file information (file name, type, size, and MD5 hash). These are the fields that Harvest extracts from PDS4 product labels by default.

The lidvid field is the primary key (the Solr uniqueKey). If you load the same Harvest-generated "intermediate" data file multiple times, existing Solr documents are replaced by the new documents with the same lidvid.

<field name="lidvid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<uniqueKey>lidvid</uniqueKey>
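
The replace-on-reload behavior can be pictured with a toy model. The sketch below does not talk to Solr. It uses a Python dict keyed by lidvid, and the lidvid value and the load_documents helper are made up for illustration.

```python
# Toy model of uniqueKey behavior: a dict keyed by lidvid.
# Illustration only; this does not connect to Solr.

def load_documents(index, documents):
    """Add documents to the index, replacing any that share a lidvid."""
    for doc in documents:
        index[doc["lidvid"]] = doc  # same key: the old document is overwritten
    return index


index = {}

# Hypothetical lidvid, used only for this example.
first_load = [
    {"lidvid": "urn:nasa:pds:example:data:product1::1.0", "title": "First load"},
]
load_documents(index, first_load)

# Loading the same lidvid again replaces the document instead of duplicating it.
second_load = [
    {"lidvid": "urn:nasa:pds:example:data:product1::1.0", "title": "Second load"},
]
load_documents(index, second_load)

print(len(index))  # 1
```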

When you load data, unknown (undefined) fields are ignored:

<dynamicField name="*" type="ignored" />
<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

Adding More Fields

Detailed information about Solr fields and schema design is available on the Solr website.

You can define new fields in the managed-schema file of the registry configset. The following XML fragment adds start_date_time and stop_date_time fields to the registry collection.

<field name="start_date_time" type="pdate" indexed="true" stored="true" multiValued="false"/>
<field name="stop_date_time" type="pdate" indexed="true" stored="true" multiValued="false"/>
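
Values loaded into these fields need to be in the date format Solr expects. The sketch below assumes that pdate is mapped to solr.DatePointField, as in Solr's default configsets, which accepts ISO 8601 timestamps in UTC with a trailing "Z". The to_solr_date helper and the lidvid value are hypothetical.

```python
from datetime import datetime, timezone


def to_solr_date(value):
    """Format a datetime as an ISO 8601 UTC string with a trailing 'Z'."""
    if value.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


doc = {
    "lidvid": "urn:nasa:pds:example:data:product1::1.0",  # hypothetical identifier
    "start_date_time": to_solr_date(datetime(2019, 3, 1, 12, 30, 0)),
    "stop_date_time": to_solr_date(datetime(2019, 3, 1, 13, 0, 0)),
}

print(doc["start_date_time"])  # 2019-03-01T12:30:00Z
```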

To apply the changes, you have to delete the registry collection and all of its data!

registry-manager delete-registry

Then recreate the collection:

registry-manager create-registry

If you copied the default configset from REGISTRY_MANAGER_HOME/solr/collections/registry to another directory, for example /tmp/reg, pass the -configDir flag to Registry Manager.

registry-manager create-registry -configDir /tmp/reg

You can also add and delete fields dynamically by calling the Solr Schema API. For example:

curl http://localhost:8983/solr/registry/schema -X POST -H 'content-type:application/json' --data-binary '{
  "add-field": {
     "name":"start_date_time",
     "type":"pdate",
     "indexed":true,
     "stored":true,
     "multiValued":false
  }
}'
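
The same call can be scripted. The following is a minimal Python sketch that uses only the standard library and assumes the same host, port, and collection as the curl example. The build_schema_request and send_schema_request helpers are hypothetical names. The sketch also shows the Schema API "delete-field" command, which takes only the field name.

```python
import json
import urllib.request

# Same URL as the curl example; adjust host and port for your installation.
SCHEMA_URL = "http://localhost:8983/solr/registry/schema"


def build_schema_request(command, body, url=SCHEMA_URL):
    """Build (but do not send) a Schema API POST request."""
    payload = json.dumps({command: body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_schema_request(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


add_request = build_schema_request("add-field", {
    "name": "stop_date_time",
    "type": "pdate",
    "indexed": True,
    "stored": True,
    "multiValued": False,
})

# Deleting a field only requires its name.
delete_request = build_schema_request("delete-field", {"name": "stop_date_time"})
```

Calling send_schema_request(add_request) requires a running Solr instance with a "registry" collection. Building the request separately makes it possible to inspect the payload first.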

When you add fields dynamically, you can keep your existing data, but existing documents will not have values for the new fields.
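
To see which documents lack a newly added field, you can query for documents that have no value in it. The sketch below only builds the query URL. It assumes the standard Solr select handler on the same host and port as above, and the missing_field_query_url helper is a hypothetical name.

```python
from urllib.parse import urlencode

# Assumed host, port, and collection; adjust for your installation.
SELECT_URL = "http://localhost:8983/solr/registry/select"


def missing_field_query_url(field, rows=10, url=SELECT_URL):
    """Return a select URL matching documents that have no value for `field`."""
    params = {
        # All documents, minus those with any value in the field.
        "q": f"*:* -{field}:[* TO *]",
        "fl": "lidvid",  # return only the primary key
        "rows": rows,
    }
    return f"{url}?{urlencode(params)}"


print(missing_field_query_url("start_date_time"))
```

Opening the printed URL in a browser or with curl lists the lidvid values of documents that would need to be reloaded to populate the field.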

Editing the managed-schema file is the recommended approach, because it simplifies deployment and keeps track of your changes.