Common Operations

Extract PDS4 Product Metadata

Run the Harvest tool to crawl PDS4 products and extract their metadata in Newline-delimited JSON (NJSON) format. In addition to basic information such as lid, vid, product class, internal references, file name, and file size, you can configure additional fields to export. Optionally, entire PDS product labels can be stored as BLOBs (Binary Large OBjects).

After running Harvest, the output folder (/tmp/harvest/out/ by default) contains two files:

  • es-docs.json - metadata extracted from PDS4 labels, stored in Newline-delimited JSON (NJSON) format.
  • fields.txt - a list of field names extracted from PDS4 labels.
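A quick shell check can confirm that a Harvest run produced both files before you try to load them. This is a minimal sketch: the function name check_harvest_output is hypothetical (not part of Harvest), and it assumes fields.txt lists one field name per line.

```shell
# Hypothetical helper (not part of Harvest): confirm that a Harvest output
# folder contains es-docs.json and fields.txt, then print their line counts.
# Assumes fields.txt holds one field name per line.
check_harvest_output() {
  local dir="${1:-/tmp/harvest/out}"   # Harvest's default output folder
  local f
  for f in es-docs.json fields.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  # wc -l counts newline characters; tr strips the padding some wc versions add
  echo "es-docs.json lines: $(wc -l < "$dir/es-docs.json" | tr -d ' ')"
  echo "fields.txt lines: $(wc -l < "$dir/fields.txt" | tr -d ' ')"
}

# Example:
# check_harvest_output /tmp/harvest/out
```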

See Harvest Documentation for more information.

Create, Delete, Customize Registry

You must create the registry indices in Elasticsearch before loading data generated by the Harvest tool. See Registry Installation and Registry Manager for more information.

You may want to add more fields to the data dictionary or change the default configuration. See Registry Customization for more information.

Load Metadata

The Newline-delimited JSON file generated by Harvest can be loaded into Elasticsearch with Registry Manager as shown below.

registry-manager load-data -file /home/pds/harvest/out/es-docs.json
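When several Harvest runs need to be loaded, the same command can be wrapped in a loop. This is a minimal sketch: load_all and the REGISTRY_MANAGER variable are hypothetical names introduced here. REGISTRY_MANAGER lets you point at a specific registry-manager install and defaults to the one found on PATH.

```shell
# Hypothetical helper: run "registry-manager load-data" for each file given.
# REGISTRY_MANAGER (hypothetical) overrides the command; default is PATH lookup.
load_all() {
  local mgr="${REGISTRY_MANAGER:-registry-manager}"
  local f
  for f in "$@"; do
    echo "Loading $f"
    # Stop at the first failed load instead of continuing silently
    "$mgr" load-data -file "$f" || return 1
  done
}

# Example (paths are illustrative):
# load_all /home/pds/harvest/run1/es-docs.json /home/pds/harvest/run2/es-docs.json
```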

Automatic Schema Update and Common Errors

By default, Registry Manager tries to update the registry schema (that is, add new fields) using the fields.txt file generated by Harvest, which it expects to find in the same directory as es-docs.json.

If you copy es-docs.json from the Harvest output folder to another location but forget to copy fields.txt, you might see the following error:

[ERROR] /my-folder/fields.txt (The system cannot find the file specified)
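To avoid this error, move the two Harvest output files together. This is a minimal sketch; copy_harvest_output is a hypothetical helper name.

```shell
# Hypothetical helper: copy Harvest output to another folder, keeping
# fields.txt next to es-docs.json so load-data can still update the schema.
copy_harvest_output() {
  local src="$1" dst="$2"
  mkdir -p "$dst" || return 1
  cp "$src/es-docs.json" "$src/fields.txt" "$dst/" || return 1
}

# Example:
#   copy_harvest_output /tmp/harvest/out /my-folder
#   registry-manager load-data -file /my-folder/es-docs.json
```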

When the registry is created, its data dictionary is populated with field definitions (field name to data type mappings) from the PDS common dictionary and a few discipline dictionaries. If you try to load labels containing fields that are not defined in the registry data dictionary, you will get the following error:

[ERROR] Could not find datatype for field '...'

You have to update the registry data dictionary as described in the Registry Customization section before you can load the data.

If you have a non-standard registry configuration and know what you are doing, you can disable the schema update by passing the -updateSchema parameter to the load-data command:

registry-manager load-data -file /home/pds/harvest/out/es-docs.json -updateSchema n

Accidental Update of Existing Documents

The registry index uses the lidvid as its primary key. If you load data with the same lidvids more than once, the old documents are replaced by the new ones. We plan to add a check that prevents accidental updates of existing documents in the next release.
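Until that check exists, you can look up a lidvid before loading. This is a minimal sketch that ASSUMES each registry document uses its lidvid as the Elasticsearch document _id (consistent with the primary-key statement above, but not confirmed here); lidvid_exists and ES_URL are hypothetical names. It relies on the Elasticsearch get-document endpoint (GET <index>/_doc/<id>), which returns HTTP 200 when the document exists and 404 when it does not.

```shell
# Hypothetical pre-load check.
# ASSUMPTION: the registry index uses the lidvid as the document _id.
# ES_URL (hypothetical) defaults to a local Elasticsearch node.
lidvid_exists() {
  local code
  # -o /dev/null discards the body; -w prints only the HTTP status code
  # (curl prints 000 if Elasticsearch is unreachable, which also counts as "not found")
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    "${ES_URL:-http://localhost:9200}/registry/_doc/$1")
  [ "$code" = "200" ]   # 200: document found; 404: not found
}

# Example:
# if lidvid_exists "urn:nasa:pds:context:target:asteroid.4_vesta::1.1"; then
#   echo "lidvid already loaded; loading it again would replace it"
# fi
```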

View / Search Metadata

Elasticsearch Search API

You can use either simple Lucene queries passed in the URL:

curl "http://localhost:9200/registry/_search?q=product_class:Product_Collection&pretty"
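Query strings containing spaces, quotes, or other special characters must be URL-encoded; curl can do this with -G and --data-urlencode. This is a minimal sketch: registry_search and ES_URL are hypothetical names, and the lid field name in the second example is an assumption.

```shell
# Hypothetical helper: Lucene query-string search on the registry index.
# -G makes curl append the --data-urlencode values to the URL as a query
# string, percent-encoding spaces, quotes, colons, etc.
registry_search() {
  curl -s -G "${ES_URL:-http://localhost:9200}/registry/_search" \
    --data-urlencode "q=$1" \
    --data-urlencode "pretty"
}

# Examples:
# registry_search 'product_class:Product_Collection'
# registry_search 'lid:"urn:nasa:pds:context:target:asteroid.4_vesta"'  # field name "lid" assumed
```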

Or more advanced Elasticsearch queries defined in JSON and passed in the request body:

curl -X GET "localhost:9200/registry/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "product_class": "Product_Collection"
    }
  }
}
'
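The JSON form can be parameterized in a small shell function. This is a minimal sketch: registry_term_search and ES_URL are hypothetical names. It adds the standard size search parameter to cap the number of hits, and it rejects values containing double quotes or backslashes rather than attempting full JSON escaping.

```shell
# Hypothetical helper: term query on one field, returning at most SIZE hits.
registry_term_search() {
  local field="$1" value="$2" size="${3:-10}"
  # Simple guard instead of full JSON escaping
  case "$field$value" in
    *'"'*|*'\'*) echo "quotes and backslashes are not supported" >&2; return 1 ;;
  esac
  local body
  body=$(printf '{"size": %d, "query": {"term": {"%s": "%s"}}}' "$size" "$field" "$value")
  curl -s -X GET "${ES_URL:-http://localhost:9200}/registry/_search?pretty" \
    -H 'Content-Type: application/json' -d "$body"
}

# Example: first 5 collections
# registry_term_search product_class Product_Collection 5
```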

You can find more about the Elasticsearch Search API on the Elasticsearch website.

Delete Metadata

You can use the Registry Manager tool to delete metadata by lidvid, lid, or package id (Harvest run id), or to delete all data. A few examples are shown below.

registry-manager delete-data -lidvid urn:nasa:pds:context:target:asteroid.4_vesta::1.1
registry-manager delete-data -lid urn:nasa:pds:context:target:asteroid.4_vesta
registry-manager delete-data -packageId 8d8ae96d-044e-473d-a278-62635b1c5977
registry-manager delete-data -all

You can also use the Elasticsearch delete by query API.
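A delete by query request posts a query to <index>/_delete_by_query, and every matching document is removed. This is a minimal sketch: registry_delete_by_term and ES_URL are hypothetical names, and the lid field name in the example is an assumption. Because this bypasses Registry Manager, it is worth running the same query through _search first to see what would be deleted.

```shell
# Hypothetical helper: delete all registry documents whose FIELD equals VALUE,
# using the Elasticsearch delete-by-query API.
registry_delete_by_term() {
  local field="$1" value="$2"
  # Simple guard instead of full JSON escaping
  case "$field$value" in
    *'"'*|*'\'*) echo "quotes and backslashes are not supported" >&2; return 1 ;;
  esac
  curl -s -X POST "${ES_URL:-http://localhost:9200}/registry/_delete_by_query?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(printf '{"query": {"term": {"%s": "%s"}}}' "$field" "$value")"
}

# Example (field name "lid" is an assumption):
# registry_delete_by_term lid urn:nasa:pds:context:target:asteroid.4_vesta
```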

Export Metadata

You can use the Registry Manager tool to export metadata by lidvid or package id (Harvest run id), or to export all data. A few examples are shown below.

registry-manager export-data -file /tmp/mydata.json -lidvid urn:nasa:pds:context:target:asteroid.4_vesta::1.1
registry-manager export-data -file /tmp/mydata.json -packageId 8d8ae96d-044e-473d-a278-62635b1c5977
registry-manager export-data -file /tmp/mydata.json -all

Data is saved in a Newline-delimited JSON file, which can be loaded into Elasticsearch with the 'load-data' command. Harvest and the Elasticsearch bulk API use the same file format.
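Because the export uses the bulk API format, the file can also be sent straight to Elasticsearch. This is a minimal sketch: bulk_load and ES_URL are hypothetical names. It uses curl's --data-binary because -d strips newlines from file input, and the bulk API needs them. If you reload with load-data instead and no fields.txt sits next to the exported file, you may hit the missing fields.txt error described in Automatic Schema Update and Common Errors; -updateSchema n skips that step.

```shell
# Hypothetical helper: post an NDJSON export to the Elasticsearch bulk API.
# /registry/_bulk makes "registry" the default index for action lines
# that do not name one.
bulk_load() {
  local file="$1"
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  # --data-binary keeps newlines intact; -d @file would strip them
  curl -s -X POST "${ES_URL:-http://localhost:9200}/registry/_bulk?pretty" \
    -H 'Content-Type: application/x-ndjson' \
    --data-binary "@$file"
}

# Example:
# bulk_load /tmp/mydata.json
```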

Export Files (BLOBs)

If Harvest stored PDS product labels as BLOBs (Binary Large OBjects), you can export them with the Registry Manager tool as shown below.

registry-manager export-file -lidvid urn:nasa:pds:context:target:asteroid.4_vesta::1.1 -file /tmp/4_vesta.xml
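To export labels for several products, the command can be run in a loop. This is a minimal sketch: export_labels and REGISTRY_MANAGER are hypothetical names, and the output file names (the lidvid with each ':' replaced by '_') are an arbitrary choice made here.

```shell
# Hypothetical helper: export the label BLOB of each lidvid into OUTDIR.
# REGISTRY_MANAGER (hypothetical) overrides the registry-manager command.
export_labels() {
  local outdir="$1"; shift
  local mgr="${REGISTRY_MANAGER:-registry-manager}" lidvid
  mkdir -p "$outdir" || return 1
  for lidvid in "$@"; do
    # ':' is not allowed in Windows file names, so replace it with '_'
    "$mgr" export-file -lidvid "$lidvid" -file "$outdir/${lidvid//:/_}.xml" || return 1
  done
}

# Example:
# export_labels /tmp/labels urn:nasa:pds:context:target:asteroid.4_vesta::1.1
```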