Test Your Deployment

Overview

This document describes common Registry tasks: initializing your Registry, extracting metadata from PDS4 labels and loading it into the Registry, querying Elasticsearch, and using the Registry APIs.

Extract metadata from PDS4 labels

To extract metadata from PDS4 labels (XML files), use the Harvest command-line tool. Harvest parses PDS4 files and stores the extracted metadata in a "newline delimited JSON" data file. The JSON data file can then be loaded into Elasticsearch by Registry Manager.

To run Harvest you need an XML configuration file. The configuration file has several sections that control which files the Harvest tool will crawl and what data it will extract. A very basic configuration is shown below.

<?xml version="1.0" encoding="UTF-8"?>

<harvest nodeName="PDS_ENG">
  <directories>
    <path>/data/LADEE/ldex_20161118</path>
  </directories>
</harvest>

You can process bundles or directories. There are a few example configuration files in the Harvest installation directory, e.g., harvest-3.5.1/conf/examples/directories.xml. Either edit one of the example files or use the minimal configuration shown above. Note that you have to use a valid value for the "nodeName" attribute.

Update the path with the location of your dataset. A sample PDS4 archive is provided in the `test` folder of the pds-registry-app package that you just installed. IMPORTANT NOTE: do not keep this sample dataset in a production registry. More information about Harvest configuration is available in the Harvest operation documentation.
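If you prepare configurations for many datasets, the minimal file shown above can also be generated programmatically. The following is only a sketch using Python's standard xml.etree.ElementTree module. The function name write_harvest_config is hypothetical, and the code emits nothing beyond the elements that appear in the minimal example (harvest, directories, path).

```python
import xml.etree.ElementTree as ET


def write_harvest_config(node_name, data_path, out_file):
    """Write a minimal Harvest configuration with a single directory path.

    Mirrors the minimal example only: <harvest nodeName=...> containing
    <directories><path>...</path></directories>.
    """
    root = ET.Element("harvest", {"nodeName": node_name})
    directories = ET.SubElement(root, "directories")
    ET.SubElement(directories, "path").text = data_path

    tree = ET.ElementTree(root)
    # ET.indent is available in Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(tree)
    tree.write(out_file, encoding="UTF-8", xml_declaration=True)
```

For example, write_harvest_config("PDS_ENG", "/data/LADEE/ldex_20161118", "config.xml") would produce a file equivalent to the one above. The value passed as node_name still has to be a node name that Harvest accepts.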

To run Harvest, use the harvest command on Unix or harvest.bat on Windows. The following Unix example reads the configuration file /path/to/my/config.xml and saves the generated JSON data files in the ~/tmp/data1 directory.

harvest -c /path/to/my/config.xml -o ~/tmp/data1

After running Harvest, the output folder (~/tmp/data1) will have several files:

  • registry-docs.json - metadata extracted from PDS4 labels, stored in Newline-delimited JSON (NJSON) format.
  • refs-docs.json - product references extracted from collection inventory files.
  • fields.txt - a list of field names extracted from PDS4 labels.
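Before loading, it can be useful to confirm that a generated file is well-formed newline-delimited JSON. The sketch below reads such a file one line at a time with Python's standard json module. The helper names are hypothetical, and the code makes no assumption about what each line contains, so the line count it reports is not necessarily the number of products.

```python
import json


def read_ndjson(path):
    """Yield one parsed JSON value per non-empty line of the file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc


def count_records(path):
    """Count the JSON lines in the file, raising ValueError on a malformed line."""
    return sum(1 for _ in read_ndjson(path))
```

Because the file is processed as a stream, this approach also works for output files that are too large to hold in memory.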

Create Registry

You must create the following registry indices in Elasticsearch before loading data generated by the Harvest tool.

  • registry - this index stores metadata extracted from PDS4 labels, one ES document per PDS label.
  • registry-dd - this index stores the data dictionary, a list of searchable fields and their data types. When the registry is created, the data dictionary is populated with fields (attributes) from the PDS common dictionary and a few discipline dictionaries. You can add more fields as described in the Registry Customization / Data Dictionary section.
  • registry-refs - this index stores product references extracted from collection inventory files. There can be one or more ES documents per inventory file.

To create the registry indices in a local Elasticsearch instance (running at http://localhost:9200) with 1 shard and 0 replicas, run the following Registry Manager command:

registry-manager create-registry

You can customize the create-registry command by passing parameters such as the Elasticsearch URL, the number of shards and replicas, and authentication parameters. To see the list of available parameters and basic usage, run:

registry-manager create-registry -help

To check that the registry indices were created, open the following URL in a browser: http://localhost:9200/_cat/indices?v or use curl.

curl "http://localhost:9200/_cat/indices?v"

The response should look similar to this. Make sure that the value in the "health" column is "green" for every index.

health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   registry      PY6ObzELRlSx9gHOWbR8dw   1   0          0            0       208b           208b
green  open   registry-dd   CuJ-nqg1SbKI9hejHrISWA   1   0       2505            0      625kb          625kb
green  open   registry-refs 1cJLc-9cQj2D_MAYo7gOpw   1   0          0            0       208b           208b
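The health check above can also be automated. The sketch below parses the text returned by _cat/indices?v and reports any of the three registry indices that are missing or not green. The function names are hypothetical, and the parser assumes whitespace-separated columns with a header row, as in the sample response.

```python
def parse_cat_indices(text):
    """Turn `_cat/indices?v` text output into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def unhealthy_indices(text, expected=("registry", "registry-dd", "registry-refs")):
    """Return {index_name: problem} for expected indices that are missing or not green."""
    rows = {row["index"]: row for row in parse_cat_indices(text)}
    problems = {}
    for name in expected:
        if name not in rows:
            problems[name] = "missing"
        elif rows[name]["health"] != "green":
            problems[name] = rows[name]["health"]
    return problems
```

An empty result means all three indices exist and are green. The input text can be the saved output of the curl command shown earlier.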

Registry Data Dictionary

When a registry is created, the registry data dictionary is populated with fields (attributes) from the PDS common dictionary and a few discipline dictionaries. There are many data dictionaries on the PDS website, but only a few of them are loaded into the Registry by default.

You may need to load more data dictionaries as described in the Registry Customization section.

Load harvested data into Registry (Elasticsearch)

JSON files generated by Harvest can be loaded into Elasticsearch by Registry Manager as shown below.

registry-manager load-data -dir ~/tmp/data1/

Where ~/tmp/data1/ is the Harvest output directory with JSON files.

By default, Registry Manager will try to update the Registry schema (add more fields) from the fields.txt file generated by Harvest in ~/tmp/data1/.

If you try loading JSON data files with fields not defined in the Registry data dictionary, you will get the following error:

[ERROR] Could not find datatype for field '...'

You have to update the registry data dictionary as described in the Registry Customization section before you can load the data.
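The Harvest and load steps can be chained in a small script. The sketch below is one possible approach using Python's subprocess module. It uses only the command-line options shown in this document (-c and -o for harvest, load-data -dir for registry-manager) and assumes both tools are on your PATH under their Unix names. The function names are hypothetical.

```python
import os
import subprocess


def harvest_command(config_file, output_dir):
    """Build the Harvest command line: harvest -c <config> -o <output dir>."""
    return ["harvest", "-c", config_file, "-o", os.path.expanduser(output_dir)]


def load_command(output_dir):
    """Build the Registry Manager command line: registry-manager load-data -dir <dir>."""
    return ["registry-manager", "load-data", "-dir", os.path.expanduser(output_dir)]


def harvest_and_load(config_file, output_dir):
    """Run Harvest, then load its output; stop at the first failing step."""
    # check=True raises CalledProcessError if a command exits with a non-zero status.
    subprocess.run(harvest_command(config_file, output_dir), check=True)
    subprocess.run(load_command(output_dir), check=True)
```

Whether a particular failure, such as the datatype error above, results in a non-zero exit status is not stated here, so it is worth inspecting the tool output as well as the return code.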

Query Elasticsearch

You can query the Registry indices in Elasticsearch by calling the Elasticsearch Search API.

You can either use simple Lucene queries, passed in the URL:

# Select all products
curl "http://localhost:9200/registry/_search?q=*&pretty"

# Select only collections
curl "http://localhost:9200/registry/_search?q=product_class:Product_Collection&pretty"

Or more advanced Elasticsearch queries defined in JSON and passed in the request body:

curl -X GET "localhost:9200/registry/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "product_class": "Product_Collection"
    }
  }
}
'
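The same request-body query can be sent from a script. The sketch below uses only the Python standard library and sends the body with POST, which the Elasticsearch Search API accepts in addition to GET. The function names are hypothetical, and the base URL is a parameter you would set to your own Elasticsearch address.

```python
import json
import urllib.request


def term_query(field, value):
    """Build a query body that matches documents where `field` equals `value`."""
    return {"query": {"term": {field: value}}}


def build_search_request(base_url, index, body):
    """Prepare (but do not send) a JSON search request for the given index."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    data = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def search(base_url, index, body, timeout=10):
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(base_url, index, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, search("http://localhost:9200", "registry", term_query("product_class", "Product_Collection")) returns the decoded response, in which the matching documents are listed under ["hits"]["hits"].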

Use Registry API

Swagger UI

Open http://localhost:8080 in a web browser. You should see the Swagger UI page for the Registry API.

Select the API you want to call, for example, collections.

Select the response content type "application/json" from the dropdown and click the "Try it out!" button. You should see a JSON response in the "Response Body" section of the screen. You might need to scroll down to see the results.

Curl

You can also use curl to call Registry API. For example,

curl -X GET --header 'Accept: application/json' 'http://localhost:8080/collections?limit=10'
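The equivalent call can be made from Python with the standard library. This is a sketch only: the function names are hypothetical, the base URL is a parameter you would set to the address where your Registry API is listening, and the structure of the returned JSON is not assumed.

```python
import json
import urllib.parse
import urllib.request


def collections_url(base_url, limit=10):
    """Build the /collections URL with a `limit` query parameter."""
    query = urllib.parse.urlencode({"limit": limit})
    return f"{base_url.rstrip('/')}/collections?{query}"


def fetch_collections(base_url, limit=10, timeout=10):
    """Request the collections as JSON and return the decoded response."""
    request = urllib.request.Request(
        collections_url(base_url, limit),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the URL with urlencode keeps the query string correctly escaped if more parameters are added later.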