Elasticsearch vs Solr

Executive Summary

Revised 08/2020

Selection: Elasticsearch

Rationale:

Overall, Solr and Elasticsearch (ES) provide similar search functionality. However, ES has several significant benefits that make it the better choice:

  • Third-party apps - Significantly more robust third-party apps are compatible with ES, e.g., Kibana and Grafana. Solr has some, but they are poorly maintained and most have issues.

  • AWS support - Spinning up ES in AWS is trivial and will save significant resources in the long run. Per discussions with IMG, doing the same with Solr is much more difficult and time-consuming.

  • Documentation - ES documentation is significantly more detailed and robust than Solr's. This will not only ease development but also simplify the custom documentation we need to write.

  • Overall quality of the product - ES is simply a better software product. Given its documentation and comprehensive suite of tools and services, it is well worth the cost of refactoring EN software to use ES.

Solr features used by PDS Registry and Harvest not available in Elasticsearch

Blob Store API

  • Solr: File upload / download. Version management.

  • ES: Supports binary fields, but we'll have to implement upload / download, version management, MD5 hash calculation, etc.
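To give a rough idea of the work involved, here is a minimal JDK-only sketch of the MD5 and encoding part. It builds the JSON source for a document that stores a file in an ES `binary` field, which accepts Base64-encoded strings. The class `BlobDoc` and the field names `file_name`, `md5` and `blob` are hypothetical, and upload / download endpoints and version management are not covered.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical helper: builds an ES document that stores a file in a binary field. */
final class BlobDoc {
    private BlobDoc() {}

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds the JSON source for one document. The field names are hypothetical;
     * ES "binary" fields expect the content as a Base64-encoded string.
     */
    static String toJson(String fileName, byte[] data) {
        return "{\"file_name\":\"" + escape(fileName) + "\","
                + "\"md5\":\"" + md5Hex(data) + "\","
                + "\"blob\":\"" + Base64.getEncoder().encodeToString(data) + "\"}";
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The resulting single-line JSON could be indexed through the Bulk API. Base64 inflates content by about a third, so for large files it may be preferable to keep only metadata and the checksum in ES.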

Post Tool

  • Solr: Command-line tool for uploading various types of content to Solr. Supported formats: XML, JSON, CSV, PDF, Word, etc.

  • ES: Not available. We have to use the Bulk API to insert documents, and only the JSON format is supported.
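Without a Post Tool, loading documents means producing Bulk API bodies ourselves. Below is a minimal sketch, assuming ES 7.x and the JDK 11+ `java.net.http` client. The body is newline-delimited JSON (an `index` action line followed by the document source), it must end with a newline, and it is sent with `Content-Type: application/x-ndjson`. The class name, the index name `registry` and the document ids used in the test are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

/** Hypothetical helper: builds a request for the ES Bulk API (POST /_bulk). */
final class BulkBody {
    private BulkBody() {}

    /**
     * Builds an NDJSON bulk body: for each document, an "index" action line followed by
     * its source line. The Bulk API requires the body to end with a newline.
     *
     * @param index target index name
     * @param docs  document id -> single-line JSON source (use a LinkedHashMap to keep order)
     */
    static String build(String index, Map<String, String> docs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            String source = e.getValue();
            if (source.indexOf('\n') >= 0 || source.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("source must be a single line: " + e.getKey());
            }
            sb.append("{\"index\":{\"_index\":\"").append(escape(index))
              .append("\",\"_id\":\"").append(escape(e.getKey())).append("\"}}\n")
              .append(source).append('\n');
        }
        return sb.toString();
    }

    /** Wraps the body in an HTTP request; actually sending it is left to the caller. */
    static HttpRequest request(URI esBase, String body) {
        return HttpRequest.newBuilder(esBase.resolve("/_bulk"))
                .header("Content-Type", "application/x-ndjson")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Sending would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The Bulk API reports failures per item in its response, so those would still have to be checked.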

XML documents (Harvest solr_doc.xml)

  • ES: JSON format only.
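Existing Harvest output would need to be converted to JSON before indexing. Below is a minimal sketch using the JDK DOM parser. It assumes solr_doc.xml follows the standard Solr XML update format (`<add><doc><field name="...">...</field></doc></add>`) with flat documents (no nested child `<doc>` elements). The class name is hypothetical. Repeated field names become JSON arrays, and all values are kept as strings.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical converter from a flat Solr XML update message to ES JSON sources. */
final class SolrXmlToJson {
    private SolrXmlToJson() {}

    /** Returns one JSON object per {@code <doc>} element of an {@code <add>} message. */
    static List<String> convert(String xml) {
        Document dom;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to avoid XML external entity (XXE) attacks.
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            dom = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse Solr XML", e);
        }
        List<String> out = new ArrayList<>();
        NodeList docs = dom.getElementsByTagName("doc");
        for (int i = 0; i < docs.getLength(); i++) {
            // Group values by field name in document order; a repeated name becomes an array.
            Map<String, List<String>> fields = new LinkedHashMap<>();
            NodeList fs = ((Element) docs.item(i)).getElementsByTagName("field");
            for (int j = 0; j < fs.getLength(); j++) {
                Element fe = (Element) fs.item(j);
                fields.computeIfAbsent(fe.getAttribute("name"), k -> new ArrayList<>())
                        .add(fe.getTextContent());
            }
            out.add(toJson(fields));
        }
        return out;
    }

    private static String toJson(Map<String, List<String>> fields) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":");
            List<String> values = e.getValue();
            if (values.size() == 1) {
                sb.append('"').append(escape(values.get(0))).append('"');
            } else {
                sb.append('[');
                for (int k = 0; k < values.size(); k++) {
                    if (k > 0) {
                        sb.append(',');
                    }
                    sb.append('"').append(escape(values.get(k))).append('"');
                }
                sb.append(']');
            }
        }
        return sb.append('}').toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Because any ES field can hold an array of values, emitting a scalar for single-valued fields should be harmless. The actual value types would come from the index mapping.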

Search Protocol (custom request handlers / plugins)

  • ES: Supports REST handler plugins, but unlike Solr's very simple plugin API, the ES plugin API is very complex and undocumented.

Java API (SolrJ)

  • ES: Completely different APIs: Transport Client (deprecated) and High Level REST Client (recommended).

License

  • Solr: Apache 2.0.

  • ES: Apache 2.0 for basic / limited features. Commercial license for Security, Monitoring, and Elastic App Search (server, GUI, relevance model tuning).

Java API

Java libraries

  • Solr: SolrJ.

  • ES: Transport Client, used for years but deprecated in the latest ES version; High Level REST Client, the new recommended API, similar to the Transport Client.

Dependencies

  • Solr: 12 jars.

  • ES: 40+ jars.

Documentation

  • Solr: Very limited.

  • ES: Very good.

Query builder

  • Solr: No. As a workaround, the Lucene Query Builder can be used.

  • ES: Yes.
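SolrJ has no counterpart to the `QueryBuilders` used with the ES High Level REST Client, so Solr queries are usually assembled as strings. Besides the Lucene workaround, a small hand-rolled helper is another option. Below is a minimal sketch for the standard Solr query syntax. The class `SolrQueryString` and the field names in the test are hypothetical. The escaped characters follow what SolrJ's `ClientUtils.escapeQueryChars` escapes, to the best of our knowledge.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal builder for Solr standard query syntax strings. */
final class SolrQueryString {
    private final List<String> clauses = new ArrayList<>();

    /** Adds a required clause: +field:value */
    SolrQueryString must(String field, String value) {
        clauses.add("+" + field + ":" + escape(value));
        return this;
    }

    /** Adds a prohibited clause: -field:value */
    SolrQueryString mustNot(String field, String value) {
        clauses.add("-" + field + ":" + escape(value));
        return this;
    }

    /** Joins all clauses with spaces. */
    String build() {
        return String.join(" ", clauses);
    }

    /** Backslash-escapes query syntax characters and whitespace in a value. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

The sketch only covers required (`+`) and prohibited (`-`) clauses; ranges, phrases and boosts would need additional methods.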

Customization / Plugins

Summary

  • Solr: Very simple plugin API. Simple customization and configuration (a few XML files). The official Solr documentation is poor, but a few books and online resources describe plugin development. There are existing PDS plugins.

  • ES: Very complex, undocumented plugin API. Plugins have to be rebuilt for every new version of ES. No one on the team has any experience with it.

Customizations

Solr request processing:

  • Request handlers (search, query parsers)

  • Highlighting

  • Update requests

  • Query response writers

  • Similarity (scoring)

  • Cache regenerator

Solr fields:

  • Analyzer

  • Tokenizer and TokenFilter

  • FieldType

Solr internals:

  • SolrCache

  • SolrEventListener

  • UpdateHandler

ES plugins:

  • Custom settings plugin

  • REST handler (Action plugin)

  • Search plugin (Rescore example)

  • Script plugin

Other ES features can very likely be customized as well, but no further information is available.

Integrated Solutions

Big Data, Data Science

  • Solr: Cloudera + Hortonworks (merged in 2019). Multiple commercial and open-source Big Data and Data Science products, including Solr, Spark, Hadoop, HBase, Pig, Hive, Kafka, etc.

  • Solr: DataStax (Solr + Cassandra database).

  • Solr: Science Data Analytics Platform (SDAP), an Apache Incubator project developed collaboratively by NASA JPL, FSU, NCAR, and GMU.

DevOps (logs, metrics, visualization)

  • Solr: There are no integrated solutions, but Prometheus and Grafana plugins are available.

  • ES: ELK (Elasticsearch, Logstash, Kibana).

Semantic Web (search), Linked Data

  • Solr: Not available?

  • ES: Science Data Analytics Platform (SDAP); Apache Jena (full-text search in SPARQL).

Metrics and Monitoring

  • Solr: Multiple metrics, registries, and reporters. Integration with Prometheus and Grafana.

  • ES: Requires X-Pack. Commercial license?

Support by Cloud Providers

Amazon AWS

  • Solr: AMI (Amazon Machine Image - VM); Amazon EKS (Kubernetes container).

  • ES: ELK (Elasticsearch, Logstash, Kibana) as SaaS (managed service), AMI (VM), or on Amazon EKS.

Google Cloud Platform (GCP)

  • Solr: SaaS (SearchStax); VM.

  • ES: SaaS; VM.