Elasticsearch vs Solr
Executive Summary
Revised 08/2020
Selection: Elasticsearch
Rationale:
Overall, Solr and Elasticsearch provide similar search functionality. However, ES has several significant advantages that make it the better choice:
- Third-party and companion apps: There are significantly more robust apps compatible with ES, e.g. Kibana (from Elastic) and Grafana. Solr has some, but they are poorly maintained and most have known issues.
- AWS support: Spinning up ES in AWS is trivial and will save significant resources in the long run. Per discussions with IMG, doing the same with Solr is much more difficult and time-consuming.
- Documentation: ES documentation is significantly more detailed and robust than Solr's. This will not only speed up development but also reduce the custom documentation we need to write.
- Overall product quality: ES is the better software product overall. Given its documentation and comprehensive suite of tools and services, it is worth the cost of refactoring EN software to use ES.
Solr features used by PDS Registry and Harvest that are not available in Elasticsearch
Solr | Elasticsearch (ES) |
---|---|
Blob Store API: file upload / download, version management. | ES supports binary fields, but we would have to implement upload / download, version management, MD5 hash calculation, etc. |
Post Tool: command-line tool for uploading various types of content to Solr. Supported formats: XML, JSON, CSV, PDF, Word, etc. | Not available. Documents have to be inserted with the Bulk API, which supports JSON only. |
XML documents (Harvest solr_doc.xml) | JSON only. |
Search Protocol (custom request handlers / plugins) | ES supports REST handler plugins, but unlike Solr's very simple API, the ES plugin API is very complex and undocumented. |
Java API (SolrJ) | Completely different APIs: Transport Client (deprecated) and High Level REST Client (recommended). |
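Replacing the Blob Store API and Post Tool means writing our own upload path. The sketch below is a minimal illustration of that work, assuming each file version is stored as one ES document whose content sits in a `binary` field (ES expects binary values as Base64 strings) and that documents are loaded through the Bulk API's newline-delimited JSON body. The index name `blobs`, the field names, the `<name>-v<version>` id scheme, and the class name are hypothetical; HTTP transport (a POST to `_bulk` with `Content-Type: application/x-ndjson`) is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical sketch: store one version of a file as an ES document and
// emit it as an Elasticsearch _bulk (NDJSON) request body. Index name,
// field names and the "<name>-v<version>" id scheme are assumptions.
public class BlobBulkSketch {

    // Escape a Java string as a JSON string literal.
    static String json(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Hex-encoded MD5 checksum of the blob content.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Two NDJSON lines per document: the action line, then the document source.
    // The content is Base64-encoded, the form an ES "binary" field accepts.
    static String bulkLines(String index, String name, int version, byte[] content) {
        String id = name + "-v" + version;
        String action = "{\"index\":{\"_index\":" + json(index) + ",\"_id\":" + json(id) + "}}";
        String source = "{\"name\":" + json(name)
                + ",\"version\":" + version
                + ",\"size\":" + content.length
                + ",\"md5\":" + json(md5Hex(content))
                + ",\"content\":" + json(Base64.getEncoder().encodeToString(content))
                + "}";
        // The _bulk API requires the body to end with a newline.
        return action + "\n" + source + "\n";
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.print(bulkLines("blobs", "example.jar", 1, data));
    }
}
```

Running `main` prints an action line for `_id` `example.jar-v1` followed by a source line carrying the size (5), the MD5 `5d41402abc4b2a76b9719d911017c592`, and the Base64 content `aGVsbG8=`. Giving each version its own `_id` keeps older versions retrievable, loosely mirroring what the Blob Store API provides today; since the whole file is held in memory and sent in one request, large files would probably need a different approach.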
License
Solr | Elasticsearch (ES) |
---|---|
Apache 2.0 | Apache 2.0: basic / limited features. Commercial: Security, Monitoring, Elastic App Search (server, GUI, relevance model tuning). |
Java API
Aspect | Solr | Elasticsearch (ES) |
---|---|---|
Java libraries | SolrJ | Transport Client: used for years, but deprecated in the latest ES version. High Level REST Client: the new recommended API, similar to Transport Client. |
Dependencies | 12 jars | 40+ jars |
Documentation | Very limited | Very good |
Query builder | No. As a workaround, the Lucene Query Builder can be used. | Yes |
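The query builder row refers to composing queries in code rather than as strings. In the ES Java clients this is done with builder objects (for example `QueryBuilders.boolQuery()` in the High Level REST Client) that serialize to the ES JSON query DSL. To show the shape of that DSL without pulling in the 40+ client jars, the sketch below uses a tiny, hypothetical JDK-only builder; it is not the ES API, and the field names `title` and `product_class` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free stand-in for a query builder. It only
// illustrates the JSON query DSL shape; it is NOT the Elasticsearch client API.
public class BoolQuerySketch {
    private final List<String> must = new ArrayList<>();
    private final List<String> filter = new ArrayList<>();

    // Minimal JSON string escaping (backslashes and quotes only; enough for the demo).
    static String q(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Full-text clause in query context: contributes to relevance scoring.
    public BoolQuerySketch match(String field, String text) {
        must.add("{\"match\":{" + q(field) + ":" + q(text) + "}}");
        return this;
    }

    // Exact-value clause in filter context: does not affect scoring.
    public BoolQuerySketch term(String field, String value) {
        filter.add("{\"term\":{" + q(field) + ":" + q(value) + "}}");
        return this;
    }

    // Serialize as the body of a _search request.
    public String toJson() {
        return "{\"query\":{\"bool\":{\"must\":[" + String.join(",", must)
                + "],\"filter\":[" + String.join(",", filter) + "]}}}";
    }

    public static void main(String[] args) {
        System.out.println(new BoolQuerySketch()
                .match("title", "mars")
                .term("product_class", "Product_Collection")
                .toJson());
    }
}
```

In Solr the equivalent would typically be assembled as a query string (or, per the workaround above, with Lucene's query classes); the typed builder style is what the table credits ES with.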
Customization / Plugins
Aspect | Solr | Elasticsearch (ES) |
---|---|---|
Summary | Very simple plugin API. Simple customization and configuration (a few XML files). The official Solr documentation is poor, but a few books and online resources describe plugin development. There are existing PDS plugins. | Very complex, undocumented plugin API. Plugins have to be rebuilt for every new version of ES. No one on the team has any experience with it. |
Customizations | Request processing, fields, internals | |
Integrated Solutions
Area | Solr | Elasticsearch (ES) |
---|---|---|
Big Data, Data Science | Cloudera + Hortonworks (merged in 2019): multiple commercial and open-source products for big data and data science, including Solr, Spark, Hadoop, HBase, Pig, Hive, Kafka, etc. Datastax (Solr + Cassandra database). | Science Data Analytics Platform (SDAP), Apache Incubator. SDAP has been developed collaboratively by NASA JPL, FSU, NCAR, and GMU. |
DevOps (logs, metrics, visualization) | There are no integrated solutions, but Prometheus and Grafana plugins are available. | ELK: Elasticsearch, Logstash, Kibana. |
Semantic Web (search), Linked Data | Not available? | Science Data Analytics Platform (SDAP); Apache Jena: full-text search in SPARQL. |
Metrics and Monitoring
Solr | Elasticsearch (ES) |
---|---|
Multiple metrics, registries and reporters. Integration with Prometheus and Grafana. | Requires X-Pack (commercial license?) |
Support by Cloud Providers
Provider | Solr | Elasticsearch (ES) |
---|---|---|
Amazon AWS | | ELK (Elasticsearch, Logstash, Kibana) |
Google Cloud Platform (GCP) | | |