Chef SolrCloud

Chef cookbook to manage Apache SolrCloud

solrcloud Cookbook


This is a Chef cookbook for Apache Solr.

It was primarily developed for testing SolrCloud and its features against a Solr Master/Slave setup.

Currently it supports only the built-in Jetty based SolrCloud deployment. More features and attributes will be added over time; feel free to contribute what you find missing!

SolrCloud is the default deployment and Solr Master/Slave setup is not supported by this cookbook.

Repository

https://github.com/vkhatri/chef-solrcloud

Supported Apache Solr Version

This cookbook has been tested with Apache Solr v4.9, v4.10 and v5.1.0.

Supported Apache Solr Runtime

Currently this cookbook supports only the built-in Jetty based Apache Solr deployment.

Supported Apache Solr Package Deployment

Currently this cookbook only supports Apache Solr Tarball based deployment.

Supported Apache Solr Cluster Deployment

Currently this cookbook only supports SolrCloud cluster deployment. It does not support Apache Solr Master/Slave cluster deployment.

Supported JDK Versions

Check the Apache Solr documentation for the JDK version required by your Solr version. Oracle JDK 7 is recommended.

Major Changes

v0.6.9

Recipes

solrcloud::tarball is the main recipe and includes all other recipes. Use solrcloud::tarball in the node run_list.

SolrCloud configSet (Zookeeper Configs) LWRP

LWRP - solrcloud_zkconfigset

SolrCloud Zookeeper configSet is managed via LWRP - solrcloud_zkconfigset.

SolrCloud Zookeeper configSet management is enabled by default on all nodes. This means every node gets the configSets and tries to manage them against one of the zookeeper servers configured via attribute node[:solrcloud][:solr_config][:solrcloud][:zk_host].

Because all the nodes talk to the same zookeeper cluster, attributes
`node[:solrcloud][:manage_zkconfigsets]` and `node[:solrcloud][:manage_zkconfigsets_source]`
do not need to be enabled on all the nodes.

Check Cookbook Advanced Attributes section for attribute details.

zookeeper configSet config changes

The LWRP handles config changes by itself: when any change is made to the configSet content, the LWRP re-uploads the configSet to zookeeper.
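One way such change detection could work is to compare a digest of the configSet's conf/ directory with the digest recorded at the last upload. The sketch below assumes that approach; `configset_digest` and `configset_changed?` are hypothetical helpers, not the LWRP's actual code.

```ruby
require 'digest'
require 'find'

# Hypothetical helper: one SHA-256 digest over every file below conf_dir.
# Files are sorted so the result does not depend on filesystem ordering, and the
# relative path and size are hashed too, so renamed or split files change the digest.
def configset_digest(conf_dir)
  files = []
  Find.find(conf_dir) { |path| files << path if File.file?(path) }
  digest = Digest::SHA256.new
  files.sort.each do |path|
    relative = path.delete_prefix(conf_dir)
    digest.update("#{relative}\0#{File.size(path)}\0")
    digest.update(File.binread(path))
  end
  digest.hexdigest
end

# A re-upload is needed when the current digest differs from the one recorded
# at the last upload (where that digest is stored is left open in this sketch).
def configset_changed?(conf_dir, last_uploaded_digest)
  configset_digest(conf_dir) != last_uploaded_digest
end
```

Hashing file contents rather than modification times keeps the check stable when Chef rewrites identical files.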

LWRP example

Create a configSet using LWRP:

solrcloud_zkconfigset 'configset_name' do
  option_name 'option_value' # placeholder: any option listed under LWRP Options
end

Always re-create/re-upload the configSet, even if it already exists or its config files are unchanged:

solrcloud_zkconfigset 'configset_name' do
  force_upload true
  option_name 'option_value' # placeholder: any option listed under LWRP Options
end

OR

Set attribute node[:solrcloud][:force_zkconfigsets_upload] to true. This affects all configSets, because the default value of resource attribute :force_upload is node[:solrcloud][:force_zkconfigsets_upload].

Delete a configSet using LWRP:

solrcloud_zkconfigset 'configset_name' do
  action :delete
end

configSet via node attribute:

"default_attributes": {
  "solrcloud": {
    "zkconfigsets": {
      "abc": {
        "action": "delete"
      },
      "xyz": {
        "option name": "option value"
      }
    }
  }
}

configSets can be configured either in a recipe using the LWRP or via node attribute node[:solrcloud][:zkconfigsets].

configSets defined in attribute node[:solrcloud][:zkconfigsets] do not require an LWRP call in a recipe.
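The attribute form above could be translated into resource calls roughly as follows. This is a sketch assuming that an entry without "action" means create (as in the LWRP create example) and that every other key is passed through as an LWRP option; `configset_resources` is a hypothetical helper.

```ruby
# Hypothetical helper: turn the zkconfigsets attribute hash into
# [name, action, options] triples, one per configSet.
def configset_resources(zkconfigsets)
  zkconfigsets.map do |name, settings|
    options = settings.dup
    # Assumption: entries without an explicit "action" are created.
    action = (options.delete('action') || 'create').to_sym
    [name, action, options]
  end
end
```

With the attribute example above, "abc" maps to a :delete call and "xyz" to a :create call carrying its option.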

LWRP Options

SolrCloud Zookeeper cmd Reference: https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities
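For reference, a configSet upload with Solr's zkcli.sh comes down to an upconfig command. The sketch below only builds the argument list; the script location varies by Solr version and install path, and `upconfig_command` is a hypothetical helper, not necessarily what the LWRP runs internally.

```ruby
# Hypothetical helper: argument list for uploading a configSet with zkcli.sh.
# zkcli_path is an assumption; it depends on the Solr version and install directory.
def upconfig_command(zkcli_path, zk_host, conf_dir, config_name)
  [zkcli_path,
   '-zkhost', zk_host,        # zookeeper connection string
   '-cmd', 'upconfig',        # upload a configuration directory
   '-confdir', conf_dir,      # local conf/ directory
   '-confname', config_name]  # configSet name in zookeeper
end
```

Passing the array to system(*cmd) runs the command without going through a shell, so paths with spaces need no quoting.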

Parameters:

LWRP configSet source cookbook/location management

Unless managed separately, all configSet content must be stored under node[:solrcloud][:zkconfigsets_cookbook]/files/default/configSet name/conf/.

The configSets source cookbook defaults to solrcloud and can be changed via attribute node[:solrcloud][:zkconfigsets_cookbook].

If configSets are managed outside of the cookbook, a configSet is only uploaded when it is missing in zookeeper. By default, updates to separately managed configSets are not propagated to zookeeper. However, attribute node[:solrcloud][:force_zkconfigsets_upload] can be used to always upload the configSet regardless of its state.

Setting attribute node[:solrcloud][:force_zkconfigsets_upload] or resource attribute :force_upload always triggers a configSet upload to zookeeper. It is better not to enable resource attribute :force_upload, and instead to set attribute node[:solrcloud][:force_zkconfigsets_upload] on a limited set of nodes.

The right choice may vary from environment to environment.
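The upload rules described above can be summarized as a small predicate. This is a sketch of the documented behaviour, not the LWRP's source; `upload_configset?` and its keyword names are hypothetical.

```ruby
# Sketch of the upload rules described above (not the LWRP's source).
def upload_configset?(force_upload:, exists_in_zk:, managed_by_cookbook:, content_changed:)
  return true if force_upload       # :force_upload or force_zkconfigsets_upload is set
  return true unless exists_in_zk   # a missing configSet is always uploaded
  # An existing configSet is re-uploaded only if its files come from the
  # configSet cookbook and have changed; separately managed ones are left alone.
  managed_by_cookbook && content_changed
end
```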

SolrCloud Collection LWRP

LWRP - solrcloud_collection

SolrCloud collection is managed via LWRP - solrcloud_collection.

The Create/Delete Collection API does not need to run on all solrcloud cluster nodes, hence attribute node[:solrcloud][:manage_collections] does not need to be enabled on all the nodes.

Check Cookbook Advanced Attributes section for attribute details.

collection Update/Change

The collection LWRP only performs collection action=CREATE|DELETE|RELOAD and does not manage any UPDATE/change to the collection.

To make a change to a collection, first make the change in the LWRP or in node attribute node[:solrcloud][:collections][:collection_name][:attribute_name] for the respective attribute.

Once the changes are made in the Chef cookbook, issue the collection UPDATE (or the respective action) call to one of the solrcloud nodes.

UPDATE calls can be tricky and are not managed by Chef, to avoid unexpected behavior.

Re-issuing the same command could disrupt the solrcloud cluster setup, so commands must be re-issued carefully.
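A manual call is a plain HTTP request to the Collections API on any one node. The sketch below only builds the URL; it assumes the default /solr context path and the port 8080 used in this README's examples, and `collections_api_url` is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical helper: URL for a manual Collections API call on one node.
# Assumes the default /solr context path; action is upper-cased (RELOAD, DELETE, ...).
def collections_api_url(host, port, action, params = {})
  query = URI.encode_www_form({ 'action' => action.to_s.upcase }.merge(params))
  "http://#{host}:#{port}/solr/admin/collections?#{query}"
end
```

The resulting URL can be requested with curl or Net::HTTP.get(URI(url)); check the response before repeating a call, as noted above.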

LWRP example

Create a collection using LWRP:

solrcloud_collection 'collection_name' do
  option_name 'option_value' # placeholder: any option listed under LWRP Options
end

Delete a collection using LWRP:

solrcloud_collection 'collection_name' do
  action :delete
end

Reload a collection using LWRP:

solrcloud_collection 'collection_name' do
  action :reload
end

collection via node attribute:

"default_attributes": {
  "solrcloud": {
    "collections": {
      "abc": {
        "action": "delete"
      },
      "def": {
        "action": "reload"
      },
      "xyz": {
        "num_shards": "1",
        "name": "xyz",
        "replication_factor": "1",
        "collection_config_name": "xyz",
        "option name": "value"
      }
    }
  }
}

collections can be configured either in a recipe using the LWRP or via node attribute node[:solrcloud][:collections].

collections defined in attribute node[:solrcloud][:collections] do not require an LWRP call in a recipe.
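The snake_case keys in the attribute example correspond to Collections API CREATE parameters. The sketch below shows one plausible mapping; the Solr parameter names are real, but how the cookbook itself translates keys, and passing unknown keys through unchanged, are assumptions. `create_params` and `COLLECTION_PARAM_NAMES` are hypothetical.

```ruby
# Hypothetical mapping from attribute keys to Collections API CREATE parameters.
COLLECTION_PARAM_NAMES = {
  'num_shards' => 'numShards',
  'replication_factor' => 'replicationFactor',
  'max_shards_per_node' => 'maxShardsPerNode',
  'collection_config_name' => 'collection.configName'
}.freeze

def create_params(name, settings)
  params = { 'action' => 'CREATE', 'name' => settings.fetch('name', name) }
  settings.each do |key, value|
    next if %w[name action].include?(key)
    # Unknown keys are passed through unchanged (assumption).
    params[COLLECTION_PARAM_NAMES.fetch(key, key)] = value.to_s
  end
  params
end
```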

LWRP Options

Collection API Reference: https://cwiki.apache.org/confluence/display/solr/Collections+API

Parameters:

Cookbook Advanced Attributes

Cookbook Core Attributes

Cookbook Ulimit Attributes

Cookbook log4j.properties Config Attributes

Cookbook Request Log Config Attributes

Cookbook Jetty Core Server Attributes

Cookbook Jetty Default Connector Attributes (org.eclipse.jetty.server.bio.SocketConnector)

Cookbook Jetty SSL Connector Attributes

Cookbook Jetty SSL Key Store Attributes

Cookbook Jetty JMX Attributes

Cookbook Jetty Context Attributes

Cookbook solr.xml Config Attributes

solr.xml Reference: https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml

Cookbook SolrCloud on HDFS Config Attributes

Note: SolrCloud on HDFS deployment using this cookbook has not been tested yet; check the Solr on HDFS documentation online for more info.

Cookbook Dependencies

SolrCloud Deployment Requirement

To deploy solrcloud using this cookbook, below items are required:

SolrCloud configSet Cookbook / Environments / Versioning

Directory Structure

SolrCloud configSets stored in zookeeper are configured as file resources.

Each configSet is stored under node[:solrcloud][:zkconfigsets_cookbook]/files/default/configSet name.

Each configSet folder follows the standard layout of a conf folder containing all the configuration files.

So the directory structure looks like: node[:solrcloud][:zkconfigsets_cookbook]/files/default/configSet name/conf.
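A small check like the one below can catch a configSet that is listed in attributes but missing from the cookbook before it reaches a node. It is a sketch; `cookbook_root` (the local path of the configSet cookbook) and both helper names are hypothetical.

```ruby
# Hypothetical helpers for the layout
# <zkconfigsets_cookbook>/files/default/<configSet name>/conf.
def configset_conf_dir(cookbook_root, configset_name)
  File.join(cookbook_root, 'files', 'default', configset_name, 'conf')
end

# Names of configSets whose conf/ directory does not exist under cookbook_root.
def missing_configsets(cookbook_root, configset_names)
  configset_names.reject { |name| Dir.exist?(configset_conf_dir(cookbook_root, name)) }
end
```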

Managing same configSet for Multiple Environments

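Managing configSet configuration across environments can be achieved in different ways.

The simplest is to set attribute node[:solrcloud][:zkconfigsets_cookbook] to your configSet cookbook and add a dependency on that cookbook to metadata.rb, i.e. a line of the form:

depends 'configset_cookbook_name'

where configset_cookbook_name is the value of node[:solrcloud][:zkconfigsets_cookbook] (metadata.rb cannot read node attributes, so the name is written literally).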

Single Node SolrCloud Test Cluster Deployment

Adjust the attributes to your requirements. The attributes below work for a single node solrcloud cluster.

"default_attributes": {
  "solrcloud": {
    "zk_run": true,
    "port": "8080",
    "setup_user": true,
    "manager": true,
    "zkconfigsets": {
      "samplecollection": {}
    },
    "collections": {
      "samplecollection": {
        "collection_config_name": "samplecollection"
      }
    }
  }
}
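Each collection's collection_config_name is expected to name a configSet defined under zkconfigsets, as in the example above. A hypothetical pre-flight check over the "solrcloud" attribute hash could look like this; `undefined_config_names` is not part of the cookbook.

```ruby
# Hypothetical pre-flight check: list collections whose collection_config_name
# does not match a configSet defined under zkconfigsets.
def undefined_config_names(solrcloud_attrs)
  configsets = solrcloud_attrs.fetch('zkconfigsets', {}).keys
  solrcloud_attrs.fetch('collections', {}).map do |name, settings|
    config = settings['collection_config_name']
    "#{name} -> #{config}" if config && !configsets.include?(config)
  end.compact
end
```

A configSet that is managed outside Chef and already present in zookeeper would be reported here too, so treat the result as a warning rather than an error.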

Multi Node Manager Attributes

The attributes below are crucial for a multi node cluster. It is not advised to enable them on all the nodes in the cluster: for example, each new node would trigger a zookeeper configSet re-upload, and creating a new collection is better managed by one node to prevent a false collection state in the cluster.

"default_attributes": {
  "solrcloud": {
    "manage_collections": true,
    "manage_zkconfigsets": true
  }
}

Multi Node SolrCloud Test Cluster Deployment with zookeeper Cluster

Adjust the attributes to your requirements. The attributes below work for a multi node solrcloud cluster with an external zookeeper cluster.

"default_attributes": {
  "solrcloud": {
    "solr_config": {
      "solrcloud": {
        "zk_host": [
          "zookeeper_ip:zookeeper_port"
        ]
      }
    },
    "port": "8080",
    "setup_user": true,
    "manage_collections": true,
    "manage_zkconfigsets": true,
    "zkconfigsets": {
      "samplecollection": {}
    },
    "collections": {
      "samplecollection": {
        "collection_config_name": "samplecollection"
      }
    }
  }
}

Note: You might want to enable attribute "manager": true only on a limited number of cluster nodes. In a large cluster, this creates less overhead for zookeeper.
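One way to limit manager-type attributes (such as manage_collections and manage_zkconfigsets) to a single node is to derive them from the cluster membership. This is a hypothetical approach, not something the cookbook does; `manager_node?` and the source of the node list (for example a Chef search) are assumptions.

```ruby
# Hypothetical helper: true on exactly one node of the cluster.
# Assumes every node sees the same list of cluster node names;
# the lexically smallest name wins.
def manager_node?(node_name, cluster_node_names)
  cluster_node_names.min == node_name
end
```

If that node leaves the cluster the winner changes, so a fixed role assignment is often the simpler option.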

Multi Node SolrCloud Test Cluster Deployment with embedded zookeeper

Adjust the attributes to your requirements. The attributes below work for a multi node solrcloud cluster with embedded zookeeper.

On any one of the cluster nodes, enable attribute node[:solrcloud][:zk_run] and use its IP address as the zookeeper server.

"default_attributes": {
  "solrcloud": {
    "solr_config": {
      "solrcloud": {
        "zk_host": [
          "instance_with_zk_run_ip:zookeeper_port_default_2181"
        ]
      }
    },
    "port": "8080",
    "setup_user": true,
    "zkconfigsets": {
      "samplecollection": {}
    },
    "collections": {
      "samplecollection": {
        "collection_config_name": "samplecollection"
      }
    }
  }
}

Multiple SolrCloud Cluster Deployment

To deploy multiple clusters, create multiple roles with different zookeeper servers, or set the node attribute to the respective cluster's zookeeper server(s).

Zookeeper server attribute - node[:solrcloud][:solr_config][:solrcloud][:zk_host]
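Each cluster then has its own zk_host list. Solr's zkHost setting takes the servers as a comma-separated host:port string; the sketch below assumes the list is joined that way and uses hypothetical cluster and host names.

```ruby
# Hypothetical per-cluster zookeeper lists, e.g. one role per cluster setting
# node[:solrcloud][:solr_config][:solrcloud][:zk_host].
ZK_HOSTS_BY_CLUSTER = {
  'cluster_a' => ['zk-a1:2181', 'zk-a2:2181', 'zk-a3:2181'],
  'cluster_b' => ['zk-b1:2181']
}.freeze

# Join a cluster's zk_host list into the comma-separated host:port form.
def zk_connect_string(cluster, zk_hosts_by_cluster = ZK_HOSTS_BY_CLUSTER)
  zk_hosts_by_cluster.fetch(cluster).join(',')
end
```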

SolrCloud on HDFS Cluster Deployment

SolrCloud on HDFS has not been tested yet, but configuration from Apache Solr documentation has been added to the cookbook.

Tune Java Parameters for better Performance

Some common Java option tuning suggestions by Shawn Heisey.

Node attributes:

"default_attributes": {
  "solrcloud": {
    "java_options": [
      "-Xms1024m",
      "-XX:+UseConcMarkSweepGC",
      "-XX:CMSInitiatingOccupancyFraction=75",
      "-XX:NewRatio=3",
      "-XX:MaxTenuringThreshold=8",
      "-XX:+CMSParallelRemarkEnabled",
      "-XX:+ParallelRefProcEnabled",
      "-XX:+AggressiveOpts"
    ]
  }
}
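As a rough worked example of what these options imply under HotSpot semantics: -XX:NewRatio=3 makes the old generation three times the young generation, so young = heap / 4. For a 1024 MB heap that is 256 MB young and 768 MB old, and -XX:CMSInitiatingOccupancyFraction=75 starts a CMS cycle at about 576 MB of old-generation use. The sketch assumes the heap stays at the -Xms value (no -Xmx is set above), and the JVM may adjust the CMS threshold unless -XX:+UseCMSInitiatingOccupancyOnly is also set. `gen_sizes_mb` is a hypothetical helper.

```ruby
# Rough generation sizing implied by NewRatio and CMSInitiatingOccupancyFraction.
# young = heap / (NewRatio + 1); CMS starts when old gen reaches the given percentage.
def gen_sizes_mb(heap_mb, new_ratio, cms_fraction)
  young = heap_mb / (new_ratio + 1.0)
  old = heap_mb - young
  { young: young, old: old, cms_trigger: old * cms_fraction / 100.0 }
end
```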

Contributing

  1. Fork the repository on Github
  2. Create a named feature branch (like add_component_x)
  3. Write your change
  4. Write tests for your change (if applicable)
  5. Run the tests (rake), ensuring they all pass
  6. Write new resource/attribute description to README.md
  7. Write description about changes to PR
  8. Submit a Pull Request using Github

Copyright & License

Authors:: Virender Khatri and Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.