Pick Up SphinxSearch: Installation And Configuration


SphinxSearch is a widely used full-text search engine, especially in the PHP community. It is free and open source.

SphinxSearch works like a standalone database server. A client application or DBMS can communicate with it in three ways:

  • SphinxAPI, the official native API library;
  • SphinxQL, a subset of SQL that lets an application query SphinxSearch as if it were a regular database such as MySQL;
  • SphinxSE, short for Sphinx Storage Engine, essentially a wrapper around SphinxAPI that works as a pluggable storage engine for MySQL or MariaDB.

 

1. Installation

The official SphinxSearch documentation lists all supported systems and the corresponding installation methods. This article only covers installing from the Ubuntu Launchpad PPA (Personal Package Archive) repository.

First go to Launchpad and check whether there are published packages for both your Ubuntu version and the SphinxSearch version you want.

Take the SphinxSearch release 2.2 PPA, sphinxsearch-rel22, as an example. The screenshot below shows that sphinxsearch-rel22 supports Ubuntu versions from 10.04 to 15.04.

[Screenshot: check_sphinxseach_PPA]

Add this PPA to your apt sources:

$ sudo add-apt-repository ppa:builds/sphinxsearch-rel22
$ sudo apt-get update

Install sphinxsearch-rel22:

$ sudo apt-get install sphinxsearch

 

2. Configuration

On Ubuntu, the default sample configuration file is /etc/sphinxsearch/sphinx.conf. The file has three main sections: source, index and searchd. The following picture illustrates the architecture and how sphinx.conf relates to SphinxSearch.

[Figure: sphinxsearch_work_flow]

2.1 source

The first section in sphinx.conf is source. It defines where the data to be indexed comes from, usually MySQL, PostgreSQL or MariaDB. Below is a sample source configuration:

#############################################################################
## data source definition
#############################################################################

# articles index source
source articlesrc
{
    type            = mysql

    sql_host        = localhost
    sql_user        = dbuser
    sql_pass        = dbuser_passwd
    sql_db          = dbname
    sql_port        = 3306  # optional, default is 3306

    sql_query_pre     = SET NAMES utf8

    sql_query = SELECT id, title, excerpt, content, published_at, views FROM articles WHERE status=1

    sql_attr_timestamp = published_at
    sql_attr_uint = views

    sql_ranged_throttle = 0
}
  • articlesrc is the source name, which the index section refers to.
  • type specifies the type of the data source, here mysql.
  • sql_host, sql_user, sql_pass, sql_db and sql_port define the parameters used to connect to the data source.
  • The most important part is sql_query. It defines exactly which data will be indexed. Pay attention to the first column, id, in sql_query. It is required and is ALWAYS the document id, which must be a UNIQUE, UNSIGNED, NON-ZERO INTEGER. Usually it is the primary key column.

Two SphinxSearch terms need to be introduced here: field and attribute.

  • field means "full-text field". A field is full-text indexed and is what text queries from the client are matched against. Apart from id, every other column in sql_query is optional, but each one is automatically treated as a field and full-text indexed by SphinxSearch, UNLESS you explicitly configure it as an attribute.

  • attribute, according to the official documentation, means "additional values associated with each document that can be used to perform additional filtering and sorting during search". An attribute is stored in the index alongside the fields but is not full-text indexed. Its purpose is to filter, sort or group the results.

    To make a column an attribute, use the sql_attr_* directives in the source section. In the configuration above, published_at and views are marked as attributes so that they can be used to filter or sort the matched results.

    There is a special case: sql_field_string. This directive declares that the column is both a field and an attribute, which means the column is full-text indexed and can also be used to filter, sort or group the matched results.
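The field/attribute rules above can be restated as a small Python sketch. This is only an illustration of the rules as described in this article, not part of SphinxSearch itself; the function name classify_columns and its parameters are made up for the example.

```python
def classify_columns(columns, attributes=(), field_strings=()):
    """Split the columns of a sql_query into their SphinxSearch roles.

    columns       -- column names in SELECT order (first one is the document id)
    attributes    -- columns declared with a sql_attr_* directive
    field_strings -- columns declared with sql_field_string
    """
    if not columns:
        raise ValueError("sql_query must select at least the document id")

    doc_id, rest = columns[0], columns[1:]
    roles = {"id": doc_id, "fields": [], "attributes": []}
    for name in rest:
        if name in field_strings:
            # sql_field_string: full-text indexed AND stored as an attribute
            roles["fields"].append(name)
            roles["attributes"].append(name)
        elif name in attributes:
            # sql_attr_*: stored for filtering/sorting, not full-text indexed
            roles["attributes"].append(name)
        else:
            # everything else defaults to a full-text field
            roles["fields"].append(name)
    return roles


# Columns and attribute declarations taken from the articlesrc sample above.
roles = classify_columns(
    ["id", "title", "excerpt", "content", "published_at", "views"],
    attributes={"published_at", "views"},
)
print(roles)
```

With the sample source, title, excerpt and content end up as fields, while published_at and views end up as attributes.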

2.2 index

The indexer command of SphinxSearch reads the settings in the index section, then runs the source query and builds the index. So the first thing the index section must do is tell indexer which source to use. Below is an example configuration.

# articles index
index articles
{
    source          = articlesrc

    path            = /var/lib/sphinxsearch/data/articles

    docinfo         = extern

    mlock           = 0

    morphology      = none

    min_word_len        = 1
}
  • source specifies which source definition the index is built from.
  • path defines where the index data files are stored.
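Because an index refers to its source by name, a typo in that name is an easy mistake to make. The following Python sketch checks that every source named in an index block is actually defined. It is a simplified parser written for this article, not an official tool: it assumes plain "source NAME { ... }" and "index NAME { ... }" blocks and treats everything after a # as a comment. The comments index and commentsrc source in the sample are hypothetical.

```python
import re

# Matches "source NAME { ... }" and "index NAME { ... }" blocks whose
# closing brace sits at the start of a line (as in the samples above).
BLOCK_RE = re.compile(
    r"^\s*(source|index)\s+(\w+)[^{]*\{(.*?)^\s*\}", re.S | re.M
)


def strip_comments(text):
    """Drop everything after '#' on each line (simplifying assumption)."""
    return "\n".join(line.split("#", 1)[0] for line in text.splitlines())


def find_missing_sources(conf_text):
    """Return {index_name: [undefined source names]} for a sphinx.conf text."""
    sources, index_refs = set(), {}
    for kind, name, body in BLOCK_RE.findall(strip_comments(conf_text)):
        if kind == "source":
            sources.add(name)
        else:
            index_refs[name] = re.findall(r"^\s*source\s*=\s*(\S+)", body, re.M)

    missing = {}
    for index_name, refs in index_refs.items():
        unknown = [ref for ref in refs if ref not in sources]
        if unknown:
            missing[index_name] = unknown
    return missing


SAMPLE = """
source articlesrc
{
    type = mysql
}

# articles index
index articles
{
    source = articlesrc
    path   = /var/lib/sphinxsearch/data/articles
}

# hypothetical index that points at an undefined source
index comments
{
    source = commentsrc
    path   = /var/lib/sphinxsearch/data/comments
}
"""

missing = find_missing_sources(SAMPLE)
print(missing)
```

Here the articles index passes the check, and the hypothetical comments index is reported because commentsrc is never defined.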

2.3 searchd

The searchd section is read by the searchd daemon.

searchd
{
    listen          = 9312
    listen          = 9306:mysql41

    # log file, searchd run info is logged here
    # optional, default is 'searchd.log'
    log         = /var/log/sphinxsearch/searchd.log

    query_log           = /var/log/sphinxsearch/query.log

    read_timeout        = 5

    client_timeout      = 300

    max_children        = 30

    # PID file, searchd process ID file name
    # mandatory
    pid_file            = /var/run/sphinxsearch/searchd.pid
}

The most important setting in this section is listen. It specifies which IP address and port the searchd service listens on. There are two default ports:

  • 9312, which accepts SphinxAPI requests;
  • 9306, which accepts SphinxQL queries.
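To make the shape of a listen value more concrete, here is a minimal Python sketch that splits it into address, port and protocol. It only handles the forms "port", "port:protocol", "address:port" and "address:port:protocol"; other forms such as UNIX socket paths are deliberately left out. The function name parse_listen is made up for this example, and the sketch assumes that a value without an explicit protocol uses the default sphinx (SphinxAPI) protocol.

```python
KNOWN_PROTOCOLS = ("sphinx", "mysql41")


def parse_listen(value):
    """Parse a listen value such as '9312' or '9306:mysql41'."""
    parts = value.strip().split(":")

    # Assumption: without an explicit protocol, searchd speaks SphinxAPI.
    protocol = "sphinx"
    if parts[-1] in KNOWN_PROTOCOLS:
        protocol = parts.pop()

    if len(parts) == 1 and parts[0].isdigit():
        # port only: listen on all interfaces
        return {"address": None, "port": int(parts[0]), "protocol": protocol}
    if len(parts) == 2 and parts[1].isdigit():
        # address and port
        return {"address": parts[0], "port": int(parts[1]), "protocol": protocol}
    raise ValueError("unsupported listen value: %r" % value)


# The two listen lines from the searchd sample above.
for value in ("9312", "9306:mysql41"):
    print(value, "->", parse_listen(value))
```

The first line of the sample therefore serves SphinxAPI clients on port 9312, and the second serves SphinxQL clients over the MySQL wire protocol on port 9306.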

2.4 $START

There is also a tricky setting, $START, which is defined in the /etc/default/sphinxsearch file. If $START is set to yes,

# in file: /etc/default/sphinxsearch
START=yes

the searchd daemon is started automatically and relaunches itself even if you kill it.