Implementing a search engine with elasticsearch and Symfony

Published on 2019-09-22 • Modified on 2019-09-22

In this post, we will see how to create a full-text search engine with elasticsearch in a Symfony application. We will use Docker Compose to set up an elasticsearch stack, keeping the configuration as minimal as possible and sticking to the sensible default values of the components. In the end, on this website, we will be able to search for articles and snippets matching one or several keywords. Let's go! 😎

» Published in "A week of Symfony 665" (23-29 September 2019).

Tutorial

This post is the first part of the tutorial "Implementing a search engine with elasticsearch and Symfony":

Prerequisites

I will assume that you have a basic knowledge of Symfony: that you know how to set up an application and how to handle a database schema with an ORM (we will use Doctrine here). As a docker-compose file will be used, I will also assume you are familiar with it; if not, please read the docker-compose getting started guide.

Configuration

  • PHP 7.2
  • Symfony 4.4

Setting the development environment with docker-compose

First, we need to prepare our development environment. As I am currently learning Docker, let's see how to set up most of the components with docker-compose so we can work (have fun? 😄) in good conditions. The stack will include:

  • elasticsearch 6.8
  • elastic head 5
  • MySQL 5.7
  • Adminer (last stable)

elasticsearch head will allow us to inspect our local elasticsearch cluster, and Adminer is a basic database administration interface that will allow us to easily check our tables and data (like phpMyAdmin).

Let's have a look at the docker-compose.yaml file:

# ./docker-compose.yaml

# DEV docker compose file ——————————————————————————————————————————————————————
# Check out: https://docs.docker.com/compose/gettingstarted/
version: '3.7'

# docker-compose -f docker-compose.yaml up -d
services:

  # Database ———————————————————————————————————————————————————————————————————

  # MySQL server database (official image)
  # https://docs.docker.com/samples/library/mysql/
  db:
    image: mysql:5.7
    container_name: sb-db
    command: --default-authentication-plugin=mysql_native_password
    ports:
      - "3309:3306"
    environment:
      MYSQL_ROOT_PASSWORD: root

  # adminer database interface (official image)
  # https://hub.docker.com/_/adminer
  adminer:
    container_name: sb-adminer
    depends_on:
      - db
    image: adminer
    ports:
      - "8089:8080"

  # elasticsearch ——————————————————————————————————————————————————————————————

  # elasticsearch server (official image)
  # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
  elasticsearch:
    container_name: sb-elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.3 # 6.8.4 out
    ports:
      - "9209:9200"
    environment:
        - "discovery.type=single-node"
        - "bootstrap.memory_lock=true"
        - "ES_JAVA_OPTS=-Xms1G -Xmx1G"
        - "xpack.security.enabled=false"
        - "http.cors.enabled=true"
        - "http.cors.allow-origin=*"

  # elasticsearch head manager (fork of mobz/elasticsearch-head for elasticsearch 6)
  # /!\ it isn't an official image /!\
  # https://hub.docker.com/r/tobias74/elasticsearch-head
  elasticsearch-head:
    container_name: sb-elasticsearch-head
    depends_on:
      - elasticsearch
    image: tobias74/elasticsearch-head:6
    ports:
      - "9109:9100"

We have two sections: one with the database components and the other with the elasticsearch ones. To launch the stack, run the following command:

docker-compose -f docker-compose.yaml up -d


Now, you can access the stack's components exposed through HTTP:

A few notes: to access the database with Adminer, you must specify a server; for our stack, it's the container_name we set in the docker-compose.yaml file, in this case sb-db. The user is "root", and so is the password. Don't use this in production! ⛔

The adminer login form

For elasticsearch head, in the top bar, you must specify the URL of the elasticsearch cluster: it's http://localhost:9209. When validating, you should see an empty node.
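You can also check the cluster from the command line, assuming the stack above is running; these are plain elasticsearch endpoints queried on the port we mapped in the docker-compose file:

```shell
# Basic information about the node (name, cluster name, versions...)
curl 'http://localhost:9209'

# Health of the cluster
curl 'http://localhost:9209/_cluster/health?pretty'
```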

On this project, I am using the Symfony web server. I start the local HTTP server with the following command (the symfony/web-server-bundle component must be installed):

php bin/console server:start

Then, the project can be browsed locally at http://127.0.0.1:8000. On my MacBook Pro and Mac mini, I have installed PHP with Homebrew, and on my Ubuntu workstation, PHP 7.2 was the default version installed (all three setups work flawlessly). We won't see here how to set up a full web server/PHP environment with Docker; to do so, please check out the related posts of Pierstoval. 😉 Now that we have our dev stack ready to use, let's see how to build our elasticsearch index.

Installing and configuring the FOSElastica Bundle

First, we need to install the FOSElastica bundle (of course, you could directly use elastica or another wrapper). Note that we won't use the latest elasticsearch version (7.3) because it doesn't seem to be supported by the bundle yet. Also note that changing the elasticsearch version we use is as easy as replacing 6.8.3 with 7.3.4 in the docker-compose file! That's the power of Docker. 💪

composer require friendsofsymfony/elastica-bundle

Open the config/packages/fos_elastica.yaml file and change the port to 9209:

# Read the documentation: https://github.com/FriendsOfSymfony/FOSElasticaBundle/blob/master/Resources/doc/setup.md
# config/packages/fos_elastica.yaml
fos_elastica:
    clients:
        default: { host: localhost, port: 9209 }
    indexes:
        app: null

Now, we can run the create index command to check that the setup is OK:

php bin/console fos:elastica:create
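You can also verify the result without elasticsearch head, for example with curl (assuming the stack above is running on the mapped port 9209):

```shell
# Show the settings and (still empty) mappings of the new "app" index
curl 'http://localhost:9209/app?pretty'

# List all the indices of the cluster
curl 'http://localhost:9209/_cat/indices?v'
```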

If you go to the elasticsearch head component, you should see an empty app index created:

The elasticsearch head

Now, let's see how to add data to the index. We won't cover here the whole process of creating a model and the corresponding entities and tables. On this blog, I have an article table which contains all posts and snippets; the schema was created with the API Platform schema generator. So the goal here will be to add all the articles to the elasticsearch index.

Indexing data in the elasticsearch index

In the rest of this post, I will take my database schema as the reference, so replace App\Entity\Article with your own entity class name; the same goes for the entity's properties. First, let's add some fields to the elasticsearch mapping:

# config/packages/fos_elastica.yaml
fos_elastica:
    clients:
        default: { host: localhost, port: 9209 }
    indexes:
        app:
            types:
                articles:
                    properties:
                        type: ~
                        name: ~
                        slug: ~
                        keyword: ~
                    persistence:
                        driver: orm
                        model: App\Entity\Article

We have added several text fields and the article's type (article or snippet). Let's keep the default field settings for now and run the populate command, which is responsible for refreshing the elasticsearch index:

php bin/console fos:elastica:populate
Resetting app
 42/42 [============================] 100%
Populating app/articles
Refreshing app

If you can see this, it means that the populate command was successful. We can check that the elasticsearch documents were correctly indexed: go to the web interface of "elasticsearch head", click the "browser" tab, then click on a document to see its raw JSON. We can see the entity id (14) and the fields we declared to be indexed (type, name, slug, keyword).

Raw source of an elasticsearch document we've just indexed.
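As an illustration, such a raw document looks roughly like this (the _source values below are made up for the example, they are not the actual content of document 14):

```json
{
  "_index": "app",
  "_type": "articles",
  "_id": "14",
  "_score": 1,
  "_source": {
    "type": "article",
    "name": "My article name",
    "slug": "my-article-slug",
    "keyword": "symfony,elasticsearch,docker"
  }
}
```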

Now that we have an index with some data, let's try to search.

Searching and displaying the results

For clarity, we will create a basic controller that will handle the search action. First, we need to bind a variable to the finder service of the "articles" type. This service is automatically created by the FOSElastica bundle depending on the types declared in the configuration. Add this to your config/services.yaml file:

# config/services.yaml
services:
    _defaults:
        bind:
            $articlesFinder: '@fos_elastica.finder.app.articles'

Then, thanks to autowiring, we can inject this service into our new controller:

<?php declare(strict_types=1);

// src/Controller/SearchPart1Controller.php

namespace App\Controller;

use FOS\ElasticaBundle\Finder\TransformedFinder;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\HttpFoundation\Session\SessionInterface;
use Symfony\Component\Routing\Annotation\Route;

/**
 * You know, for search.
 *
 * @Route("/{_locale}", name="search_part1_", requirements={"_locale"="%locales_requirements%"})
 */
class SearchPart1Controller extends AbstractController
{
    /**
     * @Route({"en": "/part1/search", "fr": "/partie1/recherche"}, name="main")
     */
    public function search(Request $request, SessionInterface $session, TransformedFinder $articlesFinder): Response
    {
        $q = (string) $request->query->get('q', '');
        $results = !empty($q) ? $articlesFinder->findHybrid($q) : [];
        $session->set('q', $q);

        return $this->render('search/search_part1.html.twig', compact('results', 'q'));
    }
}
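As a side note, if you need to consume the hybrid results in PHP rather than in Twig, here is a minimal, hypothetical sketch (the getName() getter is an assumption based on the mapped name field):

```php
<?php

// Hypothetical sketch: consuming the results of $articlesFinder->findHybrid($q).
// findHybrid() returns an array of FOS\ElasticaBundle\HybridResult objects.
foreach ($results as $result) {
    $hit = $result->getResult()->getHit(); // raw elasticsearch hit (_index, _id, _score, _source...)
    $article = $result->getTransformed();  // the matching App\Entity\Article Doctrine entity

    echo sprintf('%s (score: %s)%s', $article->getName(), $hit['_score'], PHP_EOL);
}
```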

The action of this controller is very concise. We get the keyword to search from the HTTP query (q is for query), call the findHybrid function to find the articles matching it, then save the keyword in the session. For each result, findHybrid returns two objects. The first one, the "hit", contains the meta information of the raw elasticsearch response for this result; it's in this object that we get the score of the document. When providing a keyword, all results are sorted by score, from the most to the least relevant. The second object is the Doctrine entity matching the search, so we don't have to handle the raw elasticsearch response manually. Now, we can display the results:

{% extends 'layout.html.twig' %}

{# templates/search/search_part1.html.twig // This is the template of the 1st part of the tutorial #}

{% trans_default_domain 'search' %}

{% set esArticle = article_es() %} {# Don't do this! This is to avoid polluting the SearchController #}

{% block content %}
    <div class="col-md-12">
        <div class="card">
            <div class="card-header card-header-primary">
                <p class="h3">{{ 'your_search_for'|trans}} <b>"{{ q }}"</b>, <b>{{ results|length }}</b> {{ 'results'|trans}}.</p>
            </div>
            <div class="card-body">
                <p class="h3">&raquo; {{ 'get_back'|trans}} "<a href="{{ path('blog_show', {'slug': esArticle[1].slug|a_slug(locale), 'q': q}) }}#search_form">{{ ('title_'~esArticle[1].id)|trans({}, 'blog') }}</a>"</p>
            </div>
        </div>
    </div>
    {% for result in results %}
        {% set hit = result.result.hit %}
        {% set article = result.transformed %}
        {% if article.isArticle %}
            {% set tag_route = 'blog_list_tag' %}
            {% set pathEn = path('blog_show', {'_locale': 'en','slug': article.slug|a_slug('en')}) %}
            {% set pathFr = path('blog_show', {'_locale': 'fr','slug': article.slug|a_slug('fr')}) %}
            {% set title = ('title_'~article.id)|trans({}, 'blog') %}
        {% else %}
            {% set tag_route = 'snippet_list_tag' %}
            {% set pathEn = path('snippet_show', {'_locale': 'en', 'slug': article.slug|s_slug('en') }) %}
            {% set pathFr = path('snippet_show', {'_locale': 'fr', 'slug': article.slug|s_slug('fr') }) %}
            {% set title = ('title_'~article.id)|trans({}, 'snippet') %}
        {% endif %}
        <div class="card">
            <div class="card-header">
                <h2 class="h3">
                    [{{ ('type_'~article.type.id)|trans({}, 'blog') }}] {{ title }} &raquo; {{ 'score'|trans }} <b>{{ hit._score }}</b>
                </h2>
            </div>

            <div class="card-body">
                <div class="blog-tags">
                    {% for tag in article.keywords %}<a class="badge badge-{{ random_class() }}" href="{{ path(tag_route, {'tag': tag}) }}"><i class="far fa-tag"></i> &nbsp;{{ tag|trans({}, 'breadcrumbs') }}</a> {% endfor %}
                </div>
                <br/>
                <p class="card-text text-center">
                    <a href="{{ pathEn }}" class="btn btn-primary card-link">🇬🇧 {{ 'read_in_english'|trans({}, 'blog') }}</a>
                    <a href="{{ pathFr }}" class="btn btn-primary card-link">🇫🇷 {{ 'read_in_french'|trans({}, 'blog') }}</a>
                </p>
            </div>
        </div>
    {% endfor %}
    <div class="col-md-12">
        {% if results is empty %}
            <p class="h3">{{ 'no_results'|trans }}</p>
        {% endif %}
    </div>

    <div class="col-md-12">
        {% include 'search/_form.html.twig' with {route: 'search_part1_main'} %}
    </div>
{% endblock %}

Let's have a look at the template. Don't be afraid! It contains specific code and helpers developed for this blog; they are not the subject of this post (it's the real template used by the search). The two important lines are at the beginning of the for loop:

{% set hit = result.result.hit %}
{% set article = result.transformed %}

As mentioned before, we first get the hit object, then we can access the score with hit._score (it is displayed to the right of the article or snippet title). Then, we get the Article Doctrine entity with result.transformed. Now, we can access the entity getters like we are used to with Twig. For example, article.isArticle will return true if the article is a blog post and false if it's a snippet (there are only two article types). That's it! You can test the search with the following form:

When launching a search, a new entry is automatically added in the debug panel so one can easily debug the raw elasticsearch query that was executed.
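If you want to reproduce such a query by hand, you can send it to the cluster with curl; the body below is an approximation of the query generated for a plain search string, not the bundle's exact payload:

```shell
# Run an approximate equivalent of the search directly against elasticsearch
curl -H 'Content-Type: application/json' 'http://localhost:9209/app/_search?pretty' -d '
{
  "query": {
    "query_string": { "query": "symfony" }
  }
}'
```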


Note that up to ten results are returned for now (no pagination). Our search engine is working, but it's still very basic. There is a very annoying problem: the texts of the posts and snippets are not completely indexed yet because they are stored in translation files. So, the next goal will be to include them when indexing, so that the search relevance is much better. We could also implement several other interesting things: pagination, boosts, an alias... As this blog post is already quite big, let's keep all these things for the next one! 😌

That's it! I hope you liked it. Check out the links below for additional information related to the post. As always, feedback, likes and retweets are welcome (see the box below). See you! COil. 😊

  Read the second part


They gave feedback and helped me to fix errors and typos in this article, many thanks to dkarlovi, jmsche. 😊


» Call to action

Did you like this post? You can help me back in several ways (use the Tweet on the right to comment or to contact me):

  • Report any error/typo.
  • Report something that could be improved.
  • Like and retweet!
  • Follow me on Twitter
  • Subscribe to the RSS feed.

Thank you for reading! And see you soon on Strangebuzz! 😉

COil