Implementing a search engine with elasticsearch and Symfony (part 3)

Published on 2019-11-16 • Modified on 2019-11-29

In this third and last part, we will continue to improve our search engine. First, we will enhance our elasticsearch stack with Kibana. Then, we will implement an autocomplete using an elasticsearch suggester. Let's go! 😎

» Published in "A week of Symfony 672" (10-17 November 2019).

Tutorial

This post is the third and last part of the tutorial "Implementing a search engine with Elasticsearch and Symfony":

Prerequisites

The prerequisites are the same as for the first two parts. It is, of course, recommended to read them (links above) before continuing with this one.

Configuration

I've just migrated the blog to Symfony 4.4. I had weird behaviours with tests but everything else worked flawlessly without changing a single line of code. 😉

  • PHP 7.2
  • Symfony 4.4.1
  • elasticsearch 6.8

Installing Kibana

First, we will try to improve our Elasticsearch stack. Until now, we used the "head" plugin to manage our cluster. But this development tool is quite old and not maintained anymore. So, let's add Kibana to our Docker setup. Kibana is an open-source data visualization plugin for Elasticsearch. Of course, it will allow us to do all the basic maintenance tasks we used to do with head: delete or close an index, create or delete an alias, check a document, verify the index mappings… and much more! The list of what you can do with it is impressive (check out the left menu in the next screenshot). Let's add the following entry to the docker-compose.yaml file:

kibana:
    container_name: sb-kibana
    image: docker.elastic.co/kibana/kibana:6.8.4
    ports:
      - "5601:5601"
    environment:
      - "ELASTICSEARCH_URL=http://sb-elasticsearch"
    depends_on:
      - elasticsearch

Here is the new full docker-compose.yaml file:

# ./docker-compose.yaml

# DEV docker compose file ------------------------------------------------------
# Check out: https://docs.docker.com/compose/gettingstarted/
version: '3.7'

# docker-compose -f docker-compose.yaml up -d
services:

  # Database -------------------------------------------------------------------

  # MySQL server database (official image)
  # https://docs.docker.com/samples/library/mysql/
  db:
    image: mysql:5.7
    container_name: sb-db
    command: --default-authentication-plugin=mysql_native_password
    ports:
      - "3309:3306"
    environment:
      MYSQL_ROOT_PASSWORD: root

  # adminer database interface (official image)
  # https://hub.docker.com/_/adminer
  adminer:
    container_name: sb-adminer
    depends_on:
      - db
    image: adminer
    ports:
      - "8089:8080"

  # elasticsearch --------------------------------------------------------------

  # elasticsearch server (official image)
  # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
  elasticsearch:
    container_name: sb-elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.4
    ports:
      - "9209:9200"
    environment:
        - "discovery.type=single-node"
        - "bootstrap.memory_lock=true"
        - "ES_JAVA_OPTS=-Xms1G -Xmx1G"
        - "xpack.security.enabled=false"
        - "http.cors.enabled=true"
        - "http.cors.allow-origin=*"

  # elasticsearch head manager (fork of mobz/elasticsearch-head for elasticsearch 6)
  # /!\ it isn't an official image /!\
  # https://hub.docker.com/r/tobias74/elasticsearch-head
  elasticsearch-head:
    container_name: sb-elasticsearch-head
    depends_on:
      - elasticsearch
    image: tobias74/elasticsearch-head:6
    ports:
      - "9109:9100"

  # kibana (official image)
  # https://hub.docker.com/_/kibana
  kibana:
    container_name: sb-kibana
    image: docker.elastic.co/kibana/kibana:6.8.4
    ports:
      - "5609:5601"
    environment:
      - "ELASTICSEARCH_URL=http://sb-elasticsearch"
    depends_on:
      - elasticsearch

  # Cache ----------------------------------------------------------------------
  # Redis (official image)
  # https://hub.docker.com/_/redis
  redis:
    image: redis:5.0.6-alpine
    container_name: sb-redis
    ports:
      - '6389:6379'

As you can see, we pass the URL of the Elasticsearch server, whose hostname is the name of the Docker container (sb-elasticsearch). We keep the standard 5601 port for Kibana (in my full file, it is published on port 5609 of the host). We also use the same image version (6.8.4) as the Elasticsearch server, so we are sure there are no compatibility problems. If you restart the Docker containers, you can access the management page:

Kibana in action!

That's it for Kibana. I will stop here, as it would require much more than a full article to introduce all its features. Check out the official website for more information. Kibana is very powerful: it can also be used to view your Symfony logs! Check out this very nice post on the JoliCode blog about this subject.

Adding an autocomplete in the search bar

As you can see, I have put a search bar in the header of this website. It works, but what about autocompleting the user input and suggesting terms that can be found on this blog? Let's see how we can do this with Elasticsearch. We are going to build a new index dedicated to this task.

Setting the mapping

Until now, we used the default "text" type for all mapping properties. In this case, we will use a special type: completion. We will add a new "suggest" index configuration just after the "app" index we were using in the previous posts:

# config/packages/fos_elastica.yaml
fos_elastica:
    clients:
        default: { host: '%es_host%', port: '%es_port%' }
    indexes:
        app:
          ###
        suggest:
            use_alias: true
            settings:
                index:
                    analysis:
                        analyzer:
                            suggest_analyzer:
                                type: custom
                                tokenizer: standard
                                filter: [lowercase, asciifolding]
            types:
                keyword:
                    properties:
                        locale:
                            type: keyword
                        suggest:
                            type: completion
                            analyzer: suggest_analyzer
                            contexts:
                                - name: locale
                                  type: category
                                  path: locale

Here is the full YAML mapping:

# config/packages/fos_elastica.yaml
fos_elastica:
    clients:
        default: { host: '%es_host%', port: '%es_port%' }
    indexes:
        app:
            use_alias: true
            types:
                articles:
                    properties:
                        type: ~
                        keywordFr: { boost: 4 }
                        keywordEn: { boost: 4 }
                        # i18n
                        titleEn: { boost: 3 }
                        titleFr: { boost: 3 }
                        headlineEn: { boost: 2 }
                        headlineFr: { boost: 2 }
                        ContentEn: ~ # The default boost value is 1
                        ContentFr: ~
                    persistence:
                        driver: orm
                        model: App\Entity\Article
                        provider:
                            service: App\Elasticsearch\Provider\ArticleProvider
                        listener:
                            insert: false
                            update: false
                            delete: false
        suggest:
            use_alias: true
            settings:
                index:
                    analysis:
                        analyzer:
                            suggest_analyzer:
                                type: custom
                                tokenizer: standard
                                filter: [lowercase, asciifolding]
            types:
                keyword:
                    properties:
                        locale:
                            type: keyword
                        suggest:
                            type: completion
                            analyzer: suggest_analyzer
                            contexts:
                                - name: locale
                                  type: category
                                  path: locale

Some explanations about the new index and its mapping: before declaring the new type, I add a custom analyzer in the settings section. Its asciifolding filter lets us ignore French accents, so a match is found even when accents are not typed in the input. For example, if we type "element", the word "élément" should be suggested.
Then, as for the main app index, we use an alias. In the mapping of the keyword type, we have two properties. The first one, suggest, is of the completion type: we need this special type to be able to use the completion suggester, as we will see in the next chapter. The second one, locale, allows us to filter the suggestions depending on the current locale (en or fr). Note that we have added a context to the suggest field, and that it is associated with the locale property (path: locale).
If we launch the populate command, the new index is created. At this point, we have two indexes in the Elasticsearch cluster:

Kibana in action!
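
To make the effect of the lowercase and asciifolding filters more concrete, here is a minimal PHP sketch that imitates them outside of Elasticsearch. It is only an approximation: the foldForSuggest() helper and its small character map are hypothetical and made up for this illustration, whereas the real filters run inside the Elasticsearch analysis chain and cover many more characters.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper imitating the "lowercase" and "asciifolding" filters.
 * The map below only covers common French characters.
 */
function foldForSuggest(string $text): string
{
    $map = [
        'à' => 'a', 'â' => 'a', 'ç' => 'c',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'î' => 'i', 'ï' => 'i', 'ô' => 'o',
        'ù' => 'u', 'û' => 'u', 'ü' => 'u', 'œ' => 'oe',
    ];

    // Lowercase first (like the "lowercase" filter), then replace the accents.
    return strtr(mb_strtolower($text, 'UTF-8'), $map);
}

// The indexed word and the user input end up as the same token.
var_dump(foldForSuggest('Élément') === foldForSuggest('element')); // bool(true)
```

Because the same analyzer is applied to the indexed keywords and to the typed prefix, both sides are compared in their folded form.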

Populating the suggest index

Now, we must populate the new suggest index. As there is no model mapped to this index, we won't create a provider but a command. The idea is to extract all the words we have already indexed in the app index. Here is the new Symfony command (some insights after the snippet 🤔):

<?php declare(strict_types=1);

// src/Command/PopulateSuggestCommand.php

namespace App\Command;

use Doctrine\Common\Inflector\Inflector;
use Elastica\Document;
use FOS\ElasticaBundle\Elastica\Index;
use FOS\ElasticaBundle\Finder\TransformedFinder;
use FOS\ElasticaBundle\HybridResult;
use FOS\ElasticaBundle\Paginator\FantaPaginatorAdapter;
use Pagerfanta\Pagerfanta;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

/**
 * Populate the suggest elasticsearch index.
 */
class PopulateSuggestCommand extends Command
{
    public const NAMESPACE = 'strangebuzz';
    public const CMD = 'populate';
    public const DESC = 'Populate the "suggest" elasticsearch index';

    private $articlesFinder;
    private $suggestIndex;

    public function __construct(TransformedFinder $articlesFinder, Index $suggestIndex)
    {
        parent::__construct();
        $this->articlesFinder = $articlesFinder;
        $this->suggestIndex = $suggestIndex;
    }

    protected function configure(): void
    {
        $namespace = self::NAMESPACE;
        $cmd = self::CMD;
        $desc = self::DESC;
        $this->setName($namespace.':'.$cmd)
            ->setDescription(self::DESC)
            ->setHelp(
                <<<EOT
{$desc}
<info>php bin/console {$namespace}:{$cmd}</info>
EOT
            );
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $output->writeln(self::DESC);
        $pagination = $this->findHybridPaginated($this->articlesFinder, '');
        $nbPages = $pagination->getNbPages();
        $keywords = [];

        foreach (range(1, $nbPages) as $page) {
            $pagination->setCurrentPage($page);
            foreach ($pagination->getCurrentPageResults() as $result) {
                if ($result instanceof HybridResult) {
                    foreach ($result->getResult()->getSource() as $property => $text) {
                        if ($property === 'type') {
                            continue;
                        }
                        $locale = explode('_', Inflector::tableize($property))[1] ?? 'en';
                        $text = strip_tags($text ?? '');
                        $textArray = str_word_count($text, 2, 'çéâêîïôûàèùœÇÉÂÊÎÏÔÛÀÈÙŒ'); // FGS do not remove French accents! 🙃
                        $textArray = array_filter(\is_array($textArray) ? $textArray : []);
                        $keywords[$locale] = array_merge($keywords[$locale] ?? [], $textArray);
                    }
                }
            }
        }

        // Index by locale
        foreach ($keywords as $locale => $localeKeywords) {
            // Remove small words and remaining craps (emojis) 😖
            $localeKeywords = array_unique(array_map('mb_strtolower', $localeKeywords));
            $localeKeywords = array_filter($localeKeywords, static function ($v) {
                return mb_strlen($v) > 2;
            });
            $documents = [];
            foreach ($localeKeywords as $idx => $keyword) {
                $documents[] = (new Document())
                    ->setType('keyword')
                    ->set('locale', $locale)
                    ->set('suggest', $keyword);
            }
            $responseSet = $this->suggestIndex->addDocuments($documents);

            $output->writeln(sprintf(' -> TODO: %d -> DONE: <info>%d</info>, "%s" keywords indexed.', count($documents), $responseSet->count(), $locale));
        }

        return 0;
    }

    private function findHybridPaginated(TransformedFinder $articlesFinder, string $query): Pagerfanta
    {
        $paginatorAdapter = $articlesFinder->createHybridPaginatorAdapter($query);

        return new Pagerfanta(new FantaPaginatorAdapter($paginatorAdapter));
    }
}

Some explanations: 💡

  • We run a search with an empty query, which should match every article, to get the total number of pages.
  • We iterate over each page to get the articles.
  • For each article, we go through all the properties of the Elasticsearch document (except type).
  • For each property, we extract all the words from the text with the PHP str_word_count() function.
  • We remove empty, duplicate, and too-short words (fewer than three characters).
  • For each remaining word, we create an Elasticsearch document with the associated locale.
  • Finally, we index these documents with the addDocuments() function.
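
The extraction steps above can be sketched in isolation, without Elasticsearch or Doctrine. This is a simplified illustration, not the command itself: localeFromProperty() and extractKeywords() are hypothetical helpers, and the regular expression is a stand-in for the Inflector::tableize() call used in the command.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper: guesses the locale from a property name such as
 * "titleEn" or "keywordFr" and falls back to "en".
 */
function localeFromProperty(string $property): string
{
    // "titleEn" -> "title_En" -> "title_en" -> ["title", "en"]
    $snake = strtolower((string) preg_replace('/(?<=[a-z])([A-Z])/', '_$1', $property));

    return explode('_', $snake)[1] ?? 'en';
}

/**
 * Hypothetical helper: turns a piece of HTML into a list of unique,
 * lowercased keywords of at least $minLength characters.
 */
function extractKeywords(string $html, int $minLength = 3): array
{
    // Extra characters that str_word_count() must treat as letters.
    $accents = 'çéâêîïôûàèùœÇÉÂÊÎÏÔÛÀÈÙŒ';
    $words = str_word_count(strip_tags($html), 1, $accents);
    $words = array_unique(array_map('mb_strtolower', $words));
    $words = array_filter($words, static function (string $word) use ($minLength): bool {
        return mb_strlen($word) >= $minLength;
    });

    // Reindex the array, as array_unique() and array_filter() keep the keys.
    return array_values($words);
}

print_r(extractKeywords('<p>Élément de recherche, un élément!</p>'));
echo localeFromProperty('keywordFr'), PHP_EOL;
```

With this input, "de" and "un" are dropped because they are too short, and the two spellings of "élément" are merged once lowercased.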

As I am writing, there are about 3,700 indexed words. Here is the output of the populate Makefile target:

/Users/coil/Sites/strangebuzz.com$ make populate
php bin/console fos:elastica:reset
Resetting app
Resetting suggest
php bin/console fos:elastica:populate --index=app
Resetting app
 53/53 [============================] 100%
Populating app/articlesRefreshing app
Refreshing app
php bin/console strangebuzz:populate
Populate the "suggest" elasticsearch index
 -> TODO: 2167 -> DONE: 2167, "fr" keywords indexed.
 -> TODO: 1549 -> DONE: 1549, "en" keywords indexed.

Here is the content of this Makefile target:

## -- elasticsearch 🔎 ---------------------------------------------------------
populate: ## Reset and populate the elasticsearch index
	$(SYMFONY) fos:elastica:reset
	$(SYMFONY) fos:elastica:populate --index=app
	$(SYMFONY) strangebuzz:populate # populate the "suggest" index.

You can find my full Symfony Makefile in this snippet. So, now that the index is populated, let's see how to use it for the autocomplete feature.

Implementing the autocomplete

The goal here will be to have an action that returns via Ajax the suggestions for the autocomplete field as the user is typing. So let's create a new controller that will handle this:

<?php declare(strict_types=1);

// src/Controller/SuggestController.php

namespace App\Controller;

use Elastica\Query;
use Elastica\Suggest;
use Elastica\Suggest\Completion;
use Elastica\Util;
use FOS\ElasticaBundle\Elastica\Index;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\Routing\Annotation\Route;

/**
 * @Route("/{_locale}", requirements={"_locale"="%locales_requirements%"})
 */
class SuggestController extends AbstractController
{
    private const SUGGEST_NAME = 'completion';
    private const SUGGEST_FIELD = 'suggest';

    /**
     * @Route({"en": "/suggest", "fr": "/suggerer"}, name="suggest")
     */
    public function suggest(Request $request, Index $suggestIndex, string $_locale): JsonResponse
    {
        $q = (string) $request->query->get('q', '');
        $suggest = $this->getSuggest($q, $_locale);
        $query = (new Query())->setSuggest($suggest);
        $suggests = $suggestIndex->search($query)->getSuggests();
        $options = $suggests[self::SUGGEST_NAME][0]['options'] ?? [];

        return $this->json(array_column($options, 'text'));
    }

    /**
     * Check out the links at the end of the post to get more insights about this.
     */
    protected function getSuggest(string $q, string $locale): Suggest
    {
        $completionSuggest = (new Completion(self::SUGGEST_NAME, self::SUGGEST_FIELD)) // "suggest" here is the mapping field name
            ->setPrefix(Util::escapeTerm($q)) // items starting with...
            ->setParam('context', ['locale' => $locale]) // only suggestions for current user locale
            ->setSize(10); // return 10 items (default size is 5)

        return new Suggest($completionSuggest);
    }
}

Some explanations: 💡

  • As in the search action, we get the user input from the "q" GET parameter.
  • Then, we create an Elastica Suggest object with the name of the mapping property to use.
  • Just below, we add a context that allows us to filter the returned items: in this case, we filter on the current page locale (en or fr).
  • Then, we extract the returned options from the Elasticsearch response.
  • Finally, we return a JsonResponse containing a simple array of the options to display to the user.
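
To illustrate the last two steps, here is a small standalone sketch of how the options are read from the "suggest" part of a response. The extractSuggestions() function is a hypothetical helper, and the $sample array is hand-written for this example (it is not real output from this blog's index); a real response contains more keys for each option.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper: returns the suggested texts for a given suggester
 * name, or an empty array when nothing was returned.
 */
function extractSuggestions(array $suggests, string $name): array
{
    // There is one entry per suggested input, so we read the first one.
    $options = $suggests[$name][0]['options'] ?? [];

    return array_column($options, 'text');
}

// Hand-written sample, shaped like the "suggest" section of a response.
$sample = [
    'completion' => [
        [
            'text' => 'ela',
            'offset' => 0,
            'length' => 3,
            'options' => [
                ['text' => 'elastica', '_score' => 1.0],
                ['text' => 'elasticsearch', '_score' => 1.0],
            ],
        ],
    ],
];

echo json_encode(extractSuggestions($sample, 'completion')), PHP_EOL;
```

Thanks to the null coalescing operator, an empty or unexpected response simply gives an empty list, which the autocomplete widget can display as "no suggestion".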

Displaying the suggestions

Now that the suggest action is done, we can use it with an autocomplete widget. Try it on the form just below. It's the same form we used in the previous articles (only some JavaScript was added to get the suggestions). As you can see, on this page, only English words are returned, but if you try the French version, you can verify that only French ones are. It's the same action; the filtering is done thanks to the context we added to the Elasticsearch suggester.

The JavaScript is basic; it's not the subject of this post. No Vue.js this time! It's plain old jQuery. I like this jQuery UI component: it's easy to use and easily customizable. Just a comment about the route we use here: as you can see, we don't have to specify the locale in {{ path('suggest') }} (see the JavaScript code below), because the routing component automatically adds it (here it's en). For now, I haven't added the autocomplete to the header search bar, but it's on the search results page. All I had to do was include the JavaScript I developed for this blog post:

{% block javascripts %}
    {{ parent() }}
    {% include 'blog/posts/_51_js.html.twig' %}
{% endblock %}

Here is the JavaScript code:

{% trans_default_domain 'post_51' %}
<script>
    /*global $, console, $http */
    /*jslint browser:true */
    "use strict";
    $(document).ready(function() {
        $("#q").autocomplete({
            delay: 0,
            minLength: 2,
            source: function(request, response) {
                $.ajax({
                    url: '{{ path('suggest') }}',
                    data: {
                        q: request.term,
                    },
                    success: function(data) {
                        response(data);
                    },
                    error: function(data) {
                        alert('{{ 'form_error'|trans }}');
                    }
                });
            }
        });
    });
</script>

Conclusion

That was the last part of this Elasticsearch tutorial. It was interesting (but very long!) to write it while implementing the features on this website. There is still a lot to do, but I am happy with what I have done so far 😊. I am using the search every day to quickly find articles or snippets. There is great news: the FOSElastica bundle is being updated to support Elastica 7.0. So, as soon as it's out, I'll modify this tutorial to use the latest Elasticsearch version: 7.4. See you in Amsterdam at the SymfonyCon! 😀

That's it! I hope you like it. Check out the links below for additional information related to the post. As always, feedback, likes and retweets are welcome. See you! COil. 😊

  Back to the second part

  Read the doc  More on the web  More on Stackoverflow

They gave feedback and helped me fix errors and typos in this article. Many thanks to greg0ire, jmsche, and Nico.F (Symfony Slack). 😊

