Implementing a search engine with elasticsearch and Symfony (part 3/3)

Published on 2019-11-16 • Modified on 2020-04-18

In this third and last part, we will continue to improve our search engine. First, we will enhance our Elasticsearch stack with Kibana. Then, we will implement an autocomplete using an Elasticsearch suggester. Let's go! 😎

» Published in "A week of Symfony 672" (10-17 November 2019).

Tutorial

This post is the third and last part of the tutorial "Implementing a search engine with Elasticsearch and Symfony":

Prerequisite

The prerequisites are the same as the first two parts. It is, of course, recommended to read them (links above) before continuing with this one.

Configuration

  • PHP 7.4
  • Symfony 5.1.3
  • elasticsearch 6.8

Installing Kibana

First, we will try to improve our Elasticsearch stack. Until now, we used the "head" plugin to manage our cluster. But this development tool is quite old and not maintained anymore. So, let's add Kibana to our Docker setup. Kibana is an open-source data visualization plugin for Elasticsearch. Of course, it will allow us to do all the essential maintenance tasks we used to do with head: delete or close an index, create or delete an alias, check a document, verify the index mappings and much more! The list of what you can do with it is impressive (check out the left menu of the next screenshot). Let's add the following entry to the docker-compose.yaml file:

kibana:
    container_name: sb-kibana
    image: docker.elastic.co/kibana/kibana:6.8.6
    ports:
      - "5601:5601"
    environment:
      - "ELASTICSEARCH_URL=http://sb-elasticsearch"
    depends_on:
      - elasticsearch
Here is the new full docker-compose.yaml file:
# ./docker-compose.yaml

# DEV docker compose file ------------------------------------------------------
# Check out: https://docs.docker.com/compose/gettingstarted/
version: '3.7'

services:
  # Application container (PHP 7.4+Apache) -------------------------------------
  app:
    container_name: sb-app
    # The build context is mandatory
    build:
      context: .
    # Default Apache port to 8080
    ports:
      - "8080:80"

  # Database -------------------------------------------------------------------
  # MySQL server database (official image)
  # https://docs.docker.com/samples/library/mysql/
  db:
    image: mysql:5.7
    container_name: sb-db
    command: --default-authentication-plugin=mysql_native_password
    ports:
      - "3309:3306"
    environment:
      MYSQL_ROOT_PASSWORD: root
    healthcheck:
      test: ["CMD", "mysqladmin", "ping"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s

  # adminer database interface (official image)
  # https://hub.docker.com/_/adminer
  adminer:
    image: adminer:4.7
    container_name: sb-adminer
    depends_on:
      - db
    ports:
      - "8989:8080"

  # -- Elasticsearch -----------------------------------------------------------

  # Elasticsearch server (official image)
  # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
  # https://hub.docker.com/_/elasticsearch/
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.10
    container_name: sb-elasticsearch
    ports:
      - "9209:9200"
      - "9309:9300" # Important if you have multiple es instances running
    environment:
        - "http.port=9200"
        - "discovery.type=single-node"
        - "bootstrap.memory_lock=true"
        - "ES_JAVA_OPTS=-Xms1G -Xmx1G"
        - "xpack.security.enabled=false"
        - "http.cors.enabled=true"
        - "http.cors.allow-origin=*"

  # elasticsearch head manager (fork of mobz/elasticsearch-head for elasticsearch 6)
  # /!\ it isn't an official image /!\
  # https://hub.docker.com/r/tobias74/elasticsearch-head
  elasticsearch-head:
    image: tobias74/elasticsearch-head:6
    container_name: sb-elasticsearch-head
    depends_on:
      - elasticsearch
    ports:
      - "9109:9100"

  # kibana (official image)
  # https://hub.docker.com/_/kibana
  kibana:
    image: docker.elastic.co/kibana/kibana:6.8.10
    container_name: sb-kibana
    ports:
      - "5609:5601"
    environment:
      - "ELASTICSEARCH_URL=http://sb-elasticsearch"
    depends_on:
      - elasticsearch

  # Cache ----------------------------------------------------------------------
  # Redis (official image)
  # https://hub.docker.com/_/redis
  redis:
    image: redis:5.0.6-alpine
    container_name: sb-redis
    ports:
      - '6389:6379'

As you can see, we pass the URL of the Elasticsearch server, whose hostname is the name of the Docker container (sb-elasticsearch). We keep the standard 5601 port inside the container (the full file above maps it to 5609 on the host). We also use the same image version as the Elasticsearch server (6.8.6 in the short snippet, 6.8.10 in the full file), so we are sure there are no compatibility problems. If you restart the Docker containers, you can access the Kibana management page:

Kibana in action!

That's it for Kibana. I will stop here; it would require much more than a full article to introduce all the features. Check out the official website for more information. Kibana is very powerful; it can also be used to view your Symfony logs! Check out this excellent post on the JoliCode blog about this subject.

Adding an autocomplete in the search bar

As you can see, I have put a search bar in the header of this website. It works, but what about trying to autocomplete the user input and suggest terms they can find on this blog? Let's see how we can do this with Elasticsearch; we are going to build a new index that will be dedicated to this task.

Setting the mapping

Until now, we used the default "text" type for all mapping properties. In this case, we will use a particular type: completion. We will add a new "suggest" index configuration just after the "app" we were using in the previous posts:

# config/packages/fos_elastica.yaml
fos_elastica:
    clients:
        default: { host: '%es_host%', port: '%es_port%' }
    indexes:
        app:
          ###
        suggest:
            use_alias: true
            settings:
                index:
                    analysis:
                        analyzer:
                            suggest_analyzer:
                                type: custom
                                tokenizer: standard
                                filter: [lowercase, asciifolding]
            types:
                keyword:
                    properties:
                        locale:
                            type: keyword
                        suggest:
                            type: completion
                            analyzer: suggest_analyzer
                            contexts:
                                - name: locale
                                  type: category
                                  path: locale
Here is the full YAML mapping:
# config/packages/fos_elastica.yaml
fos_elastica:
    clients:
        default: { host: '%es_host%', port: '%es_port%' }
    indexes:
        app:
            use_alias: true
            types:
                articles:
                    properties:
                        type: ~
                        keywordFr: { boost: 4 }
                        keywordEn: { boost: 4 }
                        # i18n
                        titleEn: { boost: 3 }
                        titleFr: { boost: 3 }
                        headlineEn: { boost: 2 }
                        headlineFr: { boost: 2 }
                        ContentEn: ~ # The default boost value is 1
                        ContentFr: ~
                    persistence:
                        driver: orm
                        model: App\Entity\Article
                        provider:
                            service: App\Elasticsearch\Provider\ArticleProvider
                        listener:
                            insert: false
                            update: false
                            delete: false
        # L1->L29 snippet in templates/blog/posts/_48.html.twig
        suggest:
            use_alias: true
            settings:
                index:
                    analysis:
                        analyzer:
                            suggest_analyzer:
                                type: custom
                                tokenizer: standard
                                filter: [lowercase, asciifolding]
            types:
                keyword:
                    properties:
                        locale:
                            type: keyword
                        suggest:
                            type: completion
                            analyzer: suggest_analyzer
                            contexts:
                                - name: locale
                                  type: category
                                  path: locale

Some explanations about the new index and its mapping: before declaring the new type, I add a custom analyzer in the settings section. Its asciifolding filter allows us to ignore French accents, so a match is found even when accents are not used in the input. For example, if we type "element", the word "élément" should be suggested.
Then, like the main app index, the new index uses an alias. In the mapping, we have two properties. The first one, suggest, is of the completion type. We need this particular type to be able to use the completion suggester, as we will see in the next chapter. The second property, locale, will allow us to filter the suggestions depending on the current locale (en or fr). We can see that we have added a context to the suggest field and that it is associated with the locale property (path: locale).
If we launch the populate command, the new index is created. At this point, we now have two indexes in the Elasticsearch cluster:

Kibana in action!
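
To make the effect of the suggest_analyzer more concrete, here is a rough plain-PHP approximation of what the lowercase and asciifolding filters do to a term. It is only a sketch: the foldTerm() helper is hypothetical, the character map covers just a few French letters, and the real folding is done by Elasticsearch itself at index and search time.

```php
<?php declare(strict_types=1);

// Hypothetical helper: approximates "lowercase" + "asciifolding" for a few
// French characters. Elasticsearch does the real work; this is an illustration.
function foldTerm(string $term): string
{
    $map = [
        'à' => 'a', 'â' => 'a', 'ç' => 'c', 'é' => 'e', 'è' => 'e', 'ê' => 'e',
        'ë' => 'e', 'î' => 'i', 'ï' => 'i', 'ô' => 'o', 'ù' => 'u', 'û' => 'u',
        'œ' => 'oe',
    ];

    // Lowercase first (multibyte-safe), then replace accented letters.
    return strtr(mb_strtolower($term), $map);
}

// The indexed word and the user input end up as the same token.
var_dump(foldTerm('Élément') === foldTerm('element'));
```

Because both sides are normalized the same way, an input typed without accents can still match an accented word stored in the index.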

Populating the suggest index

Now, we must fill the new suggest index. As there is no model mapped to this index, we won't create a provider but a command. The idea is to extract all the words we have already indexed in the app index. Here is the new Symfony command (some insights after the snippet 🤔):

<?php declare(strict_types=1);

// src/Command/PopulateSuggestCommand.php (used by templates/blog/posts/_51.html.twig)

namespace App\Command;

use Doctrine\Inflector\Inflector;
use Doctrine\Inflector\NoopWordInflector;
use Elastica\Document;
use FOS\ElasticaBundle\Elastica\Index;
use FOS\ElasticaBundle\Finder\TransformedFinder;
use FOS\ElasticaBundle\HybridResult;
use FOS\ElasticaBundle\Paginator\FantaPaginatorAdapter;
use Pagerfanta\Pagerfanta;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use function Symfony\Component\String\u;

/**
 * Populate the suggest elasticsearch index.
 */
final class PopulateSuggestCommand extends Command
{
    public const NAMESPACE = 'strangebuzz';
    public const CMD = 'populate';
    public const DESC = 'Populate the "suggest" Elasticsearch index';

    private TransformedFinder $articlesFinder;
    private Index $suggestIndex;
    private Inflector $inflector;

    public function __construct(TransformedFinder $articlesFinder, Index $suggestIndex)
    {
        parent::__construct();
        $this->articlesFinder = $articlesFinder;
        $this->suggestIndex = $suggestIndex;
        $this->inflector = new Inflector(new NoopWordInflector(), new NoopWordInflector());
    }

    protected function configure(): void
    {
        [$namespace, $cmd, $desc] = [self::NAMESPACE, self::CMD, self::DESC];
        $this->setName($namespace.':'.$cmd)
            ->setDescription(self::DESC)
            ->setHelp(
                <<<EOT
{$desc}
<info>%command.full_name%</info>
EOT
            );
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $output->writeln(self::DESC);
        $pagination = $this->findHybridPaginated($this->articlesFinder, '');
        $nbPages = $pagination->getNbPages();
        $keywords = [];

        foreach (range(1, $nbPages) as $page) {
            $pagination->setCurrentPage($page);
            foreach ($pagination->getCurrentPageResults() as $result) {
                if ($result instanceof HybridResult) {
                    foreach ($result->getResult()->getSource() as $property => $text) {
                        if ($property === 'type') {
                            continue;
                        }
                        $locale = explode('_', $this->inflector->tableize($property))[1] ?? 'en';
                        $text = strip_tags($text ?? '');
                        $words = str_word_count($text, 2, 'çéâêîïôûàèùœÇÉÂÊÎÏÔÛÀÈÙŒ'); // FGS do not remove French accents! 🙃
                        $textArray = array_filter($words);
                        $keywords[$locale] = array_merge($keywords[$locale] ?? [], $textArray);
                    }
                }
            }
        }

        // Index by locale
        foreach ($keywords as $locale => $localeKeywords) {
            // Remove small words and remaining junk (emojis) 😖
            $localeKeywords = array_unique(array_map('mb_strtolower', $localeKeywords));
            $localeKeywords = array_filter($localeKeywords, static function ($v) {
                return u((string) $v)->length() > 2;
            });
            $documents = [];
            foreach ($localeKeywords as $idx => $keyword) {
                $documents[] = (new Document())
                    ->setType('keyword')
                    ->set('locale', $locale)
                    ->set('suggest', $keyword);
            }
            $responseSet = $this->suggestIndex->addDocuments($documents);

            $output->writeln(sprintf(' -> TODO: %d -> DONE: <info>%d</info>, "%s" keywords indexed.', count($documents), $responseSet->count(), $locale));
        }

        return 0;
    }

    /**
     * @return Pagerfanta<mixed>
     */
    private function findHybridPaginated(TransformedFinder $articlesFinder, string $query): Pagerfanta
    {
        $paginatorAdapter = $articlesFinder->createHybridPaginatorAdapter($query);

        return new Pagerfanta(new FantaPaginatorAdapter($paginatorAdapter));
    }
}

Some explanations: 💡

  • We run a search with an empty query string, which matches every article, to get the total number of pages.
  • We iterate over each page to get the articles.
  • For each article, we extract all the keys from the Elasticsearch document.
  • For each key, we extract all the words from the text with the PHP str_word_count() function.
  • We remove all empty, duplicate and too short words.
  • For each remaining word, we create an Elasticsearch document with the correct associated locale.
  • Finally, we run the indexation process with the addDocuments function.
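
The extraction steps above can be tried in isolation, without Elasticsearch. The sketch below is not the command itself: localeFromProperty() is a hypothetical regex-based stand-in for the Inflector::tableize() call, and extractKeywords() condenses the strip, split, lowercase, deduplicate and filter steps for a single text.

```php
<?php declare(strict_types=1);

// Hypothetical stand-in for Inflector::tableize(): "titleFr" -> "title_fr" -> "fr".
function localeFromProperty(string $property): string
{
    $snake = strtolower((string) preg_replace('/(?<=[a-z])(?=[A-Z])/', '_', $property));

    return explode('_', $snake)[1] ?? 'en';
}

/**
 * @return array<int, string> unique lowercase words longer than two characters
 */
function extractKeywords(string $html): array
{
    $text = strip_tags($html);
    // Accented letters are passed as extra "word" characters, as in the command.
    $words = str_word_count($text, 1, 'çéâêîïôûàèùœÇÉÂÊÎÏÔÛÀÈÙŒ');
    $words = array_unique(array_map('mb_strtolower', $words));

    return array_values(array_filter($words, static fn (string $w): bool => mb_strlen($w) > 2));
}

print_r(extractKeywords('<p>Un élément de Test, un test.</p>'));
echo localeFromProperty('titleFr'), PHP_EOL;
```

Short words such as "un" and "de" are dropped by the length filter, and "Test" and "test" collapse into a single keyword once lowercased.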

As I write this, there are about 3500 indexed words. Here is the output of the populate Makefile entry:

/Users/coil/Sites/strangebuzz.com$ make populate
php bin/console fos:elastica:reset
Resetting app
Resetting suggest
php bin/console fos:elastica:populate --index=app
Resetting app
 53/53 [============================] 100%
Populating app/articlesRefreshing app
Refreshing app
php bin/console strangebuzz:populate
Populate the "suggest" elasticsearch index
 -> TODO: 2167 -> DONE: 2167, "fr" keywords indexed.
 -> TODO: 1549 -> DONE: 1549, "en" keywords indexed.

Here is its content:

## -- Elasticsearch 🔎 ---------------------------------------------------------
populate: ## Reset and populate the Elasticsearch index
	$(SYMFONY) fos:elastica:reset
	$(SYMFONY) fos:elastica:populate --index=app
	$(SYMFONY) strangebuzz:populate

You can find my full Symfony Makefile in this snippet. So, now that the index is populated, let's see how to use it for the autocomplete feature.

Implementing the autocomplete

The goal here will be to have an action that returns via Ajax the suggestions for the autocomplete field as the user is typing. So let's create a new controller that will handle this:

<?php declare(strict_types=1);

// src/Controller/SuggestController.php

namespace App\Controller;

use App\Elasticsearch\ElastiCoil;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\Routing\Annotation\Route;

/**
 * @Route("/{_locale}", requirements={"_locale"="%locales_requirements%"})
 */
final class SuggestController extends AbstractController
{
    /**
     * @Route({"en": "/suggest", "fr": "/suggerer"}, name="suggest")
     */
    public function suggest(Request $request, string $_locale, ElastiCoil $elastiCoil): JsonResponse
    {
        $q = (string) $request->query->get('q', '');

        return $this->json($elastiCoil->getSuggestions($q, $_locale));
    }
}

And the related custom Elasticsearch service:

<?php declare(strict_types=1);

// src/Elasticsearch/ElastiCoil.php

namespace App\Elasticsearch;

use Elastica\Query;
use Elastica\Suggest;
use Elastica\Suggest\Completion;
use Elastica\Util;
use FOS\ElasticaBundle\Elastica\Index;

final class ElastiCoil
{
    public const SUGGEST_NAME = 'completion';
    public const SUGGEST_FIELD = 'suggest';

    private Index $suggestIndex;

    public function __construct(Index $suggestIndex)
    {
        $this->suggestIndex = $suggestIndex;
    }

    /**
     * Get a suggest object for a keyword and locale.
     */
    public function getSuggest(string $q, string $locale): Suggest
    {
        $completionSuggest = (new Completion(self::SUGGEST_NAME, self::SUGGEST_FIELD))
            ->setPrefix(Util::escapeTerm($q))
            ->setParam('context', ['locale' => $locale])
            ->setSize(5);

        return new Suggest($completionSuggest);
    }

    /**
     * Return suggestions for a keyword and locale as a simple array.
     *
     * @return array<string>
     */
    public function getSuggestions(string $q, string $locale): array
    {
        $suggest = $this->getSuggest($q, $locale);
        $query = (new Query())->setSuggest($suggest);
        $suggests = $this->suggestIndex->search($query)->getSuggests();

        return $suggests[self::SUGGEST_NAME][0]['options'] ?? [];
    }
}

Some explanations: 💡

  • As in the search action, we get the user input from the "q" GET parameter.
  • Then, we create an Elastica Suggest object with the name of the mapping property to use.
  • Just below, we add a context that will allow us to filter returned items: in this case, we filter on the current page locale (en or fr).
  • Then, we extract the returned options from the Elasticsearch response.
  • Finally, we return a JsonResponse containing a simple array of the options to display to the user.
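
To illustrate the extraction step, here is a small sketch that works on a hand-written response fragment. The extractOptionTexts() helper and the sample values are hypothetical; only the general shape (a list of entries, each with an "options" array whose items carry a "text" key) follows the documented completion suggester response.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper mirroring the last line of getSuggestions(),
 * then keeping only the suggested text of each option.
 *
 * @param array<string, mixed> $suggests the "suggest" part of a search response
 *
 * @return array<int, string>
 */
function extractOptionTexts(array $suggests, string $name): array
{
    // The null coalescing operator covers a missing suggester or an empty response.
    $options = $suggests[$name][0]['options'] ?? [];

    return array_column($options, 'text');
}

// Illustrative response fragment: the values are made up.
$suggests = [
    'completion' => [[
        'text' => 'ela', 'offset' => 0, 'length' => 3,
        'options' => [
            ['text' => 'elastica', '_score' => 1.0],
            ['text' => 'elasticsearch', '_score' => 1.0],
        ],
    ]],
];

print_r(extractOptionTexts($suggests, 'completion'));
```

The service above returns the full option arrays rather than only their text, which is why the autocomplete widget is configured to read the "text" key of each item.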

Displaying the suggestions

Now that the suggest action is done, we can use it with an autocomplete widget. Try it on the form just below. It's the same form we used in the previous articles (only some JavaScript was added to get the suggestions). As you can see, on this page, only English words are returned, but if you try the French version, you can verify that only French ones are. It's the same action; the filtering is done by the context we added to the Elasticsearch suggester.

To render the widget, I used a Vue.js component. A note about the route we use here: we don't have to specify the locale in {{ path('suggest') }} because the routing component adds it automatically (here it's en). This autocomplete is also on the search results page. Here is the code of the component include:

{# https://github.com/BosNaufal/vue-autocomplete #}
{% set placeholder = placeholder is defined ? placeholder : 'enter_one_or_several_keywords'|trans({}, 'search') %}
<autocomplete
    ref="autocomplete"
    aria-describedby="qHelp"
    url="{{ path('suggest') }}" {# current locale is injected #}
    anchor="text" {# not used, custom render #}
    label="text"
    :on-should-render-child="autocompleteRenderChild"
    :required="true"
    id="post-q"
    name="q"
    :classes="{ wrapper: 'form-wrapper', input: 'form-control', list: 'data-list', item: 'data-list-item' }"
    placeholder="{{ placeholder }}"
    init-value="{{ app.request.query.get('q') }}"
    :options="[]"
    :min="1"
    :encode-params="true"
>
</autocomplete>

Conclusion

That was the last part of this Elasticsearch tutorial. Writing it was exciting (but very long!), as I was implementing the features on this website at the same time. There is still a lot to do, but I am happy with what I have done so far 😊. I use the search every day to find articles or snippets quickly. There is excellent news: the FOSElastica bundle is being updated to support Elastica 7.0. So, as soon as it's out, I'll modify this tutorial to use the latest Elasticsearch version: 7.6! 😀

That's it! I hope you like it. Check out the links below to have additional information related to the post. As always, feedback, likes and retweets are welcome (see the box below). See you! COil. 😊

  Back to the second part

  Read the doc  More on the web  More on Stackoverflow

They gave feedback and helped me fix errors and typos in this article: many thanks to greg0ire, jmsche, Nico.F (Slack Symfony). 😊

