Implementing a search engine with elasticsearch and Symfony (part 1/3)
Published on 2019-09-22 • Modified on 2020-01-12
In this post, we will see how to create a full-text search engine with Elasticsearch in a Symfony application. We will use Docker Compose to set up an Elasticsearch stack. We will try to keep the configuration as minimal as possible, keeping the sensible default values of the components. In the end, on this website, we will be able to search for articles and snippets matching one or several keywords. Let's go! 😎
» Published in "A week of Symfony 665" (23-29 September 2019).
Warning
This article is outdated. I have deleted all Elasticsearch stuff on this blog. Check out my new blog post about Meilisearch (to come).
Tutorial
This post is the first part of the tutorial "Implementing a search engine with Elasticsearch and Symfony":
- Part 1: Setting the Elasticsearch stack, installing FOSElastica, indexing data, searching and displaying results.
- Part 2: Cleanup and refactoring, using an Elasticsearch alias, creating a custom provider, tuning the search relevance and adding the pagination.
- Part 3: Adding Kibana to the Elasticsearch stack, implementing an auto-complete with Elasticsearch.
Prerequisite
I will assume that you have a basic knowledge of Symfony and that you know how to set up an application and handle a database schema with an ORM (we will use Doctrine here). As a docker-compose file will be used, I will also assume you are familiar with it; if not, please read the docker-compose getting started guide.
Configuration
- PHP 8.3
- Symfony 6.4
Setting the development environment with docker-compose
First, we need to prepare our development environment. As I am currently learning Docker, let's see how to set up most of the components with docker-compose so we can work (have fun? 😄) in good conditions. The stack will include:
- Elasticsearch 6.8
- Elasticsearch head 5
- MySQL 5.7
- Adminer (latest stable)
Elasticsearch head will allow us to check our local Elasticsearch cluster, and Adminer is a basic database administration interface (like phpMyAdmin) that will allow us to easily check our tables and data.
Let's have a look at the docker-compose.yaml file:
# ./docker-compose.yaml
# DEV docker compose file ——————————————————————————————————————————————————————
# Check out: https://docs.docker.com/compose/gettingstarted/
version: '3.7'
# docker-compose -f docker-compose.yaml up -d
services:

  # Database ———————————————————————————————————————————————————————————————————

  # MySQL server database (official image)
  # https://docs.docker.com/samples/library/mysql/
  db:
    image: mysql:5.7
    container_name: sb-db
    command: --default-authentication-plugin=mysql_native_password
    ports:
      - "3309:3306"
    environment:
      MYSQL_ROOT_PASSWORD: root

  # adminer database interface (official image)
  # https://hub.docker.com/_/adminer
  adminer:
    container_name: sb-adminer
    depends_on:
      - db
    image: adminer
    ports:
      - "8089:8080"

  # elasticsearch ——————————————————————————————————————————————————————————————

  # elasticsearch server (official image)
  # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
  elasticsearch:
    container_name: sb-elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.3 # 6.8.4 out
    ports:
      - "9209:9200"
    environment:
      - "discovery.type=single-node"
      - "bootstrap.memory_lock=true"
      - "ES_JAVA_OPTS=-Xms1G -Xmx1G"
      - "xpack.security.enabled=false"
      - "http.cors.enabled=true"
      - "http.cors.allow-origin=*"

  # elasticsearch head manager (fork of mobz/elasticsearch-head for elasticsearch 6)
  # /!\ it isn't an official image /!\
  # https://hub.docker.com/r/tobias74/elasticsearch-head
  elasticsearch-head:
    container_name: sb-elasticsearch-head
    depends_on:
      - elasticsearch
    image: tobias74/elasticsearch-head:6
    ports:
      - "9109:9100"
We have two sections: the first one contains the database components and the second one the Elasticsearch ones. To launch the Docker stack, run the following command:
docker-compose -f docker-compose.yaml up -d
Now, you can access the stack's components exposed through HTTP:
- Adminer » http://localhost:8089
- Elasticsearch head » http://localhost:9109/
- Elasticsearch » http://localhost:9209/
A few notes: to access the database with Adminer, you must specify a server. For our stack, it's the container_name key we have set up in the docker-compose.yaml file; in this case, it's sb-db. The user is "root" and so is the password. Don't use this in production! ⛔
For Elasticsearch head, you must specify the URL of the Elasticsearch cluster in the top bar: http://localhost:9209. After validating, you should see an empty node.
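The same check can be done without the head interface. Here is a minimal sketch in plain PHP, assuming the cluster is exposed on http://localhost:9209 as in the docker-compose file above. The describeCluster() helper is a hypothetical name used for illustration only; it reads the JSON document that Elasticsearch serves at its root URL and extracts the cluster name and version.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of any library): summarizes the JSON
 * document returned by the Elasticsearch root endpoint.
 */
function describeCluster(string $json): string
{
    // JSON_THROW_ON_ERROR turns a malformed response into a JsonException.
    $info = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    return sprintf(
        '%s (Elasticsearch %s)',
        $info['cluster_name'] ?? 'unknown',
        $info['version']['number'] ?? 'unknown'
    );
}

// Port 9209 is the host port mapped in the docker-compose file.
// The @ silences the warning raised when the stack is not running.
$response = @file_get_contents('http://localhost:9209/');

echo false === $response
    ? "Elasticsearch is not reachable on port 9209.\n"
    : describeCluster($response)."\n";
```

If the script reports that the cluster is not reachable, check that the containers are up with docker-compose ps before going further.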
On this project, I am using the Symfony binary (it must be installed). I start the local HTTP server with the following command:
symfony serve --daemon
Then, the project can be browsed locally at https://127.0.0.1:8000. On my MacBook Pro and Mac mini, I have installed PHP with Homebrew, and on my Ubuntu workstation, PHP 7.2 was the default version installed (the three setups work flawlessly). We won't see here how to set up a full web server/PHP environment with Docker. To do so, please check the related posts of Pierstoval. 😉 Now that we have our dev stack ready to use, let's see how to build our Elasticsearch index.
Installing and configuring the FOSElastica Bundle
First, we need to install the FOSElastica Bundle (of course, you could directly use Elastica or another wrapper). Note that we won't use the latest Elasticsearch version (7.3) because it doesn't seem to be supported by the bundle yet. Also, note that changing the Elasticsearch version we use is as easy as replacing the 6.8.3 image tag with another one in the docker-compose file! That's the power of Docker. 💪
composer require friendsofsymfony/elastica-bundle
Open the config/packages/fos_elastica.yaml file and change the port to 9209:
# Read the documentation: https://github.com/FriendsOfSymfony/FOSElasticaBundle/blob/master/Resources/doc/setup.md
# config/packages/fos_elastica.yaml
fos_elastica:
    clients:
        default: { host: localhost, port: 9209 }
    indexes:
        app: null
Now, we can launch the create index command to see if the setup is OK:
php bin/console fos:elastica:create
If you go to the Elasticsearch head component, you should see that an empty app index was created.
Now, let's see how to add data to the index. We will not go through the whole process of creating a model and the corresponding entities and tables. On this blog, I have an article table which contains all posts and snippets. The schema was created with the API Platform schema generator. So the goal here will be to add all the articles to the Elasticsearch index.
Indexing data in the Elasticsearch index
In the rest of this post, I will take my database schema as the reference. So replace App\Entity\Article by your entity class name. Same thing for the entity's properties. First, let's add some fields in the Elasticsearch mapping:
# config/packages/fos_elastica.yaml
fos_elastica:
    clients:
        default: { host: localhost, port: 9209 }
    indexes:
        app:
            types:
                articles:
                    properties:
                        type: ~
                        name: ~
                        slug: ~
                        keyword: ~
                    persistence:
                        driver: orm
                        model: App\Entity\Article
We have added several text fields and the type of the article (article or snippet). Let's keep the default field settings for now and launch the populate command, which is responsible for refreshing the Elasticsearch index:
php bin/console fos:elastica:populate
Resetting app
42/42 [============================] 100%
Populating app/articles
Refreshing app
If you can see this, it means that the populate command was successful. We can check that the Elasticsearch documents were correctly indexed. Go to the web interface of "Elasticsearch head", click the "browser" tab and click on a document to see its raw JSON. We can see the entity id (14) and the fields we have declared to be indexed (type, name, slug, keyword).
Now that we have an index with some data, let's try to search.
Searching and displaying the results
For clarity, we will create a basic controller that will handle the search action. First, we need to bind a variable to the finder service of the "articles" type. This service is automatically created by the FOSElastica bundle depending on the types declared in the configuration. Add this in your config/services.yaml file.
# config/services.yaml
services:
    _defaults:
        bind:
            $articlesFinder: '@fos_elastica.finder.app.articles'
Then, thanks to autowiring, we can inject this service in our new controller:
<?php

declare(strict_types=1);

// src/Controller/SearchPart1Controller.php

namespace App\Controller;

use FOS\ElasticaBundle\Finder\TransformedFinder;
// Assumption: Symfony's base controller; drop this import if you have your own App\Controller\AbstractController.
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\HttpFoundation\Session\SessionInterface;
use Symfony\Component\Routing\Annotation\Route;

use function Symfony\Component\String\u;

/**
 * You know, for search.
 */
#[Route(path: '/{_locale}', name: 'search_part1_', requirements: ['_locale' => '%locales_requirements%'])]
final class SearchPart1Controller extends AbstractController
{
    public function __construct(
        private readonly TransformedFinder $articlesFinder,
    ) {
    }

    #[Route(path: ['en' => '/part1/search', 'fr' => '/partie1/recherche'], name: 'main')]
    public function search(Request $request, SessionInterface $session): Response
    {
        // Normalize the keyword coming from the "q" query parameter.
        $q = u($request->query->get('q', ''))->trim();
        // Only hit Elasticsearch when there is something to search for.
        $results = !$q->isEmpty() ? $this->articlesFinder->findHybrid($q->toString()) : [];
        $session->set('q', $q);

        return $this->render('search/search_part1.html.twig', compact('results', 'q'));
    }
}
The action of this controller is very concise. We get the keyword to search from the HTTP query (q is for query), then we call the findHybrid function to find the articles matching it, and finally we save the keyword in the session. For each result, the findHybrid function returns two objects. The first one, the "hit", contains the meta information of the raw Elasticsearch response for this result. It's from this object that we will get the score of the document. When providing a keyword, all results are sorted by score, from the most to the least relevant. The second object is the Doctrine entity matching the search, so we don't have to handle the raw Elasticsearch response manually. Now, we can display the results:
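To make the "two objects per result" idea concrete, here is a small sketch in plain PHP. It does not use the bundle: StubResult and StubHybridResult are hypothetical stand-ins whose accessor names simply mirror what the Twig template resolves (result.result.hit and result.transformed), and the hit content is made-up sample data.

```php
<?php

declare(strict_types=1);

// Stand-ins for illustration only: in the real application these objects
// are provided by Elastica and the FOSElastica bundle, not written by hand.
final class StubResult
{
    public function __construct(private readonly array $hit)
    {
    }

    /** Raw hit of the Elasticsearch response (assumed shape). */
    public function getHit(): array
    {
        return $this->hit;
    }
}

final class StubHybridResult
{
    public function __construct(
        private readonly StubResult $result,
        private readonly object $transformed,
    ) {
    }

    public function getResult(): StubResult
    {
        return $this->result;
    }

    public function getTransformed(): object
    {
        return $this->transformed;
    }
}

/**
 * Flattens hybrid results into "score + entity" rows.
 *
 * @param iterable<StubHybridResult> $results
 *
 * @return list<array{score: float, entity: object}>
 */
function toRows(iterable $results): array
{
    $rows = [];
    foreach ($results as $result) {
        $rows[] = [
            'score' => (float) $result->getResult()->getHit()['_score'],
            'entity' => $result->getTransformed(),
        ];
    }

    return $rows;
}

// Hypothetical data: one hit for the entity with id 14.
$article = new stdClass();
$article->id = 14;

$rows = toRows([
    new StubHybridResult(new StubResult(['_id' => '14', '_score' => 1.5]), $article),
]);

printf("#%d scored %.2f\n", $rows[0]['entity']->id, $rows[0]['score']); // prints "#14 scored 1.50"
```

The template does the same thing directly in Twig, so no such helper is needed in the real controller.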
{% extends 'layout.html.twig' %}

{# templates/search/search_part1.html.twig // This is the template of the 1st part of the tutorial #}

{% trans_default_domain 'search' %}

{% set esArticle = article_es() %} {# Don't do this! This is to avoid polluting the SearchController #}

{% block content %}
    <div class="col-md-12">
        <div class="card">
            <div class="card-header card-header-primary">
                <p class="h3">{{ 'your_search_for'|trans}} <b>"{{ q }}"</b>, <b>{{ results|length }}</b> {{ 'results'|trans}}.</p>
            </div>
            <div class="card-body">
                <p class="h3">» {{ 'get_back'|trans}} "<a href="{{ path('blog_show', {'slug': esArticle[1].slug|a_slug(locale), 'q': q}) }}#search_form">{{ ('title_'~esArticle[1].id)|trans({}, 'blog') }}</a>"</p>
            </div>
        </div>
    </div>

    {% for result in results %}
        {% set hit = result.result.hit %}
        {% set article = result.transformed %}
        {% if article.isArticle %}
            {% set tag_route = 'blog_list_tag' %}
            {% set pathEn = path('blog_show', {'_locale': 'en','slug': article.slug|a_slug('en')}) %}
            {% set pathFr = path('blog_show', {'_locale': 'fr','slug': article.slug|a_slug('fr')}) %}
            {% set title = ('title_'~article.id)|trans({}, 'blog') %}
        {% else %}
            {% set tag_route = 'snippet_list_tag' %}
            {% set pathEn = path('snippet_show', {'_locale': 'en', 'slug': article.slug|s_slug('en') }) %}
            {% set pathFr = path('snippet_show', {'_locale': 'fr', 'slug': article.slug|s_slug('fr') }) %}
            {% set title = ('title_'~article.id)|trans({}, 'snippet') %}
        {% endif %}
        <div class="card">
            <div class="card-header">
                <h2 class="h3">
                    [{{ ('type_'~article.type.value)|trans({}, 'blog') }}] {{ title }} » {{ 'score'|trans }} <b>{{ hit._score }}</b>
                </h2>
            </div>
            <div class="card-body">
                <div class="blog-tags">
                    {% for tag in article.keywords %}<a class="badge badge-{{ random_class() }}" href="{{ path(tag_route, {'tag': tag}) }}"><i class="far fa-tag"></i> {{ tag|trans({}, 'breadcrumbs') }}</a> {% endfor %}
                </div>
                <br/>
                <p class="card-text text-center">
                    <a href="{{ pathEn }}" class="btn btn-primary card-link">🇬🇧 {{ 'read_in_english'|trans({}, 'blog') }}</a>
                    <a href="{{ pathFr }}" class="btn btn-primary card-link">🇫🇷 {{ 'read_in_french'|trans({}, 'blog') }}</a>
                </p>
            </div>
        </div>
    {% endfor %}

    <div class="col-md-12">
        {% if results is empty %}
            <p class="h3">{{ 'no_results'|trans }}</p>
        {% endif %}
    </div>

    <div class="col-md-12">
        {% include 'search/_form.html.twig' with {route: 'search_part1_main'} %}
    </div>
{% endblock %}
Let's have a look at the template. Don't be afraid! It contains specific code and helpers developed for this blog which are not the subject of this post (it's the real template used by the search). The two important lines are at the beginning of the for loop:
{% set hit = result.result.hit %}
{% set article = result.transformed %}
As mentioned before, we first get the hit object, then we can access the score with hit._score (it is displayed to the right of the article or snippet title). Then, we get the Article Doctrine entity with result.transformed. Now, we can access the entity getters like we are used to with Twig. For example, article.isArticle will return true if the article is a blog post and false if it's a snippet (there are only two article types). That's it! You can test the search with the form included at the end of the template (search/_form.html.twig).
When launching a search, a new entry is automatically added in the debug panel so one can easily debug the raw Elasticsearch query that was executed.
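For reference, the raw query shown in the debug panel should look roughly like the body built below. This is only a sketch under assumptions: buildSearchBody() is a hypothetical helper, and it assumes that a plain string passed to the finder ends up as a query_string query, which is what Elastica does by default as far as I know. The size key is spelled out to show where the limit of ten results comes from: Elasticsearch returns ten hits when no size is given.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds a search body comparable to the one sent when
 * a plain string is passed to the finder (assumption: a query_string query).
 *
 * @return array<string, mixed>
 */
function buildSearchBody(string $keywords, int $size = 10): array
{
    return [
        'query' => ['query_string' => ['query' => $keywords]],
        // Elasticsearch returns 10 hits when "size" is omitted.
        'size' => $size,
    ];
}

echo json_encode(buildSearchBody('symfony docker'), JSON_PRETTY_PRINT), "\n";
```

Such a body can be posted to the _search endpoint of the app index (http://localhost:9209/app/_search with the stack above) to compare the raw hits with what the page displays.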
Note that at most ten results are returned for now (no pagination). Our search engine is working, but it's very basic. There is a very annoying problem: the texts of the posts and snippets are not completely indexed yet because they are stored in translation files. So, the next goal will be to include them when indexing so that the search relevance gets much better. We can also implement several interesting things: pagination, boosts, aliases... As this blog post is already quite long, let's keep all these things for the next one! 😌
That's it! I hope you like it. Check out the links below to have additional information related to the post. As always, feedback, likes and retweets are welcome. (see the box below) See you! COil. 😊
They gave feedback and helped me to fix errors and typos in this article; many thanks to dkarlovi, jmsche. 😊