Detecting four-byte encoded characters/emojis with PHP

Published on 2024-05-17 • Modified on 2024-05-17

This snippet shows how to detect four-byte encoded characters and emojis with PHP. This can be useful to avoid errors when inserting user data into a MySQL or MariaDB column that does not support four-byte characters, such as a column using the legacy utf8 (utf8mb3) character set, for example with the utf8_general_ci collation.
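As background for the snippet that follows, here is a minimal sketch of why some characters need four bytes: in UTF-8, any code point above U+FFFF is encoded on four bytes. The describeChar() helper is a hypothetical name used only for this illustration, and the sketch assumes the mbstring extension is available.

```php
<?php

declare(strict_types=1);

// Hypothetical helper for illustration only; not part of the original snippet.
// Returns the Unicode code point and the UTF-8 byte length of a single character.
function describeChar(string $char): string
{
    // mb_ord() returns the code point of the first character (requires mbstring).
    $codePoint = mb_ord($char, 'UTF-8');

    // strlen() counts bytes, not characters.
    return sprintf('U+%04X: %d byte(s)', $codePoint, strlen($char));
}

echo describeChar('c').PHP_EOL;  // U+0063: 1 byte(s)
echo describeChar('🌈').PHP_EOL; // U+1F308: 4 byte(s)
```

Columns limited to three bytes per character can store the first character but not the second.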


<?php

declare(strict_types=1);

namespace App\Controller\Snippet;

/**
 * I am using a PHP trait to isolate each snippet in a file.
 * This code should be called from a Symfony controller extending AbstractController (as of Symfony 4.2)
 * or Symfony\Bundle\FrameworkBundle\Controller\Controller (Symfony <= 4.1).
 * Services are injected in the main controller constructor.
 */
trait Snippet302Trait
{
    public function snippet302(): void
    {
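        // A lead byte in the 0xF0-0xF7 range followed by three continuation bytes (0x80-0xBF).
        // There is no "u" modifier, so the pattern is matched byte by byte.
        $fourBytesPattern = '/[\xF0-\xF7][\x80-\xBF]{3}/';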
        $chars = [
            'c', // standard character (1 byte)

            'é', // French accented character (2 bytes)

            // 3-byte emojis
            '✅',
            '⚙',
            '☃',
            '❄',

            // standard Chinese character (3 bytes)
            '文',

            // special Chinese characters (4 bytes)
            '𠜎',
            '𡃁',

            // other special characters (4 bytes)
            '𐍈',
            '𐎠',

            // 4-byte emojis
            '🖋',
            '🎶',
            '🌈',
            '🐰',
            '🔗',

            // 8-byte emojis (flags, made of two 4-byte characters)
            '🇫🇷',
            '🇬🇧',

            // other complex emojis (18 bytes)
            '👨‍👩‍👧',
        ];

        foreach ($chars as $char) {
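            // With the default internal encoding (UTF-8), this conversion leaves valid UTF-8 input unchanged.
            $utf8Bytes = mb_convert_encoding($char, 'UTF-8');
            $length = \strlen($utf8Bytes); // strlen() returns the number of bytes, not characters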

            echo $char.': 4 bytes or more? '.(preg_match($fourBytesPattern, $char) === 1 ? 'true' : 'false').' (bytes length: '.$length.')';
            echo PHP_EOL;
        }

        // That's it! 😁
    }
}
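
For the database use case mentioned in the introduction, the same pattern can be wrapped in a check that runs before the insert. The following is only a sketch under stated assumptions: containsFourByteChars() and replaceFourByteChars() are hypothetical helper names, and replacing the characters with a placeholder is just one possible policy (rejecting the input or migrating the column to utf8mb4 are alternatives).

```php
<?php

declare(strict_types=1);

// Hypothetical helpers for illustration; these names are not part of the original snippet.

/** Returns true if the string contains at least one four-byte UTF-8 sequence. */
function containsFourByteChars(string $value): bool
{
    // Same byte-level pattern as in the snippet above (no "u" modifier).
    return preg_match('/[\xF0-\xF7][\x80-\xBF]{3}/', $value) === 1;
}

/** Replaces every four-byte UTF-8 sequence with the given placeholder. */
function replaceFourByteChars(string $value, string $placeholder = '?'): string
{
    // preg_replace() returns null on error; in that case the input is returned untouched.
    return preg_replace('/[\xF0-\xF7][\x80-\xBF]{3}/', $placeholder, $value) ?? $value;
}

$comment = 'Nice post 🌈!';

if (containsFourByteChars($comment)) {
    $comment = replaceFourByteChars($comment);
}

echo $comment.PHP_EOL; // Nice post ?!
```

Note that only the four-byte sequences are replaced: in a composed emoji such as a family, the three-byte zero-width joiners between the individual emojis are left in place.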
