Detecting four-byte encoded characters/emojis with PHP
Published on 2024-05-17 • Modified on 2024-05-17
This snippet shows how to detect characters and emojis that UTF-8 encodes on four bytes with PHP. This is useful to avoid errors when inserting user data into a MySQL or MariaDB column that does not support four-byte characters, such as a column using the utf8_general_ci collation, which belongs to the legacy three-byte utf8 (utf8mb3) character set.
<?php

declare(strict_types=1);

namespace App\Controller\Snippet;

/**
 * I am using a PHP trait to isolate each snippet in a file.
 * This code should be called from a Symfony controller extending AbstractController (as of Symfony 4.2)
 * or Symfony\Bundle\FrameworkBundle\Controller\Controller (Symfony <= 4.1).
 * Services are injected in the main controller constructor.
 */
trait Snippet302Trait
{
    public function snippet302(): void
    {
        // A four-byte UTF-8 sequence: a lead byte in 0xF0-0xF7 (only 0xF0-0xF4
        // occur in valid UTF-8) followed by three continuation bytes (0x80-0xBF).
        // There is no "u" modifier, so the pattern is matched against raw bytes.
        $fourBytesPattern = '/[\xF0-\xF7][\x80-\xBF]{3}/';

        $chars = [
            'c', // standard ASCII character (1 byte)
            'é', // French accented letter (2 bytes)

            // 3-byte symbols/emojis
            '✅',
            '⚙',
            '☃',
            '❄',

            // standard Chinese character (3 bytes)
            '文',

            // rare Chinese characters (4 bytes)
            '𠜎',
            '𡃁',

            // other supplementary-plane characters (4 bytes)
            '𐍈',
            '𐎠',

            // 4-byte emojis
            '🖋',
            '🎶',
            '🌈',
            '🐰',
            '🔗',

            // flag emojis (8 bytes: two 4-byte regional indicators)
            '🇫🇷',
            '🇬🇧',

            // other complex emojis (18 bytes: three 4-byte emojis joined
            // by two 3-byte zero-width joiners, written explicitly here)
            "👨\u{200D}👩\u{200D}👧",
        ];

        foreach ($chars as $char) {
            // The literals above are already UTF-8, and strlen() counts bytes, not characters.
            $length = \strlen($char);
            echo $char.': 4 bytes or more? '.(preg_match($fourBytesPattern, $char) === 1 ? 'true' : 'false').' (bytes length: '.$length.')';
            echo PHP_EOL;
        }

        // That's it! 😁
    }
}
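
The same check can also be expressed in terms of code points rather than raw bytes. The following is a minimal sketch, not part of the original snippet: with the `u` modifier, PCRE treats the pattern and the subject as UTF-8, so every character that needs four bytes is simply a code point in the range U+10000 to U+10FFFF. The function names `containsFourByteChar()` and `replaceFourByteChars()` are hypothetical and only used for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: tells whether the text contains at least one
 * code point above U+FFFF, i.e. a character UTF-8 encodes on four bytes.
 */
function containsFourByteChar(string $text): bool
{
    // With the "u" modifier, preg_match() returns false when the subject is
    // not valid UTF-8, so invalid input is reported as "no match" here.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

/**
 * Hypothetical helper: replaces every four-byte character, by default
 * with U+FFFD (the replacement character, three bytes in UTF-8).
 */
function replaceFourByteChars(string $text, string $replacement = "\u{FFFD}"): string
{
    // preg_replace() returns null on error (e.g. invalid UTF-8);
    // in that case the text is returned unchanged.
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', $replacement, $text) ?? $text;
}

var_dump(containsFourByteChar('文'));  // bool(false): U+6587 needs three bytes
var_dump(containsFourByteChar('🌈')); // bool(true): U+1F308 needs four bytes
echo replaceFourByteChars('Hello 🌈!'), PHP_EOL;
```

Replacing the characters is only one possible policy. Rejecting the input with a validation error, or migrating the column to utf8mb4, may be more appropriate depending on the application.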