Regular Expressions in sql

Regular Expressions in sql

What are regular expressions?

Regular expressions help you search for data that matches complex criteria. We looked at templates in the previous tutorial. If you've worked with wildcards before, you may be wondering why learn regular expressions when you can get similar results using wildcards. Because, compared to wildcards, regular expressions allow us to search for data that matches even more complex criteria.

 

Basic Syntax

The basic syntax for a regular expression is as follows

SELECT statements... WHERE fieldname REGEXP 'pattern';

HERE -

  • "SELECT statements..." is a standard SELECT statement
  • "WHERE fieldname" is the name of the column for which the regular expression is to be executed.
  • "REGEXP 'pattern'" REGEXP is a regular expression operator, and 'pattern' represents a pattern that must match REGEXP. RLIKE is synonymous with REGEXP and achieves the same results as REGEXP. In order not to confuse it with the LIKE operator, it's better to use REGEXP .

 

Let's now look at a practical example

SELECT * FROM `movies` WHERE `title` REGEXP 'code';

The above query looks for all movie titles that have a word code. It doesn't matter if the "code" is at the beginning, middle, or end of the heading. As long as it is in the header, it will be considered.

Let's assume we want to search for movies that start with a, b, c, or d followed by any number of other characters, how are we going to achieve this. We can use regular expression along with metacharacters to achieve the desired results.

SELECT * FROM `movies` WHERE `title` REGEXP '^[abcd]';

 

Running the above script in MySQL Workbench against myflixdb gives us the following results.

movie_id title director year_released category_id
4 Code Name Black Edgar Jimz 2010 NULL
5 Daddy's Little Girls NULL 2007 eight
6 Angels and Demons NULL 2007 6
7 Davinci Code NULL 2007 6

Let's now take a closer look at our regular expression responsible for the above result.

'^ [abcd]' the caret (^) means that pattern matching should be applied at the beginning, and the character list [abcd] means that only movie titles starting with a, b, c, or d will be returned in our result set .

Let's modify our script above to use a NOT list and see what results we get after running our query.

SELECT * FROM `movies` WHERE `title` REGEXP '^[^abcd]';

Running the above script in MySQL Workbench against myflixdb gives us the following results.

movie_id title director year_released category_id
one Pirates of the Caribbean 4 Rob Marshall 2011 one
2 Forgetting Sarah Marshall Nicholas Stoller 2008 2
3 X-Men   2008  
9 honey mooners John Schultz 2005 eight
sixteen 67% Guilty   2012  
17 The Great Dictator Chalie Chapli 1920 7
eighteen sample movie Anonymous   eight
nineteen movie 3 John Brown 1920 eight

Let's now take a closer look at our regular expression responsible for the above results.

The '^[^abcd]' caret (^) means that pattern matching should be applied at the beginning, and the character list [^abcd] means that movie titles beginning with any of the nested characters are excluded from the result set.

 

Regular expression metacharacters

In the example above, we've looked at the simplest form of a regular expression. Let's now look at more complex regex pattern matches. Suppose we want to search for movie titles that start with the pattern "code" using just a regular expression, how would we go about it? The answer is a metacharacter. They allow us to fine-tune the results of a pattern search using regular expressions.

 

char Description example  
* The asterisk (*) metacharacter is used to match zero (0) or more instances of strings preceding it SELECT * FROM FILMS WHERE title REGEXP 'da *'; will give all movies containing "yes" characters. For example, the Da Vinci Code, Daddy's Little Girls.
+ The plus (+) metacharacter is used to match one or more string instances that precede it. SELECT * FROM `movies` WHERE` title` REGEXP 'mon +'; will return all movies containing the characters "mon". For example, angels and demons.
? The question metacharacter (?) is used to match zero (0) or one occurrence of strings that precede it. SELECT * FROM `categories` WHERE` category_name` REGEXP 'com?';  will return all categories containing the string com. For example, comedy, romantic comedy.
, The dot (.) metacharacter is used for any single character in a newline exception. SELECT * FROM MOVIES WHERE `year_released` REGEXP '200.'; will give all movies released over the years starting with the characters "200" followed by any single character. For example, 2005, 2007, 2008, etc.
[abc] Charlist[a] are used to match any of the nested characters. SELECT * FROM `movies` WHERE` title` REGEXP '[vwxyz]'; will return all movies containing any one character in "vwxyz". For example, X-Men, The Da Vinci Code, etc.  
[^A] The character list [^abc] is used to match any character other than the supplied ones. SELECT * FROM `movies` WHERE` title` REGEXP '^ [^ vwxyz]'; will return all movies containing characters other than "vwxyz".
[AZ] [AZ] is used to match any capital letter. SELECT * FROM `members` WHERE` postal_address` REGEXP '[AZ]'; will give all members who have a mailing address containing any character from A to Z. For example, Janet Jones with member number 1.
[AZ] [az] is used to match any lowercase letter SELECT * FROM `members` WHERE` postal_address` REGEXP '[az]'; will give all members that have mail addresses containing any character from a to z. . For example, Janet Jones with membership number 1.
[0-9] [0-9] is used to match any digit between 0 and 9. SELECT * FROM `members` WHERE` contact_number` REGEXP '[0-9]'   will give all members sent contact numbers containing the characters "[0-9]". For example, Robert Phil.
^ The caret symbol (^) is used to start the match at the start. SELECT * FROM `movies` WHERE` title` REGEXP '^ [cd]'; gives all movies with a title starting with any of the characters in "cd". For example, codename Black, Daddy's Little Girls and Da Vinci Code.
| The vertical bar (|) is used to isolate alternatives. SELECT * FROM `movies` WHERE` title` REGEXP '^ [cd] | ^ [u]'; gives all movies with a title beginning with any of the characters 'cd' or 'u'. For example, the codename Black, Daddy's Little Girl, The Da Vinci Code, and Underworld - Awakening.
[[:<:]] [[: <:]] Matches the beginning of words. SELECT * FROM `movies` WHERE` title` REGEXP '[[: <:]] for'; gives all movies with titles starting with characters. For example: Forget Sarah Marshall.
[[:>:]] [[:>:]] Matches the end of words. SELECT * FROM `movies` WHERE` title` REGEXP 'ack [[:>:]]';  gives all movies with titles ending in 'ack'. For example, the codename is Black.
[:class:] [:Class:] is the same as character class, i.e. [:alpha:] to match letters, [:space:] to match white space, [punctuation] match punctuation, and [:upper:] for class uppercase letters. SELECT * FROM `movies` WHERE` title` REGEXP '[:alpha:]'; all movies with titles contain only letters. For example, Forget Sarah Marshall, X-Men, etc. Movies like Pirates of the Caribbean 4 will be omitted from this query.

 

The backslash (\) is used as an escape character. If we want to use it as part of a pattern in a regular expression, we must use a double backslash (\\)

Summary

  • Regular expressions provide powerful and flexible pattern matching that can help us implement powerful search utilities for our database systems.
  • REGEXP is an operator used when performing a regular expression pattern match. RLIKE is a synonym
  • Regular expressions support several metacharacters, which provide more flexibility and control over pattern matching.
  • The backslash is used as an escape character in regular expressions. This is only taken into account in pattern matching if double backslashes are used.
  • Regular expressions are not case sensitive.