5 Must Know Javascript Regex Tips
By trent on 13 Jul in Javascript
You won’t get far with JavaScript before you need to write a savvy regular expression. Regular expressions can be tricky, so here are 5 tips to help you find what you’re looking for. Let’s take the following block of text:
Lorem ipsum 12345-4321 dolor sit amet, consectetur adipisicing elit, sed do 987-654 eiusmod tempor incididunt ut labore et 123-456-789 [[dolore magna aliqua. Ut enim]] ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure [[dolor in reprehenderit in]] voluptate velit esse cillum dolore eu fugiat nulla pariatur.
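You will notice I’ve embedded a few numbers and brackets throughout so we can test some different scenarios. The examples assume this text is stored in a variable named str and that it contains line breaks, including inside the bracketed phrases. Let’s get started.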
1) Selection Over Multiple Lines
Let’s say you need to find a selection that may span multiple lines. You might think “.” will match anything, but it won’t: it stops at the end of the line. The answer is to use [\s\S].
Example of the problem:
str.match(/\[\[.+?\]\]/ig); /* null Uh oh, it couldn't cross lines to capture a match */
Example of the solution:
str.match(/\[\[[\s\S]+?\]\]/ig); /* ["[[dolore magna aliqua. Ut enim]]", "[[dolor in reprehenderit in]]"] Now it selected across multiple lines */
[\s\S] basically says match any whitespace or non-whitespace character, which covers every character, line breaks included.
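To try this in isolation, here is a minimal sketch. The `multiline` string is a shortened, made-up stand-in for the sample text, with a line break placed inside the brackets. The last pattern uses the `s` (dotAll) flag, which assumes an engine that implements ES2018; on older engines that literal is a syntax error, so treat it as optional.

```javascript
// Hypothetical shortened sample: the bracketed phrase is split across two lines.
const multiline = "labore et [[dolore magna\naliqua. Ut enim]] ad minim";

// "." does not match "\n", so the match cannot cross the line break.
const withDot = multiline.match(/\[\[.+?\]\]/g);

// [\s\S] matches any character, line breaks included.
const withClass = multiline.match(/\[\[[\s\S]+?\]\]/g);

// With the "s" (dotAll) flag, "." matches line breaks as well (ES2018 engines only).
const withDotAll = multiline.match(/\[\[.+?\]\]/gs);

console.log(withDot);    // null
console.log(withClass);  // one match, containing the line break
console.log(withDotAll); // same single match as withClass
```

[\s\S] remains the safer choice if the code has to run on engines that predate the `s` flag.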
2) Get as Much as Possible, but not too Aggressive
Often you need to match a pattern 0 or more, or 1 or more times, but without being overly aggressive (greedy). This is often the case when you’re expecting another match and don’t want to overrun your boundaries. To do this you use the lazy quantifiers “*?” and “+?”.
Example of the problem:
str.match(/\[\[[\s\S]+\]\]/ig); /* ["[[dolore magna aliqua. Ut enim]] ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure [[dolor in reprehenderit in]]"] Oh no, it selected everything until the final ending bracket, it is too aggressive! */
Example of the solution:
str.match(/\[\[[\s\S]+?\]\]/ig); /* ["[[dolore magna aliqua. Ut enim]]", "[[dolor in reprehenderit in]]"] Ok, now it stops at the first ending bracket, less aggressive. */
It can be confusing to know when to use this, but it boils down to cases where you’re looking for a token that ends your selection.
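Here is a small side-by-side sketch of the greedy and lazy forms, using a shortened, made-up `text` string rather than the full sample. The third pattern is an alternative not covered above: a negated character class that simply refuses to step over a closing bracket.

```javascript
// Hypothetical shortened sample with two bracketed phrases.
const text = "et [[dolore magna]] ad minim [[dolor in]] voluptate";

// Greedy "+" runs to the LAST "]]" in the string.
const greedy = text.match(/\[\[[\s\S]+\]\]/g);

// Lazy "+?" stops at the FIRST "]]" it can reach.
const lazy = text.match(/\[\[[\s\S]+?\]\]/g);

// Alternative: match anything except "]" so the quantifier cannot overrun.
const negated = text.match(/\[\[[^\]]+\]\]/g);

console.log(greedy);  // ["[[dolore magna]] ad minim [[dolor in]]"]
console.log(lazy);    // ["[[dolore magna]]", "[[dolor in]]"]
console.log(negated); // ["[[dolore magna]]", "[[dolor in]]"]
```

The negated class only works when the content can never contain a single “]”, so the lazy form is the more general of the two.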
3) Non-Captures
Sometimes you need to look for specific items, but you don’t necessarily want them returned. An example would be finding the text inside an HTML tag without returning the tag itself. For this you will use the non-capturing group “(?:)” around the parts you don’t want, and a normal capturing group “()” around the part you do.
Example of the problem:
str.match(/\[\[[\s\S]+?\]\]/ig); /* ["[[dolore magna aliqua. Ut enim]]", "[[dolor in reprehenderit in]]"] The result also returns brackets, not good! */
Example of the solution:
str.match(/(?:\[\[)([\s\S]+?)(?:\]\])/i); /* ["[[dolore magna aliqua. Ut enim]]", "dolore magna aliqua. Ut enim"] The second element won't contain the brackets */
This can be misleading, however. Without the “g” flag you only get 2 items: the first full match found, and the captured text inside it without the non-captures. Adding the “g” global option will not fix this: match then returns every full match, brackets included, and drops the captured groups.
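One common way to get the captured text for every match is to call exec in a loop on a global regex. This is a minimal sketch on a shortened, made-up `sample` string; the variable names are only for illustration.

```javascript
// Hypothetical shortened sample with two bracketed phrases.
const sample = "et [[dolore magna]] ad minim [[dolor in]] voluptate";
const pattern = /\[\[([\s\S]+?)\]\]/g;

// With "g", match() returns full matches only; the capture group is dropped.
const fullMatches = sample.match(pattern);

// exec() returns one match at a time, capture groups included, and
// advances pattern.lastIndex so the next call resumes after the previous match.
const inner = [];
let m;
while ((m = pattern.exec(sample)) !== null) {
  inner.push(m[1]); // m[0] is the full match, m[1] is the first capture group
}

console.log(fullMatches); // ["[[dolore magna]]", "[[dolor in]]"]
console.log(inner);       // ["dolore magna", "dolor in"]
```

The loop depends on the “g” flag: without it, exec always restarts from the beginning and the loop never ends.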
4) Positive Lookahead
Sometimes you want to look ahead of your match to check that a certain pattern follows it, without including that pattern in the result. To do this use “(?=)”.
Example of the problem:
str.match(/\d+/ig) /* ["12345", "4321", "987", "654", "123", "456", "789"] Crap, I only want numbers with a "-" after them.. */
Example of the solution:
str.match(/\d+(?=\-)/ig); /* ["12345", "987", "123", "456"] */
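This could also be used to work around the “g” issue I mentioned in tip 3, since the text matched by the lookahead is not part of the result. Be aware that it is only a lookahead, though: it checks what comes after the match, not what comes before it.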
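As a rough sketch of that idea, the pattern below grabs runs of non-bracket characters that are immediately followed by “]]”. The strings are shortened, made-up samples. The second call shows the catch: nothing verifies that an opening “[[” came first.

```javascript
// Hypothetical shortened sample with two bracketed phrases.
const sample = "et [[dolore magna]] ad minim [[dolor in]] voluptate";

// Non-bracket characters that are directly followed by "]]".
// The lookahead checks for "]]" but leaves it out of the result.
const beforeClose = sample.match(/[^\[\]]+(?=\]\])/g);

// Caveat: the lookahead only inspects what follows, so text before a
// stray "]]" matches too, even with no opening "[[".
const stray = "stray]] text".match(/[^\[\]]+(?=\]\])/g);

console.log(beforeClose); // ["dolore magna", "dolor in"]
console.log(stray);       // ["stray"]
```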
5) Negative Lookahead
Same situation as #4, except we want to make sure the following pattern doesn’t match. To do this we use “(?!)”.
Example of the problem:
str.match(/\d+/ig) /* ["12345", "4321", "987", "654", "123", "456", "789"] Crap, I only want numbers without a "-" after them.. */
Example of the solution:
str.match(/\d+(?![0-9\-])/ig); /* ["4321", "654", "789"] */
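The digits inside the lookahead are there for a reason. The sketch below, on a shortened, made-up `nums` string, compares the pattern with and without them. With only “-” in the lookahead, the engine backtracks one digit and happily reports a truncated number.

```javascript
// Hypothetical shortened sample containing the same numbers as the article's text.
const nums = "ipsum 12345-4321 dolor 987-654 tempor 123-456-789 dolore";

// Naive: "not followed by -". \d+ gives back one digit, so "12345-"
// yields "1234" (followed by "5", which is not "-").
const naive = nums.match(/\d+(?!-)/g);

// Strict: "not followed by a digit or -". Giving back digits no longer
// helps, so numbers before a "-" are rejected entirely.
const strict = nums.match(/\d+(?![0-9-])/g);

console.log(naive);  // ["1234", "4321", "98", "654", "12", "45", "789"]
console.log(strict); // ["4321", "654", "789"]
```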
Now I have you curious about JavaScript lookbehinds. Well, sorry to disappoint you, but older engines have no such thing. Only engines that implement ES2018 support “(?<=)” and “(?<!)”.
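Where lookbehind is not available, a capture group can often stand in for it. This is a minimal sketch, not a general replacement: it matches the “-” as ordinary text and keeps only the captured digits that follow. The `nums` string and variable names are made up for illustration.

```javascript
// Hypothetical shortened sample containing the same numbers as the article's text.
const nums = "ipsum 12345-4321 dolor 987-654 tempor 123-456-789 dolore";

// Match "-" followed by digits, but keep only the digits (group 1).
const afterDashPattern = /-(\d+)/g;
const afterDash = [];
let m;
while ((m = afterDashPattern.exec(nums)) !== null) {
  afterDash.push(m[1]);
}

console.log(afterDash); // ["4321", "654", "456", "789"]
```

Unlike a true lookbehind, the “-” is consumed as part of each match, which matters if matches could otherwise overlap.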
Bonus) Named Captures
Native JavaScript regular expressions didn’t support named captures until ES2018 added the “(?<name>)” syntax. For older engines, here is a quick named capture method to gain this ability. If you want even more power, check out XRegExp, a library that makes up for the weaknesses of JavaScript’s native implementation.
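To give a feel for the idea, here is one rough way to get name-like access to groups without any special syntax. The `namedMatch` helper is hypothetical, written only for this illustration. It is not the method referenced above and not XRegExp’s API.

```javascript
// Hypothetical helper: pairs a list of names with the numbered capture
// groups of the first match, in order. Returns null when nothing matches.
function namedMatch(regex, names, input) {
  const m = regex.exec(input);
  if (m === null) return null;
  const result = {};
  names.forEach((name, i) => {
    result[name] = m[i + 1]; // group numbering starts at 1
  });
  return result;
}

const parts = namedMatch(/(\d+)-(\d+)/, ["first", "second"], "Lorem ipsum 12345-4321 dolor");
console.log(parts); // { first: "12345", second: "4321" }
```

The names must be listed in the same order as the groups appear in the pattern, which is the main weakness of this approach.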