Talk:Comparison of regular expression engines: Difference between revisions

{{WikiProject banner shell|class=List|
{{WikiProject Computing|importance=Low|software=yes|software-importance=Low}}
}}
{{Broken anchors|links=
* <nowiki>[[Regular expression#Fuzzy regular expressions|fuzzy regular expression]]</nowiki> The anchor (#Fuzzy regular expressions) has been [[Special:Diff/693310524|deleted by other users]] before. <!-- {"title":"Fuzzy regular expressions","appear":{"revid":556036708,"parentid":555983164,"timestamp":"2013-05-21T01:36:54Z","replaced_anchors":{"Fuzzy Regular Expressions":"Fuzzy regular expressions"},"removed_section_titles":["Fuzzy Regular Expressions"],"added_section_titles":["Fuzzy regular expressions"]},"disappear":{"revid":693310524,"parentid":692694255,"timestamp":"2015-12-01T19:20:24Z","removed_section_titles":["Fuzzy regular expressions"],"added_section_titles":["Fuzzy regexps"]},"very_different":false,"rename_to":"Fuzzy regexes"} -->
}}
 
= Status/Edit Notes =
== python regex module missing ==
https://pypi.python.org/pypi/regex supports a broader set of features than Python's standard re module, especially recursive pattern matching --[[User:ThomasKalka|ThomasKalka]] ([[User talk:ThomasKalka|talk]]) 11:02, 28 March 2016 (UTC)
 
:Linked in the "Remarks" column for Python in the [[Comparison of regular expression engines#Languages|Languages table]]. [[User:Rootsmusic|rootsmusic]] ([[User talk:Rootsmusic|talk]]) 22:36, 14 December 2023 (UTC)
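The claim above about recursive pattern matching can be checked with a short sketch. It is only an illustration: the pattern <code>BALANCED</code> and both helper names are hypothetical and made up for this example, and the second helper assumes the third-party <code>regex</code> package may or may not be installed.

```python
import re

# Recursive pattern for balanced parentheses; (?R) re-enters the whole pattern.
# The pattern and the helper names below are illustrative assumptions.
BALANCED = r"\((?:[^()]+|(?R))*\)"


def stdlib_accepts_recursion() -> bool:
    """Return True if the standard re module can compile BALANCED."""
    try:
        re.compile(BALANCED)
    except re.error:
        return False
    return True


def match_balanced(text: str):
    """Return the first balanced group found by the third-party regex module,
    or None if that module is not installed or nothing matches."""
    try:
        import regex  # third-party package: pip install regex
    except ImportError:
        return None
    m = regex.search(BALANCED, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print("re accepts (?R):", stdlib_accepts_recursion())
    print("regex match:", match_balanced("f(a(b)c)"))
```

The standard module rejects <code>(?R)</code> at compile time, while the <code>regex</code> package, when available, accepts the same pattern.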
 
== Jan-2010 Update ==
Made some updates -- not enough room to document them on the 1 line summary. Some of these address comments made previously (below).
 
Updates include:
* removing notes that something has only been available since '2007' (version was mentioned, but released 5 years ago)....
* removing notes on Unicode support where the note was "(supports ALL, including binary)". If an engine supports it, it supports it; a special note should not be required for 'all'. Only 'partial' support cases should be noted.
* Note 2: (not mentioned in the article) -- PCRE gets its code from Perl -- so its features generally track Perl's. PCRE is an acronym for Perl Compatible Regular Expressions. The engine in Ruby derives from PCRE and tries to track its features. The Ruby engine was written specifically to add Japanese support BEFORE UTF-8 started becoming prevalent. Thus it supported 16-bit characters early on, but only for locale-based Japanese charsets. It wasn't really until it added UTF-8 support that it got full Unicode support.
<small> (This is written after updating "Part 2". I'm looking at "Part 3" to see what is salvageable there. I started to write up comments, but it is better that I do it and then say what was done; if I get hung up on saying what I'll do, I may not get it done. I am getting a bit tired of this update stuff already.) </small> [[User:Athenae|Astara Athenea]] ([[User talk:Athenae|talk]]) 21:44, 22 January 2012 (UTC)
 
 
== Ill-defined terms ==
 
: I've gone ahead and done this. --[[User:Monger|Monger]] 01:00, 20 July 2007 (UTC)
 
 
== Unicode property support ==
:'''NOTE:''' An application using a library for regular expression support does not necessarily offer the full set of features of the library, e.g. GNU Grep which uses PCRE does not offer lookahead support, though PCRE does.
 
However, the table shows that GNU Grep does support lookahead. Unfortunately, I'm not sure which is true, perhaps someone else who knows can correct it. <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/99.42.116.61|99.42.116.61]] ([[User talk:99.42.116.61|talk]]) 22:03, 28 April 2013 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
 
== External links modified ==
 
Hello fellow Wikipedians,
 
I have just modified 4 external links on [[Comparison of regular expression engines]]. Please take a moment to review [https://en.wikipedia.org/w/index.php?diff=prev&oldid=795049497 my edit]. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit [[User:Cyberpower678/FaQs#InternetArchiveBot|this simple FaQ]] for additional information. I made the following changes:
*Added archive https://archive.is/20081203133158/http://jeff.bleugris.com/journal/projects/ to http://jeff.bleugris.com/journal/projects/
*Added archive https://web.archive.org/web/20081201072631/http://www.regexlab.com/en/deelx/ to http://www.regexlab.com/en/deelx/
*Added archive https://web.archive.org/web/20131122023923/http://www2.tcl.tk/461 to http://www2.tcl.tk/461
*Added archive https://web.archive.org/web/20110715032327/https://www.p6r.com/software/rgx.html to https://www.p6r.com/software/rgx.html
 
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
 
{{sourcecheck|checked=false|needhelp=}}
 
Cheers.—[[User:InternetArchiveBot|'''<span style="color:darkgrey;font-family:monospace">InternetArchiveBot</span>''']] <span style="color:green;font-family:Rockwell">([[User talk:InternetArchiveBot|Report bug]])</span> 17:41, 11 August 2017 (UTC)
 
== Javascript regular expression features recently added. ==
 
In new implementations, support for named capture groups has been added, as described in the [https://github.com/tc39/proposal-regexp-named-groups proposal for named capture groups].
<ref>{{Cite web|url=https://codereview.chromium.org/2050343002|title=Issue 2050343002: [regexp] Experimental support for regexp named captures - Code Review|website=codereview.chromium.org|access-date=2018-02-02}}</ref>
 
ES2018 has a [https://github.com/tc39/proposal-regexp-lookbehind proposal for lookbehind], which was already implemented in some engines.<ref>{{Cite web|url=https://v8project.blogspot.com/2016/02/regexp-lookbehind-assertions.html|title=V8 JavaScript Engine: RegExp lookbehind assertions|last=Hablich|first=Michael|date=2016-02-26|website=V8 JavaScript Engine|access-date=2018-02-02}}</ref>
 
Also, V8 has had Unicode support for a while now. <ref>{{Cite web|url=https://chromium.googlesource.com/v8/v8/+/3a2fbc3a4ed2802b52659df2209b930200d63b29|title=3a2fbc3a4ed2802b52659df2209b930200d63b29 - v8/v8 - Git at Google|website=chromium.googlesource.com|language=en|access-date=2018-02-02}}</ref><ref>{{Cite web|url=https://chromium.googlesource.com/v8/v8/+/e1c645d1f41febae014b4d0dfe7dc6e4549fab5e|title=e1c645d1f41febae014b4d0dfe7dc6e4549fab5e - v8/v8 - Git at Google|website=chromium.googlesource.com|language=en|access-date=2018-02-02}}</ref>
 
{{reflist-talk}}
 
== Engines could be categorized ==
 
* There are established types of engines: DFA and NFA, with NFA further divided into Traditional NFA and POSIX NFA (see [https://www.oreilly.com/library/view/mastering-regular-expressions/0596528124/ch04.html Mastering Regular Expressions, 3rd Edition by Jeffrey E.F. Friedl, chapter 4]).
* There is also a strong grouping by Perl compatibility (which drove regex development some years ago). Perl 5.005 introduced new features ([https://perldoc.perl.org/perl5005delta.html#Regular-Expressions Perl 5.005 Regular Expression improvements]) such as lookbehinds, conditional expressions and atomic groups. Many years later, Perl 5.10 introduced further features ([https://perldoc.perl.org/perl5100delta.html#Regular-expressions Perl 5.10 Regular Expression improvements]) such as named capture buffers, possessive quantifiers, relative backreferences and \K, among others. The regex engine in version 5.10 was developed in collaboration with the PCRE project; the most interesting features were added between 1997 and 2007 ([https://www.rexegg.com/pcre-documentation.html Curated PCRE history]).

As Perl is/was the de facto standard for regex, most of the engines in this Wikipedia article have a grammar and feature set that was clearly fixed before the Perl 5.005 release, between Perl 5.005 and 5.10, or after Perl 5.10.
 
Sebastian --[[Special:Contributions/88.217.185.170|88.217.185.170]] ([[User talk:88.217.185.170|talk]]) 21:47, 12 October 2019 (UTC)
 
:A bit of digging turned up the following: PCRE was created in 1997, when Perl 5.004 was out. PCRE 2.0 was created in 1998, when Perl 5.005 (with regex updates) was released. PCRE 7.0-7.3 were done in 2006-2007 in co-development for Perl 5.10 (with more groundbreaking regex updates). Sebastian --[[Special:Contributions/88.217.185.170|88.217.185.170]] ([[User talk:88.217.185.170|talk]]) 22:20, 12 October 2019 (UTC)
 
== About Java Regex Variable-length lookaround ==
 
I just tested the regex "(?<=[a-z]+)[0-9]+" against the text "abcd12345" on several Java platforms (Oracle JDK and OpenJDK 1.8, and Oracle JDK and OpenJDK 17), and it gave the correct result "12345"; the variable-width look-ahead regex "[0-9]+(?=[a-z]+)" also works fine on Java on my machine. However, on the Android platform with API level 29 and Java source/target compatibility version 1.8, the look-behind regex fails to compile with the error "non-fixed width look-behind". It also fails with the Java 8 flavor on the website [https://regex101.com/r/kB8Y47/1 regex test].
 
I don't know how these work in Java or why the results above differ, so I am quite confused now. <!-- Template:Unsigned --><span class="autosigned" style="font-size:85%;">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Bczhc|Bczhc]] ([[User talk:Bczhc#top|talk]] • [[Special:Contributions/Bczhc|contribs]]) 02:52, 5 February 2022 (UTC)</span> <!--Autosigned by SineBot-->
 
:Also, is there a need to add a "variable-length lookahead" feature column to the regular expression features part 2 table? [[User:Bczhc|Bczhc]] ([[User talk:Bczhc|talk]]) 02:57, 5 February 2022 (UTC)
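As a point of comparison only, here is a minimal sketch of how the same two patterns behave in Python's standard <code>re</code> module. It says nothing about why the Java and Android results differ. The helper name <code>compiles</code> is a hypothetical name chosen for this example, and the look-ahead pattern is run against "12345abcd" (an assumed test string) so that letters actually follow the digits.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Variable-width look-behind from the comment above: Python's re rejects it.
variable_lookbehind_ok = compiles(r"(?<=[a-z]+)[0-9]+")

# A fixed-width look-behind (one letter) is accepted.
fixed = re.search(r"(?<=[a-z])[0-9]+", "abcd12345")

# Variable-width look-ahead is accepted.
lookahead = re.search(r"[0-9]+(?=[a-z]+)", "12345abcd")

if __name__ == "__main__":
    print("variable-width look-behind compiles:", variable_lookbehind_ok)
    print("fixed-width look-behind match:", fixed.group(0))
    print("variable-width look-ahead match:", lookahead.group(0))
```

This suggests that look-ahead width is rarely the restricted feature; if a new table column is added, variable-length look-behind is the property on which engines appear to differ.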
 
 
== possessive quantifiers ==
 
I'm missing the feature "possessive quantifiers" that some RegEx dialects have. I can only find the distinction between greedy and non-greedy, but technically there is greedy, lazy/reluctant and possessive.
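To illustrate the three-way distinction, here is a minimal sketch in Python. It assumes Python's standard <code>re</code> module, which accepts possessive quantifiers only from version 3.11; the helper <code>try_match</code> is a hypothetical name and reports "unsupported" on older versions.

```python
import re


def try_match(pattern: str, text: str):
    """Return the matched text, None if there is no match,
    or "unsupported" if this engine version rejects the syntax."""
    try:
        m = re.match(pattern, text)
    except re.error:
        return "unsupported"
    return m.group(0) if m else None


# Greedy: a+ takes all three letters, then backtracks one so the final a can match.
greedy = try_match(r"a+a", "aaa")

# Lazy/reluctant: a+? takes as little as possible, so the match stops after two letters.
lazy = try_match(r"a+?a", "aaa")

# Possessive: a++ takes all three letters and never gives any back, so the final a fails.
possessive = try_match(r"a++a", "aaa")

if __name__ == "__main__":
    print("greedy:", greedy)
    print("lazy:", lazy)
    print("possessive:", possessive)
```

Greedy and lazy quantifiers both find a match here and differ only in its length, whereas the possessive form finds no match at all, which is why it would arguably deserve its own column.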