mirror of https://github.com/mrabarnett/mrab-regex.git synced 2025-10-06 06:12:38 +02:00

97 Commits

Author SHA1 Message Date
Matthew Barnett
15cbd1eaf3 Git issue 495: Running time for failing fullmatch increases rapidly with input length
Re-enabled modified repeat guards due to regression in speed caused by excessive backtracking.
2023-03-23 17:37:37 +00:00
Matthew Barnett
5954c51580 Git issue 494: Backtracking failure matching regex ^a?(a?)b?c\1$ against string abca
Disabled repeat guards. They keep causing issues, and it's just simpler to rely on timeouts.
2023-03-22 18:17:48 +00:00
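The result that issue 494 expects can be cross-checked against Python's standard-library `re` module. Here `re` serves only as a reference backtracking engine; this is not the `regex` module's own test. A correct engine has to backtrack out of `a?` matching the leading `a`, so that `(a?)` captures it and `\1` can then match the trailing `a`:

```python
import re

# Pattern and subject string from Git issue 494, checked with the standard
# library's re module as a reference engine (not the regex module itself).
pattern = re.compile(r"^a?(a?)b?c\1$")
m = pattern.match("abca")

# First attempt: a? takes 'a' and (a?) captures '', so \1 matches '' and $
# fails before the trailing 'a'. Backtracking gives a? = '' and (a?) = 'a',
# after which b?, c and \1 consume "bca" and $ succeeds.
assert m is not None
print(m.span(), repr(m.group(1)))  # (0, 4) 'a'
```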
Matthew Barnett
cc392629b2 Updated text for supported Unicode and Python versions. 2022-10-31 02:46:35 +00:00
Matthew Barnett
4ad1f04899 Updated to Unicode 15.0.0. 2022-09-13 01:47:01 +01:00
Matthew Barnett
416dd66ba9 Updated version. 2022-09-11 19:27:20 +01:00
Matthew Barnett
71be78a042 Git issue 477: \v for vertical spacing
Added \p{HorizSpace} (\p{H}) and \p{VertSpace} (\p{V}).
2022-08-17 20:26:24 +01:00
Matthew Barnett
01758302a6 Git issue 475: 2022.7.24 improperly released
The file https://pypi.org/pypi/regex/2022.7.24/json was missing references to most of the wheels, so this is a new release in the hope that it was just a glitch in GitHub Actions.
2022-07-25 14:55:33 +01:00
Matthew Barnett
ea5e640db3 Git issue 474: regex has no equivalent to re.Match.groups() for captures
Added 'allcaptures' and 'allspans' methods to match objects.

Fixed bug where compiling a pattern didn't always check for unused arguments.
2022-07-24 23:27:26 +01:00
Matthew Barnett
5c9b260d2f Git issue 473: Emoji classified as letter
The values for GC:Assigned and GC:LC were flipped.
2022-07-09 00:36:46 +01:00
Matthew Barnett
8a514bc62f Git issue 472: Revisit compilation flag to prevent adding a single explicitly compiled regex to the cache
Added 'cache_pattern' parameter to 'compile' function to improve use of the cache.
2022-06-02 23:20:58 +01:00
Matthew Barnett
74b8db0e81 Git issue 467: Scoped inline flags 'a', 'u' and 'L' affect global flags
Those flags can now be scoped.
2022-04-24 19:02:34 +01:00
Matthew Barnett
138970bafb Git issue 457: Difference with re, when repl returns None
Make regex consistent with re by treating a replacement template of None as ''.

Also, it now rejects invalid ASCII escapes, as the re module does.
2022-03-15 17:03:11 +00:00
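The `re` behaviour that this commit aligns with can be observed in the standard library itself. When a replacement callable returns `None`, `re.sub` inserts nothing for that match, and since Python 3.7 an unknown escape of an ASCII letter such as `\q` is rejected at compile time. This sketch exercises `re` only, not `regex`:

```python
import re

# A replacement callable that returns None: re.sub treats the result as an
# empty string, so every match is simply deleted.
result = re.sub(r"a", lambda m: None, "banana")
print(result)  # bnn

# Unknown escapes of an ASCII letter (here \q) are compile-time errors in re
# since Python 3.7.
try:
    re.compile(r"\q")
    rejected = False
except re.error:
    rejected = True
print(rejected)  # True
```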
Matthew Barnett
67ab717196 Git issue 453: Document last supported python2 version
Added a brief reference to the last version to support Python 2 in README.rst.

Git issue 456: RegexFlag exists in re, but not regex

Updated the flags to use enum now that regex supports only Python 3.6+.
2022-03-02 01:18:38 +00:00
Matthew Barnett
667f171a0b Updated version for new release. 2022-01-18 18:18:44 +00:00
Matthew Barnett
dde2d98360 Git issue 443: 2021.11.9 source release is missing C headers
Updated version.
2021-11-09 22:16:38 +00:00
Matthew Barnett
1c30637ec7 Git issue 442: Fuzzy regex matching doesn't seem to test insertions correctly 2021-11-09 19:43:42 +00:00
Matthew Barnett
6ce0bda712 Git issue 435: Unmatched groups: sub vs subf
A similar fix also applies to expandf: unmatched groups should expand to an empty string.
2021-11-02 17:13:53 +00:00
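For comparison, the template-based `re.sub` and `Match.expand` in the standard library (Python 3.5 and later) expand a group that did not participate in the match to an empty string. The commit applies the same rule to `regex`'s format-style `subf` and `expandf`. This sketch uses `re` only; the `str.format`-style templates of `subf`/`expandf` are not shown:

```python
import re

# In "ab", the first match captures group 1 ("a"); the second match comes from
# the "b" alternative, so group 1 does not participate and expands to "".
result = re.sub(r"(a)|b", r"<\1>", "ab")
print(result)  # <a><>

# Match.expand follows the same rule for a group that did not match.
m = re.match(r"(a)?b", "b")
expanded = m.expand(r"[\1]")
print(expanded)  # []
```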
Matthew Barnett
f2c5da72e3 Further changes for migration to Github. 2021-11-01 19:27:48 +00:00
Matthew Barnett
bf5e239c0e Git issue 433: Disagreement between fuzzy_counts and fuzzy_changes
Fuzzy changes were sometimes not removed when backtracking.
2021-10-23 00:10:40 +01:00
Matthew Barnett
26d37df1c3 Removed Apple Silicon build from .travis.yml because it's not currently codesigned by Travis CI. 2021-10-21 02:21:13 +01:00
Matthew Barnett
d81009be69 Git issue 428: match hangs on the following example - possible infinite loop?
Fixed miscalculation of total error count when there's more than one fuzzy term.
2021-10-08 23:56:28 +01:00
Matthew Barnett
25638c20a4 Git issue 427: Possible bug with BESTMATCH 2021-09-30 23:10:38 +01:00
Matthew Barnett
3dd42455df Updated to Unicode 14.0.0. 2021-09-24 21:07:41 +01:00
Matthew Barnett
e3b477cc93 Git issue 421: 2021.8.27 results in "Fatal Python error: Segmentation fault"
Fixed problems with use of fast searching tables in opposite direction.
2021-08-27 19:39:30 +01:00
Matthew Barnett
ac7ce3f5ee Git issue 420: segmentation fault in finditer (maybe others)
Fixed bugs in fast searches in the reverse direction.
2021-08-27 01:40:23 +01:00
Matthew Barnett
75211751d9 Updated version. 2021-08-21 21:22:35 +01:00
Matthew Barnett
26a320f29b Forgot to update version! 2021-08-03 18:41:15 +01:00
Matthew Barnett
ae6bb1ebd3 Additional fix for Git issue 415. 2021-07-06 00:03:35 +01:00
Matthew Barnett
5d6f9cb115 Git issue 415: Fuzzy character restrictions don't apply to insertions at "right edge" 2021-07-05 20:41:03 +01:00
Matthew Barnett
32453c1378 Git issue 407: API is not a drop-in replacement for python's re when it comes to typing
Now exports Match object as well as Pattern object.

Git issue 414: Memory optimization questions

sys.getsizeof returns a more accurate size of a pattern object. It includes the size of internal data, but, as is the norm, does not include the size of public objects.
2021-07-01 23:27:31 +01:00
Matthew Barnett
1e6986b92f Git issue 408: regex fails with a quantified backreference but succeeds with repeated backref
Git issue 407: API is not a drop-in replacement for python's re when it comes to typing
2021-04-04 17:48:38 +01:00
Matthew Barnett
0321186b78 Git issue 403: Fuzzy matching with wrong distance (unnecessary substitutions)
Reworked the fuzzy matching code.
2021-03-17 20:11:27 +00:00
Matthew Barnett
5de64f7553 Git issue 394: Unexpected behaviour in fuzzy matching with limited character set with IGNORECASE flag 2020-11-13 01:54:29 +00:00
Matthew Barnett
d5a5016c1b Update version. 2020-11-11 16:13:28 +00:00
Matthew Barnett
b693a1fba7 Git issue 362: Any LICENSE work for this project?
Changed licence to Apache 2.0 and added licence file.
2020-10-28 22:28:14 +00:00
Matthew Barnett
92989b561a Git issue 387: Compilation flag to avoid storing compiled regexp in internal cache
Slight reversion/revision. You can prevent explicitly-compiled patterns from being cached by using "cache_all(False)".
2020-10-23 03:01:51 +01:00
Matthew Barnett
22c5f461b4 Git issue 387: Compilation flag to avoid storing compiled regexp in internal cache
No longer caches patterns that are compiled explicitly.
2020-10-22 23:43:57 +01:00
Matthew Barnett
fa9def53cf Git issue 386: GCC 10 warnings
Fixed bugs in fuzzy_match_string_fld and fuzzy_match_group_fld.

Added more braces around data in some Unicode tables.
2020-10-15 13:27:10 +01:00
Matthew Barnett
818685f09c Git issue 385: Comments in expressions
Didn't parse regex comments properly when in VERBOSE mode.
2020-10-11 03:20:49 +01:00
Matthew Barnett
5c657a4473 Git issue 383: Memory Error - regex.findall
The problem was caused by a lazy repeat looping forever, growing the backtracking stack. Greedy repeats were OK.
2020-09-27 02:43:56 +01:00
Matthew Barnett
fe9fb05890 Git issue 377: request: \h for horizontal space
Added \h as an alias to [[:blank:]].
2020-07-14 23:44:08 +01:00
Matthew Barnett
fb025ba271 Git issue 376: Is the \L option as efficient as it can be?
Improved performance of string sets.
2020-06-07 23:24:47 +01:00
Matthew Barnett
be28c28db9 Git issue 376: Is the \L option as efficient as it can be?
Switched StringSet to use fallback method due to inefficiencies in the engine. Needs more investigation.
2020-06-07 02:19:06 +01:00
Matthew Barnett
af87091c93 Git issue 372: Regression from 2020.4.4 -> 2020.5.7 in non-fuzzy matching pattern
Changed the 'state' member that's tested in is_repeat_guarded for a fuzzy match. The previously-used member wasn't initialised in a non-fuzzy match. The new test is a better one to use anyway.
2020-05-14 14:26:25 +01:00
Matthew Barnett
98fea72cf3 Git issue 371: Specifying character set when fuzzy-matching allows characters not in the set
fuzzy_ext_match and fuzzy_ext_match_group_fld didn't support sets!
2020-05-13 17:46:26 +01:00
Matthew Barnett
2a97a7df1f Git issue 370: Confusions about Fuzzy matching behavior (prob a bug?) 2020-05-07 18:01:07 +01:00
Matthew Barnett
c660527507 Updated to Unicode 13.0.0. 2020-04-04 21:05:28 +01:00
Matthew Barnett
2a0cb832d2 Git issue 365: Memory leak occurs in fuzzy match at some substitution use cases 2020-02-20 20:31:27 +00:00
Matthew Barnett
42ec250563 Git issue #364: Contradictory values in fuzzy_counts and fuzzy_changes 2020-02-18 23:40:21 +00:00
Matthew Barnett
0c850822f5 Issue 357: New exception "ValueError: unused keyword argument" breaks use case
Added ignore_unused keyword argument.

Issue 359: 2020.1.7 source distribution release contains \r\n line endings

Fixed line endings for source distribution.

Issue 360: Invalid modeline in `_regex.c`

Removed vim modeline.
2020-01-07 23:38:06 +00:00