49 Commits

Author SHA1 Message Date
Omar Shibli
d39609ebc4
fixed unicode chars issue, super annoying, it's not perfect, but hey done better than perfect. more info here https://github.com/robert-bor/aho-corasick/pull/82 (#83)
Co-authored-by: omarshibli <omar.shibli@personetics.com>
2020-09-24 17:21:59 -07:00
Dave Jarvis
beca23930d Fix warnings 2020-08-23 16:27:59 -07:00
Dave Jarvis
cfcd2170ba
Address IDE warnings (#81)
Co-authored-by: Dave Jarvis <Dave.Jarvis@gmail.com>
2020-08-23 16:21:04 -07:00
Dave Jarvis
00a6a3a9f3 Clean up comments, update maven, require Java 8 for building 2020-06-11 18:22:13 -07:00
Renaud Richardet
26268ae012
fixes #73, javadocs error (#76) 2020-05-11 16:48:32 -07:00
Uri Simchoni
365ac85830
Add a test for parallel search in same trie (#75)
* Add a test for parallel search in same trie

* Add a timeout for the parallel test
2020-03-25 23:40:01 -07:00
Umit Gunduz
413d63675b Update Trie.java (#70)
* Update Trie.java

fix firstMatch NullPointerException

* Update Trie.java

change for shorter code
2019-10-11 08:32:26 -07:00
Daniel Beck
9f80565b53 #49 Allow to specify Payload with Keyword (#68)
* #49: Allow to fix Payload with Keyword
2019-08-19 20:16:46 -07:00
Luke Butters
21a6fc5baa Changes from auto code review 2017-11-01 16:25:49 +11:00
Luke Butters
e4d87b4b07 grammer 2017-11-01 16:19:03 +11:00
Luke Butters
773ff39e48 Stop Trie#removePartialMatches() from being expensive #61
This changes the running time of `Trie#removePartialMatches()` from something that is subquadratic time or worse (I think n^3) to a running time that is linear.
2017-11-01 16:10:53 +11:00
robert-bor
d9a10a475a
#53 Added AbstractStatefulEmitHandler, test shows the example of usage. 2017-05-15 20:51:58 +02:00
Crystark
ea88eb987a #53 Allow to ack emits
Also allow use of isOnlyWholeWords, isOnlyWholeWordsWhiteSpaceSeparated
and isAllowOverlaps using a StatefulEmitHandler
2017-05-15 16:02:02 +02:00
robert-bor
a45df04a26 Optimize imports
Reformatted code (Java convention; tab is 4 spaces)
2016-11-30 12:10:20 +01:00
djarvis
5edf6d8126 Added missing override annotations. Added final modifier to Interval member variables. Updated documentation for ignoreCase (issue #33) and moved the ignore methods to the top of the builder to reflect their preferred calling order. 2016-11-30 12:10:04 +01:00
robert-bor
b5aaa51fdd Optimize imports
Reformatted code (Java convention; tab is 4 spaces)
2016-11-30 09:10:21 +01:00
djarvis
503a0f1c76 Updated source base to leverage JDK 1.7 syntax. Added more final modifiers. Eliminated parameter modification inside method. Some formatting. Changed TrieBuilder to offer CharSequence instead of String; revised Trie accordingly. Removed some duplication. NetBeans automatically translated the code to use static imports (as per JDK 1.7 syntax). 2016-11-30 09:06:14 +01:00
robert-bor
8ae9636201 4 spaces for code
Badges for Travis, Codacy, Codecov, Maven and Javadoc
Added Travis CI build instructions
2016-11-29 19:54:23 +01:00
djarvis
f6a7103f5f Added final modifier. Added helper methods for adding keywords using arrays and collections. Added test for large character strings. Simplified code for adding keywords. Renamed a few methods for consistency. Some code formatting. Updated unit tests with constant arrays, as a first step to reducing the duplication in the unit tests; migrated away from deprecated methods. 2016-11-28 21:20:57 -08:00
Dave Jarvis
69781c0ae8 Added source code comments.
Added source code comments that should be useful for developers looking for more details.
2016-11-27 17:38:33 -08:00
robert-bor
dc27d6e3e9 pull #17 changes adopted to implement a whole word check on the entire keyword, including whitespaces. 2015-09-22 22:22:20 +02:00
robert-bor
76ae8222ea issue #12 adopted the suggestion by yim1990 with a small change, so that the keyword emit is lowercased as well 2015-09-22 22:10:19 +02:00
robert-bor
e2c5334234 pull #14 implemented pull request by rripken for containsMatch and firstMatch 2015-09-22 22:02:30 +02:00
robert-bor
4633b1ba2a Merge branch 'rripken-master' into feature/footprint-reduction
Conflicts:
	src/main/java/org/ahocorasick/trie/Trie.java
	src/test/java/org/ahocorasick/trie/TrieTest.java
2015-09-22 20:38:29 +02:00
robert-bor
30f003c5ae Issue #18 fixed link to broken PDF, now points to http://cr.yp.to/bib/1975/aho.pdf 2015-09-22 20:22:24 +02:00
robert-bor
c18e030459 Merge branch 'SubOptimal-contrib' into feature/footprint-reduction 2015-09-22 20:18:52 +02:00
robert-bor
023c253c93 Issue #16 #20 #21 adopted pull request from remen which makes sure the failure states are constructed as part of the trie construction. This prevents the NPE which the referenced issues are complaining about. 2015-09-22 20:14:48 +02:00
robert-bor
fcefdfdaf9 Merge branch 'remen-master' into feature/footprint-reduction
Conflicts:
	src/main/java/org/ahocorasick/trie/Trie.java
2015-09-22 20:06:04 +02:00
robert-bor
b85f8fc08f Issue #22 added possibility to stop processing on generating at least one emit 2015-09-22 19:31:05 +02:00
robert-bor
4399e42b99 Issue #23 apply CharSequence to top-level parseText as well 2015-09-22 06:25:50 +02:00
robert-bor
055e13c298 Issue #23 removed the ParseConfiguration, rely on CharSequence instead 2015-09-21 22:03:59 +02:00
robert-bor
88799fb3da Issue #23 added callback handler concept which omits the custom setting up of a list, but instead places direct calls to the handler. The handler are only supported on the lowest level of aho-corasick, ie no overlap, whole words and token support
Also added the possibility to pass a reader to the same level as above.
2015-09-21 21:09:26 +02:00
Petter Remen
9bce51e001 Issue #16 Use builder pattern to create Trie
Previously, there was a race condition in Trie#parseText since
it called constructFailureStates on first run without synchronization.
See https://github.com/robert-bor/aho-corasick/issues/16

This commit fixes this by using the builder pattern in order to
create a fully initialized Trie.

N.B. This changes the API
2015-07-03 12:29:31 +02:00
Frank Dietrich
285a74c37f fix broken link to the white paper 2015-04-30 01:10:47 +02:00
ryan
d1478c7480 HashMap has better performance in my test cases. 2014-10-06 13:34:03 -07:00
ryan
a46e7dfe1d Fixed formatting changes. 2014-10-06 11:02:01 -07:00
ryan
df503bae43 Added method and tests for a faster path to return the first match. 2014-10-06 10:52:35 -07:00
robert-bor
2b125d2689 Issue #10 make sure that State emits a specific match only once 2014-08-27 08:42:46 +02:00
robert-bor
e8b5be0497 Issue #8 fixed Unicode issue by converting characters individually, not the entire search text 2014-08-26 09:50:15 +02:00
robert-bor
7431c74a7f Issue #7 bugfix release v0.2.2 2014-02-15 11:50:43 +01:00
robert-bor
d7421ead0f Issue #7 Ignore keywords in the trie that are null or empty 2014-02-15 11:44:54 +01:00
robert-bor
31117d6a6e Solved issue #5 by introducing a proper boundary check for words that are at the end of a String 2014-02-08 12:45:55 +01:00
robert-bor
bcde097070 Issue #4 Trie.tokenize() available. It returns a list of tokens. A token can be either a fragment (unmatched text) or a match. If it is a match, the original emit can be queried. 2014-02-01 22:01:15 +01:00
robert-bor
ae20429936 Issue #3 added case insensitivity when matching keywords 2014-02-01 21:04:53 +01:00
robert-bor
cb44a6bff2 Issue #2 implemented whole word matching 2014-02-01 20:35:38 +01:00
robert-bor
4c8ea8ba57 Issue #1 fixed bug in compareTo method of Interval. Problem was the compareTo only worked on start position, whereas it should also work on end position. 2014-01-31 16:16:21 +01:00
robert-bor
1785a554f3 Issue #1 remove overlapping intervals. Resolution rule: longer matches over smaller ones, left-most over right-most 2014-01-31 14:56:11 +01:00
robert-bor
922da26965 Emit now also contains the start position of the found keyword 2014-01-30 09:36:53 +01:00
robert-bor
d140afc0da first setup for the Aho-Corasick algorithm 2014-01-29 21:27:46 +01:00