63 Commits

Author SHA1 Message Date
Douglas Lovell
c0d89cec2d add a few word transition tests 2015-10-30 16:25:04 -06:00
Douglas Lovell
f9c2d9d4aa all the tests pass 2015-10-30 15:03:06 -06:00
Douglas Lovell
b7bb0cbf5b put text positions on the transitions and track in the token stream 2015-10-30 10:58:15 -06:00
Douglas Lovell
dd5f9b25fa fix the off by ones 2015-10-29 15:29:44 -06:00
Douglas Lovell
7514478a65 Make the transition token the hash key for a transition 2015-10-29 15:28:23 -06:00
Douglas Lovell
8560af8cce test trie in word transition mode 2015-10-28 17:03:29 -06:00
Douglas Lovell
51940af6e7 test state with word transitions 2015-10-28 16:57:37 -06:00
Douglas Lovell
1f63ae71d4 StringBuffer, StringIterator are old school 2015-10-28 16:56:27 -06:00
Douglas Lovell
4be3e115b6 add builder method for setting word transitions 2015-10-28 14:10:53 -06:00
Douglas Lovell
a646f233a5 improve the option name 2015-10-28 11:17:46 -06:00
Douglas Lovell
f05026cf90 added word transitions. all tests pass 2015-10-28 10:59:45 -06:00
Douglas Lovell
6283bf039d first pass refactoring for word transitions 2015-10-27 16:55:35 -06:00
robert-bor
3393e4f51f Issue #26 tokens report if they are 100% whitespace 2015-09-27 20:56:04 +02:00
robert-bor
438e546245 Issue #25 match tokens report back whether they are whole words or not 2015-09-27 18:22:38 +02:00
robert-bor
877a56c956 Issue #24 removed if condition that checked for empty emit strings, whereas emit() always returns a collection 2015-09-27 17:58:55 +02:00
robert-bor
b42c664796 Issue #24 big cleanup, removed all post-processing methods for whole words and non-overlapping sequences and integrated the same functionality closer to the AC algorithm. 2015-09-27 17:56:42 +02:00
robert-bor
b274844b75 Issue #24 stopOnHit removed; the functionality has been replaced by the superior firstMatch 2015-09-27 14:40:04 +02:00
robert-bor
bfaa32b20e Issue #24 tokenize() method implementation extracted to separate class 2015-09-27 14:37:36 +02:00
robert-bor
5203efbbcb Extra explanation on containsMatch 2015-09-23 20:56:24 +02:00
robert-bor
a46177415f Updated README.md documentation 2015-09-23 08:39:13 +02:00
robert-bor
e365689391 v0.3.0 v0.3.0 2015-09-22 22:27:30 +02:00
robert-bor
dc27d6e3e9 pull #17 changes adopted to implement a whole word check on the entire keyword, including whitespaces. 2015-09-22 22:22:20 +02:00
robert-bor
76ae8222ea issue #12 adopted the suggestion by yim1990 with a small change, so that the keyword emit is lowercased as well 2015-09-22 22:10:19 +02:00
robert-bor
e2c5334234 pull #14 implemented pull request by rripken for containsMatch and firstMatch 2015-09-22 22:02:30 +02:00
robert-bor
4633b1ba2a Merge branch 'rripken-master' into feature/footprint-reduction
Conflicts:
	src/main/java/org/ahocorasick/trie/Trie.java
	src/test/java/org/ahocorasick/trie/TrieTest.java
2015-09-22 20:38:29 +02:00
robert-bor
30f003c5ae Issue #18 fixed link to broken PDF, now points to http://cr.yp.to/bib/1975/aho.pdf 2015-09-22 20:22:24 +02:00
robert-bor
c18e030459 Merge branch 'SubOptimal-contrib' into feature/footprint-reduction 2015-09-22 20:18:52 +02:00
robert-bor
023c253c93 Issue #16 #20 #21 adopted pull request from remen which makes sure the failure states are constructed as part of the trie construction. This prevents the NPE which the referenced issues are complaining about. 2015-09-22 20:14:48 +02:00
robert-bor
fcefdfdaf9 Merge branch 'remen-master' into feature/footprint-reduction
Conflicts:
	src/main/java/org/ahocorasick/trie/Trie.java
2015-09-22 20:06:04 +02:00
robert-bor
b85f8fc08f Issue #22 added possibility to stop processing on generating at least one emit 2015-09-22 19:31:05 +02:00
robert-bor
4399e42b99 Issue #23 apply CharSequence to top-level parseText as well 2015-09-22 06:25:50 +02:00
robert-bor
055e13c298 Issue #23 removed the ParseConfiguration, rely on CharSequence instead 2015-09-21 22:03:59 +02:00
robert-bor
88799fb3da Issue #23 added callback handler concept which omits the custom setting up of a list, but instead places direct calls to the handler. Handlers are only supported on the lowest level of aho-corasick, i.e. no overlap, whole words and token support
Also added the possibility to pass a reader to the same level as above.
2015-09-21 21:09:26 +02:00
Petter Remen
9bce51e001 Issue #16 Use builder pattern to create Trie
Previously, there was a race condition in Trie#parseText since
it called constructFailureStates on first run without synchronization.
See https://github.com/robert-bor/aho-corasick/issues/16

This commit fixes this by using the builder pattern in order to
create a fully initialized Trie.

N.B. This changes the API
2015-07-03 12:29:31 +02:00
Frank Dietrich
285a74c37f fix broken link to the white paper 2015-04-30 01:10:47 +02:00
ryan
d1478c7480 HashMap has better performance in my test cases. 2014-10-06 13:34:03 -07:00
ryan
a46e7dfe1d Fixed formatting changes. 2014-10-06 11:02:01 -07:00
ryan
df503bae43 Added method and tests for a faster path to return the first match. 2014-10-06 10:52:35 -07:00
robert-bor
25eeef5168 v0.2.4 with bugfix #10 v0.2.4 2014-08-27 08:44:06 +02:00
robert-bor
2b125d2689 Issue #10 make sure that State emits a specific match only once 2014-08-27 08:42:46 +02:00
robert-bor
c96c57399a update README.md 2014-08-26 10:11:05 +02:00
robert-bor
c572d234e1 v0.2.3 bugfix v0.2.3 2014-08-26 10:05:18 +02:00
robert-bor
e8b5be0497 Issue #8 fixed Unicode issue by converting characters individually, not the entire search text 2014-08-26 09:50:15 +02:00
robert-bor
7431c74a7f Issue #7 bugfix release v0.2.2 v0.2.2 2014-02-15 11:50:43 +01:00
robert-bor
d7421ead0f Issue #7 Ignore keywords in the trie that are null or empty 2014-02-15 11:44:54 +01:00
robert-bor
a4fcfe8f20 Updated README for v0.2.1 2014-02-08 13:24:36 +01:00
robert-bor
4bd568836f Releasing v0.2.1 bug release for issue #5 v0.2.1 2014-02-08 12:46:48 +01:00
robert-bor
31117d6a6e Solved issue #5 by introducing a proper boundary check for words that are at the end of a String 2014-02-08 12:45:55 +01:00
robert-bor
1656e862df v0.2.0 v0.2.0 2014-02-01 22:02:29 +01:00
robert-bor
bcde097070 Issue #4 Trie.tokenize() available. It returns a list of tokens. A token can be either a fragment (unmatched text) or a match. If it is a match, the original emit can be queried. 2014-02-01 22:01:15 +01:00