define: #Tokenizer &parents: {StreamProcessor} &slots:
  {#separators -> ASCIIString Character Whitespace copy}.
"A stream that collects characters and emits words separated by
any of the given separators."
"FIXME: Whitespace is not an ASCII-specific notion, yet the default
separators come from ASCIIString Character Whitespace. Deciding whether a
character is whitespace needs an abstraction over the actual encoding."
"TODO: Design extension: parametrize Tokenizer on something higher-level
that works by overriding #skipSeparators."

s@(Tokenizer traits) newOn: target &separators
  separators ifNotNil: [result separators := separators].

s@(ReadStream traits) split
"Answer a Tokenizer over the input argument with the default separators."

s@(ReadStream traits) splitWith: separators
"Answer a Tokenizer over the input argument with the given separators."
[Tokenizer newOn: s &separators: separators].
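
"Usage sketch (illustrative only; not run against a Slate image). It assumes
that #reader answers a ReadStream over a String's characters and that #do
evaluates a block; the block-local names words, first and second are
hypothetical.

  [| words first second |
    words := 'alpha beta gamma' reader split.
    first := words next.
    second := words next.
    {first. second}] do

Each token is collected from the source until a separator or the end of the
source is reached, and an empty token makes #next signal exhaustion rather
than answer an empty sequence. To split on commas instead of the default
whitespace, pass the separators explicitly: 'a,b,c' reader splitWith: ','"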

s@(Tokenizer traits) skipSeparators
  [s source isAtEnd not /\ [s isSeparator: s source peek]]
    whileTrue: [s source next].

s@(Tokenizer traits) isAtEnd
  [s skipSeparators] on: Stream Exhaustion do: [| :c | ^ True].

s@(Tokenizer traits) reset

s@(Tokenizer traits) contents

s@(Tokenizer traits) isSeparator: char
  s separators includes: char

s@(Tokenizer traits) elementType
"A Tokenizer answers sub-Sequences of its source, so its element type is the
source's collection type."
[s source collectionType].

s@(Tokenizer traits) collectionType
"Answer a generic Sequence, since our elements are themselves Sequences."

s@(Tokenizer traits) next
     [result nextPut: s source next]
       until: [s source isAtEnd \/ [s isSeparator: s source peek]].
    ] breakOn: Stream Exhaustion.
    result position isZero ifTrue: [s exhausted]]
      writingAs: s elementType