regex: assume unknown characters to be word characters