
Category: Strings: Regex

What does the regexp function do in MATLAB / RunMat?

regexp(text, pattern) locates regular expression matches inside character vectors, string scalars, string arrays, and cell arrays of character vectors. It can return starting indices, ending indices, matched substrings, capture tokens, token extents, named-token structures, or the text split around matches. The optional 'once' flag restricts the search to the first match, while 'emptymatch','allow' keeps zero-length matches that are otherwise filtered out.

How does the regexp function behave in MATLAB / RunMat?

  • Single character vectors and string scalars return a numeric row vector of 1-based match start indices by default.
  • String arrays and cell arrays always produce cell outputs that mirror the input shape, with each cell holding the result for the corresponding element.
  • 'match' returns matched substrings, 'tokens' returns nested cells of capture-group substrings, 'tokenExtents' returns n × 2 double matrices with start/end indices for each token, 'names' returns scalar struct values keyed by named tokens, and 'split' yields the text segments between matches.
  • 'once' stops after the first match (per element), and every requested output honours that limit.
  • 'emptymatch','remove' (default) filters zero-length matches; 'emptymatch','allow' keeps them so callers can observe optional patterns.
  • 'forceCellOutput' forces cell-array containers even for scalar inputs so downstream code can rely on uniform dimensions. MATLAB-compatible 'warnings','on'/'off' flags are accepted but currently informational only.
  • 'matchcase' and 'ignorecase' toggle case sensitivity, 'lineanchors' makes ^ and $ match at the start and end of each line rather than only at the ends of the text, and 'dotall'/'dotExceptNewline' control whether . matches newline characters, mirroring the MATLAB flags.
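
Two of the options listed above, 'tokenExtents' and 'lineanchors', do not appear in the examples further down, so here is a minimal sketch of both. The expected values in the comments follow documented MATLAB behaviour and assume RunMat mirrors it as this page describes; the variable names are illustrative only.

```matlab
% 'tokenExtents' returns one n-by-2 matrix per match, one row per capture group.
te = regexp('2024-03-14', '(\d{4})-(\d{2})', 'tokenExtents');
firstExtents = te{1};          % [1 4; 6 7]: '2024' spans 1..4, '03' spans 6..7

% 'lineanchors' lets ^ match at the start of every line, not only the text start.
txt = sprintf('one\ntwo');     % two lines separated by a newline character
withoutFlag = regexp(txt, '^t');                 % empty: ^ only matches at index 1
withFlag    = regexp(txt, '^t', 'lineanchors');  % 5: 't' begins the second line
```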

regexp Function GPU Execution Behaviour

regexp executes entirely on the CPU and is registered as an acceleration sink. If any argument resides on the GPU, the runtime gathers it before evaluation, computes all requested outputs on the host, and returns host-side containers. Providers do not implement custom hooks for this builtin, so no GPU kernels are required or invoked.

Examples of using the regexp function in MATLAB / RunMat

Find all 1-based match positions in a character vector

idx = regexp('abracadabra', 'a');

Expected output:

idx =
     1     4     6     8    11

Return matched substrings using 'match'

matches = regexp('abc123xyz', '\d+', 'match');

Expected output:

matches =
  1×1 cell array
    {'123'}

Extract capture tokens

tokens = regexp('2024-03-14', '(\d{4})-(\d{2})-(\d{2})', 'tokens');
year = tokens{1}{1};
month = tokens{1}{2};
day = tokens{1}{3};

Expected output:

year =
    '2024'
month =
    '03'
day =
    '14'

Split a string array around commas

parts = regexp(["a,b,c"; "1,2,3"], ',', 'split');

Expected output:

parts =
  2×1 cell array
    {1×3 cell}
    {1×3 cell}

Return only the first match with 'once'

first_idx = regexp('abababa', 'ba', 'once');

Expected output:

first_idx =
     2

Work with named tokens

matches = regexp('X=42; Y=7;', '(?<name>[A-Z])=(?<value>\d+)', 'names');
values = cellfun(@(s) str2double(s.value), matches);

Expected output:

values =
     42     7

Keep zero-length matches with 'emptymatch','allow'

idx = regexp('aba', 'b*', 'emptymatch', 'allow');

Expected output:

idx =
     1     2     3     4

FAQ

What outputs does regexp return by default?

With a single output argument, regexp returns a numeric row vector of 1-based match starts. When the call site asks for multiple outputs (e.g. [startIdx, endIdx, matchStr] = regexp(...)), RunMat returns match starts, match ends, and matched substrings in that order, just like MATLAB.
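
As a small illustration of the default output order, the sketch below requests three outputs without any output flags. The commented values are what MATLAB produces for this input and assume RunMat matches it, as stated above.

```matlab
% No output flags: outputs arrive as start indices, end indices, matched text.
[startIdx, endIdx, matchStr] = regexp('abc123xyz45', '\d+');

% startIdx is [4 10], endIdx is [6 11], matchStr is {'123', '45'}
firstMatch = matchStr{1};
```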

How can I request tokens or splits instead of indices?

Specify the desired output types as string flags, for example regexp(str, pat, 'match'), regexp(str, pat, 'tokens'), or regexp(str, pat, 'split'). Multiple flags combine, so regexp(str, pat, 'match', 'tokens') returns both outputs.
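
A short sketch of combining two output flags in one call follows. The outputs come back in the order the flags are listed; the input text and variable names are made up for illustration.

```matlab
% Request matched substrings and capture tokens together.
[m, t] = regexp('k1=v1, k2=v2', '(\w+)=(\w+)', 'match', 'tokens');

% m holds the full matches: {'k1=v1', 'k2=v2'}
% t holds one nested cell per match: t{2} is {'k2', 'v2'}
secondKey   = t{2}{1};
secondValue = t{2}{2};
```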

Does regexp support case-insensitive matching?

Yes. Use 'ignorecase' (or call regexpi) to enable case-insensitive matching, and 'matchcase' to revert to the default case-sensitive behaviour.
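
The difference is easiest to see side by side. This sketch compares the default call, the 'ignorecase' flag, and regexpi on the same input; the commented results follow MATLAB's documented behaviour.

```matlab
txt = 'Hello hello';

caseSensitive = regexp(txt, 'hello');                % 7: only the lowercase word
viaFlag       = regexp(txt, 'hello', 'ignorecase');  % [1 7]: both words
viaRegexpi    = regexpi(txt, 'hello');               % [1 7]: same as the flag
```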

How are string arrays and cell arrays handled?

For string arrays and cell arrays of char vectors, every output is a cell array whose shape matches the input. Each cell contains the result for the corresponding element, which mirrors MATLAB's container semantics.
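
The sketch below shows the container shape for a cell array of character vectors, assuming RunMat follows the MATLAB behaviour described here. Each input element gets its own cell, including elements with no match.

```matlab
% One result cell per input element, same 1-by-3 shape as the input.
out = regexp({'a1', 'b22', 'c'}, '\d+', 'match');

first  = out{1};   % {'1'}
second = out{2};   % {'22'}
third  = out{3};   % empty cell: 'c' contains no digits

% Count matches per element without special-casing the empty result.
counts = cellfun(@numel, out);   % [1 1 0]
```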

How do zero-length matches behave?

By default ('emptymatch','remove'), zero-length matches are filtered out so loops do not stall. Specify 'emptymatch','allow' to keep them, matching MATLAB's 'emptymatch' flag.

Can I force cell output even for character vectors?

Yes. Pass 'forceCellOutput' to force the outputs into cell arrays, which is useful when writing code that handles both scalar and array inputs uniformly.
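
As a rough sketch of why this helps, the code below runs the same post-processing over a scalar input and a cell-array input. It assumes the MATLAB convention that 'forceCellOutput' wraps each output in a scalar cell for a single character vector.

```matlab
% Scalar input: the start indices are wrapped in a 1-by-1 cell.
single = regexp('abc123', '\d+', 'forceCellOutput');   % {4}

% Cell input: already one cell per element, with or without the flag.
batch = regexp({'abc123', 'x9'}, '\d+');               % {4, 2}

% The same cellfun call now works on both results.
countSingle = cellfun(@numel, single);   % 1
countBatch  = cellfun(@numel, batch);    % [1 1]
```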

Does regexp run on the GPU?

No. RunMat executes regexp on the CPU and treats it as an acceleration sink. If inputs reside on the GPU, it gathers them first, computes every requested output on the host, and returns host-side containers.

What happens when I ask for more outputs than I specified with flags?

RunMat follows MATLAB's rules: if you supply no explicit output flags, the default multi-output order is start indices, end indices, and matched substrings. If you do supply output flags and request more outputs than flags, the extra outputs become numeric zeros.

See Also

regexpi, regexprep, contains, split, strfind
