Mojo struct

StringSlice

@register_passable(trivial) struct StringSlice[mut: Bool, //, origin: Origin[mut]]

A non-owning view to encoded string data.

This type is guaranteed to have the same ABI (size, alignment, and field layout) as the llvm::StringRef type.

Notes: TODO: The underlying string data is guaranteed to be encoded using UTF-8.

Parameters

mut (Bool): Whether the slice is mutable.
origin (Origin[mut]): The origin of the underlying string data.

Implemented traits

AnyType, Boolable, CollectionElement, CollectionElementNew, Copyable, EqualityComparable, ExplicitlyCopyable, FloatableRaising, Hashable, IntableRaising, Movable, PathLike, Representable, Sized, Stringable, UnknownDestructibility, Writable, _CurlyEntryFormattable

Methods

`init`

@implicit __init__(lit: StringLiteral) -> StringSlice[StaticConstantOrigin]

Construct a new StringSlice from a StringLiteral.

Args:

lit (StringLiteral): The literal to construct this StringSlice from.

__init__(*, owned unsafe_from_utf8: Span[SIMD[uint8, 1], origin]) -> Self

Construct a new StringSlice from a sequence of UTF-8 encoded bytes.

Safety: unsafe_from_utf8 MUST be valid UTF-8 encoded data.

Args:

unsafe_from_utf8 (Span[SIMD[uint8, 1], origin]): A Span[Byte] encoded in UTF-8.

__init__(*, unsafe_from_utf8_strref: StringRef) -> Self

Construct a new StringSlice from a StringRef pointing to UTF-8 encoded bytes.

Safety: - unsafe_from_utf8_strref MUST point to data that is valid for origin. - unsafe_from_utf8_strref MUST be valid UTF-8 encoded data.

Args:

unsafe_from_utf8_strref (StringRef): A StringRef of bytes encoded in UTF-8.

__init__(*, ptr: UnsafePointer[SIMD[uint8, 1]], length: UInt) -> Self

Construct a StringSlice from a pointer to a sequence of UTF-8 encoded bytes and a length.

Safety: - ptr MUST point to at least length bytes of valid UTF-8 encoded data. - ptr must point to data that is live for the duration of origin.

Args:

ptr (UnsafePointer[SIMD[uint8, 1]]): A pointer to a sequence of bytes encoded in UTF-8.
length (UInt): The number of bytes of encoded data.

@implicit __init__[O: ImmutableOrigin, //](ref [O] value: String) -> StringSlice[O]

Construct an immutable StringSlice.

Parameters:

O (ImmutableOrigin): The immutable origin.

Args:

value (String): The string value.

`bool`

__bool__(self) -> Bool

Check if a string slice is non-empty.

Returns:

True if a string slice is non-empty, False otherwise.

`getitem`

__getitem__[I: Indexer](self, idx: I) -> String

Gets the character at the specified position.

Parameters:

I (Indexer): A type that can be used as an index.

Args:

idx (I): The index value.

Returns:

A new string containing the character at the specified position.

`lt`

__lt__(self, rhs: StringSlice[origin]) -> Bool

Verify if the StringSlice bytes are strictly less than the input in overlapping content.

Args:

rhs (StringSlice[origin]): The other StringSlice to compare against.

Returns:

True if the StringSlice bytes are strictly less than the input in overlapping content, False otherwise.

`eq`

__eq__(self, rhs_same: Self) -> Bool

Verify if a StringSlice is equal to another StringSlice with the same origin.

Args:

rhs_same (Self): The StringSlice to compare against.

Returns:

True if the StringSlice is equal to the input in length and contents, False otherwise.

__eq__(self, rhs: StringSlice[origin]) -> Bool

Verify if a StringSlice is equal to another StringSlice.

Args:

rhs (StringSlice[origin]): The StringSlice to compare against.

Returns:

True if the StringSlice is equal to the input in length and contents, False otherwise.

`ne`

__ne__(self, rhs_same: Self) -> Bool

Verify if a StringSlice is not equal to another StringSlice with the same origin.

Args:

rhs_same (Self): The StringSlice to compare against.

Returns:

True if the StringSlice is not equal to the input in length and contents, False otherwise.

__ne__(self, rhs: StringSlice[origin]) -> Bool

Verify if a StringSlice is not equal to another StringSlice.

Args:

rhs (StringSlice[origin]): The StringSlice to compare against.

Returns:

True if the StringSlice is not equal to the input in length and contents, False otherwise.

`contains`

__contains__(ref self, substr: StringSlice[origin]) -> Bool

Returns True if the substring is contained within the current string.

Args:

substr (StringSlice[origin]): The substring to check.

Returns:

True if the string contains the substring.

`mul`

__mul__(self, n: Int) -> String

Concatenates the string n times.

Args:

n (Int): The number of times to concatenate the string.

Returns:

The string concatenated n times.

`copy`

copy(self) -> Self

Explicitly construct a deep copy of the provided StringSlice.

Returns:

A copy of the value.

`from_utf8`

static from_utf8(from_utf8: Span[SIMD[uint8, 1], origin]) -> Self

Construct a new StringSlice from a buffer containing UTF-8 encoded data.

Args:

from_utf8 (Span[SIMD[uint8, 1], origin]): A span of bytes containing UTF-8 encoded data.

Returns:

A new validated StringSlice pointing to the provided buffer.

Raises:

An exception is raised if the provided buffer byte values do not form valid UTF-8 encoded codepoints.

`str`

__str__(self) -> String

Convert this StringSlice to a String.

Notes: This will allocate a new string that copies the string contents from the provided string slice.

Returns:

A new String.

`repr`

__repr__(self) -> String

Return a Mojo-compatible representation of this string slice.

Returns:

Representation of this string slice as a Mojo string literal input form syntax.

`len`

__len__(self) -> Int

Get the string length in bytes.

This function returns the number of bytes in the underlying UTF-8 representation of the string.

To get the number of Unicode codepoints in a string, use len(str.chars()).

Examples

Query the length of a string, in bytes and Unicode codepoints:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("ನಮಸ್ಕಾರ")

assert_equal(len(s), 21)
assert_equal(len(s.chars()), 7)

Strings containing only ASCII characters have the same byte and Unicode codepoint length:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("abc")

assert_equal(len(s), 3)
assert_equal(len(s.chars()), 3)

Returns:

The string length in bytes.

`write_to`

write_to[W: Writer](self, mut writer: W)

Formats this string slice to the provided Writer.

Parameters:

W (Writer): A type conforming to the Writer trait.

Args:

writer (W): The object to write to.

`hash`

__hash__(self) -> UInt

Hash the underlying buffer using builtin hash.

Returns:

A 64-bit hash value. This value is not suitable for cryptographic uses. Its intended usage is for data structures. See the hash builtin documentation for more details.

`fspath`

__fspath__(self) -> String

Return the file system path representation of this string.

Returns:

The file system path representation as a string.

`iter`

__iter__(self) -> _StringSliceIter[origin]

Iterate over the string, returning immutable references.

Returns:

An iterator of references to the string elements.

`reversed`

__reversed__(self) -> _StringSliceIter[origin, False]

Iterate backwards over the string, returning immutable references.

Returns:

A reversed iterator of references to the string elements.

`int`

__int__(self) -> Int

Parses the given string as a base-10 integer and returns that value. If the string cannot be parsed as an int, an error is raised.

Returns:

An integer value that represents the string, or otherwise raises.

`float`

__float__(self) -> SIMD[float64, 1]

Parses the string as a floating-point number and returns that value. If the string cannot be parsed as a float, an error is raised.

Returns:

A float value that represents the string, or otherwise raises.

`strip`

strip(self, chars: StringSlice[origin]) -> Self

Return a copy of the string with leading and trailing characters removed.

Examples:

print("himojohi".strip("hi")) # "mojo"

Args:

chars (StringSlice[origin]): The set of characters to remove. To strip whitespace instead, use the overload that takes no arguments.

Returns:

A copy of the string with no leading or trailing characters.

strip(self) -> Self

Return a copy of the string with leading and trailing whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".

Examples:

print("  mojo  ".strip()) # "mojo"

Returns:

A copy of the string with no leading or trailing whitespaces.

`rstrip`

rstrip(self, chars: StringSlice[origin]) -> Self

Return a copy of the string with trailing characters removed.

Examples:

print("mojohi".rstrip("hi")) # "mojo"

Args:

chars (StringSlice[origin]): The set of characters to remove. To strip whitespace instead, use the overload that takes no arguments.

Returns:

A copy of the string with no trailing characters.

rstrip(self) -> Self

Return a copy of the string with trailing whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".

Examples:

print("mojo  ".rstrip()) # "mojo"

Returns:

A copy of the string with no trailing whitespaces.

`lstrip`

lstrip(self, chars: StringSlice[origin]) -> Self

Return a copy of the string with leading characters removed.

Examples:

print("himojo".lstrip("hi")) # "mojo"

Args:

chars (StringSlice[origin]): The set of characters to remove. To strip whitespace instead, use the overload that takes no arguments.

Returns:

A copy of the string with no leading characters.

lstrip(self) -> Self

Return a copy of the string with leading whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".

Examples:

print("  mojo".lstrip()) # "mojo"

Returns:

A copy of the string with no leading whitespaces.

`chars`

chars(self) -> CharsIter[origin]

Returns an iterator over the Chars encoded in this string slice.

Examples

Print the characters in a string:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("abc")
var iter = s.chars()
assert_equal(iter.__next__(), Char.ord("a"))
assert_equal(iter.__next__(), Char.ord("b"))
assert_equal(iter.__next__(), Char.ord("c"))
assert_equal(iter.__has_next__(), False)

chars() iterates over Unicode codepoints, and supports multibyte codepoints:

from collections.string import StringSlice
from testing import assert_equal

# A visual character composed of a combining sequence of 2 codepoints.
var s = StringSlice("á")
assert_equal(s.byte_length(), 3)

var iter = s.chars()
assert_equal(iter.__next__(), Char.ord("a"))
 # U+0301 Combining Acute Accent
assert_equal(iter.__next__().to_u32(), 0x0301)
assert_equal(iter.__has_next__(), False)

Returns:

An iterator type that returns successive Char values stored in this string slice.

`char_slices`

char_slices(self) -> _StringSliceIter[origin]

Iterate over the string, returning immutable references.

Returns:

An iterator of references to the string elements.

`as_bytes`

as_bytes(self) -> Span[SIMD[uint8, 1], origin]

Get the sequence of encoded bytes of the underlying string.

Returns:

A slice containing the underlying sequence of encoded bytes.

`unsafe_ptr`

unsafe_ptr(self) -> UnsafePointer[SIMD[uint8, 1], mut=mut, origin=origin]

Gets a pointer to the first element of this string slice.

Returns:

A pointer pointing at the first element of this string slice.

`byte_length`

byte_length(self) -> Int

Get the length of this string slice in bytes.

Returns:

The length of this string slice in bytes.

`char_length`

char_length(self) -> UInt

Returns the length in Unicode codepoints.

This returns the number of Char codepoint values encoded in the UTF-8 representation of this string.

Note: To get the length in bytes, use StringSlice.byte_length().

Examples

Query the length of a string, in bytes and Unicode codepoints:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("ನಮಸ್ಕಾರ")

assert_equal(s.char_length(), 7)
assert_equal(len(s), 21)

Strings containing only ASCII characters have the same byte and Unicode codepoint length:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("abc")

assert_equal(s.char_length(), 3)
assert_equal(len(s), 3)

The character length of a string with visual combining characters is the length in Unicode codepoints, not grapheme clusters:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("á")
assert_equal(s.char_length(), 2)
assert_equal(s.byte_length(), 3)

Returns:

The length in Unicode codepoints.

`get_immutable`

get_immutable(self) -> StringSlice[(muttoimm origin._mlir_origin)]

Return an immutable version of this string slice.

Returns:

A string slice covering the same elements, but without mutability.

`startswith`

startswith(self, prefix: StringSlice[origin], start: Int = 0, end: Int = -1) -> Bool

Verify if the StringSlice starts with the specified prefix between start and end positions.

Args:

prefix (StringSlice[origin]): The prefix to check.
start (Int): The start offset from which to check.
end (Int): The end offset at which to stop checking.

Returns:

True if the self[start:end] is prefixed by the input prefix.

`endswith`

endswith(self, suffix: StringSlice[origin], start: Int = 0, end: Int = -1) -> Bool

Verify if the StringSlice ends with the specified suffix between start and end positions.

Args:

suffix (StringSlice[origin]): The suffix to check.
start (Int): The start offset from which to check.
end (Int): The end offset at which to stop checking.

Returns:

True if the self[start:end] is suffixed by the input suffix.
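
As a minimal sketch of both checks, assuming the Mojo version documented on this page and the same import path as the other examples here, the following wraps a few assertions in a `main` function. The wrapper and the sample text are illustrative and not part of the original documentation:

```mojo
from collections.string import StringSlice
from testing import assert_true, assert_false


def main():
    # Sample text chosen for illustration only.
    var s = StringSlice("hello world")

    # With the default start and end, the whole slice is checked.
    assert_true(s.startswith("hello"))
    assert_true(s.endswith("world"))
    assert_false(s.startswith("world"))

    # Passing start=6 begins the prefix check at byte offset 6.
    assert_true(s.startswith("world", 6))
```

Offsets are byte offsets, so for non-ASCII text they may not line up with codepoint positions.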

`format`

format[*Ts: _CurlyEntryFormattable](self, *args: *Ts) -> String

Format a template with *args.

Examples:

# Manual indexing:
print("{0} {1} {0}".format("Mojo", 1.125)) # Mojo 1.125 Mojo
# Automatic indexing:
print("{} {}".format(True, "hello world")) # True hello world

Parameters:

*Ts (_CurlyEntryFormattable): The types of substitution values that implement Representable and Stringable (to be changed and made more flexible).

Args:

*args (*Ts): The substitution values.

Returns:

The template with the given values substituted.

`find`

find(ref self, substr: StringSlice[origin], start: Int = 0) -> Int

Finds the offset of the first occurrence of substr starting at start. If not found, returns -1.

Args:

substr (StringSlice[origin]): The substring to find.
start (Int): The offset from which to find.

Returns:

The offset of substr relative to the beginning of the string.

`rfind`

rfind(self, substr: StringSlice[origin], start: Int = 0) -> Int

Finds the offset of the last occurrence of substr starting at start. If not found, returns -1.

Args:

substr (StringSlice[origin]): The substring to find.
start (Int): The offset from which to find.

Returns:

The offset of substr relative to the beginning of the string.
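
The following is a small sketch of how `find` and `rfind` relate, assuming the signatures shown above and the Mojo version documented on this page. The `main` wrapper and the sample text are illustrative:

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    # Sample text chosen for illustration only.
    var s = StringSlice("abcabc")

    # find reports the first occurrence, rfind the last one.
    assert_equal(s.find("bc"), 1)
    assert_equal(s.rfind("bc"), 4)

    # A non-zero start skips earlier matches; the returned offset is
    # still measured from the beginning of the slice.
    assert_equal(s.find("bc", 2), 4)

    # A missing substring yields -1.
    assert_equal(s.find("xyz"), -1)
```

Because -1 signals "not found", compare the result against -1 before using it as an offset.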

`isspace`

isspace(self) -> Bool

Determines whether every character in the given StringSlice is a Python whitespace character. This corresponds to Python's universal separators: " \t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".

Examples:

Check if a string contains only whitespace:

from collections.string import StringSlice
from testing import assert_true, assert_false

# An empty string is not considered to contain only whitespace chars:
assert_false(StringSlice("").isspace())

# ASCII space characters
assert_true(StringSlice(" ").isspace())
assert_true(StringSlice("	").isspace())

# Contains non-space characters
assert_false(StringSlice(" abc  ").isspace())

Returns:

True if the whole StringSlice is made up of whitespace characters listed above, otherwise False.

`isnewline`

isnewline[single_character: Bool = False](self) -> Bool

Determines whether every character in the given StringSlice is a python newline character. This corresponds to Python's universal newlines: "\r\n" and "\t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".

Parameters:

single_character (Bool): Whether to evaluate the StringSlice as a single Unicode character (avoids overhead when already iterating).

Returns:

True if the whole StringSlice is made up of the newline characters listed above, otherwise False.

`splitlines`

splitlines[O: ImmutableOrigin, //](self: StringSlice[O], keepends: Bool = False) -> List[StringSlice[O]]

Split the string at line boundaries. This corresponds to Python's universal newlines: "\r\n" and "\t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".

Parameters:

O (ImmutableOrigin): The immutable origin.

Args:

keepends (Bool): If True, line breaks are kept in the resulting strings.

Returns:

A List of StringSlices containing the input split by line boundaries.
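
A minimal sketch of the effect of `keepends`, assuming the signature shown above and the Mojo version documented on this page. The `main` wrapper and the sample text are illustrative, and the byte lengths are checked rather than the text itself to stay within the methods listed on this page:

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    # Sample text chosen for illustration only; it mixes "\n" and "\r\n".
    var s = StringSlice("first\nsecond\r\nthird")

    # By default the line breaks are dropped from the results.
    var lines = s.splitlines()
    assert_equal(len(lines), 3)
    assert_equal(lines[0].byte_length(), 5)  # "first"

    # With keepends=True each slice retains its trailing line break.
    var kept = s.splitlines(keepends=True)
    assert_equal(kept[0].byte_length(), 6)  # "first\n"
    assert_equal(kept[1].byte_length(), 8)  # "second\r\n"
```

The returned slices share the origin of the input, so they are views into the original data rather than newly allocated strings.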

`count`

count(self, substr: StringSlice[origin]) -> Int

Return the number of non-overlapping occurrences of substring substr in the string.

If substr is empty, returns the number of empty strings between characters, which is the length of the string plus one.

Args:

substr (StringSlice[origin]): The substring to count.

Returns:

The number of occurrences of substr.
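
To illustrate the non-overlapping rule and the empty-substring case described above, here is a short sketch, assuming the Mojo version documented on this page. The `main` wrapper and the sample strings are illustrative:

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    # Sample text chosen for illustration only.
    var s = StringSlice("banana")
    assert_equal(s.count("an"), 2)
    assert_equal(s.count("a"), 3)

    # Matches do not overlap: "aaaa" holds "aa" twice, not three times.
    assert_equal(StringSlice("aaaa").count("aa"), 2)

    # An empty substring is counted once per gap: length plus one.
    assert_equal(s.count(""), 7)
```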
