Mojo struct

StringSlice

@register_passable(trivial) struct StringSlice[mut: Bool, //, origin: Origin[mut]]

A non-owning view to encoded string data.

This type is guaranteed to have the same ABI (size, alignment, and field layout) as the llvm::StringRef type.

Notes: TODO: The underlying string data is guaranteed to be encoded using UTF-8.

Parameters

  • mut (Bool): Whether the slice is mutable.
  • origin (Origin[mut]): The origin of the underlying string data.

Implemented traits

AnyType, Boolable, CollectionElement, CollectionElementNew, Copyable, EqualityComparable, ExplicitlyCopyable, FloatableRaising, Hashable, IntableRaising, Movable, PathLike, Representable, Sized, Stringable, UnknownDestructibility, Writable, _CurlyEntryFormattable

Methods

__init__

@implicit __init__(lit: StringLiteral) -> StringSlice[StaticConstantOrigin]

Construct a new StringSlice from a StringLiteral.

Args:

  • lit (StringLiteral): The literal to construct this StringSlice from.

__init__(*, owned unsafe_from_utf8: Span[SIMD[uint8, 1], origin]) -> Self

Construct a new StringSlice from a sequence of UTF-8 encoded bytes.

Safety: unsafe_from_utf8 MUST be valid UTF-8 encoded data.

Args:

  • unsafe_from_utf8 (Span[SIMD[uint8, 1], origin]): A Span[Byte] encoded in UTF-8.

__init__(*, unsafe_from_utf8_strref: StringRef) -> Self

Construct a new StringSlice from a StringRef pointing to UTF-8 encoded bytes.

Safety:

  • unsafe_from_utf8_strref MUST point to data that is valid for origin.
  • unsafe_from_utf8_strref MUST be valid UTF-8 encoded data.

Args:

  • unsafe_from_utf8_strref (StringRef): A StringRef of bytes encoded in UTF-8.

__init__(*, ptr: UnsafePointer[SIMD[uint8, 1]], length: UInt) -> Self

Construct a StringSlice from a pointer to a sequence of UTF-8 encoded bytes and a length.

Safety:

  • ptr MUST point to at least length bytes of valid UTF-8 encoded data.
  • ptr MUST point to data that is live for the duration of origin.

Args:

  • ptr (UnsafePointer[SIMD[uint8, 1]]): A pointer to a sequence of bytes encoded in UTF-8.
  • length (UInt): The number of bytes of encoded data.

@implicit __init__[O: ImmutableOrigin, //](ref [O] value: String) -> StringSlice[O]

Construct an immutable StringSlice.

Parameters:

  • O (ImmutableOrigin): The immutable origin.

Args:

  • value (String): The string value.

__bool__

__bool__(self) -> Bool

Check if a string slice is non-empty.

Returns:

True if a string slice is non-empty, False otherwise.

__getitem__

__getitem__[I: Indexer](self, idx: I) -> String

Gets the character at the specified position.

Parameters:

  • I (Indexer): A type that can be used as an index.

Args:

  • idx (I): The index value.

Returns:

A new string containing the character at the specified position.

__lt__

__lt__(self, rhs: StringSlice[origin]) -> Bool

Verify if the StringSlice bytes are strictly less than the input in overlapping content.

Args:

  • rhs (StringSlice[origin]): The other StringSlice to compare against.

Returns:

True if the StringSlice bytes are strictly less than the input in overlapping content, False otherwise.

__eq__

__eq__(self, rhs_same: Self) -> Bool

Verify if a StringSlice is equal to another StringSlice with the same origin.

Args:

  • rhs_same (Self): The StringSlice to compare against.

Returns:

True if the StringSlice is equal to the input in length and contents, False otherwise.

__eq__(self, rhs: StringSlice[origin]) -> Bool

Verify if a StringSlice is equal to another StringSlice.

Args:

  • rhs (StringSlice[origin]): The StringSlice to compare against.

Returns:

True if the StringSlice is equal to the input in length and contents, False otherwise.

__ne__

__ne__(self, rhs_same: Self) -> Bool

Verify if a StringSlice is not equal to another StringSlice with the same origin.

Args:

  • rhs_same (Self): The StringSlice to compare against.

Returns:

True if the StringSlice is not equal to the input in length and contents, False otherwise.

__ne__(self, rhs: StringSlice[origin]) -> Bool

Verify if a StringSlice is not equal to another StringSlice.

Args:

  • rhs (StringSlice[origin]): The StringSlice to compare against.

Returns:

True if the StringSlice is not equal to the input in length and contents, False otherwise.

__contains__

__contains__(ref self, substr: StringSlice[origin]) -> Bool

Returns True if the substring is contained within the current string.

Args:

  • substr (StringSlice[origin]): The substring to check.

Returns:

True if the string contains the substring.

__mul__

__mul__(self, n: Int) -> String

Concatenates the string n times.

Args:

  • n (Int): The number of times to concatenate the string.

Returns:

The string concatenated n times.
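
As a minimal sketch of the `__contains__` and `__mul__` operators described above, the following uses only the signatures listed on this page. It assumes the `testing` module's `assert_true` and `assert_equal` helpers (used elsewhere in these docs) accept these argument types.

```mojo
from collections.string import StringSlice
from testing import assert_equal, assert_true, assert_false


def main():
    var s = StringSlice("mojo")

    # `in` dispatches to __contains__ and checks for a substring.
    assert_true("oj" in s)
    assert_false("xyz" in s)

    # `*` dispatches to __mul__ and returns a new owned String,
    # not another StringSlice.
    var repeated = StringSlice("ab") * 3
    assert_equal(repeated, "ababab")
```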

copy

copy(self) -> Self

Explicitly construct a deep copy of the provided StringSlice.

Returns:

A copy of the value.

from_utf8

static from_utf8(from_utf8: Span[SIMD[uint8, 1], origin]) -> Self

Construct a new StringSlice from a buffer containing UTF-8 encoded data.

Args:

  • from_utf8 (Span[SIMD[uint8, 1], origin]): A span of bytes containing UTF-8 encoded data.

Returns:

A new validated StringSlice pointing to the provided buffer.

Raises:

An exception is raised if the provided buffer byte values do not form valid UTF-8 encoded codepoints.

__str__

__str__(self) -> String

Convert this StringSlice to a String.

Notes: This will allocate a new string that copies the string contents from the provided string slice.

Returns:

A new String.

__repr__

__repr__(self) -> String

Return a Mojo-compatible representation of this string slice.

Returns:

A representation of this string slice in Mojo string literal syntax.

__len__

__len__(self) -> Int

Get the string length in bytes.

This function returns the number of bytes in the underlying UTF-8 representation of the string.

To get the number of Unicode codepoints in a string, use len(str.chars()).

Examples

Query the length of a string, in bytes and Unicode codepoints:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("ನಮಸ್ಕಾರ")

assert_equal(len(s), 21)
assert_equal(len(s.chars()), 7)

Strings containing only ASCII characters have the same byte and Unicode codepoint length:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("abc")

assert_equal(len(s), 3)
assert_equal(len(s.chars()), 3)

Returns:

The string length in bytes.

write_to

write_to[W: Writer](self, mut writer: W)

Formats this string slice to the provided Writer.

Parameters:

  • W (Writer): A type conforming to the Writer trait.

Args:

  • writer (W): The object to write to.

__hash__

__hash__(self) -> UInt

Hash the underlying buffer using builtin hash.

Returns:

A 64-bit hash value. This value is not suitable for cryptographic uses. Its intended usage is for data structures. See the hash builtin documentation for more details.

__fspath__

__fspath__(self) -> String

Return the file system path representation of this string.

Returns:

The file system path representation as a string.

__iter__

__iter__(self) -> _StringSliceIter[origin]

Iterate over the string, returning immutable references.

Returns:

An iterator of references to the string elements.

__reversed__

__reversed__(self) -> _StringSliceIter[origin, False]

Iterate backwards over the string, returning immutable references.

Returns:

A reversed iterator of references to the string elements.

__int__

__int__(self) -> Int

Parses the given string as a base-10 integer and returns that value. If the string cannot be parsed as an int, an error is raised.

Returns:

An integer value that represents the string, or otherwise raises.
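
A small sketch of the parse-or-raise behavior described above. It calls `__int__` directly because that is the only entry point documented on this page; the input strings are illustrative, and the exact error message is not specified here, so it is printed rather than asserted.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    # A well-formed base-10 integer parses to its value.
    var digits = StringSlice("42")
    assert_equal(digits.__int__(), 42)

    # Input that is not a base-10 integer raises instead of returning.
    try:
        _ = StringSlice("forty-two").__int__()
    except e:
        print("could not parse:", e)
```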

__float__

__float__(self) -> SIMD[float64, 1]

Parses the string as a floating-point number and returns that value. If the string cannot be parsed as a float, an error is raised.

Returns:

A float value that represents the string, or otherwise raises.

strip

strip(self, chars: StringSlice[origin]) -> Self

Return a copy of the string with leading and trailing characters removed.

Examples:

print("himojohi".strip("hi")) # "mojo"

Args:

  • chars (StringSlice[origin]): A set of characters to be removed. Defaults to whitespace.

Returns:

A copy of the string with the leading and trailing characters in chars removed.

strip(self) -> Self

Return a copy of the string with leading and trailing whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".

Examples:

print("  mojo  ".strip()) # "mojo"

Returns:

A copy of the string with no leading or trailing whitespaces.

rstrip

rstrip(self, chars: StringSlice[origin]) -> Self

Return a copy of the string with trailing characters removed.

Examples:

print("mojohi".rstrip("hi")) # "mojo"

Args:

  • chars (StringSlice[origin]): A set of characters to be removed. Defaults to whitespace.

Returns:

A copy of the string with the trailing characters in chars removed.

rstrip(self) -> Self

Return a copy of the string with trailing whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".

Examples:

print("mojo  ".rstrip()) # "mojo"

Returns:

A copy of the string with no trailing whitespaces.

lstrip

lstrip(self, chars: StringSlice[origin]) -> Self

Return a copy of the string with leading characters removed.

Examples:

print("himojo".lstrip("hi")) # "mojo"

Args:

  • chars (StringSlice[origin]): A set of characters to be removed. Defaults to whitespace.

Returns:

A copy of the string with the leading characters in chars removed.

lstrip(self) -> Self

Return a copy of the string with leading whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".

Examples:

print("  mojo".lstrip()) # "mojo"

Returns:

A copy of the string with no leading whitespaces.

chars

chars(self) -> CharsIter[origin]

Returns an iterator over the Chars encoded in this string slice.

Examples

Print the characters in a string:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("abc")
var iter = s.chars()
assert_equal(iter.__next__(), Char.ord("a"))
assert_equal(iter.__next__(), Char.ord("b"))
assert_equal(iter.__next__(), Char.ord("c"))
assert_equal(iter.__has_next__(), False)

chars() iterates over Unicode codepoints, and supports multibyte codepoints:

from collections.string import StringSlice
from testing import assert_equal

# A visual character composed of a combining sequence of 2 codepoints.
var s = StringSlice("á")
assert_equal(s.byte_length(), 3)

var iter = s.chars()
assert_equal(iter.__next__(), Char.ord("a"))
# U+0301 Combining Acute Accent
assert_equal(iter.__next__().to_u32(), 0x0301)
assert_equal(iter.__has_next__(), False)

Returns:

An iterator type that returns successive Char values stored in this string slice.

char_slices

char_slices(self) -> _StringSliceIter[origin]

Iterate over the string, returning immutable references.

Returns:

An iterator of references to the string elements.

as_bytes

as_bytes(self) -> Span[SIMD[uint8, 1], origin]

Get the sequence of encoded bytes of the underlying string.

Returns:

A slice containing the underlying sequence of encoded bytes.

unsafe_ptr

unsafe_ptr(self) -> UnsafePointer[SIMD[uint8, 1], mut=mut, origin=origin]

Gets a pointer to the first element of this string slice.

Returns:

A pointer pointing at the first element of this string slice.

byte_length

byte_length(self) -> Int

Get the length of this string slice in bytes.

Returns:

The length of this string slice in bytes.

char_length

char_length(self) -> UInt

Returns the length in Unicode codepoints.

This returns the number of Char codepoint values encoded in the UTF-8 representation of this string.

Note: To get the length in bytes, use StringSlice.byte_length().

Examples

Query the length of a string, in bytes and Unicode codepoints:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("ನಮಸ್ಕಾರ")

assert_equal(s.char_length(), 7)
assert_equal(len(s), 21)

Strings containing only ASCII characters have the same byte and Unicode codepoint length:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("abc")

assert_equal(s.char_length(), 3)
assert_equal(len(s), 3)

The character length of a string with visual combining characters is the length in Unicode codepoints, not grapheme clusters:

from collections.string import StringSlice
from testing import assert_equal

var s = StringSlice("á")
assert_equal(s.char_length(), 2)
assert_equal(s.byte_length(), 3)

Returns:

The length in Unicode codepoints.

get_immutable

get_immutable(self) -> StringSlice[(muttoimm origin._mlir_origin)]

Return an immutable version of this string slice.

Returns:

A string slice covering the same elements, but without mutability.

startswith

startswith(self, prefix: StringSlice[origin], start: Int = 0, end: Int = -1) -> Bool

Verify if the StringSlice starts with the specified prefix between start and end positions.

Args:

  • prefix (StringSlice[origin]): The prefix to check.
  • start (Int): The start offset from which to check.
  • end (Int): The end offset at which to stop checking.

Returns:

True if self[start:end] is prefixed by the input prefix.

endswith

endswith(self, suffix: StringSlice[origin], start: Int = 0, end: Int = -1) -> Bool

Verify if the StringSlice ends with the specified suffix between start and end positions.

Args:

  • suffix (StringSlice[origin]): The suffix to check.
  • start (Int): The start offset from which to check.
  • end (Int): The end offset at which to stop checking.

Returns:

True if self[start:end] is suffixed by the input suffix.
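
The following sketch illustrates `startswith` and `endswith`, including the optional `start` and `end` offsets, using only the signatures shown above. The sample string is illustrative.

```mojo
from collections.string import StringSlice
from testing import assert_true, assert_false


def main():
    var s = StringSlice("hello world")

    # Default offsets check against the whole slice.
    assert_true(s.startswith("hello"))
    assert_true(s.endswith("world"))

    # With start=6 the prefix check applies to the tail "world".
    assert_true(s.startswith("world", 6))
    assert_false(s.startswith("hello", 6))

    # With start=0 and end=5 the suffix check applies to "hello".
    assert_true(s.endswith("hello", 0, 5))
```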

format

format[*Ts: _CurlyEntryFormattable](self, *args: *Ts) -> String

Format a template with *args.

Examples:

# Manual indexing:
print("{0} {1} {0}".format("Mojo", 1.125)) # Mojo 1.125 Mojo
# Automatic indexing:
print("{} {}".format(True, "hello world")) # True hello world

Parameters:

  • *Ts (_CurlyEntryFormattable): The types of substitution values that implement Representable and Stringable (to be changed and made more flexible).

Args:

  • *args (*Ts): The substitution values.

Returns:

The template with the given values substituted.

find

find(ref self, substr: StringSlice[origin], start: Int = 0) -> Int

Finds the offset of the first occurrence of substr starting at start. If not found, returns -1.

Args:

  • substr (StringSlice[origin]): The substring to find.
  • start (Int): The offset from which to find.

Returns:

The offset of substr relative to the beginning of the string.

rfind

rfind(self, substr: StringSlice[origin], start: Int = 0) -> Int

Finds the offset of the last occurrence of substr starting at start. If not found, returns -1.

Args:

  • substr (StringSlice[origin]): The substring to find.
  • start (Int): The offset from which to find.

Returns:

The offset of substr relative to the beginning of the string.
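
As a brief sketch of how `find` and `rfind` differ, the example below searches a slice that contains the same substring twice. It relies only on the signatures documented above; the sample strings are illustrative.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    var s = StringSlice("mojo mojo")

    # find returns the offset of the first occurrence.
    assert_equal(s.find("mojo"), 0)
    # Starting the search at offset 1 skips the first occurrence.
    assert_equal(s.find("mojo", 1), 5)
    # rfind returns the offset of the last occurrence.
    assert_equal(s.rfind("mojo"), 5)
    # A missing substring yields -1 rather than raising.
    assert_equal(s.find("rust"), -1)
```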

isspace

isspace(self) -> Bool

Determines whether every character in the given StringSlice is a Python whitespace character. This corresponds to Python's universal separators: " \t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".

Examples:

Check if a string contains only whitespace:

from collections.string import StringSlice
from testing import assert_true, assert_false

# An empty string is not considered to contain only whitespace chars:
assert_false(StringSlice("").isspace())

# ASCII space characters
assert_true(StringSlice(" ").isspace())
assert_true(StringSlice("\t").isspace())

# Contains non-space characters
assert_false(StringSlice(" abc ").isspace())

Returns:

True if the whole StringSlice is made up of whitespace characters listed above, otherwise False.

isnewline

isnewline[single_character: Bool = False](self) -> Bool

Determines whether every character in the given StringSlice is a Python newline character. This corresponds to Python's universal newlines: "\r\n" and "\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".

Parameters:

  • single_character (Bool): Whether to evaluate the StringSlice as a single Unicode character (avoids overhead when already iterating).

Returns:

True if the whole StringSlice is made up of the newline characters listed above, otherwise False.

splitlines

splitlines[O: ImmutableOrigin, //](self: StringSlice[O], keepends: Bool = False) -> List[StringSlice[O]]

Split the string at line boundaries. This corresponds to Python's universal newlines: "\r\n" and "\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".

Parameters:

  • O (ImmutableOrigin): The immutable origin.

Args:

  • keepends (Bool): If True, line breaks are kept in the resulting strings.

Returns:

A List of StringSlices containing the input split by line boundaries.
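
A minimal sketch of `splitlines` with and without `keepends`, based on the signature above. The input text is illustrative, and the checks compare byte lengths only, so they do not depend on how the resulting slices are converted to strings.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    var text = StringSlice("one\ntwo\r\nthree")

    # By default the line breaks are dropped from each element.
    var lines = text.splitlines()
    assert_equal(len(lines), 3)
    assert_equal(len(lines[0]), 3)  # "one"
    assert_equal(len(lines[2]), 5)  # "three"

    # With keepends=True each element keeps its trailing line break.
    var kept = text.splitlines(keepends=True)
    assert_equal(len(kept), 3)
    assert_equal(len(kept[0]), 4)  # "one\n"
    assert_equal(len(kept[1]), 5)  # "two\r\n"
```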

count

count(self, substr: StringSlice[origin]) -> Int

Return the number of non-overlapping occurrences of substring substr in the string.

If substr is empty, returns the number of empty strings between characters, which is the length of the string plus one.

Args:

  • substr (StringSlice[origin]): The substring to count.

Returns:

The number of occurrences of substr.
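
The sketch below illustrates the non-overlapping counting rule and the empty-substring case described above. The sample strings are illustrative and ASCII-only, so byte length and character count coincide.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    assert_equal(StringSlice("banana").count("an"), 2)

    # Non-overlapping: "aaaa" holds two "aa" matches, not three.
    assert_equal(StringSlice("aaaa").count("aa"), 2)

    # Empty substring: the length of the string plus one.
    assert_equal(StringSlice("abc").count(""), 4)
```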