nvstrings API Reference

nvstrings

class nvstrings.nvstrings(cptr)

Instance manages a list of strings in device memory.

Operations are across all of the strings and their results reside in device memory. Strings in the list are immutable. Methods that modify any string will create a new nvstrings instance.

Methods

capitalize() Capitalize first character of each string.
cat([others, sep, na_rep]) Appends the given strings to this list of strings and returns the result as a new nvstrings instance.
center(width[, fillchar]) Pad the beginning and end of each string to the minimum width.
compare(str[, devptr]) Compare each string to the supplied string.
contains(pat[, regex, devptr]) Find the specified string within each string.
count(pat[, devptr]) Count occurrences of pattern in each string.
endswith(pat[, devptr]) Return array of boolean values with True for the strings where the specified string is at the end.
extract(pat) Extract string from the first match of regular expression pat.
extract_column(pat) Extract string from the first match of regular expression pat.
find(sub[, start, end, devptr]) Find the specified string sub within each string.
find_from(sub[, starts, ends, devptr]) Find the specified string within each string starting at the specified character positions.
find_multiple(strs[, devptr]) Return a ‘matrix’ of find results for each of the strings in the strs parameter.
findall(pat) Find all occurrences of regular expression pattern in each string.
findall_column(pat) A new set of nvstrings is created by organizing substring results vertically.
get(i) Returns the character specified in each string as a new string.
hash([devptr]) Returns hash values represented by each string.
index(sub[, start, end, devptr]) Same as find but throws an error if sub is not found in every string.
isalnum([devptr]) Return array of boolean values with True for strings that contain only alpha-numeric characters.
isalpha([devptr]) Return array of boolean values with True for strings that contain only alphabetic characters.
isdecimal([devptr]) Return array of boolean values with True for strings that contain only decimal characters – those that can be used to extract base10 numbers.
isdigit([devptr]) Return array of boolean values with True for strings that contain only decimal and digit characters.
islower([devptr]) Return array of boolean values with True for strings that contain only lowercase characters.
isnumeric([devptr]) Return array of boolean values with True for strings that contain only numeric characters.
isspace([devptr]) Return array of boolean values with True for strings that contain only whitespace characters.
isupper([devptr]) Return array of boolean values with True for strings that contain only uppercase characters.
join([sep]) Concatenate this list of strings into a single string.
len([devptr]) Returns the number of characters of each string.
ljust(width[, fillchar]) Pad the end of each string to the minimum width.
lower() Convert each string to lowercase.
lstrip([to_strip]) Strip leading characters from each string.
match(pat[, devptr]) Return array of boolean values where True is set if the specified pattern matches the beginning of the corresponding string.
order(stype[, asc, devptr]) Sort this list by name (2) or length (1) or both (3).
pad(width[, side, fillchar]) Add specified padding to each string.
partition([delimiter]) Each string is split at the first occurrence of the delimiter.
remove_strings(indexes[, count]) Remove the specified strings and return a new instance.
repeat(repeats) Appends each string with itself the specified number of times.
replace(pat, repl[, n, regex]) Replace a string (pat) in each string with another string (repl).
rfind(sub[, start, end, devptr]) Find the specified string within each string.
rindex(sub[, start, end, devptr]) Same as rfind but throws an error if sub is not found in every string.
rjust(width[, fillchar]) Pad the beginning of each string to the minimum width.
rpartition([delimiter]) Each string is split at the last occurrence of the delimiter.
rsplit([delimiter, n]) Returns an array of nvstrings each representing the split of each individual string.
rsplit_column([delimiter, n]) A new set of columns (nvstrings) is created by splitting the strings vertically.
rstrip([to_strip]) Strip trailing characters from each string.
size() The number of strings managed by this instance.
slice(start[, stop, step]) Returns a substring of each string.
slice_from([starts, stops]) Return substring of each string using positions for each string.
slice_replace([start, stop, repl]) Replace the specified section of each string with a new string.
sort(stype[, asc]) Sort this list by name (2) or length (1) or both (3).
split([delimiter, n]) Returns an array of nvstrings each representing the split of each individual string.
split_column([delimiter, n]) A new set of columns (nvstrings) is created by splitting the strings vertically.
startswith(pat[, devptr]) Return array of boolean values with True for the strings where the specified string is at the beginning.
stof([devptr]) Returns float values represented by each string.
stoi([devptr]) Returns integer values represented by each string.
strip([to_strip]) Strip leading and trailing characters from each string.
sublist(indexes[, count]) Return a sublist of strings from this instance.
swapcase() Change each lowercase character to uppercase and vice versa.
title() Uppercase the first letter of each word and lowercase the rest.
to_host() Copies strings back to CPU memory into a Python list.
translate(table) Translate individual characters to new characters using the provided table.
upper() Convert each string to uppercase.
wrap(width) Replace whitespace with new-line characters so each line is no more than width characters.
zfill(width) Pads the strings with leading zeros.
capitalize()

Capitalize first character of each string. This only applies to ASCII characters at this time.

Examples

import nvstrings

s = nvstrings.to_device(["hello, friend","goodbye, friend"])
print(s.capitalize())

Output:

['Hello, friend', 'Goodbye, friend']
cat(others=None, sep=None, na_rep=None)

Appends the given strings to this list of strings and returns the result as a new nvstrings instance.

Parameters:
others : List of str

Strings to be appended. The number of strings must match size() of this instance. This must be either a Python list of strings or another nvstrings instance.

sep : str

If specified, this separator will be appended to each string before appending the others.

na_rep : char

This character will take the place of any null strings (not empty strings) in either list.

Examples

import nvstrings

s1 = nvstrings.to_device(['hello', None,'goodbye'])
s2 = nvstrings.to_device(['world','globe', None])

print(s1.cat(s2,sep=':', na_rep='_'))

Output:

["hello:world","_:globe","goodbye:_"]
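The null-handling rule for cat() can be modeled on the CPU. The sketch below is a hypothetical plain-Python helper (cat_reference is not part of nvstrings) that mirrors the documented sep and na_rep behavior. It assumes that a null meeting no na_rep yields a null result; the docs above do not state this.

```python
def cat_reference(left, right, sep=None, na_rep=None):
    """Hypothetical CPU model of nvstrings.cat; not the library code."""
    if len(left) != len(right):
        raise ValueError("both lists must have the same number of strings")
    result = []
    for a, b in zip(left, right):
        # Substitute na_rep for null (None) entries only, never for "".
        if na_rep is not None:
            a = na_rep if a is None else a
            b = na_rep if b is None else b
        if a is None or b is None:
            # Assumption: a null with no na_rep produces a null result.
            result.append(None)
        else:
            result.append(a + (sep or "") + b)
    return result


print(cat_reference(['hello', None, 'goodbye'], ['world', 'globe', None],
                    sep=':', na_rep='_'))
```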
center(width, fillchar=' ')

Pad the beginning and end of each string to the minimum width.

Parameters:
width : int

The minimum width of characters of the new string. If the width is smaller than the existing string, no padding is performed.

fillchar : char

The character used to do the padding. Default is space character. Only the first character is used.

Examples

import nvstrings

s = nvstrings.to_device(["hello","goodbye","well"])
print(s.center(width=6))

Output:

['hello ', 'goodbye', ' well ']
compare(str, devptr=0)

Compare each string to the supplied string. Returns 0 for strings that match str. Returns a value < 0 when the first differing character is lower than the corresponding character of str, or when the string is a prefix of str (the string is shorter). Returns a value > 0 when the first differing character is greater than the corresponding character of str, or when str is a prefix of the string (str is shorter).

Parameters:
str : str

String to compare against each string in this instance.

devptr : GPU memory pointer

Where string result values will be written. Must be able to hold at least size() of int32 values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","world"])

print(s.compare('hello'))

Output:

[0,15]
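The ordering rule can be illustrated with a plain-Python model. compare_reference is a hypothetical helper and is not part of nvstrings. It reproduces the code-point difference seen in the example (ord('w') - ord('h') == 15). Only the sign is documented for the prefix case, so the ±1 it returns there is an assumption.

```python
def compare_reference(strings, other):
    """Hypothetical CPU model of nvstrings.compare; not the library code."""
    results = []
    for s in strings:
        for a, b in zip(s, other):
            if a != b:
                # The first differing character decides the result.
                results.append(ord(a) - ord(b))
                break
        else:
            # No differing character: equal strings give 0, otherwise the
            # shorter string (a prefix of the other) orders first.
            results.append((len(s) > len(other)) - (len(s) < len(other)))
    return results


print(compare_reference(["hello", "world", "he", "hello!"], "hello"))
```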
contains(pat, regex=True, devptr=0)

Find the specified string within each string.

By default, pat is treated as a regular expression. Returns an array of boolean values: True where pat is found, False otherwise.

Parameters:
pat : str

Pattern or string to search for in each string of this instance.

regex : bool

If True, pat is interpreted as a regular expression. If False, pat is treated as a literal string to search for in each string.

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Must be able to hold at least size() of np.byte values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","there","world"])

print(s.contains('o'))

Output:

[True, False, True]
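The effect of the regex flag can be sketched on the CPU. The sketch uses Python's re module as a stand-in for the nvstrings regex engine, whose supported syntax may differ. contains_reference is a hypothetical helper, not part of nvstrings. A "." pattern shows the difference: as a regex it matches any character, but as a literal it matches only a period.

```python
import re


def contains_reference(strings, pat, regex=True):
    """Hypothetical CPU model of nvstrings.contains; not the library code."""
    if regex:
        # Python's re stands in for the GPU regex engine.
        prog = re.compile(pat)
        return [prog.search(s) is not None for s in strings]
    # regex=False: plain substring search.
    return [pat in s for s in strings]


print(contains_reference(["hello", "there", "world"], "o"))
print(contains_reference(["a.b", "ab"], ".", regex=True))
print(contains_reference(["a.b", "ab"], ".", regex=False))
```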
count(pat, devptr=0)

Count occurrences of pattern in each string.

Parameters:
pat : str

Pattern to find

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Memory must be able to hold at least size() of int32 values.
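No example is given for count(). The sketch below makes two assumptions: that pat is treated as a regular expression, and that matches are counted without overlap, scanning left to right. It models them with Python's re module. count_reference is a hypothetical helper, not part of nvstrings.

```python
import re


def count_reference(strings, pat):
    """Hypothetical CPU model of nvstrings.count; not the library code."""
    prog = re.compile(pat)
    # finditer yields non-overlapping matches from left to right.
    return [sum(1 for _ in prog.finditer(s)) for s in strings]


print(count_reference(["hello", "there", "world"], "l"))
print(count_reference(["aaaa"], "aa"))
```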

endswith(pat, devptr=0)

Return array of boolean values with True for the strings where the specified string is at the end.

Parameters:
pat : str

Pattern to find. Regular expressions are not accepted.

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Memory must be able to hold at least size() of np.byte values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","there","world"])

print(s.endswith('d'))

Output:

[False, False, True]
extract(pat)

Extract the capture groups from the first match of regular expression pat. A new nvstrings instance of group results is created for each string in this instance.

Parameters:
pat : str

The regex pattern with group capture syntax

Examples

import nvstrings

s = nvstrings.to_device(["a1","b2","c3"])
for result in s.extract(r'([ab])(\d)'):
  print(result)

Output:

["a","1"]
["b","2"]
[None,None]
extract_column(pat)

Extract the capture groups from the first match of regular expression pat. The group results are organized vertically: one nvstrings instance is created per capture group, with one entry per string in this instance.

Parameters:
pat : str

The regex pattern with group capture syntax

Examples

import nvstrings

s = nvstrings.to_device(["a1","b2","c3"])
for result in s.extract_column(r'([ab])(\d)'):
  print(result)

Output:

["a","b",None]
["1","2",None]
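The difference between the row layout of extract() and the column layout of extract_column() can be sketched with Python's re module. Both helpers below are hypothetical and are not part of nvstrings. The sketch assumes "first match" means the first match anywhere in the string (re.search), and that a string with no match contributes None for every group.

```python
import re


def extract_reference(strings, pat):
    """Row layout: one list of capture groups per input string."""
    prog = re.compile(pat)
    rows = []
    for s in strings:
        m = prog.search(s)  # first match anywhere in the string
        rows.append(list(m.groups()) if m else [None] * prog.groups)
    return rows


def extract_column_reference(strings, pat):
    """Column layout: one list per capture group, one entry per string."""
    rows = extract_reference(strings, pat)
    ngroups = re.compile(pat).groups
    return [[row[g] for row in rows] for g in range(ngroups)]


data = ["a1", "b2", "c3"]
print(extract_reference(data, r"([ab])(\d)"))
print(extract_column_reference(data, r"([ab])(\d)"))
```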
find(sub, start=0, end=None, devptr=0)

Find the specified string sub within each string. Return -1 for those strings where sub is not found.

Parameters:
sub : str

String to find

start : int

Beginning of section to search. Default is beginning of each string.

end : int

End of section to search. Default is end of each string.

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of int32 values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","there","world"])

print(s.find('o'))

Output:

[4,-1,1]
find_from(sub, starts=0, ends=0, devptr=0)

Find the specified string within each string starting at the specified character positions.

The starts and ends parameters are device memory pointers. If specified, each must contain size() of int32 values.

Returns -1 for those strings where sub is not found.

Parameters:
sub : str

String to find

starts : GPU memory pointer

Pointer to GPU array of int32 values of beginning of sections to search, one per string.

ends : GPU memory pointer

Pointer to GPU array of int32 values of end of sections to search. Use -1 to specify to the end of that string.

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of int32 values.

Examples

import nvstrings
import numpy as np
from numba import cuda

s = nvstrings.to_device(["hello","there"])
darr = cuda.to_device(np.asarray([2,3],dtype=np.int32))
print(s.find_from('e',starts=darr.device_ctypes_pointer.value))

Output:

[-1,4]
find_multiple(strs, devptr=0)

Return a ‘matrix’ of find results for each of the strings in the strs parameter.

Each row corresponds to one string in this instance and holds, for each string in strs, the position of its first occurrence (-1 if it is not found).

Parameters:
strs : nvstrings

Strings to find in each of the strings in this instance.

devptr : GPU memory pointer

Optional device memory pointer to hold the results.

Memory size must be able to hold at least size()*strs.size() of int32 values.

Examples

import nvstrings

s = nvstrings.to_device(["hare","bunny","rabbit"])
t = nvstrings.to_device(["a","e","i","o","u"])
print(s.find_multiple(t))

Output:

[[1, 3, -1, -1, -1], [-1, -1, -1, -1, 1], [1, -1, 4, -1, -1]]
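The matrix layout can be modeled with Python's str.find. In the model, rows follow the strings in this instance and columns follow the strings in strs. find_multiple_reference is a hypothetical helper, not part of nvstrings. The total element count shows why devptr must hold size()*strs.size() int32 values.

```python
def find_multiple_reference(strings, targets):
    """Hypothetical CPU model of nvstrings.find_multiple; not the library code."""
    # One row per searched string, one column per target string.
    return [[s.find(t) for t in targets] for s in strings]


matrix = find_multiple_reference(["hare", "bunny", "rabbit"],
                                 ["a", "e", "i", "o", "u"])
print(matrix)
# Total number of results: 3 strings x 5 targets.
print(sum(len(row) for row in matrix))
```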
findall(pat)

Find all occurrences of regular expression pattern in each string. A new array of nvstrings is created for each string in this instance.

Parameters:
pat : str

The regex pattern used to search for substrings

Examples

import nvstrings

s = nvstrings.to_device(["hare","bunny","rabbit"])
for result in s.findall('[ab]'):
  print(result)

Output:

["a"]
["b"]
["a","b","b"]
findall_column(pat)

A new set of nvstrings is created by organizing substring results vertically.

Parameters:
pat : str

The regex pattern to search for substrings

Examples

import nvstrings

s = nvstrings.to_device(["hare","bunny","rabbit"])
for result in s.findall_column('[ab]'):
  print(result)

Output:

["a","b","a"]
[None,None,"b"]
[None,None,"b"]
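findall_column() can be understood as transposing the ragged per-string results of findall() and padding short rows with None. A minimal CPU sketch of that reading follows. It uses Python's re module, and both helpers are hypothetical rather than nvstrings APIs.

```python
import re


def findall_reference(strings, pat):
    """One list of matched substrings per input string (ragged)."""
    prog = re.compile(pat)
    return [[m.group(0) for m in prog.finditer(s)] for s in strings]


def findall_column_reference(strings, pat):
    """Transpose the ragged rows, padding missing entries with None."""
    rows = findall_reference(strings, pat)
    width = max((len(r) for r in rows), default=0)
    return [[r[i] if i < len(r) else None for r in rows] for i in range(width)]


data = ["hare", "bunny", "rabbit"]
print(findall_reference(data, "[ab]"))
print(findall_column_reference(data, "[ab]"))
```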
get(i)

Returns the character specified in each string as a new string.

The nvstrings returned contains a list of single character strings.

Parameters:
i : int

The character position identifying the character in each string to return.

Examples

import nvstrings

s = nvstrings.to_device(["hello world","goodbye","well said"])
print(s.get(0))

Output:

['h', 'g', 'w']
hash(devptr=0)

Returns hash values represented by each string.

Parameters:
devptr : GPU memory pointer

Where string hash values will be written. Must be able to hold at least size() of uint32 values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","world"])
print(s.hash())

Output:

[99162322, 113318802]
index(sub, start=0, end=None, devptr=0)

Same as find but throws an error if sub is not found in every string.

Parameters:
sub : str

String to find

start : int

Beginning of section to search. Default is beginning of each string.

end : int

End of section to search. Default is end of each string.

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of int32 values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","world"])

print(s.index('l'))

Output:

[2,3]
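The relationship between index() and find() can be sketched on the CPU: compute the find results, then fail if any string returned -1. index_reference is a hypothetical helper, not part of nvstrings. The docs only say that an error is thrown, so the ValueError type is an assumption.

```python
def index_reference(strings, sub, start=0, end=None):
    """Hypothetical CPU model of nvstrings.index; not the library code."""
    # str.find returns -1 when sub is absent from s[start:end].
    results = [s.find(sub, start, end) for s in strings]
    if -1 in results:
        # Assumption: the exception type is not documented.
        raise ValueError("substring not found in every string")
    return results


print(index_reference(["hello", "world"], "l"))
try:
    index_reference(["hello", "there"], "l")
except ValueError as err:
    print("error:", err)
```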
isalnum(devptr=0)

Return array of boolean values with True for strings that contain only alpha-numeric characters. Equivalent to: isalpha() or isdigit() or isnumeric() or isdecimal()

Examples

import nvstrings

s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isalnum())

Output:

[True, True, False, False, False, False]
isalpha(devptr=0)

Return array of boolean values with True for strings that contain only alphabetic characters.

Examples

import nvstrings

s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isalpha())

Output:

[False, True, False, False, False, False]
isdecimal(devptr=0)

Return array of boolean values with True for strings that contain only decimal characters – those that can be used to extract base10 numbers.

Examples

import nvstrings

s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isdecimal())

Output:

[True, False, False, False, False, False]
isdigit(devptr=0)

Return array of boolean values with True for strings that contain only decimal and digit characters.

Examples

import nvstrings

s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isdigit())

Output:

[True, False, False, False, False, False]
islower(devptr=0)

Return array of boolean values with True for strings that contain only lowercase characters.

Examples

import nvstrings

s = nvstrings.to_device(['hello', 'Goodbye'])
print(s.islower())

Output:

[True, False]
isnumeric(devptr=0)

Return array of boolean values with True for strings that contain only numeric characters. These include digit and numeric characters.

Examples

import nvstrings

s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isnumeric())

Output:

[True, False, False, False, False, False]
isspace(devptr=0)

Return array of boolean values with True for strings that contain only whitespace characters.

Examples

import nvstrings

s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isspace())

Output:

[False, False, False, False, False, True]
isupper(devptr=0)

Return array of boolean values with True for strings that contain only uppercase characters.

Examples

import nvstrings

s = nvstrings.to_device(['hello', 'GOODBYE'])
print(s.isupper())

Output:

[False, True]
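For the ASCII samples in the examples above, Python's built-in str predicates give the same answers as the documented nvstrings outputs. That makes them a convenient CPU cross-check. This is only a sketch: nvstrings may differ from Python on empty strings or non-ASCII characters.

```python
samples = ['1234', 'de', '1.75', '-34', '+9.8', ' ']
predicates = ["isalnum", "isalpha", "isdecimal", "isdigit", "isnumeric", "isspace"]

# Apply each built-in str predicate to every sample string.
table = {name: [getattr(s, name)() for s in samples] for name in predicates}
for name, values in table.items():
    print(f"{name:10} {values}")

# islower() uses the samples from its own example.
print([s.islower() for s in ['hello', 'Goodbye']])
```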
join(sep='')

Concatenate this list of strings into a single string.

Parameters:
sep : str

This separator will be appended to each string before appending the next.

Examples

import nvstrings

s = nvstrings.to_device(["hello","goodbye"])
print(s.join(sep=':'))

Output:

['hello:goodbye']
len(devptr=0)

Returns the number of characters of each string.

Parameters:
devptr : GPU memory pointer

Where string length values will be written. Must be able to hold at least size() of int32 values.

Examples

import nvstrings
import numpy as np
from librmm_cffi import librmm

# example passing device memory pointer
s = nvstrings.to_device(["abc","d","ef"])
arr = np.arange(s.size(),dtype=np.int32)
d_arr = librmm.to_device(arr)
s.len(d_arr.device_ctypes_pointer.value)
print(d_arr.copy_to_host())

Output:

[3 1 2]
ljust(width, fillchar=' ')

Pad the end of each string to the minimum width.

Parameters:
width : int

The minimum width of characters of the new string. If the width is smaller than the existing string, no padding is performed.

fillchar : char

The character used to do the padding. Default is space character. Only the first character is used.

Examples

import nvstrings

s = nvstrings.to_device(["hello","goodbye","well"])
print(s.ljust(width=6))

Output:

['hello ', 'goodbye', 'well  ']
lower()

Convert each string to lowercase. This only applies to ASCII characters at this time.

Examples

import nvstrings

s = nvstrings.to_device(["Hello, Friend","Goodbye, Friend"])
print(s.lower())

Output:

['hello, friend', 'goodbye, friend']
lstrip(to_strip=None)

Strip leading characters from each string.

Parameters:
to_strip : str

Characters to be removed from leading edge of each string

Examples

import nvstrings

s = nvstrings.to_device(["oh","hello","goodbye"])
print(s.lstrip('o'))

Output:

['h', 'hello', 'goodbye']
match(pat, devptr=0)

Return array of boolean values where True is set if the specified pattern matches the beginning of the corresponding string.

Parameters:
pat : str

Pattern to find

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of np.byte values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","there","world"])

print(s.match('h'))

Output:

[True, False, False]
order(stype, asc=True, devptr=0)

Sort this list by name (2) or length (1) or both (3). This sort only provides the new indexes and does not reorder the managed strings.

Parameters:
stype : int

Type of sort to use.

If stype is 1, strings will be sorted by length

If stype is 2, strings will be sorted alphabetically by name

If stype is 3, strings will be sorted by length and then alphabetically

asc : bool

Whether to sort ascending (True) or descending (False)

devptr : GPU memory pointer

Where index values will be written. Must be able to hold at least size() of int32 values.

Examples

import nvstrings

s = nvstrings.to_device(["aaa", "bb", "aaaabb"])
print(s.order(2))

Output:

[0, 2, 1]
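The three stype values can be modeled as sort keys applied to the positions of the strings. The model returns new indexes rather than reordered strings, which is what order() provides. order_reference is a hypothetical helper, not part of nvstrings. It also assumes that nvstrings' alphabetical order agrees with Python's code-point ordering, which holds at least for ASCII text.

```python
def order_reference(strings, stype, asc=True):
    """Hypothetical CPU model of nvstrings.order; not the library code.

    stype 1 = by length, 2 = by name, 3 = by length then name.
    Returns the new index order; `strings` itself is left untouched.
    """
    keys = {
        1: len,
        2: lambda s: s,
        3: lambda s: (len(s), s),
    }
    key = keys[stype]
    return sorted(range(len(strings)), key=lambda i: key(strings[i]),
                  reverse=not asc)


data = ["aaa", "bb", "aaaabb"]
for stype in (1, 2, 3):
    print(stype, order_reference(data, stype))

# Gathering by the returned indexes gives the same list as sort(3).
print([data[i] for i in order_reference(data, 3)])
```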
pad(width, side='left', fillchar=' ')

Pad each string to the minimum width on the specified side.

Parameters:
width : int

The minimum width of characters of the new string. If the width is smaller than the existing string, no padding is performed.

side : str

One of "left", "right", or "both". The default is "left".

"left" pads on the left (same as rjust())

"right" pads on the right (same as ljust())

"both" pads equally on the left and right (same as center())

fillchar : char

The character used to do the padding. Default is space character. Only the first character is used.

Examples

import nvstrings

s = nvstrings.to_device(["hello","goodbye","well"])
print(s.pad(width=8, side='left'))

Output:

['   hello', ' goodbye', '    well']
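The mapping of side values to rjust(), ljust() and center() can be sketched with Python's str methods. pad_reference is a hypothetical helper, not part of nvstrings. Python's rule for placing an odd leftover fill character in center() is an assumption here; it matches the center() example above but may not match nvstrings in every case.

```python
def pad_reference(strings, width, side="left", fillchar=" "):
    """Hypothetical CPU model of nvstrings.pad; not the library code."""
    fill = fillchar[0]  # only the first character is used
    if side == "left":
        return [s.rjust(width, fill) for s in strings]
    if side == "right":
        return [s.ljust(width, fill) for s in strings]
    if side == "both":
        # Assumption: Python's placement of an odd leftover fill character.
        return [s.center(width, fill) for s in strings]
    raise ValueError("side must be 'left', 'right' or 'both'")


data = ["hello", "goodbye", "well"]
print(pad_reference(data, 8, side="left"))
print(pad_reference(data, 6, side="both"))
```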
partition(delimiter=' ')

Each string is split at the first occurrence of the delimiter.

Three strings are returned for each string: beginning, delimiter, end.

Parameters:
delimiter : str

The character used to locate the split points of each string. Default is space.

Examples

import nvstrings

strs = nvstrings.to_device(["hello world","goodbye","up in arms"])
for s in strs.partition(' '):
  print(s)

Output:

['hello', ' ', 'world']
['goodbye', '', '']
['up', ' ', 'in arms']
remove_strings(indexes, count=0)

Remove the specified strings and return a new instance.

Parameters:
indexes : List of ints or GPU memory pointer

0-based indexes of strings to remove from an nvstrings object. If this parameter is a pointer to device memory, the count parameter is required.

count : int

Number of ints if the indexes parameter is a device pointer. Otherwise it is ignored.

Examples

import nvstrings
s = nvstrings.to_device(["hello","there","world"])

print(s.remove_strings([0, 2]))

Output:

['there']
repeat(repeats)

Repeats each string the specified number of times. Returns an nvstrings instance with the new strings.

Parameters:
repeats : int

The number of times each string should be repeated. A repeat count of 0 or 1 returns a copy of each string.

Examples

import nvstrings

s = nvstrings.to_device(["hello","goodbye","well"])
print(s.repeat(2))

Output:

['hellohello', 'goodbyegoodbye', 'wellwell']
replace(pat, repl, n=-1, regex=True)

Replace a string (pat) in each string with another string (repl).

Parameters:
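pat : str

String to be replaced. This can also be a regular expression string (not a compiled regex object).

repl : str

String to replace pat with.

n : int

Maximum number of replacements to make in each string. The default (-1) replaces all occurrences.

regex : bool

If True (default), pat is interpreted as a regular expression. If False, pat is treated as a literal string.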

Examples

import nvstrings

s = nvstrings.to_device(["hello","goodbye"])
print(s.replace('e', ''))

Output:

['hllo', 'goodby']
rfind(sub, start=0, end=None, devptr=0)

Find the specified string within each string. Search from the end of the string.

Return -1 for those strings where sub is not found.

Parameters:
sub : str

String to find

start : int

Beginning of section to search. Default is beginning of each string.

end : int

End of section to search. Default is end of each string.

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of int32 values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","there","world"])

print(s.rfind('o'))

Output:

[4, -1, 1]
rindex(sub, start=0, end=None, devptr=0)

Same as rfind but throws an error if sub is not found in every string.

Parameters:
sub : str

String to find

start : int

Beginning of section to search. Default is beginning of each string.

end : int

End of section to search. Default is end of each string.

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of int32 values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","world"])

print(s.rindex('l'))

Output:

[3,3]
rjust(width, fillchar=' ')

Pad the beginning of each string to the minimum width.

Parameters:
width : int

The minimum width of characters of the new string. If the width is smaller than the existing string, no padding is performed.

fillchar : char

The character used to do the padding. Default is space character. Only the first character is used.

Examples

import nvstrings

s = nvstrings.to_device(["hello","goodbye","well"])
print(s.rjust(width=6))

Output:

[' hello', 'goodbye', '  well']
rpartition(delimiter=' ')

Each string is split at the last occurrence of the delimiter; the delimiter is searched for from the end.

Three strings are returned for each string: beginning, delimiter, end.

Parameters:
delimiter : str

The character used to locate the split points of each string. Default is space.

Examples

import nvstrings

strs = nvstrings.to_device(["hello world","goodbye","up in arms"])
for s in strs.rpartition(' '):
  print(s)

Output:

['hello', ' ', 'world']
['', '', 'goodbye']
['up in', ' ', 'arms']
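Python's built-in str.partition and str.rpartition have the same three-part contract as partition() and rpartition() above. They reproduce both example outputs, which makes them a quick CPU cross-check. This is a sketch for comparison only; it does not call nvstrings.

```python
data = ["hello world", "goodbye", "up in arms"]

# partition splits at the first delimiter, rpartition at the last.
# Each returns a (beginning, delimiter, end) triple.
first = [list(s.partition(" ")) for s in data]
last = [list(s.rpartition(" ")) for s in data]

for row in first:
    print(row)
for row in last:
    print(row)
```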
rsplit(delimiter=None, n=-1)

Returns an array of nvstrings each representing the split of each individual string. The delimiter is searched for from the end of each string.

Parameters:
delimiter : str

The character used to locate the split points of each string. Default is space.

n : int

Maximum number of strings to return for each split.

Examples

import nvstrings

strs = nvstrings.to_device(["hello world","goodbye","up in arms"])
for s in strs.rsplit(' ',2):
  print(s)

Output:

['hello', 'world']
['goodbye']
['up in', 'arms']
rsplit_column(delimiter=' ', n=-1)

A new set of columns (nvstrings) is created by splitting the strings vertically. The delimiter is searched for from the end.

Parameters:
delimiter : str

The character used to locate the split points of each string. Default is space.

Examples

import nvstrings

s = nvstrings.to_device(["hello world","goodbye","well said"])
for result in s.rsplit_column(' '):
  print(result)

Output:

["hello","goodbye","well"]
["world",None,"said"]
rstrip(to_strip=None)

Strip trailing characters from each string.

Parameters:
to_strip : str

Characters to be removed from trailing edge of each string

Examples

import nvstrings

s = nvstrings.to_device(["oh","hello","goodbye"])
print(s.rstrip('o'))

Output:

['oh', 'hell', 'goodbye']
size()

The number of strings managed by this instance.

Returns:
int: number of strings

Examples

import nvstrings
s = nvstrings.to_device(["hello","world"])
print(s.size())

Output:

2
slice(start, stop=None, step=None)

Returns a substring of each string.

Parameters:
start : int

Beginning position of the string to extract. Default is the beginning of each string.

stop : int

Ending position of the string to extract. Default is end of each string.

step : int

Step between extracted characters. Default is every character.

Examples

import nvstrings

s = nvstrings.to_device(["hello","goodbye"])
print(s.slice(2,5))

Output:

['llo', 'odb']
slice_from(starts=0, stops=0)

Return substring of each string using positions for each string.

The starts and stops parameters are device memory pointers. If specified, each must contain size() of int32 values.

Parameters:
starts : GPU memory pointer

Beginning position in each string to extract from. Default is the beginning of each string.

stops : GPU memory pointer

Ending position in each string to extract to. Default is the end of each string. Use -1 to specify the end of that string.

Examples

import nvstrings
import numpy as np
from numba import cuda

s = nvstrings.to_device(["hello","there"])
darr = cuda.to_device(np.asarray([2,3],dtype=np.int32))
print(s.slice_from(starts=darr.device_ctypes_pointer.value))

Output:

['llo','re']
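The per-string positions can be modeled with ordinary Python lists standing in for the int32 device arrays. slice_from_reference is a hypothetical helper, not part of nvstrings. It follows the documented rule that a stop of -1 means the end of that string. Other negative values are not documented, so the sketch does not special-case them.

```python
def slice_from_reference(strings, starts=None, stops=None):
    """Hypothetical CPU model of nvstrings.slice_from; not the library code.

    starts/stops are per-string position lists standing in for device
    arrays; a stop of -1 means "to the end of that string".
    """
    out = []
    for i, s in enumerate(strings):
        start = starts[i] if starts is not None else 0
        stop = stops[i] if stops is not None else -1
        out.append(s[start:] if stop == -1 else s[start:stop])
    return out


print(slice_from_reference(["hello", "there"], starts=[2, 3]))
print(slice_from_reference(["hello", "there"], starts=[0, 1], stops=[2, -1]))
```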
slice_replace(start=None, stop=None, repl=None)

Replace the specified section of each string with a new string.

Parameters:
start : int

Beginning position of the string to replace. Default is the beginning of each string.

stop : int

Ending position of the string to replace. Default is end of each string.

repl : str

String to insert into the specified position values.

Examples

import nvstrings

strs = nvstrings.to_device(["abcdefghij","0123456789"])
print(strs.slice_replace(2,5,'z'))

Output:

['abzfghij', '01z56789']
sort(stype, asc=True)

Sort this list by name (2) or length (1) or both (3). Sorting can help improve performance for other operations.

Parameters:
stype : int

Type of sort to use.

If stype is 1, strings will be sorted by length

If stype is 2, strings will be sorted alphabetically by name

If stype is 3, strings will be sorted by length and then alphabetically

asc : bool

Whether to sort ascending (True) or descending (False)

Examples

import nvstrings

s = nvstrings.to_device(["aaa", "bb", "aaaabb"])
print(s.sort(3))

Output:

['bb', 'aaa', 'aaaabb']
split(delimiter=None, n=-1)

Returns an array of nvstrings each representing the split of each individual string.

Parameters:
delimiter : str

The character used to locate the split points of each string. Default is space.

n : int

Maximum number of strings to return for each split.

Examples

import nvstrings

s = nvstrings.to_device(["hello world","goodbye","well said"])
for result in s.split(' '):
  print(result)

Output:

["hello","world"]
["goodbye"]
["well","said"]
split_column(delimiter=' ', n=-1)

A new set of columns (nvstrings) is created by splitting the strings vertically.

Parameters:
delimiter : str

The character used to locate the split points of each string. Default is space.

Examples

import nvstrings

s = nvstrings.to_device(["hello world","goodbye","well said"])
for result in s.split_column(' '):
  print(result)

Output:

["hello","goodbye","well"]
["world",None,"said"]
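The column layout of split_column() and rsplit_column() amounts to splitting each string and then transposing the ragged rows, padding with None. The sketch below is hypothetical (split_column_reference is not an nvstrings API). It reads n as the maximum number of pieces, as in the split() and rsplit() docs, and maps it to Python's maxsplit as n - 1. Treating n <= 0 as "no limit" is an assumption.

```python
def split_column_reference(strings, delimiter=" ", n=-1, from_end=False):
    """Hypothetical CPU model of split_column/rsplit_column; not the library code.

    n is the maximum number of pieces per string; n <= 0 is assumed to mean
    no limit. from_end=True mimics rsplit_column.
    """
    maxsplit = n - 1 if n > 0 else -1
    splitter = str.rsplit if from_end else str.split
    rows = [splitter(s, delimiter, maxsplit) for s in strings]
    width = max((len(r) for r in rows), default=0)
    # Turn the ragged rows into columns, filling gaps with None.
    return [[r[i] if i < len(r) else None for r in rows] for i in range(width)]


print(split_column_reference(["hello world", "goodbye", "well said"]))
print(split_column_reference(["up in arms"], n=2))
print(split_column_reference(["up in arms"], n=2, from_end=True))
```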
startswith(pat, devptr=0)

Return array of boolean values with True for the strings where the specified string is at the beginning.

Parameters:
pat : str

Pattern to find. Regular expressions are not accepted.

devptr : GPU memory pointer

Optional device memory pointer to hold the results. Memory must be able to hold at least size() of np.byte values.

Examples

import nvstrings
s = nvstrings.to_device(["hello","there","world"])

print(s.startswith('h'))

Output:

[True, False, False]
stof(devptr=0)

Returns float values represented by each string.

Parameters:
devptr : GPU memory pointer

Where resulting float values will be written. Memory must be able to hold at least size() of float32 values

Examples

import nvstrings
import numpy as np
from librmm_cffi import librmm
s = nvstrings.to_device(["1234","-876","543.2","-0.12",".55"])
print(s.stof())

Output:

[1234.0, -876.0, 543.2000122070312,
 -0.11999999731779099, 0.550000011920929]
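Because stof() produces float32 values, values such as 543.2000122070312 in the output above come from single-precision rounding, which shows once the value is widened to a Python float. The sketch below reproduces that rounding on the CPU with the struct module. It is an illustration only and does not call nvstrings.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


for text in ["1234", "-876", "543.2", "-0.12", ".55"]:
    print(text, "->", to_float32(float(text)))
```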
stoi(devptr=0)

Returns integer values represented by each string.

Parameters:
devptr : GPU memory pointer

Where resulting integer values will be written. Memory must be able to hold at least size() of int32 values.

Examples

import nvstrings
import numpy as np
s = nvstrings.to_device(["1234","-876","543.2","-0.12",".55"])
print(s.stoi())

Output:

[1234, -876, 543, 0, 0]
strip(to_strip=None)

Strip leading and trailing characters from each string.

Parameters:
to_strip : str

Characters to be removed from both ends of each string

Examples

import nvstrings

s = nvstrings.to_device(["oh, hello","goodbye"])
print(s.strip('o'))

Output:

['h, hell', 'goodbye']
sublist(indexes, count=0)

Return a sublist of strings from this instance.

Parameters:
indexes : List of ints or GPU memory pointer

0-based indexes of strings to return from an nvstrings object. If this parameter is a pointer to device memory, the count parameter is required.

count : int

Number of ints if the indexes parameter is a device pointer. Otherwise it is ignored.

Examples

import nvstrings
s = nvstrings.to_device(["hello","there","world"])

print(s.sublist([0, 2]))

Output:

['hello', 'world']
swapcase()

Change each lowercase character to uppercase and vice versa. This only applies to ASCII characters at this time.

Examples

import nvstrings

s = nvstrings.to_device(["Hello, Friend","Goodbye, Friend"])
print(s.swapcase())

Output:

['hELLO, fRIEND', 'gOODBYE, fRIEND']
title()

Uppercase the first letter of each word and lowercase the rest. This only applies to ASCII characters at this time.

Examples

import nvstrings

s = nvstrings.to_device(["Hello friend","goodnight moon"])
print(s.title())

Output:

['Hello Friend', 'Goodnight Moon']
to_host()

Copies strings back to CPU memory into a Python list.

Returns:
A list of strings

Examples

import nvstrings
s = nvstrings.to_device(["hello","world"])

h = s.upper().to_host()
print(h)

Output:

['HELLO', 'WORLD']
translate(table)

Translate individual characters to new characters using the provided table.

Parameters:
table : dict

Use str.maketrans() to build the mapping table. Unspecified characters are unchanged.

Examples

import nvstrings

s = nvstrings.to_device(["hello","there","world"])
print(s.translate(str.maketrans('elh','ELH')))

Output:

['HELLo', 'tHErE', 'worLd']
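The table produced by `str.maketrans()` is an ordinary dict mapping code points to code points. On the host, Python's `str.translate` with the same table gives the documented result, which is a convenient way to check a table before sending it to the GPU; this is a CPU-side comparison only, not the nvstrings implementation.

```python
table = str.maketrans("elh", "ELH")
# Keys and values are integer code points: 'e'->'E', 'l'->'L', 'h'->'H'.
print(table == {ord("e"): ord("E"), ord("l"): ord("L"), ord("h"): ord("H")})  # True

host = ["hello", "there", "world"]
translated = [x.translate(table) for x in host]
print(translated)  # ['HELLo', 'tHErE', 'worLd']
```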
upper()

Convert each string to uppercase. This only applies to ASCII characters at this time.

Examples

import nvstrings

s = nvstrings.to_device(["Hello, friend","Goodbye, friend"])
print(s.upper())

Output:

['HELLO, FRIEND', 'GOODBYE, FRIEND']
wrap(width)

Replaces whitespace with new-line characters so that each line is no more than width characters where possible. Lines are not truncated: a word longer than width stays whole on its own line.

Parameters:
width : int

The maximum number of characters per line in the new string. If a string is already no longer than width, no newlines are inserted.

Examples

import nvstrings

s = nvstrings.to_device(["hello there","goodbye all","well ok"])
print(s.wrap(3))

Output:

['hello\nthere', 'goodbye\nall', 'well\nok']
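The example output is consistent with a greedy line-filling rule: words are joined with a space while the line still fits in width, and a newline replaces the space otherwise. The sketch below models that rule on the CPU. `wrap_reference` is hypothetical, and the greedy strategy (as well as splitting only on the space character) is an assumption, not a description of the GPU implementation.

```python
def wrap_reference(s: str, width: int) -> str:
    """Hypothetical greedy model: turn spaces into newlines to keep lines
    within width; words are never split, so a long word may exceed width."""
    words = s.split(" ")
    lines = [words[0]]
    for word in words[1:]:
        # Keep the word on the current line if it still fits, else start a new one.
        if len(lines[-1]) + 1 + len(word) <= width:
            lines[-1] += " " + word
        else:
            lines.append(word)
    return "\n".join(lines)


print([wrap_reference(x, 3) for x in ["hello there", "goodbye all", "well ok"]])
# ['hello\nthere', 'goodbye\nall', 'well\nok']
print(repr(wrap_reference("hello there world", 11)))  # 'hello there\nworld'
```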
zfill(width)

Pads each string with leading zeros. For strings that begin with a number, a leading sign character ('+' or '-') is kept in front of the padding.

Parameters:
width : int

The minimum width, in characters, of each new string. If width is not larger than a string's length, that string is not padded.

Examples

import nvstrings

s = nvstrings.to_device(["hello","1234","-9876","+5.34"])
print(s.zfill(width=6))

Output:

['0hello', '001234', '-09876', '+05.34']
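Python's built-in `str.zfill` reproduces the documented output for this example, which makes it a convenient host-side check. Whether nvstrings treats a sign in front of non-numeric text the way Python does (Python turns '-abc' into '-0abc' at width 5) is not stated here, so the equivalence is an assumption beyond numeric strings.

```python
host = ["hello", "1234", "-9876", "+5.34"]

# Python's str.zfill keeps a leading '+' or '-' in front of the zeros.
padded = [x.zfill(6) for x in host]
print(padded)  # ['0hello', '001234', '-09876', '+05.34']

# Strings already at least `width` long are returned unchanged.
print("1234567".zfill(6))  # 1234567
```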

nvcategory

class nvcategory.nvcategory(cptr)

Instance manages a dictionary of unique strings (keys) in device memory and, for each original string, the index of its key (values).

Methods

add_strings(nvs) Create new category incorporating specified strings.
gather_strings(indexes[, count]) Return nvstrings instance represented using the specified indexes.
indexes_for_key(key[, devptr]) Return all index values for given key.
keys() Return the unique strings for this category as nvstrings instance.
keys_size() The number of keys.
remove_strings(nvs) Create new category without the specified strings.
size() The number of values.
to_strings() Return nvstrings instance represented by the values in this instance.
value(str) Return the category value for the given string.
value_for_index(idx) Return the category value for the given index.
values([devptr]) Return all values for this instance.
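Judging from the examples in this section, the keys are the sorted unique strings and each value is the position of the corresponding string's key. The plain-Python sketch below models that encoding on the host. `encode` is a hypothetical helper, not part of nvcategory, and sorted key order is an assumption taken from the examples.

```python
def encode(strings):
    """Hypothetical model of nvcategory: sorted unique keys + index values."""
    keys = sorted(set(strings))
    position = {k: i for i, k in enumerate(keys)}
    values = [position[s] for s in strings]
    return keys, values


keys, values = encode(["eee", "aaa", "eee", "dddd"])
print(keys)    # ['aaa', 'dddd', 'eee']
print(values)  # [2, 0, 2, 1]
```

In this model, keys_size() corresponds to len(keys) and size() to len(values).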
add_strings(nvs)

Create a new category incorporating the specified strings. This returns a new nvcategory with an updated set of keys; the index values of the new strings appear as if appended to the existing values.

Parameters:
nvs : nvstrings

New strings to be added.
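Under the same assumed encoding (sorted unique keys, values as key positions), adding strings behaves like re-encoding the original strings followed by the new ones: the keys are rebuilt and the values for the added strings follow the existing ones. The sketch below uses a hypothetical `encode` helper, defined inline so the block stands alone; that the real method produces exactly these values is an assumption based on the "as if appended" wording.

```python
def encode(strings):
    """Hypothetical model: sorted unique keys + per-string key positions."""
    keys = sorted(set(strings))
    position = {k: i for i, k in enumerate(keys)}
    return keys, [position[s] for s in strings]


original = ["eee", "aaa", "eee", "dddd"]
added = ["ggg", "eee", "aaa"]

# Re-encode the concatenation so the new values land after the old ones.
keys, values = encode(original + added)
print(keys)    # ['aaa', 'dddd', 'eee', 'ggg']
print(values)  # [2, 0, 2, 1, 3, 2, 0]
```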

Examples

import nvcategory, nvstrings
s1 = nvstrings.to_device(["eee","aaa","eee","dddd"])
s2 = nvstrings.to_device(["ggg","eee","aaa"])
c1 = nvcategory.from_strings(s1)
c2 = c1.add_strings(s2)
print(c1.keys())
print(c1.values())
print(c2.keys())
print(c2.values())

Output:

gather_strings(indexes, count=0)

Return nvstrings instance represented using the specified indexes.

Parameters:
indexes : List of ints or GPU memory pointer

0-based indexes of keys to return as an nvstrings object

count : int

Number of ints if indexes is a device pointer; otherwise it is ignored.

Returns:
nvstrings: strings list based on indexes

Examples

import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.keys())
print(c.values())
print(c.gather_strings([0,2,0]))

Output:

['aaa','dddd','eee']
[2, 0, 2, 1]
['aaa','eee','aaa']
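As the example shows, gather_strings indexes into the keys, not into the original strings: [0,2,0] selects 'aaa', 'eee', 'aaa'. A host-side model with plain lists (hypothetical, mirroring the example data) also shows that to_strings() amounts to the same gather applied to values().

```python
keys = ["aaa", "dddd", "eee"]
values = [2, 0, 2, 1]

# gather_strings([0, 2, 0]): look up each index in the keys list.
gathered = [keys[i] for i in [0, 2, 0]]
print(gathered)  # ['aaa', 'eee', 'aaa']

# to_strings(): gathering with values() restores the original strings.
restored = [keys[v] for v in values]
print(restored)  # ['eee', 'aaa', 'eee', 'dddd']
```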
indexes_for_key(key, devptr=0)

Return all index values for given key.

Parameters:
key : str

key whose index positions should be returned

devptr : GPU memory pointer

Where the index values will be written. The memory must be able to hold one int32 value for each occurrence of this key.

Examples

import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.indexes_for_key('aaa'))
print(c.indexes_for_key('eee'))

Output:

[1]
[0, 2]
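The result lists the positions in values() whose entry equals the key's own index, so its length is the number of occurrences of that key (which is also how large a devptr buffer must be). A hypothetical host-side model, using the example's keys and values:

```python
keys = ["aaa", "dddd", "eee"]
values = [2, 0, 2, 1]


def indexes_for_key_reference(key):
    """Hypothetical model: positions i where values[i] is the key's index."""
    k = keys.index(key)  # raises ValueError if key is absent (sketch behavior)
    return [i for i, v in enumerate(values) if v == k]


print(indexes_for_key_reference("aaa"))  # [1]
print(indexes_for_key_reference("eee"))  # [0, 2]
```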
keys()

Return the unique strings for this category as nvstrings instance.

Returns:
nvstrings: keys

Examples

import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.keys())

Output:

['aaa','dddd','eee']
keys_size()

The number of keys.

Returns:
int: number of keys

Examples

import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.keys())
print(c.keys_size())

Output:

['aaa','dddd','eee']
3
remove_strings(nvs)

Create a new category without the specified strings. The returned category will have a new set of keys and values.

Parameters:
nvs : nvstrings

strings to be removed.
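One plausible reading, sketched on the host under the same assumed encoding, is that every occurrence of a removed string is dropped and the remaining strings are re-encoded in their original order. This is an assumption; the text above does not specify what happens to the values of removed strings. `encode` is a hypothetical helper.

```python
def encode(strings):
    """Hypothetical model: sorted unique keys + per-string key positions."""
    keys = sorted(set(strings))
    position = {k: i for i, k in enumerate(keys)}
    return keys, [position[s] for s in strings]


original = ["eee", "aaa", "eee", "dddd"]
removed = {"aaa"}

# Drop every occurrence of a removed string, then rebuild keys and values.
keys, values = encode([s for s in original if s not in removed])
print(keys)    # ['dddd', 'eee']
print(values)  # [1, 1, 0]
```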

Examples

import nvcategory, nvstrings
s1 = nvstrings.to_device(["eee","aaa","eee","dddd"])
s2 = nvstrings.to_device(["aaa"])
c1 = nvcategory.from_strings(s1)
c2 = c1.remove_strings(s2)
print(c1.keys())
print(c1.values())
print(c2.keys())
print(c2.values())

Output:

size()

The number of values.

Returns:
int: number of values

Examples

import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.values())
print(c.size())

Output:

[2, 0, 2, 1]
4
to_strings()

Return nvstrings instance represented by the values in this instance.

Returns:
nvstrings: full strings list based on values indexes

Examples

import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.keys())
print(c.values())
print(c.to_strings())

Output:

['aaa','dddd','eee']
[2, 0, 2, 1]
['eee','aaa','eee','dddd']
value(str)

Return the category value for the given string.

Parameters:
str : str

key to retrieve

Examples

import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.value('aaa'))
print(c.value('eee'))

Output:

0
2
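Since the keys appear to be kept sorted, a string's category value can be found by binary search. The sketch below uses the standard `bisect` module on the example keys. `value_reference` is hypothetical, and returning -1 for a missing string is this sketch's own convention, not documented nvcategory behavior.

```python
import bisect

keys = ["aaa", "dddd", "eee"]


def value_reference(s):
    """Hypothetical: position of s among sorted keys, or -1 if absent."""
    i = bisect.bisect_left(keys, s)
    return i if i < len(keys) and keys[i] == s else -1


print(value_reference("aaa"))  # 0
print(value_reference("eee"))  # 2
print(value_reference("zzz"))  # -1
```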
value_for_index(idx)

Return the category value for the given index.

Parameters:
idx : int

0-based position in values() to look up

Examples

import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.value_for_index(3))

Output:

1
values(devptr=0)

Return all values for this instance.

Parameters:
devptr : GPU memory pointer

Where the index values will be written. The memory must be able to hold size() int32 values.

Examples

import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.values())

Output:

[2, 0, 2, 1]