nvstrings API Reference¶
nvstrings¶
-
class
nvstrings.
nvstrings
(cptr)¶ Instance manages a list of strings in device memory.
Operations are across all of the strings and their results reside in device memory. Strings in the list are immutable. Methods that modify any string will create a new nvstrings instance.
Methods
capitalize() Capitalize first character of each string.
cat([others, sep, na_rep]) Appends the given strings to this list of strings and returns as new nvstrings.
center(width[, fillchar]) Pad the beginning and end of each string to the minimum width.
compare(str[, devptr]) Compare each string to the supplied string.
contains(pat[, regex, devptr]) Find the specified string within each string.
count(pat[, devptr]) Count occurrences of pattern in each string.
endswith(pat[, devptr]) Return array of boolean values with True for the strings where the specified string is at the end.
extract(pat) Extract string from the first match of regular expression pat.
extract_column(pat) Extract string from the first match of regular expression pat.
find(sub[, start, end, devptr]) Find the specified string sub within each string.
find_from(sub[, starts, ends, devptr]) Find the specified string within each string starting at the specified character positions.
find_multiple(strs[, devptr]) Return a ‘matrix’ of find results for each of the strings in the strs parameter.
findall(pat) Find all occurrences of regular expression pattern in each string.
findall_column(pat) A new set of nvstrings is created by organizing substring results vertically.
get(i) Returns the character specified in each string as a new string.
hash([devptr]) Returns hash values represented by each string.
index(sub[, start, end, devptr]) Same as find but throws an error if sub is not found in all strings.
isalnum([devptr]) Return array of boolean values with True for strings that contain only alpha-numeric characters.
isalpha([devptr]) Return array of boolean values with True for strings that contain only alphabetic characters.
isdecimal([devptr]) Return array of boolean values with True for strings that contain only decimal characters – those that can be used to extract base10 numbers.
isdigit([devptr]) Return array of boolean values with True for strings that contain only decimal and digit characters.
islower([devptr]) Return array of boolean values with True for strings that contain only lowercase characters.
isnumeric([devptr]) Return array of boolean values with True for strings that contain only numeric characters.
isspace([devptr]) Return array of boolean values with True for strings that contain only whitespace characters.
isupper([devptr]) Return array of boolean values with True for strings that contain only uppercase characters.
join([sep]) Concatenate this list of strings into a single string.
len([devptr]) Returns the number of characters of each string.
ljust(width[, fillchar]) Pad the end of each string to the minimum width.
lower() Convert each string to lowercase.
lstrip([to_strip]) Strip leading characters from each string.
match(pat[, devptr]) Return array of boolean values where True is set if the specified pattern matches the beginning of the corresponding string.
order(stype[, asc, devptr]) Sort this list by name (2) or length (1) or both (3).
pad(width[, side, fillchar]) Add specified padding to each string.
partition([delimiter]) Each string is split into two strings on the first delimiter found.
remove_strings(indexes[, count]) Remove the specified strings and return a new instance.
repeat(repeats) Appends each string with itself the specified number of times.
replace(pat, repl[, n, regex]) Replace a string (pat) in each string with another string (repl).
rfind(sub[, start, end, devptr]) Find the specified string within each string.
rindex(sub[, start, end, devptr]) Same as rfind but throws an error if sub is not found in all strings.
rjust(width[, fillchar]) Pad the beginning of each string to the minimum width.
rpartition([delimiter]) Each string is split into two strings on the first delimiter found.
rsplit([delimiter, n]) Returns an array of nvstrings each representing the split of each individual string.
rsplit_column([delimiter, n]) A new set of columns (nvstrings) is created by splitting the strings vertically.
rstrip([to_strip]) Strip trailing characters from each string.
size() The number of strings managed by this instance.
slice(start[, stop, step]) Returns a substring of each string.
slice_from([starts, stops]) Return substring of each string using positions for each string.
slice_replace([start, stop, repl]) Replace the specified section of each string with a new string.
sort(stype[, asc]) Sort this list by name (2) or length (1) or both (3).
split([delimiter, n]) Returns an array of nvstrings each representing the split of each individual string.
split_column([delimiter, n]) A new set of columns (nvstrings) is created by splitting the strings vertically.
startswith(pat[, devptr]) Return array of boolean values with True for the strings where the specified string is at the beginning.
stof([devptr]) Returns float values represented by each string.
stoi([devptr]) Returns integer value represented by each string.
strip([to_strip]) Strip leading and trailing characters from each string.
sublist(indexes[, count]) Return a sublist of strings from this instance.
swapcase() Change each lowercase character to uppercase and vice versa.
title() Uppercase the first letter of each word and lowercase the rest.
to_host() Copies strings back to CPU memory into a Python array.
translate(table) Translate individual characters to new characters using the provided table.
upper() Convert each string to uppercase.
wrap(width) This will place new-line characters in whitespace so each line is no more than width characters.
zfill(width) Pads the strings with leading zeros.
-
capitalize
()¶ Capitalize first character of each string. This only applies to ASCII characters at this time.
Examples
import nvstrings
s = nvstrings.to_device(["hello, friend","goodbye, friend"])
print(s.capitalize())
Output:
['Hello, friend', 'Goodbye, friend']
-
cat
(others=None, sep=None, na_rep=None)¶ Appends the given strings to this list of strings and returns as new nvstrings.
Parameters: - others : List of str
Strings to be appended. The number of strings must match size() of this instance. This must be either a Python array of strings or another nvstrings instance.
- sep : str
If specified, this separator will be appended to each string before appending the others.
- na_rep : char
This character will take the place of any null strings (not empty strings) in either list.
Examples
import nvstrings
s1 = nvstrings.to_device(['hello', None,'goodbye'])
s2 = nvstrings.to_device(['world','globe', None])
print(s1.cat(s2,sep=':', na_rep='_'))
Output:
["hello:world","_:globe","goodbye:_"]
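The null handling described for na_rep can be mirrored on the CPU. The following is a minimal pure-Python sketch, not the library's implementation: `cat_reference` is a hypothetical helper, and returning None when na_rep is omitted is an assumption modeled on pandas-style behavior.

```python
def cat_reference(strs, others, sep=None, na_rep=None):
    """CPU sketch of element-wise concatenation with null replacement."""
    if len(strs) != len(others):
        raise ValueError("both lists must contain the same number of strings")
    sep = "" if sep is None else sep
    result = []
    for left, right in zip(strs, others):
        if na_rep is None and (left is None or right is None):
            # Assumption: without na_rep, a null on either side yields a null.
            result.append(None)
            continue
        # Nulls (not empty strings) are replaced before concatenating.
        left = na_rep if left is None else left
        right = na_rep if right is None else right
        result.append(left + sep + right)
    return result


joined = cat_reference(['hello', None, 'goodbye'],
                       ['world', 'globe', None],
                       sep=':', na_rep='_')
print(joined)
```

The sketch reproduces the example output above for the same inputs.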
-
center
(width, fillchar=' ')¶ Pad the beginning and end of each string to the minimum width.
Parameters: - width : int
The minimum width of characters of the new string. If the width is smaller than the existing string, no padding is performed.
- fillchar : char
The character used to do the padding. Default is space character. Only the first character is used.
Examples
import nvstrings
s = nvstrings.to_device(["hello","goodbye","well"])
print(s.center(width=6))
Output:
['hello ', 'goodbye', ' well ']
-
compare
(str, devptr=0)¶ Compare each string to the supplied string. Returns value of 0 for strings that match str. Returns < 0 when first different character is lower than argument string or argument string is shorter. Returns > 0 when first different character is greater than the argument string or the argument string is longer.
Parameters: - str : str
String to compare against each string in this instance.
- devptr : GPU memory pointer
Where string result values will be written. Must be able to hold at least size() of int32 values.
Examples
import nvstrings
s = nvstrings.to_device(["hello","world"])
print(s.compare('hello'))
Output:
[0,15]
-
contains
(pat, regex=True, devptr=0)¶ Find the specified string within each string.
Default expects regex pattern. Returns an array of boolean values where True if pat is found, False if not.
Parameters: - pat : str
Pattern or string to search for in each string of this instance.
- regex : bool
If True, pat is interpreted as a regex string. If False, pat is a string to be searched for in each instance.
- devptr : GPU memory pointer
Optional device memory pointer to hold the results. Must be able to hold at least size() of np.byte values.
Examples
import nvstrings
s = nvstrings.to_device(["hello","there","world"])
print(s.contains('o'))
Output:
[True, False, True]
-
count
(pat, devptr=0)¶ Count occurrences of pattern in each string.
Parameters: - pat : str
Pattern to find
- devptr : GPU memory pointer
Optional device memory pointer to hold the results. Memory must be able to hold at least size() of int32 values.
-
endswith
(pat, devptr=0)¶ Return array of boolean values with True for the strings where the specified string is at the end.
Parameters: - pat : str
Pattern to find. Regular expressions are not accepted.
- devptr : GPU memory pointer
Optional device memory pointer to hold the results. Memory must be able to hold at least size() of np.byte values.
Examples
import nvstrings
s = nvstrings.to_device(["hello","there","world"])
print(s.endswith('d'))
Output:
[False, False, True]
-
extract
(pat)¶ Extract string from the first match of regular expression pat. A new array of nvstrings is created for each string in this instance.
Parameters: - pat : str
The regex pattern with group capture syntax
Examples
import nvstrings
s = nvstrings.to_device(["a1","b2","c3"])
for result in s.extract(r'([ab])(\d)'):
    print(result)
Output:
["a","1"]
["b","2"]
[None,None]
-
extract_column
(pat)¶ Extract string from the first match of regular expression pat. A new array of nvstrings is created by organizing group results vertically.
Parameters: - pat : str
The regex pattern with group capture syntax
Examples
import nvstrings
s = nvstrings.to_device(["a1","b2","c3"])
for result in s.extract_column(r'([ab])(\d)'):
    print(result)
Output:
["a","b",None]
["1","2",None]
-
find
(sub, start=0, end=None, devptr=0)¶ Find the specified string sub within each string. Return -1 for those strings where sub is not found.
Parameters: - sub : str
String to find
- start : int
Beginning of section to search. Default is beginning of each string.
- end : int
End of section to search. Default is end of each string.
- devptr : GPU memory pointer
Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of int32 values.
Examples
import nvstrings
s = nvstrings.to_device(["hello","there","world"])
print(s.find('o'))
Output:
[4,-1,1]
-
find_from
(sub, starts=0, ends=0, devptr=0)¶ Find the specified string within each string starting at the specified character positions.
The starts and ends parameters are device memory pointers. If specified, each must contain size() of int32 values.
Returns -1 for those strings where sub is not found.
Parameters: - sub : str
String to find
- starts : GPU memory pointer
Pointer to GPU array of int32 values of beginning of sections to search, one per string.
- ends : GPU memory pointer
Pointer to GPU array of int32 values of end of sections to search. Use -1 to specify to the end of that string.
- devptr : GPU memory pointer
Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of int32 values.
Examples
import nvstrings
import numpy as np
from numba import cuda
s = nvstrings.to_device(["hello","there"])
darr = cuda.to_device(np.asarray([2,3],dtype=np.int32))
print(s.find_from('e',starts=darr.device_ctypes_pointer.value))
Output:
[-1,4]
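The per-string search window can be checked on the CPU without a GPU. This is a rough pure-Python sketch of the behavior described above, not the library's code: `find_from_reference` is a hypothetical helper, and plain Python lists stand in for the device arrays.

```python
def find_from_reference(strs, sub, starts=None, ends=None):
    """CPU sketch: search each string inside its own [start, end) window."""
    results = []
    for i, s in enumerate(strs):
        start = 0 if starts is None else starts[i]
        end = -1 if ends is None else ends[i]
        # -1 means "search to the end of that string".
        stop = len(s) if end < 0 else end
        results.append(s.find(sub, start, stop))
    return results


positions = find_from_reference(["hello", "there"], "e", starts=[2, 3])
print(positions)
```

For "hello" the search begins at position 2, past the only 'e', so the result is -1; for "there" the search begins at position 3 and finds the final 'e' at position 4.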
-
find_multiple
(strs, devptr=0)¶ Return a ‘matrix’ of find results for each of the strings in the strs parameter.
Each row is an array of integers identifying the first location of the corresponding provided string.
Parameters: - strs : nvstrings
Strings to find in each of the strings in this instance.
- devptr : GPU memory pointer
Optional device memory pointer to hold the results.
Memory size must be able to hold at least size()*strs.size() of int32 values.
Examples
import nvstrings
s = nvstrings.to_device(["hare","bunny","rabbit"])
t = nvstrings.to_device(["a","e","i","o","u"])
print(s.find_multiple(t))
Output:
[[1, 3, -1, -1, -1], [-1, -1, -1, -1, 1], [1, -1, 4, -1, -1]]
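The shape of the result is easier to see in a small CPU sketch. The helper below is hypothetical and only mirrors the documented semantics with Python's `str.find`; the row-major flattening shown for a devptr buffer is an assumption, not something this reference states.

```python
def find_multiple_reference(strs, targets):
    """One row per source string, one column per target string."""
    return [[s.find(t) for t in targets] for s in strs]


strs = ["hare", "bunny", "rabbit"]
targets = ["a", "e", "i", "o", "u"]
matrix = find_multiple_reference(strs, targets)
print(matrix)

# Assumption: a device buffer would hold the same values flattened row by row,
# which is why it needs size() * strs.size() int32 slots.
flat = [value for row in matrix for value in row]
print(len(flat))
```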
-
findall
(pat)¶ Find all occurrences of regular expression pattern in each string. A new array of nvstrings is created for each string in this instance.
Parameters: - pat : str
The regex pattern used to search for substrings
Examples
import nvstrings
s = nvstrings.to_device(["hare","bunny","rabbit"])
for result in s.findall('[ab]'):
    print(result)
Output:
["a"]
["b"]
["a","b","b"]
-
findall_column
(pat)¶ A new set of nvstrings is created by organizing substring results vertically.
Parameters: - pat : str
The regex pattern to search for substrings
Examples
import nvstrings
s = nvstrings.to_device(["hare","bunny","rabbit"])
for result in s.findall_column('[ab]'):
    print(result)
Output:
["a","b","a"]
[None,None,"b"]
[None,None,"b"]
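The difference between the row-wise result of findall and the vertical layout of findall_column amounts to a transpose with null padding. A small sketch using Python's `re` module (an illustration only; the library's regex engine may differ in supported syntax):

```python
import re
from itertools import zip_longest

strs = ["hare", "bunny", "rabbit"]

# Row-wise: one list of matches per source string (like findall).
rows = [re.findall('[ab]', s) for s in strs]

# Column-wise: the k-th match of every string, padded with None
# where a string has fewer than k matches (like findall_column).
columns = [list(col) for col in zip_longest(*rows, fillvalue=None)]

print(rows)
print(columns)
```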
-
get
(i)¶ Returns the character specified in each string as a new string.
The nvstrings returned contains a list of single character strings.
Parameters: - i : int
The character position identifying the character in each string to return.
Examples
import nvstrings
s = nvstrings.to_device(["hello world","goodbye","well said"])
print(s.get(0))
Output:
['h', 'g', 'w']
-
hash
(devptr=0)¶ Returns hash values represented by each string.
Parameters: - devptr : GPU memory pointer
Where string hash values will be written. Must be able to hold at least size() of uint32 values.
Examples
import nvstrings
s = nvstrings.to_device(["hello","world"])
print(s.hash())
Output:
[99162322, 113318802]
-
index
(sub, start=0, end=None, devptr=0)¶ Same as find but throws an error if sub is not found in all strings.
Parameters: - sub : str
String to find
- start : int
Beginning of section to search. Default is beginning of each string.
- end : int
End of section to search. Default is end of each string.
- devptr : GPU memory pointer
Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of int32 values.
Examples
import nvstrings
s = nvstrings.to_device(["hello","world"])
print(s.index('l'))
Output:
[2,3]
-
isalnum
(devptr=0)¶ Return array of boolean values with True for strings that contain only alpha-numeric characters. Equivalent to: isalpha() or isdigit() or isnumeric() or isdecimal()
Examples
import nvstrings
s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isalnum())
Output:
[True, True, False, False, False, False]
-
isalpha
(devptr=0)¶ Return array of boolean values with True for strings that contain only alphabetic characters.
Examples
import nvstrings
s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isalpha())
Output:
[False, True, False, False, False, False]
-
isdecimal
(devptr=0)¶ Return array of boolean values with True for strings that contain only decimal characters – those that can be used to extract base10 numbers.
Examples
import nvstrings
s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isdecimal())
Output:
[True, False, False, False, False, False]
-
isdigit
(devptr=0)¶ Return array of boolean values with True for strings that contain only decimal and digit characters.
Examples
import nvstrings
s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isdigit())
Output:
[True, False, False, False, False, False]
-
islower
(devptr=0)¶ Return array of boolean values with True for strings that contain only lowercase characters.
Examples
import nvstrings
s = nvstrings.to_device(['hello', 'Goodbye'])
print(s.islower())
Output:
[True, False]
-
isnumeric
(devptr=0)¶ Return array of boolean values with True for strings that contain only numeric characters. These include digit and numeric characters.
Examples
import nvstrings
s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isnumeric())
Output:
[True, False, False, False, False, False]
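The ASCII samples above give identical results for isdecimal, isdigit and isnumeric, so they do not show how the three categories differ. The sketch below uses Python's built-in `str` methods on a superscript two and a vulgar fraction to illustrate the usual Unicode distinction; whether nvstrings classifies these particular characters the same way is an assumption that should be verified on the device.

```python
# '\u00b2' is SUPERSCRIPT TWO, '\u00bd' is VULGAR FRACTION ONE HALF.
samples = ['1234', '\u00b2', '\u00bd', '1.75']

table = {
    name: [getattr(s, name)() for s in samples]
    for name in ('isdecimal', 'isdigit', 'isnumeric')
}

for name, flags in table.items():
    print(name, flags)
```

In Python's classification each category is a superset of the previous one: decimal characters are digits, and digits are numeric.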
-
isspace
(devptr=0)¶ Return array of boolean values with True for strings that contain only whitespace characters.
Examples
import nvstrings
s = nvstrings.to_device(['1234', 'de', '1.75', '-34', '+9.8', ' '])
print(s.isspace())
Output:
[False, False, False, False, False, True]
-
isupper
(devptr=0)¶ Return array of boolean values with True for strings that contain only uppercase characters.
Examples
import nvstrings
s = nvstrings.to_device(['hello', 'Goodbye'])
print(s.isupper())
Output:
[False, True]
-
join
(sep='')¶ Concatenate this list of strings into a single string.
Parameters: - sep : str
This separator will be appended to each string before appending the next.
Examples
import nvstrings
s = nvstrings.to_device(["hello","goodbye"])
print(s.join(sep=':'))
Output:
['hello:goodbye']
-
len
(devptr=0)¶ Returns the number of characters of each string.
Parameters: - devptr : GPU memory pointer
Where string length values will be written. Must be able to hold at least size() of int32 values.
Examples
import nvstrings
import numpy as np
from librmm_cffi import librmm

# example passing device memory pointer
s = nvstrings.to_device(["abc","d","ef"])
arr = np.arange(s.size(),dtype=np.int32)
d_arr = librmm.to_device(arr)
s.len(d_arr.device_ctypes_pointer.value)
print(d_arr.copy_to_host())
Output:
[3,1,2]
-
ljust
(width, fillchar=' ')¶ Pad the end of each string to the minimum width.
Parameters: - width : int
The minimum width of characters of the new string. If the width is smaller than the existing string, no padding is performed.
- fillchar : char
The character used to do the padding. Default is space character. Only the first character is used.
Examples
import nvstrings
s = nvstrings.to_device(["hello","goodbye","well"])
print(s.ljust(width=6))
Output:
['hello ', 'goodbye', 'well  ']
-
lower
()¶ Convert each string to lowercase. This only applies to ASCII characters at this time.
Examples
import nvstrings
s = nvstrings.to_device(["Hello, Friend","Goodbye, Friend"])
print(s.lower())
Output:
['hello, friend', 'goodbye, friend']
-
lstrip
(to_strip=None)¶ Strip leading characters from each string.
Parameters: - to_strip : str
Characters to be removed from leading edge of each string
Examples
import nvstrings
s = nvstrings.to_device(["oh","hello","goodbye"])
print(s.lstrip('o'))
Output:
['h', 'hello', 'goodbye']
-
match
(pat, devptr=0)¶ Return array of boolean values where True is set if the specified pattern matches the beginning of the corresponding string.
Parameters: - pat : str
Pattern to find
- devptr : GPU memory pointer
Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of np.byte values.
Examples
import nvstrings
s = nvstrings.to_device(["hello","there","world"])
print(s.match('h'))
Output:
[True, False, False]
-
order
(stype, asc=True, devptr=0)¶ Sort this list by name (2) or length (1) or both (3). This sort only provides the new indexes and does not reorder the managed strings.
Parameters: - stype : int
Type of sort to use.
If stype is 1, strings will be sorted by length
If stype is 2, strings will be sorted alphabetically by name
If stype is 3, strings will be sorted by length and then alphabetically
- asc : bool
Whether to sort ascending (True) or descending (False)
- devptr : GPU memory pointer
Where index values will be written. Must be able to hold at least size() of int32 values.
Examples
import nvstrings
s = nvstrings.to_device(["aaa", "bb", "aaaabb"])
print(s.order(2))
Output:
[0, 2, 1]
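Because order returns index positions rather than reordered strings, it behaves like an argsort. A minimal CPU sketch of the three stype modes follows; `order_reference` is a hypothetical helper, and comparing strings by Python's default code point ordering is an assumption about how "by name" is evaluated.

```python
def order_reference(strs, stype, asc=True):
    """Return the indexes that would sort strs, without reordering strs."""
    keys = {
        1: lambda i: len(strs[i]),             # by length
        2: lambda i: strs[i],                  # by name
        3: lambda i: (len(strs[i]), strs[i]),  # by length, then by name
    }
    return sorted(range(len(strs)), key=keys[stype], reverse=not asc)


strs = ["aaa", "bb", "aaaabb"]
by_length = order_reference(strs, 1)
by_name = order_reference(strs, 2)
by_both = order_reference(strs, 3)
print(by_length, by_name, by_both)

# Gathering with the returned indexes gives the sorted list.
print([strs[i] for i in by_both])
```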
-
pad
(width, side='left', fillchar=' ')¶ Add specified padding to each string. Side:{‘left’,’right’,’both’}, default is ‘left’.
Parameters: - width : int
The minimum width of characters of the new string. If the width is smaller than the existing string, no padding is performed.
- side : str
Either one of “left”, “right”, “both”. The default is “left”
“left” performs a padding on the left – same as rjust()
“right” performs a padding on the right – same as ljust()
“both” performs equal padding on left and right – same as center()
- fillchar : char
The character used to do the padding. Default is space character. Only the first character is used.
Examples
import nvstrings
s = nvstrings.to_device(["hello","goodbye","well"])
print(s.pad(8, side='left'))
Output:
["   hello"," goodbye","    well"]
-
partition
(delimiter=' ')¶ Each string is split into two strings on the first delimiter found.
Three strings are returned for each string: beginning, delimiter, end.
Parameters: - delimiter : str
The character used to locate the split points of each string. Default is space.
Examples
import nvstrings
strs = nvstrings.to_device(["hello world","goodbye","up in arms"])
for s in strs.partition(' '):
    print(s)
Output:
['hello', ' ', 'world']
['goodbye', '', '']
['up', ' ', 'in arms']
-
remove_strings
(indexes, count=0)¶ Remove the specified strings and return a new instance.
Parameters: - indexes : List of ints
0-based indexes of strings to remove from an nvstrings object. If this parameter is a pointer to device memory, the count parameter is required.
- count : int
Number of ints if indexes parm is a device pointer. Otherwise it is ignored.
Examples
import nvstrings
s = nvstrings.to_device(["hello","there","world"])
print(s.remove_strings([0, 2]))
Output:
['there']
-
repeat
(repeats)¶ Appends each string with itself the specified number of times. This returns a nvstrings instance with the new strings.
Parameters: - repeats : int
The number of times each string should be repeated. Repeat count of 0 or 1 will just return copy of each string.
Examples
import nvstrings
s = nvstrings.to_device(["hello","goodbye","well"])
print(s.repeat(2))
Output:
['hellohello', 'goodbyegoodbye', 'wellwell']
-
replace
(pat, repl, n=-1, regex=True)¶ Replace a string (pat) in each string with another string (repl).
Parameters: - pat : str
String to be replaced. This can also be a regex expression – not a compiled regex.
- repl : str
String to replace pat with.
Examples
import nvstrings
s = nvstrings.to_device(["hello","goodbye"])
print(s.replace('e', ''))
Output:
['hllo', 'goodby']
-
rfind
(sub, start=0, end=None, devptr=0)¶ Find the specified string within each string. Search from the end of the string.
Return -1 for those strings where sub is not found.
Parameters: - sub : str
String to find
- start : int
Beginning of section to search. Default is beginning of each string.
- end : int
End of section to search. Default is end of each string.
- devptr : GPU memory pointer
Optional device memory pointer to hold the results.
Examples
import nvstrings
s = nvstrings.to_device(["hello","there","world"])
print(s.rfind('o'))
Output:
[4, -1, 1]
-
rindex
(sub, start=0, end=None, devptr=0)¶ Same as rfind but throws an error if sub is not found in all strings.
Parameters: - sub : str
String to find
- start : int
Beginning of section to search. Default is beginning of each string.
- end : int
End of section to search. Default is end of each string.
- devptr : GPU memory pointer
Optional device memory pointer to hold the results. Memory size must be able to hold at least size() of int32 values.
Examples
import nvstrings
s = nvstrings.to_device(["hello","world"])
print(s.rindex('l'))
Output:
[3,3]
-
rjust
(width, fillchar=' ')¶ Pad the beginning of each string to the minimum width.
Parameters: - width : int
The minimum width of characters of the new string. If the width is smaller than the existing string, no padding is performed.
- fillchar : char
The character used to do the padding. Default is space character. Only the first character is used.
Examples
import nvstrings
s = nvstrings.to_device(["hello","goodbye","well"])
print(s.rjust(width=6))
Output:
[' hello', 'goodbye', '  well']
-
rpartition
(delimiter=' ')¶ Each string is split into two strings on the first delimiter found. Delimiter is searched for from the end.
Three strings are returned for each string: beginning, delimiter, end.
Parameters: - delimiter : str
The character used to locate the split points of each string. Default is space.
Examples
import nvstrings
strs = nvstrings.to_device(["hello world","goodbye","up in arms"])
for s in strs.rpartition(' '):
    print(s)
Output:
['hello', ' ', 'world']
['', '', 'goodbye']
['up in', ' ', 'arms']
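The partition and rpartition examples in this reference match what Python's built-in `str.partition` and `str.rpartition` produce for the same inputs, which makes a quick CPU comparison possible. This sketch only uses the standard library; treating the built-ins as a reference for other inputs is an assumption.

```python
strs = ["hello world", "goodbye", "up in arms"]

# Split on the first delimiter, searching from the beginning.
from_start = [list(s.partition(' ')) for s in strs]

# Split on the first delimiter found when searching from the end.
from_end = [list(s.rpartition(' ')) for s in strs]

for row in from_start:
    print(row)
for row in from_end:
    print(row)
```

Note where the unsplit string lands when no delimiter is present: in the first slot for partition and in the last slot for rpartition.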
-
rsplit
(delimiter=None, n=-1)¶ Returns an array of nvstrings each representing the split of each individual string. The delimiter is searched for from the end of each string.
Parameters: - delimiter : str
The character used to locate the split points of each string. Default is space.
- n : int
Maximum number of strings to return for each split.
Examples
import nvstrings
strs = nvstrings.to_device(["hello world","goodbye","up in arms"])
for s in strs.rsplit(' ',2):
    print(s)
Output:
['hello', 'world']
['goodbye']
['up in', 'arms']
-
rsplit_column
(delimiter=' ', n=-1)¶ A new set of columns (nvstrings) is created by splitting the strings vertically. Delimiter is searched from the end.
Parameters: - delimiter : str
The character used to locate the split points of each string. Default is space.
Examples
import nvstrings
s = nvstrings.to_device(["hello world","goodbye","well said"])
for result in s.rsplit_column(' '):
    print(result)
Output:
["hello","goodbye","well"]
["world",None,"said"]
-
rstrip
(to_strip=None)¶ Strip trailing characters from each string.
Parameters: - to_strip : str
Characters to be removed from trailing edge of each string
Examples
import nvstrings
s = nvstrings.to_device(["oh","hello","goodbye"])
print(s.rstrip('o'))
Output:
['oh', 'hell', 'goodbye']
-
size
()¶ The number of strings managed by this instance.
Returns: - int: number of strings
Examples
import nvstrings
s = nvstrings.to_device(["hello","world"])
print(s.size())
Output:
2
-
slice
(start, stop=None, step=None)¶ Returns a substring of each string.
Parameters: - start : int
Beginning position of the string to extract. Default is beginning of each string.
- stop : int
Ending position of the string to extract. Default is end of each string.
- step : int
Interval between the characters to capture within the specified section. Default is every character.
Examples
import nvstrings
s = nvstrings.to_device(["hello","goodbye"])
print(s.slice(2,5))
Output:
['llo', 'odb']
-
slice_from
(starts=0, stops=0)¶ Return substring of each string using positions for each string.
The starts and stops parameters are device memory pointers. If specified, each must contain size() of int32 values.
Parameters: - starts : GPU memory pointer
Beginning position of each string to extract. Default is beginning of each string.
- stops : GPU memory pointer
Ending position of each string to extract. Default is end of each string. Use -1 to specify to the end of that string.
Examples
import nvstrings
import numpy as np
from numba import cuda
s = nvstrings.to_device(["hello","there"])
darr = cuda.to_device(np.asarray([2,3],dtype=np.int32))
print(s.slice_from(starts=darr.device_ctypes_pointer.value))
Output:
['llo','re']
-
slice_replace
(start=None, stop=None, repl=None)¶ Replace the specified section of each string with a new string.
Parameters: - start : int
Beginning position of the string to replace. Default is beginning of each string.
- stop : int
Ending position of the string to replace. Default is end of each string.
- repl : str
String to insert into the specified position values.
Examples
import nvstrings
strs = nvstrings.to_device(["abcdefghij","0123456789"])
print(strs.slice_replace(2,5,'z'))
Output:
['abzfghij', '01z56789']
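The replacement can be read as "keep everything before start, insert repl, keep everything from stop onward". A short CPU sketch of that reading follows; `slice_replace_reference` is a hypothetical helper, and treating a missing repl as an empty string is an assumption.

```python
def slice_replace_reference(strs, start=None, stop=None, repl=None):
    """CPU sketch: replace characters in [start, stop) of each string."""
    repl = "" if repl is None else repl  # assumption: no repl deletes the section
    result = []
    for s in strs:
        begin = 0 if start is None else start
        end = len(s) if stop is None else stop
        result.append(s[:begin] + repl + s[end:])
    return result


replaced = slice_replace_reference(["abcdefghij", "0123456789"], 2, 5, 'z')
print(replaced)
```

Positions 2, 3 and 4 of each string are removed and the single character 'z' takes their place, so each result is two characters shorter than its input.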
-
sort
(stype, asc=True)¶ Sort this list by name (2) or length (1) or both (3). Sorting can help improve performance for other operations.
Parameters: - stype : int
Type of sort to use.
If stype is 1, strings will be sorted by length
If stype is 2, strings will be sorted alphabetically by name
If stype is 3, strings will be sorted by length and then alphabetically
- asc : bool
Whether to sort ascending (True) or descending (False)
Examples
import nvstrings
s = nvstrings.to_device(["aaa", "bb", "aaaabb"])
print(s.sort(3))
Output:
['bb', 'aaa', 'aaaabb']
-
split
(delimiter=None, n=-1)¶ Returns an array of nvstrings each representing the split of each individual string.
Parameters: - delimiter : str
The character used to locate the split points of each string. Default is space.
- n : int
Maximum number of strings to return for each split.
Examples
import nvstrings
s = nvstrings.to_device(["hello world","goodbye","well said"])
for result in s.split(' '):
    print(result)
Output:
["hello","world"]
["goodbye"]
["well","said"]
-
split_column
(delimiter=' ', n=-1)¶ A new set of columns (nvstrings) is created by splitting the strings vertically.
Parameters: - delimiter : str
The character used to locate the split points of each string. Default is space.
Examples
import nvstrings
s = nvstrings.to_device(["hello world","goodbye","well said"])
for result in s.split_column(' '):
    print(result)
Output:
["hello","goodbye","well"]
["world",None,"said"]
-
startswith
(pat, devptr=0)¶ Return array of boolean values with True for the strings where the specified string is at the beginning.
Parameters: - pat : str
Pattern to find. Regular expressions are not accepted.
- devptr : GPU memory pointer
Optional device memory pointer to hold the results. Memory must be able to hold at least size() of np.byte values.
Examples
import nvstrings
s = nvstrings.to_device(["hello","there","world"])
print(s.startswith('h'))
Output:
[True, False, False]
-
stof
(devptr=0)¶ Returns float values represented by each string.
Parameters: - devptr : GPU memory pointer
Where resulting float values will be written. Memory must be able to hold at least size() of float32 values
Examples
import nvstrings
s = nvstrings.to_device(["1234","-876","543.2","-0.12",".55"])
print(s.stof())
Output:
[1234.0, -876.0, 543.2000122070312, -0.11999999731779099, 0.550000011920929]
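The long decimal tails in the output above are what single-precision values look like once they are widened to Python's double-precision float. The effect can be reproduced on the CPU with the standard `struct` module; this sketch illustrates the rounding only and says nothing about how the library parses the strings. `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest float32, then widen it back."""
    return struct.unpack('f', struct.pack('f', value))[0]


widened = [to_float32(float(s)) for s in ["1234", "-876", "543.2", "-0.12", ".55"]]
print(widened)
```

Values such as 1234 and -876 are exactly representable in float32 and survive unchanged, while 543.2 is rounded to the nearest multiple of 2**-14.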
-
stoi
(devptr=0)¶ Returns integer value represented by each string.
Parameters: - devptr : GPU memory pointer
Where resulting integer values will be written. Memory must be able to hold at least size() of int32 values.
Examples
import nvstrings
s = nvstrings.to_device(["1234","-876","543.2","-0.12",".55"])
print(s.stoi())
Output:
[1234, -876, 543, 0, 0]
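The example output suggests that stoi reads an optional sign followed by leading digits and stops at the first other character. The sketch below encodes that reading with a regular expression. It is inferred from the example alone: the helper name is hypothetical, and returning 0 when no leading digits are present is an assumption.

```python
import re

# Optional sign followed by one or more digits at the start of the string.
_LEADING_INT = re.compile(r'[+-]?\d+')


def stoi_reference(strs):
    """CPU sketch: parse the leading integer of each string, else 0."""
    values = []
    for s in strs:
        m = _LEADING_INT.match(s)
        values.append(int(m.group()) if m else 0)
    return values


parsed = stoi_reference(["1234", "-876", "543.2", "-0.12", ".55"])
print(parsed)
```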
-
strip
(to_strip=None)¶ Strip leading and trailing characters from each string.
Parameters: - to_strip : str
Characters to be removed from both ends of each string
Examples
import nvstrings
s = nvstrings.to_device(["oh, hello","goodbye"])
print(s.strip('o'))
Output:
['h, hell', 'goodbye']
-
sublist
(indexes, count=0)¶ Return a sublist of strings from this instance.
Parameters: - indexes : List of ints or GPU memory pointer
0-based indexes of strings to return from an nvstrings object
- count : int
Number of ints if indexes parm is a device pointer. Otherwise it is ignored.
Examples
import nvstrings
s = nvstrings.to_device(["hello","there","world"])
print(s.sublist([0, 2]))
Output:
['hello', 'world']
-
swapcase
()¶ Change each lowercase character to uppercase and vice versa. This only applies to ASCII characters at this time.
Examples
import nvstrings
s = nvstrings.to_device(["Hello, Friend","Goodbye, Friend"])
print(s.swapcase())
Output:
['hELLO, fRIEND', 'gOODBYE, fRIEND']
-
title
()¶ Uppercase the first letter of each word and lowercase the rest. This only applies to ASCII characters at this time.
Examples
import nvstrings
s = nvstrings.to_device(["Hello friend","goodnight moon"])
print(s.title())
Output:
['Hello Friend', 'Goodnight Moon']
-
to_host
()¶ Copies strings back to CPU memory into a Python array.
Returns: - A list of strings
Examples
import nvstrings
s = nvstrings.to_device(["hello","world"])
h = s.upper().to_host()
print(h)
Output:
["HELLO","WORLD"]
-
translate
(table)¶ Translate individual characters to new characters using the provided table.
Parameters: - table : dict
Use str.maketrans() to build the mapping table. Unspecified characters are unchanged.
Examples
import nvstrings
s = nvstrings.to_device(["hello","there","world"])
print(s.translate(str.maketrans('elh','ELH')))
Output:
['HELLo', 'tHErE', 'worLd']
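For reference, the table built by `str.maketrans()` is a plain dict that maps Unicode code points to their replacements. The sketch below uses only Python's built-in `str.translate` on the host (no nvstrings call is made) to show the table contents and the same per-character mapping as the example above.

```python
# Build the mapping table: 'e' -> 'E', 'l' -> 'L', 'h' -> 'H'.
table = str.maketrans("elh", "ELH")
print(table)  # keys and values are integer code points

strings = ["hello", "there", "world"]
# Characters that are not in the table are left unchanged.
translated = [s.translate(table) for s in strings]
print(translated)
```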
-
upper
()¶ Convert each string to uppercase. This only applies to ASCII characters at this time.
Examples
import nvstrings
s = nvstrings.to_device(["Hello, friend","Goodbye, friend"])
print(s.upper())
Output:
['HELLO, FRIEND', 'GOODBYE, FRIEND']
-
wrap
(width)¶ This will place new-line characters in whitespace so each line is no more than width characters. Lines will not be truncated.
Parameters: - width : int
The maximum number of characters per line in the new string. If the width is greater than or equal to the length of the existing string, no newlines are inserted.
Examples
import nvstrings
s = nvstrings.to_device(["hello there","goodbye all","well ok"])
print(s.wrap(3))
Output:
['hello\nthere', 'goodbye\nall', 'well\nok']
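A comparable host-side behavior is available in Python's standard `textwrap` module. The sketch below is only an analogy, not the nvstrings implementation: with `break_long_words=False`, words longer than the width are kept whole, which matches the "lines will not be truncated" rule. Edge cases such as tabs or repeated spaces may differ on the GPU.

```python
import textwrap

strings = ["hello there", "goodbye all", "well ok"]

# break_long_words=False keeps words longer than `width` intact,
# so newlines only replace whitespace between words.
wrapped = [textwrap.fill(s, width=3, break_long_words=False) for s in strings]
print(wrapped)
```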
-
zfill
(width)¶ Pads the strings with leading zeros. It will handle prefix sign characters correctly for strings containing leading number characters.
Parameters: - width : int
The minimum width of characters of the new string. If the width is smaller than the existing string, no padding is performed.
Examples
import nvstrings
s = nvstrings.to_device(["hello","1234","-9876","+5.34"])
print(s.zfill(width=6))
Output:
['0hello', '001234', '-09876', '+05.34']
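The sign handling shown above matches Python's built-in `str.zfill`, which inserts the zeros after a leading `+` or `-`. A short host-side check using only the standard library (no nvstrings call is made):

```python
strings = ["hello", "1234", "-9876", "+5.34"]

# str.zfill pads on the left with zeros, placing them after any sign character.
padded = [s.zfill(6) for s in strings]
print(padded)

# A width smaller than the string length leaves the string unchanged.
unchanged = "hello".zfill(3)
print(unchanged)
```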
-
nvcategory¶
-
class
nvcategory.
nvcategory
(cptr)¶ Instance manages a dictionary of strings (keys) in device memory and a mapping of indexes (values).
Methods
add_strings(nvs) Create new category incorporating specified strings.
gather_strings(indexes[, count]) Return nvstrings instance represented using the specified indexes.
indexes_for_key(key[, devptr]) Return all index values for given key.
keys() Return the unique strings for this category as nvstrings instance.
keys_size() The number of keys.
remove_strings(nvs) Create new category without the specified strings.
size() The number of values.
to_strings() Return nvstrings instance represented by the values in this instance.
value(str) Return the category value for the given string.
value_for_index(idx) Return the category value for the given index.
values([devptr]) Return all values for this instance.
-
add_strings
(nvs)¶ Create new category incorporating specified strings. This will return a new nvcategory with new key values. The index values will appear as if appended.
Parameters: - nvs : nvstrings
New strings to be added.
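The effect on keys and values can be modelled on the host with plain Python lists. This is a sketch under stated assumptions, not the library's implementation: `categorize` and `add_strings_model` are hypothetical helpers, keys are assumed to be kept in sorted order (as in the other examples in this reference), and the new strings are treated as appended to the original list.

```python
def categorize(strings):
    """Model a category: sorted unique keys plus one key index per string."""
    keys = sorted(set(strings))
    index_of = {key: i for i, key in enumerate(keys)}
    values = [index_of[s] for s in strings]
    return keys, values


def add_strings_model(strings, new_strings):
    """Model add_strings: re-categorize as if new_strings were appended."""
    return categorize(list(strings) + list(new_strings))


s1 = ["eee", "aaa", "eee", "dddd"]
s2 = ["ggg", "eee", "aaa"]
keys1, values1 = categorize(s1)
keys2, values2 = add_strings_model(s1, s2)
print(keys1, values1)
print(keys2, values2)
```

In this model the first four values of the new category are unchanged because the only new key, `ggg`, sorts last. A new key that sorts earlier would shift the existing indexes.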
Examples
import nvcategory, nvstrings
s1 = nvstrings.to_device(["eee","aaa","eee","dddd"])
s2 = nvstrings.to_device(["ggg","eee","aaa"])
c1 = nvcategory.from_strings(s1)
c2 = c1.add_strings(s2)
print(c1.keys())
print(c1.values())
print(c2.keys())
print(c2.values())
Output:
-
gather_strings
(indexes, count=0)¶ Return nvstrings instance represented using the specified indexes.
Parameters: - indexes : List of ints or GPU memory pointer
0-based indexes of keys to return as an nvstrings object
- count : int
Number of ints if the indexes parameter is a device pointer. Otherwise it is ignored.
Returns: - nvstrings: strings list based on indexes
Examples
import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.keys())
print(c.values())
print(c.gather_strings([0,2,0]))
Output:
['aaa','dddd','eee']
[2, 0, 2, 1]
['aaa','eee','aaa']
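Conceptually, gathering is a lookup of each index in the key list. A host-side sketch with plain Python lists: `gather` is a hypothetical helper, not an nvcategory function, and the keys and values are copied from the output above.

```python
keys = ["aaa", "dddd", "eee"]
values = [2, 0, 2, 1]


def gather(keys, indexes):
    """Look up each 0-based index in the key list."""
    return [keys[i] for i in indexes]


# Same indexes as the gather_strings example above.
gathered = gather(keys, [0, 2, 0])
print(gathered)

# Gathering with the category's own values rebuilds the original strings,
# which is what to_strings() is documented to return.
restored = gather(keys, values)
print(restored)
```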
-
indexes_for_key
(key, devptr=0)¶ Return all index values for given key.
Parameters: - key : str
key whose values should be returned
- devptr : GPU memory pointer
Where index values will be written. Must be able to hold int32 values for this key.
Examples
import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.indexes_for_key('aaa'))
print(c.indexes_for_key('eee'))
Output:
[1]
[0, 2]
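The result is the list of positions in the values array that refer to the given key. A host-side sketch with plain Python lists: `indexes_for_key_model` is a hypothetical helper, and returning an empty list for an unknown key is an assumption, not documented nvcategory behavior.

```python
keys = ["aaa", "dddd", "eee"]
values = [2, 0, 2, 1]


def indexes_for_key_model(keys, values, key):
    """Return every position whose value points at `key`."""
    if key not in keys:
        return []  # assumption: unknown keys yield no positions
    key_index = keys.index(key)
    return [pos for pos, v in enumerate(values) if v == key_index]


print(indexes_for_key_model(keys, values, "aaa"))
print(indexes_for_key_model(keys, values, "eee"))
```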
-
keys
()¶ Return the unique strings for this category as nvstrings instance.
Returns: - nvstrings: keys
Examples
import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.keys())
Output:
['aaa','dddd','eee']
-
keys_size
()¶ The number of keys.
Returns: - int: number of keys
Examples
import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.keys())
print(c.keys_size())
Output:
['aaa','dddd','eee']
3
-
remove_strings
(nvs)¶ Create new category without the specified strings. The returned category will have a new set of key values and indexes.
Parameters: - nvs : nvstrings
strings to be removed.
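The renumbering of keys and indexes can be modelled on the host with plain Python lists. This is a sketch under stated assumptions, not the library's implementation: `categorize` and `remove_strings_model` are hypothetical helpers, and keys are assumed to be kept in sorted order (as in the other examples in this reference).

```python
def categorize(strings):
    """Model a category: sorted unique keys plus one key index per string."""
    keys = sorted(set(strings))
    index_of = {key: i for i, key in enumerate(keys)}
    values = [index_of[s] for s in strings]
    return keys, values


def remove_strings_model(strings, to_remove):
    """Model remove_strings: drop matching strings, then re-categorize."""
    drop = set(to_remove)
    return categorize([s for s in strings if s not in drop])


s1 = ["eee", "aaa", "eee", "dddd"]
keys2, values2 = remove_strings_model(s1, ["aaa"])
print(keys2, values2)
```

Removing `aaa` shifts the remaining keys down by one in this model, so the surviving values change even though the strings they refer to do not.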
Examples
import nvcategory, nvstrings
s1 = nvstrings.to_device(["eee","aaa","eee","dddd"])
s2 = nvstrings.to_device(["aaa"])
c1 = nvcategory.from_strings(s1)
c2 = c1.remove_strings(s2)
print(c1.keys())
print(c1.values())
print(c2.keys())
print(c2.values())
Output:
-
size
()¶ The number of values.
Returns: - int: number of values
Examples
import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.values())
print(c.size())
Output:
[2, 0, 2, 1]
4
-
to_strings
()¶ Return nvstrings instance represented by the values in this instance.
Returns: - nvstrings: full strings list based on values indexes
Examples
import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.keys())
print(c.values())
print(c.to_strings())
Output:
['aaa','dddd','eee']
[2, 0, 2, 1]
['eee','aaa','eee','dddd']
-
value
(str)¶ Return the category value for the given string.
Parameters: - str : str
key to retrieve
Examples
import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.value('aaa'))
print(c.value('eee'))
Output:
0
2
-
value_for_index
(idx)¶ Return the category value for the given index.
Parameters: - idx : int
index value to retrieve
Examples
import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.value_for_index(3))
Output:
1
-
values
(devptr=0)¶ Return all values for this instance.
Parameters: - devptr : GPU memory pointer
Where index values will be written. Must be able to hold size() of int32 values.
Examples
import nvcategory
c = nvcategory.to_device(["eee","aaa","eee","dddd"])
print(c.values())
Output:
[2, 0, 2, 1]
-