seutil package

Submodules

seutil.BashUtils module

class seutil.BashUtils.BashUtils[source]

Bases: object

Utility functions for running Bash commands.

PRINT_LIMIT = 1000
class RunResult(return_code, stdout, stderr)[source]

Bases: tuple

return_code: int

Alias for field number 0

stderr: str

Alias for field number 2

stdout: str

Alias for field number 1

classmethod get_temp_dir() pathlib.Path[source]
classmethod get_temp_file() pathlib.Path[source]
classmethod run(cmd: str, expected_return_code: Optional[int] = None, is_update_env: bool = False, timeout: Optional[float] = None) seutil.BashUtils.BashUtils.RunResult[source]

Runs a Bash command and returns the result.

:param cmd: the command to run.
:param expected_return_code: if set to an int, raises an exception if the return code does not match it.
:param is_update_env: if True, the environment of this Python process (os.environ) is updated after cmd executes successfully (i.e., returns 0), to reflect any changes cmd makes to environment variables. Note that this cannot change the environment of the process that invoked this Python process; it is useful because the updated environment is used by subsequent BashUtils.run calls.
:param timeout: if not None, kills the process after timeout seconds and raises a TimeoutExpired exception.
:return: the run result, a named tuple with fields return_code, stdout, stderr.
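For illustration, a minimal sketch of calling BashUtils.run and reading the RunResult fields; the commands and the MY_VAR variable name are arbitrary examples:

```
from seutil import BashUtils

# Run a command and require that it exits with code 0.
result = BashUtils.run("echo hello", expected_return_code=0)
print(result.return_code)   # 0
print(result.stdout)        # "hello" plus a trailing newline

# Export a variable in the sub-shell and propagate it to os.environ
# of this Python process.
BashUtils.run("export MY_VAR=1", is_update_env=True)
```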

seutil.CliUtils module

class seutil.CliUtils.Option[source]

Bases: object

class seutil.CliUtils.StoreInDict(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Bases: argparse.Action

seutil.CliUtils.main(argv, actions: Dict[str, Callable], normalize_options: Optional[Callable[[Dict], Dict]] = None)[source]

Main function for command-line option parsing, for invocations of the form "THIS action options…", where each option has the form "-name=value".

:param argv: the command-line arguments, without the name of the script (sys.argv[1:]).
:param actions: the mapping from action name to action function.
:param normalize_options: optional function to normalize the options; defaults to the identity function.
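A hedged sketch of wiring an action into CliUtils.main; the greet function is hypothetical, and the assumption that parsed options reach the action as keyword arguments is not confirmed by the docstring:

```
import sys

from seutil import CliUtils

# Hypothetical action, invoked as: python script.py greet -name=World
def greet(name: str = "world", **options):
    print(f"Hello, {name}!")

if __name__ == "__main__":
    CliUtils.main(sys.argv[1:], {"greet": greet})
```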

seutil.GitHubUtils module

class seutil.GitHubUtils.GitHubUtils[source]

Bases: object

DEFAULT_ACCESS_TOKEN = None
DEFAULT_GITHUB_OBJECT = None
GITHUB_SEARCH_ITEMS_MAX = 1000
T

alias of TypeVar(‘T’)

classmethod ensure_github_api_call(call: Callable[[github.MainClass.Github], seutil.GitHubUtils.T], github: Optional[github.MainClass.Github] = None, max_retry_times: int = inf) seutil.GitHubUtils.T[source]
classmethod get_github(access_token: Optional[str] = None) github.MainClass.Github[source]
classmethod is_url_valid_git_repo(url: str) bool[source]
logger = <Logger GitHubUtils (DEBUG)>
classmethod search_repos(q: str = '', sort: str = 'stars', order: str = 'desc', is_allow_fork: bool = False, max_num_repos: int = 1000, github: Optional[github.MainClass.Github] = None, max_retry_times: int = inf, *_, **qualifiers) List[github.Repository.Repository][source]

Searches for repositories by querying GitHub API v3. :return: the list of repositories matching the query.

classmethod search_repos_of_language(language: str, max_num_repos: int = inf, is_allow_fork: bool = False, max_retry_times: int = inf, strategies: Optional[List[str]] = None) List[github.Repository.Repository][source]

Searches for all repositories of the given language. :return: the list of matching repositories.

classmethod search_users(q: str = '', sort: str = 'repositories', order: str = 'desc', max_num_users: int = 1000, github: Optional[github.MainClass.Github] = None, max_retry_times: int = inf, *_, **qualifiers) List[github.NamedUser.NamedUser][source]

Searches for users by querying GitHub API v3. :return: the list of users matching the query.

class wait_rate_limit(github: Optional[github.MainClass.Github] = None)[source]

Bases: object

Waits for the rate limit of the GitHub accessor. For use as a context manager (with statement). Uses the default GitHub accessor if no argument is given.

DEFAULT_GITHUB_OBJECT = None
logger = <Logger GitHubUtils (DEBUG)>
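A hedged sketch of searching repositories with GitHubUtils; the query is only an example, PyGithub is assumed to be installed, and repo.full_name is a PyGithub Repository attribute:

```
from seutil import GitHubUtils

# Authenticate if a token is available; with None, unauthenticated
# rate limits apply.
gh = GitHubUtils.get_github(access_token=None)

repos = GitHubUtils.search_repos(
    q="language:python",
    sort="stars",
    order="desc",
    max_num_repos=10,
    github=gh,
)
for repo in repos:
    print(repo.full_name)
```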

seutil.IOUtils module

class seutil.IOUtils.IOUtils[source]

Bases: object

Utility functions for I/O.

DEJSONFY_FUNC_NAME = 'dejsonfy'
class Format(value)[source]

Bases: enum.Enum

An enumeration.

classmethod from_str(string: str) seutil.IOUtils.IOUtils.Format[source]
get_extension() str[source]
json = (4,)
jsonList = (5,)
jsonNoSort = (3,)
jsonPretty = (2,)
pkl = (1,)
txt = (0,)
txtList = (6,)
yaml = (7,)
IO_FORMATS: Dict[seutil.IOUtils.IOUtils.Format, Dict] = defaultdict(<function IOUtils.<lambda>>, {<Format.pkl: (1,)>: {'mode': 'b', 'dumpf': <function IOUtils.<lambda>>, 'loadf': <function IOUtils.<lambda>>}, <Format.jsonPretty: (2,)>: {'mode': 't', 'dumpf': <function IOUtils.<lambda>>, 'loadf': <function IOUtils.<lambda>>}, <Format.jsonNoSort: (3,)>: {'mode': 't', 'dumpf': <function IOUtils.<lambda>>, 'loadf': <function IOUtils.<lambda>>}, <Format.json: (4,)>: {'mode': 't', 'dumpf': <function IOUtils.<lambda>>, 'loadf': <function IOUtils.<lambda>>}, <Format.yaml: (7,)>: {'mode': 't', 'dumpf': <function IOUtils.<lambda>>, 'loadf': <function IOUtils.<lambda>>}, <Format.jsonList: (5,)>: {'mode': 't', 'dumpf': <function IOUtils.<lambda>>, 'loadf': <function IOUtils.<lambda>>}, <Format.txtList: (6,)>: {'mode': 't', 'dumpf': <function IOUtils.<lambda>>, 'loadf': <function IOUtils.<lambda>>}})
JSONFY_ATTR_FIELD_NAME = 'jsonfy_attr'
JSONFY_FUNC_NAME = 'jsonfy'
class cd(path: Union[str, pathlib.Path])[source]

Bases: object

Change directory. Usage:

    with IOUtils.cd(path):
        <statements>
    # end with

Using a string path is supported for backward compatibility; pathlib.Path should be preferred.

classmethod dejsonfy(data, clz: Optional[Union[Type, str]] = None)[source]

Turns a json-compatible data structure into an object of class clz. If clz is not given, the data is cast to a dict or list if possible. Otherwise the data is converted to an object by trying each of the following, in order, if applicable:

  1. a DEJSONFY function, which takes the data as argument and returns an object; it should be named IOUtils.DEJSONFY_FUNC_NAME;

  2. a JSONFY_ATTR field, which is a dict of attribute name-type pairs used to populate the object's attributes from the data; it should be named IOUtils.JSONFY_ATTR_FIELD_NAME.

classmethod dump(file_path: Union[str, pathlib.Path], obj: object, fmt: Union[seutil.IOUtils.IOUtils.Format, str] = Format.jsonPretty, append: bool = False) None[source]

Saves an object to the file in the specified format. By default, the format is pretty-printed JSON, and existing content in the file is erased.

:param file_path: the file to save the object into.
:param obj: the object to save.
:param fmt: the format, one of IOUtils.Format.
:param append: if True, appends to the file instead of erasing its existing content.
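A short dump/load round-trip sketch with the legacy IOUtils API; the file name is arbitrary:

```
from pathlib import Path

from seutil import IOUtils

data = {"name": "example", "values": [1, 2, 3]}
path = Path("data.json")

# Save as pretty-printed JSON (the default format), then load it back.
IOUtils.dump(path, data, IOUtils.Format.jsonPretty)
loaded = IOUtils.load(path, IOUtils.Format.jsonPretty)
assert loaded == data
```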

classmethod dumpf_json_list(obj, f)[source]
classmethod dumpf_txt_list(obj, f)[source]
classmethod extend_json(file_name, data)[source]

Updates the JSON data file. The data should be list-like (supports extend).

classmethod has_dir(dirname) bool[source]
classmethod jsonfy(obj)[source]

Turns an object into a json-compatible data structure. Json-compatible data can only contain lists, dicts (with str keys), str, int, and float. Objects of other classes are converted by trying each of the following, in order, if applicable:

  1. a JSONFY function, which takes no argument and returns json-compatible data; it should be named IOUtils.JSONFY_FUNC_NAME;

  2. a JSONFY_ATTR field, which is a dict of attribute name-type pairs that will be extracted from the object into a dict; it should be named IOUtils.JSONFY_ATTR_FIELD_NAME;

  3. casting to a string.
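A hedged sketch of a class that participates in jsonfy/dejsonfy via the JSONFY_ATTR field described above; the field name 'jsonfy_attr' comes from JSONFY_ATTR_FIELD_NAME, while the class layout and the assumption that it must be default-constructible for dejsonfy are illustrative:

```
from seutil import IOUtils

class Point:
    # Attribute name-type pairs consulted by jsonfy/dejsonfy.
    jsonfy_attr = {"x": int, "y": int}

    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y

data = IOUtils.jsonfy(Point(1, 2))     # expected: {"x": 1, "y": 2}
point = IOUtils.dejsonfy(data, Point)  # rebuilds a Point from the dict
```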

classmethod load(file_path: Union[str, pathlib.Path], fmt: Union[seutil.IOUtils.IOUtils.Format, str] = Format.jsonPretty) Any[source]
classmethod load_json_stream(file_path: Union[str, pathlib.Path], fmt: Union[seutil.IOUtils.IOUtils.Format, str] = Format.jsonPretty)[source]

Iteratively reads a large JSON file containing a list of data. Returns a generator function.

classmethod loadf_json_list(f) List[source]
classmethod loadf_txt_list(f) List[source]
classmethod mk_dir(dirname, mode=511, is_remove_if_exists: bool = False, is_make_parent: bool = True)[source]

Makes the directory.

:param dirname: the name of the directory.
:param mode: the mode (permissions) of the directory.
:param is_remove_if_exists: if a directory with this name already exists, whether to remove it first.
:param is_make_parent: whether to create parent directories if they do not exist.

classmethod rm(path: pathlib.Path, ignore_non_exist: bool = True, force: bool = True)[source]

Removes the file/dir.

:param path: the path to the file/dir to remove.
:param ignore_non_exist: ignore the error if the file/dir does not exist.
:param force: force-remove the file even if it is protected, or the dir even if it is non-empty.

classmethod rm_dir(path: pathlib.Path, ignore_non_exist: bool = True, force: bool = True)[source]

Removes the directory.

:param path: the path to the directory.
:param ignore_non_exist: ignore the error if the directory does not exist.
:param force: force-remove the directory even if it is non-empty.

classmethod update_json(file_name, data)[source]

Updates the JSON data file. The data should be dict-like (supports update).

seutil.IOUtils.is_clz_record_class(clz: Type) bool[source]
seutil.IOUtils.is_obj_record_class(obj: Any) bool[source]

seutil.LoggingUtils module

class seutil.LoggingUtils.LoggingUtils[source]

Bases: object

CRITICAL = 50
DEBUG = 10
ERROR = 40
INFO = 20
WARNING = 30
default_handlers = []
default_level = 30
classmethod get_handler_console(stream=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>, level=30) logging.Handler[source]
classmethod get_handler_file(filename, level=10) logging.Handler[source]
classmethod get_logger(name: str, level: Optional[int] = None) logging.Logger[source]
classmethod log_and_raise(logger: logging.Logger, msg: str, error_type, level: int = 40)[source]
loggers = [<Logger GitHubUtils (DEBUG)>]
logging_format = '[{relativeCreated:6.0f}{levelname[0]}]{name}: {message}'
logging_format_detail = '[{asctime}|{relativeCreated:.3f}|{levelname:7}]{name}: {message} [@{filename}:{lineno}|{funcName}|pid {process}|tid {thread}]'
classmethod refresh_loggers()[source]

Refresh all the loggers to use the default handlers.

classmethod setup(level=30, filename: Optional[str] = None)[source]
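A small sketch of the typical LoggingUtils flow; the log file name is arbitrary and the exact level semantics follow the implementation:

```
from seutil import LoggingUtils

# Attach the default console handler and, since a filename is given,
# a file handler as well.
LoggingUtils.setup(level=LoggingUtils.DEBUG, filename="run.log")

logger = LoggingUtils.get_logger(__name__, level=LoggingUtils.INFO)
logger.info("starting")
```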

seutil.MiscUtils module

class seutil.MiscUtils.ClassPropertyDescriptor(fget, fset=None)[source]

Bases: object

setter(func)[source]
seutil.MiscUtils.chunks(l, n)[source]

Yield successive n-sized chunks from l.
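For instance, chunking a small list into groups of size 3; a shorter final chunk is assumed here:

```
from seutil import MiscUtils

# Prints [1, 2, 3] then [4, 5].
for chunk in MiscUtils.chunks([1, 2, 3, 4, 5], 3):
    print(chunk)
```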

seutil.MiscUtils.classproperty(func)[source]
seutil.MiscUtils.get_num_params(vocab_size, num_layers, num_neurons)[source]

Returns the number of trainable parameters of an LSTM.

Parameters
  • vocab_size (int) – The vocabulary size

  • num_layers (int) – The number of layers in the LSTM

  • num_neurons (int) – The number of neurons / units per layer

Returns

The number of trainable parameters

Return type

int

seutil.MiscUtils.iter_len(iterator: Iterable) int[source]

Counts the number of elements by consuming the iterator.

seutil.MiscUtils.itos_human_readable(value: int, precision: int = 1) str[source]

Converts a large integer to a human-readable string representation.

:return: the human-readable string representation of the int.
:raises TypeError: if the value passed cannot be coerced into an int.

seutil.MiscUtils.shuffle_data(items: Sequence[seutil.MiscUtils.T]) Sequence[seutil.MiscUtils.T][source]

Randomly shuffles the data.

seutil.Stream module

class seutil.Stream.Stream[source]

Bases: object

Streams help manipulate sequences of objects.

count()[source]
filter(predicate_func: Callable[[object], bool])[source]

Returns a stream consisting of the elements of this stream that match the given predicate.

get(index: int)[source]
is_empty()[source]
map(map_func: Callable[[str], object], errors: str = 'raise', default: object = '')[source]
classmethod of(one_or_more_items)[source]

Gets a new stream from the item(s).

:param one_or_more_items: the item(s); converted to a list with the builtin list function.

classmethod of_dirs(dir_path: Union[str, pathlib.Path])[source]

Get a stream of the sub-directories under the directory.

classmethod of_files(dir_path: Union[str, pathlib.Path])[source]

Get a stream of the files under the directory.

peak(peak_func: Callable[[str], None], errors: str = 'ignore')[source]
reduce(count_func: typing.Callable[[str], float] = <function Stream.<lambda>>)[source]
shuffle(seed=None)[source]

Shuffles the list of files in the dataset.

sorted(key: typing.Callable[[str], object] = <function Stream.<lambda>>, reverse: bool = False)[source]

Sorts the list of files in the dataset.

split(fraction_list: typing.List[float], count_func: typing.Callable[[str], float] = <function Stream.<lambda>>)[source]

Splits the dataset into parts as specified by the fractions (assumed to sum to 1). Splitting is done by finding the cutting points. If randomization is needed, call shuffle first.

:param count_func: customizes the data count of each file.
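A brief sketch of combining a few Stream operations; the directory path and filter condition are arbitrary, and chaining relies on filter being documented to return a new stream:

```
from seutil import Stream

# Stream of the files under ./data, keeping only .json files.
s = Stream.of_files("./data").filter(lambda f: str(f).endswith(".json"))
print(s.count())
print(s.is_empty())
```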

seutil.TimeUtils module

class seutil.TimeUtils.TimeUtils[source]

Bases: object

classmethod time_limit(seconds)[source]
exception seutil.TimeUtils.TimeoutException[source]

Bases: Exception
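A hedged sketch, assuming time_limit is a context manager that raises TimeoutException when the block exceeds the given number of seconds:

```
import time

from seutil import TimeUtils
from seutil.TimeUtils import TimeoutException

try:
    with TimeUtils.time_limit(2):
        time.sleep(10)  # deliberately exceeds the 2-second limit
except TimeoutException:
    print("timed out")
```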

seutil.bash module

exception seutil.bash.BashError(cmd: str, completed_process: subprocess.CompletedProcess, check_returncode: int)[source]

Bases: RuntimeError

seutil.bash.run(cmd: str, check_returncode: Optional[int] = None, warn_nonzero: bool = True, update_env: bool = False, update_env_clear_existing: bool = False, **kwargs) subprocess.CompletedProcess[source]

Run a bash command using subprocess.run. The command will be run using “bash -c”.

Some arguments' default values are changed (but can be overridden via kwargs):

  • capture_output=True, text=True: capture all stdout and stderr as text.

This function can check whether the return code matches a given value (subprocess only supports checking for non-zero values, but this function supports any value). If check_returncode is not set, the function warns about any non-zero return code to avoid silent failures; this behavior can be turned off via warn_nonzero=False.

In addition, this function can update the environment variables of this process with those in effect after running the command (if the command finished successfully). The sub-shell's environment is retrieved by dumping env into a temporary file.

Parameters
  • cmd – the command to run

  • check_returncode – the return code to expect from the command

  • warn_nonzero – whether to warn about non-zero exit codes

  • update_env – whether to update the environment variables in this process

  • update_env_clear_existing – whether to clear existing environment variables before updating

  • kwargs – other arguments passed to subprocess.run

Returns

the subprocess.CompletedProcess object, has stdout, stderr, returncode fields

Raises
  • BashError – if the command's return code did not match check_returncode

  • subprocess.TimeoutExpired – if the command timed out
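A brief sketch of seutil.bash.run; the commands and the MY_VAR variable name are arbitrary examples:

```
from seutil import bash

# Capture output as text and require exit code 0.
res = bash.run("echo hello", check_returncode=0)
print(res.returncode, res.stdout.strip())

# Propagate environment changes made by the command back to os.environ.
bash.run("export MY_VAR=42", update_env=True)
```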

seutil.debug module

class seutil.debug.Reporter[source]

Bases: object

add_to_history(name, value)[source]
generate_report() str[source]
seutil.debug.inspect(var: typing.Any, name: typing.Optional[str] = None, reporter: seutil.debug.Reporter = <seutil.debug.Reporter object>)[source]
seutil.debug.report(reporter: seutil.debug.Reporter = <seutil.debug.Reporter object>)[source]
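For illustration, inspecting a couple of variables and then generating a report with the default Reporter; the exact report contents depend on the implementation:

```
from seutil import debug

x = [1, 2, 3]
debug.inspect(x, name="x")            # record x with the default Reporter
debug.inspect(len(x), name="len(x)")
debug.report()                        # emit the accumulated report
```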

seutil.io module

exception seutil.io.DeserializationError(data, clz: Optional[Union[Type, str]], reason: str)[source]

Bases: RuntimeError

class seutil.io.Fmt(value)[source]

Bases: seutil.io.FmtProperty, enum.Enum

An enumeration.

binary: bool

Alias for field number 3

exts: List[str]

Alias for field number 2

json = FmtProperty(writer=<function Fmt.<lambda>>, reader=<function Fmt.<lambda>>, exts=['json'], binary=False, line_mode=False, serialize=True)
jsonFlexible = FmtProperty(writer=<function Fmt.<lambda>>, reader=<function Fmt.<lambda>>, exts=['json'], binary=False, line_mode=False, serialize=True)
jsonList = FmtProperty(writer=<function Fmt.<lambda>>, reader=<function Fmt.<lambda>>, exts=['jsonl'], binary=False, line_mode=True, serialize=True)
jsonNoSort = FmtProperty(writer=<function Fmt.<lambda>>, reader=<function Fmt.<lambda>>, exts=['json'], binary=False, line_mode=False, serialize=True)
jsonPretty = FmtProperty(writer=<function Fmt.<lambda>>, reader=<function Fmt.<lambda>>, exts=['json'], binary=False, line_mode=False, serialize=True)
line_mode: bool

Alias for field number 4

pickle = FmtProperty(writer=<function Fmt.<lambda>>, reader=<function Fmt.<lambda>>, exts=['pkl', 'pickle'], binary=True, line_mode=False, serialize=False)
reader: Union[Callable[[io.IOBase], Any], Callable[[str], Any]]

Alias for field number 1

serialize: bool

Alias for field number 5

txt = FmtProperty(writer=<function Fmt.<lambda>>, reader=<function Fmt.<lambda>>, exts=['txt'], binary=False, line_mode=False, serialize=False)
txtList = FmtProperty(writer=<function Fmt.<lambda>>, reader=<function Fmt.<lambda>>, exts=['txt'], binary=False, line_mode=True, serialize=False)
writer: Union[Callable[[io.IOBase, Any], None], Callable[[Any], str]]

Alias for field number 0

yaml = FmtProperty(writer=<function Fmt.<lambda>>, reader=<function Fmt.<lambda>>, exts=['yml', 'yaml'], binary=False, line_mode=False, serialize=True)
class seutil.io.cd(path: Union[str, pathlib.Path])[source]

Bases: object

Temporarily changes the working directory, for use with with:

```
with cd(path):
    # cwd moved to path
    <statements>
# cwd moved back to original cwd
```

seutil.io.deserialize(data, clz: Optional[Union[Type, str]] = None, error: str = 'ignore')[source]

Deserializes data (consisting only of primitive types, lists, and dicts) into an object with the proper types.

Parameters
  • data – the data to be deserialized.

  • clz – the targeted type of deserialization (or its name); if None, will return the data as-is.

  • error – what to do when deserialization encounters a problem: * raise: raise a DeserializationError; * ignore (default): return the data as-is.

Returns

the deserialized data.

seutil.io.dump(path: Union[str, pathlib.Path], obj: object, fmt: Optional[seutil.io.Fmt] = None, serialization: Optional[bool] = None, parents: bool = True, append: bool = False, exists_ok: bool = True, serialization_fmt_aware: bool = True) None[source]

Saves an object to a file. The format is automatically inferred from the file name unless otherwise specified. By default, serialization (i.e., conversion to primitive types and data structures) is automatically performed for the formats that need it (e.g., json).

Parameters
  • path – the path to save the file.

  • obj – the object to be saved.

  • fmt – the format of the file; if None (default), inferred from path.

  • serialization – whether or not to serialize the object before saving: * True: always serialize; * None (default): only serialize for the formats that need it; * False: never serialize.

  • parents – what to do if parent directories of path do not exist: * True (default): automatically create them; * False: raise an exception.

  • append – whether to append to an existing file, if any (default False).

  • exists_ok – what to do if path already exists and append is False: * True (default): automatically overwrite it; * False: raise an exception.

  • serialization_fmt_aware – lets the serialization function be aware of the target format so it can fit that format's constraints (e.g., dictionaries in JSON can only have str keys).
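A short round-trip sketch with the newer io API; the file names are arbitrary, and the jsonl mapping follows the Fmt.jsonList extension listed above:

```
from seutil import io

obj = {"name": "example", "values": [1, 2, 3]}

io.dump("output.json", obj)      # format inferred from the .json extension
loaded = io.load("output.json")
assert loaded == obj

# A .jsonl file is written line by line (Fmt.jsonList).
io.dump("records.jsonl", [{"i": i} for i in range(3)], fmt=io.Fmt.jsonList)
```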

seutil.io.load(path: Union[str, pathlib.Path], fmt: Optional[seutil.io.Fmt] = None, serialization: Optional[bool] = None, clz: Optional[Type] = None, error: str = 'ignore', iter_line: bool = False) Union[object, Iterator[object]][source]

Loads an object from a file. The format is automatically inferred from the file name unless otherwise specified. By default, if clz is given, deserialization (i.e., unpacking from primitive types and data structures) is automatically performed for the formats that need it (e.g., json).

Parameters
  • path – the path to load the object.

  • fmt – the format of the file; if None (default), inferred from path.

  • serialization – whether or not to deserialize the object after loading: * True: always deserialize; * None (default): only deserialize for the formats that need it; * False: never deserialize.

  • clz – the class to use for deserialization; if None (default), deserialization is a no-op.

  • error – what to do if deserialization fails: * raise: raise a DeserializationError. * ignore (default): return the data as-is.

  • iter_line – whether to iterate over the lines of the file instead of loading the whole file.

seutil.io.mkdir(path: Union[str, pathlib.Path], parents: bool = True, fresh: bool = False)[source]

Creates a directory.

Parameters
  • path – the path to the directory.

  • parents – if True, automatically creates parent directories; otherwise, raise error if any parent is missing.

  • fresh – if True and if the directory already exists, removes it before creating.

seutil.io.mktmp(prefix: Optional[str] = None, suffix: Optional[str] = None, separator: str = '-', dir: Optional[pathlib.Path] = None) pathlib.Path[source]

Makes a temp file. A wrapper for tempfile.mkstemp.

seutil.io.mktmp_dir(prefix: Optional[str] = None, suffix: Optional[str] = None, separator: str = '-', dir: Optional[pathlib.Path] = None) pathlib.Path[source]

Makes a temp directory. A wrapper for tempfile.mkdtemp.

seutil.io.rm(path: Union[str, pathlib.Path], missing_ok: bool = True, force: bool = True)[source]

Removes a file/directory.

Parameters
  • path – the name of the file/directory.

  • missing_ok – (-f) ignores error if the file/directory does not exist.

  • force – (-rf) force-remove the directory even if it is not empty.

seutil.io.rmdir(path: Union[str, pathlib.Path], missing_ok: bool = True, force: bool = True)[source]

Removes a directory.

Parameters
  • path – the name of the directory.

  • missing_ok – (-f) ignores error if the directory does not exist.

  • force – (-f) force-remove the directory even if it is non-empty.

seutil.io.serialize(obj: object, fmt: Optional[seutil.io.Fmt] = None) object[source]

Serializes an object into a data structure containing only primitive types, lists, and dicts. If fmt is provided, its formatting constraints are taken into account. Supported fmts: * json, jsonPretty, jsonNoSort, jsonList: dicts may only have str keys.

Parameters
  • obj – the object to be serialized.

  • fmt – (optional) the target format.

Returns

the serialized object.
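A hedged sketch of serialize/deserialize on a dataclass; the round trip through primitive types follows the documented contract, while out-of-the-box dataclass support is an assumption here:

```
from dataclasses import dataclass

from seutil import io

@dataclass
class Point:
    x: int
    y: int

data = io.serialize(Point(1, 2))     # e.g., {"x": 1, "y": 2}
point = io.deserialize(data, clz=Point, error="raise")
assert point == Point(1, 2)
```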

seutil.log module

This module complements the logging standard library. The main functionality is:

  • maintaining two frequently used handlers: a stderr handler and a file handler, both with rich and customizable formats.

  • a setup method to attach them to the root logger.

  • a get_logger method to quickly create a logger with a customized level.

seutil.log.get_logger(name: str, level: Union[int, str] = 0)[source]

Get a logger with specified name and level.

seutil.log.setup(log_file: Optional[Union[str, pathlib.Path]] = None, level_stderr: Union[int, str] = 20, level_file: Union[int, str] = 10, fmt_stderr: str = '[{asctime}{levelname[0]}]{name}: {message}', datefmt_stderr: str = '%H:%M:%S', fmt_file: str = '[{asctime}|{relativeCreated:.3f}|{levelname:7}]{name}: {message} [@{filename}:{lineno}|{funcName}|pid {process}|tid {thread}]', datefmt_file: str = '%Y-%m-%d %H:%M:%S', clear_handlers: bool = True, **kwargs_file: dict)[source]

Sets up the stderr and file handlers, and attaches them to the root logger.

Parameters
  • log_file – the log file to use; if None, no file handler is created (and any existing one would be removed)

  • level_stderr – the level filter of the stderr handler

  • level_file – the level filter of the file handler

  • fmt_stderr – the format of the stderr handler (with {} style)

  • fmt_file – the format of the file handler (with {} style)

  • clear_handlers – if True, remove all existing handlers of the root logger; otherwise, keep them as is

  • kwargs_file – other optional kwargs to the file handler (RotatingFileHandler)
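A typical setup sketch; the log file name and the chosen levels are illustrative:

```
import logging

from seutil import log

# INFO and above to stderr, DEBUG and above to debug.log.
log.setup(log_file="debug.log", level_stderr=logging.INFO, level_file=logging.DEBUG)

logger = log.get_logger(__name__, logging.DEBUG)
logger.debug("goes to the file only")
logger.info("also shown on stderr")
```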

seutil.pbar module

class seutil.pbar.PBarManager(out: TextIO, switch_interval: float = 2.5)[source]

Bases: object

add(instance: seutil.pbar.tqdm_managed)[source]
format_summaries() str[source]
format_summary(instance)[source]
remove(instance: seutil.pbar.tqdm_managed)[source]
run()[source]
switch(to: Optional[int] = None)[source]
update()[source]
seutil.pbar.tqdm

alias of seutil.pbar.tqdm_managed

class seutil.pbar.tqdm_managed(*_, **__)[source]

Bases: tqdm.std.tqdm

clear(*_, **__)[source]

Clear current bar display.

close()[source]

Cleanup and (if leave=False) close the progressbar.

display(*_, **__)[source]

Use self.sp to display msg in the specified pos.

Consider overloading this function when inheriting to use e.g.: self.some_frontend(**self.format_dict) instead of self.sp.

Parameters
  • msg (str, optional) – what to display (default: repr(self)).

  • pos (int, optional) – position to move to (default: abs(self.pos)).
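Since tqdm_managed subclasses tqdm.std.tqdm, a minimal sketch uses seutil.pbar.tqdm as a drop-in progress bar; how it coordinates with PBarManager is not shown and the drop-in construction is an assumption:

```
from seutil.pbar import tqdm  # alias of tqdm_managed

total = 0
for i in tqdm(range(100)):
    total += i
```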

Module contents

The package exports the legacy utility classes at the top level: seutil.BashUtils, seutil.GitHubUtils, seutil.IOUtils, seutil.LoggingUtils, seutil.Stream, seutil.TimeUtils, and the seutil.TimeoutException exception. Their documentation is identical to that of the corresponding submodules above.