watchful.client module

This module provides the functions required for interacting directly with the Watchful client application.

watchful.client.api(verb: str, **kwargs: Dict) → Dict | None[source]

This is a convenience function for API calls, made up of a verb and optional keyword arguments.

Parameters:

verb (str) – The verb for the API.
kwargs (Dict) – Optional parameters to support the API for verb.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.api_send_action(action: Dict) → Dict | None[source]

This is a convenience function for API calls with an action.

Parameters:: action (Dict) – The verb for the API with optional parameters.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.apply_hints(name: str) → Dict | None[source]

This function applies the hints for an external hinter.

Parameters:: name (str) – The hinter name.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.await_plabels() → Dict | None[source]

This function gets the updated HTTP response.

Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.await_port_opening(port: int, timeout_sec: int = 10) → None[source]

This function waits for the port to open. It returns None if the port was opened within the timeout; otherwise it raises an exception. It is used for awaiting Watchful process startup.

Parameters:

port (int) – The port.
timeout_sec (int, optional) – The timeout in seconds, defaults to 10.
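
As an illustration of the waiting behaviour described above, here is a minimal sketch using only the standard library. It is not the SDK's implementation: the name wait_for_port, the host argument, the retry interval and the choice of TimeoutError are all assumptions made for this example.

```python
import socket
import time


def wait_for_port(port: int, timeout_sec: float = 10, host: str = "127.0.0.1") -> None:
    """Hypothetical stand-in: block until host:port accepts TCP connections."""
    deadline = time.monotonic() + timeout_sec
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=0.5):
                return None
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"port {port} did not open within {timeout_sec}s")
            time.sleep(0.1)
```

A caller would typically start the process first and then call the waiting function with the port it expects the process to listen on.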

watchful.client.await_summary(pred: Callable, halt_fn: Callable = <function <lambda>>, unchanged_timeout: int = 60) → Dict | None[source]

This function returns the summary once pred(summary) returns true. If halt_fn returns true first, it stops waiting and returns None. If the summary is unchanged for unchanged_timeout seconds, it raises an exception.

Parameters:

pred (Callable) – The predicate function.
halt_fn (Callable, optional) – The halt function, defaults to lambda x: False.
unchanged_timeout (int, optional) – The timeout in seconds, defaults to 60.

Returns:

The dictionary of the HTTP response from get().

Return type:

Dict, optional
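
The three outcomes described above can be sketched as a polling loop. This is only an illustration under stated assumptions: poll_summary, the get_summary callable (standing in for get()), the poll_interval argument and the use of TimeoutError are hypothetical, and halt_fn is assumed to receive the summary because its documented default is lambda x: False.

```python
import time
from typing import Callable, Dict, Optional


def poll_summary(
    get_summary: Callable[[], Dict],
    pred: Callable[[Dict], bool],
    halt_fn: Callable[[Dict], bool] = lambda x: False,
    unchanged_timeout: float = 60,
    poll_interval: float = 0.01,
) -> Optional[Dict]:
    """Hypothetical sketch of the await_summary contract."""
    last = None
    last_change = time.monotonic()
    while True:
        summary = get_summary()
        if pred(summary):
            return summary  # the predicate is satisfied
        if halt_fn(summary):
            return None  # the caller asked to stop waiting
        now = time.monotonic()
        if summary != last:
            # The summary changed, so restart the "unchanged" clock.
            last, last_change = summary, now
        elif now - last_change >= unchanged_timeout:
            raise TimeoutError("summary unchanged for too long")
        time.sleep(poll_interval)
```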

watchful.client.base_rate(class__: str, rate: int) → Dict | None[source]

This function sets the base rate for a class.

Parameters:

class__ (str) – The class to set a base rate for.
rate (int) – The base rate for the class.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.candidate_dicts(summary: Dict | None = None) → List[Dict[str, str]][source]

This function retrieves and returns all the candidates, together with the column names for all values.

Parameters:: summary (Dict, optional) – The dictionary of the HTTP response from a connection request, defaults to None.
Returns:: The list of all the candidates, each as a dictionary of named values.
Return type:: List[Dict[str, str]]

watchful.client.class_(class__: str) → Dict | None[source]

This function creates a class.

Parameters:: class__ (str) – The class.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.config() → Dict | None[source]

This function retrieves the app instance configuration parameters (remote, username, role and authorization) and their values.

Returns:: The dictionary of key-value pairs.
Return type:: Dict, optional

watchful.client.config_set(key: str, value: str) → Dict | None[source]

This function sets one app instance configuration parameter using a key and value pair.

Parameters:

key (str) – The parameter name.
value (str) – The parameter value.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.create_class(class__: str, class_type: str = 'ftc') → Dict | None[source]

This function creates a class.

Parameters:

class__ (str) – The class.
class_type (str, optional) – The class type, it can be either “ftc” or “ner”, defaults to “ftc”.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.create_dataset(csv_bytes: bytes, columns: List[str], filename: str = 'none', has_header: bool = True, threshold_detect: float = 0.5, is_fast_detect: bool = True, force_load: bool = True) → str[source]

This function loads the specified columns of a csv dataset and returns the dataset id if its encoding is detected to be utf-8 or if dataset loading is forced.

Parameters:

csv_bytes (bytes) – The csv dataset bytes.
columns (List[str]) – The list of column names to use.
filename (str, optional) – The csv dataset filename, defaults to “none”.
has_header (bool, optional) – The boolean indicating if the csv dataset has a header, defaults to True.
threshold_detect (float, optional) – The minimum confidence required to accept the detected encoding, defaults to 0.5.
is_fast_detect (bool, optional) – Whether to use fast encoding detection with a lower accuracy, or not, defaults to True.
force_load (bool, optional) – The boolean indicating if the csv dataset will be loaded even when its encoding is detected to be non-utf-8, defaults to True. This is useful in rare cases where the csv dataset is detected to be non-utf-8 encoded and the user is sure about the csv dataset being utf-8 encoded.

Returns:

The dataset id.

Return type:

str

TODO: Add error handling.

watchful.client.create_project(title_: str | None = None) → str | Dict | None[source]

This function creates a new project. Additionally, if title is supplied, a title is given to the newly created project.

Parameters:: title_ (str, optional) – The title for the project.
Returns:: If a title is supplied and open_project("new") is successful, the dictionary of the HTTP response from the connection request from title(title); otherwise the read HTTP response from open_project("new").
Return type:: Union[str, Optional[Dict]]

watchful.client.delete(id_: int) → Dict | None[source]

This function deletes a hinter.

Parameters:: id_ (int) – The hinter id to delete.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.delete_class(class__: str) → Dict | None[source]

This function deletes a class.

Parameters:: class__ (str) – The class to delete.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.dump() → Generator[List[str], None, None][source]

This function returns all the candidates in “hint API order”.

Returns:: The generator of all the candidates.
Return type:: Generator[List[str], None, None]

watchful.client.dump_dicts() → Generator[Dict[str, str], None, None][source]

This function returns all the candidates in “hint API order”, together with the column names for all values.

Returns:: The generator of all the candidates, each as a dictionary of named values.
Return type:: Generator[Dict[str, str], None, None]
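
To make "together with the column names for all values" concrete, the sketch below pairs each candidate row with its column names. The helper rows_to_dicts and its inputs are hypothetical; how the SDK itself obtains the column names is not shown here.

```python
from typing import Dict, Iterable, Iterator, List


def rows_to_dicts(columns: List[str], rows: Iterable[List[str]]) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: turn positional rows into dictionaries of named values."""
    for row in rows:
        # zip pairs the first column name with the first value, and so on.
        yield dict(zip(columns, row))


named = list(rows_to_dicts(["id", "text"], [["1", "hello"], ["2", "bye"]]))
```

Here named holds one dictionary per candidate, keyed by column name, which is the shape described for candidate_dicts and dump_dicts.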

watchful.client.ephemeral(port: str = '9002') → None[source]

This function starts the backend using the specified port for an interactive session without persistence.

Parameters:: port (str, optional) – The port, defaults to “9002”.

watchful.client.exit_backend() → None[source]

This function exits the backend. Note that the API call will usually fail because the backend exits before returning an HTTP response, so the error is suppressed. This is useful locally, for tests, and during development, but not in dockerized Watchful application instances.

watchful.client.export() → Dict | None[source]

This function exports the dataset and returns an updated HTTP response.

Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.export_async() → Dict | None[source]

This function exports the dataset. As it is asynchronous, the immediate HTTP response is likely not updated yet.

Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.export_dataset_to_path(out_file: str, fields: List[str] | None = None) → None[source]

This function exports the original dataset via a buffered stream to the specified output file path. It takes fields as an optional argument for the header (column names), for the case where the caller expects to use specific columns; otherwise it uses the column names returned by the Watchful application. An exception is raised when the dataset’s column names do not match the user’s expected column names.

Parameters:

out_file (str) – The file path to export the original dataset to.
fields (List, optional) – The list of column names to use for the dataset export.

watchful.client.export_preview(mode: str = 'ftc') → Dict | None[source]

This function returns a preview of the export.

Parameters:: mode (str, optional) – The mode of the export preview, it can be either “ftc” or “ner”, defaults to “ftc”.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.export_project() → Response[source]

This function returns a consolidated version (a single *.hints file) of the currently open project. Unlike other GET endpoints that return a summary object, this function returns a streamed file, the *.hints project file.

Returns:: The HTTP response from the connection request.
Return type:: requests.models.Response

watchful.client.export_stream(content_type: str = 'text/csv', mode: str = 'ftc') → Response[source]

This function begins the export using the export_stream call. The result is not JSON, but is data to be processed directly.

For FTC mode, content_type must be text/csv and mode must be ftc. For NER mode, content_type must be application/jsonlines and mode must be ner.

On success, it returns the requests.models.Response object from which you can read the data.

Parameters:

content_type (str, optional) – The content type of the export, defaults to “text/csv”.
mode (str, optional) – The mode of the export, it can be either “ftc” or “ner”, defaults to “ftc”.

Returns:

The HTTP response from the connection request.

Return type:

requests.models.Response
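
The pairing rule between content_type and mode stated above can be checked before making the call. This is a small sketch, not part of the SDK: VALID_EXPORTS and check_export_args are hypothetical names, and raising ValueError is an assumption about how a mismatch might be reported.

```python
# Allowed (mode -> content_type) pairs, taken from the description above.
VALID_EXPORTS = {"ftc": "text/csv", "ner": "application/jsonlines"}


def check_export_args(content_type: str = "text/csv", mode: str = "ftc") -> None:
    """Hypothetical pre-flight check for an export_stream call."""
    expected = VALID_EXPORTS.get(mode)
    if expected is None:
        raise ValueError(f"unknown mode {mode!r}; expected 'ftc' or 'ner'")
    if content_type != expected:
        raise ValueError(f"mode {mode!r} requires content_type {expected!r}")
```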

watchful.client.external(host: str = 'localhost', port: str = '9001', scheme: str = 'http') → None[source]

This function changes the global HOST, PORT and SCHEME values.

Parameters:

host (str, optional) – The host, defaults to “localhost”.
port (str, optional) – The port, defaults to “9001”.
scheme (str, optional) – The scheme, either “http” or “https”, defaults to “http”.
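
For orientation, the three values are the usual components of a base URL. The sketch below shows how they would typically be combined; base_url is a hypothetical helper, and whether the SDK builds its URLs in exactly this way is an assumption.

```python
def base_url(host: str = "localhost", port: str = "9001", scheme: str = "http") -> str:
    """Hypothetical helper: combine scheme, host and port into a base URL."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    return f"{scheme}://{host}:{port}"


local = base_url()
remote = base_url(host="watchful.example.com", port="443", scheme="https")
```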

watchful.client.external_hinter(class__: str, name: str, weight: int) → Dict | None[source]

This function creates an external hinter.

Parameters:

class__ (str) – The class for the hinter.
name (str) – The name for the hinter.
weight (int) – The weight for the hinter.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.fetch(token: str | None = None) → Dict | None[source]

This function performs fetch with Watchful hub.

Parameters:: token (str, optional) – The user’s auth token, defaults to None.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.get() → Dict | None[source]

This function gets the current status of the Watchful application, containing information such as your currently active project, dataset examples (candidates) and classes, hinters created, hand labels and label distribution, confidences and error rate, recall and precision, and more.

Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.get_dataset_filepath(summary: Dict, is_local: bool = True) → str[source]

This function infers the dataset filepath from summary. For correctness, the summary used should already have been asserted as successful via _assert_success. As this function uses file operations, it does not work when the Watchful application is remote, and in that case it returns “”.

Parameters:

summary (Dict) – The dictionary of the HTTP response from a connection request.
is_local (bool, optional) – The boolean indicating if the Watchful application is local, defaults to True.

Returns:

The dataset filepath.

Return type:

str

watchful.client.get_dataset_id(summary: Dict) → str[source]

This function gets the active dataset id from summary. For correctness, the summary used should already have been asserted as successful via _assert_success.

Parameters:: summary (Dict) – The dictionary of the HTTP response from a connection request.
Returns:: The dataset id.
Return type:: str

watchful.client.get_datasets_dir(summary: Dict, is_local: bool = True) → str[source]

This function infers the datasets directory from summary. For correctness, the summary used should already have been asserted as successful via _assert_success.

Parameters:

summary (Dict) – The dictionary of the HTTP response from a connection request.
is_local (bool, optional) – The boolean indicating if the Watchful application is local, defaults to True.

Returns:

The datasets directory.

Return type:

str

watchful.client.get_project_id(summary: Dict) → str[source]

This function gets the active project id from summary. For correctness, the summary used should already have been asserted as successful via _assert_success.

Parameters:: summary (Dict) – The dictionary of the HTTP response from a connection request.
Returns:: The project id.
Return type:: str

watchful.client.get_watchful_home(summary: Dict, is_local: bool = True) → str[source]

This function gets Watchful home from summary. For correctness, the summary used should already have been asserted as successful via _assert_success. If Watchful home is not available and the Watchful application is local, Watchful home is derived from the user home.

Parameters:

summary (Dict) – The dictionary of the HTTP response from a connection request.
is_local (bool, optional) – The boolean indicating if the Watchful application is local, defaults to True.

Returns:

Watchful home.

Return type:

str

watchful.client.hint(name: str, offset: int, values: List[bool]) → Dict | None[source]

This function adds the hints for an external hinter.

Parameters:

name (str) – The hinter name.
offset (int) – The offset.
values (List[bool]) – The hints.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

TODO: Come up with a better streaming Python API here.
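
Since hint takes an offset alongside a list of values, a long list of hints can in principle be sent in chunks. The sketch below only computes the chunks. It assumes that offset is the zero-based position of the first candidate covered by values, which the description above does not confirm, and hint_batches is a hypothetical helper.

```python
from typing import Iterator, List, Tuple


def hint_batches(values: List[bool], batch_size: int) -> Iterator[Tuple[int, List[bool]]]:
    """Hypothetical helper: split hints into (offset, chunk) pairs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for offset in range(0, len(values), batch_size):
        yield offset, values[offset:offset + batch_size]


batches = list(hint_batches([True, False, True, True, False], batch_size=2))
# Under the offset assumption above, each pair could then be passed on as
# hint(name, offset, chunk), with apply_hints(name) called afterwards.
```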

watchful.client.hint_all(name: str, values: List[bool]) → Dict | None[source]

This function applies the hints for an external hinter.

Parameters:

name (str) – The hinter name.
values (List[bool]) – The hints.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.hinter(class__: str, query_: str, weight: int) → Dict | None[source]

This function creates a hinter and returns an updated HTTP response.

Parameters:

class__ (str) – The class for the hinter.
query_ (str) – The query for the hinter.
weight (int) – The weight for the hinter.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.hinter_async(class__: str, query_: str, weight: int) → Dict | None[source]

This function creates a hinter. As it is asynchronous, the immediate HTTP response is likely not updated yet.

Parameters:

class__ (str) – The class for the hinter.
query_ (str) – The query for the hinter.
weight (int) – The weight for the hinter.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.hub_api(verb: str, token: str, **kwargs: Dict) → Dict | None[source]

This is a convenience function for collaboration API calls with Watchful, made up of a verb, a token and optional keyword arguments.

Parameters:

verb (str) – The verb for the hub API.
token (str) – The user’s auth token.
kwargs (Dict) – Optional parameters to support the hub API for verb.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.ignore_column_flag(columns: List[str] | None = None, flag: Literal['inferenceable'] = 'inferenceable') → Dict | None[source]

This function sets a flag for each of the columns of the dataset in the currently active project. The given columns will be set to False and all other columns will be set to True. Omitting columns is shorthand for giving all available columns.

Parameters:

columns (List[str], optional) – The list of column names for which the flag will be set to False.
flag (str, optional) – The flag to be set; “inferenceable” is currently the only supported flag.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.is_utf8(csv_bytes: bytes | None = None, filepath: str | None = None, threshold: float = 0.5, is_fast: bool = True) → bool[source]

This function attempts to detect if the encoding of the given bytes or the content of the given filepath is utf-8. It returns True if the detected encoding is utf-8 and has a confidence of the given threshold or more, otherwise False. This function may need some tweaking for a very large dataset, but should work with the is_fast argument set to True by default.

Parameters:

csv_bytes (bytes, optional) – The csv dataset bytes, defaults to None.
filepath (str, optional) – The path of the csv dataset file, defaults to None.
threshold (float, optional) – The minimum confidence required to accept the detected encoding, defaults to 0.5.
is_fast (bool, optional) – Whether to use fast encoding detection with a lower accuracy, or not, defaults to True.

Returns:

True if the detected encoding is utf-8 and has a confidence of the given threshold or more, otherwise False.

Return type:

bool
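
For comparison, a stricter and much simpler check is to attempt a full UTF-8 decode. The sketch below is a stand-in, not the detection described above: it has no confidence threshold or fast mode, and decodes_as_utf8 is a hypothetical name.

```python
from typing import Optional


def decodes_as_utf8(csv_bytes: Optional[bytes] = None, filepath: Optional[str] = None) -> bool:
    """Hypothetical strict check: True only if every byte decodes as UTF-8."""
    if csv_bytes is None:
        if filepath is None:
            raise ValueError("provide csv_bytes or filepath")
        with open(filepath, "rb") as f:
            csv_bytes = f.read()
    try:
        csv_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A strict decode reads the whole input, so a confidence-based detector on a sample can be the more practical choice for very large files.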

watchful.client.label_single(row: List[str]) → List[str][source]

This function labels a candidate row.

Parameters:: row (List[str]) – The candidate row.
Returns:: The plabels for the candidate row.
Return type:: List[str]

watchful.client.list_projects() → Dict[source]

This function lists the available projects.

Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict

watchful.client.load_attributes(dataset_id: str, attributes_filename: str) → Dict | None[source]

This function loads the attributes for a dataset. It is used when the Watchful application is on the same machine as the data enrichment.

Parameters:

dataset_id (str) – The dataset id.
attributes_filename (str) – The attributes filename.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.login(email: str, password: str) → Dict | None[source]

This function logs in to Watchful hub with the given email and password.

Parameters:

email (str) – The user’s email.
password (str) – The user’s password.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.open_project(id_: str) → str[source]

This function opens a project via its project id, which is the path to its hints file.

Parameters:: id_ (str) – The project id.
Returns:: The read HTTP response.
Return type:: str

watchful.client.peek(token: str | None = None) → Dict | None[source]

This function performs peek with Watchful hub.

Parameters:: token (str, optional) – The user’s auth token, defaults to None.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.print_candidates(summary: Dict | None = None) → None[source]

This function retrieves and prints the column names and all the candidates.

Parameters:: summary (Dict, optional) – The dictionary of the HTTP response from a connection request, defaults to None.

watchful.client.publish(token: str | None = None) → Dict | None[source]

This function performs publish with Watchful hub.

Parameters:: token (str, optional) – The user’s auth token, defaults to None.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.pull(token: str | None = None) → Dict | None[source]

This function performs pull with Watchful hub.

Parameters:: token (str, optional) – The user’s auth token, defaults to None.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.push(token: str | None = None) → Dict | None[source]

This function performs push with Watchful hub.

Parameters:: token (str, optional) – The user’s auth token, defaults to None.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.query(q: str, page: int = 0) → Dict | None[source]

This function queries for a page and returns an updated HTTP response.

Parameters:

q (str) – The query.
page (int, optional) – The page, defaults to 0.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.query_all(q: str, max_pages: int = 0) → Generator[List[str], None, None][source]

This function evaluates the query and returns the results, as opposed to the summary. By default, it returns all results for the query (all pages). This can be limited by setting max_pages to the positive number of pages you want. Each query result is a list with a string for each field that is returned. Note that TOKS, SENTS and CELLS queries only return one field, so each result will be wrapped in a list of one string.

Parameters:

q (str) – The query.
max_pages (int, optional) – The maximum number of pages to return, defaults to 0, which returns all pages.

Returns:

The fields.

Return type:

Generator[List[str], None, None]
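
The paging behaviour of max_pages can be pictured with a generic generator. This is a sketch under assumptions: iter_results and the fetch_page callable are hypothetical, and treating an empty page as the end of the results is a guess about the stopping condition rather than documented behaviour.

```python
from typing import Callable, Iterator, List


def iter_results(fetch_page: Callable[[int], List[List[str]]], max_pages: int = 0) -> Iterator[List[str]]:
    """Hypothetical sketch: yield rows page by page; max_pages=0 means all pages."""
    page = 0
    while max_pages <= 0 or page < max_pages:
        rows = fetch_page(page)
        if not rows:
            return  # assumed end-of-results signal
        yield from rows
        page += 1


# A fake two-page result set standing in for a real query.
pages = [[["a"], ["b"]], [["c"]]]
fake_fetch = lambda p: pages[p] if p < len(pages) else []
everything = list(iter_results(fake_fetch))
first_page_only = list(iter_results(fake_fetch, max_pages=1))
```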

watchful.client.query_async(q: str, page: int = 0) → Dict | None[source]

This function queries for a page. As it is asynchronous, the immediate HTTP response is likely not updated yet.

Parameters:

q (str) – The query.
page (int, optional) – The page, defaults to 0.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.records(csv_: str) → Dict | None[source]

This function loads the csv dataset.

Parameters:: csv_ (str) – The csv dataset.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.register_summary_hook(function: Callable) → None[source]

This function allows you to provide a function that will be called with every summary object that is returned from any API call to the /api endpoint, that being the raw response body before JSON parsing. This can be used, for example, to instrument a test suite with a function that writes every summary object to disk, thereby creating a dataset of Watchful summary objects for further analysis. Most SDK users probably won’t be reaching for this function every day, but if you find a clever use for it, let us know!

Parameters:: function (Callable) – Your function to be called with every summary string
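
The disk-logging use case mentioned above could look roughly like the following. The factory make_summary_logger and the one-summary-per-line file layout are assumptions for this sketch; the layout is only sensible if each raw response body is a single line of JSON.

```python
from typing import Callable


def make_summary_logger(path: str) -> Callable[[str], None]:
    """Hypothetical factory: build a hook that appends each raw summary to a file."""

    def hook(raw_summary: str) -> None:
        # Append mode keeps summaries from earlier calls.
        with open(path, "a", encoding="utf-8") as f:
            f.write(raw_summary.rstrip("\n") + "\n")

    return hook


# With the SDK installed, the hook would be registered like this:
# watchful.client.register_summary_hook(make_summary_logger("summaries.jsonl"))
```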

watchful.client.request(method: str = 'GET', path: str = '/', **kwargs: Dict) → Response[source]

This is a wrapper function for API calls, made up of the API method, path and optional keyword arguments.

Parameters:

method (str) – The API method string in uppercase.
path (str) – The path string after the hostname and port.
kwargs (Dict) – Optional parameters to include in the API call.

Returns:

The HTTP response from the connection request.

Return type:

requests.models.Response

watchful.client.set_column_flag(columns: List[str] | None = None, flag: Literal['inferenceable'] = 'inferenceable', pos_sense: bool = True) → Dict | None[source]

This function sets a flag for each of the columns of the dataset in the currently active project. By default, the given columns will be set to True and all other columns will be set to False. Omitting columns is shorthand for giving all available columns.

Parameters:

columns (List[str], optional) – The list of column names for which the flag will be set.
flag (str, optional) – The flag to be set; “inferenceable” is currently the only supported flag.
pos_sense (bool, optional) – The boolean indicating whether the flag is set in the positive or negative sense, defaults to True.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional
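
The resulting flag values can be worked out with a few lines of plain Python. This is a sketch of the semantics only: column_flags is a hypothetical helper, and the behaviour for pos_sense=False is an interpretation chosen to match the description of ignore_column_flag.

```python
from typing import Dict, List, Optional


def column_flags(
    all_columns: List[str],
    columns: Optional[List[str]] = None,
    pos_sense: bool = True,
) -> Dict[str, bool]:
    """Hypothetical helper: compute the flag value each column would receive."""
    # Omitting columns is treated as giving every available column.
    given = set(all_columns if columns is None else columns)
    # Given columns get pos_sense; all other columns get the opposite.
    return {c: (c in given) == pos_sense for c in all_columns}


set_a = column_flags(["a", "b", "c"], ["a"])
ignore_a = column_flags(["a", "b", "c"], ["a"], pos_sense=False)
```

In this reading, set_a marks only column a as inferenceable, while ignore_a marks every column except a.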

watchful.client.set_hub_url(url: str) → Dict | None[source]

This function sets the Watchful hub URL of the Watchful client. The Watchful hub URL should not change after data has been fetched or published to a hub.

Parameters:: url (str) – The Watchful hub URL.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.spawn_cmd(cmd: str, env: str | None = None) → int[source]

This function spawns a command and returns the PID of the spawned process.

Parameters:

cmd (str) – The command.
env (str, optional) – The environment, defaults to None.

Returns:

The PID of the spawned process.

Return type:

int

watchful.client.title(title_: str) → Dict | None[source]

This function gives a title to a newly created project.

Parameters:: title_ (str) – The title for the project.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional

watchful.client.upload_attributes(dataset_id: str, attributes_filepath: str) → Dict | None[source]

This function uploads the attributes for the dataset_id to the remote Watchful application, where the Watchful application then saves it to a filepath according to its stable application logic.

Parameters:

dataset_id (str) – The dataset id.
attributes_filepath (str) – The attributes filepath.

Returns:

The dictionary of the HTTP response from the connection request.

Return type:

Dict, optional

watchful.client.whoami(token: str | None = None) → Dict | None[source]

This function performs whoami with Watchful hub.

Parameters:: token (str, optional) – The user’s auth token, defaults to None.
Returns:: The dictionary of the HTTP response from the connection request.
Return type:: Dict, optional