geonature.core.imports.checks.dataframe.cast

Functions

convert_to_datetime(value_raw)

Try to convert a date string to a datetime object.

convert_to_uuid(value)

convert_to_integer(value)

check_datetime_field(→ Set[str])

Check if a column is a datetime and convert it to datetime type.

check_uuid_field(→ Set[str])

Check if a column is a UUID and convert it to UUID type.

check_integer_field(→ Set[str])

Check if a column is an integer and convert it to integer type.

check_numeric_field(→ Set[str])

Check that the string values in a column are numeric and convert them to a numeric type.

check_unicode_field(→ Iterator[Dict[str, Any]])

Check if column values have the right length.

check_boolean_field(df, source_col, dest_col, required)

Check a boolean field in a dataframe.

check_anytype_field(→ Set[str])

Check a field in a dataframe according to its type.

check_types(→ Set[str])

Check the types of columns in a dataframe based on the provided fields.

Module Contents

geonature.core.imports.checks.dataframe.cast.convert_to_datetime(value_raw)

Try to convert a date string to a datetime object. If the input string does not match any of the compatible formats, the function returns None.

Parameters

value_raw : str

The input string to convert

Returns

converted_date : datetime or None

The converted datetime object or None if the conversion failed
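The fallback behaviour described above (try each compatible format, return None when none matches) can be sketched as follows. This is only an illustration: the name `parse_date_sketch` and the format list `CANDIDATE_FORMATS` are assumptions, since the formats actually accepted by `convert_to_datetime` are not listed on this page.

```python
from datetime import datetime
from typing import Optional

# Hypothetical format list: the formats the real function accepts
# are not documented on this page.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y")


def parse_date_sketch(value_raw: str) -> Optional[datetime]:
    """Return the first successful parse, or None if no format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value_raw, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None
```

Returning None instead of raising lets a caller treat unparseable values as invalid rows without wrapping every call in a try block.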

geonature.core.imports.checks.dataframe.cast.convert_to_uuid(value)

geonature.core.imports.checks.dataframe.cast.convert_to_integer(value)
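Neither converter is documented on this page. By analogy with `convert_to_datetime`, one plausible shape is a conversion that returns None on failure. The sketch below rests on that assumption; the names `to_uuid_sketch` and `to_integer_sketch` are hypothetical and the real functions may behave differently (for example by raising).

```python
from typing import Optional
from uuid import UUID


def to_uuid_sketch(value: str) -> Optional[UUID]:
    """Assumed behaviour: return a UUID, or None if the string is malformed."""
    try:
        return UUID(value)
    except ValueError:
        return None


def to_integer_sketch(value: str) -> Optional[int]:
    """Assumed behaviour: return an int, or None if the string is not an integer."""
    try:
        return int(value)
    except ValueError:
        return None
```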
geonature.core.imports.checks.dataframe.cast.check_datetime_field(df: pandas.DataFrame, source_field: str, target_field: str, required: bool) → Set[str]

Check if a column is a datetime and convert it to datetime type.

Parameters

df : pandas.DataFrame

The dataframe to check.

source_field : str

The name of the column to check.

target_field : str

The name of the column in which to store the result.

required : bool

Whether the column is mandatory or not.

Yields

dict

A dictionary containing an error code, the column name, and the invalid rows.

Returns

set

Set containing the name of the target field.

Notes

The error codes are:
  • INVALID_DATE: the value is not of datetime type.
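The Yields and Returns sections above suggest a generator that emits error dictionaries and then returns a set when it finishes. In Python, a generator's return value travels on `StopIteration.value`, so a plain `for` loop discards it. The sketch below shows one way to collect both. It uses plain lists instead of a dataframe, and `check_field_sketch`, `run_check`, and the dictionary keys are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Generator, List, Set, Tuple


def check_field_sketch(
    values: List[str], target_field: str
) -> Generator[Dict[str, Any], None, Set[str]]:
    """Hypothetical check: yield one error for non-digit values, then return the target name."""
    invalid_rows = [i for i, v in enumerate(values) if not v.isdigit()]
    if invalid_rows:
        # Key names are assumptions, not the module's actual error format.
        yield {"error_code": "INVALID_INTEGER", "column": target_field, "invalid_rows": invalid_rows}
    return {target_field}


def run_check(gen: Generator[Dict[str, Any], None, Set[str]]) -> Tuple[List[Dict[str, Any]], Set[str]]:
    """Drain a generator, collecting yielded errors and its final return value."""
    errors = []
    while True:
        try:
            errors.append(next(gen))
        except StopIteration as stop:
            return errors, stop.value


errors, updated = run_check(check_field_sketch(["1", "x", "3"], "count"))
```

Inside another generator, `updated = yield from check_field_sketch(...)` achieves the same result while forwarding the errors to the caller.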

geonature.core.imports.checks.dataframe.cast.check_uuid_field(df: pandas.DataFrame, source_field: str, target_field: str, required: bool) → Set[str]

Check if a column is a UUID and convert it to UUID type.

Parameters

df : pandas.DataFrame

The dataframe to check.

source_field : str

The name of the column to check.

target_field : str

The name of the column in which to store the result.

required : bool

Whether the column is mandatory or not.

Yields

dict

A dictionary containing an error code, the column name, and the invalid rows.

Returns

set

Set containing the name of the target field.

Notes

The error codes are:
  • INVALID_UUID: the value is not a valid UUID.

geonature.core.imports.checks.dataframe.cast.check_integer_field(df: pandas.DataFrame, source_field: str, target_field: str, required: bool) → Set[str]

Check if a column is an integer and convert it to integer type.

Parameters

df : pandas.DataFrame

The dataframe to check.

source_field : str

The name of the column to check.

target_field : str

The name of the column in which to store the result.

required : bool

Whether the column is mandatory or not.

Yields

dict

A dictionary containing an error code, the column name, and the invalid rows.

Returns

set

Set containing the name of the target field.

Notes

The error codes are:
  • INVALID_INTEGER: the value is not of integer type.

geonature.core.imports.checks.dataframe.cast.check_numeric_field(df: pandas.DataFrame, source_field: str, target_field: str, required: bool) → Set[str]

Check that the string values in a column are numeric and convert them to a numeric type.

Parameters

df : pandas.DataFrame

The dataframe to check.

source_field : str

The name of the column to check.

target_field : str

The name of the column in which to store the result.

required : bool

Whether the column is mandatory or not.

Yields

dict

A dictionary containing an error code, the column name, and the invalid rows.

Returns

set

Set containing the name of the target field.

Notes

The error codes are:
  • INVALID_NUMERIC: the value is not of numeric type.

geonature.core.imports.checks.dataframe.cast.check_unicode_field(df: pandas.DataFrame, field: str, field_length: int | None) → Iterator[Dict[str, Any]]

Check if column values have the right length.

Parameters

df : pandas.DataFrame

The dataframe to check.

field : str

The name of the column to check.

field_length : Optional[int]

The maximum allowed length of the values in the column.

Yields

dict

A dictionary containing an error code, the column name, and the invalid rows.

Notes

The error codes are:
  • INVALID_CHAR_LENGTH: the string is too long.
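A length check of this kind can be sketched without pandas. The version below works on a plain list and assumes that a `field_length` of None means no limit is configured. The function name `check_length_sketch` and the dictionary keys are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Optional


def check_length_sketch(
    values: List[Optional[str]], field: str, field_length: Optional[int]
) -> Iterator[Dict[str, Any]]:
    """Yield one error listing the rows whose string exceeds field_length."""
    if field_length is None:
        # Assumption: no configured limit means there is nothing to check.
        return
    invalid_rows = [
        i for i, v in enumerate(values) if v is not None and len(v) > field_length
    ]
    if invalid_rows:
        # Key names are assumptions, not the module's actual error format.
        yield {"error_code": "INVALID_CHAR_LENGTH", "column": field, "invalid_rows": invalid_rows}


length_errors = list(check_length_sketch(["ab", "abcdef", None], "name", 3))
```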

geonature.core.imports.checks.dataframe.cast.check_boolean_field(df, source_col, dest_col, required)

Check a boolean field in a dataframe.

Parameters

df : pandas.DataFrame

The dataframe to check.

source_col : str

The name of the column to check.

dest_col : str

The name of the column in which to store the result.

required : bool

Whether the column is mandatory or not.

Yields

dict

A dictionary containing an error code and the rows with errors.

Notes

The error codes are:
  • MISSING_VALUE: the value is mandatory but it’s missing (null).

  • INVALID_BOOL: the value is not a boolean.
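The two error codes separate missing values from values that are present but not boolean. The sketch below illustrates that split on a plain list. The accepted spellings in `ACCEPTED`, the name `check_boolean_sketch`, and the dictionary keys are all assumptions; this page does not say which inputs the real check treats as booleans.

```python
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical accepted spellings; the values the real check accepts
# are not listed on this page.
ACCEPTED = {"true": True, "false": False, "1": True, "0": False}


def check_boolean_sketch(
    values: List[Optional[str]], required: bool
) -> Iterator[Dict[str, Any]]:
    """Yield MISSING_VALUE for nulls (if required) and INVALID_BOOL for unknown spellings."""
    missing = [i for i, v in enumerate(values) if v is None]
    invalid = [
        i for i, v in enumerate(values)
        if v is not None and v.strip().lower() not in ACCEPTED
    ]
    if required and missing:
        yield {"error_code": "MISSING_VALUE", "invalid_rows": missing}
    if invalid:
        yield {"error_code": "INVALID_BOOL", "invalid_rows": invalid}


bool_errors = list(check_boolean_sketch(["true", None, "maybe", "0"], required=True))
```

With `required=False`, the null at row 1 is tolerated and only the `INVALID_BOOL` error for row 2 is produced.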

geonature.core.imports.checks.dataframe.cast.check_anytype_field(df: pandas.DataFrame, field_type: sqlalchemy.sql.sqltypes.TypeEngine, source_col: str, dest_col: str, required: bool) → Set[str]

Check a field in a dataframe according to its type.

Parameters

df : pandas.DataFrame

The dataframe to check.

field_type : sqlalchemy.TypeEngine

The type of the column to check.

source_col : str

The name of the column to check.

dest_col : str

The name of the column in which to store the result.

required : bool

Whether the column is mandatory or not.

Yields

dict

A dictionary containing an error code and the rows with errors.

Returns

set

Set containing the names of the columns updated in the dataframe.

geonature.core.imports.checks.dataframe.cast.check_types(entity: geonature.core.imports.models.Entity, df: pandas.DataFrame, fields: Dict[str, geonature.core.imports.models.BibFields]) → Set[str]

Check the types of columns in a dataframe based on the provided fields.

Parameters

entity : Entity

The entity to check.

df : pd.DataFrame

The dataframe to check.

fields : Dict[str, BibFields]

A dictionary mapping column names to their corresponding BibFields.

Returns

Set[str]

Set containing the names of the updated columns.