geonature.core.imports.checks.dataframe.cast
Functions
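convert_to_datetime(value_raw) | Try to convert a date string to a datetime object.
check_datetime_field(df, source_field, target_field, required) | Check if a column is a datetime and convert it to datetime type.
check_uuid_field(df, source_field, target_field, required) | Check if a column is a UUID and convert it to UUID type.
check_integer_field(df, source_field, target_field, required) | Check if a column is an integer and convert it to integer type.
check_numeric_field(df, source_field, target_field, required) | Check whether the string values of a column are numeric and convert them to a numeric type.
check_unicode_field(df, field, field_length) | Check if column values have the right length.
check_boolean_field(df, source_col, dest_col, required) | Check a boolean field in a dataframe.
check_anytype_field(df, field_type, source_col, dest_col, required) | Check a field in a dataframe according to its type.
check_types(entity, df, fields) | Check the types of columns in a dataframe based on the provided fields.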
Module Contents
- geonature.core.imports.checks.dataframe.cast.convert_to_datetime(value_raw)
Try to convert a date string to a datetime object. If the input string does not match any of the compatible formats, the function returns None.
Parameters
- value_raw : str
The input string to convert.
Returns
- converted_date : datetime or None
The converted datetime object, or None if the conversion failed.
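The "try each compatible format, otherwise return None" behaviour described above can be sketched with the standard library. This is an illustration only: the helper name `parse_datetime_or_none` and the formats in `CANDIDATE_FORMATS` are assumptions, since the formats actually accepted by `convert_to_datetime` are not listed on this page.

```python
from datetime import datetime
from typing import Optional

# Assumed formats, chosen for illustration; the module's real list may differ.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y")


def parse_datetime_or_none(value_raw: str) -> Optional[datetime]:
    """Return the first successful parse, or None if no format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value_raw, fmt)
        except ValueError:
            # This format does not match; try the next one.
            continue
    return None


parsed_iso = parse_datetime_or_none("2024-03-15")
parsed_slash = parse_datetime_or_none("15/03/2024")
parsed_bad = parse_datetime_or_none("not a date")
```

Returning None instead of raising lets a caller apply the function to a whole column and flag the failed rows afterwards.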
- geonature.core.imports.checks.dataframe.cast.check_datetime_field(df: pandas.DataFrame, source_field: str, target_field: str, required: bool) -> Set[str]
Check if a column is a datetime and convert it to datetime type.
Parameters
- df : pandas.DataFrame
The dataframe to check.
- source_field : str
The name of the column to check.
- target_field : str
The name of the column in which to store the result.
- required : bool
Whether the column is mandatory or not.
Yields
- dict
A dictionary containing an error code, the column name, and the invalid rows.
Returns
- set
Set containing the name of the target field.
Notes
- The error codes are:
INVALID_DATE: the value is not of datetime type.
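The docstring lists both a Yields and a Returns section, which suggests a generator that reports errors as it runs and hands back a set when it finishes. A minimal sketch of how a caller might collect both is given below. `fake_datetime_check` is a hypothetical stand-in (not the real function), and the dictionary keys are assumed names.

```python
from typing import Any, Dict, Generator, List, Set, Tuple

CheckResult = Generator[Dict[str, Any], None, Set[str]]


def fake_datetime_check(target_field: str) -> CheckResult:
    """Hypothetical stand-in: yields one error, then returns a set."""
    # Key names are assumptions made for this sketch.
    yield {"error_code": "INVALID_DATE", "column": "date_min", "invalid_rows": [2]}
    return {target_field}


def run_check(check: CheckResult) -> Tuple[List[Dict[str, Any]], Set[str]]:
    """Collect every yielded error and the generator's return value."""
    errors: List[Dict[str, Any]] = []
    while True:
        try:
            errors.append(next(check))
        except StopIteration as stop:
            # A generator's return value travels on StopIteration.value.
            return errors, stop.value


errors, updated = run_check(fake_datetime_check("date_min_parsed"))
```

Iterating with a plain `for` loop or `list()` would discard the returned set, which is why the sketch catches `StopIteration` explicitly.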
- geonature.core.imports.checks.dataframe.cast.check_uuid_field(df: pandas.DataFrame, source_field: str, target_field: str, required: bool) -> Set[str]
Check if a column is a UUID and convert it to UUID type.
Parameters
- df : pandas.DataFrame
The dataframe to check.
- source_field : str
The name of the column to check.
- target_field : str
The name of the column in which to store the result.
- required : bool
Whether the column is mandatory or not.
Yields
- dict
A dictionary containing an error code, the column name, and the invalid rows.
Returns
- set
Set containing the name of the target field.
Notes
- The error codes are:
INVALID_UUID: the value is not a valid UUID.
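As an illustration of what "not a valid UUID" can mean in practice, the sketch below validates strings with the standard `uuid` module and records the positions of the values that fail. The helper names are hypothetical, a plain list stands in for the dataframe column, and the module's own validation rules may be stricter or looser.

```python
import uuid
from typing import List, Optional, Tuple


def parse_uuid_or_none(value: str) -> Optional[uuid.UUID]:
    """Return a UUID object, or None if the value cannot be parsed."""
    try:
        return uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return None


def split_valid_invalid(values: List[str]) -> Tuple[List[Optional[uuid.UUID]], List[int]]:
    """Convert each value and list the row positions that failed."""
    converted = [parse_uuid_or_none(v) for v in values]
    invalid_rows = [i for i, c in enumerate(converted) if c is None]
    return converted, invalid_rows


converted, invalid_rows = split_valid_invalid(
    ["123e4567-e89b-12d3-a456-426614174000", "not-a-uuid"]
)
```

Note that `uuid.UUID` ignores hyphens and surrounding braces, so it accepts more spellings than the canonical 8-4-4-4-12 form.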
- geonature.core.imports.checks.dataframe.cast.check_integer_field(df: pandas.DataFrame, source_field: str, target_field: str, required: bool) -> Set[str]
Check if a column is an integer and convert it to integer type.
Parameters
- df : pandas.DataFrame
The dataframe to check.
- source_field : str
The name of the column to check.
- target_field : str
The name of the column in which to store the result.
- required : bool
Whether the column is mandatory or not.
Yields
- dict
A dictionary containing an error code, the column name, and the invalid rows.
Returns
- set
Set containing the name of the target field.
Notes
- The error codes are:
INVALID_INTEGER: the value is not of integer type.
- geonature.core.imports.checks.dataframe.cast.check_numeric_field(df: pandas.DataFrame, source_field: str, target_field: str, required: bool) -> Set[str]
Check whether the string values of a column are numeric and convert them to a numeric type.
Parameters
- df : pandas.DataFrame
The dataframe to check.
- source_field : str
The name of the column to check.
- target_field : str
The name of the column in which to store the result.
- required : bool
Whether the column is mandatory or not.
Yields
- dict
A dictionary containing an error code, the column name, and the invalid rows.
Returns
- set
Set containing the name of the target field.
Notes
- The error codes are:
INVALID_NUMERIC: the value is not of numeric type.
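The distinction between the integer and numeric checks can be illustrated with Python's built-in conversions: a string such as "3.5" is numeric but is not an integer. The helpers below are hypothetical and only approximate the idea; how the module itself treats whitespace, signs, or exponents is not described on this page.

```python
from typing import Optional


def to_int_or_none(text: str) -> Optional[int]:
    """Return an int, or None if the text is not an integer literal."""
    try:
        return int(text)
    except ValueError:
        return None


def to_float_or_none(text: str) -> Optional[float]:
    """Return a float, or None if the text is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


as_int = to_int_or_none("3.5")      # fails the integer conversion
as_float = to_float_or_none("3.5")  # passes the numeric conversion
```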
- geonature.core.imports.checks.dataframe.cast.check_unicode_field(df: pandas.DataFrame, field: str, field_length: int | None) -> Iterator[Dict[str, Any]]
Check if column values have the right length.
Parameters
- df : pandas.DataFrame
The dataframe to check.
- field : str
The name of the column to check.
- field_length : Optional[int]
The maximum length allowed for values in the column.
Yields
- dict
A dictionary containing an error code, the column name, and the invalid rows.
Notes
- The error codes are:
INVALID_CHAR_LENGTH: the string is too long.
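A length check of this kind can be sketched as a small generator. In the sketch below a plain list stands in for the dataframe column, the function name and dictionary keys are assumed, and treating `field_length=None` as "no limit" is an assumption based on the optional type, not something this page states.

```python
from typing import Any, Dict, Iterator, List, Optional


def check_length(
    values: List[Optional[str]], field: str, field_length: Optional[int]
) -> Iterator[Dict[str, Any]]:
    """Yield one error listing the rows whose string exceeds field_length."""
    if field_length is None:
        # Assumption: no maximum length means nothing to check.
        return
    invalid_rows = [
        i for i, v in enumerate(values) if v is not None and len(v) > field_length
    ]
    if invalid_rows:
        yield {
            "error_code": "INVALID_CHAR_LENGTH",
            "column": field,
            "invalid_rows": invalid_rows,
        }


length_errors = list(check_length(["ab", "abcdef", None], "nom_cite", 3))
no_limit_errors = list(check_length(["abcdef"], "nom_cite", None))
```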
- geonature.core.imports.checks.dataframe.cast.check_boolean_field(df, source_col, dest_col, required)
Check a boolean field in a dataframe.
Parameters
- df : pandas.DataFrame
The dataframe to check.
- source_col : str
The name of the column to check.
- dest_col : str
The name of the column in which to store the result.
- required : bool
Whether the column is mandatory or not.
Yields
- dict
A dictionary containing an error code and the rows with errors.
Notes
- The error codes are:
MISSING_VALUE: the value is mandatory but is missing (null).
INVALID_BOOL: the value is not a boolean.
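The two error codes cover different situations: a null in a mandatory column versus a value that is present but is not a boolean. The sketch below separates them. The function name, the accepted spellings in `TRUE_VALUES` and `FALSE_VALUES`, and the dictionary keys are all assumptions, and a plain list stands in for the dataframe column.

```python
from typing import Any, Dict, Iterator, List, Optional

# Assumed spellings, for illustration only.
TRUE_VALUES = {"true", "1"}
FALSE_VALUES = {"false", "0"}


def check_boolean(
    values: List[Optional[str]], required: bool
) -> Iterator[Dict[str, Any]]:
    """Yield MISSING_VALUE and/or INVALID_BOOL errors for a list of values."""
    accepted = TRUE_VALUES | FALSE_VALUES
    missing = [i for i, v in enumerate(values) if v is None]
    invalid = [
        i
        for i, v in enumerate(values)
        if v is not None and v.strip().lower() not in accepted
    ]
    if required and missing:
        yield {"error_code": "MISSING_VALUE", "invalid_rows": missing}
    if invalid:
        yield {"error_code": "INVALID_BOOL", "invalid_rows": invalid}


sample = ["true", None, "maybe", "0"]
required_errors = list(check_boolean(sample, required=True))
optional_errors = list(check_boolean(sample, required=False))
```

With `required=False`, the null in the second row is tolerated and only the unparseable value is reported.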
- geonature.core.imports.checks.dataframe.cast.check_anytype_field(df: pandas.DataFrame, field_type: sqlalchemy.sql.sqltypes.TypeEngine, source_col: str, dest_col: str, required: bool) -> Set[str]
Check a field in a dataframe according to its type.
Parameters
- df : pandas.DataFrame
The dataframe to check.
- field_type : sqlalchemy.TypeEngine
The type of the column to check.
- source_col : str
The name of the column to check.
- dest_col : str
The name of the column in which to store the result.
- required : bool
Whether the column is mandatory or not.
Yields
- dict
A dictionary containing an error code and the rows with errors.
Returns
- set
Set containing the names of the columns updated in the dataframe.
- geonature.core.imports.checks.dataframe.cast.check_types(entity: geonature.core.imports.models.Entity, df: pandas.DataFrame, fields: Dict[str, geonature.core.imports.models.BibFields]) -> Set[str]
Check the types of columns in a dataframe based on the provided fields.
Parameters
- entity : Entity
The entity to check.
- df : pd.DataFrame
The dataframe to check.
- fields : Dict[str, BibFields]
A dictionary mapping column names to their corresponding BibFields.
Returns
- Set[str]
Set containing the names of updated columns.