geonature.core.imports.checks.dataframe

Functions

check_required_values(df, fields)

Check if required values are present in the dataframe.

check_counts(df, count_min_field, count_max_field[, ...])

Check if the value in the count_min_field is lower than or equal to the value in the count_max_field.

check_datasets(→ Set[str])

Check if datasets exist and are authorized for the user and import.

check_geometry(df, file_srid, geom_4326_field, ...[, ...])

What this check does:

check_types(→ Set[str])

Check the types of columns in a dataframe based on the provided fields.

concat_dates(df, datetime_min_col, datetime_max_col, ...)

Concatenates date and time columns to form datetime columns.

Package Contents

geonature.core.imports.checks.dataframe.check_required_values(df: pandas.DataFrame, fields: Dict[str, geonature.core.imports.models.BibFields])

Check if required values are present in the dataframe.

Parameters

df : pandas.DataFrame

The dataframe to check.

fields : Dict[str, BibFields]

Dictionary of fields to check.

Yields

dict

Dictionary containing the error code, the column name and the invalid rows.

Notes

A field is mandatory if:

((field.mandatory AND NOT (ANY optional_cond is not NaN)) OR (ANY mandatory_cond is not NaN))

which is equivalent to:

((field.mandatory AND ALL optional_cond are NaN) OR (ANY mandatory_cond is not NaN))
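The rule in the Notes can be sketched with pandas boolean masks. This is only an illustration, not the module's implementation: `required_mask` is a hypothetical helper, and the column names and the way conditions are passed in are assumptions made for the example.

```python
import pandas as pd


def required_mask(df, mandatory, optional_conds=(), mandatory_conds=()):
    """Return a boolean Series that is True on rows where the field is required.

    Hypothetical helper: `optional_conds` and `mandatory_conds` are lists of
    column names standing in for the conditions named in the Notes.
    """
    false = pd.Series(False, index=df.index)
    # ANY optional_cond is not NaN: the field may be left empty on this row
    any_optional = df[list(optional_conds)].notna().any(axis=1) if optional_conds else false
    # ANY mandatory_cond is not NaN: the field becomes required on this row
    any_mandatory = df[list(mandatory_conds)].notna().any(axis=1) if mandatory_conds else false
    return (~any_optional & mandatory) | any_mandatory


# Illustrative data: "WKT" is treated as mandatory unless "codecommune" is filled.
df = pd.DataFrame(
    {
        "WKT": ["POINT(6.5 44.8)", None, None],
        "codecommune": [None, "05100", None],
    }
)
required = required_mask(df, mandatory=True, optional_conds=["codecommune"])
# Rows where the value is required but absent
missing = required & df["WKT"].isna()
invalid_rows = df[missing]
```

Here only the last row is reported: the second row leaves WKT empty as well, but its optional condition (a commune code) is filled.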

geonature.core.imports.checks.dataframe.check_counts(df: pandas.DataFrame, count_min_field: str, count_max_field: str, default_count: int = None)

Check if the value in the count_min_field is lower than or equal to the value in the count_max_field.

count_min_field | count_max_field | result
--------------- | --------------- | ------
0               | 2               | ok
2               | 0               | error

Parameters

df : pandas.DataFrame

The dataframe to check.

count_min_field : BibField

The field containing the minimum count.

count_max_field : BibField

The field containing the maximum count.

default_count : object, optional

The default count to use if a count is missing, by default None.

Yields

dict

Dictionary containing the error code, the column name and the invalid rows.

Returns

set

Set of columns updated.
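The comparison described for check_counts can be sketched as follows. This is a minimal illustration on plain column names, not the function itself: `find_count_errors` is a hypothetical helper, and the handling of default_count is deliberately left out because its exact behavior is not described here.

```python
import pandas as pd


def find_count_errors(df, count_min_col, count_max_col):
    """Return the rows where the minimum count exceeds the maximum count."""
    # Only compare rows where both counts are present
    both_present = df[count_min_col].notna() & df[count_max_col].notna()
    return df[both_present & (df[count_min_col] > df[count_max_col])]


# Same two cases as in the table above
df = pd.DataFrame({"count_min": [0, 2], "count_max": [2, 0]})
invalid = find_count_errors(df, "count_min", "count_max")
```

The second row (min 2, max 0) is the only one returned.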

geonature.core.imports.checks.dataframe.check_datasets(imprt: geonature.core.imports.models.TImports, df: pandas.DataFrame, uuid_field: geonature.core.imports.models.BibFields, id_field: geonature.core.imports.models.BibFields, module_code: str, object_code: str | None = None) → Set[str]

Check if datasets exist and are authorized for the user and import.

Parameters

imprt : TImports

Import to check datasets for.

df : pd.DataFrame

Dataframe to check.

uuid_field : BibFields

Field containing dataset UUIDs.

id_field : BibFields

Field to fill with dataset IDs.

module_code : str

Module code to check datasets for.

object_code : Optional[str], optional

Object code to check datasets for, by default None.

Yields

dict

Dictionary containing error code, column name and invalid rows.

Returns

Set[str]

Set of columns updated.

geonature.core.imports.checks.dataframe.check_geometry(df: pandas.DataFrame, file_srid: int, geom_4326_field: geonature.core.imports.models.BibFields, geom_local_field: geonature.core.imports.models.BibFields, wkt_field: geonature.core.imports.models.BibFields = None, latitude_field: geonature.core.imports.models.BibFields = None, longitude_field: geonature.core.imports.models.BibFields = None, codecommune_field: geonature.core.imports.models.BibFields = None, codemaille_field: geonature.core.imports.models.BibFields = None, codedepartement_field: geonature.core.imports.models.BibFields = None, id_area: int = None)

What this check does:

  • check that at least a WKT, an x/y pair or a code is defined for each row (report NO-GEOM if none is, or MULTIPLE_ATTACHMENT_TYPE_CODE if several are defined)

  • set geom_local or geom_4326 or both (depending on file_srid) from WKT or x/y

  • check WKT validity

  • check x/y validity

  • check WKT & x/y bounding box

What this check does not do (done later in SQL):

  • set geom_4326 & geom_local from code

  • verify code validity

  • set geom_4326 from geom_local, or reciprocally, depending on file_srid

  • set geom_point

  • check geom validity (ST_IsValid)

FIXME: areas derived from codes are never checked against the bounding box!

Parameters

df : pandas.DataFrame

The dataframe to check

file_srid : int

The SRID of the file

geom_4326_field : BibFields

The column in the dataframe that contains geometries in SRID 4326

geom_local_field : BibFields

The column in the dataframe that contains geometries in the SRID of the area

wkt_field : BibFields, optional

The column in the dataframe that contains the geometries as WKT

latitude_field : BibFields, optional

The column in the dataframe that contains latitudes

longitude_field : BibFields, optional

The column in the dataframe that contains longitudes

codecommune_field : BibFields, optional

The column in the dataframe that contains commune codes

codemaille_field : BibFields, optional

The column in the dataframe that contains maille codes

codedepartement_field : BibFields, optional

The column in the dataframe that contains department codes

id_area : int, optional

The id of the area to check if the geometry is inside (Not Implemented)
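The first step of check_geometry (finding rows with no geometry source, or with several) can be sketched with pandas masks. This is an illustration under stated assumptions, not the module's code: `find_geometry_source_errors` is a hypothetical helper, the column names are made up, and "several are defined" is read here as "several code columns are filled", which the error name suggests but the description does not confirm.

```python
import pandas as pd


def find_geometry_source_errors(df, wkt_col, x_col, y_col, code_cols):
    """Map each error code to the list of row indexes it applies to."""
    has_wkt = df[wkt_col].notna()
    # An x/y pair counts only when both coordinates are present
    has_xy = df[x_col].notna() & df[y_col].notna()
    # Number of attachment codes (commune, maille, ...) filled on each row
    n_codes = df[code_cols].notna().sum(axis=1)
    no_geom = ~(has_wkt | has_xy | (n_codes > 0))
    # Assumption: "several" means more than one code column is filled
    multiple_codes = n_codes > 1
    return {
        "NO-GEOM": df.index[no_geom].tolist(),
        "MULTIPLE_ATTACHMENT_TYPE_CODE": df.index[multiple_codes].tolist(),
    }


# Illustrative rows: WKT only, x/y only, two codes at once, nothing at all
df = pd.DataFrame(
    {
        "WKT": ["POINT(6.5 44.8)", None, None, None],
        "longitude": [None, 6.5, None, None],
        "latitude": [None, 44.8, None, None],
        "codecommune": [None, None, "05100", None],
        "codemaille": [None, None, "10kmL93E097N642", None],
    }
)
errors = find_geometry_source_errors(
    df, "WKT", "longitude", "latitude", ["codecommune", "codemaille"]
)
```

WKT and coordinate validity, the bounding-box check, and everything done later in SQL are outside the scope of this sketch.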

geonature.core.imports.checks.dataframe.check_types(entity: geonature.core.imports.models.Entity, df: pandas.DataFrame, fields: Dict[str, geonature.core.imports.models.BibFields]) → Set[str]

Check the types of columns in a dataframe based on the provided fields.

Parameters

entity : Entity

The entity to check.

df : pd.DataFrame

The dataframe to check.

fields : Dict[str, BibFields]

A dictionary mapping column names to their corresponding BibFields.

Returns

Set[str]

Set containing the names of updated columns.
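As a rough picture of what a column type check can look like in pandas, the sketch below converts a text column to numbers and flags the values that cannot be converted. It is a generic pattern, not a description of how check_types works internally; the column name and data are invented for the example.

```python
import pandas as pd

# Raw values as they might arrive from an imported file (all text or missing)
df = pd.DataFrame({"altitude": ["1200", "abc", None]})

# Values that cannot be parsed become NaN instead of raising
converted = pd.to_numeric(df["altitude"], errors="coerce")

# A row is invalid if it had a value that failed to convert;
# rows that were empty to begin with are not type errors
invalid = df["altitude"].notna() & converted.isna()

# Replace the column with its typed version, as an "updated column" would be
df["altitude"] = converted
updated_columns = {"altitude"}
```

Keeping the invalid mask separate from the conversion makes it possible to report the offending rows while still storing a typed column.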

geonature.core.imports.checks.dataframe.concat_dates(df: pandas.DataFrame, datetime_min_col: str, datetime_max_col: str, date_min_col: str, date_max_col: str = None, hour_min_col: str = None, hour_max_col: str = None)

Concatenates date and time columns to form datetime columns.

Parameters

df : pandas.DataFrame

The input DataFrame.

datetime_min_col : str

The column name for the minimum datetime.

datetime_max_col : str

The column name for the maximum datetime.

date_min_col : str

The column name for the minimum date.

date_max_col : str, optional

The column name for the maximum date.

hour_min_col : str, optional

The column name for the minimum hour.

hour_max_col : str, optional

The column name for the maximum hour.
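The idea of combining a date column and an hour column into a datetime column can be sketched as below. This is a simplified illustration, not concat_dates itself: the date and hour formats, and the choice to treat a missing hour as midnight, are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date_min": ["2024-05-01", "2024-05-02"],
        "hour_min": ["08:30:00", None],
    }
)

# Assumption for this sketch: a missing hour is treated as midnight
hour = df["hour_min"].fillna("00:00:00")

# Join "date hour" as text, then parse; unparseable values become NaT
df["datetime_min"] = pd.to_datetime(
    df["date_min"] + " " + hour,
    format="%Y-%m-%d %H:%M:%S",
    errors="coerce",
)
```

The same pattern would apply to the maximum date and hour columns to build the maximum datetime.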