geonature.core.imports.checks.dataframe#
Submodules#
Functions#
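| Function | Description |
| --- | --- |
| check_required_values | Check if required values are present in the dataframe. |
| check_counts | Check if the value in count_min_field is lower than or equal to the value in count_max_field. |
| check_datasets | Check if datasets exist and are authorized for the user and import. |
| check_geometry | Check that each row defines a geometry source (WKT, x/y or code) and set the geometry columns. |
| check_types | Check the types of columns in a dataframe based on the provided fields. |
| concat_dates | Concatenates date and time columns to form datetime columns. |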
Package Contents#
- geonature.core.imports.checks.dataframe.check_required_values(df: pandas.DataFrame, fields: Dict[str, geonature.core.imports.models.BibFields])[source]#
Check if required values are present in the dataframe.
Parameters#
- df : pandas.DataFrame
The dataframe to check.
- fields : Dict[str, BibFields]
Dictionary of fields to check.
Yields#
- dict
Dictionary containing the error code, the column name and the invalid rows.
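The requirement rule stated in this function's Notes can be sketched with plain pandas masks. This is a minimal illustration, not the GeoNature implementation: the helper `required_mask`, its arguments `optional_conds` and `mandatory_conds` (lists of column names), and the sample columns are all assumptions made for the example.

```python
import pandas as pd


def required_mask(df, mandatory, optional_conds=(), mandatory_conds=()):
    """Hypothetical helper: rows on which a field must be filled in."""
    # ALL optional_cond columns are NaN (True everywhere when there are none).
    all_optional_nan = pd.Series(True, index=df.index)
    for col in optional_conds:
        all_optional_nan &= df[col].isna()
    # ANY mandatory_cond column is not NaN (False everywhere when there are none).
    any_mandatory_set = pd.Series(False, index=df.index)
    for col in mandatory_conds:
        any_mandatory_set |= df[col].notna()
    return (all_optional_nan & mandatory) | any_mandatory_set


# Sample data: altitude_min is mandatory unless depth_min is provided.
df = pd.DataFrame(
    {"altitude_min": [10, None, None], "depth_min": [None, 5, None]}
)
required = required_mask(df, mandatory=True, optional_conds=["depth_min"])
# A row is invalid when the value is required but missing.
missing = required & df["altitude_min"].isna()
invalid_rows = df[missing]
```

Here only the last row is invalid: the second row leaves `altitude_min` empty, but the optional condition `depth_min` is filled, so the field is not required there.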
Notes#
- Field is mandatory if: ((field.mandatory AND NOT (ANY optional_cond is not NaN)) OR (ANY mandatory_cond is not NaN))
<=> ((field.mandatory AND ALL optional_cond are NaN ) OR (ANY mandatory_cond is not NaN))
- geonature.core.imports.checks.dataframe.check_counts(df: pandas.DataFrame, count_min_field: str, count_max_field: str, default_count: int = None)[source]#
Check if the value in count_min_field is lower than or equal to the value in count_max_field.
| count_min_field | count_max_field | Result |
| --- | --- | --- |
| 0 | 2 | ok |
| 2 | 0 | raises an error |
Parameters#
- df : pandas.DataFrame
The dataframe to check.
- count_min_field : BibField
The field containing the minimum count.
- count_max_field : BibField
The field containing the maximum count.
- default_count : object, optional
The default count to use if a count is missing, by default None.
Yields#
- dict
Dictionary containing the error code, the column name and the invalid rows.
Returns#
- set
Set of columns updated.
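As a rough sketch of the comparison described above, the snippet below flags rows where the minimum count exceeds the maximum. The column names, the `default_count` value, and in particular the way missing counts are filled (minimum from the default, maximum from the minimum) are assumptions for illustration; the actual function may handle missing values differently.

```python
import pandas as pd

df = pd.DataFrame(
    {"count_min": [0, 2, 3, None], "count_max": [2, 0, None, None]}
)
default_count = 1  # hypothetical default, not taken from GeoNature

# Assumption: a missing minimum falls back to the default count,
# and a missing maximum falls back to the (filled) minimum.
count_min = df["count_min"].fillna(default_count)
count_max = df["count_max"].fillna(count_min)

# Rows where min > max violate the check.
invalid_rows = df[count_min > count_max]
```

Only the row with `count_min = 2` and `count_max = 0` is reported as invalid.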
- geonature.core.imports.checks.dataframe.check_datasets(imprt: geonature.core.imports.models.TImports, df: pandas.DataFrame, uuid_field: geonature.core.imports.models.BibFields, id_field: geonature.core.imports.models.BibFields, module_code: str, object_code: str | None = None) Set[str] [source]#
Check if datasets exist and are authorized for the user and import.
Parameters#
- imprt : TImports
Import to check datasets for.
- df : pd.DataFrame
Dataframe to check.
- uuid_field : BibFields
Field containing dataset UUIDs.
- id_field : BibFields
Field to fill with dataset IDs.
- module_code : str
Module code to check datasets for.
- object_code : Optional[str], optional
Object code to check datasets for, by default None.
Yields#
- dict
Dictionary containing error code, column name and invalid rows.
Returns#
- Set[str]
Set of columns updated.
- geonature.core.imports.checks.dataframe.check_geometry(df: pandas.DataFrame, file_srid: int, geom_4326_field: geonature.core.imports.models.BibFields, geom_local_field: geonature.core.imports.models.BibFields, wkt_field: geonature.core.imports.models.BibFields = None, latitude_field: geonature.core.imports.models.BibFields = None, longitude_field: geonature.core.imports.models.BibFields = None, codecommune_field: geonature.core.imports.models.BibFields = None, codemaille_field: geonature.core.imports.models.BibFields = None, codedepartement_field: geonature.core.imports.models.BibFields = None, id_area: int = None)[source]#
What this check does:
- check that at least a WKT, an x/y pair or a code is defined for each row (report NO-GEOM if none is, or MULTIPLE_ATTACHMENT_TYPE_CODE if several are defined)
- set geom_local or geom_4326 or both (depending on file_srid) from the WKT or x/y
- check WKT validity
- check x/y validity
- check the WKT and x/y bounding box
What this check does not do (done later in SQL):
- set geom_4326 and geom_local from the code
- verify code validity
- set geom_4326 from geom_local, or the reverse, depending on file_srid
- set geom_point
- check geometry validity (ST_IsValid)
FIXME: areas derived from codes are never checked against the bounding box!
Parameters#
- df : pandas.DataFrame
The dataframe to check.
- file_srid : int
The SRID of the file.
- geom_4326_field : BibFields
The column in the dataframe that contains geometries in SRID 4326.
- geom_local_field : BibFields
The column in the dataframe that contains geometries in the SRID of the area.
- wkt_field : BibFields, optional
The column in the dataframe that contains the geometries' WKT.
- latitude_field : BibFields, optional
The column in the dataframe that contains latitudes.
- longitude_field : BibFields, optional
The column in the dataframe that contains longitudes.
- codecommune_field : BibFields, optional
The column in the dataframe that contains commune codes.
- codemaille_field : BibFields, optional
The column in the dataframe that contains grid cell (maille) codes.
- codedepartement_field : BibFields, optional
The column in the dataframe that contains department codes.
- id_area : int, optional
The id of the area the geometry must lie within (not implemented).
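The first step of this check, detecting rows with no geometry source at all, can be illustrated with pandas masks. This is only a sketch under assumptions: the column names and sample values are hypothetical, an x/y pair is treated as present only when both coordinates are set, and the MULTIPLE_ATTACHMENT_TYPE_CODE case and all validity checks are left out.

```python
import pandas as pd

# Hypothetical source columns; real names come from the import's field mapping.
df = pd.DataFrame(
    {
        "wkt": ["POINT(6 45)", None, None, None],
        "longitude": [None, 6.0, None, None],
        "latitude": [None, 45.0, None, None],
        "codecommune": [None, None, "05100", None],
    }
)

has_wkt = df["wkt"].notna()
# Assumption: an x/y pair counts only when both coordinates are present.
has_xy = df["longitude"].notna() & df["latitude"].notna()
has_code = df["codecommune"].notna()

# Rows with neither WKT, x/y nor code would be reported as NO-GEOM.
no_geom_rows = df[~(has_wkt | has_xy | has_code)]
```

In this sample only the last row lacks every geometry source.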
- geonature.core.imports.checks.dataframe.check_types(entity: geonature.core.imports.models.Entity, df: pandas.DataFrame, fields: Dict[str, geonature.core.imports.models.BibFields]) Set[str] [source]#
Check the types of columns in a dataframe based on the provided fields.
Parameters#
- entity : Entity
The entity to check.
- df : pd.DataFrame
The dataframe to check.
- fields : Dict[str, BibFields]
A dictionary mapping column names to their corresponding BibFields.
Returns#
- Set[str]
Set containing the names of updated columns.
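One common way to check a column's type with pandas is to attempt a conversion and flag the values that fail. The snippet below is a sketch of that general technique for a numeric column, not a description of how check_types works internally; the column name and sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({"count_min": ["3", "abc", None, "7"]})

# Values that cannot be parsed as numbers become NaN instead of raising.
converted = pd.to_numeric(df["count_min"], errors="coerce")

# A row is invalid when it held a value but the conversion failed;
# rows that were empty to begin with are not type errors.
invalid_rows = df[df["count_min"].notna() & converted.isna()]
```

The row containing `"abc"` is flagged, while the empty row is left to the required-values check.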
- geonature.core.imports.checks.dataframe.concat_dates(df: pandas.DataFrame, datetime_min_col: str, datetime_max_col: str, date_min_col: str, date_max_col: str = None, hour_min_col: str = None, hour_max_col: str = None)[source]#
Concatenates date and time columns to form datetime columns.
Parameters#
- df : pandas.DataFrame
The input DataFrame.
- datetime_min_col : str
The column name for the minimum datetime.
- datetime_max_col : str
The column name for the maximum datetime.
- date_min_col : str
The column name for the minimum date.
- date_max_col : str, optional
The column name for the maximum date.
- hour_min_col : str, optional
The column name for the minimum hour.
- hour_max_col : str, optional
The column name for the maximum hour.
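The idea of combining a date column and an hour column into one datetime column can be sketched as follows. The column names, the string formats, and the choice to treat a missing hour as midnight are assumptions for the example; concat_dates may apply different defaults.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date_min": ["2024-05-01", "2024-05-02"],
        "hour_min": ["08:30:00", None],
    }
)

# Assumption: a missing hour is treated as midnight.
hour = df["hour_min"].fillna("00:00:00")

# Join "date hour" strings, then parse them; unparsable values become NaT.
df["datetime_min"] = pd.to_datetime(
    df["date_min"] + " " + hour,
    format="%Y-%m-%d %H:%M:%S",
    errors="coerce",
)
```

The same pattern would apply to the maximum date and hour columns to fill the maximum datetime column.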