Data Validators.
This module provides utility functions for data validation, such as asserting that all sublists within a list have the same elements, regardless of their order.
StationData
Bases: BaseModel
Represent the weather station's data with validation rules, ensuring data consistency and integrity.
This model includes comprehensive validation for each attribute to ensure that data about weather stations is accurate and in the correct format. It handles geographical coordinates, station identification, and operational dates with specific constraints.
Attributes:
Region : str The geographical region code of the weather station, automatically converted to uppercase. It must be 1 or 2 characters long.
State : str The state code where the weather station is located, automatically converted to uppercase and required to be exactly 2 characters long.
StationName : str The name of the weather station, automatically converted to uppercase. It can include both letters and numbers.
IdStationWho : IdStationWhoType A unique identifier for the weather station, following a specific format ('A' followed by 3 digits).
Latitude : float The geographical latitude of the station. This model accepts both comma and dot as decimal separators to accommodate different formats.
Longitude : float The geographical longitude of the station. Like Latitude, it accepts both comma and dot as decimal separators.
Altitude : float The station's altitude in meters above sea level. Accepts string input with comma or dot decimal separators and converts it to a float.
FoundingDate : date The date when the station was established. It supports the 'dd/mm/yyyy' and 'dd/mm/yy' formats and converts the value into a standard date object.
Methods:
parse_geo_coords(cls, value: str) -> float:
Class method to parse geographical coordinates from string to float. It's designed to accommodate the Brazilian format for decimal numbers, converting commas to dots.
parse_date(cls, value: str) -> date:
Class method to parse and validate the founding date from a string into a datetime.date object. It supports multiple date formats for flexibility.
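The string constraints listed under Attributes can be illustrated without Pydantic. The following is a minimal sketch, not the model's actual implementation: `check_station_codes` and `ID_STATION_PATTERN` are hypothetical names, and the regular expression is only an assumed reading of "'A' followed by 3 digits".

```python
import re

# Assumed reading of "'A' followed by 3 digits"; the real IdStationWhoType
# definition is not shown in this reference.
ID_STATION_PATTERN = re.compile(r"A[0-9]{3}")


def check_station_codes(region: str, state: str, id_station: str) -> tuple:
    """Hypothetical helper: normalize and check three StationData-style fields."""
    # Region and State are documented as converted to uppercase.
    region = region.upper()
    state = state.upper()
    if not 1 <= len(region) <= 2:
        raise ValueError(f"Region must be 1 or 2 characters long: {region!r}")
    if len(state) != 2:
        raise ValueError(f"State must be exactly 2 characters long: {state!r}")
    # fullmatch rejects extra characters before or after the identifier.
    if ID_STATION_PATTERN.fullmatch(id_station) is None:
        raise ValueError(f"IdStationWho must be 'A' plus 3 digits: {id_station!r}")
    return region, state, id_station
```

In the real model these checks would presumably live in Pydantic field constraints or validators rather than in a standalone function.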
Source code in app/tools/validators.py
parse_date(value)
classmethod
Parse and validate foundation dates in multiple formats.
This validator function attempts to parse the date from a given string. It supports two date formats: 'dd/mm/yyyy' and 'dd/mm/yy'. This flexibility allows for handling variations in the date format.
Parameters:
value : str The string value of the date to be parsed.
Returns:
datetime.date The parsed date as a datetime.date object.
Raises:
ValueError If the provided value does not match any of the supported date formats.
Example:
parse_date("19/07/2020") datetime.date(2020, 7, 19)
parse_date("19/07/20") datetime.date(2020, 7, 19)
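The two-format behaviour described above can be sketched by trying each format in turn with `datetime.strptime`. This is an illustration under assumptions, not the validator's source: `parse_date_sketch` and `SUPPORTED_FORMATS` are hypothetical names.

```python
from datetime import date, datetime

# Four-digit years are tried first; "%Y" does not accept a two-digit year,
# so "19/07/20" falls through to the second format.
SUPPORTED_FORMATS = ("%d/%m/%Y", "%d/%m/%y")


def parse_date_sketch(value: str) -> date:
    """Hypothetical stand-in for parse_date: try each supported format."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(
        f"Date {value!r} does not match 'dd/mm/yyyy' or 'dd/mm/yy'."
    )
```

Two-digit years are inherently ambiguous. With `%y`, Python maps 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, which matters for stations founded before 1969.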
Source code in app/tools/validators.py
parse_geo_coords(value)
classmethod
Parse a string input representing geographic coordinates, converting it to a float.
This method accommodates the common Brazilian format for decimal numbers, where commas are used as decimal separators.
Parameters:
value : str The geographic coordinate as a string, potentially using a comma for decimal separation.
Returns:
float The geographic coordinate as a float.
Raises:
ValueError If the input string cannot be parsed into a float, indicating an invalid format.
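The comma-to-dot conversion can be sketched in a single step. This is a minimal illustration that assumes the input is already a string; `parse_geo_coords_sketch` is a hypothetical name, not the method itself.

```python
def parse_geo_coords_sketch(value: str) -> float:
    """Hypothetical stand-in for parse_geo_coords."""
    # Replace the Brazilian decimal comma with a dot. float() raises
    # ValueError on anything that is still not a valid number.
    return float(value.replace(",", "."))
```

A value that combines thousands separators with a decimal comma, such as "1.234,56", becomes "1.234.56" and is rejected with a ValueError, which is a safe outcome for coordinates.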
Source code in app/tools/validators.py
WeatherData
Bases: BaseModel
Represent meteorological data for a weather station, ensuring data integrity through validation.
This model encapsulates and validates a range of meteorological measurements, such as temperature, humidity, atmospheric pressure, wind speed, and direction. It is designed to accommodate the nuances of meteorological data, including the allowance of NaN values for certain fields where data might be missing.
Attributes:
IdStationWho : IdStationWhoType The unique identifier for the weather station, adhering to a specific format.
Date : date The date on which the meteorological measurements were taken.
Time : time The time at which the meteorological measurements were recorded, with support for UTC notation.
TotalPrecipitation : float The total precipitation measured in millimeters. NaN values are permitted to indicate missing or invalid data.
MaxAtmosphericPressure : float The maximum atmospheric pressure measured. NaN values are permitted to indicate missing or invalid data.
MinAtmosphericPressure : float The minimum atmospheric pressure measured. NaN values are permitted to indicate missing or invalid data.
GlobalRadiation : float The global radiation measured in kJ/m². NaN values are permitted to indicate missing or invalid data.
DryBulbTemperature : float The air temperature measured by a dry bulb thermometer in degrees Celsius. NaN values are permitted to indicate missing or invalid data.
DewPointTemperature : float The dew point temperature in degrees Celsius. NaN values are permitted to indicate missing or invalid data.
MaxTemperature : float The maximum temperature recorded in the last hour in degrees Celsius. NaN values are permitted to indicate missing or invalid data.
MinTemperature : float The minimum temperature recorded in the last hour in degrees Celsius. NaN values are permitted to indicate missing or invalid data.
MaxDewPointTemperature : float The maximum dew point temperature recorded in the last hour in degrees Celsius. NaN values are permitted to indicate missing or invalid data.
MinDewPointTemperature : float The minimum dew point temperature recorded in the last hour in degrees Celsius. NaN values are permitted to indicate missing or invalid data.
MaxRelativeHumidity : float The maximum relative humidity recorded in the last hour, expressed as a percentage. NaN values are permitted to indicate missing or invalid data.
MinRelativeHumidity : float The minimum relative humidity recorded in the last hour, expressed as a percentage. NaN values are permitted to indicate missing or invalid data.
RelativeHumidity : float The relative humidity, expressed as a percentage. NaN values are permitted to indicate missing or invalid data.
WindDirection : float The wind direction, in degrees from true north. NaN values are permitted to indicate missing or invalid data.
MaxWindGust : float The maximum wind gust speed recorded in meters per second. NaN values are permitted to indicate missing or invalid data.
WindSpeed : float The wind speed in meters per second. NaN values are permitted to indicate missing or invalid data.
Methods:
parse_custom_date_format(value: str) -> date:
Parses and validates a date string formatted as 'yyyy/mm/dd', ensuring it conforms to this specific format.
parse_time_utc(value: str) -> time:
Parses and validates a time string formatted with UTC notation ('HHMM UTC'), converting it to a time object.
parse_to_float(value: float) -> float | None:
Validates and adjusts float fields, specifically handling NaN values.
set_nan_out_range(value: float) -> float | None:
Validates and adjusts float fields, specifically handling NaN values and converting negative values to None.
Notes:
The inclusion of NaN values and the conversion of negative values to None are crucial for maintaining the integrity of meteorological data, acknowledging the presence of missing or non-applicable measurements.
Source code in app/tools/validators.py
parse_custom_date_format(value)
classmethod
Parse and validate a string representing a date into a date object, expecting the format 'yyyy/mm/dd'.
This method is designed to ensure consistency in date representation within meteorological data, specifically accommodating the international standard format.
Parameters:
value : str The string representation of a date, expected to be in the 'yyyy/mm/dd' format.
Returns:
datetime.date The date converted into a date object.
Raises:
ValueError If the input string does not match the expected date format, indicating an invalid date format.
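Because only one format is accepted here, the parsing reduces to a single `datetime.strptime` call. A minimal sketch, with `parse_custom_date_format_sketch` as a hypothetical name:

```python
from datetime import date, datetime


def parse_custom_date_format_sketch(value: str) -> date:
    """Hypothetical stand-in: accept only 'yyyy/mm/dd'."""
    # strptime raises ValueError when the string does not match the format,
    # including impossible dates such as month 13.
    return datetime.strptime(value, "%Y/%m/%d").date()
```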
Source code in app/tools/validators.py
parse_time_utc(value)
classmethod
Parse a string representing time with UTC notation ('HHMM UTC') into a time object.
This function standardizes the representation of time within the dataset, aligning with international time notation standards.
Parameters:
value : str The time as a string in 'HHMM UTC' format.
Returns:
datetime.time The time converted into a time object.
Raises:
ValueError If the string is not in the 'HHMM UTC' format or cannot be parsed into a time object.
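One way to handle the 'HHMM UTC' notation is to check the suffix, then parse the remaining four digits. The sketch below is an assumption about the approach, not the method's source; `parse_time_utc_sketch` and `UTC_SUFFIX` are hypothetical names.

```python
from datetime import datetime, time

UTC_SUFFIX = " UTC"


def parse_time_utc_sketch(value: str) -> time:
    """Hypothetical stand-in: convert 'HHMM UTC' into a datetime.time."""
    if not value.endswith(UTC_SUFFIX):
        raise ValueError(f"Time {value!r} is not in 'HHMM UTC' format.")
    digits = value[: -len(UTC_SUFFIX)]
    # Require exactly four characters so inputs like "930" are rejected
    # instead of being parsed in a surprising way.
    if len(digits) != 4:
        raise ValueError(f"Time {value!r} is not in 'HHMM UTC' format.")
    # strptime raises ValueError for non-digits or out-of-range values.
    return datetime.strptime(digits, "%H%M").time()
```

The returned `time` object in this sketch is naive (it carries no `tzinfo`), so the UTC meaning has to be tracked by the caller.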
Source code in app/tools/validators.py
parse_to_float(value)
classmethod
Parse and validate string inputs for temperature and dew point fields, allowing for Brazilian numeric format.
Converts string representations of numerical values, which may use commas as decimal separators, into floats. This caters to the Brazilian format for decimal numbers and ensures that the data is accurately represented and validated.
Parameters:
value : str The string representation of a numerical value, potentially using a comma as the decimal separator.
Returns:
float The numeric value converted into a float.
Raises:
ValueError If the input value cannot be converted into a float, indicating an invalid numeric format.
Source code in app/tools/validators.py
set_nan_out_range(value)
classmethod
Validate numerical fields, allowing NaN values and converting negative or improperly formatted values to None.
This method ensures that meteorological measurements are within logical ranges, acknowledging the possibility of missing data (represented as NaN) and correcting any negative values that do not make sense in the context of the measurement being taken.
Parameters:
value : Any The value to validate, which may be a numerical value or NaN.
Returns:
float | None The original value if it's a valid number or None if the value is negative or improperly formatted.
Note:
This method emphasizes the flexibility required in handling meteorological data, particularly in accommodating missing data points and ensuring data integrity.
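The rules described above (keep NaN, map negative or unparseable values to None) can be sketched as follows. This is an illustration under assumptions: `set_nan_out_range_sketch` is a hypothetical name, and treating any value that `float()` rejects as "improperly formatted" is a guess at what the method does.

```python
import math
from typing import Any, Optional


def set_nan_out_range_sketch(value: Any) -> Optional[float]:
    """Hypothetical stand-in: keep NaN, drop negative or unparseable values."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Assumption: anything float() cannot handle counts as badly formatted.
        return None
    if math.isnan(number):
        return number  # NaN is allowed and marks missing data
    if number < 0:
        return None  # negative readings are treated as out of range
    return number
```

NaN has to be tested with `math.isnan`, since `nan < 0` is False and `nan == nan` is also False.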
Source code in app/tools/validators.py
validate_data_quality(df, output_path, file_name, schema)
Validate each row in a DataFrame against a Pydantic schema and log validation errors.
This function iterates through the DataFrame, attempting to create instances of the specified Pydantic schema with each row's data. If a row fails validation, the error is logged. The log file is created only if there are invalid records.
Parameters:
df : pd.DataFrame The DataFrame containing data to be validated.
output_path : str The directory path where the log file will be saved, if necessary.
file_name : str The name of the log file for recording validation errors, without the extension.
schema : BaseModel The Pydantic model against which data rows will be validated.
Yields:
BaseModel An instance of the Pydantic schema for each row that passes validation. Rows that fail validation are logged instead of being yielded.
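The yield-or-log pattern can be sketched with the standard library alone. Everything named here is an assumption for illustration: `validate_rows_sketch` takes an iterable of dicts rather than a DataFrame (rows could be obtained with `df.to_dict(orient="records")`), `reading_schema` is a stand-in for a Pydantic model, and the `.log` extension is a guess. Catching `ValueError` also covers Pydantic's `ValidationError`, which subclasses it.

```python
import os
import tempfile
from typing import Any, Callable, Dict, Iterable, Iterator


def validate_rows_sketch(
    rows: Iterable[Dict[str, Any]],
    output_path: str,
    file_name: str,
    schema: Callable[..., Any],
) -> Iterator[Any]:
    """Hypothetical sketch: yield valid records, log the invalid ones."""
    log_file = None  # opened lazily, so no file exists if every row is valid
    try:
        for index, row in enumerate(rows):
            try:
                record = schema(**row)
            except ValueError as error:
                if log_file is None:
                    log_path = os.path.join(output_path, f"{file_name}.log")
                    log_file = open(log_path, "a", encoding="utf-8")
                log_file.write(f"row {index}: {error}\n")
                continue
            yield record
    finally:
        if log_file is not None:
            log_file.close()


def reading_schema(value: float) -> Dict[str, float]:
    """Stand-in schema (hypothetical): rejects negative values."""
    if value < 0:
        raise ValueError("value must not be negative")
    return {"value": value}


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_rows = [{"value": 1.5}, {"value": -2.0}]
    valid = list(validate_rows_sketch(sample_rows, tmp_dir, "errors", reading_schema))
    log_exists = os.path.exists(os.path.join(tmp_dir, "errors.log"))
```

Because this is a generator, nothing is validated or logged until the caller iterates over it.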
Source code in app/tools/validators.py
validate_sublists(list_with_sublists)
Confirm if all sublists within a given list contain identical elements, regardless of their order.
This function is crucial for ensuring dataframes have consistent column names across multiple files. It assesses each sublist (representing dataframe columns) to verify they all contain the same elements (column names).
Parameters:
list_with_sublists : List[List[str]] A list containing sublists to be validated for identical elements.
Returns:
bool True if all sublists contain identical elements; False otherwise.
Raises:
ValueError Raised if any sublist differs in elements, indicating inconsistent column names.
Examples:
validate_sublists([["A", "B"], ["B", "A"]]) True
validate_sublists([["C", "B"], ["B", "A"]]) ValueError: Sublists do not contain the same elements.
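An order-insensitive comparison like the one in these examples can be sketched with sets. This is a minimal illustration, not the function's source: `validate_sublists_sketch` is a hypothetical name, duplicates within a sublist are ignored, and returning True for an empty outer list is an assumption.

```python
from typing import List


def validate_sublists_sketch(list_with_sublists: List[List[str]]) -> bool:
    """Hypothetical stand-in: compare every sublist to the first, ignoring order."""
    if not list_with_sublists:
        return True  # assumption: nothing to compare counts as consistent
    reference = set(list_with_sublists[0])
    for sublist in list_with_sublists[1:]:
        # Sets ignore ordering, so ["A", "B"] and ["B", "A"] compare equal.
        if set(sublist) != reference:
            raise ValueError("Sublists do not contain the same elements.")
    return True
```

If duplicate column names need to be detected as well, comparing `sorted(sublist)` instead of `set(sublist)` would be a stricter variant.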
Source code in app/tools/validators.py