stc_unicef_cpi.data package
Submodules
stc_unicef_cpi.data.cv_loaders module
- class stc_unicef_cpi.data.cv_loaders.HexSpatialKFold(n_splits=5, *, random_state=None, hex_idx=None)
Bases: KFold
NB: lightly modified version of GroupKFold. The new code takes the hex codes passed in and generates n_splits suitable groups, rather than requiring groups to be passed along with X, y as in the original GroupKFold.
- get_even_clusters(X, n_clusters)
- get_spatial_groups(X)
- haversine(latlon1, latlon2)
Calculate the great circle distance between two points on the earth (specified in decimal degrees)
- split(X: Union[DataFrame, ndarray], y: Optional[Union[Series, ndarray]] = None, groups: Optional[Union[Series, ndarray]] = None) Iterator[Tuple[ndarray, ndarray]]
Generate indices to split data into training and test set.
- Parameters
X (Union[pd.DataFrame, np.ndarray]) – array-like of shape (n_samples, n_features) Training data, where n_samples is the number of samples and n_features is the number of features.
y (Union[pd.Series, np.ndarray], optional) – array-like of shape (n_samples,), The target variable for supervised learning problems, defaults to None
groups (Union[pd.Series, np.ndarray], optional) – Spatial group labels for the samples used while splitting the dataset into train/test set, defaults to None
- Returns
Generator of tuples of train and test indices
- Return type
Iterator[Tuple[np.ndarray, np.ndarray]]
- Yield
Next set of train, test indices
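The great-circle distance that haversine describes can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes the standard haversine formula, a mean Earth radius of 6371 km, and a result in kilometres (the source states neither the radius nor the unit). The name haversine_km is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not taken from the source


def haversine_km(latlon1, latlon2):
    """Great-circle distance in km between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1 = map(math.radians, latlon1)
    lat2, lon2 = map(math.radians, latlon2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine term: squared half-chord length between the two points.
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Distances of this kind are one plausible input for grouping nearby hexagons into spatial folds, though how get_spatial_groups uses them is not documented here.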
- class stc_unicef_cpi.data.cv_loaders.KerasDataGenerator(hex_idxs: ndarray, batch_size=32, dim=(16, 16), data_files: Optional[Union[List[str], List[Path]]] = None, shuffle=True)
Bases: Sequence
Generates data for Keras
- on_epoch_end()
Updates indexes after each epoch
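The index update performed by on_epoch_end can be pictured with a small stand-alone class. This is only a sketch of the common pattern for batch generators: BatchIndexer, its seed argument, and the choice to drop a trailing partial batch are all assumptions for illustration, not details taken from KerasDataGenerator.

```python
import random


class BatchIndexer:
    """Hypothetical helper: serves batches of sample indices, reshuffled each epoch."""

    def __init__(self, n_samples, batch_size=32, shuffle=True, seed=None):
        self.indexes = list(range(n_samples))
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        self.on_epoch_end()  # shuffle once before the first epoch

    def __len__(self):
        # Number of full batches per epoch (assumption: a partial batch is dropped).
        return len(self.indexes) // self.batch_size

    def __getitem__(self, i):
        # Indices of the samples that make up batch i.
        return self.indexes[i * self.batch_size:(i + 1) * self.batch_size]

    def on_epoch_end(self):
        # Reorder the sample indices so the next epoch sees different batches.
        if self.shuffle:
            self._rng.shuffle(self.indexes)
```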
- class stc_unicef_cpi.data.cv_loaders.StratifiedIntervalKFold(n_splits=5, *, n_cuts=5, shuffle=False, random_state=None)
Bases: StratifiedKFold
NB: lightly edited version of StratifiedKFold. The only difference is that class labels are generated using pd.cut to make n_cuts even intervals (to improve folds with inflated values), rather than using the values themselves.
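The interval labelling described for StratifiedIntervalKFold can be approximated in plain Python. The sketch below is an assumption-laden stand-in for pd.cut: interval_labels is a hypothetical name, the bins are equal-width and left-closed, whereas pd.cut uses right-closed intervals by default, so values lying exactly on a bin edge may be labelled differently.

```python
def interval_labels(values, n_cuts=5):
    """Assign each value to one of n_cuts equal-width intervals (labels 0..n_cuts-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single class.
        return [0] * len(values)
    width = (hi - lo) / n_cuts
    # Clamp so the maximum value falls in the last interval instead of a new one.
    return [min(int((v - lo) / width), n_cuts - 1) for v in values]
```

Labels of this kind could then be passed as the target of a stratified splitter, so that each fold receives a similar share of low, middle, and high target values even when many targets sit at one inflated value.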
- stc_unicef_cpi.data.cv_loaders.cv_split(all_hex_idxs: ndarray, labels: ndarray, k: int, mode='normal', seed=42, strat_cuts=5)
Generate k folds on a (fixed order) hex dataset: either fully random (normal), stratified by interval (stratified), or spatial (spatial).
- Parameters
all_hex_idxs (np.ndarray of type int) – Array of hex codes of dataset
labels (np.ndarray of type int) – Corresponding target labels for these idxs
k (int) – Number of folds
mode (str, optional) – Mode to generate folds, choice of [‘normal’, ‘stratified’, ‘spatial’], defaults to ‘normal’ (fully random)
seed (int, optional) – Random seed, defaults to 42
strat_cuts (int, optional) – Number of intervals to cut the data into for stratified CV, defaults to 5
- Returns
folds
- Return type
_type_
stc_unicef_cpi.data.get_cell_tower_data module
GET CELL TOWER DATA FROM OPEN CELL ID
- stc_unicef_cpi.data.get_cell_tower_data.get_cell_data(country, save_path)
get_cell_data _summary_ :param country: _description_ :type country: _type_ :param token: _description_ :type token: _type_ :return: _description_ :rtype: _type_
- stc_unicef_cpi.data.get_cell_tower_data.get_opencell_url(country, token)
Get Open Cell Id data :param country: _description_ :type country: _type_ :param token: _description_ :type token: _type_ :return: _description_ :rtype: _type_
stc_unicef_cpi.data.get_drive_data module
- stc_unicef_cpi.data.get_drive_data.download_from_drive_folder(country, folder_id, scopes=['https://www.googleapis.com/auth/drive.readonly'])
Download content from the Google Drive folder containing Google Earth Engine images.
- Parameters
folder_id (str) – Folder id, retrievable from the URL
scopes (list, optional) – _description_, defaults to [‘https://www.googleapis.com/auth/drive.readonly’]
stc_unicef_cpi.data.get_econ_data module
Download econ and facilities data
- stc_unicef_cpi.data.get_econ_data.download_econ_data(out_dir)
Download economic data.
- Parameters
out_dir (str, optional) – Path to output directory, defaults to c.econ_data
- stc_unicef_cpi.data.get_econ_data.get_data_from_calibrated_nighttime(url, out_dir, dir)
Get data from the calibrated nighttime light dataset authored by Jiandong Chen and Ming Gao.
- Parameters
url (str) – URL of data to download
out_dir (str) – Path to output directory
dir (str) – Path to specific data type
stc_unicef_cpi.data.get_facebook_data module
GET DELIVERY ESTIMATES FROM FACEBOOK MARKETING API
- stc_unicef_cpi.data.get_facebook_data.define_params(lat, lon, radius, opt)
Define search parameters
- stc_unicef_cpi.data.get_facebook_data.delivery_estimate(account, lat, long, radius, opt)
- stc_unicef_cpi.data.get_facebook_data.fb_api_init(token, id)
Init Facebook API
- Parameters
token – Access token
id – Account id
- Returns
api and account connection
- Return type
conn
- stc_unicef_cpi.data.get_facebook_data.get_facebook_estimates(coords, out_dir, name_out, res)
Get delivery estimates from a list of coordinates
- stc_unicef_cpi.data.get_facebook_data.point_delivery_estimate(account, lat, lon, radius, opt)
Point delivery estimate :return: _description_ :rtype: _type_
stc_unicef_cpi.data.get_osm_data module
- stc_unicef_cpi.data.get_osm_data.add_neighboring_hexagons(hex_codes, hex_code_col='hex_code')
Get all hexagons and their respective coordinates.
- Parameters
hex_codes (list) – List of hexagon codes
- Returns
hex_codes and geometry of polygons
- Return type
list
- stc_unicef_cpi.data.get_osm_data.assign_cluster(results, country, res, lat='@lat', long='@lon', hex_code_col='hex_code')
Assign H3 cluster with a specified resolution.
- Parameters
results (dataframe) – Dataframe with central point of the road: lat, lon, length, type_road
country (str) – Country of interest
res (int) – Resolution
- Returns
Hexes with road length
- Return type
dataframe
- stc_unicef_cpi.data.get_osm_data.assign_road_length_to_hex(coords)
Query the input through Overpass to get road length.
- Parameters
coords (list) – Coordinates of polygon
- Returns
lat, lon, length and type of highway
- Return type
dataframe
- stc_unicef_cpi.data.get_osm_data.format_polygon_coords(geometry)
Format the coordinates for Overpass.
- Parameters
geometry (str) – String of polygon
- Returns
Formatted string
- Return type
str
- stc_unicef_cpi.data.get_osm_data.get_osm_info(query_osm_road)
Parse query through Overpass to access OpenStreetMap data.
- Parameters
query_osm_road (str) – String of a query
- Returns
Dataframe with the data accessed
- Return type
dataframe
- stc_unicef_cpi.data.get_osm_data.get_road_density(country, res)
Get road density.
- Parameters
country (str) – Country of interest
res (int) – Grid resolution
- Returns
Road density at hex level
- Return type
dataframe
- stc_unicef_cpi.data.get_osm_data.query_osm_road(coords, elem='way')
Build query to access lat, long, lengths and type of roads in a polygon.
- Parameters
coords (str) – String of polygon coordinates
elem – Specify whether you want ways or also nodes and relations
- Returns
String of the query
- Return type
str
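To make the role of query_osm_road more concrete, here is a hedged sketch of how an Overpass QL query for roads inside a polygon could be assembled. It is not the package's query: build_road_query is a hypothetical name, the choice of the highway tag and of `out geom` output is an assumption, and the sketch does not compute road lengths. The `poly` filter and the `[out:json]` setting are standard Overpass QL.

```python
def build_road_query(poly_coords, elem="way"):
    """Build an Overpass QL query for `elem` objects tagged `highway` in a polygon.

    poly_coords is a string of space-separated "lat lon" pairs, which is the
    form the Overpass `poly` filter expects.
    """
    return (
        "[out:json];"                                # ask for JSON output
        f'{elem}["highway"](poly:"{poly_coords}");'  # roads within the polygon
        "out geom;"                                  # include node coordinates
    )
```

A string like this would be sent to an Overpass endpoint, and the response parsed into a dataframe, which is the step get_osm_info is documented as performing.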
stc_unicef_cpi.data.get_satellite_data module
- class stc_unicef_cpi.data.get_satellite_data.SatelliteImages(country, folder='gee', res=500, start='2010-01-01', end='2020-01-01')
Bases:
object
Get Satellite Images From Google Earth Engine
- export_drive(config) dict
Export tiff file into drive.
- Parameters
config (dictionary) – Configuration of output tiff
- get_copernicus_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_cop_land') dict
Get status and evolution of land surface at global scale.
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
ctry (str) – Country of interest
geo (geometry) – Dissolved geometry of all features in the collection
start_date (str) – Starting date
end_date (str) – Ending date
name (str, optional) – Name of file, defaults to “cpi_cop_land”
- Returns
Task status
- Return type
dictionary
- get_country_boundaries()
Get countries boundaries
- get_healthcare_data(transform, proj, ctry, geo, name='cpi_health_acc') dict
Get health care data.
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
ctry (str) – Country of interest
geo (geometry) – Dissolved geometry of all features in the collection
name (str, optional) – Name of file, defaults to “cpi_health_acc”
- Returns
Task status
- Return type
dictionary
- get_land_use_data(transform, proj, ctry, geo, name='cpi_ghsl') dict
Get land use data.
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
ctry (str) – Country of interest
geo (geometry) – Dissolved geometry of all features in the collection
name (str, optional) – Name of file, defaults to “cpi_ghsl”
- Returns
Task status
- Return type
dictionary
- get_ndvi_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_ndvi') dict
Get Normalized Difference Vegetation Index (NDVI).
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
ctry (str) – Country of interest
geo (geometry) – Dissolved geometry of all features in the collection
start_date (str) – Starting date
end_date (str) – Ending date
name (str, optional) – Name of file, defaults to “cpi_ndvi”
- Returns
Task status
- Return type
dictionary
- get_ndwi_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_ndwi') dict
Get Normalized Difference Water Index (NDWI).
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
ctry (str) – Country of interest
geo (geometry) – Dissolved geometry of all features in the collection
start_date (str) – Starting date
end_date (str) – Ending date
name (str, optional) – Name of file, defaults to “cpi_ndwi”
- Returns
Task status
- Return type
dictionary
- get_nighttime_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_nighttime') dict
Get nighttime data.
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
ctry (str) – Country of interest
geo (geometry) – Dissolved geometry of all features in the collection
start_date (str) – Starting date
end_date (str) – Ending date
name (str, optional) – Name of file, defaults to “cpi_nighttime”
- Returns
Task status
- Return type
dictionary
- get_pollution_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_pollution') dict
Get pollution data.
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
ctry (str) – Country of interest
geo (geometry) – Dissolved geometry of all features in the collection
start_date (str) – Starting date
end_date (str) – Ending date
name (str, optional) – Name of file, defaults to “cpi_pollution”
- Returns
Task status
- Return type
dictionary
- get_pop_data(transform, proj, geo, name='cpi_poptotal') dict
Get 2020 population estimates in country, by age and sex.
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
geo (geometry) – Dissolved geometry of all features in the collection
name (str, optional) – Name of file, defaults to “cpi_poptotal”
- Returns
Task status
- Return type
dictionary
- get_precipitation_data(transform, proj, ctry, geo, start_date, end_date) dict
Get precipitation data.
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
ctry (str) – Country of interest
geo (geometry) – Dissolved geometry of all features in the collection
start_date (str) – Starting date
end_date (str) – Ending date
- Returns
Task status
- Return type
dictionary
- get_projection()
Get the country’s transform between projected coordinates and the base coordinate system.
- Returns
The transform, the base coordinate reference system
- Return type
List, Object
- get_topography_data(transform, proj, ctry, geo)
Get topography data.
- Parameters
transform (list) – Transform between projected coordinates and the base coordinate system
proj (object) – The base coordinate reference system
ctry (str) – Country of interest
geo (geometry) – Dissolved geometry of all features in the collection
- Returns
Task status
- Return type
dictionary
stc_unicef_cpi.data.get_speedtest_data module
- stc_unicef_cpi.data.get_speedtest_data.get_speedtest_info(url, name, path_save) None
Get speedtest information.
- Parameters
url (str) – URL needed to retrieve information
name (str) – Name of the file we want to retrieve
path_save (str) – Directory to save information
- Raises
ValueError – Unable to retrieve data
stc_unicef_cpi.data.make_dataset module
- stc_unicef_cpi.data.make_dataset.aggregate_dataset(df) DataFrame
Aggregate dataset.
- Parameters
df (dataframe) – Input required to aggregate
- Returns
agg mean, agg count
- Return type
dataframe, dataframe
- stc_unicef_cpi.data.make_dataset.append_features_to_hexes(country, res, encoders, gpu, force=False, force_download=False, audience=False, read_dir=None, save_dir=None, model_dir=None, tiff_dir=None, hyper_tuning=False) DataFrame
Append features to hexagons within a country.
- Parameters
country (str) – Country of interest
res (int) – Grid resolution
encoders (bool) – Whether to append autoencoder features
gpu (bool) – Whether to use GPUs or not
force (bool, optional) – Force clipping, defaults to False
force_download (bool, optional) – Force download, defaults to False
audience (bool, optional) – Whether or not to include audience estimates, defaults to False
read_dir (str, optional) – Path to read input, defaults to c.ext_data
save_dir (str, optional) – Path to save output, defaults to c.int_data
model_dir (str, optional) – Path to model, defaults to c.base_dir_model
tiff_dir (str, optional) – Path to tiff files, defaults to c.tiff_data
hyper_tuning (bool, optional) – Whether or not to perform hyperparameter tuning, defaults to False
- Returns
Hexes with corresponding features
- Return type
dataframe
- stc_unicef_cpi.data.make_dataset.change_name_reproject_tiff(tiff, attribute, country, read_dir=None, out_dir=None) None
Rename attributes and reproject a tiff file.
- Parameters
tiff (str) – Path to tiff file
attribute (list of lists) – Attribute names
country (str) – Country of interest
read_dir (str, optional) – Path to read external data from, defaults to c.ext_data
- stc_unicef_cpi.data.make_dataset.create_dataset(country_code, country, res, gpu=False, encoders=True, force=False, force_download=False, audience=False, hyper_tuning=True, lat='latnum', long='longnum', interim_dir=None, save_dir=None, model_dir=None, threshold=30, read_dir_target=None, read_dir=None, tiff_dir=None) DataFrame
Create dataset.
- Parameters
country_code (str) – Country code
country (str) – Country of interest
res (int) – Grid resolution
gpu (bool) – Whether to use GPUs or not
encoders (bool) – Whether to append autoencoder features
force (bool, optional) – Force clipping, defaults to False
force_download (bool, optional) – Force download, defaults to False
audience (bool, optional) – Whether or not to include audience estimates, defaults to False
hyper_tuning (bool, optional) – Whether or not to perform hyperparameter tuning, defaults to True
lat (str, optional) – Column name containing latitude, defaults to “latnum”
long (str, optional) – Column name containing longitude, defaults to “longnum”
interim_dir (str, optional) – Path to interim data, defaults to c.int_data
read_dir (str, optional) – Path to read input, defaults to c.ext_data
save_dir (str, optional) – Path to save output, defaults to c.proc_data
model_dir (str, optional) – Path to model, defaults to c.base_dir_model
tiff_dir (str, optional) – Path to tiff files, defaults to c.tiff_data
threshold (int, optional) – Minimum number of surveys per hexagon, defaults to c.cutoff
read_dir_target (str, optional) – Path to directory of target data, defaults to c.raw_data
- Returns
Dataset with features and target variable
- Return type
dataframe
- stc_unicef_cpi.data.make_dataset.create_target_variable(country_code, res, lat, long, threshold, read_dir, copy_to_nbrs=False) DataFrame
Create target variable.
- Parameters
country_code (str) – Country code related to country of interest
res (int) – Resolution of country of interest
lat (numeric) – Latitude of country of interest
long (numeric) – Longitude of country of interest
threshold (int) – Minimal number of surveys per hexagon
read_dir (str) – Directory from where to read dataset
copy_to_nbrs (bool, optional) – Include neighbouring resolution, defaults to False
- Raises
ValueError – No raw survey data available
- Returns
Dataset with observations satisfying conditions
- Return type
dataframe
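The threshold parameter of create_target_variable is described as a minimal number of surveys per hexagon. A minimal sketch of that kind of filter is shown below; hexes_meeting_threshold is a hypothetical name, and treating the threshold as inclusive (count >= threshold) is an assumption, since the source does not say whether the comparison is strict.

```python
from collections import Counter


def hexes_meeting_threshold(hex_codes, threshold=30):
    """Return the set of hex codes that occur at least `threshold` times.

    hex_codes holds one entry per survey response, so the count of a code
    is the number of surveys that fall inside that hexagon.
    """
    counts = Counter(hex_codes)
    return {code for code, n in counts.items() if n >= threshold}
```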
- stc_unicef_cpi.data.make_dataset.preprocessed_commuting_zones(country, res, read_dir=None) DataFrame
Preprocess commuting zones.
- Parameters
country (str) – Country of interest
res (int) – Grid resolution
read_dir (str, optional) – Path to read data from, defaults to c.ext_data
- Returns
Processed information of commuting zones
- Return type
dataframe
- stc_unicef_cpi.data.make_dataset.preprocessed_speed_test(speed, res, country) DataFrame
Process speed test data.
- Parameters
speed (dataframe) – Dataset containing speed test data
res (int) – Grid resolution
country (str) – Country of interest
- Returns
Speed data clipped to country, reprojected and aggregated
- Return type
dataframe
- stc_unicef_cpi.data.make_dataset.preprocessed_tiff_files(country, read_dir=None, out_dir=None, force=False) None
Preprocess tiff files
- stc_unicef_cpi.data.make_dataset.read_input_unicef(path_read) DataFrame
Read source data provided by STC and UNICEF.
- Parameters
path_read (str) – Path to read data from
- Returns
Database with target variable
- Return type
dataframe
- stc_unicef_cpi.data.make_dataset.select_country(df, country_code, lat, long) DataFrame
Select country of interest.
- Parameters
df (dataframe) – Input provided by UNICEF and STC
country_code (str) – Country code
lat (numerical) – Column name containing latitude measures
long (numerical) – Column name containing longitude measures
- Returns
Database with info related to country of interest
- Return type
dataframe
stc_unicef_cpi.data.process_geotiff module
- stc_unicef_cpi.data.process_geotiff.agg_tif_to_df(df: DataFrame, tiff_dir: Union[str, PathLike, List[str], List[PathLike]], rm_prefix: Union[str, Pattern[str]] = 'cpi', agg_fn: Callable[[ndarray[Any, dtype[ScalarType]]], ndarray[Any, dtype[ScalarType]]] = <function mean>, max_records: int = 100000, replace_old: bool = True, resolution: int = 7, verbose: bool = False) DataFrame
Pass df with hex_code column of numpy_int type h3 codes, and a directory with tiff files, then aggregate pixels from tiffs within each hexagon according to given function.
Note that rather than using shapefiles, this uses pixel centroid values, hence different quantities of pixels may be aggregated in each hexagon, and it will not work sensibly at all if the resolution of the tiff file is lower than the resolution of the specified hexagons.
- Parameters
df (pd.DataFrame) – ‘ground truth’ dataframe to aggregate tiffs to, with hex_code column at specified resolution
tiff_dir (Union[str, PathLike, List[str], List[PathLike]]) – Either a directory containing .tifs, a single .tif file, or a list of .tif files to aggregate to the given df
rm_prefix (Union[str, Pattern[str]], optional) – Prefix or regex pattern to remove from file string when naming variables, defaults to “cpi”
agg_fn (Callable[[npt.NDArray], npt.NDArray], optional) – Function to use when aggregating tiff pixels within cells, defaults to np.mean
max_records (int, optional) – Max number of pixels in clipped tiff before using dask, defaults to int(1e5)
replace_old (bool, optional) – Overwrite old columns if match new data, defaults to True
resolution (int, optional) – Resolution level of h3 grid to use, defaults to 7
verbose (bool, optional) – Verbose output, defaults to False
- Raises
ValueError – hex_code column not in df
- Returns
Original dataframe with new columns added from aggregated values of tiffs in hexes
- Return type
pd.DataFrame
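The pixel-centroid aggregation that agg_tif_to_df describes can be illustrated without any geospatial libraries. In the sketch below each pixel is reduced to a (lat, lon, value) centroid, assigned to a cell, and the values in each cell are aggregated. Everything here is an assumption for illustration: toy_cell_id is a hypothetical square-grid stand-in for an H3 lookup, and aggregate_pixels is not the package's code.

```python
from collections import defaultdict
from statistics import mean


def toy_cell_id(lat, lon, cell_size=1.0):
    """Hypothetical stand-in for an H3 lookup: index of a square grid cell."""
    return (int(lat // cell_size), int(lon // cell_size))


def aggregate_pixels(pixels, agg_fn=mean, cell_fn=toy_cell_id):
    """Group (lat, lon, value) pixel centroids by cell and aggregate the values."""
    by_cell = defaultdict(list)
    for lat, lon, value in pixels:
        by_cell[cell_fn(lat, lon)].append(value)
    # Cells may hold different numbers of pixels; agg_fn sees whatever landed there.
    return {cell: agg_fn(vals) for cell, vals in by_cell.items()}
```

The sketch also shows why a raster coarser than the grid behaves poorly: when pixels are larger than cells, many cells receive no centroid at all and therefore no value.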
- stc_unicef_cpi.data.process_geotiff.clip_tif_to_ctry(file_path: Union[PathLike, str], ctry_name: str, save_dir: Optional[Union[PathLike, str]] = None) None
Clip a GeoTIFF to a specified country boundary, and write a new file to the specified directory if given, else just plot the clipped tiff. File name is prepended with the country name.
- stc_unicef_cpi.data.process_geotiff.convert_tiffs_to_image_dataset(tiff_dir: Union[str, PathLike], hex_codes: Union[List[int], ndarray[Any, dtype[Union[int32, int64]]]], dim_x: int = 256, dim_y: int = 256) ndarray[Any, dtype[ScalarType]]
Convert set of GeoTIFFs to a 4D numpy array according to specified dataset - expect the path to a directory containing all relevant GeoTIFFs with extension ‘.tif’, and a list of h3 hexagon identifiers in numpy_int form (use import h3.api.numpy_int as h3).
Returned array is in form (hex_id, band, i, j), with i, j through the band image array defaulting to size 256 x 256, as specified by dim_x, dim_y.
- Parameters
tiff_dir (Union[str, PathLike]) – Path to GeoTIFF directory, with file extensions ‘.tif’
hex_codes (Union[List[int], npt.NDArray[Union[np.int32,np.int64]]]) – Set of H3 hex codes for which you wish to extract images
dim_x (int, optional) – Pixel width of extracted images, defaults to 256
dim_y (int, optional) – Pixel height of extracted images, defaults to 256
- Returns
Array of images at hex coords, in shape (hex_id, band, i, j)
- Return type
npt.NDArray
- stc_unicef_cpi.data.process_geotiff.extract_image_at_coords(dataset: Union[Dataset, DataArray, List[Dataset], DatasetReader], lat: float, long: float, dim_x: int = 256, dim_y: int = 256, verbose: bool = False) ndarray[Any, dtype[ScalarType]]
Extract an array of specified dimensions (num pixels) about specified lat/long - centered by default
- Parameters
dataset (Union[Dataset, DataArray, List[Dataset], rasterio.io.DatasetReader]) – rioxarray or rasterio dataset (open tiff file)
lat (float) – Latitude of center point about which to extract image
long (float) – Longitude of center point about which to extract image
dim_x (int, optional) – x dimension (pixel width) of extracted image, defaults to 256
dim_y (int, optional) – y dimension (pixel height) of extracted image, defaults to 256
verbose (bool, optional) – Verbose, defaults to False
- Returns
Array of tiff values (‘image’) at specified coordinates, of given size
- Return type
npt.NDArray
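The centred extraction performed by extract_image_at_coords comes down to index arithmetic once the lat/long has been mapped to a pixel position. The sketch below covers only that arithmetic, on a plain list-of-lists image. centered_window and crop are hypothetical names, the mapping from coordinates to (row, col) is omitted, and the window is assumed to lie fully inside the image.

```python
def centered_window(row, col, dim_x=256, dim_y=256):
    """Return (row_start, row_stop, col_start, col_stop) for a window centred on (row, col)."""
    row_start = row - dim_y // 2
    col_start = col - dim_x // 2
    return row_start, row_start + dim_y, col_start, col_start + dim_x


def crop(image, row, col, dim_x=256, dim_y=256):
    """Crop a 2D list-of-lists image; assumes the window lies inside the image."""
    r0, r1, c0, c1 = centered_window(row, col, dim_x, dim_y)
    return [line[c0:c1] for line in image[r0:r1]]
```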
- stc_unicef_cpi.data.process_geotiff.extract_ims_from_hex_codes(datasets: Union[List[str], List[PathLike]], hex_codes: Union[List[int], ndarray[Any, dtype[Union[int32, int64]]]], width: int = 256, height: int = 256, verbose: bool = False) ndarray[Any, dtype[ScalarType]]
For a set of datasets, specified by file path, and a set of h3 hex codes, extract centered images of specified size and return a 4D array in shape (image_idx,band,i,j).
- Parameters
datasets (Union[List[str], List[PathLike]]) – List of paths to tiff files for which you want to extract (and stack) ‘image’ bands
hex_codes (Union[List[int], npt.NDArray[Union[np.int32,np.int64]]]) – Set of H3 hex codes in numpy_int format for which you wish to extract images
width (int, optional) – Width of extracted images in pixels, defaults to 256
height (int, optional) – Height of extracted images in pixels, defaults to 256
verbose (bool, optional) – Verbose, defaults to False
- Returns
Extracted images in shape (image_idx,band,i,j)
- Return type
npt.NDArray
- stc_unicef_cpi.data.process_geotiff.geotiff_to_df(geotiff_filepath: Union[str, PathLike], spec_band_names: Optional[List[str]] = None, max_bands: int = 5, rm_prefix: Union[str, Pattern[str]] = '', verbose: bool = False) DataFrame
Convert a geotiff file to a pandas dataframe, and print some additional info.
- Parameters
geotiff_filepath (Union[str, PathLike]) – path to a geotiff file
spec_band_names (Optional[List[str]], optional) – Specified band names - only used if these are not specified in the GeoTIFF itself, at which point they are mandatory, defaults to None
max_bands (int, optional) – Max allowable bands before rast_to_agg_df must be used instead, defaults to 5
rm_prefix (Union[str, Pattern[str]], optional) – Prefix (or regex pattern) to replace in file name, defaults to “” (empty string)
verbose (bool, optional) – verbose output, defaults to False
- Raises
ValueError – No band names provided but none found either
ValueError – Number of band names provided when none found does not match number of bands
ValueError – Too many bands to handle without excessive memory - use rast_to_agg_df instead
ValueError – Problem with index resulting from conversion to df
ValueError – Problem converting bands
- Returns
pandas dataframe of lat, long, val for each band
- Return type
pd.DataFrame
- stc_unicef_cpi.data.process_geotiff.print_tif_metadata(rioxarray_rio_obj: Union[Dataset, DataArray, List[Dataset]], name: Optional[str] = None) None
View metadata associated with a raster file, loaded using rioxarray
- Parameters
rioxarray_rio_obj (Union[Dataset, DataArray, List[Dataset]]) – rioxarray dataset object
name (Optional[str], optional) – Name of tiff data, defaults to None
- stc_unicef_cpi.data.process_geotiff.rast_to_agg_df(tiff_file: Union[str, Path, bytes], agg_fn: Callable[[ndarray[Any, dtype[ScalarType]]], ndarray[Any, dtype[ScalarType]]] = <function mean>, resolution: int = 7, max_bands: int = 3, verbose: bool = False) DataFrame
Likely slower than using rioxarray functions, but has the benefit of handling groups of bands at a time rather than all at once (which is very memory-expensive). Only to be used for tiffs with many bands.
- Parameters
tiff_file (Union[str, PathLike]) – Path to (many banded, large) tiff file
agg_fn (Callable[[npt.NDArray], npt.NDArray], optional) – Aggregation function, defaults to np.mean
resolution (int, optional) – Resolution of H3 grid to aggregate to, defaults to 7
max_bands (int, optional) – Max number of bands to process at one time, defaults to 3
verbose (bool, optional) – Verbose, defaults to False
- Returns
Dataframe of aggregated data
- Return type
pd.DataFrame
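The idea behind max_bands in rast_to_agg_df, processing a few bands at a time to bound memory use, can be sketched as a simple chunking generator. band_chunks is a hypothetical name, and the 1-based band indices are an assumption chosen to match the usual raster convention; the package's actual loop is not shown in the source.

```python
def band_chunks(n_bands, max_bands=3):
    """Yield lists of 1-based band indices, at most max_bands per chunk."""
    for start in range(1, n_bands + 1, max_bands):
        yield list(range(start, min(start + max_bands, n_bands + 1)))
```

Each chunk would be read, aggregated, and released before the next is loaded, so peak memory scales with max_bands rather than with the total number of bands.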
- stc_unicef_cpi.data.process_geotiff.resample_tif(tif_file_path: Union[str, PathLike], dest_dir: Union[str, PathLike], rescale_factor: Optional[float] = 2.0) None
Resample a tiff file by a given factor, using bilinear resampling - greater than 1 corresponds to increased resolution, less than 1 decreased.
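The effect of rescale_factor in resample_tif can be illustrated with the underlying arithmetic. The two helpers below are hypothetical and cover only the shape and pixel-size bookkeeping, not the bilinear interpolation itself. Truncating to an integer shape is an assumption, since the source does not say how fractional sizes are rounded.

```python
def resampled_shape(height, width, rescale_factor=2.0):
    """Pixel dimensions after rescaling; a factor above 1 gives more pixels."""
    return int(height * rescale_factor), int(width * rescale_factor)


def resampled_pixel_size(pixel_size, rescale_factor=2.0):
    """Ground size of one pixel after rescaling; a factor above 1 gives smaller pixels."""
    return pixel_size / rescale_factor
```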
- stc_unicef_cpi.data.process_geotiff.rxr_reproject_tiff_to_target(src_tiff_file: Union[str, PathLike], target_tiff_file: Union[str, PathLike], dest_path: Optional[Union[PathLike, str]] = None, verbose: bool = False) Optional[Union[Dataset, DataArray, List[Dataset]]]
Use rioxarray and an example (target) tiff to reproject the given (source) tiff to the same CRS and resolution.
- Parameters
src_tiff_file (Union[str, PathLike]) – Path to tiff file you want to reproject
target_tiff_file (Union[str, PathLike]) – Path to tiff file that is example of desired projection and resolution
dest_path (Optional[Union[str, PathLike]], optional) – Path to write reprojected tiff to, defaults to None (just return reprojected raster)
verbose (bool, optional) – Verbosity, defaults to False
- Returns
Either None (if dest_path is not None) or reprojected raster
- Return type
Union[Dataset, DataArray, List[Dataset], None]
stc_unicef_cpi.data.process_netcdf module
- stc_unicef_cpi.data.process_netcdf.netcdf_to_clipped_array(file_path: Union[str, PathLike], *, ctry_name: str = 'Nigeria', save_dir: Optional[Union[PathLike, str]] = None, plot: bool = False) Union[None, ndarray[Any, dtype[ScalarType]]]
Read netCDF file and return either array clipped to specified country, or a GeoTIFF clipped to this country and saved in the specified directory with same name as before
- Parameters
file_path (Union[str, PathLike]) – Path to netCDF file to reproject and clip
ctry_name (str, optional) – Country to clip to, defaults to “Nigeria”
save_dir (Optional[Union[str, PathLike]], optional) – Directory to save to, defaults to None (just return clipped array)
plot (bool, optional) – Visualise clipped array, defaults to False
- Returns
Either None if save_dir is not None, or clipped array
- Return type
Union[None, npt.NDArray]
stc_unicef_cpi.data.process_to_torch module
- class stc_unicef_cpi.data.process_to_torch.HexDataset(tiff_dir, hex_codes, labels, width=33, height=33, transform=None, target_transform=None)
Bases:
Dataset
Make a torch dataset that constructs images from tiff files according to hex codes
- Parameters
Dataset (_type_) – _description_
- stc_unicef_cpi.data.process_to_torch.make_torch_dataloader_from_numpy(images, labels, bs=64, shuffle=False)
Take np image dataset and dataframe, and convert to a dataset amenable to train torch models
stc_unicef_cpi.data.stream_data module
Data Streaming From External Sources
- class stc_unicef_cpi.data.stream_data.FacebookMarketingStreamer(country, force, read_path, res, logging)
Bases:
StreamerObject
Stream data from Facebook Marketing API
- implement()