stc_unicef_cpi.data package

Submodules

stc_unicef_cpi.data.cv_loaders module

class stc_unicef_cpi.data.cv_loaders.HexSpatialKFold(n_splits=5, *, random_state=None, hex_idx=None)

Bases: KFold

Note: a lightly modified version of GroupKFold. The new code takes the hex codes passed in and generates n_splits suitable groups, rather than requiring groups to be passed along with X and y as in the original GroupKFold.

get_even_clusters(X, n_clusters)

get_spatial_groups(X)

haversine(latlon1, latlon2)

Calculate the great circle distance between two points on the earth (specified in decimal degrees)
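
The great-circle calculation can be illustrated with a minimal sketch of the haversine formula. This is not the package's implementation: the (lat, lon) argument order, the result in kilometres and the mean Earth radius of 6371 km are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(latlon1, latlon2):
    """Great-circle distance in km between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1 = map(math.radians, latlon1)
    lat2, lon2 = map(math.radians, latlon2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

With these assumptions, one degree of longitude along the equator comes out at roughly 111.2 km.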

split(X: Union[DataFrame, ndarray], y: Optional[Union[Series, ndarray]] = None, groups: Optional[Union[Series, ndarray]] = None) Iterator[Tuple[ndarray, ndarray]]

Generate indices to split data into training and test set.

Parameters
  • X (Union[pd.DataFrame, np.ndarray]) – array-like of shape (n_samples, n_features) Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Union[pd.Series, np.ndarray], optional) – array-like of shape (n_samples,), The target variable for supervised learning problems, defaults to None

  • groups (Union[pd.Series, np.ndarray], optional) – Spatial group labels for the samples used while splitting the dataset into train/test set, defaults to None

Returns

Generator of tuples of train and test indices

Yield

Next set of train, test indices

Return type

Iterator[Tuple[np.ndarray, np.ndarray]]
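
To show the shape of what split yields, here is a hedged sketch of a group-aware splitter. It is not HexSpatialKFold itself: the function name group_kfold_indices and the round-robin assignment of groups to folds are hypothetical, and the spatial clustering that produces the group labels is not reproduced.

```python
import numpy as np


def group_kfold_indices(groups, n_splits=5):
    """Yield (train_idx, test_idx) pairs such that no group is split across folds."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    if len(unique_groups) < n_splits:
        raise ValueError("need at least n_splits distinct groups")
    # Deal the group labels round-robin into n_splits folds (illustrative choice).
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    fold_of_sample = np.array([fold_of_group[g] for g in groups])
    for fold in range(n_splits):
        test_idx = np.flatnonzero(fold_of_sample == fold)
        train_idx = np.flatnonzero(fold_of_sample != fold)
        yield train_idx, test_idx
```

Each yielded pair is a tuple of integer index arrays, matching the Iterator[Tuple[np.ndarray, np.ndarray]] return type documented above.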

class stc_unicef_cpi.data.cv_loaders.KerasDataGenerator(hex_idxs: ndarray, batch_size=32, dim=(16, 16), data_files: Optional[Union[List[str], List[Path]]] = None, shuffle=True)

Bases: Sequence

Generates data for Keras

on_epoch_end()

Updates indexes after each epoch
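
The usual pattern behind an on_epoch_end hook is to rebuild, and optionally reshuffle, the sample index order. The sketch below shows that pattern without Keras; the class BatchIndexer, its seed argument and the choice to drop a trailing partial batch are hypothetical and may differ from KerasDataGenerator.

```python
import numpy as np


class BatchIndexer:
    """Hypothetical stand-in for the index bookkeeping of a batch generator."""

    def __init__(self, n_samples, batch_size=32, shuffle=True, seed=0):
        self.n_samples = n_samples
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.on_epoch_end()  # build the initial index order

    def __len__(self):
        # Number of full batches per epoch (a trailing partial batch is dropped).
        return self.n_samples // self.batch_size

    def batch_indices(self, i):
        # Sample indices that make up batch i in the current epoch.
        return self.indexes[i * self.batch_size:(i + 1) * self.batch_size]

    def on_epoch_end(self):
        # Reset to the natural order, then reshuffle if requested.
        self.indexes = np.arange(self.n_samples)
        if self.shuffle:
            self.rng.shuffle(self.indexes)
```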

class stc_unicef_cpi.data.cv_loaders.StratifiedIntervalKFold(n_splits=5, *, n_cuts=5, shuffle=False, random_state=None)

Bases: StratifiedKFold

Note: a lightly edited version of StratifiedKFold. The only difference is that class labels are generated with pd.cut, which cuts the target into n_cuts equal-width intervals (to improve folds when the target has inflated values), rather than using the target values themselves.
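
The interval labelling can be sketched with NumPy alone. This is an analogue of the pd.cut step rather than the class's actual code: the helper name interval_labels is hypothetical, and the handling of values exactly on a bin edge may differ from pd.cut.

```python
import numpy as np


def interval_labels(y, n_cuts=5):
    """Assign each target to one of n_cuts equal-width intervals over [min(y), max(y)]."""
    y = np.asarray(y, dtype=float)
    edges = np.linspace(y.min(), y.max(), n_cuts + 1)
    # Use interior edges only, so labels run from 0 to n_cuts - 1.
    return np.digitize(y, edges[1:-1], right=True)
```

The resulting integer labels can then stand in for y when stratifying, so that a target with many repeated (inflated) values is still spread across folds by interval.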

stc_unicef_cpi.data.cv_loaders.cv_split(all_hex_idxs: ndarray, labels: ndarray, k: int, mode='normal', seed=42, strat_cuts=5)

Generate k folds on a (fixed order) hex dataset: either fully random (normal), stratified by interval (stratified), or spatial (spatial).

Parameters
  • all_hex_idxs (np.ndarray of type int) – Array of hex codes of dataset

  • labels (np.ndarray of type int) – corresponding target labels for these idxs

  • k (int) – Number of folds

  • mode (str, optional) – mode to generate folds, choice of [‘normal’, ‘stratified’, ‘spatial’], defaults to ‘normal’ (fully random)

  • seed (int, optional) – random seed, defaults to 42

  • strat_cuts (int, optional) – number of intervals to cut the data into for stratified CV, defaults to 5

Returns

folds

Return type

_type_

stc_unicef_cpi.data.get_cell_tower_data module

GET CELL TOWER DATA FROM OPEN CELL ID

stc_unicef_cpi.data.get_cell_tower_data.get_cell_data(country, save_path)

get_cell_data _summary_

Parameters
  • country (_type_) – _description_

  • token (_type_) – _description_

Returns

_description_

Return type

_type_

stc_unicef_cpi.data.get_cell_tower_data.get_opencell_url(country, token)

Get Open Cell Id data

Parameters
  • country (_type_) – _description_

  • token (_type_) – _description_

Returns

_description_

Return type

_type_

stc_unicef_cpi.data.get_drive_data module

stc_unicef_cpi.data.get_drive_data.download_from_drive_folder(country, folder_id, scopes=['https://www.googleapis.com/auth/drive.readonly'])

Download content from a Google Drive folder containing Google Earth Engine images

Parameters
  • folder_id (str) – folder id, retrievable from the url

  • scopes (list, optional) – _description_, defaults to [‘https://www.googleapis.com/auth/drive.readonly’]

stc_unicef_cpi.data.get_econ_data module

Download econ and facilities data

stc_unicef_cpi.data.get_econ_data.download_econ_data(out_dir)

Download economic data

Parameters
  • out_dir (str, optional) – path to output directory, defaults to c.econ_data

stc_unicef_cpi.data.get_econ_data.get_data_from_calibrated_nighttime(url, out_dir, dir)

Get data from calibrated nighttime light data, dataset authored by Jiandong Chen, Ming Gao

Parameters
  • url (str) – url of data to download

  • out_dir (str) – path to output directory

  • dir (str) – path to specific data type

stc_unicef_cpi.data.get_facebook_data module

GET DELIVERY ESTIMATES FROM FACEBOOK MARKETING API

stc_unicef_cpi.data.get_facebook_data.define_params(lat, lon, radius, opt)

Define search parameters

Parameters
  • lat (str) – latitude

  • lon (str) – longitude

  • radius (float) – radius

  • opt (str) – optimization criteria

stc_unicef_cpi.data.get_facebook_data.delivery_estimate(account, lat, long, radius, opt)

stc_unicef_cpi.data.get_facebook_data.fb_api_init(token, id)

Init Facebook API

Parameters
  • token – Access token

  • id – Account id

Returns

api and account connection

Return type

conn

stc_unicef_cpi.data.get_facebook_data.get_facebook_estimates(coords, out_dir, name_out, res)

Get delivery estimates from a list of coordinates

stc_unicef_cpi.data.get_facebook_data.point_delivery_estimate(account, lat, lon, radius, opt)

Point delivery estimate

Returns

_description_

Return type

_type_

stc_unicef_cpi.data.get_osm_data module

stc_unicef_cpi.data.get_osm_data.add_neighboring_hexagons(hex_codes, hex_code_col='hex_code')

Get all hexagons and their respective coordinates

Parameters
  • hex_codes (list) – list of hexagon codes

Returns

hex_codes and geometry of polygons

Return type

list

stc_unicef_cpi.data.get_osm_data.assign_cluster(results, country, res, lat='@lat', long='@lon', hex_code_col='hex_code')

Assign H3 cluster with a specified resolution

Parameters
  • results (dataframe) – dataframe with central point of the road lat, lon, length, type_road

  • country (str) – country of interest

  • res (int) – resolution

Returns

hexes with road length

Return type

dataframe

stc_unicef_cpi.data.get_osm_data.assign_road_length_to_hex(coords)

Query the input through Overpass to get road length

Parameters
  • coords (list) – coordinates of polygon

Returns

lat, lon, length and type of highway

Return type

dataframe

stc_unicef_cpi.data.get_osm_data.format_polygon_coords(geometry)

Format the coordinates for Overpass

Parameters
  • geometry (str) – string of polygon

Returns

formatted string

Return type

str

stc_unicef_cpi.data.get_osm_data.get_osm_info(query_osm_road)

Parse query through Overpass to access OpenStreetMap data

Parameters
  • query_osm_road (str) – string of a query

Returns

dataframe with the data accessed

Return type

dataframe

stc_unicef_cpi.data.get_osm_data.get_road_density(country, res)

Get road density

Parameters
  • country (str) – country of interest

  • res (int) – grid resolution

Returns

road density at hex level

Return type

dataframe

stc_unicef_cpi.data.get_osm_data.query_osm_road(coords, elem='way')

Build query to access lat, long, lengths and type of roads in a polygon

Parameters
  • coords (str) – string of polygon

  • elem – specify whether you want ways or also nodes and relations

Returns

string of the query

Return type

str
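
As a rough illustration of how a polygon can be turned into an Overpass QL query, consider the sketch below. It is not the package's query: the helper names format_poly and build_road_query, the input as a list of (lat, lon) pairs, and the "out center" output clause are assumptions, and the road-length calculation mentioned above is not reproduced. The poly filter itself expects a space-separated "lat lon lat lon ..." string.

```python
def format_poly(coords):
    """Join (lat, lon) pairs into the 'lat lon lat lon ...' string used by poly filters."""
    return " ".join(f"{lat} {lon}" for lat, lon in coords)


def build_road_query(coords, elem="way"):
    """Build an Overpass QL query for highway elements inside a polygon (illustrative)."""
    poly = format_poly(coords)
    # 'out center' asks Overpass to attach a representative centre point to each element.
    return f'[out:json];{elem}["highway"](poly:"{poly}");out center;'
```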

stc_unicef_cpi.data.get_satellite_data module

class stc_unicef_cpi.data.get_satellite_data.SatelliteImages(country, folder='gee', res=500, start='2010-01-01', end='2020-01-01')

Bases: object

Get Satellite Images From Google Earth Engine

export_drive(config) dict

Export tiff file into drive

Parameters
  • config (dictionary) – Configuration of output tiff

get_copernicus_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_cop_land') dict

Get status and evolution of land surface at global scale

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • ctry (str) – country of interest

  • geo (geometry) – dissolved geometry of all features in the collection

  • start_date (str) – starting date

  • end_date (str) – ending date

  • name (str, optional) – name of file, defaults to “cpi_cop_land”

Returns

task status

Return type

dictionary

get_country_boundaries()

Get country boundaries

get_healthcare_data(transform, proj, ctry, geo, name='cpi_health_acc') dict

Get health care data

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • ctry (str) – country of interest

  • geo (geometry) – dissolved geometry of all features in the collection

  • name (str, optional) – name of file, defaults to “cpi_health_acc”

Returns

task status

Return type

dictionary

get_land_use_data(transform, proj, ctry, geo, name='cpi_ghsl') dict

Get land use data

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • ctry (str) – country of interest

  • geo (geometry) – dissolved geometry of all features in the collection

  • name (str, optional) – name of file, defaults to “cpi_ghsl”

Returns

task status

Return type

dictionary

get_ndvi_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_ndvi') dict

Get Normalized Difference Vegetation Index (NDVI)

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • ctry (str) – country of interest

  • geo (geometry) – dissolved geometry of all features in the collection

  • start_date (str) – starting date

  • end_date (str) – ending date

  • name (str, optional) – name of file, defaults to “cpi_ndvi”

Returns

task status

Return type

dictionary

get_ndwi_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_ndwi') dict

Get Normalized Difference Water Index (NDWI)

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • ctry (str) – country of interest

  • geo (geometry) – dissolved geometry of all features in the collection

  • start_date (str) – starting date

  • end_date (str) – ending date

  • name (str, optional) – name of file, defaults to “cpi_ndwi”

Returns

task status

Return type

dictionary

get_nighttime_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_nighttime') dict

Get nighttime data

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • ctry (str) – country of interest

  • geo (geometry) – dissolved geometry of all features in the collection

  • start_date (str) – starting date

  • end_date (str) – ending date

  • name (str, optional) – name of file, defaults to “cpi_nighttime”

Returns

task status

Return type

dictionary

get_pollution_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_pollution') dict

Get pollution data

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • ctry (str) – country of interest

  • geo (geometry) – dissolved geometry of all features in the collection

  • start_date (str) – starting date

  • end_date (str) – ending date

  • name (str, optional) – name of file, defaults to “cpi_pollution”

Returns

task status

Return type

dictionary

get_pop_data(transform, proj, geo, name='cpi_poptotal') dict

Get 2020 population estimates in country, by age and sex

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • geo (geometry) – dissolved geometry of all features in the collection

  • name (str, optional) – name of file, defaults to “cpi_poptotal”

Returns

task status

Return type

dictionary

get_precipitation_data(transform, proj, ctry, geo, start_date, end_date) dict

Get precipitation data

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • ctry (str) – country of interest

  • geo (geometry) – dissolved geometry of all features in the collection

  • start_date (str) – starting date

  • end_date (str) – ending date

Returns

task status

Return type

dictionary

get_projection()

Get country’s transform between projected coordinates and the base coordinate system

Returns

the transform, the base coordinate reference system

Return type

List, Object

get_satellite_images() None

Get satellite images

get_topography_data(transform, proj, ctry, geo)

Get topography data

Parameters
  • transform (list) – transform between projected coordinates and the base coordinate system

  • proj (object) – the base coordinate reference system

  • ctry (str) – country of interest

  • geo (geometry) – dissolved geometry of all features in the collection

Returns

task status

Return type

dictionary

task_config(geo, name, image, transform, proj) dict

Determine countries parameters

stc_unicef_cpi.data.get_speedtest_data module

stc_unicef_cpi.data.get_speedtest_data.get_speedtest_info(url, name, path_save) None

Get speedtest information

Parameters
  • url (str) – url needed to retrieve information

  • name (str) – name of the file we want to retrieve

  • path_save (str) – directory to save information

Raises

ValueError – unable to retrieve data message

stc_unicef_cpi.data.get_speedtest_data.get_speedtest_url(service_type, year, q) str

Get speed test URL from Ookla

Parameters
  • service_type (str) – type of network performance

  • year (int) – year

  • q (int) – quarter

Returns

url, name

Return type

str
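
A hedged sketch of how such a URL and file name might be assembled from the service type, year and quarter follows. The base URL, the path layout and the file naming pattern are assumptions modelled on Ookla's open-data bucket and are not taken from this package; the helper names quarter_start and speedtest_url are hypothetical.

```python
from datetime import date

# Assumed location of the Ookla open-data shapefiles; verify before use.
BASE_URL = "https://ookla-open-data.s3.amazonaws.com/shapefiles/performance"


def quarter_start(year, q):
    """First day of quarter q (1 to 4) in the given year."""
    if q not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return date(year, 3 * q - 2, 1)


def speedtest_url(service_type, year, q):
    """Return (url, name) for one quarterly tile archive (assumed layout)."""
    start = quarter_start(year, q)
    name = f"{start:%Y-%m-%d}_performance_{service_type}_tiles.zip"
    url = f"{BASE_URL}/type={service_type}/year={year}/quarter={q}/{name}"
    return url, name
```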

stc_unicef_cpi.data.get_speedtest_data.prep_tile(data, name, path_save) None

Prepare tile for further preprocessing

Parameters
  • data (dataframe) – data containing information related to speed test

  • name (str) – name of file

stc_unicef_cpi.data.make_dataset module

stc_unicef_cpi.data.make_dataset.aggregate_dataset(df) DataFrame

Aggregate dataset

Parameters
  • df (dataframe) – input required to aggregate

Returns

agg mean, agg count

Return type

dataframe, dataframe

stc_unicef_cpi.data.make_dataset.append_features_to_hexes(country, res, encoders, gpu, force=False, force_download=False, audience=False, read_dir=None, save_dir=None, model_dir=None, tiff_dir=None, hyper_tuning=False) DataFrame

Append features to hexagons within a country

Parameters
  • country (str) – country of interest

  • res (int) – grid resolution

  • encoders (bool) – whether to append autoencoder features

  • gpu (bool) – whether to use gpus or not

  • force (bool, optional) – force clipping, defaults to False

  • force_download (bool, optional) – force download, defaults to False

  • audience (bool, optional) – whether or not to include audience estimates, defaults to False

  • read_dir (str, optional) – path to read input, defaults to c.ext_data

  • save_dir (str, optional) – path to save output, defaults to c.int_data

  • model_dir (str, optional) – path to model, defaults to c.base_dir_model

  • tiff_dir (str, optional) – path to tiff files, defaults to c.tiff_data

  • hyper_tuning (bool, optional) – whether or not to perform hyperparameter tuning, defaults to False

Returns

hexes with corresponding features

Return type

dataframe

stc_unicef_cpi.data.make_dataset.change_name_reproject_tiff(tiff, attribute, country, read_dir=None, out_dir=None) None

Rename attributes and reproject a tiff file

Parameters
  • tiff (str) – path to tiff file

  • attribute (list of lists) – attribute names

  • country (str) – country of interest

  • read_dir (str, optional) – path to read external data from, defaults to c.ext_data

stc_unicef_cpi.data.make_dataset.create_dataset(country_code, country, res, gpu=False, encoders=True, force=False, force_download=False, audience=False, hyper_tuning=True, lat='latnum', long='longnum', interim_dir=None, save_dir=None, model_dir=None, threshold=30, read_dir_target=None, read_dir=None, tiff_dir=None) DataFrame

Create dataset

Parameters
  • country_code (str) – country code

  • country (str) – country of interest

  • res (int) – grid resolution

  • gpu (bool) – whether to use gpus or not

  • encoders (bool) – whether to append autoencoder features

  • force (bool, optional) – force clipping, defaults to False

  • force_download (bool, optional) – force download, defaults to False

  • audience (bool, optional) – whether or not to include audience estimates, defaults to False

  • hyper_tuning (bool, optional) – whether or not to perform hyperparameter tuning, defaults to True

  • lat (str, optional) – colname containing latitude, defaults to “latnum”

  • long (str, optional) – colname containing longitude, defaults to “longnum”

  • interim_dir (str, optional) – path to interim data, defaults to c.int_data

  • read_dir (str, optional) – path to read input, defaults to c.ext_data

  • save_dir (str, optional) – path to save output, defaults to c.proc_data

  • model_dir (str, optional) – path to model, defaults to c.base_dir_model

  • tiff_dir (str, optional) – path to tiff files, defaults to c.tiff_data

  • threshold (int, optional) – minimum number of surveys per hexagon, defaults to c.cutoff

  • read_dir_target (str, optional) – path to directory of target data, defaults to c.raw_data

Returns

dataset with features and target variable

Return type

dataframe

stc_unicef_cpi.data.make_dataset.create_target_variable(country_code, res, lat, long, threshold, read_dir, copy_to_nbrs=False) DataFrame

Create target variable

Parameters
  • country_code (str) – country code related to country of interest

  • res (int) – resolution of country of interest

  • lat (numeric) – latitude of country of interest

  • long (numeric) – longitude of country of interest

  • threshold (int) – minimal number of surveys per hexagon

  • read_dir (str) – directory from where to read dataset

  • copy_to_nbrs (bool, optional) – include neighbouring resolution, defaults to False

Raises

ValueError – no raw survey data available

Returns

dataset with observations satisfying conditions

Return type

dataframe
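
The role of threshold can be illustrated with a small sketch that keeps only hexagons with enough surveys. The helper name hexes_meeting_threshold is hypothetical, and whether the real comparison is inclusive (>=) or strict (>) is an assumption here.

```python
from collections import Counter


def hexes_meeting_threshold(survey_hexes, threshold=30):
    """Return the set of hex codes that appear at least `threshold` times."""
    counts = Counter(survey_hexes)
    # Inclusive comparison is an assumption; the package may use a strict one.
    return {hex_code for hex_code, n in counts.items() if n >= threshold}
```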

stc_unicef_cpi.data.make_dataset.preprocessed_commuting_zones(country, res, read_dir=None) DataFrame

Preprocess commuting zones

Parameters
  • country (str) – country of interest

  • res (int) – grid resolution

  • read_dir (str, optional) – path to read data from, defaults to c.ext_data

Returns

processed information of commuting zones

Return type

dataframe

stc_unicef_cpi.data.make_dataset.preprocessed_speed_test(speed, res, country) DataFrame

Process speed test data

Parameters
  • speed (dataframe) – dataset containing speed test data

  • res (int) – grid resolution

  • country (str) – country of interest

Returns

clipped speed data to country, reprojected and aggregated

Return type

dataframe

stc_unicef_cpi.data.make_dataset.preprocessed_tiff_files(country, read_dir=None, out_dir=None, force=False) None

Preprocess tiff files

Parameters
  • country (str) – country of interest

  • read_dir (str, optional) – path to read data from, defaults to c.ext_data

  • out_dir (str, optional) – path to save data, defaults to c.int_data

  • force (bool, optional) – force clipping, defaults to False

stc_unicef_cpi.data.make_dataset.read_input_unicef(path_read) DataFrame

Read source data provided by STC and UNICEF

Parameters
  • path_read (str) – path to read data from

Returns

database with target variable

Return type

dataframe

stc_unicef_cpi.data.make_dataset.select_country(df, country_code, lat, long) DataFrame

Select country of interest

Parameters
  • df (dataframe) – input provided by UNICEF and STC

  • country_code (str) – country code

  • lat (numerical) – colname containing latitude measures

  • long (numerical) – colname containing longitude measures

Returns

database with info related to country of interest

Return type

dataframe

stc_unicef_cpi.data.process_geotiff module

stc_unicef_cpi.data.process_geotiff.agg_tif_to_df(df: DataFrame, tiff_dir: Union[str, PathLike, List[str], List[PathLike]], rm_prefix: Union[str, Pattern[str]] = 'cpi', agg_fn: Callable[[ndarray[Any, dtype[ScalarType]]], ndarray[Any, dtype[ScalarType]]] = <function mean>, max_records: int = 100000, replace_old: bool = True, resolution: int = 7, verbose: bool = False) DataFrame

Pass df with hex_code column of numpy_int type h3 codes, and a directory with tiff files, then aggregate pixels from tiffs within each hexagon according to given function.

Note that rather than using shapefiles, this uses pixel centroid values, hence different quantities of pixels may be aggregated in each hexagon, and it will not work sensibly at all if the resolution of the tiff file is lower than the resolution of the specified hexagons.

Parameters
  • df (pd.DataFrame) – ‘ground truth’ dataframe to aggregate tiffs to, with hex_code column at specified resolution

  • tiff_dir (Union[str, PathLike]) – Either directory containing .tifs, a single .tif file, or a list of .tif files to aggregate to given df

  • rm_prefix (Union[str, Pattern[str]], optional) – Prefix or regex pattern to remove from file string when naming variables, defaults to “cpi”

  • agg_fn (Callable[[npt.NDArray], npt.NDArray], optional) – Function to use when aggregating tiff pixels within cells, defaults to np.mean

  • max_records (int, optional) – Max number of pixels in clipped tiff before using dask, defaults to int(1e5)

  • replace_old (bool, optional) – Overwrite old columns if match new data, defaults to True

  • resolution (int, optional) – Resolution level of h3 grid to use, defaults to 7

  • verbose (bool, optional) – Verbose output, defaults to False

Raises

ValueError – hex_code column not in df

Returns

Original dataframe with new columns added from aggregated values of tiffs in hexes

Return type

pd.DataFrame
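
The pixel-centroid aggregation described above can be sketched in a few lines. This is only an illustration of the idea: aggregate_pixels and toy_cell are hypothetical names, and toy_cell uses 1-degree square cells as a stand-in for an H3 lookup, so it does not produce hexagons.

```python
from collections import defaultdict
from statistics import mean


def aggregate_pixels(pixels, cell_of, agg_fn=mean):
    """Group (lat, lon, value) pixel centroids by cell id and aggregate each group."""
    values_by_cell = defaultdict(list)
    for lat, lon, value in pixels:
        values_by_cell[cell_of(lat, lon)].append(value)
    return {cell: agg_fn(values) for cell, values in values_by_cell.items()}


def toy_cell(lat, lon):
    # Stand-in for an H3 lookup: 1-degree square cells, not hexagons.
    return (int(lat // 1), int(lon // 1))
```

Because each pixel is assigned by its centroid alone, cells can end up with different numbers of pixels, and a raster coarser than the cell grid leaves many cells empty, which is the caveat noted in the description.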

stc_unicef_cpi.data.process_geotiff.clip_tif_to_ctry(file_path: Union[PathLike, str], ctry_name: str, save_dir: Optional[Union[PathLike, str]] = None) None

Clip a GeoTIFF to a specified country boundary, and write a new file to the specified directory if given, else just plot the clipped tiff. File name is prepended with the country name.

Parameters
  • file_path (Union[PathLike,str]) – Path to file to clip

  • ctry_name (str) – Name of country to clip to

  • save_dir (Optional[Union[PathLike,str]], optional) – Path to directory to save to, defaults to None (just plot)

stc_unicef_cpi.data.process_geotiff.convert_tiffs_to_image_dataset(tiff_dir: Union[str, PathLike], hex_codes: Union[List[int], ndarray[Any, dtype[Union[int32, int64]]]], dim_x: int = 256, dim_y: int = 256) ndarray[Any, dtype[ScalarType]]

Convert set of GeoTIFFs to a 4D numpy array according to specified dataset - expect the path to a directory containing all relevant GeoTIFFs with extension ‘.tif’, and a list of h3 hexagon identifiers in numpy_int form (use import h3.api.numpy_int as h3).

Returned array is in form (hex_id, band, i, j), with i, j through the band image array defaulting to size 256 x 256, as specified by dim_x, dim_y.

Parameters
  • tiff_dir (Union[str, PathLike]) – Path to GeoTIFF directory, with file extensions ‘.tif’

  • hex_codes (Union[List[int], npt.NDArray[Union[np.int32,np.int64]]]) – Set of H3 hex codes for which you wish to extract images

  • dim_x (int, optional) – Pixel width of extracted images, defaults to 256

  • dim_y (int, optional) – Pixel height of extracted images, defaults to 256

Returns

Array of images at hex coords, in shape (hex_id, band, i, j)

Return type

npt.NDArray

stc_unicef_cpi.data.process_geotiff.extract_image_at_coords(dataset: Union[Dataset, DataArray, List[Dataset], DatasetReader], lat: float, long: float, dim_x: int = 256, dim_y: int = 256, verbose: bool = False) ndarray[Any, dtype[ScalarType]]

Extract an array of specified dimensions (num pixels) about specified lat/long - centered by default

Parameters
  • dataset (Union[Dataset, DataArray, List[Dataset], rasterio.io.DatasetReader]) – rioxarray or rasterio dataset (open tiff file)

  • lat (float) – Latitude of center point about which to extract image

  • long (float) – Longitude of center point about which to extract image

  • dim_x (int, optional) – x dimension (pixel width) of extracted image, defaults to 256

  • dim_y (int, optional) – y dimension (pixel height) of extracted image, defaults to 256

  • verbose (bool, optional) – Verbose, defaults to False

Returns

Array of tiff values (‘image’) at specified coordinates, of given size

Return type

npt.NDArray
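
Once the lat/long has been converted to a pixel row and column, extracting a centered image amounts to slicing a window around that pixel. The sketch below shows only that slicing step on a plain NumPy array; the function name centered_window and the NaN padding at raster edges are assumptions, and the coordinate-to-pixel conversion is not reproduced.

```python
import numpy as np


def centered_window(bands, row, col, dim_x=256, dim_y=256):
    """Cut a (n_bands, dim_y, dim_x) window centered on (row, col), NaN padded at edges."""
    n_bands, height, width = bands.shape
    out = np.full((n_bands, dim_y, dim_x), np.nan, dtype=float)
    top, left = row - dim_y // 2, col - dim_x // 2
    # Overlap between the requested window and the raster extent
    r0, r1 = max(top, 0), min(top + dim_y, height)
    c0, c1 = max(left, 0), min(left + dim_x, width)
    if r0 < r1 and c0 < c1:
        out[:, r0 - top:r1 - top, c0 - left:c1 - left] = bands[:, r0:r1, c0:c1]
    return out
```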

stc_unicef_cpi.data.process_geotiff.extract_ims_from_hex_codes(datasets: Union[List[str], List[PathLike]], hex_codes: Union[List[int], ndarray[Any, dtype[Union[int32, int64]]]], width: int = 256, height: int = 256, verbose: bool = False) ndarray[Any, dtype[ScalarType]]

For a set of datasets, specified by file path, and a set of h3 hex codes, extract centered images of specified size and return a 4D array in shape (image_idx,band,i,j).

Parameters
  • datasets (Union[List[str], List[PathLike]]) – List of paths to tiff files for which you want to extract (and stack) ‘image’ bands

  • hex_codes (Union[List[int], npt.NDArray[Union[np.int32,np.int64]]]) – Set of H3 hex codes in numpy_int format for which you wish to extract images

  • width (int, optional) – Width of extracted images in pixels, defaults to 256

  • height (int, optional) – Height of extracted images in pixels, defaults to 256

  • verbose (bool, optional) – Verbose, defaults to False

Returns

Extracted images in shape (image_idx,band,i,j)

Return type

npt.NDArray

stc_unicef_cpi.data.process_geotiff.geotiff_to_df(geotiff_filepath: Union[str, PathLike], spec_band_names: Optional[List[str]] = None, max_bands: int = 5, rm_prefix: Union[str, Pattern[str]] = '', verbose: bool = False) DataFrame

Convert a geotiff file to a pandas dataframe, and print some additional info.

Parameters
  • geotiff_filepath (Union[str, PathLike]) – path to a geotiff file

  • spec_band_names (Optional[List[str]], optional) – Specified band names - only used if these are not specified in the GeoTIFF itself, at which point they are mandatory, defaults to None

  • max_bands (int, optional) – Maximum number of bands allowed before rast_to_agg_df must be used instead, defaults to 5

  • rm_prefix (Union[str, Pattern[str]], optional) – Prefix (or regex pattern) to replace in file name, defaults to “” (empty string)

  • verbose (bool, optional) – verbose output, defaults to False

Raises
  • ValueError – No band names provided and none found in the GeoTIFF

  • ValueError – Number of band names provided when none found does not match number of bands

  • ValueError – Too many bands to handle without excessive memory - use rast_to_agg_df instead

  • ValueError – Problem with index resulting from conversion to df

  • ValueError – Problem converting bands

Returns

pandas dataframe of lat, long, val for each band

Return type

pd.DataFrame

stc_unicef_cpi.data.process_geotiff.print_tif_metadata(rioxarray_rio_obj: Union[Dataset, DataArray, List[Dataset]], name: Optional[str] = None) None

View metadata associated with a raster file, loaded using rioxarray

Parameters
  • rioxarray_rio_obj (Union[Dataset, DataArray, List[Dataset]]) – rioxarray dataset object

  • name (Optional[str], optional) – Name of tiff data, defaults to None

stc_unicef_cpi.data.process_geotiff.rast_to_agg_df(tiff_file: Union[str, Path, bytes], agg_fn: Callable[[ndarray[Any, dtype[ScalarType]]], ndarray[Any, dtype[ScalarType]]] = <function mean>, resolution: int = 7, max_bands: int = 3, verbose: bool = False) DataFrame

Likely slower than using the rioxarray functions, but has the benefit of handling groups of bands at a time rather than all at once (which is very memory expensive). Only to be used for tiffs with many bands.

Parameters
  • tiff_file (Union[str, PathLike]) – Path to (many banded, large) tiff file

  • agg_fn (Callable[[npt.NDArray], npt.NDArray], optional) – Aggregation function, defaults to np.mean

  • resolution (int, optional) – Resolution of H3 grid to aggregate to, defaults to 7

  • max_bands (int, optional) – Max number of bands to process at one time, defaults to 3

  • verbose (bool, optional) – Verbose, defaults to False

Returns

Dataframe of aggregated data

Return type

pd.DataFrame

stc_unicef_cpi.data.process_geotiff.resample_tif(tif_file_path: Union[str, PathLike], dest_dir: Union[str, PathLike], rescale_factor: Optional[float] = 2.0) None

Resample a tiff file by a given factor, using bilinear resampling - greater than 1 corresponds to increased resolution, less than 1 decreased.

Parameters
  • tif_file_path (Union[str, PathLike]) – Path to tiff file to resample

  • dest_dir (Union[str, PathLike]) – Destination directory for resampled tiff file to be written to

  • rescale_factor (Optional[float], optional) – Rescale factor, defaults to 2.0
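
The effect of rescale_factor on raster dimensions can be sketched as simple arithmetic. The helper resampled_shape is hypothetical and only computes the target grid; it assumes the extent is unchanged and truncates fractional pixel counts, and it does not perform the bilinear resampling itself.

```python
def resampled_shape(height, width, pixel_size, rescale_factor=2.0):
    """Return (new_height, new_width, new_pixel_size) for a given rescale factor."""
    if rescale_factor <= 0:
        raise ValueError("rescale_factor must be positive")
    new_height = int(height * rescale_factor)
    new_width = int(width * rescale_factor)
    # Same extent spread over more (or fewer) pixels changes the pixel size inversely.
    new_pixel_size = pixel_size / rescale_factor
    return new_height, new_width, new_pixel_size
```

A factor of 2 doubles the pixel count along each axis and halves the pixel size, while a factor of 0.5 does the opposite.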

stc_unicef_cpi.data.process_geotiff.rxr_reproject_tiff_to_target(src_tiff_file: Union[str, PathLike], target_tiff_file: Union[str, PathLike], dest_path: Optional[Union[PathLike, str]] = None, verbose: bool = False) Optional[Union[Dataset, DataArray, List[Dataset]]]

Use rioxarray and an example (target) tiff to reproject the given (source) tiff to the same CRS and resolution.

Parameters
  • src_tiff_file (Union[str, PathLike]) – Path to tiff file you want to reproject

  • target_tiff_file (Union[str, PathLike]) – Path to tiff file that is example of desired projection and resolution

  • dest_path (Optional[Union[str, PathLike]], optional) – Path to write reprojected tiff to, defaults to None (just return reprojected raster)

  • verbose (bool, optional) – Verbosity, defaults to False

Returns

Either None (if dest_path is not None) or reprojected raster

Return type

Union[Dataset, DataArray, List[Dataset], None]

stc_unicef_cpi.data.process_netcdf module

stc_unicef_cpi.data.process_netcdf.netcdf_to_clipped_array(file_path: Union[str, PathLike], *, ctry_name: str = 'Nigeria', save_dir: Optional[Union[PathLike, str]] = None, plot: bool = False) Union[None, ndarray[Any, dtype[ScalarType]]]

Read netCDF file and return either array clipped to specified country, or a GeoTIFF clipped to this country and saved in the specified directory with same name as before

Parameters
  • file_path (Union[str, PathLike]) – Path to netCDF file to reproject and clip

  • ctry_name (str, optional) – Country to clip to, defaults to “Nigeria”

  • save_dir (Optional[Union[str, PathLike]], optional) – Directory to save to, defaults to None (just return clipped array)

  • plot (bool, optional) – Visualise clipped array, defaults to False

Returns

Either None if save_dir is not None, or clipped array

Return type

Union[None, npt.NDArray]

stc_unicef_cpi.data.process_to_torch module

class stc_unicef_cpi.data.process_to_torch.HexDataset(tiff_dir, hex_codes, labels, width=33, height=33, transform=None, target_transform=None)

Bases: Dataset

Make a torch dataset that constructs images from tiff files according to hex codes

Parameters

Dataset (_type_) – _description_

stc_unicef_cpi.data.process_to_torch.make_torch_dataloader_from_numpy(images, labels, bs=64, shuffle=False)

Take a NumPy image dataset and dataframe, and convert them to a dataset suitable for training torch models

Parameters
  • images (_type_) – _description_

  • labels (_type_) – _description_

  • bs (int, optional) – _description_, defaults to 64

  • shuffle (bool, optional) – _description_, defaults to False

Returns

_description_

Return type

_type_

stc_unicef_cpi.data.stream_data module

Data Streaming From External Sources

class stc_unicef_cpi.data.stream_data.FacebookMarketingStreamer(country, force, read_path, res, logging)

Bases: StreamerObject

Stream data from Facebook Marketing API

implement()

class stc_unicef_cpi.data.stream_data.StreamerObject(country, force, read_path)

Bases: object