stc_unicef_cpi.data package

Submodules

stc_unicef_cpi.data.cv_loaders module

class stc_unicef_cpi.data.cv_loaders.HexSpatialKFold(n_splits=5, *, random_state=None, hex_idx=None)

Bases: KFold

NB: lightly modified version of GroupKFold. Rather than requiring group labels to be passed along with X and y, as in the original GroupKFold, the new code takes the hex codes passed in and generates n_splits suitable groups.

get_even_clusters(X, n_clusters)

get_spatial_groups(X)

haversine(latlon1, latlon2): Calculate the great circle distance between two points on the earth (specified in decimal degrees)

split(X: Union[DataFrame, ndarray], y: Optional[Union[Series, ndarray]] = None, groups: Optional[Union[Series, ndarray]] = None) → Iterator[Tuple[ndarray, ndarray]]

Generate indices to split data into training and test set.

Parameters

X (Union[pd.DataFrame, np.ndarray]) – Array-like of shape (n_samples, n_features). Training data, where n_samples is the number of samples and n_features is the number of features.
y (Union[pd.Series, np.ndarray], optional) – Array-like of shape (n_samples,). The target variable for supervised learning problems, defaults to None
groups (Union[pd.Series, np.ndarray], optional) – Spatial group labels for the samples used while splitting the dataset into train/test set, defaults to None

Returns

Generator of tuples of train and test indices

Return type

Iterator[Tuple[np.ndarray, np.ndarray]]

Yield

Next set of train, test indices

Yield type

Tuple[np.ndarray, np.ndarray]
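The idea of holding out whole spatial groups can be sketched as below. The helper `spatial_band_kfold` is hypothetical. It forms groups by sorting samples on longitude and cutting them into n_splits near-equal bands. This is a deliberately simple substitute for whatever clustering get_even_clusters and get_spatial_groups actually perform.

```python
import numpy as np


def spatial_band_kfold(lon, n_splits=5):
    """Yield (train_idx, test_idx) pairs, holding out one longitude band per fold.

    Hypothetical sketch: the package builds its groups from hex codes instead.
    """
    lon = np.asarray(lon, dtype=float)
    order = np.argsort(lon, kind="stable")  # sample indices sorted west to east
    groups = np.empty(len(lon), dtype=int)
    # cut the sorted samples into n_splits near-equal-sized contiguous groups
    for g, members in enumerate(np.array_split(order, n_splits)):
        groups[members] = g
    idx = np.arange(len(lon))
    for g in range(n_splits):
        test = groups == g
        yield idx[~test], idx[test]
```

Because each test fold is spatially contiguous, neighbouring samples do not end up on both sides of a split. That is the motivation for spatial CV.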

class stc_unicef_cpi.data.cv_loaders.KerasDataGenerator(hex_idxs: ndarray, batch_size=32, dim=(16, 16), data_files: Optional[Union[List[str], List[Path]]] = None, shuffle=True)

Bases: Sequence

Generates data for Keras

on_epoch_end(): Updates indexes after each epoch

class stc_unicef_cpi.data.cv_loaders.StratifiedIntervalKFold(n_splits=5, *, n_cuts=5, shuffle=False, random_state=None)

Bases: StratifiedKFold

NB: lightly edited version of StratifiedKFold. The only difference is that class labels are generated by using pd.cut to split the target values into n_cuts equal-width intervals (to improve folds when values are inflated), rather than using the values themselves.
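The sketch below shows interval-based stratification. It assumes equal-width pd.cut bins, with each interval's members dealt round-robin across folds. `interval_stratified_folds` is hypothetical. It returns a fold id per sample rather than the (train, test) index pairs that StratifiedIntervalKFold.split yields.

```python
import numpy as np
import pandas as pd


def interval_stratified_folds(y, n_splits=5, n_cuts=5, seed=0):
    """Return a fold id per sample, stratified on pd.cut interval labels (sketch)."""
    y = np.asarray(y, dtype=float)
    # equal-width intervals over the range of y -> integer class labels
    classes = pd.cut(y, bins=n_cuts, labels=False)
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for c in np.unique(classes):
        members = np.flatnonzero(classes == c)
        rng.shuffle(members)
        # spread each interval's members evenly across the folds
        folds[members] = np.arange(len(members)) % n_splits
    return folds
```

Suppose many targets share the same value. Stratifying on the raw values would then treat every distinct value as its own class. Binning first keeps the number of strata at n_cuts.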

stc_unicef_cpi.data.cv_loaders.cv_split(all_hex_idxs: ndarray, labels: ndarray, k: int, mode='normal', seed=42, strat_cuts=5)

Generate k folds on a (fixed-order) hex dataset: fully random (normal), stratified by interval (stratified), or spatial (spatial).

Parameters

all_hex_idxs (np.ndarray of type int) – Array of hex codes of dataset
labels (np.ndarray of type int) – Corresponding target labels for these idxs
k (int) – Number of folds
mode (str, optional) – Mode used to generate folds, one of [‘normal’, ‘stratified’, ‘spatial’], defaults to ‘normal’ (fully random)
seed (int, optional) – Random seed, defaults to 42
strat_cuts (int, optional) – Number of intervals to cut the data into for stratified CV, defaults to 5

Returns

folds
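For the fully random ('normal') mode, fold assignment can be sketched as a seeded permutation split into k near-equal parts. `random_fold_ids` is a hypothetical helper. The return format of cv_split itself (documented only as "folds") may differ.

```python
import numpy as np


def random_fold_ids(n_samples, k, seed=42):
    """Assign each sample (in fixed order) to one of k folds at random (sketch)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)  # random visiting order, reproducible via seed
    fold_ids = np.empty(n_samples, dtype=int)
    for fold, members in enumerate(np.array_split(perm, k)):
        fold_ids[members] = fold
    return fold_ids
```

Test indices for fold f are then `np.flatnonzero(fold_ids == f)`. The original hex order is never changed.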

stc_unicef_cpi.data.get_cell_tower_data module

GET CELL TOWER DATA FROM OPEN CELL ID

stc_unicef_cpi.data.get_cell_tower_data.get_cell_data(country, save_path)

Get cell tower data for a country

Parameters

country – country of interest
save_path – path to save data to

stc_unicef_cpi.data.get_cell_tower_data.get_opencell_url(country, token)

Get OpenCelliD data

Parameters

country – country of interest
token – OpenCelliD access token

stc_unicef_cpi.data.get_drive_data module

stc_unicef_cpi.data.get_drive_data.download_from_drive_folder(country, folder_id, scopes=['https://www.googleapis.com/auth/drive.readonly'])

Download content from a Google Drive folder containing Google Earth Engine images

Parameters

country – country of interest
folder_id (str) – folder id, retrievable from the folder URL
scopes (list, optional) – OAuth scopes to request, defaults to [‘https://www.googleapis.com/auth/drive.readonly’]

stc_unicef_cpi.data.get_econ_data module

Download econ and facilities data

stc_unicef_cpi.data.get_econ_data.download_econ_data(out_dir)

Download economic data

Parameters

out_dir (str, optional) – path to output directory, defaults to c.econ_data

stc_unicef_cpi.data.get_econ_data.get_data_from_calibrated_nighttime(url, out_dir, dir)

Get data from the calibrated nighttime light dataset authored by Jiandong Chen and Ming Gao

Parameters

url (str) – URL of data to download
out_dir (str) – path to output directory
dir (str) – path to specific data type

stc_unicef_cpi.data.get_facebook_data module

GET DELIVERY ESTIMATES FROM FACEBOOK MARKETING API

stc_unicef_cpi.data.get_facebook_data.define_params(lat, lon, radius, opt)

Define search parameters

Parameters

lat (str) – latitude
lon (str) – longitude
radius (float) – radius
opt (str) – optimization criteria
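The sketch below shows the kind of payload define_params could produce. It assumes the Marketing API's `targeting_spec` with a single custom location and an `optimization_goal` string. `define_params_sketch`, the `distance_unit` default and the exact key layout are assumptions, not the package's confirmed output.

```python
def define_params_sketch(lat, lon, radius, opt, distance_unit="kilometer"):
    """Hypothetical delivery-estimate parameters for one circular custom location."""
    return {
        # assumption: opt is passed straight through as the optimisation goal
        "optimization_goal": opt,
        "targeting_spec": {
            "geo_locations": {
                "custom_locations": [
                    {
                        # lat/lon are documented as str, so convert to numbers
                        "latitude": float(lat),
                        "longitude": float(lon),
                        "radius": radius,
                        "distance_unit": distance_unit,
                    }
                ],
            },
        },
    }
```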

stc_unicef_cpi.data.get_facebook_data.delivery_estimate(account, lat, long, radius, opt)

stc_unicef_cpi.data.get_facebook_data.fb_api_init(token, id)

Initialize the Facebook API connection

Parameters

token – Access token
id – Account id

Returns

api and account connection

Return type

conn

stc_unicef_cpi.data.get_facebook_data.get_facebook_estimates(coords, out_dir, name_out, res)

Get delivery estimates for a list of coordinates

stc_unicef_cpi.data.get_facebook_data.point_delivery_estimate(account, lat, lon, radius, opt)

Get a delivery estimate for a single point

stc_unicef_cpi.data.get_osm_data module

stc_unicef_cpi.data.get_osm_data.add_neighboring_hexagons(hex_codes, hex_code_col='hex_code')

Get all hexagons and their respective coordinates

Parameters

hex_codes (list) – list of hexagon codes

Returns

hex_codes and geometry of polygons

Return type

list

stc_unicef_cpi.data.get_osm_data.assign_cluster(results, country, res, lat='@lat', long='@lon', hex_code_col='hex_code')

Assign H3 cluster at a specified resolution

Parameters

results (dataframe) – dataframe with the central point of each road (lat, lon), its length and type_road
country (str) – country of interest
res (int) – resolution

Returns

hexes with road length

Return type

dataframe

stc_unicef_cpi.data.get_osm_data.assign_road_length_to_hex(coords)

Query the input through Overpass to get road length

Parameters

coords (list) – coordinates of polygon

Returns

lat, lon, length and type of highway

Return type

dataframe

stc_unicef_cpi.data.get_osm_data.format_polygon_coords(geometry)

Format the coordinates for Overpass

Parameters

geometry (str) – string of polygon

Returns

formatted string

Return type

str
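Overpass QL's `poly:` filter takes a space-separated list of latitude/longitude pairs, so formatting mostly means reordering coordinates. The sketch below is hypothetical. It assumes `geometry` is a single-ring WKT string in lon/lat order. It does not handle holes, multipolygons or scientific notation.

```python
import re

# one "x y" coordinate pair in WKT, e.g. "7.1 50.7"
_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def wkt_polygon_to_overpass_poly(wkt):
    """Convert 'POLYGON ((lon lat, ...))' to the 'lat lon lat lon ...' form for poly:."""
    pairs = _PAIR.findall(wkt)
    # WKT stores (lon, lat); Overpass expects latitude first
    return " ".join(f"{lat} {lon}" for lon, lat in pairs)
```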

stc_unicef_cpi.data.get_osm_data.get_osm_info(query_osm_road)

Send a query through Overpass to access OpenStreetMap data

Parameters

query_osm_road (str) – string of a query

Returns

dataframe with data accessed

Return type

dataframe

stc_unicef_cpi.data.get_osm_data.get_road_density(country, res)

Get road density

Parameters

country (str) – country of interest
res (int) – grid resolution

Returns

road density at hex level

Return type

dataframe

stc_unicef_cpi.data.get_osm_data.query_osm_road(coords, elem='way')

Build a query to access lat, long, length and type of roads in a polygon

Parameters

coords (str) – string of polygon
elem – whether you want ways only, or also nodes and relations, defaults to ‘way’

Returns

string of the query

Return type

str
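A minimal query of the shape query_osm_road describes might look like the following hypothetical sketch. `build_road_query` returns JSON with full way geometry (`out geom;`). The package's own query apparently returns lat, lon, length and road type directly. Passing `elem='nwr'` would match nodes, ways and relations. The timeout value is an assumption.

```python
def build_road_query(poly, elem="way", timeout=180):
    """Build an Overpass QL query for highway-tagged elements inside a polygon.

    `poly` is a space-separated 'lat lon lat lon ...' string (hypothetical sketch).
    """
    return (
        f"[out:json][timeout:{timeout}];\n"
        # select elements tagged highway=* whose geometry lies in the polygon
        f'{elem}["highway"](poly:"{poly}");\n'
        # include full geometry so road lengths can be computed client-side
        "out geom;"
    )
```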

stc_unicef_cpi.data.get_satellite_data module

class stc_unicef_cpi.data.get_satellite_data.SatelliteImages(country, folder='gee', res=500, start='2010-01-01', end='2020-01-01')

Bases: object

Get Satellite Images From Google Earth Engine

export_drive(config) → dict

Export tiff file to Google Drive

Parameters

config (dictionary) – Configuration of output tiff

get_copernicus_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_cop_land') → dict

Get status and evolution of land surface at global scale

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
ctry (str) – country of interest
geo (geometry) – dissolved geometry of all features in the collection
start_date (str) – starting date
end_date (str) – ending date
name (str, optional) – name of file, defaults to “cpi_cop_land”

Returns

task status

Return type

dictionary

get_country_boundaries(): Get countries boundaries

get_healthcare_data(transform, proj, ctry, geo, name='cpi_health_acc') → dict

Get health care data

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
ctry (str) – country of interest
geo (geometry) – dissolved geometry of all features in the collection
name (str, optional) – name of file, defaults to “cpi_health_acc”

Returns

task status

Return type

dictionary

get_land_use_data(transform, proj, ctry, geo, name='cpi_ghsl') → dict

Get land use data

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
ctry (str) – country of interest
geo (geometry) – dissolved geometry of all features in the collection
name (str, optional) – name of file, defaults to “cpi_ghsl”

Returns

task status

Return type

dictionary

get_ndvi_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_ndvi') → dict

Get Normalized Difference Vegetation Index (NDVI)

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
ctry (str) – country of interest
geo (geometry) – dissolved geometry of all features in the collection
start_date (str) – starting date
end_date (str) – ending date
name (str, optional) – name of file, defaults to “cpi_ndvi”

Returns

task status

Return type

dictionary

get_ndwi_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_ndwi') → dict

Get Normalized Difference Water Index (NDWI)

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
ctry (str) – country of interest
geo (geometry) – dissolved geometry of all features in the collection
start_date (str) – starting date
end_date (str) – ending date
name (str, optional) – name of file, defaults to “cpi_ndwi”

Returns

task status

Return type

dictionary
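NDVI and NDWI are both normalized-difference indices. For reference, they can be computed from band arrays as below. This is only a sketch, since the two functions above retrieve the indices from Earth Engine. The NDWI variant (green/NIR or NIR/SWIR) used by those products is not stated here; the sketch uses the green/NIR form.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN wherever a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = (a - b) / denom
    return np.where(denom == 0, np.nan, ratio)


def ndvi(nir, red):
    # vegetation: near-infrared vs red reflectance
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # water (green/NIR form, assumed here): green vs near-infrared reflectance
    return normalized_difference(green, nir)
```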

get_nighttime_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_nighttime') → dict

Get nighttime data

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
ctry (str) – country of interest
geo (geometry) – dissolved geometry of all features in the collection
start_date (str) – starting date
end_date (str) – ending date
name (str, optional) – name of file, defaults to “cpi_nighttime”

Returns

task status

Return type

dictionary

get_pollution_data(transform, proj, ctry, geo, start_date, end_date, name='cpi_pollution') → dict

Get pollution data

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
ctry (str) – country of interest
geo (geometry) – dissolved geometry of all features in the collection
start_date (str) – starting date
end_date (str) – ending date
name (str, optional) – name of file, defaults to “cpi_pollution”

Returns

task status

Return type

dictionary

get_pop_data(transform, proj, geo, name='cpi_poptotal') → dict

Get 2020 population estimates in country, by age and sex

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
geo (geometry) – dissolved geometry of all features in the collection
name (str, optional) – name of file, defaults to “cpi_poptotal”

Returns

task status

Return type

dictionary

get_precipitation_data(transform, proj, ctry, geo, start_date, end_date) → dict

Get precipitation data

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
ctry (str) – country of interest
geo (geometry) – dissolved geometry of all features in the collection
start_date (str) – starting date
end_date (str) – ending date

Returns

task status

Return type

dictionary

get_projection()

Get the country’s transform between projected coordinates and the base coordinate system

Returns

the transform, the base coordinate reference system

Return type

List, Object

get_satellite_images() → None: Get satellite images

get_topography_data(transform, proj, ctry, geo)

Get topography data

Parameters

transform (list) – transform between projected coordinates and the base coordinate system
proj (object) – the base coordinate reference system
ctry (str) – country of interest
geo (geometry) – dissolved geometry of all features in the collection

Returns

task status

Return type

dictionary

task_config(geo, name, image, transform, proj) → dict: Determine the country’s export task parameters
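As an illustration, the configuration dictionary that task_config could return might mirror the keyword arguments of `ee.batch.Export.image.toDrive`. `task_config_sketch` is hypothetical. Several details are assumptions:

- the `maxPixels` value;
- the `GeoTIFF` format;
- the `'gee'` folder default, taken from the SatelliteImages signature;
- treating `proj` as a CRS string.

```python
def task_config_sketch(geo, name, image, transform, proj, folder="gee"):
    """Hypothetical keyword arguments for ee.batch.Export.image.toDrive.

    In practice `image` and `geo` would be Earth Engine objects and `proj`
    a CRS string such as 'EPSG:4326' (assumption).
    """
    return {
        "image": image,
        "description": name,      # task name shown in the Earth Engine task list
        "fileNamePrefix": name,   # output file name in Drive
        "folder": folder,
        "region": geo,
        "crs": proj,
        "crsTransform": transform,
        "maxPixels": 1e13,        # assumption: generous limit for country-sized exports
        "fileFormat": "GeoTIFF",
    }
```

With the earthengine-api installed, such a dict would be used as `task = ee.batch.Export.image.toDrive(**cfg)` followed by `task.start()`. `task.status()` then returns a status dict.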

stc_unicef_cpi.data.get_speedtest_data module

stc_unicef_cpi.data.get_speedtest_data.get_speedtest_info(url, name, path_save) → None

Get speedtest information

Parameters

url (str) – URL needed to retrieve information
name (str) – name of the file to retrieve
path_save (str) – directory to save information

Raises

ValueError – unable to retrieve data

stc_unicef_cpi.data.get_speedtest_data.get_speedtest_url(service_type, year, q) → str

Get Speedtest URL from Ookla

Parameters

service_type (str) – type of network performance
year (int) – year
q (int) – quarter

Returns

url, name

Return type

str
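A URL builder of this kind can be sketched as follows. `ookla_tiles_url` is hypothetical. The host, path layout and file naming are assumptions, based on our understanding of Ookla's public open-data bucket, which stores shapefile zips keyed by type, year and quarter. Check them against Ookla's open-data README before use.

```python
def ookla_tiles_url(service_type, year, q):
    """Return (url, name) for a quarterly Ookla performance-tiles zip (hypothetical layout)."""
    if service_type not in {"fixed", "mobile"}:
        raise ValueError("service_type must be 'fixed' or 'mobile'")
    if q not in {1, 2, 3, 4}:
        raise ValueError("q must be 1, 2, 3 or 4")
    month = 3 * (q - 1) + 1  # first month of the quarter
    name = f"{year}-{month:02d}-01_performance_{service_type}_tiles"
    url = (
        "https://ookla-open-data.s3.amazonaws.com/shapefiles/performance/"
        f"type={service_type}/year={year}/quarter={q}/{name}.zip"
    )
    return url, name
```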

stc_unicef_cpi.data.get_speedtest_data.prep_tile(data, name, path_save) → None

Prepare tile for further preprocessing

Parameters

data (dataframe) – data containing information related to speed test
name (str) – name of file
path_save (str) – directory to save information

stc_unicef_cpi.data.make_dataset module

stc_unicef_cpi.data.make_dataset.aggregate_dataset(df) → DataFrame

Aggregate dataset

Parameters

df (dataframe) – input to aggregate

Returns

agg mean, agg count

Return type

dataframe, dataframe
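Here is a minimal pandas sketch of a mean/count aggregation per hexagon. `aggregate_by_hex` and the `hex_code` column name are assumptions. The package's "agg count" may count non-null values per column rather than rows per hexagon, as this sketch does.

```python
import pandas as pd


def aggregate_by_hex(df, hex_col="hex_code"):
    """Return (per-hex mean of numeric columns, per-hex row count). Hypothetical sketch."""
    grouped = df.groupby(hex_col)
    agg_mean = grouped.mean(numeric_only=True).reset_index()
    # number of rows (e.g. survey records) falling in each hexagon
    agg_count = grouped.size().rename("count").reset_index()
    return agg_mean, agg_count
```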

stc_unicef_cpi.data.make_dataset.append_features_to_hexes(country, res, encoders, gpu, force=False, force_download=False, audience=False, read_dir=None, save_dir=None, model_dir=None, tiff_dir=None, hyper_tuning=False) → DataFrame

Append features to hexagons within a country

Parameters

country (str) – country of interest
res (int) – grid resolution
encoders (bool) – whether to append autoencoder features
gpu (bool) – whether to use GPUs or not
force (bool, optional) – force clipping, defaults to False
force_download (bool, optional) – force download, defaults to False
audience (bool, optional) – whether or not to include audience estimates, defaults to False
read_dir (str, optional) – path to read input, defaults to c.ext_data
save_dir (str, optional) – path to save output, defaults to c.int_data
model_dir (str, optional) – path to model, defaults to c.base_dir_model
tiff_dir (str, optional) – path to tiff files, defaults to c.tiff_data
hyper_tuning (bool, optional) – whether or not to perform hyperparameter tuning, defaults to False

Returns

hexes with corresponding features

Return type

dataframe

stc_unicef_cpi.data.make_dataset.change_name_reproject_tiff(tiff, attribute, country, read_dir=None, out_dir=None) → None

Rename attributes of a tiff file and reproject it

Parameters

tiff (str) – path to tiff file
attribute (list of lists) – attribute names
country (str) – country of interest
read_dir (str, optional) – path to read external data from, defaults to c.ext_data

stc_unicef_cpi.data.make_dataset.create_dataset(country_code, country, res, gpu=False, encoders=True, force=False, force_download=False, audience=False, hyper_tuning=True, lat='latnum', long='longnum', interim_dir=None, save_dir=None, model_dir=None, threshold=30, read_dir_target=None, read_dir=None, tiff_dir=None) → DataFrame

Create dataset

Parameters

country_code (str) – country code
country (str) – country of interest
res (int) – grid resolution
gpu (bool) – whether to use GPUs or not
encoders (bool) – whether to append autoencoder features
force (bool, optional) – force clipping, defaults to False
force_download (bool, optional) – force download, defaults to False
audience (bool, optional) – whether or not to include audience estimates, defaults to False
hyper_tuning (bool, optional) – whether or not to perform hyperparameter tuning, defaults to True
lat (str, optional) – colname containing latitude, defaults to “latnum”
long (str, optional) – colname containing longitude, defaults to “longnum”
interim_dir (str, optional) – path to interim data, defaults to c.int_data
read_dir (str, optional) – path to read input, defaults to c.ext_data
save_dir (str, optional) – path to save output, defaults to c.proc_data
model_dir (str, optional) – path to model, defaults to c.base_dir_model
tiff_dir (str, optional) – path to tiff files, defaults to c.tiff_data
threshold (int, optional) – minimum number of surveys per hexagon, defaults to c.cutoff
read_dir_target (str, optional) – path to directory of target data, defaults to c.raw_data

Returns

dataset with features and target variable

Return type

dataframe

stc_unicef_cpi.data.make_dataset.create_target_variable(country_code, res, lat, long, threshold, read_dir, copy_to_nbrs=False) → DataFrame

Create target variable

Parameters

country_code (str) – country code of the country of interest
res (int) – grid resolution
lat (str) – colname containing latitude
long (str) – colname containing longitude
threshold (int) – minimum number of surveys per hexagon
read_dir (str) – directory from which to read dataset
copy_to_nbrs (bool, optional) – include neighbouring hexagons, defaults to False

Raises

ValueError – no raw survey data available

Returns

dataset with observations satisfying conditions

Return type

dataframe
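The survey-count threshold can be sketched as a group-and-filter step. This assumes each survey record has already been assigned a hex code. `filter_hexes_by_survey_count` and the `deprived` target column are hypothetical names.

```python
import pandas as pd


def filter_hexes_by_survey_count(df, threshold, hex_col="hex_code", target_col="deprived"):
    """Average the target per hex and keep hexes with at least `threshold` records (sketch)."""
    stats = df.groupby(hex_col)[target_col].agg(["mean", "count"]).reset_index()
    # drop hexagons backed by too few surveys to give a reliable estimate
    return stats[stats["count"] >= threshold].reset_index(drop=True)
```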

stc_unicef_cpi.data.make_dataset.preprocessed_commuting_zones(country, res, read_dir=None) → DataFrame

Preprocess commuting zones

Parameters

country (str) – country of interest
res (int) – grid resolution
read_dir (str, optional) – path to read data from, defaults to c.ext_data

Returns

processed information of commuting zones

Return type

dataframe

stc_unicef_cpi.data.make_dataset.preprocessed_speed_test(speed, res, country) → DataFrame

Process speed test data

Parameters

speed (dataframe) – dataset containing speed test data
res (int) – grid resolution
country (str) – country of interest

Returns

speed data clipped to country, reprojected and aggregated

Return type

dataframe

stc_unicef_cpi.data.make_dataset.preprocessed_tiff_files(country, read_dir=None, out_dir=None, force=False) → None

Preprocess tiff files

Parameters

country (str) – country of interest
read_dir (str, optional) – path to read data from, defaults to c.ext_data
out_dir (str, optional) – path to save data, defaults to c.int_data
force (bool, optional) – force clipping, defaults to False

stc_unicef_cpi.data.make_dataset.read_input_unicef(path_read) → DataFrame

Read source data provided by STC and UNICEF

Parameters

path_read (str) – path to read data from

Returns

database with target variable

Return type

dataframe

stc_unicef_cpi.data.make_dataset.select_country(df, country_code, lat, long) → DataFrame

Select country of interest

Parameters

df (dataframe) – input provided by UNICEF and STC
country_code (str) – country code
lat (str) – colname containing latitude measures
long (str) – colname containing longitude measures

Returns

database with info related to country of interest

Return type

dataframe

stc_unicef_cpi.data.process_geotiff module

stc_unicef_cpi.data.process_geotiff.agg_tif_to_df(df: DataFrame, tiff_dir: Union[str, PathLike, List[str], List[PathLike]], rm_prefix: Union[str, Pattern[str]] = 'cpi', agg_fn: Callable[[NDArray], NDArray] = np.mean, max_records: int = 100000, replace_old: bool = True, resolution: int = 7, verbose: bool = False) → DataFrame

Pass a df with a hex_code column of numpy_int-type h3 codes, and a directory of tiff files. Pixels from the tiffs are then aggregated within each hexagon according to the given function.

Note that rather than using shapefiles, this uses pixel centroid values. Different numbers of pixels may therefore be aggregated in each hexagon, and the results will not be meaningful if the resolution of the tiff file is lower than the resolution of the specified hexagons.

Parameters

df (pd.DataFrame) – ‘ground truth’ dataframe to aggregate tiffs to, with hex_code column at specified resolution
tiff_dir (Union[str, PathLike, List[str], List[PathLike]]) – Either a directory containing .tifs, a single .tif file, or a list of .tif files to aggregate to the given df
rm_prefix (Union[str, Pattern[str]], optional) – Prefix or regex pattern to remove from file string when naming variables, defaults to “cpi”
agg_fn (Callable[[npt.NDArray], npt.NDArray], optional) – Function to use when aggregating tiff pixels within cells, defaults to np.mean
max_records (int, optional) – Max number of pixels in clipped tiff before using dask, defaults to int(1e5)
replace_old (bool, optional) – Overwrite old columns if match new data, defaults to True
resolution (int, optional) – Resolution level of h3 grid to use, defaults to 7
verbose (bool, optional) – Verbose output, defaults to False

Raises

ValueError – hex_code column not in df

Returns

Original dataframe with new columns added from aggregated values of tiffs in hexes

Return type

pd.DataFrame
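The pixel-centroid approach can be sketched with NumPy and pandas. Everything here is hypothetical:

- `square_cell` stands in for an H3 lookup at the chosen resolution;
- the raster is described by a simple north-up geotransform (x0, y0, dx, dy);
- only a single band is handled.

```python
import numpy as np
import pandas as pd


def square_cell(lat, lon, size=1.0):
    """Hypothetical stand-in for an H3 lookup: id of the size x size degree square."""
    return (np.floor(lat / size) * 1000 + np.floor(lon / size)).astype(np.int64)


def aggregate_pixels_to_cells(band, x0, y0, dx, dy, cells, cell_of=square_cell, agg="mean"):
    """Aggregate a 2D band into the requested cells using pixel-centre coordinates."""
    rows, cols = np.indices(band.shape)
    lon = x0 + (cols + 0.5) * dx  # pixel-centre longitude
    lat = y0 + (rows + 0.5) * dy  # pixel-centre latitude (dy < 0 for north-up rasters)
    pixels = pd.DataFrame({"cell": cell_of(lat.ravel(), lon.ravel()), "value": band.ravel()})
    # a cell gets every pixel whose centre falls inside it, so pixel counts can differ
    per_cell = pixels.groupby("cell")["value"].agg(agg)
    # keep only the requested cells, in order; cells containing no pixel centre get NaN
    return per_cell.reindex(cells)
```

If the raster is coarser than the cells, many cells contain no pixel centre and come back as NaN. This illustrates the resolution caveat above.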

stc_unicef_cpi.data.process_geotiff.clip_tif_to_ctry(file_path: Union[PathLike, str], ctry_name: str, save_dir: Optional[Union[PathLike, str]] = None) → None

Clip a GeoTIFF to a specified country boundary, and write a new file to the specified directory if given, else just plot the clipped tiff. File name is prepended with the country name.

Parameters

file_path (Union[PathLike,str]) – Path to file to clip
ctry_name (str) – Name of country to clip to
save_dir (Optional[Union[PathLike,str]], optional) – Path to directory to save to, defaults to None (just plot)

stc_unicef_cpi.data.process_geotiff.convert_tiffs_to_image_dataset(tiff_dir: Union[str, PathLike], hex_codes: Union[List[int], ndarray[Any, dtype[Union[int32, int64]]]], dim_x: int = 256, dim_y: int = 256) → ndarray[Any, dtype[ScalarType]]

Convert a set of GeoTIFFs to a 4D numpy array for the specified hexagons. Expects the path to a directory containing all relevant GeoTIFFs with extension ‘.tif’, and a list of h3 hexagon identifiers in numpy_int form (use import h3.api.numpy_int as h3).

Returned array is in form (hex_id, band, i, j), with i, j through the band image array defaulting to size 256 x 256, as specified by dim_x, dim_y.

Parameters

tiff_dir (Union[str, PathLike]) – Path to GeoTIFF directory, with file extensions ‘.tif’
hex_codes (Union[List[int], npt.NDArray[Union[np.int32,np.int64]]]) – Set of H3 hex codes for which you wish to extract images
dim_x (int, optional) – Pixel width of extracted images, defaults to 256
dim_y (int, optional) – Pixel height of extracted images, defaults to 256

Returns

Array of images at hex coords, in shape (hex_id, band, i, j)

Return type

npt.NDArray

stc_unicef_cpi.data.process_geotiff.extract_image_at_coords(dataset: Union[Dataset, DataArray, List[Dataset], DatasetReader], lat: float, long: float, dim_x: int = 256, dim_y: int = 256, verbose: bool = False) → ndarray[Any, dtype[ScalarType]]

Extract an array of the specified dimensions (in pixels) about the specified lat/long, centered on it by default

Parameters

dataset (Union[Dataset, DataArray, List[Dataset], rasterio.io.DatasetReader]) – rioxarray or rasterio dataset (open tiff file)
lat (float) – Latitude of center point about which to extract image
long (float) – Longitude of center point about which to extract image
dim_x (int, optional) – x dimension (pixel width) of extracted image, defaults to 256
dim_y (int, optional) – y dimension (pixel height) of extracted image, defaults to 256
verbose (bool, optional) – Verbose, defaults to False

Returns

Array of tiff values (‘image’) at specified coordinates, of given size

Return type

npt.NDArray
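Centred extraction can be sketched by converting the coordinate to a pixel row/column and slicing a window around it. `extract_window` is hypothetical. It assumes a north-up geotransform (x0, y0, dx, dy with dy < 0) and pads with NaN where the window runs off the raster. The package may handle edges differently.

```python
import numpy as np


def extract_window(band, x0, y0, dx, dy, lat, lon, dim_x=256, dim_y=256):
    """Return a (dim_y, dim_x) window centred on (lat, lon), NaN-padded at the edges."""
    row = int(np.floor((lat - y0) / dy))  # pixel row containing the point
    col = int(np.floor((lon - x0) / dx))  # pixel column containing the point
    top, left = row - dim_y // 2, col - dim_x // 2
    out = np.full((dim_y, dim_x), np.nan)
    # intersect the requested window with the raster bounds
    r0, r1 = max(top, 0), min(top + dim_y, band.shape[0])
    c0, c1 = max(left, 0), min(left + dim_x, band.shape[1])
    if r0 < r1 and c0 < c1:
        out[r0 - top:r1 - top, c0 - left:c1 - left] = band[r0:r1, c0:c1]
    return out
```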

stc_unicef_cpi.data.process_geotiff.extract_ims_from_hex_codes(datasets: Union[List[str], List[PathLike]], hex_codes: Union[List[int], ndarray[Any, dtype[Union[int32, int64]]]], width: int = 256, height: int = 256, verbose: bool = False) → ndarray[Any, dtype[ScalarType]]

For a set of datasets, specified by file path, and a set of h3 hex codes, extract centered images of specified size and return a 4D array in shape (image_idx,band,i,j).

Parameters

datasets (Union[List[str], List[PathLike]]) – List of paths to tiff files for which you want to extract (and stack) ‘image’ bands
hex_codes (Union[List[int], npt.NDArray[Union[np.int32,np.int64]]]) – Set of H3 hex codes in numpy_int format for which you wish to extract images
width (int, optional) – Width of extracted images in pixels, defaults to 256
height (int, optional) – Height of extracted images in pixels, defaults to 256
verbose (bool, optional) – Verbose, defaults to False

Returns

Extracted images in shape (image_idx,band,i,j)

Return type

npt.NDArray
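The (image_idx, band, i, j) layout can be illustrated by concatenating each hex's bands across files and then stacking over hexes. `stack_hex_images` is a hypothetical sketch. It assumes every extracted image already has the same height and width.

```python
import numpy as np


def stack_hex_images(per_file_images):
    """Stack per-file, per-hex images into shape (hex_idx, band, i, j).

    per_file_images[f][h] is a (bands_f, i, j) array for file f and hex h.
    """
    n_hex = len(per_file_images[0])
    return np.stack(
        # for each hex, put the bands from every file one after another
        [np.concatenate([imgs[h] for imgs in per_file_images], axis=0) for h in range(n_hex)]
    )
```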

stc_unicef_cpi.data.process_geotiff.geotiff_to_df(geotiff_filepath: Union[str, PathLike], spec_band_names: Optional[List[str]] = None, max_bands: int = 5, rm_prefix: Union[str, Pattern[str]] = '', verbose: bool = False) → DataFrame

Convert a geotiff file to a pandas dataframe, and print some additional info.

Parameters

geotiff_filepath (Union[str, PathLike]) – path to a geotiff file
spec_band_names (Optional[List[str]], optional) – Specified band names - only used if these are not specified in the GeoTIFF itself, at which point they are mandatory, defaults to None
max_bands (int, optional) – Maximum number of bands allowed before rast_to_agg_df must be used instead, defaults to 5
rm_prefix (Union[str, Pattern[str]], optional) – Prefix (or regex pattern) to replace in file name, defaults to ‘’ (empty string)
verbose (bool, optional) – verbose output, defaults to False

Raises

ValueError – No band names provided, and none found in the GeoTIFF either
ValueError – Band names provided (none found in the GeoTIFF), but their number does not match the number of bands
ValueError – Too many bands to handle without excessive memory - use rast_to_agg_df instead
ValueError – Problem with index resulting from conversion to df
ValueError – Problem converting bands

Returns

pandas dataframe of lat, long, val for each band

Return type

pd.DataFrame

stc_unicef_cpi.data.process_geotiff.print_tif_metadata(rioxarray_rio_obj: Union[Dataset, DataArray, List[Dataset]], name: Optional[str] = None) → None

View metadata associated with a raster file, loaded using rioxarray

Parameters

rioxarray_rio_obj (Union[Dataset, DataArray, List[Dataset]]) – rioxarray dataset object
name (Optional[str], optional) – Name of tiff data, defaults to None

stc_unicef_cpi.data.process_geotiff.rast_to_agg_df(tiff_file: Union[str, Path, bytes], agg_fn: Callable[[NDArray], NDArray] = np.mean, resolution: int = 7, max_bands: int = 3, verbose: bool = False) → DataFrame

Likely slower than using rioxarray functions, but handles groups of bands at a time rather than all at once, which is very memory-expensive. Only to be used for tiffs with many bands.

Parameters

tiff_file (Union[str, PathLike]) – Path to (many banded, large) tiff file
agg_fn (Callable[[npt.NDArray], npt.NDArray], optional) – Aggregation function, defaults to np.mean
resolution (int, optional) – Resolution of H3 grid to aggregate to, defaults to 7
max_bands (int, optional) – Max number of bands to process at one time, defaults to 3
verbose (bool, optional) – Verbose, defaults to False

Returns

Dataframe of aggregated data

Return type

pd.DataFrame
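Band-chunked processing can be sketched as below. Only max_bands bands are handled per pass, and per-cell means are accumulated band by band. `chunked_band_means` is hypothetical. For simplicity it takes an in-memory array; in practice the memory benefit comes from reading each chunk from disk. It computes only means.

```python
import numpy as np


def chunked_band_means(raster, cell_ids, max_bands=3):
    """Per-cell mean of each band, processing at most max_bands bands at a time.

    raster: (bands, rows, cols) array; cell_ids: (rows, cols) integer cell id per pixel.
    """
    cells, inverse = np.unique(cell_ids.ravel(), return_inverse=True)
    counts = np.bincount(inverse)  # pixels per cell, shared by all bands
    result = np.empty((len(cells), raster.shape[0]))
    for start in range(0, raster.shape[0], max_bands):
        chunk = raster[start:start + max_bands]  # in practice: read only these bands
        for offset, band in enumerate(chunk):
            sums = np.bincount(inverse, weights=band.ravel())
            result[:, start + offset] = sums / counts
    return cells, result
```

The chunk size changes only peak memory, not the result.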

stc_unicef_cpi.data.process_geotiff.resample_tif(tif_file_path: Union[str, PathLike], dest_dir: Union[str, PathLike], rescale_factor: Optional[float] = 2.0) → None

Resample a tiff file by a given factor using bilinear resampling. A factor greater than 1 increases resolution; a factor less than 1 decreases it.

Parameters

tif_file_path (Union[str, PathLike]) – Path to tiff file to resample
dest_dir (Union[str, PathLike]) – Destination directory for resampled tiff file to be written to
rescale_factor (Optional[float], optional) – Rescale factor, defaults to 2.0
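Bilinear resampling by a factor can be sketched in NumPy. `bilinear_resample` is hypothetical and handles one 2D band. It maps each output pixel centre back into input coordinates and blends the four nearest input pixels. For a georeferenced file, the pixel size would also be scaled by width/new_width (and height/new_height); that step is not shown.

```python
import numpy as np


def bilinear_resample(band, factor=2.0):
    """Resample a 2D array by `factor` (>1 upsamples, <1 downsamples), bilinearly."""
    h, w = band.shape
    new_h = max(1, int(round(h * factor)))
    new_w = max(1, int(round(w * factor)))
    # centre of each output pixel, expressed in input pixel coordinates
    ys = np.clip((np.arange(new_h) + 0.5) * (h / new_h) - 0.5, 0, h - 1)
    xs = np.clip((np.arange(new_w) + 0.5) * (w / new_w) - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = band[np.ix_(y0, x0)] * (1 - wx) + band[np.ix_(y0, x1)] * wx
    bottom = band[np.ix_(y1, x0)] * (1 - wx) + band[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy
```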

stc_unicef_cpi.data.process_geotiff.rxr_reproject_tiff_to_target(src_tiff_file: Union[str, PathLike], target_tiff_file: Union[str, PathLike], dest_path: Optional[Union[PathLike, str]] = None, verbose: bool = False) → Optional[Union[Dataset, DataArray, List[Dataset]]]

Use rioxarray and an example (target) tiff to reproject the given (source) tiff to the same CRS and resolution.

Parameters

src_tiff_file (Union[str, PathLike]) – Path to tiff file you want to reproject
target_tiff_file (Union[str, PathLike]) – Path to tiff file that is example of desired projection and resolution
dest_path (Optional[Union[str, PathLike]], optional) – Path to write reprojected tiff to, defaults to None (just return reprojected raster)
verbose (bool, optional) – Verbosity, defaults to False

Returns

Either None (if dest_path is not None) or reprojected raster

Return type

Union[Dataset, DataArray, List[Dataset], None]

stc_unicef_cpi.data.process_netcdf module

stc_unicef_cpi.data.process_netcdf.netcdf_to_clipped_array(file_path: Union[str, PathLike], *, ctry_name: str = 'Nigeria', save_dir: Optional[Union[PathLike, str]] = None, plot: bool = False) → Union[None, ndarray[Any, dtype[ScalarType]]]

Read a netCDF file and either return an array clipped to the specified country or, if save_dir is given, save a GeoTIFF clipped to this country in that directory under the same name as before

Parameters

file_path (Union[str, PathLike]) – Path to netCDF file to reproject and clip
ctry_name (str, optional) – Country to clip to, defaults to “Nigeria”
save_dir (Optional[Union[str, PathLike]], optional) – Directory to save to, defaults to None (just return clipped array)
plot (bool, optional) – Visualise clipped array, defaults to False

Returns

Either None if save_dir is not None, or clipped array

Return type

Union[None, npt.NDArray]

stc_unicef_cpi.data.process_to_torch module

class stc_unicef_cpi.data.process_to_torch.HexDataset(tiff_dir, hex_codes, labels, width=33, height=33, transform=None, target_transform=None)

Bases: Dataset

Make a torch dataset that constructs images from tiff files according to hex codes


stc_unicef_cpi.data.process_to_torch.make_torch_dataloader_from_numpy(images, labels, bs=64, shuffle=False)

Take a NumPy image dataset and its labels, and convert them into a dataloader suitable for training torch models

Parameters

images (np.ndarray) – image dataset
labels – corresponding labels
bs (int, optional) – batch size, defaults to 64
shuffle (bool, optional) – whether to shuffle the data, defaults to False

Returns

dataloader for training torch models

Return type

torch.utils.data.DataLoader

stc_unicef_cpi.data.stream_data module

Data Streaming From External Sources

class stc_unicef_cpi.data.stream_data.FacebookMarketingStreamer(country, force, read_path, res, logging)

Bases: StreamerObject

Stream data from Facebook Marketing API

implement()

class stc_unicef_cpi.data.stream_data.StreamerObject(country, force, read_path)

Bases: object