loci package

Submodules

loci.analytics module

loci.analytics.bbox(gdf)

Computes the bounding box of a GeoDataFrame.

Parameters

gdf (GeoDataFrame) – A GeoDataFrame.

Returns

A Polygon representing the bounding box enclosing all geometries in the GeoDataFrame.
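As a rough illustration of what a bounding box is, the following plain-Python sketch computes one for a list of (x, y) points. It does not use loci or GeoPandas, and the helper name bbox_of_points is hypothetical.

```python
def bbox_of_points(points):
    """Return the bounding box of (x, y) points as a closed ring of corners."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    # Corners in counter-clockwise order; the first corner is repeated to close the ring.
    return [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y), (min_x, min_y)]


ring = bbox_of_points([(23.70, 37.90), (23.80, 38.00), (23.75, 37.95)])
```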

loci.analytics.filter_by_kwd(df, kwd_filter, col_kwds='kwds')

Returns a DataFrame with only those rows that contain the specified keyword.

Parameters
  • df (DataFrame) – The initial DataFrame to be filtered.

  • kwd_filter (string) – The keyword to use for filtering.

  • col_kwds (string) – Name of the column containing the keywords (default: kwds).

Returns

A DataFrame with only those rows that contain kwd_filter.
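The filtering rule can be sketched in plain Python over a list of dictionaries. This is not loci's implementation; it assumes each row stores its keywords as a list of strings, and the helper name filter_rows_by_kwd is hypothetical.

```python
def filter_rows_by_kwd(rows, kwd_filter, col_kwds="kwds"):
    """Keep only the rows whose keyword list contains kwd_filter."""
    return [row for row in rows if kwd_filter in row[col_kwds]]


rows = [
    {"name": "Cafe Luna", "kwds": ["cafe", "coffee"]},
    {"name": "Book Nook", "kwds": ["shop", "books"]},
    {"name": "Bean There", "kwds": ["cafe"]},
]
cafes = filter_rows_by_kwd(rows, "cafe")
```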

loci.analytics.freq_locationsets(location_visits, location_id_col, locations, locationset_id_col, min_sup, min_length)

Computes frequently visited sets of locations based on frequent itemset mining.

Parameters
  • location_visits (DataFrame) – A DataFrame with location ids and locationset ids.

  • location_id_col (string) – The name of the column containing the location ids.

  • locations (GeoDataFrame) – A GeoDataFrame containing the geometries of the locations.

  • locationset_id_col (string) – The name of the column containing the locationset ids.

  • min_sup (float) – The minimum support threshold.

  • min_length (int) – Minimum length of itemsets to be returned.

Returns

A GeoDataFrame with the support, length and geometry of the computed location sets.
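To make the roles of min_sup and min_length concrete, here is a brute-force sketch of frequent itemset mining in plain Python. It is not the algorithm loci uses. It assumes that support is the fraction of location sets containing an itemset, and the helper name frequent_location_sets is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def frequent_location_sets(visits, min_sup, min_length):
    """visits: iterable of (locationset_id, location_id) pairs."""
    # Group the visited locations by location set (one "transaction" per set).
    groups = defaultdict(set)
    for set_id, loc_id in visits:
        groups[set_id].add(loc_id)
    transactions = list(groups.values())

    # Count every combination of at least min_length locations (exponential; demo only).
    counts = defaultdict(int)
    for locations in transactions:
        for size in range(min_length, len(locations) + 1):
            for itemset in combinations(sorted(locations), size):
                counts[itemset] += 1

    total = len(transactions)
    return {s: c / total for s, c in counts.items() if c / total >= min_sup}


visits = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "B"),
]
frequent = frequent_location_sets(visits, min_sup=0.5, min_length=2)
```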

loci.analytics.kwds_freq(gdf, col_kwds='kwds', normalized=False)

Computes the frequency of keywords in the provided GeoDataFrame.

Parameters
  • gdf (GeoDataFrame) – A GeoDataFrame with a keywords column.

  • col_kwds (string) – The column containing the list of keywords (default: kwds).

  • normalized (bool) – If True, the returned frequencies are normalized to [0, 1] by dividing by the number of rows in gdf (default: False).

Returns

A dictionary mapping each keyword to the number of rows it appears in (or to the fraction of rows, if normalized is True).
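The counting and normalization described above can be sketched with the standard library. This is an illustration only, not loci's code; it assumes each row holds a list of keywords and that a keyword is counted once per row. The helper name keyword_frequencies is hypothetical.

```python
from collections import Counter


def keyword_frequencies(rows_kwds, normalized=False):
    """Count in how many rows each keyword appears."""
    counts = Counter()
    for kwds in rows_kwds:
        counts.update(set(kwds))  # set(): a keyword counts once per row
    if normalized:
        total_rows = len(rows_kwds)
        return {kwd: n / total_rows for kwd, n in counts.items()}
    return dict(counts)


rows_kwds = [["cafe", "bar"], ["cafe"], ["shop", "cafe", "cafe"], ["bar"]]
raw = keyword_frequencies(rows_kwds)
norm = keyword_frequencies(rows_kwds, normalized=True)
```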

loci.clustering module

loci.clustering.cluster_shapes(pois, shape_type=1, eps_per_cluster=None)

Computes cluster shapes.

Parameters
  • pois (GeoDataFrame) – The clustered POIs.

  • shape_type (integer) – The method to use for computing cluster shapes (allowed values: 1-3).

  • eps_per_cluster (DataFrame) – The value of parameter eps used for each cluster (required by methods 2 and 3).

Returns

A GeoDataFrame containing the cluster shapes.

loci.clustering.compute_clusters(pois, alg='dbscan', min_pts=None, eps=None, n_jobs=1)

Computes clusters using the DBSCAN or the HDBSCAN algorithm.

Parameters
  • pois (GeoDataFrame) – A POI GeoDataFrame.

  • alg (string) – The clustering algorithm to use (dbscan or hdbscan; default: dbscan).

  • min_pts (integer) – The minimum number of neighbors for a dense point.

  • eps (float) – The neighborhood radius.

  • n_jobs (integer) – Number of parallel jobs to run in the algorithm (default: 1).

Returns

A GeoDataFrame containing the clustered POIs and their labels. The value of parameter eps for each cluster is also returned (which varies in the case of HDBSCAN).
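To show how eps and min_pts interact, here is a minimal textbook DBSCAN in plain Python. It is a sketch for intuition, not the implementation loci calls. It assumes Euclidean distance and that a point counts as its own neighbor when compared against min_pts; the function name dbscan and the NOISE constant are chosen for this example.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks unclustered points."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is dense, so keep expanding from it
                queue.extend(reachable)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The quadratic neighbor search keeps the sketch short; production implementations use a spatial index instead.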

loci.topics module

loci.topics.topic_modeling(clusters, label_col='cluster_id', kwds_col='kwds', num_of_topics=3, kwds_per_topic=10)

Models clusters as documents, extracts topics, and assigns topics to clusters.

Parameters
  • clusters (GeoDataFrame) – A POI GeoDataFrame with assigned cluster labels.

  • label_col (string) – The name of the column containing the cluster labels (default: cluster_id).

  • kwds_col (string) – The name of the column containing the keywords of each POI (default: kwds).

  • num_of_topics (int) – The number of topics to extract (default: 3).

  • kwds_per_topic (int) – The number of keywords to return per topic (default: 10).

Returns

A DataFrame containing the clusters-to-topics assignments and a DataFrame containing the topics-to-keywords assignments.
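The first step, modeling clusters as documents, can be pictured as pooling the keywords of all POIs that share a cluster label. The sketch below shows only that step in plain Python; topic extraction itself is not shown, and the helper name clusters_to_documents is hypothetical rather than part of loci.

```python
from collections import defaultdict


def clusters_to_documents(pois, label_col="cluster_id", kwds_col="kwds"):
    """Pool the keywords of all POIs in a cluster into one 'document' per cluster."""
    documents = defaultdict(list)
    for poi in pois:
        documents[poi[label_col]].extend(poi[kwds_col])
    return dict(documents)


pois = [
    {"cluster_id": 0, "kwds": ["cafe", "coffee"]},
    {"cluster_id": 0, "kwds": ["bar"]},
    {"cluster_id": 1, "kwds": ["museum", "art"]},
]
docs = clusters_to_documents(pois)
```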

loci.index module

loci.index.grid(pois, cell_width=None, cell_height=None, cell_size_ratio=0.01, znorm=False, neighborhood=False)

Constructs a uniform grid from the given POIs.

If cell_width and cell_height are provided, each grid cell has size cell_width * cell_height. Otherwise, cell_width = cell_size_ratio * area_width and cell_height = cell_size_ratio * area_height, where area refers to the bounding box of pois.

Each cell is assigned a score, which is the number of points within that cell.

If neighborhood is True, each cell is assigned an additional score (score_nb), which is the total number of points within that cell and its adjacent cells.

If znorm is True, the above scores are also provided in their z-normalized variants, score_znorm and score_nb_znorm.

The constructed grid is represented by a GeoDataFrame where each row corresponds to a grid cell and contains the following columns:

  • cell_id: The id of the cell (integer computed as cell_x * num_columns + cell_y).

  • cell_x: The row of the cell in the grid (integer).

  • cell_y: The column of the cell in the grid (integer).

  • score: See above.

  • score_nb: See above.

  • score_znorm: See above.

  • score_nb_znorm: See above.

  • contents: List of points in the cell.

  • geometry: Geometry column of the GeoDataFrame, containing the polygon that represents the cell boundaries.

Parameters
  • pois (GeoDataFrame) – A POI GeoDataFrame.

  • cell_width (float) – Cell width.

  • cell_height (float) – Cell height.

  • cell_size_ratio (float) – Ratio of cell width and height to area width and height (default: 0.01).

  • znorm (bool) – Whether to include z-normalized scores (default: False).

  • neighborhood (bool) – Whether to include a total score including adjacent cells (default: False).

Returns

A GeoDataFrame as described above.
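The scoring scheme described above can be sketched in plain Python. This is an illustration under stated assumptions, not loci's implementation: cell sizes are passed explicitly, cell_x is indexed along the x axis, adjacency means the eight surrounding cells, and z-normalization uses the population standard deviation over all cells. The helper name build_grid is hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def build_grid(points, cell_width, cell_height):
    """Return {cell_id: {...scores...}} for a uniform grid over the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    num_rows = int((max(xs) - min_x) // cell_width) + 1
    num_columns = int((max(ys) - min_y) // cell_height) + 1

    # score: number of points falling in each (cell_x, cell_y).
    score = Counter(
        (int((x - min_x) // cell_width), int((y - min_y) // cell_height))
        for x, y in points
    )

    cells = {}
    for cx in range(num_rows):
        for cy in range(num_columns):
            # score_nb: the cell plus its adjacent cells (missing cells count 0).
            score_nb = sum(
                score[(cx + dx, cy + dy)] for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            )
            cells[cx * num_columns + cy] = {
                "cell_x": cx, "cell_y": cy,
                "score": score[(cx, cy)], "score_nb": score_nb,
            }

    # score_znorm: (score - mean) / standard deviation, computed over all cells.
    all_scores = [c["score"] for c in cells.values()]
    mu, sigma = mean(all_scores), pstdev(all_scores)
    for c in cells.values():
        c["score_znorm"] = (c["score"] - mu) / sigma if sigma else 0.0
    return cells


cells = build_grid([(0, 0), (0.5, 0.5), (1.5, 0.5), (2.5, 2.5)], 1.0, 1.0)
```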

loci.io module

loci.io.import_osmnx(bound, target_crs='EPSG:4326')

Creates a POI GeoDataFrame from POIs retrieved by OSMNX (https://github.com/gboeing/osmnx).

Parameters
  • bound (polygon) – A polygon to be used as filter.

  • target_crs (string) – Coordinate Reference System of the GeoDataFrame to be created (default: EPSG:4326).

Returns

A POI GeoDataFrame with columns id, name and kwds.

loci.io.import_osmwrangle(osmwrangle_file, target_crs='EPSG:4326', bound=None)

Creates a POI GeoDataFrame from a file produced by OSMWrangle (https://github.com/SLIPO-EU/OSMWrangle).

Parameters
  • osmwrangle_file (string) – Path or URL to the input csv file.

  • target_crs (string) – Coordinate Reference System of the GeoDataFrame to be created (default: EPSG:4326).

  • bound (polygon) – A polygon to be used as filter.

Returns

A POI GeoDataFrame with columns id, name and kwds.

loci.io.read_poi_csv(input_file, col_id='id', col_name='name', col_lon='lon', col_lat='lat', col_kwds='kwds', col_sep=';', kwds_sep=', ', source_crs='EPSG:4326', target_crs='EPSG:4326', keep_other_cols=False)

Creates a POI GeoDataFrame from an input CSV file.

Parameters
  • input_file (string) – Path to the input csv file.

  • col_id (string) – Name of the column containing the POI id (default: id).

  • col_name (string) – Name of the column containing the POI name (default: name).

  • col_lon (string) – Name of the column containing the POI longitude (default: lon).

  • col_lat (string) – Name of the column containing the POI latitude (default: lat).

  • col_kwds (string) – Name of the column containing the POI keywords (default: kwds).

  • col_sep (string) – Column delimiter (default: ;).

  • kwds_sep (string) – Keywords delimiter (default: ,).

  • source_crs (string) – Coordinate Reference System of input data (default: EPSG:4326).

  • target_crs (string) – Coordinate Reference System of the GeoDataFrame to be created (default: EPSG:4326).

  • keep_other_cols (bool) – Whether to keep the rest of the columns in the csv file (default: False).

Returns

A POI GeoDataFrame with columns id, name and kwds.
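The expected input layout, with one delimiter between columns and another between keywords, can be illustrated with the standard csv module. This sketch is not loci's reader: it parses from a string, strips whitespace around keywords, ignores coordinate reference systems, and returns plain dictionaries. The helper name parse_poi_csv is hypothetical.

```python
import csv
import io


def parse_poi_csv(text, col_id="id", col_name="name", col_lon="lon",
                  col_lat="lat", col_kwds="kwds", col_sep=";", kwds_sep=","):
    """Parse delimited POI text into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter=col_sep)
    pois = []
    for row in reader:
        pois.append({
            "id": row[col_id],
            "name": row[col_name],
            "lon": float(row[col_lon]),
            "lat": float(row[col_lat]),
            # Split the keyword cell and drop surrounding whitespace and empties.
            "kwds": [k.strip() for k in row[col_kwds].split(kwds_sep) if k.strip()],
        })
    return pois


sample = (
    "id;name;lon;lat;kwds\n"
    "1;Cafe Luna;23.72;37.98;cafe, coffee\n"
    "2;Book Nook;23.73;37.97;shop,books\n"
)
pois = parse_poi_csv(sample)
```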

loci.io.retrieve_osm_loc(name, buffer_dist=0)

Retrieves a polygon from an OSM location.

Parameters
  • name (string) – Name of the location to be resolved.

  • buffer_dist (numeric) – Buffer distance in meters.

Returns

A polygon.

loci.io.to_geojson(gdf, output_file)

Exports a GeoDataFrame to a GeoJSON file.

Parameters
  • gdf (GeoDataFrame) – The GeoDataFrame object to be exported.

  • output_file (string) – Path to the output file.
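For readers unfamiliar with the output format, the sketch below builds a GeoJSON FeatureCollection of points with the standard json module. It only illustrates the structure of such a file; it is not how loci performs the export, and the helper name points_to_geojson is hypothetical.

```python
import json


def points_to_geojson(pois):
    """Serialize POI dictionaries (with lon/lat keys) as a GeoJSON string."""
    features = [
        {
            "type": "Feature",
            # GeoJSON coordinates are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [p["lon"], p["lat"]]},
            "properties": {k: v for k, v in p.items() if k not in ("lon", "lat")},
        }
        for p in pois
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


geojson_text = points_to_geojson(
    [{"id": "1", "name": "Cafe Luna", "lon": 23.72, "lat": 37.98, "kwds": ["cafe"]}]
)
```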

loci.plots module

loci.plots.barchart(data, orientation='Vertical', x_axis_label='', y_axis_label='', plot_title='', bar_width=0.5, plot_width=15, plot_height=5, top_k=10)

Plots a bar chart with the given data.

Parameters
  • data (dict) – The data to plot.

  • orientation (string) – The orientation of the bars in the plot (Vertical or Horizontal; default: Vertical).

  • x_axis_label (string) – Label of x axis.

  • y_axis_label (string) – Label of y axis.

  • plot_title (string) – Title of the plot.

  • bar_width (scalar) – The width of the bars (default: 0.5).

  • plot_width (scalar) – The width of the plot (default: 15).

  • plot_height (scalar) – The height of the plot (default: 5).

  • top_k (integer) – Top k results (if -1, show all; default: 10).

Returns

A Matplotlib plot displaying the bar chart.
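The effect of top_k can be sketched as selecting the largest entries of the input dictionary before plotting. The sketch assumes entries are ranked by value in descending order, which the description above does not state explicitly; the helper name top_k_items is hypothetical and no plotting is performed.

```python
def top_k_items(data, top_k=10):
    """Return (key, value) pairs ranked by value; top_k == -1 keeps everything."""
    ranked = sorted(data.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k == -1 else ranked[:top_k]


freqs = {"cafe": 3, "bar": 2, "shop": 1, "museum": 5}
top_two = top_k_items(freqs, top_k=2)
everything = top_k_items(freqs, top_k=-1)
```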

loci.plots.heatmap(pois, tiles='OpenStreetMap', width='100%', height='100%', radius=10)

Generates a heatmap of the input POIs.

Parameters
  • pois (GeoDataFrame) – A POIs GeoDataFrame.

  • tiles (string) – The tiles to use for the map (default: OpenStreetMap).

  • width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).

  • height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).

  • radius (float) – Radius of each point of the heatmap (default: 10).

Returns

A Folium Map object displaying the heatmap generated from the POIs.

loci.plots.map_choropleth(areas, id_field, value_field, fill_color='YlOrRd', fill_opacity=0.6, num_bins=5, tiles='OpenStreetMap', width='100%', height='100%')

Returns a Folium Map showing the given areas as a choropleth. Map center and zoom level are set automatically.

Parameters
  • areas (GeoDataFrame) – A GeoDataFrame containing the areas to be displayed.

  • id_field (string) – The name of the column to use as id.

  • value_field (string) – The name of the column indicating the area’s value.

  • fill_color (string) – A string indicating a Matplotlib colormap (default: YlOrRd).

  • fill_opacity (float) – Opacity level (default: 0.6).

  • num_bins (int) – The number of bins for the threshold scale (default: 5).

  • tiles (string) – The tiles to use for the map (default: OpenStreetMap).

  • width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).

  • height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).

Returns

A Folium Map object displaying the given areas.

loci.plots.map_cluster_contents_osm(cluster_borders, tiles='OpenStreetMap', width='100%', height='100%')

Constructs a Folium Map displaying the streets and buildings, retrieved from OpenStreetMap via OSMNX, within a given area of interest (AOI).

Parameters
  • cluster_borders (GeoDataFrame) – The cluster polygons.

  • tiles (string) – The tiles to use for the map (default: OpenStreetMap).

  • width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).

  • height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).

Returns

A Folium Map object displaying the retrieved entities.

loci.plots.map_cluster_diff(clusters_a, clusters_b, intersection_color='#00ff00', diff_ab_color='#0000ff', diff_ba_color='#ff0000', tiles='OpenStreetMap', width='100%', height='100%')

Returns a Folium Map displaying the differences between two sets of clusters. Map center and zoom level are set automatically.

Parameters
  • clusters_a (GeoDataFrame) – The first set of clusters.

  • clusters_b (GeoDataFrame) – The second set of clusters.

  • intersection_color (color code) – The color to use for A & B (default: #00ff00).

  • diff_ab_color (color code) – The color to use for A - B (default: #0000ff).

  • diff_ba_color (color code) – The color to use for B - A (default: #ff0000).

  • tiles (string) – The tiles to use for the map (default: OpenStreetMap).

  • width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).

  • height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).

Returns

A Folium Map object displaying cluster intersections and differences.

loci.plots.map_clusters_with_topics(clusters_topics, viz_type='dominant', col_id='cluster_id', col_dominant='Dominant Topic', colormap='tab10', red='Topic0', green='Topic1', blue='Topic2', single_topic='Topic0', tiles='OpenStreetMap', width='100%', height='100%')

Returns a Folium Map showing the clusters colored based on their topics.

Parameters
  • clusters_topics (GeoDataFrame) – A GeoDataFrame containing the clusters to be displayed and their topics.

  • viz_type (string) – Indicates how to assign colors based on topics. One of: dominant, single, rgb (default: dominant).

  • col_id (string) – The name of the column indicating the cluster id (default: cluster_id).

  • col_dominant (string) – The name of the column indicating the dominant topic (default: Dominant Topic).

  • colormap (string) – A string indicating a Matplotlib colormap (default: tab10).

  • red (string) – The name of the column indicating the topic to assign to red (default: Topic0).

  • green (string) – The name of the column indicating the topic to assign to green (default: Topic1).

  • blue (string) – The name of the column indicating the topic to assign to blue (default: Topic2).

  • single_topic (string) – The name of the column indicating the topic to use (default: Topic0).

  • tiles (string) – The tiles to use for the map (default: OpenStreetMap).

  • width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).

  • height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).

Returns

A Folium Map object displaying the given clusters colored by their topics.
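One plausible reading of the rgb option is that three topic weights drive the red, green and blue channels of a cluster's fill color. The sketch below illustrates that idea, along with picking a dominant topic. It assumes weights lie in [0, 1] and is not taken from loci's source; the helper names topic_color and dominant_topic are hypothetical.

```python
def topic_color(weights, red="Topic0", green="Topic1", blue="Topic2"):
    """Map three topic weights in [0, 1] to a hex color string."""
    channels = [
        round(255 * max(0.0, min(1.0, weights[col])))  # clamp, then scale to 0-255
        for col in (red, green, blue)
    ]
    return "#{:02x}{:02x}{:02x}".format(*channels)


def dominant_topic(weights):
    """Return the name of the topic with the largest weight."""
    return max(weights, key=weights.get)


weights = {"Topic0": 1.0, "Topic1": 0.0, "Topic2": 0.2}
color = topic_color(weights)
dominant = dominant_topic(weights)
```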

loci.plots.map_geometries(gdf, tiles='OpenStreetMap', width='100%', height='100%')

Returns a Folium Map displaying the provided geometries. Map center and zoom level are set automatically.

Parameters
  • gdf (GeoDataFrame) – A GeoDataFrame containing the geometries to be displayed.

  • tiles (string) – The tiles to use for the map (default: OpenStreetMap).

  • width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).

  • height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).

Returns

A Folium Map object displaying the given geometries.

loci.plots.map_geometry(geom, tiles='OpenStreetMap', width='100%', height='100%')

Returns a Folium Map displaying the provided geometry. Map center and zoom level are set automatically.

Parameters
  • geom (Shapely Geometry) – A geometry to be displayed.

  • tiles (string) – The tiles to use for the map (default: OpenStreetMap).

  • width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).

  • height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).

Returns

A Folium Map object displaying the given geometry.

loci.plots.map_points(pois, tiles='OpenStreetMap', width='100%', height='100%', show_bbox=False)

Returns a Folium Map displaying the provided points. Map center and zoom level are set automatically.

Parameters
  • pois (GeoDataFrame) – A GeoDataFrame containing the POIs to be displayed.

  • tiles (string) – The tiles to use for the map (default: OpenStreetMap).

  • width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).

  • height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).

  • show_bbox (bool) – Whether to show the bounding box of the GeoDataFrame (default: False).

Returns

A Folium Map object displaying the given POIs.

loci.plots.plot_wordcloud(pois, bg_color='black', width=400, height=200)

Generates and plots a word cloud from the keywords of the given POIs.

Parameters
  • pois (GeoDataFrame) – The POIs from which the keywords will be used to generate the word cloud.

  • bg_color (string) – The background color to use for the plot (default: black).

  • width (int) – The width of the plot (default: 400).

  • height (int) – The height of the plot (default: 200).
