loci package
Submodules
loci.analytics module
-
loci.analytics.bbox(gdf)
Computes the bounding box of a GeoDataFrame.
- Parameters
gdf (GeoDataFrame) – A GeoDataFrame.
- Returns
A Polygon representing the bounding box enclosing all geometries in the GeoDataFrame.
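The sketch below shows the same computation in pure Python. It assumes the geometries have been reduced to (x, y) points and is not loci's implementation. With GeoPandas and Shapely, the equivalent (min_x, min_y, max_x, max_y) tuple is available as gdf.total_bounds, and shapely.geometry.box(*gdf.total_bounds) builds the corresponding Polygon.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def bbox_of_points(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a non-empty collection of points."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot compute the bounding box of an empty collection")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_ring(bounds: Tuple[float, float, float, float]) -> List[Point]:
    """Closed ring of the bounding rectangle, counter-clockwise from the lower-left corner."""
    min_x, min_y, max_x, max_y = bounds
    return [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y), (min_x, min_y)]


print(bbox_of_points([(1, 2), (3, -1), (0, 5)]))  # (0, -1, 3, 5)
```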
-
loci.analytics.filter_by_kwd(df, kwd_filter, col_kwds='kwds')
Returns a DataFrame with only those rows that contain the specified keyword.
- Parameters
df (DataFrame) – The initial DataFrame to be filtered.
kwd_filter (string) – The keyword to use for filtering.
col_kwds (string) – Name of the column containing the keywords (default: kwds).
- Returns
A DataFrame (a GeoDataFrame if df is one) with only those rows whose keywords include kwd_filter.
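The filtering rule can be sketched in plain Python. The sketch assumes that each row stores its keywords as a list and that matching is exact, so "cafe" does not match "cafeteria". The list-of-dicts rows stand in for a DataFrame and are purely illustrative. In pandas, the same mask could be written as df[df[col_kwds].apply(lambda kwds: kwd_filter in kwds)].

```python
from typing import Any, Dict, List


def filter_rows_by_kwd(rows: List[Dict[str, Any]], kwd_filter: str,
                       col_kwds: str = "kwds") -> List[Dict[str, Any]]:
    """Keep only the rows whose keyword list contains kwd_filter (exact match)."""
    return [row for row in rows if kwd_filter in row[col_kwds]]


rows = [
    {"id": 1, "kwds": ["cafe", "wifi"]},
    {"id": 2, "kwds": ["cafeteria"]},
    {"id": 3, "kwds": ["cafe"]},
]
print([r["id"] for r in filter_rows_by_kwd(rows, "cafe")])  # [1, 3]
```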
-
loci.analytics.freq_locationsets(location_visits, location_id_col, locations, locationset_id_col, min_sup, min_length)
Computes frequently visited sets of locations based on frequent itemset mining.
- Parameters
location_visits (DataFrame) – A DataFrame with location ids and locationset ids.
location_id_col (string) – The name of the column containing the location ids.
locations (GeoDataFrame) – A GeoDataFrame containing the geometries of the locations.
locationset_id_col (string) – The name of the column containing the locationset ids.
min_sup (float) – The minimum support threshold.
min_length (int) – The minimum length of the itemsets to be returned.
- Returns
A GeoDataFrame with the support, length and geometry of the computed location sets.
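Frequent itemset mining treats each locationset as a transaction whose items are the location ids it contains. The support of a set of locations is the fraction of transactions that include all of them. The sketch below implements a naive version of the Apriori algorithm in pure Python under that reading. It is illustrative rather than loci's code: it takes (locationset_id, location_id) pairs instead of a DataFrame and omits the join with the geometries in locations.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple


def group_visits(visits: Iterable[Tuple[Hashable, Hashable]]) -> List[FrozenSet]:
    """Turn (locationset_id, location_id) pairs into one transaction per locationset."""
    baskets = defaultdict(set)
    for set_id, loc_id in visits:
        baskets[set_id].add(loc_id)
    return [frozenset(items) for items in baskets.values()]


def apriori(transactions: List[FrozenSet], min_sup: float,
            min_length: int = 1) -> Dict[FrozenSet, float]:
    """Return {itemset: support} for itemsets with support >= min_sup and size >= min_length."""
    if not transactions:
        return {}
    n = len(transactions)

    def support(itemset: FrozenSet) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {s for s in (frozenset([i]) for i in items) if support(s) >= min_sup}
    frequent: Dict[FrozenSet, float] = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Candidates of size k: unions of two frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a frequent itemset is itself frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_sup}
    return {s: sup for s, sup in frequent.items() if len(s) >= min_length}


visits = [("t1", "A"), ("t1", "B"), ("t2", "A"), ("t2", "B"), ("t2", "C"),
          ("t3", "A"), ("t3", "C")]
result = apriori(group_visits(visits), min_sup=0.6, min_length=2)
# {A, B} and {A, C} each appear in 2 of 3 locationsets; {B, C} (1 of 3) is dropped
```

Recomputing support by scanning every transaction keeps the sketch short but costs a full pass per candidate. Dedicated miners (for example FP-growth) avoid this.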
-
loci.analytics.kwds_freq(gdf, col_kwds='kwds', normalized=False)
Computes the frequency of keywords in the provided GeoDataFrame.
- Parameters
gdf (GeoDataFrame) – A GeoDataFrame with a keywords column.
col_kwds (string) – The column containing the list of keywords (default: kwds).
normalized (bool) – If True, the returned frequencies are normalized to [0, 1] by dividing by the number of rows in gdf (default: False).
- Returns
A dictionary mapping each keyword to the number of rows it appears in (divided by the number of rows in gdf if normalized is True).
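The sketch below shows the counting in pure Python. It assumes each row holds a list of keywords. It also assumes that a keyword repeated within one row is counted once for that row, which is how "the number of rows it appears in" is read here.

```python
from collections import Counter
from typing import Dict, List


def kwds_freq_sketch(kwd_lists: List[List[str]], normalized: bool = False) -> Dict[str, float]:
    """Map each keyword to the number (or, if normalized, fraction) of rows containing it."""
    # set() ensures a keyword repeated within a row is counted once for that row
    counts = Counter(k for kwds in kwd_lists for k in set(kwds))
    if normalized and kwd_lists:
        n = len(kwd_lists)
        return {k: c / n for k, c in counts.items()}
    return dict(counts)


rows = [["cafe", "wifi"], ["cafe"], ["bar", "cafe", "cafe"]]
print(kwds_freq_sketch(rows)["cafe"])                   # 3
print(kwds_freq_sketch(rows, normalized=True)["cafe"])  # 1.0
```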
loci.clustering module
-
loci.clustering.cluster_shapes(pois, shape_type=1, eps_per_cluster=None)
Computes cluster shapes.
- Parameters
pois (GeoDataFrame) – The clustered POIs.
shape_type (integer) – The method to use for computing cluster shapes (allowed values: 1-3; default: 1).
eps_per_cluster (DataFrame) – The value of parameter eps used for each cluster (required by methods 2 and 3).
- Returns
A GeoDataFrame containing the cluster shapes.
-
loci.clustering.compute_clusters(pois, alg='dbscan', min_pts=None, eps=None, n_jobs=1)
Computes clusters using the DBSCAN or the HDBSCAN algorithm.
- Parameters
pois (GeoDataFrame) – A POI GeoDataFrame.
alg (string) – The clustering algorithm to use (dbscan or hdbscan; default: dbscan).
min_pts (integer) – The minimum number of neighbors for a dense point.
eps (float) – The neighborhood radius.
n_jobs (integer) – Number of parallel jobs to run in the algorithm (default: 1).
- Returns
A GeoDataFrame containing the clustered POIs and their cluster labels, together with the value of eps used for each cluster (which varies per cluster in the case of HDBSCAN).
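The compact pure-Python DBSCAN below, over 2-D points with Euclidean distance, illustrates what min_pts and eps control. It is a teaching sketch, not loci's code, and it does not cover HDBSCAN. Like scikit-learn's min_samples, it assumes a point counts itself among its neighbors. In practice a library implementation such as sklearn.cluster.DBSCAN (parameters eps, min_samples, n_jobs) is the usual choice.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
UNVISITED, NOISE = -2, -1


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""

    def neighbors(i: int) -> List[int]:
        # Includes i itself (distance 0), matching scikit-learn's min_samples convention
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is also a core point: grow the cluster through it
                queue.extend(nb)
        cluster += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Here eps is in the same units as the coordinates, so for POIs in a geographic CRS it would be in degrees unless the data is reprojected first.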
loci.topics module
-
loci.topics.topic_modeling(clusters, label_col='cluster_id', kwds_col='kwds', num_of_topics=3, kwds_per_topic=10)
Models clusters as documents, extracts topics, and assigns topics to clusters.
- Parameters
clusters (GeoDataFrame) – A POI GeoDataFrame with assigned cluster labels.
label_col (string) – The name of the column containing the cluster labels (default: cluster_id).
kwds_col (string) – The name of the column containing the keywords of each POI (default: kwds).
num_of_topics (int) – The number of topics to extract (default: 3).
kwds_per_topic (int) – The number of keywords to return per topic (default: 10).
- Returns
A DataFrame containing the clusters-to-topics assignments and a DataFrame containing the topics-to-keywords assignments.
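The first step of the description, modeling each cluster as a document, can be sketched as concatenating the keywords of the cluster's POIs. The sketch stops there. It does not show the topic model itself (for example LDA) or how loci computes topic weights. The dominant_topic helper is hypothetical: it only illustrates picking the highest-weighted topic from an assumed per-cluster weight table.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def clusters_as_documents(pois: Iterable[Tuple[Hashable, List[str]]]) -> Dict[Hashable, List[str]]:
    """Group POI keywords by cluster label; each cluster becomes one bag-of-words document."""
    docs: Dict[Hashable, List[str]] = defaultdict(list)
    for cluster_id, kwds in pois:
        docs[cluster_id].extend(kwds)
    return dict(docs)


def dominant_topic(weights: Dict[str, float]) -> str:
    """Return the topic with the largest weight (the first one wins on ties)."""
    return max(weights, key=weights.__getitem__)


docs = clusters_as_documents([(0, ["cafe", "bar"]), (1, ["park"]), (0, ["cafe"])])
print(docs)  # {0: ['cafe', 'bar', 'cafe'], 1: ['park']}
print(dominant_topic({"Topic0": 0.2, "Topic1": 0.7, "Topic2": 0.1}))  # Topic1
```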
loci.index module
-
loci.index.grid(pois, cell_width=None, cell_height=None, cell_size_ratio=0.01, znorm=False, neighborhood=False)
Constructs a uniform grid from the given POIs.
If cell_width and cell_height are provided, each grid cell has size cell_width * cell_height. Otherwise, cell_width = cell_size_ratio * area_width and cell_height = cell_size_ratio * area_height, where area refers to the bounding box of pois.
Each cell is assigned a score, which is the number of points within that cell.
If neighborhood is True, each cell is assigned an additional score (score_nb), which is the total number of points within that cell and its adjacent cells.
If znorm is True, the above scores are also provided in their z-normalized variants, score_znorm and score_nb_znorm.
The constructed grid is represented by a GeoDataFrame where each row corresponds to a grid cell and contains the following columns:
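cell_id: The id of the cell (integer, computed as cell_x * num_columns + cell_y).
cell_x: The row of the cell in the grid (integer).
cell_y: The column of the cell in the grid (integer).
score: The number of points within the cell.
score_nb: The number of points within the cell and its adjacent cells (only if neighborhood is True).
score_znorm: The z-normalized score (only if znorm is True).
score_nb_znorm: The z-normalized score_nb (only if both znorm and neighborhood are True).
contents: The list of points in the cell.
geometry: The geometry column of the GeoDataFrame, containing the polygon that represents the cell boundaries.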
- Parameters
pois (GeoDataFrame) – A POI GeoDataFrame.
cell_width (float) – Cell width (default: None).
cell_height (float) – Cell height (default: None).
cell_size_ratio (float) – Ratio of the cell width and height to the width and height of the area (default: 0.01).
znorm (bool) – Whether to include z-normalized scores (default: False).
neighborhood (bool) – Whether to include the score that also counts the points in adjacent cells (default: False).
- Returns
A GeoDataFrame as described above.
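The grid construction described above can be sketched in pure Python. The sketch rests on several assumptions:

- Points are (x, y) tuples.
- cell_x indexes cells along the x axis and cell_y along the y axis, so num_columns is the number of cells along y.
- "Adjacent" means the up to eight surrounding cells.
- Z-normalization uses the population standard deviation over all cells, including empty ones.

The contents and geometry columns are omitted, and this is not loci's implementation.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def grid_sketch(points: List[Point], cell_width: Optional[float] = None,
                cell_height: Optional[float] = None, cell_size_ratio: float = 0.01,
                znorm: bool = False, neighborhood: bool = False) -> List[Dict[str, float]]:
    """Uniform grid over the points' bounding box; one dict of scores per cell."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    if cell_width is None or cell_height is None:
        # Cell size as a fraction of the bounding-box width and height
        cell_width, cell_height = cell_size_ratio * width, cell_size_ratio * height
    if cell_width <= 0 or cell_height <= 0:
        raise ValueError("cell sizes must be positive (points must span a non-zero area)")
    # round() absorbs float noise (e.g. 100.00000000000001) before ceil()
    n_x = max(1, math.ceil(round(width / cell_width, 9)))
    n_y = max(1, math.ceil(round(height / cell_height, 9)))

    counts = [[0] * n_y for _ in range(n_x)]
    for x, y in points:
        # Clamp so that points on the upper edges fall into the last cell
        cx = min(int((x - min_x) / cell_width), n_x - 1)
        cy = min(int((y - min_y) / cell_height), n_y - 1)
        counts[cx][cy] += 1

    cells = []
    for cx in range(n_x):
        for cy in range(n_y):
            cell = {"cell_id": cx * n_y + cy, "cell_x": cx, "cell_y": cy,
                    "score": counts[cx][cy]}
            if neighborhood:
                # Points in this cell plus its (up to eight) adjacent cells
                cell["score_nb"] = sum(counts[i][j]
                                       for i in range(max(0, cx - 1), min(n_x, cx + 2))
                                       for j in range(max(0, cy - 1), min(n_y, cy + 2)))
            cells.append(cell)

    if znorm:
        for key in (["score", "score_nb"] if neighborhood else ["score"]):
            values = [c[key] for c in cells]
            mu, sigma = mean(values), pstdev(values)
            for c in cells:
                c[key + "_znorm"] = (c[key] - mu) / sigma if sigma else 0.0
    return cells


pts = [(0, 0), (0.1, 0.1), (1, 1), (2, 2)]
for cell in grid_sketch(pts, cell_width=1, cell_height=1, znorm=True, neighborhood=True):
    print(cell["cell_id"], cell["score"], cell["score_nb"], cell["score_znorm"])
```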
loci.io module
-
loci.io.import_osmnx(bound, target_crs='EPSG:4326')
Creates a POI GeoDataFrame from POIs retrieved by OSMnx (https://github.com/gboeing/osmnx).
- Parameters
bound (polygon) – A polygon to be used as a filter.
target_crs (string) – Coordinate Reference System of the GeoDataFrame to be created (default: EPSG:4326).
- Returns
A POI GeoDataFrame with columns id, name and kwds.
-
loci.io.import_osmwrangle(osmwrangle_file, target_crs='EPSG:4326', bound=None)
Creates a POI GeoDataFrame from a file produced by OSMWrangle (https://github.com/SLIPO-EU/OSMWrangle).
- Parameters
osmwrangle_file (string) – Path or URL to the input CSV file.
target_crs (string) – Coordinate Reference System of the GeoDataFrame to be created (default: EPSG:4326).
bound (polygon) – A polygon to be used as a filter (default: None).
- Returns
A POI GeoDataFrame with columns id, name and kwds.
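Filtering POIs by a bounding polygon amounts to a point-in-polygon test. The pure-Python sketch below uses the even-odd (ray casting) rule. It assumes a simple polygon given as a list of (x, y) vertices, and it leaves the result for points exactly on the boundary unspecified. With GeoPandas one would typically write gdf[gdf.within(bound)] instead. Whether loci filters this way is not stated here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd rule: count crossings of a ray cast from pt towards +x."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wraps around, so the ring need not be closed
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_by_bound(points: List[Point], bound: Sequence[Point]) -> List[Point]:
    """Keep only the points that fall inside the polygon."""
    return [p for p in points if point_in_polygon(p, bound)]


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(filter_by_bound([(2, 2), (5, 2), (2, 5)], square))  # [(2, 2)]
```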
-
loci.io.read_poi_csv(input_file, col_id='id', col_name='name', col_lon='lon', col_lat='lat', col_kwds='kwds', col_sep=';', kwds_sep=', ', source_crs='EPSG:4326', target_crs='EPSG:4326', keep_other_cols=False)
Creates a POI GeoDataFrame from an input CSV file.
- Parameters
input_file (string) – Path to the input CSV file.
col_id (string) – Name of the column containing the POI id (default: id).
col_name (string) – Name of the column containing the POI name (default: name).
col_lon (string) – Name of the column containing the POI longitude (default: lon).
col_lat (string) – Name of the column containing the POI latitude (default: lat).
col_kwds (string) – Name of the column containing the POI keywords (default: kwds).
col_sep (string) – Column delimiter (default: ;).
kwds_sep (string) – Keywords delimiter (default: ', ', i.e. a comma followed by a space).
source_crs (string) – Coordinate Reference System of input data (default: EPSG:4326).
target_crs (string) – Coordinate Reference System of the GeoDataFrame to be created (default: EPSG:4326).
keep_other_cols (bool) – Whether to keep the remaining columns of the CSV file (default: False).
- Returns
A POI GeoDataFrame with columns id, name and kwds.
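The standard csv module can illustrate the CSV layout that read_poi_csv expects with its default arguments. Columns are separated by ';', and the keyword field is split on ', ' (a comma followed by a space). This sketch returns plain dictionaries and is not loci's code. It also skips the source_crs to target_crs reprojection, for which GeoPandas offers GeoDataFrame.to_crs.

```python
import csv
import io
from typing import Any, Dict, List


def parse_poi_csv(text: str, col_id: str = "id", col_name: str = "name",
                  col_lon: str = "lon", col_lat: str = "lat", col_kwds: str = "kwds",
                  col_sep: str = ";", kwds_sep: str = ", ") -> List[Dict[str, Any]]:
    """Parse POI rows from CSV text into dicts with id, name, lon, lat and a keyword list."""
    reader = csv.DictReader(io.StringIO(text), delimiter=col_sep)
    pois = []
    for row in reader:
        raw_kwds = row[col_kwds] or ""
        pois.append({
            "id": row[col_id],
            "name": row[col_name],
            "lon": float(row[col_lon]),
            "lat": float(row[col_lat]),
            # An empty field yields no keywords rather than [""]
            "kwds": raw_kwds.split(kwds_sep) if raw_kwds else [],
        })
    return pois


sample = "id;name;lon;lat;kwds\n1;Cafe A;23.72;37.98;cafe, wifi\n2;Park;23.73;37.97;\n"
print([p["kwds"] for p in parse_poi_csv(sample)])  # [['cafe', 'wifi'], []]
```

With kwds_sep=',' instead, the second keyword would keep its leading space (' wifi'), which is why the exact delimiter matters.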
loci.plots module
-
loci.plots.barchart(data, orientation='Vertical', x_axis_label='', y_axis_label='', plot_title='', bar_width=0.5, plot_width=15, plot_height=5, top_k=10)
Plots a bar chart with the given data.
- Parameters
data (dict) – The data to plot.
orientation (string) – The orientation of the bars in the plot (Vertical or Horizontal; default: Vertical).
x_axis_label (string) – Label of the x axis.
y_axis_label (string) – Label of the y axis.
plot_title (string) – Title of the plot.
bar_width (scalar) – The width of the bars (default: 0.5).
plot_width (scalar) – The width of the plot (default: 15).
plot_height (scalar) – The height of the plot (default: 5).
top_k (integer) – Number of top results to show (if -1, show all; default: 10).
- Returns
A Matplotlib plot displaying the bar chart.
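The sketch below shows the top_k selection. It assumes results are ranked by value in descending order before truncation, since the ordering is not specified above. The input dict could come from kwds_freq, and the resulting pairs could then be drawn with matplotlib.pyplot.bar, or matplotlib.pyplot.barh for horizontal bars.

```python
from typing import Dict, List, Tuple


def top_k_items(data: Dict[str, float], top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank (label, value) pairs by value, largest first; top_k == -1 keeps all of them."""
    ranked = sorted(data.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if top_k == -1 else ranked[:top_k]


freqs = {"cafe": 3, "bar": 1, "park": 2}
print(top_k_items(freqs, top_k=2))  # [('cafe', 3), ('park', 2)]
```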
-
loci.plots.heatmap(pois, tiles='OpenStreetMap', width='100%', height='100%', radius=10)
Generates a heatmap of the input POIs.
- Parameters
pois (GeoDataFrame) – A POI GeoDataFrame.
tiles (string) – The tiles to use for the map (default: OpenStreetMap).
width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).
height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).
radius (float) – Radius of each point of the heatmap (default: 10).
- Returns
A Folium Map object displaying the heatmap generated from the POIs.
-
loci.plots.map_choropleth(areas, id_field, value_field, fill_color='YlOrRd', fill_opacity=0.6, num_bins=5, tiles='OpenStreetMap', width='100%', height='100%')
Returns a Folium Map showing the given areas as a choropleth, colored by value_field. Map center and zoom level are set automatically.
- Parameters
areas (GeoDataFrame) – A GeoDataFrame containing the areas to be displayed.
id_field (string) – The name of the column to use as id.
value_field (string) – The name of the column indicating the area’s value.
fill_color (string) – A string indicating a Matplotlib colormap (default: YlOrRd).
fill_opacity (float) – Opacity level (default: 0.6).
num_bins (int) – The number of bins for the threshold scale (default: 5).
tiles (string) – The tiles to use for the map (default: OpenStreetMap).
width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).
height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).
- Returns
A Folium Map object displaying the given areas.
-
loci.plots.map_cluster_contents_osm(cluster_borders, tiles='OpenStreetMap', width='100%', height='100%')
Constructs a Folium Map displaying the streets and buildings, retrieved from OpenStreetMap via OSMnx, within the given cluster borders.
- Parameters
cluster_borders (GeoDataFrame) – The cluster polygons.
tiles (string) – The tiles to use for the map (default: OpenStreetMap).
width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).
height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).
- Returns
A Folium Map object displaying the retrieved entities.
-
loci.plots.map_cluster_diff(clusters_a, clusters_b, intersection_color='#00ff00', diff_ab_color='#0000ff', diff_ba_color='#ff0000', tiles='OpenStreetMap', width='100%', height='100%')
Returns a Folium Map displaying the differences between two sets of clusters. Map center and zoom level are set automatically.
- Parameters
clusters_a (GeoDataFrame) – The first set of clusters.
clusters_b (GeoDataFrame) – The second set of clusters.
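intersection_color (color code) – The color to use for the intersection of A and B (default: #00ff00).
diff_ab_color (color code) – The color to use for A - B, i.e. areas in A but not in B (default: #0000ff).
diff_ba_color (color code) – The color to use for B - A, i.e. areas in B but not in A (default: #ff0000).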
tiles (string) – The tiles to use for the map (default: OpenStreetMap).
width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).
height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).
- Returns
A Folium Map object displaying cluster intersections and differences.
-
loci.plots.map_clusters_with_topics(clusters_topics, viz_type='dominant', col_id='cluster_id', col_dominant='Dominant Topic', colormap='tab10', red='Topic0', green='Topic1', blue='Topic2', single_topic='Topic0', tiles='OpenStreetMap', width='100%', height='100%')
Returns a Folium Map showing the clusters colored based on their topics.
- Parameters
clusters_topics (GeoDataFrame) – A GeoDataFrame containing the clusters to be displayed and their topics.
viz_type (string) – Indicates how to assign colors based on topics. One of: dominant, single, rgb (default: dominant).
col_id (string) – The name of the column indicating the cluster id (default: cluster_id).
col_dominant (string) – The name of the column indicating the dominant topic (default: Dominant Topic).
colormap (string) – A string indicating a Matplotlib colormap (default: tab10).
red (string) – The name of the column indicating the topic to assign to red (default: Topic0).
green (string) – The name of the column indicating the topic to assign to green (default: Topic1).
blue (string) – The name of the column indicating the topic to assign to blue (default: Topic2).
single_topic (string) – The name of the column indicating the topic to use (default: Topic0).
tiles (string) – The tiles to use for the map (default: OpenStreetMap).
width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).
height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).
- Returns
A Folium Map object displaying the given clusters colored by their topics.
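For viz_type='rgb', one plausible reading is that the weights of the red, green and blue topic columns become the three color channels. The sketch below assumes weights in [0, 1], scaled linearly to 0-255 and formatted as a hex color code. The exact mapping used by loci is not specified here, so treat this as illustrative only.

```python
from typing import Dict


def rgb_from_topics(weights: Dict[str, float], red: str = "Topic0",
                    green: str = "Topic1", blue: str = "Topic2") -> str:
    """Combine three topic weights into a '#rrggbb' color (assumes weights in [0, 1])."""

    def channel(topic: str) -> int:
        w = min(max(weights.get(topic, 0.0), 0.0), 1.0)  # clamp to [0, 1]; missing -> 0
        return round(w * 255)

    return "#{:02x}{:02x}{:02x}".format(channel(red), channel(green), channel(blue))


print(rgb_from_topics({"Topic0": 1.0, "Topic1": 0.0, "Topic2": 0.2}))  # #ff0033
```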
-
loci.plots.map_geometries(gdf, tiles='OpenStreetMap', width='100%', height='100%')
Returns a Folium Map displaying the provided geometries. Map center and zoom level are set automatically.
- Parameters
gdf (GeoDataFrame) – A GeoDataFrame containing the geometries to be displayed.
tiles (string) – The tiles to use for the map (default: OpenStreetMap).
width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).
height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).
- Returns
A Folium Map object displaying the given geometries.
-
loci.plots.map_geometry(geom, tiles='OpenStreetMap', width='100%', height='100%')
Returns a Folium Map displaying the provided geometry. Map center and zoom level are set automatically.
- Parameters
geom (Shapely Geometry) – A geometry to be displayed.
tiles (string) – The tiles to use for the map (default: OpenStreetMap).
width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).
height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).
- Returns
A Folium Map object displaying the given geometry.
-
loci.plots.map_points(pois, tiles='OpenStreetMap', width='100%', height='100%', show_bbox=False)
Returns a Folium Map displaying the provided points. Map center and zoom level are set automatically.
- Parameters
pois (GeoDataFrame) – A GeoDataFrame containing the POIs to be displayed.
tiles (string) – The tiles to use for the map (default: OpenStreetMap).
width (integer or percentage) – Width of the map in pixels or percentage (default: 100%).
height (integer or percentage) – Height of the map in pixels or percentage (default: 100%).
show_bbox (bool) – Whether to show the bounding box of the GeoDataFrame (default: False).
- Returns
A Folium Map object displaying the given POIs.
-
loci.plots.plot_wordcloud(pois, bg_color='black', width=400, height=200)
Generates and plots a word cloud from the keywords of the given POIs.
- Parameters
pois (GeoDataFrame) – The POIs from which the keywords will be used to generate the word cloud.
bg_color (string) – The background color to use for the plot (default: black).
width (int) – The width of the plot (default: 400).
height (int) – The height of the plot (default: 200).