Code Documentation

The following sections describe the functions, their arguments, and their outputs to help you adapt the code to your own purposes.

mor_fun.py

This script provides functions that are used at all levels, for instance, to plot data. mor_fun is an acronym for ‘morpho-analyst functions’ or ‘more fun’, depending on your preference.

bedPyLoad.mor_fun.annotated_plot(df, target_var, num_var, x_label=None, y_label=None, plot_type='boxplot', fig_format='png', fig_path=None, color_palette=None, dpi=300, bbox='tight', test='Kruskal', text_format='simple', text_offset=7, y_scale=None)

Make an annotated plot with statannotations.stats. Read more about statannotations usage at https://github.com/trevismd/statannotations/tree/master/usage

Parameters:

df (pd.DataFrame) – DataFrame containing the categorical and numerical data to be plotted. Categories appear on the x-axis according to target_var; numerical data appear on the y-axis according to num_var
target_var (str) – name of a target variable that must be contained in the column names of df
num_var (str) – name of a numerical variable on the Y-axis that must be contained in the column names of df
x_label (str) – if provided, this string replaces the column name of target_var (categorical)
y_label (str) – if provided, this string replaces the column name of the numeric y variable
plot_type (str) – default is ‘boxplot’; other options are ‘violinplot’ and ‘swarmplot’
fig_format (str) – file ending of image file; default is ‘png’
fig_path (str) – name and directory of image (figure) to save WITHOUT FILE FORMAT ending, MUST end on ‘/’
color_palette (str,list,dict) – colors to be used with the hue variable
dpi (int) – dots per inch for figure (default is 300)
bbox (str) – default ‘tight’ applies narrow figure margins
test (str) – type of statistical test for calculating p-values. Default is ‘Kruskal’. Options are defined in statannotations.stats.StatTest.STATTEST_LIBRARY (line 88ff)
text_format (str) – formatting of p-value annotations. Default is ‘simple’. Options are ‘star’ and ‘full’
text_offset (int) – number of pixels for offset of p-value annotations. Default is 7.
y_scale (str) – default is None but can be set to ‘log’ for logarithmic y axis.

Return int:

0 = success, -1 = error occurred
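A call to annotated_plot might look like the following sketch. The column names site and bedload_flux and the sample values are illustrative assumptions; the keyword arguments follow the signature documented above. The import is guarded so that the snippet simply skips the call when bedPyLoad or pandas is not installed.

```python
# Keyword arguments follow the documented signature; the values are illustrative.
plot_kwargs = {
    "target_var": "site",        # hypothetical categorical column
    "num_var": "bedload_flux",   # hypothetical numerical column
    "x_label": "Measurement site",
    "y_label": "Bedload flux",
    "plot_type": "boxplot",
    "test": "Kruskal",
    "text_format": "star",
    "y_scale": "log",
}

try:
    import pandas as pd
    from bedPyLoad import mor_fun
except ImportError:
    pd = mor_fun = None  # dependencies unavailable; the call below is skipped

status = None
if mor_fun is not None:
    # Small made-up dataset with one categorical and one numerical column
    df = pd.DataFrame({
        "site": ["A", "A", "A", "B", "B", "B"],
        "bedload_flux": [0.8, 1.1, 0.9, 2.4, 3.0, 2.7],
    })
    status = mor_fun.annotated_plot(df, **plot_kwargs)
    # Documented return convention: 0 = success, -1 = error occurred
    if status != 0:
        print("annotated_plot reported an error")
```

Checking the returned integer is the only documented way to detect a failed plot, so it is worth testing it after every call.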

bedPyLoad.mor_fun.get_color_list(n, name='hsv')

Returns a list of n RGB colors

Parameters:

n – size of colormap list
name (str) – type of color map - must be a standard matplotlib colormap name

Return list:

colormap of size n
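To illustrate the idea of drawing n distinct colors from a colormap, the sketch below samples evenly spaced hues with the standard library only. It is a stand-in under stated assumptions, not the package's implementation: the function name sample_hsv_colors is hypothetical, and it covers only an hsv-like color wheel rather than arbitrary matplotlib colormap names.

```python
import colorsys


def sample_hsv_colors(n):
    """Return n RGB tuples with evenly spaced hues (hypothetical stand-in)."""
    # Split the hue circle into n equal steps at full saturation and value
    return [colorsys.hsv_to_rgb(i / n, 1.0, 1.0) for i in range(n)]


colors = sample_hsv_colors(4)
for rgb in colors:
    print(rgb)
```

Each tuple holds red, green, and blue components between 0 and 1, which is the form matplotlib accepts for color arguments.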

bedPyLoad.mor_fun.plot_df_completeness(df, figure_base_name='base', replace_col_names=None)

Uses missingno package to create a plot of dataframe completeness

Parameters:

df (pandas.DataFrame) – Dataframe to be plotted
figure_base_name (str) – base string to be used in figure file names
replace_col_names (dict) – optional argument to overwrite column names

Returns:

writes the plot to file

bedPyLoad.mor_fun.plot_df_correlations(df, figure_base_name='base', fontsize=16, replace_col_names=None)

Creates a heatmap plot of correlations

Parameters:

df (pandas.DataFrame) – Dataframe to be plotted
figure_base_name (str) – base string to be used in figure file names
fontsize (int) – font size
replace_col_names (dict) – optional argument to overwrite column names

Returns:

writes the plot to file

bedPyLoad.mor_fun.stats_test(dataframe: pandas.DataFrame, numeric_var_name: str, target_columns: list, numeric_var_as_categories_name: str = None, stats_results_xlsx: str = 'stats-results.xlsx', figure_path: str = 'fitting-results/figures/')

Runs a Dunn post hoc test on categories with reference to a non-normally distributed variable defined by numeric_var_name. This function is tailored to this package and requires the global variables defined in config.py.

Parameters:

dataframe – A pandas dataframe containing all numeric and categorical data
numeric_var_name – Name of a numerical variable (typical response variable) to be tested. MUST be a column name of dataframe
target_columns – List of column names to be tested for differences with the numeric variable
numeric_var_as_categories_name – For the Dunn test, the numerical variable should also be categorized (e.g. in categories ‘low’, ‘average’, ‘high’). This argument is the name of a column in dataframe that contains the numerical variable as categories.
stats_results_xlsx – Name of an xlsx file to store Dunn test results (default name applies if not provided)
figure_path – directory or subdirectory where figures will be stored; MUST end on ‘/’

Return int success:

successful execution when 0, otherwise -1
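The column named by numeric_var_as_categories_name has to exist in the dataframe before stats_test is called. One possible way to derive it is to split the numerical variable at its terciles, as sketched below with the standard library only. The helper name to_categories and the sample values are hypothetical; the labels follow the example given in the parameter description. With pandas, pd.qcut offers a similar quantile-based binning.

```python
import bisect
import statistics


def to_categories(values, labels=("low", "average", "high")):
    """Label each value by the quantile bin it falls into (hypothetical helper)."""
    # n=len(labels) yields len(labels) - 1 cut points, e.g. two terciles
    cuts = statistics.quantiles(values, n=len(labels))
    # bisect_right counts the cut points <= v, which is the bin index of v
    return [labels[bisect.bisect_right(cuts, v)] for v in values]


flux = [0.2, 0.5, 0.9, 1.4, 2.0, 3.1]  # illustrative values only
categories = to_categories(flux)
for value, category in zip(flux, categories):
    print(value, category)
```

Quantile-based cut points keep the categories roughly equally populated; fixed thresholds may be preferable when the categories carry a physical meaning.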

workbook_fun.py

bedload_base_stats.py

plot_correlations.py

Plot and save dataset completeness and Spearman correlations.
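For readers unfamiliar with the Spearman coefficient: it is the Pearson correlation computed on the ranks of two variables, so it measures monotonic rather than linear association. The following standard-library sketch is for illustration only and is not taken from the script; the helper names average_ranks and spearman are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A monotonic but non-linear relation (y = x**3)
rho = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 8.0, 27.0, 64.0])
print(rho)
```

Because only ranks enter the calculation, the coefficient is insensitive to outliers and to non-linear but monotonic relations.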

plot_histograms.py

Plot and save histograms of measurement frequency per variable category

plot_histograms.plot_category_histograms(directory)

Plot histograms of all dataframe columns (as per config.py)

Parameters:

directory (str) – where the plots should be saved; either an absolute or a relative path ending on ‘/’

Important: relative paths should not start with ‘/’ or ‘\’

Returns:

0 if successful
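Several functions in this documentation require directory arguments that end on ‘/’ (directory here, figure_path in stats_test). A small helper can enforce that convention before the call; the function below is a hypothetical convenience and not part of the package.

```python
def with_trailing_slash(directory):
    """Return directory with exactly one trailing '/' (hypothetical helper)."""
    if not directory:
        # An empty string would otherwise turn into the root directory '/'
        raise ValueError("directory must not be empty")
    return directory.rstrip("/") + "/"


figure_dir = with_trailing_slash("fitting-results/figures")
print(figure_dir)
```

The resulting string can then be passed as directory to plot_category_histograms or as figure_path to stats_test.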