Continuous Generation and Publication of Docstring Documentation on Azure – using Sphinx, Pydoc, Storage Account and App Service – AMIS, Data Driven Blog

In this blog I will explain how to generate static HTML pages from your project's Pydoc (docstring) comments with Sphinx. Then we are going to host them in an Azure Web App so that everyone in your team can access them. Because the Web App serves the files from a storage mount, publishing newly generated HTML files only means replacing them in the storage account; the change is then reflected on the endpoint.

This way you always have a hosted version of the latest documentation. See figure 1 for the architecture.

Figure 1: Architecture

Why?

For big data engineering projects we use Azure Databricks extensively. We created many Jupyter notebooks that live inside Databricks, but there is no easy way to reuse code between notebooks. Besides that, plain Jupyter notebooks are hard to test. With a library we can unit test all of our functionality before deploying it, which gives our team a lot of confidence while developing.

Because of these issues, we decided to create a library with all functionality separated into packages. Once it is installed on the cluster, we can call any of its functions from the notebooks.

When we find a bug in a function, we no longer have to fix it in every notebook that contains a copy of the same code. We only have to fix the library and install the new version on our cluster (with CI/CD).

When writing notebooks we can instantly see the definition, parameters and an explanation of every function because we document them with docstrings. But when we want to search for a specific function, or get an overview of all functionality, we use the hosted version of our documentation.

Read on to see how to achieve this.

Repository

All code and settings you need for this blog are located in this repository: https://github.com/samvruggink/hosting-sphinx-docs-in-azure-webapp-blog

Step 1: Pydoc (docstring)

First of all we need to document our functions. For this we use docstrings, the standard way to document Python code, which Pydoc can display; the examples below follow the Google docstring style. Docstrings make documenting code easy; see the code block below for a template.

def plus_one(number: int) -> int:
    """[summary]

    Args:
        number (int): [description]

    Returns:
        int: [description]
    """
    return number + 1

The first part is the summary, a short description of what the function does. After that, add a description for each argument and, if the function returns a value, for the return value as well. Below is an example of a function documented this way.

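import logging

from pyspark.sql import DataFrame, SparkSession

logger = logging.getLogger(__name__)


def read_parquet(spark: SparkSession, path: str) -> DataFrame:
    """Reads a Parquet file or directory and returns a DataFrame.

    Args:
        spark (SparkSession): The active SparkSession.
        path (str): Path of the input file or directory.

    Returns:
        DataFrame: A DataFrame with the Parquet data.
    """
    # Parquet files carry their own schema, so no schema inference option is needed.
    df = spark.read.format("parquet").load(path)
    if isinstance(df, DataFrame):
        logger.info(f"Read parquet: {type(df).__name__}")
    else:
        logger.error(f"Is an instance of {type(df).__name__}, not a DataFrame, exiting now!")
        raise TypeError(f"Expected a DataFrame, got {type(df).__name__}")
    return df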
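Because a docstring is stored on the function object itself, any tool can read it at runtime: help() in a notebook is backed by Pydoc, and Sphinx autodoc imports your modules and reads the same attribute. A small standard-library sketch with an illustrative plus_one function (its docstring text is made up for the example):

```python
import inspect
import pydoc


def plus_one(number: int) -> int:
    """Add one to a number.

    Args:
        number (int): The number to increment.

    Returns:
        int: The incremented number.
    """
    return number + 1


# inspect.getdoc() returns the docstring with its indentation cleaned up.
clean = inspect.getdoc(plus_one)

# pydoc.render_doc() builds the plain-text page that help(plus_one) shows:
# the signature followed by the docstring.
page = pydoc.render_doc(plus_one, renderer=pydoc.plaintext)

print(clean)
```

In a notebook, help(plus_one) prints essentially the same page.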

Once all your functions are documented, it’s time to generate the Sphinx documentation.
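A function without a docstring gives Sphinx nothing to show beyond its signature, so it can pay off to check for gaps before building. A minimal sketch with a hypothetical find_undocumented helper that you could call from a unit test for each module in src (Sphinx's own sphinx.ext.coverage extension offers a similar report):

```python
from __future__ import annotations

import inspect
import types


def find_undocumented(module: types.ModuleType) -> list[str]:
    """Return the names of public functions defined in *module* that lack a docstring."""
    missing = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from other modules.
        if name.startswith("_") or obj.__module__ != module.__name__:
            continue
        if not inspect.getdoc(obj):
            missing.append(name)
    return missing


# Build a throwaway module to demonstrate; in a real test you would
# import your own module instead (for example src.foo).
demo = types.ModuleType("demo")
exec(
    "def documented():\n"
    "    '''Has a docstring.'''\n"
    "\n"
    "def undocumented():\n"
    "    return None\n"
    "\n"
    "def _private_helper():\n"
    "    return None\n",
    demo.__dict__,
)
print(find_undocumented(demo))  # ['undocumented']
```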

Step 2: Generate Sphinx static HTML from your Pydoc definitions

Sphinx is an excellent tool for generating static HTML files from docstrings. It is highly customizable, which also makes it somewhat complex; the guide below explains how to generate static HTML files from your src folder using a standard template.

This is our project structure:

Demo-project-sphinx-doc
|-- src
|   |-- __init__.py
|   |-- foo.py
|   |-- bar.py
|-- test
|   |-- __init__.py
|   |-- test_foo.py
|   |-- test_bar.py
|-- source
|   |-- index.rst
|   |-- conf.py
|   |-- _templates
|       |-- custom-module-template.rst
|       |-- custom-class-template.rst
|-- Makefile
|-- make.bat

We want to generate documentation for the functions in foo.py and bar.py. First you need to install Sphinx on your computer, for which you need pip, the package manager for Python (comparable to Maven, npm or NuGet). See the official pip documentation for download and installation instructions.

In the project root execute the following commands:

pip install sphinx
sphinx-quickstart

sphinx-quickstart will generate the basic configuration files. We keep the default name for the source directory, but you can change it. When it asks whether to separate source and build directories, type “y”.

This will give you the following structure (see figure 2):

Figure 2: Project structure

Now we are going to change some Sphinx settings in order to generate our static HTML files. Change your conf.py to the following:

import os
import sys

sys.path.insert(0, os.path.abspath(".."))

# -- Project information -----------------------------------------------------

project = "demo-project-sphinx-doc"
copyright = "2021, sam"
author = "sam"

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.

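extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    # Parses Google-style "Args:" and "Returns:" sections in docstrings.
    "sphinx.ext.napoleon",
]

autosummary_generate = True  # Generate stub pages for the autosummary entries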
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages.  See the documentation for
# a list of builtin themes.
#
html_theme = "alabaster"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]


We add some extensions that generate the documentation automatically from the docstrings. We also set the template path, and add the project root to sys.path so Sphinx can import the modules in the src directory.
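The sys.path line is what lets autodoc import src.foo and src.bar during the build. os.path.abspath("..") is resolved against the current working directory, which Sphinx sets to the folder containing conf.py while it reads the file, so ".." is the project root. If you would rather not rely on that, here is a sketch that derives the root from the location of conf.py itself (Sphinx defines __file__ when it executes conf.py); add_project_root is a hypothetical helper name:

```python
import sys
from pathlib import Path


def add_project_root(conf_file: str) -> Path:
    """Put the project root (the parent of source/) at the front of sys.path.

    conf.py lives in <project>/source/, so two levels up from the file
    is the folder that contains the src package.
    """
    project_root = Path(conf_file).resolve().parent.parent
    if str(project_root) not in sys.path:
        sys.path.insert(0, str(project_root))
    return project_root


# In conf.py you would call it with the path of conf.py itself:
# add_project_root(__file__)
```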

Now open index.rst and replace its contents with the following:

Welcome to demo-project-sphinx-doc's documentation!
===================================================

.. autosummary::
   :toctree: _autosummary
   :template: custom-module-template.rst
   :recursive:

   src

.. toctree::
   :maxdepth: 2
   :caption: Contents:



Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

The :recursive: option makes autosummary descend into subpackages, so nested packages in the src folder are discovered automatically. For each module, it then summarises every attribute, function, class and exception in that module.
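Conceptually, :recursive: means walking the package tree and collecting every module and subpackage. The standard library can do the same walk, which is handy for checking what autosummary should pick up. This is an illustration only, not how Sphinx implements the option, and list_modules is a hypothetical helper:

```python
from __future__ import annotations

import pkgutil
import types


def list_modules(package: types.ModuleType) -> list[str]:
    """Return the dotted names of all modules and subpackages below *package*.

    pkgutil.walk_packages imports each subpackage so it can look inside it,
    which is also why the package must be importable during the build.
    """
    prefix = package.__name__ + "."
    return sorted(info.name for info in pkgutil.walk_packages(package.__path__, prefix))


# Usage, assuming the project root is on sys.path:
# import src
# print(list_modules(src))
```

For the project structure above this would list src.bar and src.foo.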

Next, we need templates that tell autosummary how to render the pages for each module and class. Add two files to the _templates folder:

custom-module-template.rst

{{ fullname | escape | underline }}

.. automodule:: {{ fullname }}

   {% block attributes %}
   {% if attributes %}
   .. rubric:: Module Attributes

   .. autosummary::
      :toctree:
   {% for item in attributes %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block functions %}
   {% if functions %}
   .. rubric:: {{ _('Functions') }}

   .. autosummary::
      :toctree:
   {% for item in functions %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block classes %}
   {% if classes %}
   .. rubric:: {{ _('Classes') }}

   .. autosummary::
      :toctree:
      :template: custom-class-template.rst
   {% for item in classes %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block exceptions %}
   {% if exceptions %}
   .. rubric:: {{ _('Exceptions') }}

   .. autosummary::
      :toctree:
   {% for item in exceptions %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

{% block modules %}
{% if modules %}
.. rubric:: Modules

.. autosummary::
   :toctree:
   :template: custom-module-template.rst
   :recursive:
{% for item in modules %}
   {{ item }}
{%- endfor %}
{% endif %}
{% endblock %}

custom-class-template.rst

{{ fullname | escape | underline }}

.. currentmodule:: {{ module }}

.. autoclass:: {{ objname }}
   :members:
   :show-inheritance:
   :inherited-members:

   {% block methods %}
   .. automethod:: __init__

   {% if methods %}
   .. rubric:: {{ _('Methods') }}

   .. autosummary::
   {% for item in methods %}
      ~{{ name }}.{{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block attributes %}
   {% if attributes %}
   .. rubric:: {{ _('Attributes') }}

   .. autosummary::
   {% for item in attributes %}
      ~{{ name }}.{{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}
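Sphinx's built-in autosummary templates start every page with {{ fullname | escape | underline }}: escape makes special characters in the dotted name safe for reStructuredText, and underline turns the name into a section title, so each generated page gets a heading for the table of contents. A small re-implementation of what the underline filter does, for illustration only (it is not Sphinx's code):

```python
def underline(title: str, line: str = "=") -> str:
    """Return *title* followed by a reStructuredText underline of equal length."""
    if "\n" in title:
        # A section title must fit on a single line.
        raise ValueError("Can only underline single lines")
    return title + "\n" + line * len(title)


print(underline("src.foo"))
```

This prints src.foo on one line followed by a line of seven = characters, which reStructuredText reads as a section title.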

Now run make clean html from the project root to generate the documentation (on Windows, make.bat only accepts one target, so run make clean and make html separately). If you open build/html/index.html you will see the following:

Figure 3
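Before deploying anything, you can also serve the generated site over HTTP on your own machine; from the command line, python -m http.server --directory build/html does this. The same in a few lines of standard-library Python, with serve_docs as a hypothetical helper name:

```python
import functools
import http.server
import threading


def serve_docs(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve *directory* on http://127.0.0.1:<port>/ from a background thread."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage:
# server = serve_docs("build/html")
# ...browse to http://127.0.0.1:8000/ ...
# server.shutdown()
```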

Step 3: Host the static HTML in an Azure Web App

In our own environment we deploy everything with CI/CD: resources are provisioned with Terraform from our pipelines.

To keep this blog simple, I will show you how to host the documentation by provisioning everything by hand.

What we need to provision:

  • Storage Account
    • StorageV2 (general purpose v2)
  • App Service plan (Linux)
    • B1 is a cheap tier for testing
  • Web App
    • Docker Container running on Linux
Figure 4: Architecture
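Because the Web App reads the HTML from the storage mount, publishing a new version only means replacing the files in the storage account. A minimal sketch of that step, assuming the file share is reachable as a local directory (for example mounted on the build agent); publish_docs and the paths are hypothetical, and without a mount you would upload the files with a tool such as the Azure CLI instead:

```python
import shutil
from pathlib import Path


def publish_docs(build_html: str, target: str) -> int:
    """Replace the contents of *target* with a fresh copy of *build_html*.

    Returns the number of files now present in *target*.
    """
    build_dir = Path(build_html)
    share_dir = Path(target)
    if not (build_dir / "index.html").is_file():
        raise FileNotFoundError(f"{build_dir} does not look like a Sphinx HTML build")
    share_dir.mkdir(parents=True, exist_ok=True)
    # Remove the old pages first so pages of deleted modules do not linger.
    for child in share_dir.iterdir():
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
    shutil.copytree(build_dir, share_dir, dirs_exist_ok=True)
    return sum(1 for p in share_dir.rglob("*") if p.is_file())


# Usage (hypothetical mount point):
# publish_docs("build/html", "/mnt/docs")
```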

I created a video showing how to deploy and configure these resources. Please watch it below.

Video on how to deploy and configure the Azure resources

Thanks for reading my blog. Leave a comment if you have any questions.

Post Author: BackSpin Chief Editor
