Part 1. Roles of data scientist, data engineer and machine learning engineer

  • Data scientist: focus on data collection, interpretation and processing (feature engineering, model building).
  • Data engineer: focus on the infrastructure and workflow.
  • Machine learning engineer: build production systems to handle updating models, model versioning, and serving predictions to end users.

There are several ways to call shell commands from Python scripts. Very often I found that a shell command had not finished running before Python moved on to the code that follows it.

After many tries, the following code calls shell commands successfully most of the time (but not always):

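import subprocess

def execute_command(cmd):
    # Run the command through the shell and capture its output.
    process = subprocess.Popen(cmd, shell=True,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() blocks until the command has finished.
    stdout, stderr = process.communicate()
    return_code = process.returncode

    # Any non-zero return code signals a failure.
    if return_code != 0:
        raise Exception("fail to run " + str(cmd))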

An example of calling this command is:

print("install ffmpeg")
execute_command('apt update')
execute_command('apt install ffmpeg')

Here we also use time.sleep to make sure that the command finishes before the next lines run.
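As an alternative, here is a minimal sketch built on subprocess.run, which blocks until the command exits and, with check=True, raises an error on a non-zero return code. The helper name run_and_wait is hypothetical, and the command being run is just the current Python interpreter so that the example works on any machine.

```python
import subprocess
import sys

def run_and_wait(cmd):
    """Run cmd (a list of arguments) and block until it exits.

    Hypothetical helper: check=True makes subprocess.run raise
    CalledProcessError when the return code is non-zero.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# The next line only runs after the child process has finished.
output = run_and_wait([sys.executable, "-c", "print('done')"])
print(output.strip())
```

Passing the command as a list avoids shell quoting issues; a single string such as 'apt update' would need shell=True or a split into arguments first.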


Part 1: Multiple Plots

Using make_subplots takes the following steps:

(1) generate a figure via make_subplots function

(2) add trace via add_trace function

(3) adjust the figure settings via update_layout function

Notes on make_subplots

  • specs is an important parameter of make_subplots: it determines the type of each subplot. For example, for a pie plot we write specs=[[{'type':'pie'}]]
  • subplot_titles sets the title of each subplot
  • we can also disable shared x-axes by passing shared_xaxes=False, then use fig.update_xaxes to set individual x-axis titles.

Part 2: Basic Plot Element

Part 2.1: Pie


  • pull is used to emphasize…

Part 1: path searching

  • all folders/files (including sub-folders/files) within a directory

if only folders/sub-folders

  • all files/folders in current directory
  • all files/folders in the second current directory
  • search for a certain type of files

(1) global search including sub-directories

my_list = pathlib.Path(my_dir).rglob("*.png")

(2) only search in current directory

my_list = pathlib.Path(my_dir).glob("*.png")
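The difference between the two searches can be checked with a small sketch. The directory layout and file names below are made up for illustration and are created in a temporary folder.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    my_dir = pathlib.Path(tmp)
    # Hypothetical layout: one image at the top level, one in a sub-folder.
    (my_dir / "sub").mkdir()
    (my_dir / "a.png").touch()
    (my_dir / "sub" / "b.png").touch()
    (my_dir / "notes.txt").touch()

    # rglob descends into sub-directories; glob stays in my_dir.
    recursive = sorted(p.name for p in my_dir.rglob("*.png"))
    top_level = sorted(p.name for p in my_dir.glob("*.png"))
    # Keep only folders by filtering with is_dir().
    only_dirs = sorted(p.name for p in my_dir.rglob("*") if p.is_dir())

print(recursive)  # ['a.png', 'b.png']
print(top_level)  # ['a.png']
print(only_dirs)  # ['sub']
```

Both glob and rglob return generators, so wrap them in list() or sorted() when the result is needed more than once.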

Part 2: file manipulation

  • delete files
from pathlib import Path

my_path = Path.cwd() / 'python' / 'libpath' / "a.txt"

# Show the path in the assertion message if the file is missing.
assert my_path.exists(), my_path
my_path.unlink()  # delete the file
  • change file name

.with_suffix : change the suffix name
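A minimal sketch of renaming and deleting, assuming a throwaway file in a temporary folder (the file names are illustrative). Note that with_suffix only builds a new path object; the file on disk changes only when rename is called.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "a.txt"
    original.write_text("hello")

    # with_suffix returns a new path; nothing changes on disk yet.
    target = original.with_suffix(".md")
    target_name = target.name

    # rename moves the file to the new name.
    original.rename(target)
    renamed_exists = target.exists()
    original_exists = original.exists()

    # unlink deletes the file.
    target.unlink()
    exists_after_delete = target.exists()

print(target_name)                      # a.md
print(renamed_exists, original_exists)  # True False
print(exists_after_delete)              # False
```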

Part 3: path relation

  • add a list of sub-directories
  • change file format name
  • relative path
cur_file_dir = os.path.dirname(__file__)
hydra_setting_path = os.path.relpath(hydra_setting_path, cur_file_dir)
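To see what a relative path computation returns, here is a small sketch with made-up paths. It uses posixpath.relpath (the POSIX flavour of os.path.relpath) and PurePosixPath so that the output is the same on every operating system.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical locations, for illustration only.
cur_file_dir = "/project/src"
setting_path = "/project/conf/settings.yaml"

# relpath may climb out of the start directory with "..".
rel = posixpath.relpath(setting_path, cur_file_dir)
print(rel)  # ../conf/settings.yaml

# relative_to only works when the path lies inside the base directory.
inside = PurePosixPath("/project/src/conf/settings.yaml")
rel_inside = inside.relative_to(PurePosixPath(cur_file_dir))
print(rel_inside)  # conf/settings.yaml
```

Unlike relpath, relative_to raises ValueError when the path is not located under the base directory.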

Part 4: path elements

  • stem vs…

Part 1: Immutable object

If two variables refer to immutable objects that have equal values (a == b is True), in practice it rarely matters if they refer to copies or are aliases referring to the same object because the value of an immutable object does not change, with one exception. The exception is immutable collections such as tuples and frozensets: if an immutable collection holds references to mutable items, then its value may actually change when the value of a mutable item changes. In practice, this scenario is not so common. …
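The exception can be illustrated with a short sketch: two tuples that start out equal stop being equal once a mutable item inside one of them changes.

```python
# Two distinct tuples with equal values; each holds its own list.
a = (1, [2, 3])
b = (1, [2, 3])
equal_before = a == b

# The tuple itself cannot be reassigned, but the list inside it can change.
a[1].append(4)
equal_after = a == b

print(equal_before)  # True
print(equal_after)   # False
print(a)             # (1, [2, 3, 4])
```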

Part 1: Introduction

In order to represent data in Python, we have the following options:

  • write a class that organizes the data
  • use a named tuple / dictionary or their enhanced versions (from typing)
  • employ dataclass

We recommend using dataclass, as it offers the most flexibility.
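As a rough comparison, here is a sketch of the same record written as a typing.NamedTuple and as a dataclass. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field, asdict
from typing import List, NamedTuple

class PointTuple(NamedTuple):
    # Immutable: fields cannot be reassigned after creation.
    x: float
    y: float

@dataclass
class Point:
    # Mutable by default, with generated __init__, __repr__ and __eq__.
    x: float
    y: float
    tags: List[str] = field(default_factory=list)

t = PointTuple(1.0, 2.0)
p = Point(1.0, 2.0)
p.x = 5.0            # allowed for the dataclass
p.tags.append("a")   # each instance gets its own list

print(t)          # PointTuple(x=1.0, y=2.0)
print(p)          # Point(x=5.0, y=2.0, tags=['a'])
print(asdict(p))  # {'x': 5.0, 'y': 2.0, 'tags': ['a']}
```

Passing frozen=True to the dataclass decorator makes instances read-only, similar to the named tuple.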

However, using dataclass can be problematic from the software engineering perspective:

These are classes that have fields, getting and setting methods for fields, and nothing else. Such classes are dumb data holders and are often being manipulated in far too much detail by other classes.

From Refactoring by Martin Fowler and Kent Beck

The main idea of…

Part 1: Introduction

In this blog I keep track of how I configured my new Ubuntu machine for machine learning projects.

Part 2: System

(1) Operating system

Ubuntu 18.04 LTS

(2) Settings from previous Ubuntu machine

  • .ssh
  • .bashrc
  • .hgrc

(3) GPU

a) check the GPU information with `sudo lshw -c video`, which shows what kind of GPU you have in your PC.

b) find the recommended GPU driver version with `sudo ubuntu-drivers devices`

c) install the recommended version with `sudo apt install nvidia-driver-version-number`, or use `sudo ubuntu-drivers autoinstall` to let the system choose the recommended GPU driver version for you. …

Part 1: Introduction

Sometimes we need to install Ubuntu packages without an Internet connection. This happened to me when my desktop needed to compile a wireless adapter driver with the make command before it could connect to the Internet. However, a fresh Ubuntu 18.04 system does not include the build-essential package, so I had to find a way to install it offline. In this article, I record how I installed this package successfully.

Part 2: Step-by-step Guideline

Step 1: on an Ubuntu machine that can connect to the Internet, download the docker Ubuntu system that you have installed…

Table of Contents

· Part 1: Introduction
· Part 2: Package Installation
· Part 3: Docs Structure
· Part 4: Documentation Website Generation
· Part 5: Mkdocs Enhancement
· Reference

Part 1: Introduction

In the past few years, I have been using Markdown to write technical notes and then I use Mkdocs to organize them. Here is a summary of my experience.

Part 2: Package Installation

Software Installation

Additional Plugins

  • pip install mkdocs-jupyter makes mkdocs support Jupyter notebooks
  • pip install mkdocs-material and pip install mkdocstrings make mkdocs support Python code documentation

Part 3: Docs Structure

File/Folder Structure

├── docs │…

Ma Jianglin
