Database in Machine Learning (1): a Summary

ifeelfree
3 min readDec 31, 2020


I write this blog to keep track of my database experience when working on machine learning projects.

Part 1: Database introduction

# What is a database?

A database is a structured system for storing your data that imposes rules on that data. The rules are yours to define, because which concerns matter most depends on your needs. Your main concern may be the sheer size of the data, while someone else has a smaller amount of data but cares most about its sensitivity.

# What is a database management system?

We often mistakenly say that our database is Oracle, MySQL, SQL Server, or MongoDB. But these aren’t databases; they are database management systems (DBMS). A DBMS is software installed on your personal computer or on a server, which you then use to manage one or more databases.

# What are important database concepts?

- Database

A database is a collection, or set, of tables. Each table holds a formalized, repeating list of data about one specific kind of information, for example customers, students, orders, or products. Visually, it’s often shown like a spreadsheet.

- Tables

The table is the most basic building block of a database. It’s where you put your data, define its data types, and define its relationships with other tables. A table consists of rows and columns.

- Rows & Columns

In a nutshell, columns define what data belongs in the table, while the rows hold the actual values that you retrieve, insert, update, and delete.

- Primary key

A primary key is a column whose value is unique for each row. You may have more than one student with the same name, but you can’t have more than one student with the same primary key. If you try to insert a duplicate value, the DBMS will reject it.
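The concepts above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended schema: the `students` table, its columns, and the sample values are all made up for the example.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Columns define what data the table holds; student_id is the primary key.
conn.execute(
    "CREATE TABLE students ("
    "student_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "age INTEGER)"
)

# Rows hold the actual values. Two students may share the same name.
conn.execute("INSERT INTO students VALUES (1, 'Alice', 20)")
conn.execute("INSERT INTO students VALUES (2, 'Alice', 22)")

# A duplicate primary key is rejected by the DBMS.
try:
    conn.execute("INSERT INTO students VALUES (1, 'Bob', 21)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT student_id, name, age FROM students ORDER BY student_id"
).fetchall()
print(rows)                # [(1, 'Alice', 20), (2, 'Alice', 22)]
print(duplicate_rejected)  # True
```

Only the two valid rows end up in the table; the third insert fails with `sqlite3.IntegrityError` because `student_id` 1 already exists.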

Part 2: SQL vs NoSQL

NoSQL databases were developed to provide an environment that supports change without requiring radical re-engineering of the underlying data model, handles high volumes of data, and offers an architecture that is easy to scale.
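To make the point about changing the data model concrete, here is a small sketch that contrasts a fixed table schema with document-style records. It is only an analogy: a plain list of Python dicts serialized as JSON stands in for a document store, and the `products` table and its fields are hypothetical.

```python
import json
import sqlite3

# Relational side: the schema is fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
conn.execute("INSERT INTO products VALUES (1, 'keyboard', 49.9)")

# A column that is not part of the schema is rejected until the table is altered.
try:
    conn.execute(
        "INSERT INTO products (product_id, name, color) VALUES (2, 'mouse', 'black')"
    )
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True

# Document-style side: each record carries its own fields, so a new
# attribute can appear without changing any shared schema.
documents = [
    {"_id": 1, "name": "keyboard", "price": 49.9},
    {"_id": 2, "name": "mouse", "color": "black", "tags": ["wireless"]},
]
serialized = [json.dumps(doc) for doc in documents]
restored = [json.loads(text) for text in serialized]
fields = [sorted(doc) for doc in restored]

print(schema_enforced)  # True
print(fields)
```

The trade-off is that the table guarantees every row has the same shape, while the document-style records leave it to the application to cope with missing or extra fields.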

The differences between SQL and NoSQL are as follows:

  • SQL is relational; NoSQL is non-relational.
  • SQL uses a query language with a predefined schema; NoSQL uses dynamic schemas suited to unstructured data.
  • SQL is table-based; NoSQL is key-value, column-oriented, document-oriented, graph, or multi-model.
  • SQL is ACID (atomicity, consistency, isolation, durability) compliant; NoSQL often has difficulty meeting the ACID criteria.
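As a rough illustration of the atomicity part of ACID, the sketch below uses sqlite3 to run a two-step transfer inside one transaction. The `accounts` table, the `transfer` helper, and the balances are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed)  # True False
print(balances)    # {'alice': 70, 'bob': 80}
```

In the failed transfer the credit to `bob` runs first and succeeds, but it is undone when the debit violates the constraint, so the balances never reflect a half-finished transfer.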

A ranking of popular SQL/NoSQL databases can be found here.

Structured data follows a data model that organizes elements of data and standardizes how they relate to one another. Usually, this type of data is in the form of tables, JSON, etc. Unstructured data, by contrast, is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. This type of data is usually in the form of free text, images, signals, etc.

Part 3: SQL Database

SQL Knowledge

  • SQL databases store data in tables using primary and foreign keys.
  • Structured Query Language (SQL) is a programming language used to manage relational databases.
  • sqlitebrowser is a viewer for SQLite databases.
  • Python can communicate with SQL databases in several ways. For example, we can use sqlite3. "A thorough guide to SQLite database operations in Python" is a nice tutorial on how to use the sqlite3 Python library to run SQL commands. In sqlite3_demo.ipynb, we demonstrate how to use this library.
  • We can also use Pandas to read from and write to SQL databases.
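The first point in the list, tables linked by primary and foreign keys, can be sketched with sqlite3 as follows. This is not the content of sqlite3_demo.ipynb; the `customers` and `orders` tables and their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(customer_id), "
    "item TEXT NOT NULL)"
)

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "GPU"), (11, 1, "SSD"), (12, 2, "RAM")],
)

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 'CPU')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets us join each order back to its customer.
joined = conn.execute(
    "SELECT c.name, o.item "
    "FROM customers AS c "
    "JOIN orders AS o ON o.customer_id = c.customer_id "
    "ORDER BY o.order_id"
).fetchall()
print(orphan_rejected)  # True
print(joined)           # [('Ann', 'GPU'), ('Ann', 'SSD'), ('Ben', 'RAM')]
```

The `?` placeholders pass values separately from the SQL text, which is generally safer than building queries with string formatting.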

Part 4: NoSQL Database

4.1 MongoDB

Part 5: Reference
