03 Jun 2022

PostgreSQL Features

Trigger

A user defined function that fires automatically when an event (insert, update, delete, truncate) occurs.

Transaction

A single unit of work that consists of one or more operations.

→ Ensures operations are completed, otherwise transaction is aborted

BEGIN TRANSACTION; / or BEGIN;
CREATE TABLE test(n int)
INSERT INTO test values(1)
END TRANSACTION; / or COMMIT;

ROLLBACK - a command that reverts changes, run before COMMIT

Streaming Replication

Allows information on the primary server to be transferred to the standby server in real time.

11 May 2022

SQL Clauses Cheatsheet

LIKE

% represents 0, 1 or more characters

_ single character

SELECT * FROM table1
WHERE column1 LIKE 'A%'

GROUP BY

Group rows that have identical data

name salary
James 20
James 10
Paul 5
SELECT name, sum(salary)
FROM company1
GROUP BY name

Result:

name sum(salary)
James 30
Paul 5

WITH

WITH some_name AS (SELECT id, name)
SELECT * FROM table1

some_name can be an existing table name (will override the table)

HAVING

Places conditions on groups created by the GROUP BY clause.

name salary
Paulina 50
Janet 10
Paulina 30
SELECT name FROM table1
GROUP BY name
HAVING count(name) > 1

Result:

name
Paulina

LIMIT

SELECT * FROM table1
LIMIT 20 OFFSET 3

20 Apr 2022

How to Choose a Database

→ What type of data are you analyzing?

  • Fits nicely into rows and columns → SQL
  • Needs flexibility → NoSQL

→ How much data will you be working with?

  • A lot of data → NoSQL
  • Not a lot of data → SQL

→ Intensive read/write operations? → SQL

What kind of resources do you have for setup and maintenance?

  • Small engineering team → SQL
    • SQL databases take less time to manage
    • A lot of developers already know how to work with SQL databases
  • Big engineering team → will handle NoSQL better
    • NoSQL requires more programming because it needs to deal with non standard data structures
    • Team might have to learn other programming languages

Do you need real-time data? → SQL

08 Apr 2022

Kernel Convolution

What they're used for

  • Blurs and filters technique (eg. Gaussian blur)
  • Edge detection

Start with a picture

What about the edge?

  • We can ignore it,
  • Wrap it around,
  • Duplicate it,
  • And many other options.

Gaussian Blur

  • Based on values from the normal distribution,
  • Prioritizes values in the middle.

27 Jan 2022

Scikit-learn fit() and transform() methods

  • The transform method is used to apply a series of transformations to a dataset
  • The fit method is used to learn the parameters or statistics (means, variances, scaling factors, etc.) of a data transformation based on the input data. The specific behavior and purpose of fit may vary depending on the estimator or transformer being used.

Example 1: Feature Scaling using StandardScaler

from sklearn.preprocessing import StandardScaler

# Create a StandardScaler object
scaler = StandardScaler()

# Fit the scaler on the training data
scaler.fit(X_train)

# Apply the learned scaling to transform the training data
X_train_scaled = scaler.transform(X_train)

# Apply the same scaling to transform the test data
X_test_scaled = scaler.transform(X_test)

In this example, fit is used to learn the mean and standard deviation from the training data (X_train). Then, the transform method is used to apply the same scaling to both the training data and the test data (X_test).

Example 2: Dimensionality Reduction using PCA

from sklearn.decomposition import PCA

# Create a PCA object
pca = PCA(n_components=2)

# Fit the PCA model on the training data and transform it
X_train_pca = pca.fit_transform(X_train)

# Apply the learned PCA transformation to the test data
X_test_pca = pca.transform(X_test)

In this example, the fit method is used to compute the mean and standard deviation of the data (X) using the StandardScaler transformer. These statistics are necessary for standardizing the data by subtracting the mean and dividing by the standard deviation. After calling fit, the scaler object contains the computed statistics.

Example 3: Learning Class Labels for LabelEncoder

from sklearn.preprocessing import LabelEncoder

# Create a LabelEncoder object
encoder = LabelEncoder()

# Fit the encoder on the target variable to learn the class labels
encoder.fit(y)

In this example, the fit method is used to learn the class labels from the target variable (y) using the LabelEncoder transformer. The encoder analyzes the target variable and assigns a unique integer label to each class. After calling fit, the encoder object contains the learned class labels.

07 Jan 2022

PostgreSQL Datatypes

Enum

CREATE TYPE month AS ENUM ["Jan", "Feb", "Mar", ...]

Composite Types

→ List of field names with data types

CREATE TYPE inventory_item AS (
    name text,
    price numeric
);

Usage

→ Creating a table

CREATE TABLE on_hand(
    item inventory_item,
    count integer
);

→ Adding values

INSERT INTO on_hand
VALUES (ROW(’box’, 2.0), 3)

→ Accessing values

SELECT (on_hand.item).name 
FROM on_hand 
WHERE (on_hand.item).price > 10

17 Nov 2021

Garbage collection in Python

Problems with memory management:

  • forgetting to free the memory,
  • freeing it up too soon.

A popular method for automatic memory management is reference counting: runtime keeps track of all the references to an object, when an objects has 0 references, it can be deleted.

The cost of automatic memory management

  • The program needs additional memory and computation to track references
  • Many languages use “stop-the-world” process, where all execution stops while the garbage collector does its thing.

Python implementation (CPython)

CPython uses:

  • reference counting, can’t be disabled
    • pros: can immediately remove an object when it has no references
    • cons: inability to detect cyclic references
  • generational garbage collection, ***************************can be disabled***************************

Viewing reference counts

To increase the reference count we can:

  • assign an object to a variable
  • add object to a data structure (e.g. a list)
  • pass the object as an argument to a function
import sys
a = 'some-string'
sys.getrefcount(a)

# Output: 2

Generational garbage collection

What happens when you add an object to itself?

class MyClass(object):
    pass

a = MyClass()
a.obj = a
del a

We’ve deleted the instance, so it’s not longer accessible, but Python didn’t destroy it from memory. It doesn’t have a reference count zero because it has a reference to itself. This is called the reference cycle.

Terminology

  1. Generation:
    1. A new object starts its life in the first generation of the garbage collector.
    2. If Python runs GC and the object survives, it moves to the second generation.
    3. Python GC has three generations.
  2. Threshold:
    1. For each generation the GC has a threshold number of objects. If the number of objects exceeds threshold, GC will trigger collection.

Generational garbage collection’s behaviour can be changed, the thresholds can be adjusted, collection can be manually triggered or it can be disabled altogether.

import gc

gc.get_count()

# Output: (596, 2, 1)

596 in 1st generation, 2 in second and 1 in third.

gc.collect()
gc.get_count()

# Output: (18, 0, 0)

General rule: Don’t change the GC’s behaviour

  • Python’s key benefit is that it enables developer productivity
  • If you find that your GC is slowing you down you might want to invest in more the power of your execution environment instead of playing around with your GC
  • Python doesn’t generally release memory back to the OS so you might not get the results you want with manual GC

03 Nov 2021

::before and ::after

In CSS, ::before and ::after are pseudo-elements that allow you to insert content before or after an element's actual content. These pseudo-elements are often used to add decorative or supplementary elements to an element without modifying the HTML structure.

One simple example of using ::before pseudo-elements is to provide quotation marks. Here we use both ::before and ::after to insert quotation characters.

<q>Some quotes</q>, he said, <q>are better than none.</q>
q::before {
  content: "«";
  color: blue;
}

q::after {
  content: "»";
  color: red;
}

Single-colon or double-colon?

  • double-colon selectors are pseudo-elements
  • single-colon selectors are pseudo-selectors

::before and ::after are pseudo-elements, so it should use a double-colon.

:: select a part of something for which there is no HTML element for (no span, no div).

: selects something that already exists, like the :nth-child(2)

So you should use ::before, but most modern browsers support it both ways, since a lot of sites use :before.

20 Oct 2021

Types of NoSQL Databases

  • Document-oriented (aka document store): pair each key with a complex document structure
  • Key-value store (dictionary)
  • Wide column store: have tables, rows and columns like SQL, but they are not as structured/restricted (rows can vary within a table)
  • Graph store: contain nodes, edges, and other graph structures

02 Sep 2021

Heroku Cheat Sheet

Database

  • The database doesn’t create itself, you have to install the PostgreSQL add-on.

Commands

  • Run run SQL query in terminal (local):
heroku pg:psql -c "command" --app "name-app"

02 Jul 2021

Loss Function

→ The loss function is like a scorekeeper that helps the model learn from its mistakes and improve its performance over time. → It measures how close the model's predictions are to the correct answers. It calculates a value that represents the difference between the predicted answers and the actual answers. This value is the "loss." → As the model gets better and better at its task, the loss decreases. This means the model's predictions become more accurate and reliable.

Example: Mean Squared Error (MSE)

The MSE loss measures the average squared difference between the predicted and actual values. It's often used in regression problems.

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

29 Nov 2020

Support Vector Machines

→ A Support Vector Machine algorithm’s objective is to find a hyperplane (or in the case of 2D a line) that classifies data points. SVMs can be used for both regression and classification.

Support vectors are data points that are closer to the hyperplane and influence the position and orientation of the hyperplane. Using these support vectors, we maximize the margin of the classifier. Deleting the support vectors will change the position of the hyperplane. These are the points that help us build our SVM.

02 Nov 2020

REST API

REST (Representational State Transfer) API is a set of rules and conventions for building and interacting with web services.

Best Practices

Use JSON for Sending and Receiving Data

  1. In the past XML or HTML was used. With XML it was a bit of a hassle to decode and encode data.
  2. To ensure the client interprets JSON correctly you should set the Content-Type to application/json (many frameworks do this automatically).

Use Nouns Instead of Verbs

HTTP methods already use verbs like “GET” and “POST”. An example endpoint should look like this:

https://website.com/posts

Not like this:

https://website.com/getPosts

Name Collections with Plural Nouns

Do this:

https://website.com/post/1234

Not this:

https://website.com/posts/1234

Use Status Codes for Errors

Here’s a cheat sheet.

Use Nesting on Endpoints to Show Relationships

https://website.com/posts/postId/comments

Use Filtering, Sorting and Pagination

Retrieving data from a database can be slow. Use filtering, sorting and pagination to only retrieve the data that you need.

https://website.com/posts?tags=cheatsheet

Use SSL

HTTPS should be used instead of HTTP for all requests and responses.

Clear Versioning

REST APIs should have different versions so that you don’t force your users to migrate to new versions.

Documentation

The documentation should contain: → relevant endpoints → example requests → implementation in several programming languages → messages listed for different errors with their status codes

15 Sep 2020

SQL Constraints

Constraints

UNIQUE: prevents 2 records from identical values in the same column

CREATE TABLE Table1 (
    id INT PRIMARY KEY NOT NULL,
    age INT NOT NULL UNIQUE
) 

PRIMARY KEY:

  • Identifies a record
  • One per table
  • PKs are unique IDs
  • Can consist of multiple columns ← COMPOSITE KEY

Note: PKs can be null in SQLite (coding oversight)

CREATE TABLE Table1 (
    name INT NOT NULL,
    id INT NOT NULL,
    PRIMARY KEY(name, id)
);

FOREIGN KEY:

  • maintains referential integrity between tables
  • aka REFERENCING KEY

Employee table:

name id (PK)
Nancy 345
Lana 132
CREATE TABLE Department(
    id INT PRIMARY KEY NOT NULL,
    dept CHAR(50) NOT NULL,
    employee_id INT REFERENCES Employee(id)
);

EXCLUDE:

  • evaluates a comparison of two rows in the table,
  • kinda like a generalized UNIQUE constraint: instead of "no two rows can be equal", you can say things like "no two rows overlap", or even "no two rows can be different".

DISTINCT:

SELECT name FROM company1;
name
Paula
Teddy
Allen
Teddy
Allen
Allen
SELECT DISTINCT name FROM company1;
name
Paula
Teddy
Allen

Drop Constraints

ALTER TABLE table1 
DROP CONSTRAINT PK_Person; 

09 Sep 2020

Child Combinator

A child combinator is used to select only the immediate child elements of a parent element, ignoring any nested elements that are further down the HTML tree.

p > a:hover {
    color: red;
}

Do I need ‘>’ ?

No. The > symbol is used to select only the direct child a elements of p elements, ignoring any nested a elements that might be inside other elements inside the p element.

However, if you remove the > symbol from the CSS selector, like this:

p a:hover {
    color: red;
}

Then the CSS selector will select all a elements that are descendants of p elements, including any nested a elements that might be inside other elements inside the p element.

02 Jul 2020

One-Hot Encoding

What is categorical data?

→ Variables that contain label values.

→ Categorical variables are also called “nominal”

→ eg. A “pet” variable with the values “dog” and “cat”

Categorical data must be converted to a numerical form. We can do it by integer encoding or one-hot encoding.

Integer Encoding

Also known as label encoding.

red 1
blue 2
orange 3

The problem is that integer values have a natural ordered relationship between each other and machine learning algorithms can accidentally learn that order in applications where there isn’t one (like colours).

One-Hot Encoding

For categorical variables where no ordinal relationship exists it’s better to use one-hot encoding.

red green blue
1 0 0
0 1 0
0 0 1

26 Jun 2020

Model Parameters vs Model Hyperparameters

Model Parameters

Configuration variables whose value can be estimated from the data:

  • saved as part of the learned model
  • not set manually

Some examples of model parameters:

  • weights in a neural network
  • support vectors in a support vector machine
  • coefficients in linear regression

Model Hyperparameters

Configuration variables that are external to the model and cannot be estimated from the data. Some examples of model hyperparameters:

  • the learning rate for training a neural network
  • C and sigma for support vector machines
  • k in k-nearest neighbours

29 Apr 2020

RDBMS vs OODBMS

RDBMS - Relational Database Management System

  • eg. MS SQL Server, Oracle, MySQL
  • data represented by a collection of inter-related tables.
  • Stores entities.
  • Only stores data

OODBMS - Object Oriented Database Management System

  • eg. Versant Object Database, Objectivity/DB)
  • data represented in the form of objects (as in object oriented programming). OODB implements OOP concepts like classes, object identity, polymorphism, ****encapsulation, inheritance.
  • Stores objects.
  • Stores data and methods to use it.

17 Feb 2020

SOLID

SOLID principles of programming are a set of five design principles that help developers create code that is easy to maintain and extend over time.

Single Responsibility Principle

  • every entity should have only a single responsibility
  • a class should have only one reason to change

Open Closed Principle

  • the code should be open for extending
  • the code should be closed for changing

You should be able to extend the behaviour of a class without modifying its existing code.

For example, instead of modifying the existing code of a class to add new functionality, you can create a new derived class that extends the behaviour of the base class.

Liskov substitution principle

  • Objects of a superclass should be able to be replaced with objects of its subclasses without affecting the correctness of the program.
  • Ensures that the inheritance hierarchy’s behaviour of the classes is consistent and predictable.

For example:

  1. Base class Shape
  2. Subclass called Rectangle.

We should be able to replace an instance of Shape with an instance of Rectangle without affecting the correctness of the program. Any code that works with a Shape should also work correctly with a Rectangle.

Interface Segregation Principle

  • Clients should not be forced to depend on interfaces they do not use.
  • Many specific interfaces are better than one general interface.

For example, instead of having a single large interface that contains many methods, create smaller interfaces that each client can implement based on their specific requirements.

This helps to prevent clients from being burdened with unnecessary dependencies.

Dependency Inversion Principle

High-level modules should not directly depend on low-level modules. Instead, both high-level and low-level modules should depend on abstractions (interfaces or abstract classes).

For example, here we have a high-level class called ReportGenerator and a low-level class called DatabaseConnection. Without applying DIP, we have a direct dependency like this:

class ReportGenerator {
    private DatabaseConnection databaseConnection;

    public ReportGenerator() {
        databaseConnection = new DatabaseConnection();
    }
}

The ReportGenerator class directly creates and depends on the DatabaseConnection class. If DatabaseConnection changes it can lead to changes in the ReportGenerator class.

Now, applying DIP, we introduce an abstraction (interface) like DataStore:

interface DataStore {
    // Define methods common to data storage
}

class DatabaseConnection implements DataStore {
    // Implement the DataStore interface
}

class ReportGenerator {
    private DataStore dataStore;

    public ReportGenerator(DataStore dataStore) {
        this.dataStore = dataStore;
    }
}

27 Oct 2019

Exceptions in Python

Use Specific Exceptions

Allows you to differentiate between various errors and deliver accurate error messages, making issue identification and resolution more efficient

try:
    # some code
except FileNotFoundError:
    # handle file not found
except IndexError
    # handle index error

Implement Error Logging

Utilizing the logging module you can capture exceptions along with vital information like timestamps, error details, stack traces

import logging

logging.basicConfig(filename'error.log', level=logging.ERROR)

try:
    # some code
except Exception as e:
    logging.error('An error occurred: %s', str(e))

Custom Exception Classes

Better code readability, improved error handling and modular project development.

Use finally for Cleanup Tasks

Ensures that the code will execute regardless of whether an exception occurred or not.

try:
    # code that may raise an exception
except Exception as e:
    # handle it
else: 
    # code that executes if no exceptions are raised
finally: 
    # code that will always execute