Feature extraction for algorithmic trading

In this post, I’m going to show how to extract useful features from raw trading data. First, you need to obtain the raw data; for this, have a look at my previous post, ‘How to get trading data from binance’.

Let me describe what I mean by useful features. If you look at the raw data, you will see price values such as ‘open’, ‘high’, ‘low’ and ‘close’, as well as volume information and the number of trades. These values are neither standardized nor discriminative. For instance, say the BTC close price is $9000. When you feed this value into your estimator or predictor, it has to adjust its parameters to that price scale. If you then feed it the price of another coin such as XRP, whose close price is around $0.3, the predictor will not work because the scale is completely different. You might suggest normalizing the price by dividing it by a maximum value. But as soon as the price of the coin exceeds that maximum, the normalized value falls outside the expected range.

So what are good standardized features? Traders frequently use Bollinger Bands, a type of statistical chart that describes the price and volatility of a financial instrument. Bollinger Bands have three components. The first is the MA component, which is simply the N-period moving average of the typical price. The second is the upper band, which is MA plus K times the N-period standard deviation of the typical price. The last is the lower band, which is MA minus K times the N-period standard deviation of the typical price. Typical values for N and K are 20 and 2, respectively. In summary:

TP = (high + low + close)/3 # Typical price
MA = SMA(TP, N) # N-period simple moving average
UPPER = MA + K*STD(TP, N) # STD stands for standard deviation
LOWER = MA - K*STD(TP, N)
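The definitions above can be sketched in pandas roughly as follows. This is only an illustration on a small synthetic frame, using the simple moving average from the summary; the helper name `bollinger_bands` and the toy prices are assumptions made for the example, not part of the original data.

```python
import numpy as np
import pandas as pd

def bollinger_bands(df, n=20, k=2):
    """Hypothetical helper: typical price, SMA and both bands."""
    tp = (df['high'] + df['low'] + df['close']) / 3
    ma = tp.rolling(n).mean()   # N-period simple moving average
    std = tp.rolling(n).std()   # N-period sample std (pandas default, ddof=1)
    return pd.DataFrame({
        'tp': tp,
        'ma': ma,
        'upper': ma + k * std,
        'lower': ma - k * std,
    })

# Toy data: a steadily rising close with a fixed high/low spread
close = pd.Series(np.linspace(100.0, 129.0, 30))
toy = pd.DataFrame({'high': close + 1, 'low': close - 1, 'close': close})

bands = bollinger_bands(toy)
print(bands.tail(3))
```

The first N-1 rows of `ma`, `upper` and `lower` are NaN because a full window of N values is not yet available.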

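When the price touches the lower band, some traders buy, and when it exceeds the upper band, some traders sell. Notice that these values are still not standardized, because they are expressed in price units. In 2010, Bollinger introduced new indicators based on Bollinger Bands, among them percent b (%b) and bandwidth, which are standardized. The following formulae, written here with the typical price as in the implementation later in this post, summarize these quantities:

%b = (TP - LOWER)/(UPPER - LOWER) # Position of the price inside the bands
BANDWIDTH = (UPPER - LOWER)/MA # Width of the bands relative to MA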
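To get a feel for the two quantities, here is a small worked example with made-up numbers. It assumes the definitions used in the implementation later in this post: %b is the position of the price between the lower and upper band, and bandwidth is the band width divided by the middle band. The function names are hypothetical.

```python
def percent_b(price, lower, upper):
    # 0 at the lower band, 1 at the upper band
    return (price - lower) / (upper - lower)

def bandwidth(ma, lower, upper):
    # Width of the bands relative to the middle band
    return (upper - lower) / ma

lower, ma, upper = 90.0, 100.0, 110.0   # made-up band values

print(percent_b(105.0, lower, upper))   # 0.75
print(percent_b(115.0, lower, upper))   # 1.25, price is above the upper band
print(bandwidth(ma, lower, upper))      # 0.2
```

Both values are unitless, so they mean the same thing for a coin priced at $9000 and a coin priced at $0.3.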


Now, let us implement those formulae in Python and extract useful features for algorithmic trading. First, import the libraries that we need.

import numpy as np 
import pandas as pd 

Then, read the raw data that we obtained as described in the post ‘How to get trading data from binance’.

df = pd.read_csv('ltcusdt-1hour.csv')

Now, add a tp (typical price) column to our dataframe df, and calculate the N-period standard deviation and the N-period moving average. Note that here we use ewm (an exponentially weighted moving average) instead of the SMA.

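N = 20
# Typical price: mean of high, low and close
df['tp'] = (df['high'] + df['low'] + df['close']) / 3
# N-period rolling standard deviation (NaN for the first N-1 rows)
df['std'] = df['tp'].rolling(N).std()
# Exponentially weighted moving average with span N
df['ewm-tp'] = df['tp'].ewm(span=N, min_periods=0, adjust=False, ignore_na=False).mean()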
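One detail worth keeping in mind: the rolling standard deviation needs a full window, so it is NaN for the first N-1 rows, while the exponentially weighted average (with `min_periods=0`) has a value from the first row. The sketch below illustrates this on a synthetic random-walk series; the data are generated for the example only.

```python
import numpy as np
import pandas as pd

N = 20
rng = np.random.default_rng(0)
tp = pd.Series(100 + rng.normal(0, 1, 50).cumsum())  # synthetic typical price

std = tp.rolling(N).std()
ewm_tp = tp.ewm(span=N, min_periods=0, adjust=False).mean()

print(std.isna().sum())     # 19
print(ewm_tp.isna().sum())  # 0

# Every feature built on std inherits these NaN rows, so you may want
# to drop them before feeding the features to a model.
features_ready = pd.DataFrame({'std': std, 'ewm-tp': ewm_tp}).dropna()
print(len(features_ready))  # 31
```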

It is time to calculate the Bollinger Bands:

K = 2
df['lower'] = df['ewm-tp'] - K * df['std']
df['upper'] = df['ewm-tp'] + K * df['std']

The standardized values are then calculated as:

df['percent-b'] = (df['tp'] - df['lower']) / (df['upper'] - df['lower'])
df['bandwidth'] = (df['upper'] - df['lower']) / df['ewm-tp']
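A quick way to convince yourself that these two features really are independent of the price scale is to compute them for the same series at two very different price levels. The sketch below does that on synthetic data; the helper `bollinger_features` and the random-walk prices are assumptions for the illustration, and the helper simply repeats the steps above.

```python
import numpy as np
import pandas as pd

def bollinger_features(df, n=20, k=2):
    """Hypothetical helper repeating the steps above."""
    tp = (df['high'] + df['low'] + df['close']) / 3
    std = tp.rolling(n).std()
    mid = tp.ewm(span=n, adjust=False).mean()
    lower, upper = mid - k * std, mid + k * std
    return pd.DataFrame({
        'percent-b': (tp - lower) / (upper - lower),
        'bandwidth': (upper - lower) / mid,
    })

# Synthetic series around 9000, and the same series rescaled to about 0.3
rng = np.random.default_rng(42)
close = pd.Series(9000 + rng.normal(0, 50, 200).cumsum())
big = pd.DataFrame({'high': close + 20, 'low': close - 20, 'close': close})
small = big * (0.3 / 9000)

same = np.allclose(bollinger_features(big).dropna(),
                   bollinger_features(small).dropna())
print(same)
```

Multiplying all prices by a constant multiplies the moving average and the standard deviation by the same constant, so it cancels out in both ratios.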

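Finally, we store the useful features in a new CSV file (the output file name below is just an example).

df2 = df[['percent-b','bandwidth']]
df2.to_csv('ltcusdt-1hour-features.csv', index=False)  # hypothetical output file name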

In this post, we extracted only two features, but we can add as many useful features as we want. Rate of change is another useful feature for automated trading. That’s all for this post. Enjoy your trading!
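As a pointer for the rate of change feature mentioned above, here is one possible sketch. It assumes rate of change means the relative change of the close price over a fixed number of periods; the function name and the toy prices are made up for the example.

```python
import pandas as pd

def rate_of_change(close, periods=10):
    """Relative change of the price over `periods` rows (hypothetical helper)."""
    return close / close.shift(periods) - 1

close = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])  # toy close prices
roc = rate_of_change(close, periods=2)
print(roc)
```

Like %b and bandwidth, this value is a ratio, so it does not depend on the price level of the coin.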
