# Feature extraction class version 2: Adding choppiness index and x indicator

In this post, I’m going to talk about the new version of our feature extraction class. You may have a look at the old version by browsing the following link:

In the old version, the features were ‘rsi’, ‘bandwidth’, ‘percent-b’, ‘cv’ and the rate of change values ‘roc-1’, ‘roc-2’, …, ‘roc-n’. There are many other useful indicators in the literature. For instance, the choppiness index is commonly used to tell whether a market is ‘choppy’, that is, moving sideways rather than trending. It was developed by the Australian trader Bill Dreiss. The standard indicator varies between 0 and 100, with higher values meaning a choppier market. In our case the multiplication by 100 is left out, so the values lie roughly in the range 0 to 1. I chose this indicator because the Bitcoin markets are highly choppy. The choppiness index can be calculated as follows:

``````
chop = Log10( Sum(Atr(1), n) / ( Highest_High(n) - Lowest_Low(n) ) ) / Log10(n)

n: Period defined by the user
Atr(1): Average true range with a period of 1, i.e. the true range of a single time interval
Sum(Atr(1), n): Sum of the Atr(1) values over the past n time intervals
Highest_High(n): Maximum of the high values over the past n time intervals
Lowest_Low(n): Minimum of the low values over the past n time intervals
``````
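To get a feel for the formula, here is a minimal sketch that evaluates it on two made-up price series, one rising steadily and one bouncing sideways. The function name `choppiness`, the helper `last_chop`, the 0.5 wick size and the period of 5 are my own choices for illustration only.

```python
import numpy as np
import pandas as pd


def choppiness(high, low, close, n=14):
    """Choppiness index without the factor of 100, per the formula above."""
    prev_close = close.shift(1)
    # True range, i.e. Atr(1): the largest of the three candidate ranges.
    # The first row has no previous close, so only high - low is used there.
    tr = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()],
        axis=1,
    ).max(axis=1)
    atr_sum = tr.rolling(n).sum()
    price_range = high.rolling(n).max() - low.rolling(n).min()
    return np.log10(atr_sum / price_range) / np.log10(n)


def last_chop(close, n=5):
    # Hypothetical candles: high and low sit 0.5 above and below the close.
    return choppiness(close + 0.5, close - 0.5, close, n).iloc[-1]


idx = np.arange(30, dtype=float)
trend_chop = last_chop(pd.Series(100.0 + idx))      # rises by 1 every step
side_chop = last_chop(pd.Series(100.0 + idx % 2))   # alternates 100, 101, 100, ...

print("trending:", trend_chop)
print("sideways:", side_chop)
```

The sideways series should give the larger value, because the price travels just as far per step but the overall high-low range stays small.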

The second indicator that I want to mention is the following ratio:

``x = (2 * close - open)/close``
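The ratio can be rewritten as `1 + (close - open)/close`, so it sits above 1 for a candle that closes higher than it opened and below 1 for one that closes lower. A small sketch with three made-up candles (the numbers and the `x_alt` column are only for illustration):

```python
import pandas as pd

# Three hypothetical candles: one rising, one flat, one falling.
candles = pd.DataFrame({
    "open": [100.0, 100.0, 100.0],
    "close": [105.0, 100.0, 95.0],
})

# The ratio exactly as defined above.
candles["x"] = (2 * candles["close"] - candles["open"]) / candles["close"]

# The same quantity written as 1 plus the candle body relative to the close.
candles["x_alt"] = 1 + (candles["close"] - candles["open"]) / candles["close"]

print(candles)
```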

I’m not sure whether this indicator exists in the literature; however, I believe that it is a useful one. I’m going to call it ‘x’. Now it is time to construct our new feature extraction class. In your working directory, create a file named ‘binanceFeatures.py’ and paste in the following code:

``````
import numpy as np
import pandas as pd


class FeatureExtractor:
    def __init__(self, input_file_name=None, output_file_name=None):
        self.input_file_name = input_file_name
        self.output_file_name = output_file_name
        self.out_list = ['rsi', 'bandwidth', 'percent-b', 'cv', 'x', 'chop']
        self.period = 20
        self.chop_period = 20
        self.rsi_period = 14
        self.period_list = list(range(1, 25))

    def standardize(self, value, low, high):
        # position of value inside the [low, high] interval
        return (value - low) / (high - low)

    def extract(self):
        # assumption: the input CSV has 'open', 'high', 'low' and 'close' columns
        df = pd.read_csv(self.input_file_name)

        # typical price (tp) calculation
        df['tp'] = (df['close'] + df['high'] + df['low']) / 3

        # rate of change (roc) calculation for several periods
        # note: the column suffix is the list index (0..23), not the period (1..24)
        for i, per in enumerate(self.period_list):
            df['roc-' + str(i)] = df['tp'].pct_change(per)
            df['roc-' + str(i)] = df['roc-' + str(i)].interpolate(method='linear').bfill()

        # Bollinger bands bandwidth and percent-b calculations
        # assumption: 'ewm-tp' is an exponentially weighted mean of tp over self.period
        df['ewm-tp'] = df['tp'].ewm(span=self.period).mean()
        df['std'] = df['tp'].rolling(self.period).std()
        df['min'] = df['tp'].rolling(self.period).min()
        df['max'] = df['tp'].rolling(self.period).max()

        df['std'] = df['std'].interpolate(method='linear').bfill()
        df['min'] = df['min'].interpolate(method='linear').bfill()
        df['max'] = df['max'].interpolate(method='linear').bfill()

        df['lower'] = df['ewm-tp'] - 2 * df['std']
        df['upper'] = df['ewm-tp'] + 2 * df['std']

        df['bandwidth'] = (df['upper'] - df['lower']) / df['ewm-tp']
        df['percent-b'] = self.standardize(df['tp'], df['lower'], df['upper'])

        # coefficient of variation (cv) calculation
        df['cv'] = df['std'] / df['ewm-tp']

        # average true range calculation
        df['close-p'] = df['close'].shift(1)
        df['close-p'] = df['close-p'].interpolate(method='linear').bfill()

        df['tr'] = pd.concat([df['high'] - df['low'],
                              (df['high'] - df['close-p']).abs(),
                              (df['low'] - df['close-p']).abs()], axis=1).max(axis=1)
        # Atr(1): with a period of 1 the average true range is the true range itself
        df['atr'] = df['tr']
        df['atr-sum'] = df['atr'].rolling(self.chop_period).sum()
        df['atr-sum'] = df['atr-sum'].interpolate(method='linear').bfill()

        df['max-high'] = df['high'].rolling(self.chop_period).max()
        df['min-low'] = df['low'].rolling(self.chop_period).min()

        df['max-high'] = df['max-high'].interpolate(method='linear').bfill()
        df['min-low'] = df['min-low'].interpolate(method='linear').bfill()

        # choppiness index calculation
        df['chop'] = (np.log10(df['atr-sum'] / (df['max-high'] - df['min-low']))
                      / np.log10(self.chop_period))
        df['chop'] = df['chop'].interpolate(method='linear').bfill()

        # x indicator calculation
        df['x'] = (2 * df['close'] - df['open']) / df['close']

        # rsi calculation (scaled to the range 0 to 1)
        df['diff-c'] = df['close'].diff()
        df['diff-c'] = df['diff-c'].interpolate(method='linear').bfill()
        df['up'] = df['diff-c'].clip(lower=0)
        df['down'] = (-df['diff-c']).clip(lower=0)
        # assumption: gains and losses are smoothed with exponentially weighted means
        df['roll-up'] = df['up'].ewm(span=self.rsi_period).mean()
        df['roll-down'] = df['down'].ewm(span=self.rsi_period).mean()
        eps = 1e-10
        df['rsi'] = 1.0 - 1.0 / (1.0 + df['roll-up'] / (df['roll-down'] + eps))

        df[self.get_column_names()].to_csv(self.output_file_name)

    def get_feature_names(self):
        out = self.out_list.copy()
        for i, _ in enumerate(self.period_list):
            out.append('roc-' + str(i))
        return out

    def get_column_names(self):
        return ['close'] + self.get_feature_names()


def main():
    input_file = 'ltcusdt-1hour.csv'
    output_file = 'ltcusdt-1hour-out.csv'
    fe = FeatureExtractor(input_file, output_file)
    fe.extract()


if __name__ == '__main__':
    main()
``````
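If you don’t have a price file at hand yet, a rough sketch like the one below can produce stand-in data to try the class on. It assumes the class only needs columns named ‘open’, ‘high’, ‘low’ and ‘close’, which are the names the feature calculations use. The function `make_synthetic_ohlc`, the random-walk parameters and the starting price of 100 are made up for illustration and have nothing to do with real market data.

```python
import numpy as np
import pandas as pd


def make_synthetic_ohlc(n_rows=200, seed=0):
    """Random-walk candles with open, high, low and close columns."""
    rng = np.random.default_rng(seed)
    # Close prices follow a geometric random walk starting near 100.
    close = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, n_rows)))
    # Each candle opens at the previous close; the first one opens at 100.
    open_ = np.concatenate(([100.0], close[:-1]))
    # Non-negative wicks above and below the candle body.
    wick = np.abs(rng.normal(0.0, 0.2, (2, n_rows)))
    high = np.maximum(open_, close) + wick[0]
    low = np.minimum(open_, close) - wick[1]
    return pd.DataFrame({"open": open_, "high": high, "low": low, "close": close})


candles = make_synthetic_ohlc()
print(candles.head())
```

Saving the frame with `candles.to_csv('synthetic-1hour.csv', index=False)` and using that (hypothetical) file name as `input_file` in `main()` should be enough for a dry run.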
