A complete feature extraction class for algorithmic trading

In this post, I’m going to share a complete feature extraction class for algorithmic trading. The technical meaning of the features is covered in my past articles.

After training and testing my deep Q-agent on several financial datasets, I noticed that some features are much more useful than others, and some are redundant. The most useful features are ‘rsi’, ‘bandwidth’, ‘percent-b’, ‘cv’ and the rate-of-change columns (‘roc-0’, ‘roc-1’, … in the code below, one per look-back period). I also converted the feature extraction procedure into a Python class named ‘FeatureExtractor’. When the method ‘extract()’ is called, it reads the raw trading data from a CSV file (it needs ‘close’, ‘high’ and ‘low’ columns) and writes the features to a CSV file. The method ‘get_feature_names()’ returns the feature names, and ‘get_column_names()’ returns the ‘close’ column name followed by the feature names. To use this class, create a file named ‘FeatureExtractor.py’ in your working directory and paste in the following code:

import pandas as pd 

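class FeatureExtractor:
    def __init__(self, input_file_name=None, output_file_name=None):
        self.input_file_name = input_file_name
        self.output_file_name = output_file_name

        # features written to the output file besides the roc columns
        self.out_list = ['rsi', 'bandwidth', 'percent-b', 'cv']
        # window length for the Bollinger band statistics
        self.period = 30
        # look-back periods (1 to 29) for the rate of change columns
        self.period_list = list(range(1, 30))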

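    def standardize(self, row):
        # row holds (value, lower, upper); returns where value sits between lower and upper
        value, lower, upper = row
        return (value - lower) / (upper - lower)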

    def extract(self):

        df = pd.read_csv(self.input_file_name)
        
        #typical price (tp) calculation
        df['tp'] = (df['close'] + df['high'] + df['low']) / 3
     
        #rate of change (roc) calculation for several periods
        #note: columns are numbered by index, so 'roc-0' uses the first period in period_list
        for i, per in enumerate(self.period_list):
            df['roc-'+str(i)] = df['tp'].diff(periods=per) / df['tp'].shift(per)
            df['roc-'+str(i)] = df['roc-'+str(i)].interpolate(method='linear').bfill()

        #Bollinger bands bandwidth and percent-b calculations
        df['std'] = df['tp'].rolling(self.period).std()
        df['min'] = df['tp'].rolling(self.period).min()
        df['max'] = df['tp'].rolling(self.period).max()

        df['std'] = df['std'].interpolate(method='linear').bfill()
        df['min'] = df['min'].interpolate(method='linear').bfill()
        df['max'] = df['max'].interpolate(method='linear').bfill()
     
        df['ewm-tp'] = df['tp'].ewm(span=self.period,min_periods=0,adjust=False,ignore_na=False).mean()

        df['lower'] = df['ewm-tp'] - 2 * df['std']
        df['upper'] = df['ewm-tp'] + 2 * df['std']

        df['bandwidth'] = (df['upper'] - df['lower']) / df['ewm-tp']
        df['percent-b'] = df[['tp','lower','upper']].apply(self.standardize,axis=1)

        # coefficient of variation (cv) calculation
        df['cv'] = df['std'] / df['ewm-tp']
        
        #rsi calculation (scaled to the range 0 to 1)
        df['diff-c'] = df['close'].diff()
        df['diff-c'] = df['diff-c'].interpolate(method='linear').bfill()
        df['up'] = df['diff-c'].clip(lower=0)
        df['down'] = (-df['diff-c']).clip(lower=0)
        df['roll-up'] = df['up'].ewm(span=14,min_periods=0,adjust=False,ignore_na=False).mean()
        df['roll-down'] = df['down'].ewm(span=14,min_periods=0,adjust=False,ignore_na=False).mean()
        eps = 1e-10  # avoids division by zero when there are no down moves
        df['rsi'] = 1.0 - 1.0/(1.0 + df['roll-up']/(df['roll-down']+eps))
        
        df[self.get_column_names()].to_csv(self.output_file_name)


    def get_feature_names(self):
        out = self.out_list.copy()
        for i in range(len(self.period_list)):
            out.append('roc-'+str(i))
        return out

    def get_column_names(self):
        b = self.get_feature_names()
        a = ['close']
        return a + b

def main():
    
    input_file = 'ltcusdt-1hour.csv'
    output_file = 'ltcusdt-1hour-out.csv'
    fe = FeatureExtractor(input_file, output_file)
    fe.extract()
    
if __name__ == '__main__':
    main()
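
Since ‘extract()’ reads the ‘close’, ‘high’ and ‘low’ columns, it can help to check the input file before running it. Here is a minimal sketch that uses only the standard library. The helper name ‘missing_columns’ is my own and is not part of the class above:

```python
import csv

# Columns that extract() reads from the input file.
REQUIRED_COLUMNS = ('close', 'high', 'low')

def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline='') as f:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(f), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

If the returned list is empty, the file has everything ‘extract()’ needs. Otherwise the list tells you which columns to add or rename first.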

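The ‘ewm(span=14, adjust=False).mean()’ calls in the RSI step apply a recursive exponential average with smoothing factor 2/(span+1). To make that concrete, here is a rough pure-Python sketch of the recursion and of the 0-to-1 RSI formula used in the class. The function names are mine, for illustration only, and the sketch returns one value per price difference instead of back-filling the first row as the class does:

```python
def ewm_mean(values, span):
    """Exponential average following pandas ewm(span=..., adjust=False).mean()."""
    alpha = 2.0 / (span + 1.0)
    out = []
    for x in values:
        prev = out[-1] if out else x  # the first output equals the first input
        out.append((1.0 - alpha) * prev + alpha * x)
    return out

def rsi_01(closes, span=14, eps=1e-10):
    """RSI scaled to [0, 1]: 1 - 1 / (1 + avg_up / (avg_down + eps))."""
    diffs = [b - a for a, b in zip(closes, closes[1:])]
    ups = [max(d, 0.0) for d in diffs]      # size of upward moves
    downs = [max(-d, 0.0) for d in diffs]   # size of downward moves
    avg_up = ewm_mean(ups, span)
    avg_down = ewm_mean(downs, span)
    return [1.0 - 1.0 / (1.0 + u / (d + eps)) for u, d in zip(avg_up, avg_down)]
```

For example, ‘rsi_01([10, 11, 10.5, 11.5])’ gives one value per price change. Values close to 1 mean recent moves were mostly up, and values close to 0 mean they were mostly down.
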
That’s all for this post. Enjoy your trading!
