Collecting Intraday Stock Data With Python

Intraday data is especially valuable to algorithmic traders. In general, the more granular the data, the better. And for this purpose, the more data, the better. The problem is that retail investors don’t have an access to intraday data for a minimal price. Even Bloomberg only supports up to 1-year of intraday data for a given asset. This prompted me to write a script to gather intraday data every day for every company within the S&P with the intention of backtesting a trading strategy in the future. A backtest is seeing how a developed strategy would have faired on a dollar basis given historical data.

The data was obtained via Google Finance. This is one of the last quality sources of intraday data available for free the URL string is formatted to accept three arguments: the number of seconds, the number of days, and the ticker symbol of course. Setting the number of seconds to 60 should do the trick. The maximum amount of supported days in the argument is 14. This is because storing intraday data since the beginning of time for 60,000+ symbols in a database is quite impractical.

Take a look at the data we will be obtaining. This is in the format of a CSV file when I requested it from my browser by entering the URL. Note the arrow, we will need to work on that variation of a Unix timestamp.

We will need a few packages in order for this script to run properly.

pandas: to read and store the data
numpy: used in conjunction with pandas to read the data coming from Google Finance
pymysql: to store the data
os: creating and referring to files and folders
schedule: to make this script run every weekday at 4:30 p.m.
time: used in conjunction with schedule

The function name is job because we are scheduling this to run every weekday. The arguments again are the number of seconds and number of days desired. Here, we will read a CSV file of tickers. Note these can change over time. You can write a function to pull and store tickers from a website that will continually update them over time like Wikipedia. We will output the tickers to a list and format them to Google Finance’s notation for stocks in its database (Example: BRK/B -> BRK.B). We will then read the data straight from pandas given the URL string with arguments passed in.

Next, we have to clean up our data a bit. The date comes in the form of a hex timestamp. Typically, with data like this, we see Unix time stamps or epoch stamps being used (every second since 1/1/1970). We have to convert this timestamp to a readable date. If we wanted to, we could keep the timestamp and build from there, however, I choose to convert it to a DateTime object in order to graph the data in later scripts (compatible format with Matplotlib).

We will convert the timestamp by first dropping a and using the DateTime module to convert it to a DateTime object. We will create a list of DateTime objects that will equate to the length of the DateTime in terms of rows. Each point will correspond to the minute which the price was recorded. We will also populate a list of the same symbol which will have the same length as the data frame previously obtained. We will turn the two lists we created into series’ and combine the two with the open, high, low, close, and volume columns from the previously obtained data. This will constitute our new data frame.

Next, we will store the data temporarily on our disk space. This is where the CSV module comes in. We will create a main folder for the all of the data, and create a subfolder for each stock. This will not be appended each day, rather overwritten when the data because the data will be inserted into a database.

Finally, we can input this data into a created database by using PyMySQL. We will do this by iterating through the data frame that will be loaded with the files we created in the previous function. This new function will automatically run when the storage process on the disk space is finished.

Finally, we want to run this script automatically with the schedule module. This will conclude our script. The next step is to make use of the data. The first thing we will do is graph it. That will be covered in another post.

About the author

programmingforfinance

Hi, I'm Frank. I have a passion for coding and extend it primarily within the realm of Finance.

View all posts

4 Comments

  • Frank,

    I love your content. Great stuff. It really is a treasure-trove.
    I will be digesting every last bit of it.

    When it comes to historical data, would you know a way or where one could get historical data on mutual funds? Right now I am getting data from a place called “Vikingen” which is a Swedish trading system development program, however it is not the most efficient way, limited number of funds and not sure about the quality of the data.

    Any way, thank you again for your content. I am diving into Monte Carlo Simulation with Python so your content is much appreciated.

    Have a great day,
    Oliver

  • “””
    Retrieve intraday stock data from Google Finance.
    “””

    import csv
    import datetime
    import re

    import pandas as pd
    import requests
    import codecs

    def get_google_finance_intraday(ticker, period=60, days=1):
    “””
    Retrieve intraday stock data from Google Finance.
    Parameters
    ———-
    ticker : str
    Company ticker symbol.
    period : int
    Interval between stock values in seconds.
    days : int
    Number of days of data to retrieve.
    Returns
    ——-
    df : pandas.DataFrame
    DataFrame containing the opening price, high price, low price,
    closing price, and volume. The index contains the times associated with
    the retrieved price values.
    “””

    uri = ‘http://finance.google.com/finance/getprices’ \
    ‘?q={ticker}&i={period}&p={days}d&f=d,o,h,l,c,v’.format(ticker=ticker, period=period, days=days)

    #http://finance.google.com/finance/getprices?q=RAIN&i=60&p=1d&f=d,o,h,l,c,v

    page = requests.get(uri)
    reader = csv.reader(codecs.iterdecode(page.content.splitlines(), “utf-8”))
    columns = [‘CLOSE’, ‘HIGH’, ‘LOW’, ‘OPEN’, ‘VOLUME’]
    rows = []
    times = []

    for row in reader:
    if re.match(‘^[a\d]’, row[0]):
    if row[0].startswith(‘a’):
    start = datetime.datetime.fromtimestamp(int(row[0][1:]))
    times.append(start)
    else:
    times.append(start+datetime.timedelta(seconds=period*int(row[0])))
    rows.append(map(float, row[1:]))
    if len(rows):
    return pd.DataFrame(rows, index=pd.DatetimeIndex(times, name=’Date’),
    columns=columns)
    else:
    return pd.DataFrame(rows, index=pd.DatetimeIndex(times, name=’Date’))

    #get_google_finance_intraday(‘RAIN’, 60, 1)

    print (get_google_finance_intraday(‘RAIN’, 60, 1))

  • hi frank,

    how could i change this program to give me just the first 20 min of trading data each day? I want the high and low from 9:30 to 9:50. i’ve been using pandas_datareader but it keeps giving me the full day. i’ve attempted changing start and end but it goes by full day either way.

    thanks,
    alejandro

Leave a Reply

Your email address will not be published. Required fields are marked *