There are many stocks trading everyday in the US market, e.g. IBM.
During the day, the trading price of IBM changes every minute or even every second. For example, on July 1, 2004 at 10:45:07 am, someone buys 400 shares of IBM from another person at the price of $87.76. (This is real). Then a few seconds later at 10:45:10 am, a buyer buys 500 shares of IBM from a seller at a different price of $87.78.
We call this transaction between a buyer and a seller a trade. In other words, IBM trades at many different prices during the day. And these different trade prices show on the “Tape” published to the world by TV, website etc.
However, behind the scene is that the US market actually consists of more than one exchange. An exchange is a place when buyer and seller make trades. Other than the well-known New York Stock Exchange (NYSE), for example, there is American Exchange (AMEX). IBM could be traded at more than one exchange at the same time. The frequency of trading activity of the same stock at different exchange could be different. The first trade mentioned above happened at PSE (Pacific Stock Exchange) while the second one at NYSE.
In addition to the actual trading prices, IBM also has something called a “Bid” price and an “Ask” price. These two prices could be the same or different from the most recent trading prices at any moment. Besides, trading prices do not often appear when there are no trades but “Bid” and “Ask” price always appear because “Bid” price is the price that someone is interested to buy while “Ask” price is the price that someone is interested to sell. And buyer/seller show their interests all the time. Only under rare situations that there are no interested buyer/seller for a particular stock.
For example, at 10:45:16, IBM has the following Bid/Ask information:
87.79 NYSE x 87.81 PSE 11 x 10
It means the Bid price for IBM at that moment is 87.79 while the Ask price is 87.81. Someone is interested to buy IBM at 87.79 while someone is interested to sell it at 87.81. The buyer and seller could not agree on a mutual price at that moment. Next to 87.79 it says NYSE which means that the Bid price comes from the NYSE exchange while next to 87.81, PSE says that the Ask price comes from PSE exchange. As mentioned, there is more than one exchange in the US market.
Then there shows 11 x 10. This is the “Bid” size and the “Ask” size. Because when the buyer is showing interest to buy at 87.79, he/she also has to mention how many shares of IBM that he/she is interested to buy. In this case, the 11 Bid size means that he/she is interested to buy 1100 shares of IBM at the price of 87.79. Similarly the “10” is the Ask size that the seller is interested to sell 1000 shares of IBM at the price of 87.81.
The Bid/Ask price/exchange/size data tuples is called a quote. The quote could change every minute or even every second. However, these six values do not necessary change at the same time. For example, the next second 10:45:17 shows:
87.79 NYSE x 87.81 PSE 3 x 10
This means that the Bid size decreases from 1100 to 300 while all other information remains the same. Even only the Bid size changes, we consider the quote has changed and so all other unchanged portions of the quote, in this case, Ask price/exchange/size, would still be republished as part of a new quote.
All these actual trading and quote information are recorded by some data vendor and sold to anyone interested.
Now, assume that you want to store all the above data for the thousands of stocks for future analysis. So you subscribe to these data and design and create a database for the data. Certainly you want to create an efficient database which would hold the information in the least amount of storage and allow quick access of the data.
Your tasks start from here:
Please write down the different tables of your database. Please show:
1. the nature of each column (e.g. integer, string, date, double etc) and if that column allows empty value
2. which column(s) are the key for that table.
1. stockname, string, empty value not allowed
2. tradedate, date, empty value not allowed
stockname and tradetime are keys.
Assume that there are 3000 different stocks trading at 10 possible exchanges, please estimate the size of your database if you need to store 20 trading days (1 calendar month) of data. Each trading day has 6 1/2 hours. Assume a trade (price/size/exchange) appears every 5 seconds. And assume that on average the quote changes every 2 seconds.
Assume you have created your database and have stored all the data of the 3000 stocks for 1 month. Please write down the SQL queries for the following questions (each question may require multiple queries):
1. What is the highest and lowest trading price of stock CSCO?
2. Trading volume is the total of all the trade sizes. For example, on a day if stock XYZ has only two trades: $15 for 300 shares at NYSE and $16 for 200 shares at AMEX. Then the trading volume of XYZ for that day is 500 shares. The trading volume of XYZ for next day would mostly be different, say 700 shares. Then the average daily trading volume for the two days is (500+700)/2 which is 600 shares.
So, what are the five stocks that have the highest average daily trading volume for that month? Your query should return two columns: stock name and corresponding average daily trading volume. Please sort the result in decreasing average daily trading volume order.
3. We define the bid/ask spread as (Ask Price – Bid Price)* 2/ (Ask Price + Bid Price). For example, when a stock ABC has bid/ask price of say $10x$10.5, the bid/ask spread is (10.5-10)*2/(10+10.5) = 4.88%.
So, what are the five stocks that have the highest average bid/ask spread over the 1 month period? Your query should return two columns: stock name and corresponding average bid/ask spread. Please sort the result in decreasing average bid/ask spread order.
Let’s say that we would continue to use this database to store future data for these 3000 stocks every day for say the next 5 years. What database software/tools e.g. MS SQL Server, Microsoft Access, that you would use to implement this database? Any what are the reasons that you choose that?
---------------- End of Question --------------