
Using Asio to build an HTTP client (continued)

Well, we need more than send-one-request-receive-one-response.

We need to establish a persistent connection and repeatedly send new requests over it.


Boost Asio as an HTTP client

Well, I know POCO can do it, but I'd like to kick the tires of the Boost library suite. Some people complain that Boost libraries are of varying quality and usefulness.

My goal: I have an existing Excel C add-in which needs to listen on an HTTP port and receive a text stream. That text is actually a serialized object (via Boost.Serialization), which needs to be deserialized and pushed into the library's object cache.

So the key is deep integration with the rest of the library. Also, 1) such an HTTP get() is a data-pull model; 2) we may need to get from multiple HTTP destinations simultaneously, so true multi-threading is a must (but such MT has to be kept out of the main library code).

Wrong port: the connection is actively refused (the error surfaces in connect_callback).

The sample is taken from:

http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/example/http/client/async_client.cpp

The key is to understand the nested, recursive call structure.
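
To make that call structure concrete, here is a stripped-down sketch of the same chaining idea. It is not the full Boost example: the class name, handler names and the HTTP/1.0 request are mine, and error handling is reduced to early returns.

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <iostream>
#include <string>

using boost::asio::ip::tcp;

class client
{
public:
    client(boost::asio::io_service& io, const std::string& host, const std::string& path)
      : resolver_(io), socket_(io)
    {
        std::ostream req(&request_);
        req << "GET " << path << " HTTP/1.0\r\nHost: " << host << "\r\n\r\n";
        // step 1: resolve; its callback triggers step 2, and so on (the "recursion")
        resolver_.async_resolve(tcp::resolver::query(host, "http"),
            boost::bind(&client::handle_resolve, this, _1, _2));
    }

private:
    void handle_resolve(const boost::system::error_code& ec, tcp::resolver::iterator it)
    {
        if (ec) return;
        socket_.async_connect(*it, boost::bind(&client::handle_connect, this, _1));
    }
    void handle_connect(const boost::system::error_code& ec)
    {
        if (ec) return;   // "actively refused" (wrong port) shows up right here
        boost::asio::async_write(socket_, request_,
            boost::bind(&client::handle_write, this, _1));
    }
    void handle_write(const boost::system::error_code& ec)
    {
        if (ec) return;
        boost::asio::async_read_until(socket_, response_, "\r\n\r\n",
            boost::bind(&client::handle_headers, this, _1));
    }
    void handle_headers(const boost::system::error_code& ec)
    {
        if (ec) return;
        std::cout << &response_;   // headers plus whatever body bytes arrived with them
    }

    tcp::resolver resolver_;
    tcp::socket socket_;
    boost::asio::streambuf request_;
    boost::asio::streambuf response_;
};

To drive it, construct a boost::asio::io_service, construct a client on it, and call io_service::run(); run() returns only after the last handler in the chain has fired.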

As there is no push in HTTP (i.e. no way for the server to trigger a new get() whenever a new message is ready), how do we implement the pull?

It is based on the fact that the Excel function is volatile, so every time Excel recalculates, it sends a get() request. Note that our get() is asynchronous, so it takes a while before the new content comes in and updates the object cache. Therefore Excel may well get an existing (older) version of the remote object.

Key fact: the real-time object as of time T won't be available until at least T+1, i.e. the next time Excel recalculates.

It could be a big drawback, but it is the consequence of the pull model.

Review of our object cache: the library is single-threaded, so the object cache has no lock in general, for best performance. However, the asynchronous thread has to update the object cache via a special function where a lock is required. Correspondingly, the Excel volatile function must acquire the same lock before it can read from the cache. In short, the lock is built into the object cache and is used only for this HTTP-thread purpose.
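
A minimal sketch of that locking scheme; all names here (ObjectCache, updateFromHttpThread, readForExcel) are purely illustrative, not the library's real API.

#include <boost/thread/mutex.hpp>
#include <map>
#include <string>

class ObjectCache {
public:
    // Called only from the HTTP worker thread after a response is deserialized.
    void updateFromHttpThread(const std::string& key, const std::string& obj) {
        boost::mutex::scoped_lock guard(httpLock_);
        store_[key] = obj;
    }

    // Called from the Excel (main) thread by the volatile worksheet function.
    // It takes the same lock, so it never sees a half-written entry.
    bool readForExcel(const std::string& key, std::string& out) const {
        boost::mutex::scoped_lock guard(httpLock_);
        std::map<std::string, std::string>::const_iterator it = store_.find(key);
        if (it == store_.end()) return false;
        out = it->second;
        return true;
    }

    // All other (single-threaded) library access paths bypass the lock entirely.

private:
    mutable boost::mutex httpLock_;
    std::map<std::string, std::string> store_;
};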

Boost Serialization in a polymorphic setup

Boost Serialization is notoriously poorly documented. I was asked to provide serialization capability for a simple class hierarchy. To make things a bit more fun, 1) the classes sit in a namespace; 2) serialization and deserialization happen in different processes (so we cannot make assumptions at runtime about which runs first, or whether it runs at all).

I use Boost 1.54.0 as of now.

Let us review some basic stuff regarding Boost serialization:

1. The class must have a default (parameter-less) constructor.

Easy to understand: the deserializer is like a special constructor. It must first call the default constructor, then it can populate the member attributes.

2. If we want to use member-function-based serialization, the class can NOT be declared in any namespace.

Member-function-based means serialization is implemented as a [private] member function of the class in question.

class FooBar
{
private:
    // grant the serialization library access to the private member function
    friend class boost::serialization::access;

    template<class Ar>
    void serialize(Ar& ar, const unsigned int version) { /* ... */ }
};

Why should it be private? See the next point.

What if our class is declared within a namespace? We have to use non-intrusive serialization (via free functions).
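
For illustration, a minimal sketch of the non-intrusive form, assuming a hypothetical mylib::Point with public members (private members would additionally need friend declarations or accessors):

namespace mylib {
    struct Point { double x; double y; };
}

namespace boost { namespace serialization {

// Free function: it lives outside the class, so Point's namespace is not a problem,
// but it can only reach public members (or data exposed via friends/accessors).
template<class Archive>
void serialize(Archive& ar, mylib::Point& p, const unsigned int /*version*/)
{
    ar & p.x;
    ar & p.y;
}

}} // namespace boost::serialization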

3. Suppose we have a class hierarchy: a pure virtual (abstract) base class Base with two derived classes, DerivedOne and DerivedTwo.

We should NOT call the parent's serialization code directly from the derived class's serialize(); instead, use the special base_object template.

class DerivedOne : public Base
{
public:
    DerivedOne(double x = 0.0, double y = 0.0, double z = 0.0) : Base(x, y), _z(z) {}
    double _z;

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        // Base::serialize(ar, version);   // BIG NO!!!
        ar & boost::serialization::base_object<Base>(*this);
        ar & _z;
    }
}; // class DerivedOne

4. We have to hint Boost about the class hierarchy, or we get a crash (true_type == NULL!).

Note: the program walked through the throw line but then crashed, which is kind of weird:

// oserializer.hpp
const boost::serialization::extended_type_info* true_type =
    i.get_derived_extended_type_info(t);

// note: if this exception is thrown, be sure that derived pointer
// is either registered or exported.
if(NULL == true_type){   // TRUE
    boost::serialization::throw_exception(
        archive_exception(
            archive_exception::unregistered_class,
            "derived class not registered or exported"
        )
    );
}

We take the preferred approach, explicit registration, i.e. in the .cpp file, use this macro:

BOOST_CLASS_EXPORT_GUID(DerivedOne, "DOne")

This macro just expands to two other macros, BOOST_CLASS_EXPORT_KEY2 and BOOST_CLASS_EXPORT_IMPLEMENT.

BOOST_CLASS_EXPORT_KEY2(T, K): builds a key/value mapping between exported type names and C++ types in a cross-platform, cross-compiler fashion. See the Boost documentation for the argument why the built-in C++ typeid() cannot be used.

BOOST_CLASS_EXPORT_IMPLEMENT(T): forces the serialization templates for the derived class to be instantiated so that code is actually generated. Due to polymorphism, a module can manipulate a derived object entirely through a base-class pointer; without this macro the compiler won't generate the (serialization) template code for the derived class.
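
Putting the pieces together, a hedged sketch of where the export macro goes and what it buys us. The roundTrip() helper and the choice of text archives are mine; it assumes the Base/DerivedOne definitions above are visible (e.g. via a shared header), that Base has its own serialize(), and that it has a virtual destructor.

// DerivedOne.cpp -- the export macro should come after the archive headers,
// so that serialization code gets instantiated for those archive types.
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/export.hpp>
#include <sstream>

// if Base is abstract and the compiler needs help: BOOST_SERIALIZATION_ASSUME_ABSTRACT(Base)
BOOST_CLASS_EXPORT_GUID(DerivedOne, "DOne")

void roundTrip()
{
    const Base* src = new DerivedOne(1.0, 2.0, 3.0);
    std::ostringstream os;
    {
        boost::archive::text_oarchive oa(os);
        oa << src;                 // saved through a Base*; "DOne" is written as the type key
    }

    Base* dst = 0;
    std::istringstream is(os.str());
    {
        boost::archive::text_iarchive ia(is);
        ia >> dst;                 // the key tells the library to construct a DerivedOne
    }

    delete src;
    delete dst;
}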

Use Python to visualize CSV data

Suppose I have some data in a CSV file, saved in the H:\Daily\2012-03-09 folder.

In IPython:

%cd H:\Daily\2012-03-09

Note: %cd is an IPython "magic" function (%), while ! runs a system command in a subshell; a drive change like !H: therefore does not stick, so %cd with the full path is the one to use here.

import csv
from pylab import plot, show   # assumes matplotlib; under IPython --pylab these are already in scope

infile = csv.reader(open("C:\\temp\\in.csv", "rb"))   # Python 2, binary mode for csv
x, y1, y2, y3 = zip(*infile)                          # transpose rows into columns (as strings)
plot([float(v) for v in x], [float(v) for v in y1])
show()

Nice book: introduces practical Machine Learning in Python

The book is named “Machine Learning: an algorithmic perspective”.

http://ishare.iask.sina.com.cn/f/17742713.html

All source code is FREE for download:

http://seat.massey.ac.nz/personal/s.r.marsland/MLBook.html

Daily Python with emphasis on NumPy

Well, I did not know Python has such a good ecosystem for numerical math. Matlab should die.

Quick notes on NumPy.

1. Create a matrix using [] input and reshape. Assuming a plain Python session (IPython --pylab already pulls these names in):

from numpy import arange, reshape, array, dot

a = arange(6)                  # 1-D array: 0..5
b = reshape(a, [3, 2])         # 3x2 matrix
c = array([[1.2], [-2.3]])     # 2x1 column vector
print c.shape                  # (2, 1)

2. Matrix multiplication. By default, all operations are element-wise! For true matrix multiplication, use dot():

dot(b, c)                      # (3x2) dot (2x1) -> (3x1); dot(a, c) would fail on a shape mismatch

3. Assignments are references; you must use copy() for a deep copy.


A review of my own grasp of market microstructure

1. UST tick data and BGC.

H:\quant\MktMicroStructure\USTreasury contains a spreadsheet with UST tick data (10-minute ticks with volume). Not really high-frequency.

The folder also has a client guide from BGCantor on how to use the data.

BGC has its own analytics suite: http://www.bgcantor.com/analytics/thc-decisions-treasury-analytics.html

For the rates business, no sophisticated analytics are ever requested.

Just see

2. Simulation Software

Jbookrunner. Assume you have obtained an order book. How do you make it "live" and see how trades happen? You need simulation software.

3. 2011 Global Derivatives Conference: HFT workshop

Jim Gatheral (Baruch) focused on the "highest layer" of the algo trading tech stack.

Robert Almgren: HFT strategies for IR products
In fact, he only talks about IR futures traded on the CBOT (Treasury futures; Libor futures).

Some short notes:
1. Futures trading is very centralized, not scattered (or even hidden in dark pools) among many different exchanges. The CBOT has a monopoly.

2. Recently [Apr 2011], NYSE Liffe tried to end the monopoly by introducing basically the "same" products. But due to copyright, the products are equivalent yet not interchangeable, so trades on the CBOT and NYSE cannot be netted.

NYSE still has a point: you can combine your bond trades with futures trades and get a combined risk profile.

3. Robert focuses only on the "broker" side, i.e. designing strategies to help clients execute large trades. It is not about prop trading.

4. It is a pro-rata market; this leads everyone to over-submit orders (in the hope of getting filled), so overfilling will occur. A small worked example follows after these notes.

This is in contrast to the equity market, as Robert highlighted.

5. IR products are [obviously] driven by macro-economic events [Fed meetings; CPI; Treasury auctions].
An NYU FinMath student wrote a paper and implemented models.

TODO: how to find him?

6. My own thought on HFT: it is a fad and a scam.
The barrier is very high: a 10mm USD initial investment just to start, and it requires high-quality traders/quants/IT working together, which is impossible for small firms. Also, how much money is on the table to be made? And what about a financial transaction tax?

However, "algo trading" could stay; it is not really about split-second speed, it is about identifying signals from the price stream, understanding the order book, and replacing humans with machines.

7. Where to obtain order book data? Where to obtain simulation software?
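
Referring back to note 4, a tiny worked example of pro-rata allocation (purely illustrative; real exchange algorithms add rounding, priority and minimum-allocation rules):

#include <cstdio>

int main()
{
    // two resting orders at the best price, and a 100-lot incoming order
    double resting[] = { 600.0, 400.0 };
    double incoming  = 100.0;
    double total     = resting[0] + resting[1];
    for (int i = 0; i < 2; ++i)
        std::printf("resting order %d is filled for %.0f lots\n",
                    i, incoming * resting[i] / total);    // prints 60 and 40
    // Had the first trader quoted 1200 instead of 600, his share would have been
    // 100 * 1200 / 1600 = 75 lots: exactly the over-submission incentive above.
    return 0;
}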

Michael S (BAML): HFT signals and order execution

He sketched the MRR model (Madhavan, Richardson, Roomans, 1997, "Why Do Security Prices Change?"), though it is flawed.
A scanned copy is here:
http://finance.wharton.upenn.edu/~rlwctr/papers/9420.pdf

Main topics

1. Manage the Bermudan book

- track the market

- explain P/L

- explain delta/vega

2. Microstructure and new trends in fixed income and rates

- market data (UST, futures, more?)

- new IT trends

  - Excel or other front-end tech; empower traders with the latest C++ library

  - new C++ tools (development, profiling, optimization)

  - data visualization and mining

3. Tree implementation

I MUST come up with an idea so that I can OWN a copy of the new tree code at home!

QuantRead: Yieldbook MBS LMM Term Structure Model

The most celebrated part is the calculation done in Appendix 1.
Claim: increasing the short vol decreases the serial correlation of 10-yr swap rates.
The proof applies the "roughly constant coefficient" trick.
Note that "short" vol refers to the tenor. To prove this for 10-yr swap rates starting N years from now and M years from now, we need the N x (M-N) vol, and M-N must be small.
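
A back-of-the-envelope way to see the direction of this claim (my own simplification, not the appendix calculation itself): write the 10-yr swap rate observed at time M as the rate observed at N plus an independent increment, S_M = S_N + e, with Var(S_N) = v_N and Var(e) = v_short, where v_short is the variance added between N and M, i.e. what the short N x (M-N) vol controls. Since Cov(S_N, S_M) = Var(S_N) = v_N,

Corr(S_N, S_M) = v_N / sqrt( v_N * (v_N + v_short) ) = sqrt( v_N / (v_N + v_short) ),

so raising the short vol raises v_short and lowers the serial correlation.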

Higher serial correlation means that if the option is in-the-money now, it is highly likely to be in-the-money later. Hence the embedded option value decreases.
Such an argument is kind of like CMO pricing: the junior tranche likes the collateral to be highly correlated, while the senior tranche prefers the collateral to have low correlation.