Results

So we have our max-response pair and we want to exploit the predictive power of this information. Let’s build up our intuition about that power by focusing for a moment on a more mundane Colossal-style lagged correlation, where changes in signal A are perfectly predictive of changes in signal B at some later time. One way to exploit this type of correlation is to focus on statistical outliers. Unusual changes in signal A will be predictive of comparably unusual changes in signal B, which, assuming that B is a proxy for a real-world observable, can be arbitraged in some way.

In practice, a perfectly reproducible lagged correlation is a rare phenomenon, since it requires very strong system isolation. The response function is an attempt to account for the complexity of the real world, with its welter of interactions and dependencies. While the calculation of the response tends to smear out the effect of a constant lagged pairwise correlation, such a correlation is still the single biggest contributor to a high response, since a strong lagged correlation typically leads to a graph edge connecting the corresponding node pair. And as we’ve seen in the previous post, topological proximity is one of the main factors affecting the magnitude of the response between a pair of nodes. Nevertheless, to counteract the smearing of the correlation in the process of computing the response, our trading strategy will rely on a similarly smeared measure of the dependent signal: not the dependent signal itself, but its moving average.
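To make the idealized lagged-correlation case concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the function names, the three-period lag, the noise level, and the outlier rule (a standardized change beyond ±1.645, which corresponds to a two-sided p-value of about 0.1 only if the changes are roughly normal). It is not the code used in this study.

```python
import numpy as np

def lagged_correlation(a, b, lag):
    """Pearson correlation between changes in `a` and changes in `b` `lag` periods later."""
    da, db = np.diff(a), np.diff(b)
    if lag > 0:
        da, db = da[:-lag], db[lag:]
    return np.corrcoef(da, db)[0, 1]

def outlier_mask(changes, z_threshold=1.645):
    """Flag changes whose standardized magnitude exceeds `z_threshold` (assumed rule)."""
    z = (changes - changes.mean()) / changes.std()
    return np.abs(z) > z_threshold

# Synthetic pair: B's change at t + lag echoes A's change at t, plus small noise.
rng = np.random.default_rng(0)
lag = 3
da = rng.normal(size=2000)
db = np.concatenate([rng.normal(size=lag), da[:-lag]]) + 0.1 * rng.normal(size=2000)
a = np.concatenate([[0.0], np.cumsum(da)])
b = np.concatenate([[0.0], np.cumsum(db)])

rho = lagged_correlation(a, b, lag)   # correlation at the true lag
rho0 = lagged_correlation(a, b, 0)    # contemporaneous correlation

# On outlier moves in A, how often does B move in the same direction `lag` periods later?
changes_a, changes_b = np.diff(a), np.diff(b)
mask = outlier_mask(changes_a)[:-lag]
agree = np.mean(np.sign(changes_a[:-lag][mask]) == np.sign(changes_b[lag:][mask]))
```

In this toy setting the correlation is high at the true lag and near zero at lag 0, and outlier moves in A almost always call the direction of B's later move. Real signal pairs are far noisier, which is the gap the response function is meant to address.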

To summarize: our trading strategy (statistically) arbitrages the expected dynamics of the dependent signal using the predictive capacity of the max-response pair. To evaluate the validity of the strategy, we prepare a test statistic based on the behavior of the moving average of our dependent signal in the five periods following an outlier (p-value ≤ 0.1) move of the correct polarity in the independent signal. The number five is chosen because our response maximization is based on a three-period integral, and the five-period MA captures the preponderance of the induced response.
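One possible reading of this test statistic can be sketched as follows. The details are assumptions rather than the study's actual procedure: outliers are picked with an empirical one-sided quantile cutoff, "correct polarity" is passed in as `polarity` (+1 or -1), and the response to each event is taken to be the mean per-period change of the dependent signal over the following five periods. The function name `post_outlier_responses` is hypothetical.

```python
import numpy as np

def post_outlier_responses(independent, dependent, window=5, p_threshold=0.1, polarity=1):
    """Mean change of `dependent` over the `window` periods after each outlier move."""
    x = np.asarray(independent, dtype=float)
    y = np.asarray(dependent, dtype=float)
    dx, dy = np.diff(x), np.diff(y)      # dx[t] = x[t+1] - x[t]
    signed = polarity * dx               # orient moves so the "correct" polarity is positive
    # Empirical one-sided tail: keep moves at or above the (1 - p) quantile.
    cutoff = np.quantile(signed, 1.0 - p_threshold)
    events = np.flatnonzero(signed >= cutoff)
    events = events[events + window < len(dy)]   # require a full window after the event
    return np.array([dy[t + 1 : t + 1 + window].mean() for t in events])

# Synthetic check: the dependent signal follows the independent one with a one-period delay.
rng = np.random.default_rng(1)
dx = rng.normal(size=3000)
dy = np.concatenate([[0.0], 0.5 * dx[:-1]]) + 0.1 * rng.normal(size=3000)
x = np.concatenate([[0.0], np.cumsum(dx)])
y = np.concatenate([[0.0], np.cumsum(dy)])

up = post_outlier_responses(x, y, polarity=1)     # responses after large up-moves
down = post_outlier_responses(x, y, polarity=-1)  # responses after large down-moves
```

The resulting array of per-event responses is the kind of population that can then be compared between the max-response pair and a randomly chosen edge.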

Our null hypothesis is that the response, as measured by the moving average, for the max-response pair will not be meaningfully different from the response for a randomly chosen edge in the graph. The alternative hypothesis is that the response populations are distinct. We base the analysis on two months of minute-resolution data collected on roughly 50 financial market measures. We use the t-test for sample means and variances, which turns into a z-test in the limit of large sample sizes:

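$$ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} $$

where $\bar{x}_i$ and $s_i$ are the sample mean and standard deviation of population $i$, and $n_i$ is its sample size.

[Figure: Z-test computations]

[Figure: Z-test computations #2]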
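For readers who want to run this kind of comparison on their own response samples, here is a minimal sketch of the standard large-sample two-sample statistic, z = (mean1 - mean2) / sqrt(s1²/n1 + s2²/n2), with a two-sided p-value from the normal distribution. The function name `two_sample_z` and the toy inputs are illustrative assumptions, not the computations from this study.

```python
import math
import numpy as np

def two_sample_z(sample_1, sample_2):
    """Two-sample z statistic with unpooled variances, plus a two-sided p-value."""
    a = np.asarray(sample_1, dtype=float)
    b = np.asarray(sample_2, dtype=float)
    # Standard error of the difference in means, using sample variances (ddof=1).
    se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    z = (a.mean() - b.mean()) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Toy inputs, chosen so the arithmetic is easy to follow by hand:
# means 3 and 4, sample variance 2.5 each, n = 5 each, so se = 1 and z = -1.
z, p = two_sample_z([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

The normal approximation is only reasonable for large samples; the five-point inputs above are there to make the calculation checkable, not as a realistic use.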

Discussion

As the computations indicate, there is not a statistically significant difference between the two populations, although the improvement in the predictive capacity of the max-response pair is promising. Possible reasons why the returns on the response-derived edge are not significantly better than those on a random edge include: (1) our moving-average measure of the induced response may be inadequate; (2) our technique for computing the max-response edge itself may be flawed (for instance, we may need a different kinetics model); and (3) the assumed direct mapping between a system involving signals and one involving physical observables may be imperfect.

All of these potential problems are ultimately overshadowed by the most vital insight obtained from this study, which is the considerable advantage of using Altaridey in a control-systems application rather than as a means of making predictions. In the latter application, one is passively waiting for outliers in the independent data stream, whereas in a control system both the choice of the independent stream and the ability to generate outliers are entirely within one’s discretion.