Friday, September 9, 2011

Space weather update



Exciting news in space weather today, so we pulled some recent data to get set up for follow-on analysis.  Here's a quick look at magnetic declination at Boulder, CO for the last 30 days.  47,521 points.  Blue to red is August 8 to September 8.  Note that on August 9, which happens to be the blue outlier in the afternoon, the Sun released an X-class solar flare.

The data was collected at magnetic observatories. We thank the national institutes that support them and INTERMAGNET for promoting high standards of magnetic observatory practice (www.intermagnet.org).





In this plot, the Y-axis facets are day of the year.

Tuesday, August 23, 2011

GDP Uncertainty



This post is really about data representation, not economics.

Several commentators have noted the sharp drops in the Bureau of Economic Analysis estimates of the rate of change in U.S. GDP in and around Q4 2008.  For example, the Economist observed:
The BEA’s first estimate of output in the fourth quarter of 2008, published in January of 2009, showed a contraction of 3.8%, later revised to a 6.8% drop. The new numbers change the figure yet again, to a shocking 8.9% fall in GDP. For 2009 as a whole, the American economy shrank by 3.5% rather than the previously reported 2.6%.

Such tardy and substantial changes to the basic picture of the downturn have left many perplexed. The fault lies in the grindingly slow process of government data collection. The BEA pieces together its GDP estimates from a range of monthly economic surveys. Those data, themselves subject to annual revisions, are fed into calculations of national output. Delays plague each step of the process. The 2009 Annual Survey of Manufactures, for instance, was published at last in the fourth quarter of 2010. Its impact on GDP was not revealed until this July, however, because the BEA reports annual revisions just once a year.
In its advance (earliest) release, the BEA did caution its audience about what "advance" means when talking about a release.
The Bureau emphasized that the fourth-quarter “advance” estimates are based on source data that are incomplete or subject to further revision by the source agency (see the box on page 4).  The fourth-quarter “preliminary” estimates, based on more comprehensive data, will be released on February 27, 2009.
By the way, that "box on page 4" is interesting.  It reports that
Information on the assumptions used for unavailable source data is provided in a technical note
that is posted with the news release on BEA's Web site. Within a few days after the release, a detailed "Key Source Data and Assumptions" file is posted on the Web site. In the middle of each month, an analysis of the current quarterly estimates of GDP and related series is made available on the Web site; click on Survey of Current Business, "GDP and the Economy."


That's a lot to digest just to understand the caveats (assuming you understand the methodology, which happened to change in 2009).  Note that the assumption details are posted "within a few days" of the release itself.

Models outside the BEA appear to have headed toward what was apparently a more accurate estimate.  At least some are bound to.  One analyst consensus estimated the GDP rate at -5.4 percent.  When markets closed on January 30, 2009, the Financial Times had this to say:

The S&P 500 closed down 2.3 per cent at 825.88, the Dow Jones Industrial Average 1.8 per cent lower at 8,000.86 and the Nasdaq Composite index off 2.1 per cent at 1,476.42.

The market had opened higher after US Department of Commerce figures showed that fourth quarter gross domestic product contracted at a 3.8 per cent annual rate, which, although bleak, was not as bad as feared.
But leading indices slipped into negative territory after the open as analysts pointed out that the headline figure – helped by the number of unwanted unsold goods – belied more worrying underlying trends.
The BEA's subsequent 2008:Q4 "preliminary" (second) release on February 27, 2009 reported a -6.2 percent annualized (and seasonally adjusted) rate of change from the prior period.  That was down from -3.8 percent in the advance release 28 days prior.  The preliminary decline was roughly 1.6 times the advance (earlier) one.  The BEA commented:

The preliminary estimate of the fourth-quarter change in real GDP is 2.4 percentage points, or $74.4 billion, lower than the advance estimate issued last month.  The downward revision to the percent change in real GDP was widespread; the largest contributors were downward revisions to private inventory investment, to exports, and to personal consumption expenditures for nondurable goods.
Note the relationship to the Financial Times comments a month earlier.  The S&P 500 was off 2.4% that day, and the yield curve steepened, with the 2-year note down 6 basis points and the 10-year note up 4 basis points.
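
To make the size of that revision concrete, here's a minimal sketch of the arithmetic. It uses only the two rates quoted above; the function name is ours.

```python
# Annualized 2008:Q4 real GDP change, in percent, as quoted in the post.
advance = -3.8       # advance release
preliminary = -6.2   # preliminary release, February 27, 2009

def revision_summary(earlier, later):
    """Return (revision in percentage points, ratio of later to earlier)."""
    return round(later - earlier, 2), round(later / earlier, 2)

points, ratio = revision_summary(advance, preliminary)
print(f"revision: {points} percentage points, ratio: {ratio}")
```

The difference matches the 2.4 percentage points the BEA reported.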

The BEA's GDP estimation obviously isn't easy or immediate.  In certain circumstances, estimating quarterly GDP is especially hard, and subsequent BEA estimates of 2008:Q4 GDP did not really stabilize (presumably partly due to changes in methodology).  Here's the timeline for BEA updates to the 2008:Q4 GDP rate of change:

BEA 2008:Q4 GDP change releases


That's a long time to wait for market- and policy-moving data.  (Perhaps you can argue that much of the change was due to methodological changes, but where does that get you?)  Was 2008:Q4 a fluke?  Here's the story for the BEA GDP estimates for the four quarters of 2008:

BEA 2008 quarterly GDP change releases 


2008 was not a good year for BEA GDP estimates.  Models often depart from reality at inconvenient times.  The BEA discusses some of these challenges here.  Changes in methodology complicate things.

For forecasts, error bars and other indicators of confidence, variance, distribution, etc., are of course common.  Here's a particularly relevant example from the Federal Reserve, which has its own staff for tracking this stuff:

Source: Federal Open Market Committee
This graph comes directly from the minutes of the Federal Open Market Committee on January 27-28, 2009, and the authors depict the "central tendency" for each forecast period.  Even though 2008 was history, it's obvious now that those recent historical numbers were not themselves history but instead forecasts of the past.  Presenting them as single numbers can encourage a confidence that is not always justified.  2008 GDP estimates deserved "central tendency" bars too.

Cheap shot: What were the error bars on S&P's Lehman counter-party credit risk rating that was reaffirmed in July 2008?

We try to be circumspect when we encounter point estimates.  In our projects, we often go to a fair amount of trouble to carry around information about distributions, confidence, or caveats.  Rarely convenient but often powerful.  We'll have more to report on this topic.
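
As one illustration of carrying caveats along with a number, here's a minimal sketch of an estimate that remembers every value it was published as. The class and its fields are hypothetical, not something from our projects, and the three values are only the ones quoted in this post (the last is the revised figure cited by the Economist).

```python
from dataclasses import dataclass, field

@dataclass
class RevisedEstimate:
    """A point estimate that keeps its full publication history."""
    label: str
    vintages: list = field(default_factory=list)  # (release name, value) pairs

    def add(self, release, value):
        self.vintages.append((release, value))

    @property
    def latest(self):
        # The most recently added value.
        return self.vintages[-1][1]

    @property
    def spread(self):
        # Lowest and highest value ever published for this estimate.
        values = [v for _, v in self.vintages]
        return min(values), max(values)

q4 = RevisedEstimate("2008:Q4 real GDP, annualized % change")
q4.add("advance", -3.8)
q4.add("preliminary", -6.2)
q4.add("later revision (per the Economist)", -8.9)

low, high = q4.spread
print(f"{q4.label}: latest {q4.latest}, published range [{low}, {high}]")
```

The published range is not a confidence interval, but it is a cheap reminder that the latest number is one vintage among several.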


Tuesday, June 14, 2011

Part of HandlerSocket's Missing Manual

Yoshinori Matsunobu's excellent HandlerSocket has some undocumented features, including filters and a kind of IN. Here's a preliminary sketch that follows the HandlerSocket documentation.

Extended 'open':
  P <indexid> <dbname> <tablename> <indexname> <columns> [<fcolumns>]
<fcolumns> has the same syntax as <columns>. These filter columns are used by <filter> specifications, which are described below.

Extended 'find':
  <indexid> <op> <vlen> <v1> ... <vn> <limit> <offset> [<in>] <filter>*

  <in> := @ <icol> <ilen> <iv1> ... <ivn>

  <filter> := W|F <fop> <fcol> <fv>
<in> specifies that <v<icol>> should be sequentially replaced with each <ivi>. For example
  1 = 2 . foo 10 0 @ 0 3 6 7 8
(where . denotes a 0-length value) will result in the following three queries:
  1 = 2 6 foo
  1 = 2 7 foo
  1 = 2 8 foo
It appears that each query returns at most one record, so if you provide N <iv>s, you'll get no more than N records.
<filter> specifies when to skip or stop results.
  W = stop
  F = skip

  <fop>: one of =, <, >, <=, >=, !=.
  <fcol>: index for <fcolumns>
  <fv>: an encoded value
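
Here's a minimal sketch of how we read the <in> expansion described above. It works on the space-separated notation used in this post (with . for a 0-length value), not on the real tab-separated wire format, and expand_in is our own hypothetical helper, not part of HandlerSocket.

```python
def expand_in(request):
    """Expand an extended 'find' carrying an <in> clause into plain lookups.

    `request` uses the notation of this post: space-separated tokens,
    with '.' standing for a 0-length value.
    """
    tokens = request.split()
    indexid, op, vlen = tokens[0], tokens[1], int(tokens[2])
    values = tokens[3:3 + vlen]
    rest = tokens[3 + vlen:]          # <limit> <offset> [<in>] <filter>*
    if len(rest) < 3 or rest[2] != "@":
        # No <in> clause: the request is a single lookup.
        return [" ".join([indexid, op, str(vlen)] + values)]
    icol, ilen = int(rest[3]), int(rest[4])
    in_values = rest[5:5 + ilen]
    queries = []
    for iv in in_values:
        key = list(values)
        key[icol] = iv                # replace <v<icol>> with this <iv>
        queries.append(" ".join([indexid, op, str(vlen)] + key))
    return queries

for query in expand_in("1 = 2 . foo 10 0 @ 0 3 6 7 8"):
    print(query)
```

This only models the key substitution. How the server applies <limit> and <offset> to the expanded lookups is not covered.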
Examples:
bash$ echo 'CREATE TABLE foo (x INT, y VARCHAR(8), PRIMARY KEY (x,y));' \
      | mysql
bash$ nc localhost 9999
P    1    test foo  PRIMARY   x,y
0    1
1    +    2    1    one
0    1
1    +    2    2    two
0    1
1    +    2    3    three
0    1
1    +    2    4    four
0    1
1    >    1    0    10   0
0    2    1    one  2    two  3    three     4    four
P    1    test foo  PRIMARY   x,y  x
0    1
1    >    1    0    10   0    F    <    0    3
0    2    1    one  2    two
1    >    1    0    10   0    F    <    0    3
0    2    1    one  2    two
1    >    1    0    10   0    F    >    0    3
0    2    4    four
1    >    1    0    10   0    W    <    0    3
0    2    1    one  2    two
1    >    1    0    10   0    W    >    0    3
0    2
1    =         1    10   0    @    0    2    2    4
0    2    2    two  4    four
1    =         1    10   0    @    0    3    2    4    6
0    2    2    two  4    four
1    <=        1    10   0    @    0    3    2    4    6
0    2    2    two  4    four 4    four
The above is just a quick sketch. Let us know where we are off.
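
One way to check our reading of F and W is a small model that reproduces the filtered results in the session. This is a sketch of the semantics as we understand them, not HandlerSocket code. apply_filter and the in-memory rows are hypothetical, and only the operators used in the session are modeled.

```python
import operator

# Only the comparison operators exercised in the session above.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def apply_filter(rows, fcolumns, ftype, fop, fcol, fv):
    """F skips rows that fail the condition; W stops at the first failure."""
    compare = OPS[fop]
    column = fcolumns[fcol]           # <fcol> indexes into <fcolumns>
    out = []
    for row in rows:
        if compare(row[column], fv):
            out.append(row)
        elif ftype == "W":
            break                     # W: stop scanning
        # F: skip this row and keep scanning
    return out

# The four rows inserted in the session, in primary-key order.
rows = [{"x": 1, "y": "one"}, {"x": 2, "y": "two"},
        {"x": 3, "y": "three"}, {"x": 4, "y": "four"}]
fcolumns = ["x"]

for ftype, fop in [("F", "<"), ("F", ">"), ("W", "<"), ("W", ">")]:
    kept = apply_filter(rows, fcolumns, ftype, fop, 0, 3)
    print(ftype, fop, [r["y"] for r in kept])
```

The W > case comes back empty because the very first row (x = 1) fails the condition, which matches the bare "0 2" response above.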

Tuesday, March 29, 2011

NTSB Aviation Accident Feed Updated

Just a quick note. We updated our NTSB Aviation Accident RSS feed. Various data sources had changed, so we had to change as well. Briefly, this feed provides links, when possible, to FlightAware, SkyVector, historical weather data, and more.

Our feed is here, and the official source is here.

Wednesday, September 8, 2010

Geomagnetic storm data




For a recent project, we looked at geomagnetic storm data. Space weather is making the news these days.


Lots of interesting data is available. Among other things, we found ourselves looking at the relationships between F and H, Dst, and Kp.



For example, the figure below represents F (blue), H (red), and Dst (green) during a geomagnetic storm on October 28-30, 2003. Each X tick is one minute. Dst values are interpolated.
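
Dst is published as an hourly index, so it has to be stretched onto the one-minute grid of F and H. Here's a minimal sketch of that step, assuming plain linear interpolation (the post doesn't depend on the exact method). The function name and the sample values are made up.

```python
def interpolate_hourly_to_minutes(hourly):
    """Linearly interpolate hourly samples onto a one-minute grid."""
    minutes = []
    for start, end in zip(hourly, hourly[1:]):
        step = (end - start) / 60.0
        # 60 points per hour, beginning at the hourly sample itself.
        minutes.extend(start + step * m for m in range(60))
    minutes.append(hourly[-1])        # close the grid with the last sample
    return minutes

dst_hourly = [-20.0, -80.0, -50.0]    # made-up Dst values in nT
dst_minutes = interpolate_hourly_to_minutes(dst_hourly)
print(len(dst_minutes), dst_minutes[30], dst_minutes[60])
```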



The following figure presents the absolute value of percentage changes from the means.
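
For reference, here's a minimal sketch of the transformation behind that figure: each sample's absolute deviation from the series mean, expressed as a percentage of the mean. The function name and the sample values are made up for illustration.

```python
from statistics import mean

def abs_pct_change_from_mean(series):
    """Absolute deviation of each sample from the series mean, in percent."""
    m = mean(series)
    return [abs(x - m) / abs(m) * 100.0 for x in series]

f_sample = [50000.0, 49900.0, 50100.0]   # made-up F values in nT
pct = abs_pct_change_from_mean(f_sample)
print(pct)
```

Normalizing by each series' own mean is what lets F, H, and Dst share one axis despite very different magnitudes.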



The figure below represents the percentage change in F in the same timeframe:



Our review of the literature didn't turn up anything definitive on the relationship between F and H (or Dst and Kp). Anybody have any pointers? We've begun some analysis of data like


Source: http://geomag.usgs.gov/realtime/


but we'd obviously rather not build models from scratch if we don't have to. We'll make another pass at the literature and update this post accordingly. So stay tuned for exciting space weather analysis for the upcoming season.











Monday, July 19, 2010

PDFs from "Building Numbers"

Here are some PDFs from the Building Numbers post. Click a thumbnail to get a PDF.

First, a non-spiral image.




The big spiral:


also available in a smaller version.

And finally an image using 3D numbers:


Friday, June 25, 2010

S&P 500 Composite 20-year total real returns


We've been looking at the distribution of total real returns of the S&P 500 Composite index over 20-year periods. Dividends are reinvested monthly. Data, starting from 1870, is from Robert Shiller's collection. Our perspective considers relationships between historical context and finance. Or perhaps more along the lines of the intersection of economics and finance. Anyway, here are some graphs we made.

The following graph shows those 20-year total real returns (with dividends reinvested) annualized starting each month since January 1, 1870.

The mean, median, and standard deviation are 6.74%, 7.05%, and 3.04% respectively.
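
Here's a minimal sketch of how one such 20-year figure can be computed. It assumes monthly series of inflation-adjusted prices and inflation-adjusted dividends stated as annual rates, with one twelfth paid and reinvested each month. Those conventions and all names are our assumptions for illustration, not a description of the dataset's exact layout.

```python
def annualized_total_real_return(real_prices, real_dividends, years=20):
    """Annualized total real return over `years`, reinvesting dividends monthly.

    real_prices[i]    : inflation-adjusted index level in month i
    real_dividends[i] : inflation-adjusted annual dividend rate in month i
                        (one twelfth is assumed to be paid each month)
    """
    months = years * 12
    shares = 1.0
    for i in range(1, months + 1):
        # Buy more shares with this month's dividend at this month's price.
        shares += shares * (real_dividends[i] / 12.0) / real_prices[i]
    growth = shares * real_prices[months] / real_prices[0]
    return growth ** (1.0 / years) - 1.0

# Toy input: a flat index paying 1% of its price every month.
flat_prices = [100.0] * 241
dividends = [12.0] * 241
print(annualized_total_real_return(flat_prices, dividends))
```

Sliding the starting month forward one step at a time gives the series of returns. The mean, median, and standard deviation then follow from statistics.mean, statistics.median, and statistics.stdev or pstdev.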

Here are the historical means. A point at time t is the mean 20-year total real annualized return from January 1, 1870 to t.



The historical mean 20-year return as of 1990 happens to be the same as of 1900.
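
The historical mean at time t is just a running (expanding) mean of the return series. A minimal sketch, with a made-up input:

```python
def expanding_means(values):
    """Mean of values[0..t] for every t."""
    out, total = [], 0.0
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total / n)
    return out

print(expanding_means([2.0, 4.0, 6.0]))
```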

The 20-year trailing averages of those 20-year returns:



What does the distribution of the 20-year returns look like?



Not exactly a nice normal distribution. Excess kurtosis is -0.73. The distribution appears to be multimodal.
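
For readers who want to check a figure like that, here's a minimal sketch of excess kurtosis using the population (biased) moment formula. Whether that or a bias-corrected sample estimator sits behind the number above is not something this sketch settles.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n   # second central moment
    m4 = sum((v - m) ** 4 for v in values) / n   # fourth central moment
    return m4 / (m2 ** 2) - 3.0

# A two-point distribution is as far from peaked as it gets.
print(excess_kurtosis([-1.0, 1.0, -1.0, 1.0]))
```

A negative value means lighter tails and a flatter shape than a normal distribution, which fits a histogram with several separated humps.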

The first graph suggests an obvious line of inquiry: to what extent are distributions affected by historical events or economic regimes? We don't know, but we offer two last graphics. First, we color the histogram above so that each point in a bin is colored according to its date.


If you started a 20-year investment in the post-WWII period (yellow to orange), you probably did well. If you were unfortunate enough to be a child of those investors (red), your 20-year run would have likely been pretty unsatisfying. In that case, should you blame your parents? Assuming no generous inheritance, of course.

If we color the first graph according to return and fill towards the mean, then at least three modes of the distribution are apparent. Use caution, of course. The spectrum and its representation might bias the eye to distinguish red, green, and blue modes. We're just kicking things around here. Should we regard moderate markets (green) as transitions to the outlying bull and bear market modes in their economic/historical contexts? Maybe we just need a lot more data before a nicer distribution appears.