Open Source Enterprise Solutions

Open Source Journal

Catastrophe Modeling for the Insurance Industry

by Joseph Rickert

At a Bay Area R User Group (BARUG) meeting this month hosted by Cisco, Dag Lohmann (the co-founder of KatRisk) gave an electrifying talk on catastrophe modeling for the insurance industry. Catastrophes such as cyclones, hurricanes, floods, earthquakes and terrorist attacks are rare events (from a statistical point of view) that cause losses and human suffering over large geographic areas. Insurance companies build models of these events both for underwriting, where they need estimates of local risk at various locations, and for portfolio management, where it is imperative to estimate the correlation of risk across locations and to have a means of aggregating risk.

Two aspects of catastrophe models that Dag’s talk really drove home are the astounding amount of data consumed and the scope and sophistication of the modeling techniques employed. A typical professionally done catastrophe model might attempt to use 150 years' worth of meteorological data (~30 terabytes) over 100M locations while simulating 100,000 or so atmospheric and hydraulic scenarios. As Dag pointed out: (100K scenarios) x (100M locations) x (1/50 probability of occurrence) yields 200 billion records to feed the financial models. To get an idea of the level of modeling sophistication involved, consider that a typical model might employ detailed fluid dynamics simulations to calculate hazards, vectorized time series models to compute correlations, advanced statistical methods for variable reduction, validation and much more.

No less impressive, but not surprising, is the fact that Dag can do all of this with an open source stack built around R and supplemented with Leaflet and MapServer. In his words: "R is deeply embedded in the Insurance Industry". For a serious introduction to catastrophe models, have a look at Dag's slides, and then work through the R code of an elaborate sample model that is available on the KatRisk website.

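
As a quick sanity check on the 200 billion figure quoted from the talk, here is a minimal sketch of the arithmetic. It is written in Python rather than the R used in the KatRisk sample code, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the record count quoted in the talk.
scenarios = 100_000        # simulated scenarios (100K)
locations = 100_000_000    # locations (100M)
one_in = 50                # the quoted 1/50 probability of occurrence

# Integer arithmetic keeps the result exact.
records = scenarios * locations // one_in
print(f"{records:,} records")  # 200,000,000,000 records
```

Without the 1/50 factor, the full scenario-by-location grid would hold 10 trillion entries, so the occurrence probability is what brings the volume down to 200 billion records.
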
The plot from the first part of the model shows a grid superimposed on a map of England, colored by the hazard for an imaginary catastrophic event. For a real event, this type of plot would be the output of an extensive data analysis and modeling effort. To produce the example plot, however, an elliptical copula was defined using R's copula package to create a multivariate distribution with fixed correlation among the marginal distributions. Then, the hazard grid was filled out by sampling from the distribution.

This is only the beginning. After simulating the hazard events, the code goes on to simulate exposure and vulnerability, build an event loss table, work through a financial model, construct AEP (Aggregate Exceedance Probability) and OEP (Occurrence Exceedance Probability) curves for both expected losses and sampled losses, estimate secondary uncertainty and compute quite a few performance measures. Never having worked in this field myself, I found the Lloyd's publication "Catastrophe Modelling: Guidance for Non-Catastrophe Modellers" helpful.
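
To make the copula step more concrete, here is a minimal sketch of the general idea. It is not the KatRisk code: it uses Python's standard library instead of R's copula package, picks a one-factor Gaussian copula (one member of the elliptical family) so that every pair of grid cells shares the same correlation, and assumes an exponential marginal for the hazard. The function name, the grid size and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_hazard_grid(n_rows, n_cols, rho, mean_hazard, rng):
    """Draw one hazard value per grid cell from a one-factor Gaussian copula.

    Every pair of cells has correlation `rho` on the normal scale.
    The exponential marginal is an illustrative assumption.
    """
    std_normal = NormalDist()
    common = rng.gauss(0.0, 1.0)  # shock shared by all cells in this event
    grid = []
    for _ in range(n_rows):
        row = []
        for _ in range(n_cols):
            local = rng.gauss(0.0, 1.0)  # cell-specific noise
            z = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * local
            u = std_normal.cdf(z)        # uniform marginal on (0, 1)
            u = min(u, 1.0 - 1e-12)      # guard against log(0) below
            # Inverse CDF of an exponential with the given mean.
            row.append(-mean_hazard * math.log1p(-u))
        grid.append(row)
    return grid


rng = random.Random(42)
grid = sample_hazard_grid(4, 5, rho=0.6, mean_hazard=1.0, rng=rng)
for row in grid:
    print(" ".join(f"{h:6.2f}" for h in row))
```

Each call simulates one event. Because all cells load on the same common shock, a high draw tends to raise the hazard across the whole grid at once, which is the kind of spatial dependence a portfolio model needs to capture.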

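The difference between the OEP and AEP curves mentioned above is easiest to see on a toy example. The sketch below assumes simulated losses are grouped by year, takes the largest event loss in each year for OEP and the annual total for AEP, and estimates the exceedance probabilities empirically. The data, the function name and the year-grouped layout are hypothetical, and the code is Python rather than the R of the sample model.

```python
def exceedance_probabilities(year_losses, thresholds):
    """Return (oep, aep): empirical P(loss > threshold) for each threshold.

    OEP uses the largest single event loss in a year;
    AEP uses the sum of all event losses in a year.
    """
    n_years = len(year_losses)
    max_loss = [max(events, default=0.0) for events in year_losses]
    total_loss = [sum(events) for events in year_losses]
    oep = [sum(m > t for m in max_loss) / n_years for t in thresholds]
    aep = [sum(s > t for s in total_loss) / n_years for t in thresholds]
    return oep, aep


# Five hypothetical simulated years, each a list of event losses.
year_losses = [
    [],                # no events
    [10.0],
    [40.0, 30.0],      # two events in the same year
    [5.0, 5.0, 5.0],
    [120.0],
]
thresholds = [0.0, 20.0, 50.0, 100.0]

oep, aep = exceedance_probabilities(year_losses, thresholds)
print(oep)  # [0.8, 0.4, 0.2, 0.2]
print(aep)  # [0.8, 0.4, 0.4, 0.2]
```

The two curves part ways at a threshold of 50. No single event in the third year exceeds it, but the two events together do, so the AEP is never below the OEP at any threshold.
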

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.