White Paper

Evaluating Internet Data Management Technology for
E & P Asset Teams

Jamie Cruise, Former Software Development Manager
Hampton Data Services Ltd

Introduction

When The Economist reads like an edition of Wired magazine and Wired magazine is bought by readers of The Economist, you know something is up. "The Internet will change everything", runs the slogan. And so it will. For those who subscribe to the principle of management by fashion, this is enough. For those who need slightly more convincing we must examine the proposition closely to try and understand what it means in the context of our particular environment. With this goal in mind our presentation will attempt to evaluate the impact of Internet technology on information management within E & P Asset Teams.

The analysis is somewhat complicated by the need to make sense of the range of products and services which can be classified as Internet technology. This will be the subject of the first section of our discussion. We will then move on to cover some technical issues that we feel are especially relevant to E & P data management professionals. Within each topic we provide an introduction and overview of a piece of Internet technology and show how it addresses familiar problems. We will include references to current oil industry initiatives wherever possible.

This technical discussion should provide a feel for the practical impact that this technology can make on problems that face our industry. From this basis we then move on to explore how the new generation of tools represents a paradigm shift in the way that business is conducted, what new opportunities this sea change will throw up, and how our organisations stand to benefit most.

We will conclude by attempting to characterise the current E & P data management software (and vendors) according to their fit with a connected future.

Negotiating the TLA Maze

Keeping track of developments in information and communications systems has always presented a serious challenge to IT professionals. The most recent waves of change have upped the ante, being presented as a social phenomenon and sold to all levels of society as an essential component of the modern lifestyle.

A cynic may argue that the technology does not justify the hype. A cynical technologist may argue that the Internet, correctly defined, is simply an extended computer network hosting applications that talk to each other using TCP/IP. The more expansive thinker may rationalise recent developments as the culmination of previous technical watersheds: unlimited CPU cycles, unlimited storage and unlimited memory. And now the Internet goldrush offers the prospect of unlimited bandwidth.

The true visionary (and there’s no shortage of them amongst the digerati) will focus less on the technology of internetworking and more on the revolution in communication. Communications at all levels: from the way that our software applications are built from a series of components that communicate according to well-defined protocols called interfaces; through the concept of large-scale distributed systems that transcend physical location to form the World-Wide-Web; to the wired wheel of global e-commerce where large organisations form the business hub and their suppliers are closely integrated spokes.

In practice, there is merit in all of these views although an overly cynical position seems unlikely to serve us well over the coming years. Conversely, perhaps we’re not all ready to join the first carriage of the first train out to the frontier just yet. Nevertheless, we will argue that a seat somewhere near the front is a good bet. Organisations are changing; dynamic federations of asset teams are replacing vertically integrated monolithic corporations. Downsizing and outsourcing imply an ever-greater reliance on external suppliers. Competition and the need to increase productivity demand levels of internal efficiency that can only be met by smart people leveraging smart technology. For these reasons it is essential that we at least immerse ourselves in the new idioms, to understand the changes, the challenges and the opportunities.

So what is Internet Technology? It is certainly the connected infrastructure: the networking hardware and protocols; the development tools and languages; and the applications and interoperability standards. Moreover, we extend our definition to include the Internet community itself: the users, developers, support organisations and standards bodies. It is the synergy arising from the combination of technology and community that differentiates Internet Technology from previous information management "Silver Bullets".

The acronyms and the market-speak are mostly distractions. It doesn’t matter if the technology is a network computer or a PC; Java or ActiveX; COM or CORBA; UNIX or NT; AOL or IE. What matters is: How does it contribute to the solution? How will it improve the quality of my decision? Does it enable me to close the loop, think outside the box, or otherwise exploit an opportunity that was previously beyond my reach? And of course, will it do those things in an economic and timely fashion (often for nothing, now)?

Having made this commitment to the big picture, we move on to look at some of the emergent technologies and investigate how they might provide some practical return on our intellectual investment.

Data Management In Context

Perhaps the most important role for data management within E & P is to facilitate high quality decision making. In an attempt to formulate a useful heuristic, we propose that the quality of a particular decision be considered a function of:

  1. time/money
  2. domain knowledge
  3. access to information
  4. quality of information
    • Timeliness
    • Accuracy
    • Completeness

Although this is a somewhat crude model and these are not necessarily independent variables, it is good enough for our current analysis.

As data management professionals we are not normally in the business of providing time and money and we expect our users to be experts in their domain. Instead, our role is to provide effective access to high quality information.

Professional librarians and archivists have traditionally played an important part in fulfilling this function within large oil companies. They deliver a managed service to end users, working to well established procedures for indexing, cataloguing, storage and retrieval. But, as Neil McNaughton has recently observed, these functions have been systematically eroded in most companies as part of cost cutting and efficiency drives.

Computer systems and ad-hoc procedures have supplanted the library. Unfortunately, these have delivered a new legacy of disconnected data islands that are to be found lurking in most office spaces, hard disks and mail systems. These islands hinder effective decision making and the process of managing them wastes valuable time, money and expertise, leading directly to the "80:20 Rule" so beloved of data management software vendors.

In an attempt to tackle this problem the E & P community have implemented ever more expensive ‘enterprise’ solutions for managing subsets of this information for use in particular applications. We have diverse, disconnected corporate systems for managing raw (e.g. acquisition and hardcopy), ancillary (e.g. documentation and correspondence) and processed (e.g. models, analysis and interpretations) data.

Whilst these systems are designed to increase operating efficiency, they can often introduce spectacular new inefficiencies. The database systems can be so complex to populate and maintain that even an upgrade to another version of the same product can result in a bill of many tens of thousands of dollars for data migration. Hence the market for tools and services to move information between them and consolidate information across them.

As a consequence of the cost of this lack of interoperability, routine interchange of geophysical data between organisations still largely depends on the use of a small number of old, limited formats.

Managing Metadata

Metadata is the information about information that ought to ease the process of interoperability between systems. However metadata itself, in its current form, is an expensive commodity. It appears as a plethora of file format definitions (e.g. LIS, LAS and SEG-Y) and database schemas (e.g. POSC, PPDM) that form the basis of current "open" data management systems.

Applications are hardcoded to expect definitions and representations according to a particular version of the standards, effectively setting the metadata in stone. Modifications to the metadata normally mean significant database and system updates. Support for data that falls outside of the remit of the existing metadata normally implies another expensive round of customisations. For this reason, the standards bodies charged with producing metadata have developed all-encompassing definitions, a process that inevitably leads to a further increase in the cost of interoperability.

But help is at hand from an unlikely source. Known affectionately as "The New ASCII", the eXtensible Markup Language (XML) provides an opportunity to tackle some of the fundamental problems outlined above.

Put simply, XML is a text file standard for exchanging structured information between computer systems. XML documents can be defined, created and maintained using a range of tools, from a simple text editor, through to powerful object databases. Derived from SGML, it allows us to establish flexible vocabularies to describe and exchange information resources within a particular problem domain (e.g. G & G).

The structure and content of any specific XML file is defined in a schema file called the Document Type Definition (DTD). Together, the DTD and XML files are called an XML application. The XML application is analogous to a database and catalogue.
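As a minimal sketch of what such an XML application might look like, the Python fragment below embeds a small DTD and a matching document in one string and reads it with the standard library. The `welllog`, `curve` and `sample` vocabulary and the well name are invented for illustration and are not drawn from any published industry schema. Note that `xml.etree.ElementTree` accepts the embedded DTD but does not validate against it; a validating parser would be needed to enforce the rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: every element and attribute name is illustrative.
WELL_LOG_DOCUMENT = """
<!DOCTYPE welllog [
  <!ELEMENT welllog (curve+)>
  <!ATTLIST welllog well CDATA #REQUIRED>
  <!ELEMENT curve (sample+)>
  <!ATTLIST curve name CDATA #REQUIRED
                  unit CDATA #REQUIRED>
  <!ELEMENT sample EMPTY>
  <!ATTLIST sample depth CDATA #REQUIRED
                   value CDATA #REQUIRED>
]>
<welllog well="EXAMPLE-1">
  <curve name="GR" unit="API">
    <sample depth="1000.0" value="45.2"/>
    <sample depth="1000.5" value="47.9"/>
  </curve>
</welllog>
""".strip()


def read_samples(document):
    """Return (well, curve, unit, depth, value) tuples from the document."""
    root = ET.fromstring(document)
    rows = []
    for curve in root.findall("curve"):
        for sample in curve.findall("sample"):
            rows.append((
                root.get("well"),
                curve.get("name"),
                curve.get("unit"),
                float(sample.get("depth")),
                float(sample.get("value")),
            ))
    return rows


if __name__ == "__main__":
    for row in read_samples(WELL_LOG_DOCUMENT):
        print(row)
```

The DTD plays the part of the catalogue and the document body plays the part of the data, which is the sense in which the pair resembles a database.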

The current level of hype surrounding XML may seem excessive for a plain old text file format. Nobody made this much fuss about LAS after all! In fact, the enthusiasm derives from the following opportunities:

  • Industry bodies may define and publish vocabularies (schemas) for particular data types (akin to, and probably based on, existing industry standards). For example, a DTD for exchanging well log interpretations may emerge that will bridge the gap between the presentation-less LIS and LAS standards and the presentation-only CGM formats.
  • These standards will be flexible, extendable and independent of particular implementations.
  • Data and metadata exchange will be carried out using simple text file formats suitable for all transmission and storage media e.g. network, tape, disk or database.
  • Applications will be driven by the definitions that are embodied in the data, rather than hardcoded at compile time for a particular version of a file format or database schema.
  • Generic, high quality data transformation tools based on XML have emerged, which can be freely embedded within proprietary applications.
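The fourth point above, applications driven by definitions carried in the data, can be sketched in a few lines of Python. The reader below hardcodes no curve names or units; it discovers them from whatever document it is given. The `curve`, `name`, `unit` and `sample` names are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def describe_curves(document):
    """Map each curve name found in the document to its unit and sample count.

    Nothing about specific curves is hardcoded: names and units come from
    the data itself (the element and attribute names are hypothetical).
    """
    root = ET.fromstring(document)
    summary = {}
    for curve in root.iter("curve"):
        summary[curve.get("name")] = {
            "unit": curve.get("unit", "unknown"),
            "samples": len(curve.findall("sample")),
        }
    return summary


# Two documents with different content are handled by the same code.
OLD_STYLE = """<welllog>
  <curve name="GR" unit="API"><sample depth="1000" value="45.2"/></curve>
</welllog>"""

EXTENDED = """<welllog>
  <curve name="GR" unit="API"><sample depth="1000" value="45.2"/></curve>
  <curve name="RHOB" unit="g/cm3">
    <sample depth="1000" value="2.41"/>
    <sample depth="1001" value="2.43"/>
  </curve>
</welllog>"""

if __name__ == "__main__":
    print(describe_curves(OLD_STYLE))
    print(describe_curves(EXTENDED))
```

Adding a new curve type to the data requires no change to the reader, which is the contrast with an application compiled against one fixed version of a file format.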

It is important to note that XML in itself will not deliver interoperability. However XML technology will enable us to implement a new generation of effective, low cost, well connected E & P data management applications whose flexibility is based on configuration rather than customisation. This new breed of applications should also help to bridge the gap between the currently disparate data management domains of raw, ancillary and processed data.

Vendors of online archive management tools should already be examining XML based initiatives, such as the Resource Description Framework (RDF), which promise to reclaim the librarian’s organisational artistry in the digital domain.

In this section we have seen how Internet technology can contribute to the development of higher quality, better integrated, lower cost digital archives. In the next section we will explore the merit of bringing these archives online, delivering a consolidated information base to the E & P users’ desktops.

Consolidating Legacy Archives

So, are we to conclude that the ability of standards such as XML to facilitate more effective interchange and indexing of a variety of data sources signals the demise of relational and other legacy database technology?

Well, not quite yet. XML is a simple text format and as such does not provide database management facilities. Object database vendors such as POET and ODI have jumped on the XML bandwagon as an opportunity to revive their somewhat flagging fortunes and are delivering XML based DBMSs. Yet these are likely to have a limited impact on near term data management products and it is not at all clear that they represent a viable alternative to existing database systems for operational purposes.

Instead, expect to see vendors leveraging Internet technology to bring powerful searching, browsing, workflow and analysis applications to the end user’s desktop. These applications will integrate and extend existing data sources.

It will become common practice to build application servers that simultaneously access multiple databases from a variety of hosts. These application servers and hosts can be any combination of Mainframe, Workstation or PC based machines. Results will be delivered to web-browser based client applications, accessible from any connected desktop.
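The idea of one application server drawing on several databases can be illustrated with a small Python sketch. It assumes two in-memory SQLite databases standing in for separate legacy archives; the archive names, table layout and records are hypothetical. For simplicity the sources are queried one after another rather than in parallel.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory database standing in for one legacy archive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (well TEXT, title TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    return conn


def search_all(sources, well):
    """Run the same query against every source and merge the results."""
    results = []
    for source_name, conn in sources.items():
        cursor = conn.execute(
            "SELECT title FROM items WHERE well = ? ORDER BY title", (well,)
        )
        results.extend((source_name, title) for (title,) in cursor)
    return results


# Hypothetical archives and records, for illustration only.
SOURCES = {
    "raw_archive": make_source(
        [("EXAMPLE-1", "GR log"), ("EXAMPLE-2", "Sonic log")]
    ),
    "project_store": make_source(
        [("EXAMPLE-1", "Interpretation report")]
    ),
}

if __name__ == "__main__":
    for source_name, title in search_all(SOURCES, "EXAMPLE-1"):
        print(source_name, title)
```

Each result is tagged with the source it came from, so a browser based client could present a single consolidated list while still linking back to the owning system.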

Freely available Internet protocols and servers provide the plumbing to build these sophisticated distributed solutions. Attributes such as high performance, availability and scalability which were previously only achievable using the most high-end and expensive tools from specialist vendors, can now be delivered using commodity system technology.

  • Web servers that can deliver millions of pages of information an hour and which handle thousands of client requests are installed around the world as part of the World Wide Web. The HTTP protocol that sits atop TCP/IP has evolved from a simple hypertext publishing standard into the backbone of global e-commerce. Advances in publicly available security technology are allowing organisations to extend their Intranets into extranets that reach out across the web to their customers, suppliers and partners.
  • Component technologies such as DCOM and CORBA have finally allowed users to build libraries of business objects that are deployed on the server to provide the building blocks of dynamic and extensible applications. At the same time they provide a mechanism for managing and reusing their expensive software development investment.
  • Web browsers have evolved from simple read-only hyperlink document viewers into powerful application development platforms. Standardised DHTML based browsers allow developers to build complex user interfaces in a compiler free environment, without necessarily sacrificing performance or flexibility.

For the E & P user these tools allow IT departments and project groups to begin building bridges between the data islands and slowly move the knowledge base back under corporate control. It is somewhat ironic that it has taken the anarchic explosion of Internet technology to provide an opportunity to restore the effectiveness of centralised information management.

In the future, we will provide browser based applications that allow users to identify and access project based data such as interpretations, correspondence and reports from a variety of systems. They will simultaneously be able to identify related information such as sections and logs from their raw data archive. All of the data will be accessible using helper applications such as log viewers, word processors, 3D modelling packages, spreadsheets and raster imaging tools. The user’s work output will be posted back to the archive and available to the next team member, regardless of their physical location.

The ability to extend these systems to remote users, partners, customers and suppliers will provide additional benefits to a business which itself is global and built on a highly volatile web of corporate and political relations.

Such projects may initially be associated with knowledge management and related initiatives. We feel however that most real value is likely to be gained from the simple restoration of order to our information resources rather than from software agents and intelligent data mining.

Up to this point we have examined Internet technology’s ability to deliver structured information to a new breed of configurable data access applications. An integral part of this new breed of applications will be the geographic interface. The next section will examine how the concept of Internet publishing can improve access to this more complex E & P data type.

Online Geographic Information

Jack Dangermond of ESRI is reputed to have claimed that up to 80% of all digital data can be geo-referenced. Aside from the fact that he actually referred to data within US government databases, the ubiquity and utility of geographical information should come as no surprise to E & P professionals – there is a strong tradition of using GIS and digital mapping as part of our core activities.

Digital geographic information was also one of the first complex data types to go online. As part of the Internet publishing revolution we have seen many world maps, street atlases and network schematics on public web sites. Yet still there is an enormous amount of confusion surrounding the task of putting serious geographic data sets online.

Some may argue that there is no need to publish our detailed GIS databases as the existing desktop and workstation solutions are adequate. In response we would suggest that many of the existing mapping packages were designed according to older paradigms, where network usage and shared operation were a low priority, making them somewhat cumbersome and inefficient in these respects. The GIS vendors themselves lend some weight to this view by supplying a range of publishing and component systems that are mostly independent of their desktop and workstation products.

Whether or not the desire to put GIS data sets online is seen as a goal in itself we feel that there is a definite need for spatial data to be distributed as part of the client interface in our new breed of connected corporate data management solutions. In this context we are interested in providing mapping information as a spatial index into other databases. After all, if we want to work on information associated with field based assets there is no substitute for a good quality map display and selection system.
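Using the map as a spatial index can be sketched as follows: the user drags a rectangle, the features inside it yield record keys, and those keys are looked up in another store. This is only an illustrative Python sketch. The `MapFeature` class, the well names, the coordinates and the `DOCUMENTS` lookup are all hypothetical, and a simple linear scan stands in for whatever indexing a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapFeature:
    """A point on the map carrying the key of a record held elsewhere."""
    record_id: str
    x: float
    y: float


def select_in_box(features, min_x, min_y, max_x, max_y):
    """Return the record ids of all features inside the selection rectangle."""
    return [
        f.record_id
        for f in features
        if min_x <= f.x <= max_x and min_y <= f.y <= max_y
    ]


# Hypothetical wells and document index; coordinates are arbitrary map units.
WELLS = [
    MapFeature("EXAMPLE-1", 10.0, 20.0),
    MapFeature("EXAMPLE-2", 55.0, 42.0),
    MapFeature("EXAMPLE-3", 12.5, 18.0),
]
DOCUMENTS = {
    "EXAMPLE-1": ["GR log", "Completion report"],
    "EXAMPLE-3": ["Sonic log"],
}

if __name__ == "__main__":
    for well in select_in_box(WELLS, 0, 0, 20, 30):
        print(well, DOCUMENTS.get(well, []))
```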

The difficulty with publishing such information is partly technical, partly political. Let us first review the technical options.

In order to display the large map databases that E & P teams typically work with it is necessary to think carefully about our delivery options.

One strategy for delivering map data online is to use an application server to send pictures of particular map views down to the client as raster images. The web browser interface can be simple, requiring no plug-ins or additional software to display the map views. Simple HTML interaction features such as navigation buttons combined with form based requests to the server allow the user to generate a variety of views.

However, the simplicity of this approach comes at a heavy cost:

  • Any change in the map view results in a request to the server, with potentially long delays, before views are updated. This reduces the client’s ability to work with the map in a truly interactive manner.
  • The limited interaction rules out essential functionality such as making complex selections.
  • The constant traffic between client and server generates unnecessary network traffic. This will not be sustainable in limited-bandwidth environments.
  • Each map view needs to be rendered on the server, placing a very high CPU and memory loading on the host machine. These systems are essentially unscalable.
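The traffic and server loading costs in this list can be put into rough numbers with a small Python sketch. The function below is a back-of-envelope model only, and the figures used in the example (100 users, 50 views each, 64 KB per image, 200 ms of rendering per view) are assumptions chosen for illustration, not measurements.

```python
def raster_session_load(users, views_per_user, image_kb, render_ms):
    """Estimate server load when every map view is rendered centrally.

    Returns (total megabytes sent, total server CPU seconds). All inputs
    are assumed figures supplied by the caller, not measurements.
    """
    total_views = users * views_per_user
    megabytes = total_views * image_kb / 1024
    cpu_seconds = total_views * render_ms / 1000
    return megabytes, cpu_seconds


if __name__ == "__main__":
    # Illustrative assumptions: 100 users, 50 pans or zooms each,
    # 64 KB per rendered image, 200 ms of server time per render.
    megabytes, cpu_seconds = raster_session_load(100, 50, 64, 200)
    print(f"{megabytes} MB sent, {cpu_seconds} s of server rendering")
```

Both totals grow in direct proportion to the number of users and the number of interactions, which is why this style of delivery is hard to scale.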

The alternative strategy involves transferring the vector information to the browser for rendering by the local machine. This provides the following benefits:

  • The user has a fully interactive map display system that can behave like a standard GIS package.
  • There is no need to contact the server for most interactions – reducing network bandwidth requirements and allowing a single server to scale to hundreds of users.
  • The user can easily customise their map presentations without affecting other users.
  • The application can exploit the enormous processing and rendering capabilities of modern desktop clients.
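Once the vectors are held locally, panning and zooming reduce to arithmetic on the client. The Python sketch below shows one possible form of that arithmetic, mapping world coordinates to screen pixels for a given view rectangle; the function names and the view representation are assumptions made for this example rather than part of any particular product.

```python
def world_to_screen(x, y, view, width, height):
    """Map world coordinates to pixel coordinates for the current view.

    `view` is (min_x, min_y, max_x, max_y) in world units. The screen
    origin is top-left, so the y axis is flipped.
    """
    min_x, min_y, max_x, max_y = view
    px = (x - min_x) / (max_x - min_x) * width
    py = (max_y - y) / (max_y - min_y) * height
    return px, py


def zoom(view, factor):
    """Return a new view zoomed about its centre (factor > 1 zooms in)."""
    min_x, min_y, max_x, max_y = view
    cx, cy = (min_x + max_x) / 2, (min_y + max_y) / 2
    half_w = (max_x - min_x) / (2 * factor)
    half_h = (max_y - min_y) / (2 * factor)
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


if __name__ == "__main__":
    view = (0.0, 0.0, 100.0, 100.0)
    print(world_to_screen(50.0, 50.0, view, 200, 200))
    view = zoom(view, 2)  # no server round trip is needed
    print(view, world_to_screen(25.0, 75.0, view, 200, 200))
```

Every change of view is a recalculation on data the client already holds, so the server is contacted only when new vectors are needed.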

However, this approach also has drawbacks:

  • The map data must be periodically resynchronised with the server. Clients that connect infrequently may have to regularly update their vectors.
  • It requires a plug-in of some sort to be installed on the client. This may be an ActiveX control, Java applet or Netscape plug-in.

It is this last item which signifies the political problem. As of today rich client interaction may require the use of some platform specific features. That is, a feature which is specific to a machine architecture, operating system or browser. Java should theoretically provide a platform neutral environment for delivering plug-ins. In practice Java currently sits somewhat awkwardly with web based applications, contending for pre-eminence with DHTML as the de-facto platform of the future; delivering inconsistent performance across machine architectures and wrestling with version, library and intellectual control issues.

For those of us who want to deliver quality solutions today some sort of plug-in is required. However, in the near future we believe that, as browser technology matures, more of the required facilities will be incorporated directly into the browser. There is already an XML based vector graphics language called VML that has the potential to deliver high performance, plug-in free map rendering at no cost, across platforms.

Despite these complications the first generation of map based E & P data management applications built from the ground up for the web are emerging. We contend that these provide an excellent alternative to traditional integration efforts and will prove to be powerful, intuitive, cost effective and flexible long term solutions.

Future

The technical discussion up to this point provides an indication of current initiatives that will affect the way that E & P data management professionals specify and implement systems in the near future. In this section we will attempt to go further and address issues related to the way that Internet communication and e-commerce will change the way that E & P teams conduct their business.

In the future, bandwidth will be much more readily available. Already we have seen a massive reduction in its cost over the last few years. This trend is likely to extend into the foreseeable future with the introduction of DSL technologies and a general increase in competition amongst the telecommunication providers. Perhaps we cannot expect broadband transfer rates (e.g. 10-100 Mbit/s) for a few more years, but we do anticipate widely available low cost 2 Mbit/s connections soon.
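To give a feel for what these rates mean in practice, the Python sketch below estimates ideal transfer times. It takes the quoted connection speeds to be megabits per second and uses a hypothetical 600 megabyte data set; both are assumptions made for this example, and protocol overhead and contention are ignored.

```python
def transfer_seconds(size_megabytes, rate_mbit_per_s):
    """Ideal transfer time, ignoring protocol overhead and contention."""
    size_megabits = size_megabytes * 8
    return size_megabits / rate_mbit_per_s


if __name__ == "__main__":
    # Hypothetical 600 MB data set over the three rates discussed.
    for rate in (2, 10, 100):
        minutes = transfer_seconds(600, rate) / 60
        print(f"{rate} Mbit/s: {minutes:.1f} minutes")
```

Under these assumptions the data set takes 40 minutes at the lower rate and under a minute at the highest rate.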

These advances will initially benefit the major western centres, with the remote operating regions playing catch up. We also expect that E & P document and data volumes will continue to rise proportionally.

Still, this increase in capacity will allow asset teams to build new online relationships with their suppliers, the service companies. At a simple level the new online services may be simply more efficient versions of existing practices. You will be able to ship your paper documents for scanning and digitizing and get the results delivered directly to your desktop. You will be able to track the work in progress online (a la DHL) and get easy access to the QC and approval cycle.

This type of initiative in other arenas has been shown to reduce turnaround times and improve product quality for the customer. It also allows the supplier to improve their operating efficiency by reducing overheads and improve their planning through the provision of high quality management information.

Service companies may extend these systems to provide direct quotes, preferential rates and personalised facilities to customers based on an online history of activity. New opportunities will emerge to offer the hosting of client data sets and to provide sophisticated storage, access and data delivery mechanisms as part of an integrated service. Service companies may become communications hubs at the centre of a global network of connected E & P digital archives. E & P users will be able to create virtual workspaces and data rooms online, facilitating real-time collaboration between partners across physical boundaries.

Conclusions

The current generation of data management systems has suffered from a high cost of production and the lack of any economies of scale. With technology based on a late ‘80s notion of large-scale system design, the drive to provide ever more sophisticated solutions in an overly consolidated market has led to the balkanisation of data management tools. In real terms, the spirit of open systems has not yet brought much to the E & P data management community.

However, this same community is well positioned to take advantage of recent advances in Internet technology. These represent a convergence between communications and information systems that places our most sophisticated technology at the core of mainstream commercial activity.

Moreover, the advances are based on a genuine commitment to openness from a significant community of stakeholders: users, developers, standards bodies and support organisations. These form a powerful driver for continued development of the technology on the open market.

The availability of this high quality commodity technology changes the economics of delivering tools and services. Along with a new emphasis on communication and knowledge sharing, this will allow suppliers to offer new services and products that directly improve the ability of E & P asset teams to make timely, high quality decisions.

In the future E & P users will still get a Rolls-Royce job. However, it is likely to be driven by a BMW engine.