Transforming 1.5 Million Unstructured Archive Records into a Usable Corporate Asset

A major global oil company's physical archive database held over 1.5 million E&P documents — but only 20% were actually discoverable through the application's search interface. HDS used data mining and intelligent text analytics to reclassify and relink the entire estate, raising data association to 80%. The method was then scaled to a further 10 million records.

The Challenge

The client's Oracle-backed archive inventory database covered physical media and documents from Africa, Europe, the Middle East and the former Soviet Union (FSU), stored across multiple Iron Mountain warehouses. It had been populated over more than 10 years by many different personnel and contractors, and had been merged with multiple legacy databases. The result was inconsistent naming conventions, incomplete metadata, missing entity relationships and poor data classification. Only 20% of the more than 1.5 million items could be located directly through the application's search interface.

The Solution

HDS applied data mining and text analytics across all records using GeoSCOPE — extracting meaning from the inconsistent text content to determine the data class, type and correct spatial entity (country, block/concession, field, well) for each record. Automated business rules interpreted technical terminology — for example, "9 Track, 9T, ½ Inch, 6250 BPI" as identifying a 9-track ½" media tape, or "UKCS, Beryl, MNSL, Mobil" as identifying a North Sea Beryl Field record — and mapped items to standardised corporate naming conventions.
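As a rough illustration of how keyword-driven business rules of this kind can work, the sketch below maps free-text record descriptions to standardised labels. It is not GeoSCOPE's implementation: the rule tables, the label strings, the `classify` function and the minimum-hit thresholds are all hypothetical, chosen only to mirror the two examples above.

```python
import re

# Hypothetical rule tables: (standardised label, trigger patterns, minimum hits).
# The patterns mirror the example terms in the text; they are not the real rules.
MEDIA_RULES = [
    ("9-track 1/2 inch tape",
     [r"\b9[\s-]?track\b", r"\b9T\b", r"(?:½|1/2)\s*inch\b", r"\b6250\s*BPI\b"], 1),
]
ENTITY_RULES = [
    # Assumption: require two corroborating terms, since an operator name
    # alone is weak evidence for a specific field.
    ("UKCS / Beryl Field",
     [r"\bUKCS\b", r"\bBeryl\b", r"\bMNSL\b", r"\bMobil\b"], 2),
]


def classify(text, rules):
    """Return the label of the rule with the most matching patterns, or None."""
    best_label, best_hits = None, 0
    for label, patterns, min_hits in rules:
        # Count how many of the rule's patterns occur in the text.
        hits = sum(1 for p in patterns if re.search(p, text, re.IGNORECASE))
        if hits >= min_hits and hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


# Made-up record description, for illustration only.
record = "BERYL A 9T 6250 BPI FIELD TAPES - MOBIL"
print(classify(record, MEDIA_RULES))
print(classify(record, ENTITY_RULES))
```

Counting corroborating terms rather than matching a single keyword is one simple way to cope with inconsistent text: any one variant spelling can identify a media type, while an ambiguous term on its own does not force an entity assignment.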

All records were reclassified against a corporate Data Type/Class/subclass taxonomy, entity relationships were restored, and the database was delivered with 80% of items now directly associated with their correct field, well, survey, block or data type.

The Outcome

Data association improved from 20% to 80%, a fourfold increase in the share of items that can be located directly, without reprocessing or moving any physical items. The methodology proved so effective that the client engaged HDS to apply the same approach to three further corporate archive databases in the USA containing over 10 million additional items.
