Transforming 1.5 Million Unstructured Archive Records into a Usable Corporate Asset
A major global oil company's physical archive database held over 1.5 million E&P documents, but only 20% could be found through the application's search interface. HDS used data mining and intelligent text analytics to reclassify and relink the entire estate, raising the share of records associated with their correct entity or data type to 80%. The client then engaged HDS to apply the same method to more than 10 million further records.
The Challenge
The client's Oracle-backed archive inventory database covered physical media and documents from Africa, Europe, the Middle East and the former Soviet Union (FSU), stored across multiple Iron Mountain warehouses. It had been populated over more than 10 years by many different personnel and contractors, and had been merged with multiple legacy databases. The result was inconsistent naming conventions, incomplete metadata, missing entity relationships and poor data classification. Only 20% of the more than 1.5 million items could be located directly through the application's search interface.
The Solution
HDS applied data mining and text analytics across all records using GeoSCOPE, extracting meaning from the inconsistent text content to determine the data class, type and correct spatial entity (country, block/concession, field, well) for each record. Automated business rules interpreted technical terminology and mapped items to standardised corporate naming conventions. For example, "9 Track, 9T, ½ Inch, 6250 BPI" identified a 9-track ½" media tape, and "UKCS, Beryl, MNSL, Mobil" identified a North Sea Beryl Field record.
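The kind of business rule described above can be illustrated with a minimal Python sketch. It is not GeoSCOPE's implementation, which this case study does not describe. The rule table, attribute names, standardised values and the classify function are all hypothetical, and treating any single matching term as sufficient evidence is a simplifying assumption.

```python
import re

# Hypothetical rule table: (attribute, standardised value, patterns).
# The terms come from the examples in the text; everything else is illustrative.
RULES = [
    ("media_type", "9-Track 1/2 Inch Tape",
     [r"\b9[\s-]?track\b", r"\b9T\b", r"(?:½|1/2)\s*inch", r"6250\s*BPI"]),
    ("field", "Beryl", [r"\bberyl\b"]),
    ("country", "United Kingdom", [r"\bUKCS\b"]),
]


def classify(description):
    """Return the standardised attributes inferred from a free-text description.

    Assumption: one matching pattern is enough to apply a rule. A production
    rule set would likely weigh several terms and resolve conflicts.
    """
    result = {}
    for attribute, value, patterns in RULES:
        if any(re.search(p, description, re.IGNORECASE) for p in patterns):
            # Keep the first value found for each attribute.
            result.setdefault(attribute, value)
    return result


if __name__ == "__main__":
    print(classify("SEISMIC 9T 6250 BPI UKCS BERYL MOBIL"))
```

Records for which classify returns an empty dictionary would remain unassociated and could be routed to manual review or to additional rules.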
All records were reclassified against a corporate Data Type/Class/Subclass taxonomy and entity relationships were restored. The database was delivered with 80% of items directly associated with their correct field, well, survey, block or data type.
The Outcome
Data association improved from 20% to 80%, a fourfold increase in the share of directly associated records, without reprocessing or moving any physical items. The methodology proved so effective that the client engaged HDS to apply the same approach to three further corporate archive databases in the USA containing over 10 million additional items.