CLEAN-UP AND VALIDATION OF CORPORATE PHYSICAL DATA ARCHIVE INDEX - US SUPERMAJOR
HDS was commissioned to clean up the inventory database of the corporate physical data archive, the content of which is stored in the Iron Mountain data warehouse.
The raw inventory database supplied, an Oracle backend, held many millions of records describing over 1.5 million E&P technical documents and media items from Africa, Europe, the Middle East and the former Soviet Union (FSU), stored in several warehouses. It had been populated over the previous 10+ years by many different people and outside contractors, and had been merged several times with legacy databases and tables. As a result, the database lacked consistent naming conventions for wells, fields, surveys, countries, regions and areas, lacked consistent data classification, and was often missing key relationships between entities. The metadata records were often incomplete and misleading.
Because of this, only 20% of the data items were directly associated with their field, well, survey, block or data type, and so could be found through the application search interface.
APPROACH AND DELIVERABLES
HDS applied data mining and text analytics to all the data records to determine the Data Class and Type of each item and to associate it with the correct spatial entity (country, block/concession, field, well).
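As an illustration only, the classification step can be pictured as rules applied to free-text record descriptions. The sketch below is a minimal example under stated assumptions: the rule patterns, the class and type labels, and the function name classify are hypothetical and are not the rules or the taxonomy used on the project.

```python
import re

# Hypothetical rules, checked in order: (pattern, (data class, data type)).
# These labels are placeholders, not the corporate taxonomy.
RULES = [
    (re.compile(r"\bwell\s+report\b", re.I), ("Well", "Report")),
    (re.compile(r"\blog\b", re.I), ("Well", "Log")),
    (re.compile(r"\bseg-?y\b|\bseismic\b", re.I), ("Seismic", "Trace data")),
]

def classify(description):
    """Return the (class, type) of the first matching rule, or None if no rule matches."""
    for pattern, label in RULES:
        if pattern.search(description):
            return label
    return None  # left unclassified for manual review

if __name__ == "__main__":
    for text in ["Final Well Report, vol 2", "composite LOG 1:500", "misc box"]:
        print(text, "->", classify(text))
```

Records that match no rule are returned as None rather than guessed, so they can be queued for review.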
After the clean-up, the records were relinked to the corporate standardised Well/Survey/Country/Region/Basin/Area/Block/Quad naming conventions and classified against the corporate Data Type/Class/Sub-class naming convention and taxonomy.
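Relinking to standard names can be sketched as normalising each free-text name and looking it up in a master list. This is a minimal sketch, not the method used on the project: the well names, the normalisation rules and the functions normalise and relink are all hypothetical.

```python
import re

# Hypothetical standard well names; a real run would load the corporate master list.
STANDARD_WELLS = {"EXAMPLE-1", "EXAMPLE-2", "SAMPLE-10A"}

def normalise(raw):
    """Reduce a free-text well name to a canonical comparison key."""
    text = raw.upper()
    text = re.sub(r"\bWELL\b", " ", text)    # drop the generic word "WELL"
    text = re.sub(r"[^A-Z0-9]+", "-", text)  # any run of non-alphanumerics becomes one hyphen
    text = re.sub(r"-0+(\d)", r"-\1", text)  # drop leading zeros in numeric parts: -01 becomes -1
    return text.strip("-")

def relink(raw, standard=STANDARD_WELLS):
    """Return the standard name for raw, or None if it needs manual review."""
    key = normalise(raw)
    return key if key in standard else None

if __name__ == "__main__":
    for name in ["Example well #01", "example_2", "Sample 10a", "Unknown 7"]:
        print(name, "->", relink(name))
```

The sketch deliberately uses exact matching after normalisation. Fuzzy string matching is risky for names that differ only in a well number, since EXAMPLE-1 and EXAMPLE-2 are nearly identical as strings, so unmatched names are returned as None for review.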
The method was then applied to three similar corporate archive databases in the USA, which contained more than 10 million further items.
Entity relationships were added to the database so that the front-end application could return results for 80% of the items, up from the original 20%.