USING AUTOMATED DATA CLEAN UP TECHNIQUES
Source: Nov’19 Finding Petroleum Special Report “Solving E&P Problems with Digitisation”
With data being generated so quickly, organising it manually is no longer feasible; you need a machine to help. Waclaw Jakubowicz, Managing Director of Hampton Data Services, shared some tips.
Until recently it was possible to manage and clean data manually, much as with a physical library. But data is now being generated so fast that doing it manually is impossible, so you need a machine-assisted process, explained Waclaw Jakubowicz, managing director of Hampton Data Services. For example, machine learning tools can analyse documents to see which words occur most often, and try to classify them automatically. Another technique is to link data to objects, and then classify the objects. The tools can see which data appears to be related to other data by looking at references in the headers and metadata. You can get a sense of the general patterns of data about production, engineering, economics and field development. Once you have a sense of how data is created, you can see which data is missing, and then try to find it. Machines can analyse data much more widely than people can, since people typically clean up only the data they need to work with, Mr Jakubowicz said.
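The word-frequency idea described above can be illustrated with a minimal sketch. It is not Hampton Data's tool: the category names come from the article, but the keyword lists, the `CATEGORY_KEYWORDS` table and the `classify` function are hypothetical, and a real system would learn its vocabulary from labelled documents rather than use a hand-written list.

```python
import re
from collections import Counter

# Hypothetical keyword lists, one per category named in the article.
# These are illustrative assumptions, not a vocabulary from the source.
CATEGORY_KEYWORDS = {
    "production": {"production", "rate", "flow", "choke", "separator"},
    "engineering": {"casing", "completion", "drilling", "pressure", "tubing"},
    "economics": {"cost", "capex", "opex", "price", "revenue"},
    "field development": {"development", "plan", "appraisal", "reservoir", "phase"},
}


def word_counts(text):
    """Count how often each lower-cased ASCII word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def classify(text):
    """Return the category whose keywords occur most often, or 'unclassified'."""
    counts = word_counts(text)
    scores = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"


if __name__ == "__main__":
    print(classify("Daily production rate and flow through the separator"))
    print(classify("Capex and opex cost estimate"))
    print(classify("Meeting minutes"))
```

Documents that match no keyword are left as "unclassified" rather than forced into a category, so that a person can review them.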
A challenge with any data clean-up project is that new data is being created all the time, and it needs to be stored in a way that lets the system understand which wells, assets or subject matter it relates to. Managing new data also requires active data management work. “You cannot rely on users to nicely file a file. They’ll make 20 different versions,” he said. Managing PowerPoint files is also part of today’s data management work, since they are typically made at the end of a project to summarise everything, and investments are made on the basis of them.
Case study - Mid-Size E&P
Hampton Data had a data clean-up project with a client that had bought a controlling interest in another entity. It came with a great deal of legacy data. The database was multilingual, including material from Beijing and from research institutes in Kazakhstan, all poured together. Some data was in Chinese and some in the Cyrillic character set, although the dominant language and character set was English. The data had many co-ordinate problems, and poor notation about what came after what. A number of different data management companies had tried to improve it. A first step for Hampton was to move the data to its own server in the UK. A separate copy of the data was kept in Kazakhstan, synchronised with the data store in London. This means there is a complete backup in both locations, covering both new data and legacy data. Hampton then started a number of processes to rationalise and clean up the data. An initial problem was understanding well and place names. Some wells had been given multiple names (or aliases), or their names were spelt in different ways in Cyrillic. There could be files named in English, Russian and Chinese in the same folder. “You have to be multilingual to get your head around that,” he said.
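One common way to handle the well alias problem described above is a lookup table that maps every known spelling to a single canonical name. The sketch below is only an illustration of that idea: the well names, the `WELL_ALIASES` table and the `resolve` function are all hypothetical, and nothing here is taken from the project itself.

```python
import re

# Hypothetical alias table: canonical well name -> spellings seen in the data.
# The names are invented for illustration.
WELL_ALIASES = {
    "KZ-101": ["KZ 101", "kz_101", "КЗ-101"],
    "KZ-102": ["KZ 102", "Kz102", "КЗ-102"],
}


def normalise(name):
    """Reduce a name to a comparison key: case-folded, letters and digits only."""
    # \W is Unicode-aware for str patterns, so Cyrillic letters are kept.
    return re.sub(r"[\W_]+", "", name.casefold())


def build_lookup(aliases):
    """Map every normalised spelling (including the canonical one) to the canonical name."""
    lookup = {}
    for canonical, spellings in aliases.items():
        for spelling in [canonical, *spellings]:
            lookup[normalise(spelling)] = canonical
    return lookup


LOOKUP = build_lookup(WELL_ALIASES)


def resolve(name):
    """Return the canonical well name, or None if the spelling is unknown."""
    return LOOKUP.get(normalise(name))


if __name__ == "__main__":
    for raw in ["kz 101", "КЗ-102", "Unknown-7"]:
        print(raw, "->", resolve(raw))
```

Unknown spellings return `None` instead of a best guess, which keeps them visible for manual review and later addition to the table.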
Hampton Data has developed its own translation tools through its work in different countries over the years, so it can auto-translate file names from Russian and Chinese into Latin characters. The headers can also be auto-translated, with the formatting maintained. Often the file name will itself indicate what the file is about, for example “core data from xyz well”, or “PVT analysis”. This means that English-speaking engineers trawling through the data find it laid out for them nicely. Hampton Data works with a company called XTM, which specialises in managing technical documentation, and also works with many large automotive companies. It gathers libraries and vocabularies specific to the industry, something Google Translate does not do. Documents can also be translated for other users, not necessarily into English.
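As a rough illustration of getting Cyrillic file names into Latin characters, the sketch below does plain letter-by-letter transliteration. This is a much simpler technique than the translation tools described above: it does not translate meaning, does not use industry vocabularies, and does not handle Chinese. The mapping table, the `transliterate` function and the example file names are assumptions made for this example.

```python
# Simple Russian-to-Latin letter mapping (one of several common romanisation
# conventions; chosen here only for illustration).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(name):
    """Replace Cyrillic letters with Latin equivalents, leaving other characters alone."""
    out = []
    for ch in name:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                  # digits, punctuation, Latin letters
        elif ch.isupper():
            out.append(latin.capitalize())  # keep the original capitalisation
        else:
            out.append(latin)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical file names.
    print(transliterate("керн_скважина_12.pdf"))
    print(transliterate("Отчет.docx"))
```

Characters outside the table pass through unchanged, so extensions, digits and existing Latin text in a file name are preserved.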
Case Study 2
Another client was a start-up company that had acquired a large offshore gas field, formerly operated by Shell UK. The data was very organised, as you might expect from Shell, but the volumes were very large. It would have taken a few months to do a data audit manually. Hampton was able to do it in a week with automated tools. The client runs with a very low number of employees, and is outsourcing as much work as possible to outside consultants. It uses Microsoft Azure for its IT infrastructure, and would like to have all of its data and applications on there. One disadvantage of Azure is that “every time you look at data, move it about, you get an invoice hitting you,” he said. “It is an unpredictable beast, no-one knows what it will cost them at the end of the day.” The company has moved the data to the cloud in the same format as it was in when it acquired the asset; it is not re-arranging any folders.
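The article does not say what the automated audit involved. As a minimal sketch of what one small part of such an audit could look like, the code below walks a folder tree, counts files and bytes per extension, and groups files with identical content. The `audit` and `file_digest` functions and the report layout are assumptions for this example, not a description of Hampton's tools.

```python
import hashlib
import os
from collections import Counter, defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's content in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root):
    """Summarise a folder tree: file counts and sizes by extension, plus duplicate groups."""
    count_by_ext = Counter()
    bytes_by_ext = Counter()
    paths_by_hash = defaultdict(list)

    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            count_by_ext[ext] += 1
            bytes_by_ext[ext] += os.path.getsize(path)
            paths_by_hash[file_digest(path)].append(path)

    # Files sharing a content hash are treated as byte-for-byte duplicates.
    duplicates = [sorted(paths) for paths in paths_by_hash.values() if len(paths) > 1]
    return {
        "count_by_ext": count_by_ext,
        "bytes_by_ext": bytes_by_ext,
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    report = audit(".")
    for ext, count in report["count_by_ext"].most_common():
        print(ext, count, "files,", report["bytes_by_ext"][ext], "bytes")
    print(len(report["duplicates"]), "groups of duplicate files")
```

The sketch only reads files and never moves or renames them, which fits a situation where the folder structure is being left as it was received.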
Hampton provides a virtual “data custodian” system which runs semi-autonomously, keeping the data organised. It would be helpful if the applications and data could be stored on the same cloud infrastructure. But big subsurface software providers typically only want their software to run on their own cloud, which makes it tricky. “If you want to bring your own bit of software like Hampton Russell or something else, it is not exactly encouraged,” he said. There can be some flexibility, but it generally ends up that the larger the oil company, the more leverage it has to dictate which cloud will be used. There are many smaller software companies that would like to run their tools together with other software, including subsurface time-depth conversion software, various simulators and petrophysics applications. But they can’t if they don’t have access to the same cloud that the bigger software is running on, he said. For example, one start-up company called Antaeus Technologies is looking at applications for wells, such as log interpretation and geomechanics. It has developed applications to work on the cloud.
If you would like to find out more about how HDS can help your data-driven organisation, please contact us on firstname.lastname@example.org
T: +44 (0) 208 335 4300