Bringing Dark Data to Light
Tue, 11/01/2016 - 10:17
The term “dark data” sounds ominous, but companies like Open Intelligence (OPI) want to use it to help public and private organizations make better decisions. “Many institutions, private and public entities are not using data for decision-making,” says Alejandro Maza, CEO of OPI. “The opportunity that we saw is that we have really rich, but difficult-to-organize data and with all the new technology available, we could bring these batches of dark data together to obtain one clear, user-friendly database.” Dark data refers to the information companies collect and store but do not use for other purposes. OPI can restructure the data without having to spend months or millions of dollars to create infrastructure. The company can tap into different resources to retrieve information that has been in the public domain for years but is only now being organized and brought to the surface, Maza says.
The ability to use data in this way brings up questions of cybersecurity and hacking but Maza is confident that although there are inherent risks, OPI can offer sufficient protection for sensitive client data. The company takes existing data sets, cross-references them and analyzes the results with data sets from other agencies. “The main risk that agencies run when publishing these types of data sets is that users can gain access to confidential pieces of information by bringing together different sources. For instance, an agency may publish addresses but not names of people living on a block, while another agency may publish the names but not the phone numbers of the same people living on that block. This makes it possible for a third party to gather the pieces of the puzzle and put together a complete data set. An agency would not publish the complete set of data but because they do not know what other agencies are publishing, the risk of sharing confidential information increases,” he says.
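The linkage risk Maza describes can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the records, the field names, and the idea that both releases share a block and unit identifier. It is not a description of OPI's tooling, only a demonstration of how two individually incomplete releases can be joined on a common field.

```python
# Hypothetical records; every name, field and value here is an
# illustrative assumption, not data from any real agency.
agency_a = [  # publishes addresses but no names
    {"block": "B-12", "unit": 1, "address": "101 Example St"},
    {"block": "B-12", "unit": 2, "address": "103 Example St"},
]
agency_b = [  # publishes names but no addresses
    {"block": "B-12", "unit": 1, "name": "Resident One"},
    {"block": "B-12", "unit": 2, "name": "Resident Two"},
]


def link_records(left, right, keys):
    """Join two lists of dicts on the shared key fields."""
    # Index the right-hand release by its key fields for quick lookup.
    index = {tuple(row[k] for k in keys): row for row in right}
    linked = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            # Merging the two rows yields a record neither agency published.
            linked.append({**row, **match})
    return linked


combined = link_records(agency_a, agency_b, keys=("block", "unit"))
for record in combined:
    print(record)
```

Each combined record now carries both a name and an address, which is the "complete data set" neither agency intended to release. An agency reviewing a release could run the same kind of join against other public data sets to see what a third party would be able to reconstruct.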
Maza believes the creation of strict regulations or laws will not prevent this from taking place. The only thing that can be done is to keep a close watch on the different types of data being presented to the public. For that reason, it is important to be aware of the risks and have contingency plans for when there is a breach. OPI does not analyze personal information but instead works with financial records, administrative information and other large data sets. “In many cases, we are the first to gain access to this type of information,” Maza says. “When we come across information that should not be published, we automatically discard it and inform the agency of the risks of publishing it. Ethics play a considerable role in open data and the prevention of access to private information.”
With the government pushing for transparency across agencies, he believes care must be taken so the information published does not fall into the wrong hands. The only way to mitigate the risk is by being diligent, reactive and agile with what is formulated, he says. Maza says that completing a process within the public sector can often be more arduous than in the private sector. To prevent corruption and the misdirection of public funds, closing a contract with the government is usually a long process. It takes around eight months to receive the paperwork, making it difficult for small companies to work on government projects. “Small companies cannot afford to keep their team on standby for such long periods of time while waiting for a response from the government,” he says. “Another challenge is that there are many government people who do not want things to change. Bringing transparency to the table and enabling evidence-based decision-making could cause some people within the institutions to lose power,” he says.
The private sector has its own peculiarities. “We are working with strategic consultancy firms and several retail chains,” Maza says. “While working with the private sector, we have realized that although these companies are data savvy, most of the time they focus solely on self-generated data.” Putting all this information in one place can make it a key decision-making tool, says Maza. “We can assist companies in analyzing the most strategic locations for expansion by considering factors that cannot be captured by surveys alone. We can tell companies how many people spend their day in that block by analyzing how much trash is generated, the amount of space in the buildings, restaurant capacity and even what forms of transportation are nearby.”