Data streaming

How to manage and prioritize the huge influx of real-time data coming at you.

What's the most important issue when dealing with data? In the old days, say 10 years ago, it was how to organize your data into tables so that it was easy to write to a database, pull the information back out, and access it quickly. Now the issue is how to deal with data in real time, as it streams in from all sorts of transactions.

To show what I mean, let's look at where we used to be. We used to have a struggle between flat file formats and relational database formats. In a flat file database, every record looks exactly the same. You put in every conceivable field you might need: first name, last name, address information, the state or principality, zip code, the date of a transaction, what the customer bought, the amount they paid for it, whether there were taxes. You ended up with really long files, even if each record used only a fraction of those fields. So it was an inefficient way to store the data, and it also made it harder to get the information out.
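To make the flat file problem concrete, here is a minimal Python sketch. The field list, the sample records, and the helper names `to_flat_file` and `empty_fraction` are all assumptions made up for illustration; they loosely follow the fields mentioned above and are not taken from any real system.

```python
import csv
import io

# Hypothetical field list, loosely following the fields named in the talk.
FIELDS = ["first_name", "last_name", "address", "state", "zip_code",
          "transaction_date", "item", "amount", "tax"]

# Made-up sample events: each one fills in only the fields it knows about.
events = [
    {"first_name": "Ann", "last_name": "Lee", "address": "1 Main St",
     "state": "KY", "zip_code": "40202"},
    {"first_name": "Ann", "last_name": "Lee", "transaction_date": "2008-01-15",
     "item": "lamp", "amount": "30.00", "tax": "1.80"},
]

def to_flat_file(rows):
    """Write rows as CSV text in which every record carries every field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def empty_fraction(rows):
    """Return the fraction of field slots left empty across all records."""
    total = len(rows) * len(FIELDS)
    filled = sum(1 for row in rows for field in FIELDS if row.get(field))
    return (total - filled) / total

flat_text = to_flat_file(events)
print(flat_text)
print(f"empty slots: {empty_fraction(events):.0%}")
```

Every record is written with all nine columns, so the empty slots show up as runs of commas in the output. That is the wasted space described above.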

So when it comes to writing and retrieving information, a relational database is superior. As you can see, independent tables are joined by a common ID field. You could have an address table here, a purchase table here, and a customer service table, and in each one you're storing only the information you need for that particular purpose. But you can always join them together and get a good look at all the transactions and all the information around a particular customer.
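The idea of independent tables joined by a common ID field can be sketched with Python's built-in `sqlite3` module. The table names, the column names, the `customer_id` key, and the sample rows are all hypothetical, chosen only to mirror the address and purchase examples above.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Three independent tables that share a common customer_id field.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            first_name TEXT, last_name TEXT);
    CREATE TABLE addresses (customer_id INTEGER, street TEXT,
                            state TEXT, zip_code TEXT);
    CREATE TABLE purchases (customer_id INTEGER, item TEXT, amount REAL);

    INSERT INTO customers VALUES (1, 'Ann', 'Lee');
    INSERT INTO addresses VALUES (1, '1 Main St', 'KY', '40202');
    INSERT INTO purchases VALUES (1, 'lamp', 30.0);
    INSERT INTO purchases VALUES (1, 'desk', 120.0);
""")

# Join the tables on the shared ID to see everything about one customer.
rows = conn.execute("""
    SELECT c.first_name, c.last_name, a.state, p.item, p.amount
    FROM customers AS c
    JOIN addresses AS a ON a.customer_id = c.customer_id
    JOIN purchases AS p ON p.customer_id = c.customer_id
    ORDER BY p.amount
""").fetchall()

for row in rows:
    print(row)
```

Each table stores only the columns it needs, and the name and address are stored once no matter how many purchases the customer makes.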

So we think we've come a long way in solving the problem of how to write and retrieve information about transactions that have already occurred. But with data streaming, the issue is really different. I've got this huge amount of data coming at me like a hammerhead shark, and I've got to decide: what do I do with this? What do I have to deal with right now? Unlike in the good old days, I can't just write it to a database and deal with it the next day. There are things I've got to do. I've got to make calculations. I've got to cue my inventory routines to see if I'm running out of a particular item. I've got to be concerned about fraud and make sure that my fraud controls are looking at these transactions in real time.

Retailers, financial institutions, and banks are all dealing with huge amounts of information coming in. What do they have to deal with in real time, and what can they afford to deal with later? So the new problem with data is this huge influx of stuff coming at you. How do you determine what's important and what's not? How do you deal with the important stuff in real time, and how do you do that without losing the information or losing the opportunity to make the calculation and make the sale? So streaming, that's the new issue in data these days.
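One way to picture that split between "deal with it now" and "deal with it later" is the minimal Python sketch below. It is an illustration under stated assumptions, not a description of any particular streaming product: the thresholds, the function names `process_stream` and `transaction_feed`, and the sample transactions are all hypothetical.

```python
from collections import deque

FRAUD_AMOUNT_LIMIT = 1000.0  # hypothetical threshold for a fraud review
LOW_STOCK_LEVEL = 2          # hypothetical reorder point

def process_stream(transactions, inventory):
    """Run the urgent checks on each event as it arrives; defer the rest."""
    alerts = []         # things that need attention right now
    deferred = deque()  # records that can be written to a database later
    for tx in transactions:
        # Real-time check 1: flag unusually large amounts for fraud review.
        if tx["amount"] > FRAUD_AMOUNT_LIMIT:
            alerts.append(("fraud_review", tx["id"]))
        # Real-time check 2: update stock and warn when an item runs low.
        inventory[tx["item"]] -= 1
        if inventory[tx["item"]] <= LOW_STOCK_LEVEL:
            alerts.append(("low_stock", tx["item"]))
        # Storage and reporting can wait, so queue the record for later.
        deferred.append(tx)
    return alerts, deferred

def transaction_feed():
    """Stand-in for a live feed: yields made-up transactions one at a time."""
    yield {"id": 1, "item": "lamp", "amount": 30.0}
    yield {"id": 2, "item": "desk", "amount": 1500.0}
    yield {"id": 3, "item": "lamp", "amount": 30.0}

stock = {"lamp": 4, "desk": 10}
alerts, deferred = process_stream(transaction_feed(), stock)
print(alerts)
print(len(deferred), "records deferred for later storage")
```

The fraud and inventory checks run on each transaction before the next one is read, while the full record is only queued. A real system would also need to bound that queue so a burst of traffic does not lose data.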