Monday, June 8, 2015

Quickstep is now part of Pivotal

In January, my students and I spun out a company to help commercialize the technology produced in the Quickstep project. I'm excited to announce that we have now been acquired by Pivotal. More details are here and here.

Monday, August 25, 2014

WideTable: An Accelerator for Analytic Data Processing

Last year, I told you about BitWeaving, which is a compressed columnar scan that exploits intra-cycle parallelism. With BitWeaving, you can scan a table ... pretty darn fast!

As part of the Quickstep project, Yinan and I have been wondering if we could expand the "sweet spot" of BitWeaving's bare-metal performance to cover a wider variety of analytic query processing. The answer we have come up with is called WideTable.

The key idea behind WideTable is to denormalize the data into a wide table, and then run complex analytic SQL queries (e.g. with joins and nested subqueries) on the WideTable using BitWeaved scans. There are a number of different ways to use a WideTable, including as a materialized view and as an accelerator. Below, we show how WideTable can be used as an accelerator.

The WideTable component plugs in as a front-end to a traditional SQL engine, which could be a traditional RDBMS (like Oracle) or a Hadoop/SQL engine (like Hive). It takes the data and schema and runs them through a denormalizer module that transforms the schema using outer joins, converting the data into a WideTable. SQL queries are intercepted and run through a series of rules to determine if and how the query can be translated into scans on the WideTable. Not all queries can be translated to WideTable scans, and those that can't be transformed are sent to the original source system for evaluation in the "usual" ways. For more details, see the paper.
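To make the denormalizer module concrete, here is a minimal Python sketch of the idea, assuming a toy fact table (orders) and a single dimension table (customers). The function and table names are made up for illustration; this is not code from the paper or the system.

    # Hypothetical sketch of the denormalizer step: left-outer-join a fact table
    # with a dimension table along a foreign key, producing one wide row per
    # fact-table row. Unmatched dimension attributes are padded with NULLs.

    def outer_join(fact_rows, dim_rows, fk, dim_key):
        dim_index = {row[dim_key]: row for row in dim_rows}
        null_dim = {col: None for col in dim_rows[0] if col != dim_key}
        wide = []
        for fact in fact_rows:
            dim = dim_index.get(fact[fk], null_dim)
            wide_row = dict(fact)
            wide_row.update({c: v for c, v in dim.items() if c != dim_key})
            wide.append(wide_row)
        return wide

    orders = [
        {"o_id": 1, "cust_id": 10, "amount": 99.0},
        {"o_id": 2, "cust_id": 11, "amount": 15.5},
        {"o_id": 3, "cust_id": 99, "amount": 42.0},   # dangling FK -> NULL padding
    ]
    customers = [
        {"cust_id": 10, "name": "Ann", "nation": "US"},
        {"cust_id": 11, "name": "Bo",  "nation": "DE"},
    ]

    wide_table = outer_join(orders, customers, fk="cust_id", dim_key="cust_id")
    # Each wide row now carries both order and customer attributes, so a query
    # that joins orders and customers becomes a plain scan over wide_table.

Once the data is folded into the wide table, a query that joins orders with customers (say, total order amount by customer nation) reduces to a single scan over the wide rows.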

Two questions naturally follow: 1) What is the space overhead associated with WideTables? 2) What class of queries can be translated into (fast BitWeaved) scans on WideTables?

As a yardstick, we use the TPC-H benchmark to answer these questions. For the scale factor 10 TPC-H dataset (approximately 10GB of raw data), the WideTables take only 8.5GB of space (we end up creating more than one WideTable in this case). The BitWeaved columnar storage and its dictionary compression method help control the space overhead that is normally associated with denormalization.

How many of the TPC-H queries can we answer using the WideTable method? All of them except Query 21. (Query 21 has a join that is not a primary key-foreign key join, which we can't handle, so we send that query to the source system.) Note that we can also handle the update queries, though as you might guess, updates are slower when using a WideTable. More details on update query performance can be found in the paper.

What is the performance benefit of WideTable? Here are numbers comparing WideTable and MonetDB on the 10GB TPC-H benchmark.

WideTable is over 10X better than MonetDB in many cases! See the paper for many more experimental results. We also compared with VectorWise and the results have the same relative trend as MonetDB.

Some parting notes. First, WideTable can be viewed as a materialized view, but it is a unique type of materialized view that is schema-centric rather than workload-centric. Second, it is easier to optimize and predict the response times of WideTable queries, as the query plan is largely made up of simple scan kernels; so, for example, p99 response time management is easier. Third, the WideTable technique can be used to dramatically reduce the size of the source system (e.g. use fewer nodes) if the workload is amenable to it. Finally, WideTable's effectiveness depends on the schema graph. A star schema is fairly easy to deal with, but there are ways to handle snowflake schemas too, by selectively denormalizing around each fact table and then using fast join methods (like those described here and here). More on that at a later date as we keep working on improving the WideTable technique.

Friday, August 2, 2013

From Locomatix to Twitter

Way back five years ago, just as the iPhone came out, Karthik Ramasamy (@karthikz) and I (@pateljm) started talking about building a combined real-time streaming and real-time analytical platform to power enterprise and consumer mobile services. We started a company called Locomatix to build such a platform. Sanjeev Kulkarni (@sanjeevrk), a founding member of the Google AdSense team, joined Locomatix soon after it was formed.  Next came Chris Kellogg (@cckellogg), who joined in his last semester at the University of Michigan.

Today, I’m very happy to announce that the Locomatix team is joining Twitter!

Special thanks to our families and friends for supporting us throughout this incredible journey. We simply wouldn't have made it without you!

Sanjeev, Karthik and I all got our graduate degrees from the University of Wisconsin, so a big part of Locomatix was powered by UW grads. Go Badgers! We are working on converting Chris into a Badger fan -- it's still a work in (very slow) progress.

Monday, June 24, 2013

Life Beyond Column-Stores: Exploiting intra-cycle parallelism

So, I have finally decided to start a blog. Why now, you might ask?

The first reason is that I have been too lazy to start a blog till now. Since blogs often matter as much as publications, this oversight is inexcusable. Better late than never.

The second reason to start a blog is that U. Wisconsin doesn't, AFAIK, have a blog in the big data space. This is sad given the tradition that our university has in producing key data processing technologies, like the GAMMA parallel database system (which continues to influence what we now call "big data systems") and BIRCH, which provided a key pivot point for data mining (the buzzword-compliant term for that today is "deep data analytics"). This blog is a humble start at getting some of what Wisconsin does today in the big data space out into the blogosphere.

So, the topic for today is: exploiting intra-cycle parallelism for query processing. First, some quick background on this line of thinking. If you zoom into the processor and view it at the circuit level, there is an incredible amount of parallelism within each cycle. For example, if you add two 64-bit numbers, which takes about one cycle, the hardware circuits are actually computing on all 64 bits in parallel. Thus, there is 64-way parallelism at the circuit level. That is enough to be a potential game changer if we can exploit it effectively for some key data processing kernel.

The data processing kernel that we picked is a simple scan. Scans are pretty common in practice, and companies like SAP and IBM worry a great deal about scan efficiency in their main-memory data analytics products. Now, if you care about efficiency in these environments, then you have likely already implemented a column store in your product. So, the real challenge is: How can we speed up scans in a column-store engine?

The answer that Yinan Li and I came up with is called BitWeaving. BitWeaving builds on column stores but takes them to the next level. Here is how it works: First, we take each column and encode it using a fixed-length, order-preserving code. So, if you have a column with the birthdates of all the users in a table, then we map each unique birthdate to a code, turning the column into an array of codes (one code for each original column value).
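Here is a small Python sketch of what such an order-preserving, fixed-length encoding could look like; the helper names and data are mine, purely for illustration.

    # Hypothetical sketch of order-preserving, fixed-length dictionary encoding:
    # distinct values are sorted and numbered, so comparing codes gives the same
    # answer as comparing the original values, and every code uses the same
    # (minimal) number of bits.

    def build_dictionary(values):
        distinct = sorted(set(values))
        code_of = {v: i for i, v in enumerate(distinct)}
        width = max(1, (len(distinct) - 1).bit_length())   # bits per code
        return code_of, width

    birthdates = ["1990-07-14", "1985-03-02", "1990-07-14", "2001-12-30"]
    code_of, width = build_dictionary(birthdates)
    codes = [code_of[v] for v in birthdates]   # one code per column value

    assert width == 2 and codes == [1, 0, 1, 2]
    # Order is preserved: "1985-03-02" < "1990-07-14" and 0 < 1, so a predicate
    # like birthdate < '1990-07-14' becomes the integer test code < 1.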

Next, we view the codes at the bit level (just like the processor's circuits do), and lay out the bits in memory in a way that lets us exploit the circuit-level intra-cycle parallelism.

BitWeaving comes in two flavors. The first, called BitWeaving/V, is like a column store, but at the bit level. So, the highest-order bits of all the codes in a column are laid out sequentially in memory, followed by the second highest-order bits, and so on. Thus, when you bring data into the processor, which is typically about 64 bits at a time, you actually have bits from 64 different codes in the processor registers.

We have an algebraic framework that allows us to operate on these "oddly packed bits." With this framework, we can use regular CPU instructions, like binary addition and exclusive OR, to operate on the bits of 64 codes in a few instructions (cycles). Furthermore, in many cases, we can safely prune the computation without looking at all the bits! The intuition behind the early pruning is as follows: consider codes that are 3 bits wide. If you are evaluating the predicate "col < 2", then you know that any code with its most significant bit set to 1 does not match the predicate, and you don't need to look at its last two bits. Organizing the data at the bit level allows us to skip over entire words of lower-order bits. The end result is that in just a few cycles, we can compute the predicate for 64 codes. In other words, we have successfully exploited intra-cycle parallelism!
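To make the vertical layout and the early-pruning scan a bit more tangible, here is a rough Python sketch of the idea. It is my own toy illustration, not the paper's code: it uses 8 codes per word instead of 64, and checks the single predicate "code < constant".

    # Illustrative sketch of a BitWeaving/V-style scan. Bit j of every code in a
    # group is packed into one word (most significant bits first), and the
    # predicate "code < const" is evaluated one bit-plane at a time with early
    # pruning once no code's outcome can change.

    WORD = 8        # codes per word (64 on real hardware)
    WIDTH = 3       # bits per code

    def vertical_layout(codes):
        """Return WIDTH words; word j holds bit j (MSB first) of every code."""
        words = []
        for j in range(WIDTH):
            w = 0
            for i, c in enumerate(codes):
                w |= ((c >> (WIDTH - 1 - j)) & 1) << i
            words.append(w)
        return words

    def less_than(words, const):
        full = (1 << WORD) - 1
        lt, eq = 0, full                            # lt: result bits, eq: still-tied codes
        for j in range(WIDTH):
            c_word = full if (const >> (WIDTH - 1 - j)) & 1 else 0
            lt |= eq & ~words[j] & c_word & full    # tied codes with a 0 where const has a 1
            eq &= ~(words[j] ^ c_word) & full       # codes that still match const so far
            if eq == 0:                             # early pruning: remaining bits can't matter
                break
        return lt

    codes = [5, 1, 3, 7, 0, 2, 6, 4]
    result = less_than(vertical_layout(codes), 2)
    matches = [i for i in range(WORD) if (result >> i) & 1]
    assert matches == [1, 4]   # only the codes 1 and 0 satisfy "code < 2"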

The second way to BitWeave is called BitWeaving/H. Here we pack the codes horizontally, laying them out sequentially in memory. Such horizontal code packing is not new, but what we do that is new is:
  1. Add an extra bit to each code, which we use as a placeholder to record the result of the predicate evaluation on that code, and 
  2. Store consecutive codes stacked as columns across memory words. So, if we have a 3-bit code, the code for tuple 1 is followed in the memory address space by the code for tuple 5 (and not tuple 2!). See the paper for more details.
With these techniques, we can dramatically reduce the number of cycles that BitWeaving/H needs to evaluate predicates. The end result is that we can compute the predicate across a batch of codes in a few cycles (again, we have successfully exploited intra-cycle parallelism).
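Here is a rough Python sketch of the word-parallel predicate evaluation that the extra delimiter bit enables. Again, this is my own toy illustration of the general technique (parameters and names are made up), not the code from the paper.

    # Illustrative sketch of horizontal packing with one delimiter bit per code.
    # With 3-bit codes plus a delimiter bit, a 64-bit word holds 16 codes, and a
    # single pass of word-wide arithmetic evaluates "code < const" for all of
    # them: the delimiter bit of each field catches the carry of the comparison.

    WIDTH = 3                      # data bits per code
    FIELD = WIDTH + 1              # plus one delimiter bit
    PER_WORD = 64 // FIELD         # 16 codes per 64-bit word
    DATA_MASK  = sum(((1 << WIDTH) - 1) << (i * FIELD) for i in range(PER_WORD))
    DELIM_MASK = sum(1 << (i * FIELD + WIDTH) for i in range(PER_WORD))

    def pack(codes):
        """Pack up to PER_WORD codes into one word; delimiter bits start at 0."""
        word = 0
        for i, c in enumerate(codes):
            word |= (c & ((1 << WIDTH) - 1)) << (i * FIELD)
        return word

    def less_than(word, const):
        """Delimiter bit of field i becomes 1 iff code i < const."""
        const_word = pack([const] * PER_WORD)
        # Per field: const + (2^WIDTH - 1 - code) carries into the delimiter bit
        # exactly when code < const; the zeroed delimiter bits stop carries from
        # leaking into the neighboring field.
        return (const_word + ((word ^ DATA_MASK) & DATA_MASK)) & DELIM_MASK

    codes = [5, 1, 3, 7, 0, 2, 6, 4]
    result = less_than(pack(codes), 2)
    matches = [i for i in range(len(codes)) if (result >> (i * FIELD + WIDTH)) & 1]
    assert matches == [1, 4]   # only the codes 1 and 0 satisfy "code < 2"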

Okay, so how well does this work in practice? Below is a figure comparing the performance of different techniques, using cycles per code as the performance metric. These results are for a synthetic data set with a single column containing one billion uniformly distributed integer values. In this experiment, we vary the column width (the number of bits in the code). We scan the table using a predicate that selects 10% of the tuples, and then feed the results to a COUNT aggregate. This experiment was run on a single core of a Xeon X5680.

The top line in this figure corresponds to fetching one column code at a time, evaluating the predicate, and then feeding the result to a count aggregate operator. The second line, labeled "SIMD Scan", is our best-effort implementation of the method described in the SAP paper. It packs codes horizontally, like BitWeaving/H, but without the extra bit and the columnar layout, and it uses SIMD instructions. The next two lines represent BitWeaving/H and BitWeaving/V.

As we can see, both BitWeaving methods provide significant gains over the traditional method and the SIMD Scan. For example, with a code size of 12 bits, BitWeaving/V is 10X and 6X faster than the Traditional and the SIMD Scan methods respectively. At a code size of 12 bits, BitWeaving/H is 9X and 5X faster than the Traditional and the SIMD Scan methods respectively.

Of course, in the experiment above, the final output was a COUNT(*), which can be computed directly by counting how many results match the predicate. As one might imagine, BitWeaving/V is more expensive if you have to actually reconstruct the column values to produce the output. Here is another experimental result with a TPC-H database at scale factor 10 (i.e. a 10GB database), using the scan query Q6. This query has an aggregate that requires computing an average over the product of two columns. Here BitWeaving/H outperforms BitWeaving/V.

One final note: BitWeaving can be used as a native storage format, as an index, or both. With 2-way replication, you could choose to store one copy of the column in a BitWeaved format and the other in a regular column store.

Of course, there are lots of interesting unanswered questions, including how to extend this idea of intra-cycle parallelism to work with other columnar compression schemes, expanding the use of BitWeaving beyond simple scans, physical schema optimization using BitWeaving, making BitWeaving work with other architectures (e.g. GPUs), dealing with updates and batch appends, etc. So, we are likely to continue having fun with this line of thinking.

The BitWeaving work is being presented this week @ SIGMOD in New York, and I hope to see some of you there.