SIPping and TIPping for faster search over sorted arrays

Can you do better than binary search? TL;DR: Yes you can with our new algorithms SIP and TIP ;  and, not everything at SIGMOD/PVLDB has to be called ML-in-Systems! That was the question that two of my students Peter Van Sandt (a recent graduate from our BS program and the winner of David DeWitt Undergraduate Scholarship) and Yannis Chronis worked on over the last few years. It is a central question as searching over sorted in-memory arrays is a fundamental operation used in nearly every modern data platform. And, most of these platforms use some form of Binary Search ( BS ).  Interestingly, Interpolation Search ( IS ) has a better average-case complexity of O(log log N), but this result assumes that the data is uniformly distributed. Past attempts to use IS in any of less-than-ideal situation has generally not worked, leaving BS as the undisputed search method over sorted arrays.   We took a deeper look at this search problem in a paper that Peter and Yannis will co

Graph Analytics: Simpler than we think?

Graph analytics are all the rage today with companies rushing to build new software platforms that target just graph analytics all the time. But, are they really that different than many types of iterative data processing? Here we argue that for iterative graph processing, it is not. In fact, one can map BSP-style computation to the good-old relational algebra, with a few extensions for user-defined functions and the ability to express a loop. How can this be done? Consider the common BSP style vertex-centric computation.  The canonical graph analytics computation loop looks like this:   We can map this generic BSP computation to relational algebra using the following recipe. First, store the graph data in two relational tables, a nodes table (with one tuple for each node), and an edges table (with one edge tuple for each edge in the graph). Then, use the table below to map the BSP computation to relational algebra. Essentially in the relational system, one e

Database systems research: dead or alive? The answer may be backpropagation!

Much has been said recently about the future of database systems research. Mike Stonebraker has loudly complained about how core systems papers are disappearing at an alarming rate from SIGMOD and VLDB. He called it  Fear #1  in a recent talk. Another excellent and a more optimistic view of the database research was published by Mike Caferalla and Chris RĂ© . So what is going on: Is there or is there not an issue with database systems research? While in my mind there is no doubt that there is a rich future for database systems research, it is definitely the case that publishing systems papers has gotten much harder. There is undoubtedly an urgent issue to solve with reviewing papers. One key root cause is that there is a lot of bad reviewing in our conferences. And, there is no long-term mechanism to deal with bad reviewing. With the gigantic PCs that we now have and a bandwagon approach to following-the-latest-trend (Mike makes a point about this too in his talk), the way to game

Introducing Ava

Recently we published a paper @ CIDR on a new system (called Ava ) that we are building here @ UW. SIGMOD invited me to write a blog, and that blog has just been published here . In a nutshell, we claim that many data science pipelines will be constructed by chatbots in the near future, and going further in the longer run conversational bots will allow end users to use natural language (both for Q&A) to directly get insights from their data.

Quickstep is now part of Pivotal

In January, my students and I spun out a company to help commercialize the technology produced in the Quickstep project . I'm excited to announce that we have now been acquired by Pivotal. More details are here  and here .

WideTable: An Accelerator for Analytic Data Processing

Last year, I told you about BitWeaving , which is a compressed columnar scan that exploits intra-cycle parallelism. With BitWeaving, you can scan a table ... pretty darn fast! As part of the Quickstep project, Yinan and I have been wondering if we could expand the "sweet spot'' of the bare metal performance of BitWeaving to a wider variety of analytical query processing. The answer that we have come up with is called WideTable . The key idea behind WideTable is to denormalize the data into a wide table , and then run complex analytic SQL queries (e.g. with joins and nested subqueries) on the WideTable using BitWeaved scans. There are a number of different ways to use a WideTable, including as a materialized view and as an accelerator. Below, we show the how WideTable can be used as an accelerator. The WideTable component plugs in as a front-end to a traditional SQL engine, which could be a traditional SQL RDBMS system (like Oracle) or a Hadoop/SQL engine (li

From Locomatix to Twitter

Way back five years ago, just as the iPhone came out, Karthik Ramasamy ( @karthikz ) and I ( @pateljm ) started talking about building a combined real-time streaming and real-time analytical platform to power enterprise and consumer mobile services. We started a company called Locomatix to build such a platform. Sanjeev Kulkarni ( @sanjeevrk ), a founding member of the Google AdSense team, joined Locomatix soon after it was formed.  Next came Chris Kellogg ( @cckellogg ), who joined in his last semester at the University of Michigan. Today, I’m very happy to announce that the Locomatix team is joining Twitter ! Special thanks to our families and friends for supporting us throughout this incredible journey. We simply wouldn't have made it without you! Sanjeev, Karthik and I all got our graduate degrees from the University of Wisconsin, so a big part of Locomatix was powered by UW grads. Go Badgers! We are working on converting Chris into a Badger fan -- its still work in (ve