Tuesday, May 14, 2019

SIPping and TIPping for faster search over sorted arrays

Can you do better than binary search?

TL;DR: Yes, you can, with our new algorithms SIP and TIP. And, not everything at SIGMOD/PVLDB has to be called ML-in-Systems!

That was the question that two of my students, Peter Van Sandt (a recent graduate from our BS program and the winner of the David DeWitt Undergraduate Scholarship) and Yannis Chronis, worked on over the last few years. It is a central question, as searching over sorted in-memory arrays is a fundamental operation used in nearly every modern data platform. And most of these platforms use some form of Binary Search (BS).

Interestingly, Interpolation Search (IS) has a better average-case complexity of O(log log N), but this result assumes that the data is uniformly distributed. Past attempts to use IS in less-than-ideal situations have generally not worked, leaving BS as the undisputed search method over sorted arrays.

We took a deeper look at this search problem in a paper that Peter and Yannis will co-present at the upcoming SIGMOD conference. We started by noting a key technological trend: memory accesses will continue to get relatively more expensive compared to CPU cycles. While IS incurs a far more expensive computation in the inner search loop (to calculate the interpolant), it should incur far fewer memory lookups than BS: O(log log n) vs. O(log n). Still, past attempts to use IS have in general not been able to beat a well-tuned BS implementation.
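To make that memory-vs.-compute trade-off concrete, here is a minimal sketch of textbook interpolation search in C++ (just an illustration of the general idea, not the tuned implementation from our paper); each iteration pays for a multiplication and a division in order to shrink the search window much faster than halving it:

```cpp
#include <cstdint>
#include <vector>

// Textbook interpolation search over a sorted array (illustrative sketch).
// Returns the index of `key`, or -1 if it is not present.
int64_t interpolation_search(const std::vector<int64_t>& a, int64_t key) {
  int64_t left = 0, right = static_cast<int64_t>(a.size()) - 1;
  while (left <= right && key >= a[left] && key <= a[right]) {
    if (a[right] == a[left]) {  // flat segment: avoid dividing by zero
      return (a[left] == key) ? left : -1;
    }
    // The interpolant: assume values grow linearly between a[left] and
    // a[right]. This is the expensive computation in the inner loop.
    // (__int128 avoids overflow; it is a GCC/Clang extension.)
    __int128 offset = static_cast<__int128>(key - a[left]) * (right - left) /
                      (a[right] - a[left]);
    int64_t mid = left + static_cast<int64_t>(offset);
    if (a[mid] == key) return mid;
    if (a[mid] < key) left = mid + 1; else right = mid - 1;
  }
  return -1;
}
```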

We dug in and started to look at why IS was slower. For one, if you do simple linear interpolation, you can land far away from your “answer.” In the figure below, you can see how, with Linear Interpolation using the left and right points, a straight-line interpolation ends up at point 1, which is element 2 in the sorted array. This is far from the actual answer (the element with y = 341, at x = 6); the skewed distribution throws off the linear interpolation calculation. We need an adaptive method to deal with skew.
Left Figure: A collection of values; Right Figure: Linear interpolation when searching for y=341 and x=6.
But, we don’t have to interpolate using just two points. If we interpolate using 3 points, we can do far better in our “guess.” However, we need to compute this 3-point interpolation efficiently. We found a beautiful result proposed by Jarratt and Nudds in 1965 for the problem of 3-point iterative root finding using linear fractions. To make it work in our case, we had to combine it with another technique called bracketing (more details are in the paper). Collectively, we end up with an efficient interpolation calculation as:

We can make the computation even more efficient in the first iteration/initialization by building on a result from Ridders’79, to get: 

There are a few more mechanisms that we bring to the table, including reusing the calculation of the slope in the interpolation from one loop iteration to the next (it doesn’t change that quickly), and using guards (switching to sequential search when the search window “narrows”).
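Here is a rough sketch of how slope reuse and the guard fit together (the structure and the guard threshold are illustrative; see the paper for the actual SIP inner loop):

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch of slope reuse plus a sequential-search guard.
int64_t sip_like_search(const std::vector<int64_t>& a, int64_t key) {
  if (a.empty()) return -1;
  int64_t left = 0, right = static_cast<int64_t>(a.size()) - 1;
  if (a[right] == a[left]) return (a[left] == key) ? left : -1;
  // Compute the slope (array positions per unit of key value) once, and
  // reuse it across iterations instead of recomputing it every time.
  const double slope = static_cast<double>(right - left) / (a[right] - a[left]);
  constexpr int64_t kGuard = 8;  // illustrative threshold for the guard
  int64_t probe = left + static_cast<int64_t>((key - a[left]) * slope);
  while (right - left > kGuard) {
    if (probe < left) probe = left;      // keep the probe inside the window
    if (probe > right) probe = right;
    if (a[probe] == key) return probe;
    if (a[probe] < key) left = probe + 1; else right = probe - 1;
    // Reuse the stale slope; it changes slowly, and skipping the division
    // makes each iteration cheaper.
    probe = left + static_cast<int64_t>((key - a[left]) * slope);
  }
  // Guard: the window has "narrowed", so finish with a sequential search.
  for (int64_t i = left; i <= right; ++i) {
    if (a[i] == key) return i;
  }
  return -1;
}
```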

Another key mechanism is to use fixed-point arithmetic instead of floating-point arithmetic. As in work on Machine Learning (ML) acceleration (such as the TPU at Google and Buckwild! at Stanford), here too we can accommodate some loss of precision in the calculations. In our case, we exploit the observation that a multiplication followed by a division by 2^64 can be fused together and done more quickly than a multiplication by an arbitrary fraction. (See the paper for more details.)
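A minimal sketch of this trick (my illustration, not the paper's exact code): store the slope as a fraction scaled by 2^64, so that "multiply by the fraction" becomes a 64x64-bit multiply whose high 64 bits are the answer, with no divide instruction in the inner loop at all.

```cpp
#include <cstdint>

// Fixed-point "multiply by a fraction" with an implicit division by 2^64.
// slope_q64 holds floor(slope * 2^64) for some fraction 0 <= slope < 1.
inline uint64_t mul_q64(uint64_t x, uint64_t slope_q64) {
  // 64x64 -> 128-bit multiply; taking the high word fuses the division by
  // 2^64 into the multiply. (unsigned __int128 is a GCC/Clang extension.)
  unsigned __int128 product = static_cast<unsigned __int128>(x) * slope_q64;
  return static_cast<uint64_t>(product >> 64);
}

// Precompute the scaled slope num/den (with num < den), paying the division
// once up front rather than in every search iteration.
inline uint64_t to_q64(uint64_t num, uint64_t den) {
  return static_cast<uint64_t>((static_cast<unsigned __int128>(num) << 64) / den);
}
```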

Combining all this, we propose two algorithms, Slope-reuse Interpolation (SIP) and Three-point Interpolation (TIP), that use the following recipe:

We compared these methods with an optimized version of BS using a number of datasets. The key result is highlighted below with one uniform dataset (Facebook user IDs), and a skewed dataset. The figure shows the relative performance of SIP, TIP and vanilla IS compared to BS (the horizontal line) to perform key lookups on a sorted array.
As can be seen above, SIP is about 2X faster than BS on the Facebook dataset. TIP is ~3X faster than BS on the skewed dataset; the other methods are actually far slower than BS on this skewed dataset. There are additional results in the paper, including the end-to-end benefits of this approach in NumPy and LevelDB.

Three final notes: 
1. The approach we take here, namely fast and efficient interpolation, is a technique that many in the community are starting to employ under the broader umbrella of applying ML methods to systems. Some examples of this approach are Andy Pavlo's work on OtterTune and Tim Kraska's work on SageDB. Take, for example, the OtterTune work: Andy applies ML to tune systems, which is really hard to do with traditional tuning methods. Within this context, there are interesting parallels between using optimized methods for interpolation (as we do here) and using ML to predict the "right" point in a distribution. Exploring this connection is an interesting direction for future work. 

2. We had an interesting experience with this paper at SIGMOD. While most reviewers liked this paper in the first round, one reviewer was super-negative (among the harshest reviews I have ever seen). As far as we can tell, the key criticism was: simplicity in solving a problem is "not novel." This kind of thinking is increasingly creeping into our community, where everything needs to be called ML-in-Systems to "sell." Dangerously, funding proposals are subject to this trend too. While there are compelling reasons to bring ML into systems (see some examples above), I think it is dangerous to call everything ML-in-Systems. We could have called the adaptive interpolation above a "regression" and hence magically made it an ML algorithm, but do we need to? There is, in my view, a more elegant and simpler way to present and solve this problem, by giving credit to the amazing results from decades-old optimization/approximation theory. Why cloud it with marketing terms? Again, I want to emphasize that there are many valid applications of ML-in-Systems, but there are also cases where we don't need to call everything we do ML. We should definitely not reject papers for not being "marketed right." Why not present things in the simplest possible way, like academics often (used to?) do? 

3. In a normal year, one super-negative review in the initial round would have killed the paper. However, thanks to the excellent reviewing work at SIGMOD this year, we got a shot at rebutting the reviewer. We want to give a huge shout-out to the SIGMOD PC and the key leaders Anastasia Ailamaki (PC Chair), Amol Deshpande and Tim Kraska (Vice-Chairs) for a thorough review process. Thanks for accepting a paper even though it wasn't sold as ML-in-Systems!

Tuesday, February 5, 2019

Graph Analytics: Simpler than we think?

Graph analytics is all the rage today, with companies rushing to build new software platforms that target graph analytics alone. But is it really that different from many other types of iterative data processing?

Here we argue that, for iterative graph processing, it is not. In fact, one can map BSP-style computation to good-old relational algebra, with a few extensions for user-defined functions and the ability to express a loop. How can this be done?

Consider the common BSP-style vertex-centric computation. The canonical graph analytics computation loop looks like this:
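As a generic sketch (synchronous, vertex-centric BSP, specialized here to single-source shortest paths as a running example; the names and structure are illustrative, not taken from the paper):

```cpp
#include <cstdint>
#include <limits>
#include <unordered_set>
#include <vector>

// One struct per outgoing edge of a vertex.
struct Edge { int dst; int64_t weight; };

// Synchronous vertex-centric BSP loop for single-source shortest paths.
std::vector<int64_t> bsp_sssp(const std::vector<std::vector<Edge>>& adj, int source) {
  const int64_t kInf = std::numeric_limits<int64_t>::max();
  std::vector<int64_t> value(adj.size(), kInf);   // per-vertex state
  value[source] = 0;
  std::unordered_set<int> active = {source};      // vertices with new messages

  while (!active.empty()) {                       // one superstep per iteration
    std::unordered_set<int> next_active;
    for (int u : active) {
      // "Send messages": propagate my value along outgoing edges.
      for (const Edge& e : adj[u]) {
        int64_t candidate = value[u] + e.weight;
        // "Receive messages" + "mutate value": keep the minimum seen so far.
        if (candidate < value[e.dst]) {
          value[e.dst] = candidate;
          next_active.insert(e.dst);              // changed, so stays active
        }
      }
    }
    active = std::move(next_active);              // barrier between supersteps
  }
  return value;
}
```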
 

We can map this generic BSP computation to relational algebra using the following recipe. First, store the graph data in two relational tables: a nodes table (with one tuple for each node) and an edges table (with one tuple for each edge in the graph). Then, use the table below to map the BSP computation to relational algebra.


Essentially, in the relational system, one executes a sequence of joins, aggregates, and user-defined functions to carry out the graph computation. For shortest-path computation, the last row above (“send messages”) takes the current shortest path for each node and “sends” it to the next iteration by computing a join with the edges table. These new values are “received” (the “received messages” row) in the next iteration, and the minimum of the current shortest path and the new path lengths is computed. These new values are then stored (“Mutate Values”) in a table that has one tuple for each node.
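Written out (roughly, and with table and column names that are mine rather than the paper's), one shortest-path superstep over the tables values(node, dist) and edges(src, dst, weight) looks like:

```latex
\begin{aligned}
\text{msgs} &= \pi_{\, e.dst \rightarrow node,\; v.dist + e.weight \rightarrow dist}
               \left( \text{values}\ v \Join_{\, v.node = e.src} \text{edges}\ e \right) \\
\text{values}' &= \gamma_{\, node,\ \min(dist)}\left( \text{values} \cup \text{msgs} \right)
\end{aligned}
```

Here the join produces the “sent” messages, the union plus min-aggregate “receives” them, and values′ is the mutated per-node state for the next iteration; the loop repeats until values stops changing.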

A more detailed description is in the Fan et al. 2015 CIDR paper titled “The Case Against Specialized Graph Analytics Engines.” But I hope you get the general idea: graph analytics can be mapped to relational algebra. With this mapping, you can leverage relational platforms that have matured over time and have high-performance, scalable data processing algorithms. Basically, you can scale your graph algorithms by simply using relational platforms. The CIDR 2015 paper has performance results comparing this approach with a few other graph systems (circa 2015).

We have recently picked this work back up and are now implementing Grail on top of the Quickstep system. (Quickstep is a high-performance relational database system built at Wisconsin.) While this work is still ongoing, here are a few quick performance results for a network security application in which we find advanced persistent threats using application and system logs. On a dataset with about 3K nodes and 15K edges, Quickstep with Grail (QuickGrail) was about 4X faster than Neo4J; on a dataset with 75K nodes and 153K edges, QuickGrail was about 500X faster; and on larger datasets, Neo4J ran out of memory while QuickGrail was able to carry out its computation (on a single-node, dual-socket Xeon machine). These results are preliminary, and over the spring we plan to write up a full paper. So look for it if you are interested in this space!

Sunday, April 29, 2018

Database systems research: dead or alive? The answer may be backpropagation!

Much has been said recently about the future of database systems research. Mike Stonebraker has loudly complained about how core systems papers are disappearing at an alarming rate from SIGMOD and VLDB. He called it Fear #1 in a recent talk. Another excellent and more optimistic view of database research was published by Mike Cafarella and Chris Ré.

So what is going on: is there, or is there not, an issue with database systems research? While in my mind there is no doubt that there is a rich future for database systems research, it is definitely the case that publishing systems papers has gotten much harder. There is, undoubtedly, an urgent issue to solve with how we review papers.

One key root cause is that there is a lot of bad reviewing in our conferences. And, there is no long-term mechanism to deal with bad reviewing. With the gigantic PCs that we now have and a bandwagon approach to following-the-latest-trend (Mike makes a point about this too in his talk), the way to game the system is to write a lot of papers. By definition, systems folks are at a disadvantage in this game as it takes much longer to write most systems papers. For example, the Quickstep paper (VLDB'18) took 5+ years with 7+ students to write.

So, a key issue that we need to deal with is measuring and reporting the quality of reviewers. Here I have a simple suggestion for PC chairs and the powers-that-be on the SIGMOD and PVLDB executive committees:
  1. Reduce the size of the PC so that each PC member has a significant number of papers to review. Current SIGMOD and VLDB committees are way too large. Reducing them to half is a good start (fewer random variables). For PVLDB the monthly cycle adds even more randomness. Reduce that too.
  2. Incorporate transparency in the review process so that the distribution of the overall recommendations made by each PC member is made public. There are reviewers who consistently hate everything they read, and reviewers who love nearly everything they read. This really messes up the system, and I think it is a big factor in why our system is broken. Note that reducing the PC size (suggestion #1) yields more data per PC member, which you need to make sound comparisons. Also, randomly assign papers to reviewers based on the interests that reviewers express, to reduce the chances of someone saying "I had a bad batch of reviews."
In other words, a key problem that we need to solve is having a fairer way of accepting or rejecting papers. A key component of achieving this fairness is exposing overly optimistic or overly pessimistic reviewers. Let's do just that! Let's create a website (add this to DBLP?) with a scorecard for each reviewer that shows the distribution of their overall accept/reject scores. This information can be used by PC chairs to decide how to pick a good PC. Most importantly, it may lead to reviewers self-correcting so that they are more balanced in their evaluations.

This backpropagation, if you will, can help correct those reviewers who haven't calibrated themselves (but wish to), and starkly expose those who likely shouldn't be on PCs.

Of course, there are lots of other good ideas floating around, including having more training for junior members of the community (I hope all advisors train their students in this), in-person PC meetings at least for the area chairs, and so on. I see all those as complementary to the proposal above.

What do you think?

Saturday, February 11, 2017

Introducing Ava

Recently we published a paper @ CIDR on a new system (called Ava) that we are building here @ UW. SIGMOD invited me to write a blog post, and that post has just been published here. In a nutshell, we claim that many data science pipelines will be constructed by chatbots in the near future, and, going further, in the longer run conversational bots will allow end users to use natural language (both for Q&A) to get insights directly from their data.

Monday, June 8, 2015

Quickstep is now part of Pivotal

In January, my students and I spun out a company to help commercialize the technology produced in the Quickstep project. I'm excited to announce that we have now been acquired by Pivotal. More details are here and here.

Monday, August 25, 2014

WideTable: An Accelerator for Analytic Data Processing

Last year, I told you about BitWeaving, which is a compressed columnar scan that exploits intra-cycle parallelism. With BitWeaving, you can scan a table ... pretty darn fast!

As part of the Quickstep project, Yinan and I have been wondering if we could expand the "sweet spot" of the bare metal performance of BitWeaving to a wider variety of analytical query processing. The answer that we have come up with is called WideTable.

The key idea behind WideTable is to denormalize the data into a wide table, and then run complex analytic SQL queries (e.g. with joins and nested subqueries) on the WideTable using BitWeaved scans. There are a number of different ways to use a WideTable, including as a materialized view and as an accelerator. Below, we show how WideTable can be used as an accelerator.


The WideTable component plugs in as a front-end to a traditional SQL engine, which could be a traditional RDBMS (like Oracle) or a Hadoop/SQL engine (like Hive). It takes the data and schema and runs them through a denormalizer module that transforms the schema using outer joins, converting the data into a WideTable. SQL queries are intercepted and run through a series of rules to determine if and how each query can be translated into scans on the WideTable. Not all queries can be translated to WideTable scans, and those that can't be transformed are sent to the original source system for evaluation in the "usual" ways. For more details, see the paper.
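As a schematic sketch with a made-up star schema (a fact table F with foreign keys into dimension tables D1 and D2; ⟕ denotes a left outer join), the denormalizer builds W, and a join query over the base tables then becomes a selection and projection over W:

```latex
\begin{aligned}
W &= \left( F \;\text{⟕}_{\,F.d_1 = D_1.k}\; D_1 \right) \;\text{⟕}_{\,F.d_2 = D_2.k}\; D_2 \\
\pi_{A}\,\sigma_{p}\left( F \Join D_1 \Join D_2 \right) &\;\equiv\;
\pi_{A}\,\sigma_{p \,\wedge\, D_1.k\ \mathrm{IS\ NOT\ NULL} \,\wedge\, D_2.k\ \mathrm{IS\ NOT\ NULL}}\,(W)
\end{aligned}
```

The IS NOT NULL predicates restore inner-join semantics for fact rows that had no dimension match; the actual translation rules used by the front-end are in the paper.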

Two questions naturally follow: 1) What is the space overhead associated with WideTables? and 2) What class of queries can be translated into (fast BitWeaved) scans on WideTables?

As a yardstick, we use the TPC-H benchmark to answer these questions. For the scale factor 10 TPC-H dataset (approximately 10GB of raw data), the WideTables take only 8.5GB of space (we end up creating more than one WideTable in this case). The BitWeaved columnar storage and its dictionary compression method help control the space overhead that is normally associated with denormalization.

How many of the TPC-H queries can we answer using the WideTable method? All of them except for Query 21. (Query 21 has a non primary-key/foreign-key join that we can't handle, so we have to send that query to the source system.) Note that we can also handle the update queries, though as you might guess, updates are slower when using a WideTable. More details on update query performance can be found in the paper.

What is the performance benefit of WideTable? Here are numbers comparing WideTable and MonetDB on the 10GB TPC-H benchmark.


WideTable is over 10X better than MonetDB in many cases! See the paper for many more experimental results. We also compared with VectorWise and the results have the same relative trend as MonetDB.

Some parting notes. First, WideTable can be viewed as a materialized view. But it is a unique type of materialized view that is schema-centric rather than workload-centric. Second, it is also easier to optimize and predict the query response time of WideTable queries, as the query plan is largely made up of simple scan kernels. So, for example, p99 response time management is easier. Third, the WideTable technique can be used to dramatically reduce the size of the source system (e.g., use fewer nodes) if the workload is amenable to the WideTable technique. Finally, WideTable's effectiveness depends on the schema graph. A star schema is fairly easy to deal with, but there are ways to deal with snowflake schemas too, by selectively denormalizing around each fact table and then using fast join methods (like those described here and here). More on that at a later date as we keep working on improving the WideTable technique.

Friday, August 2, 2013

From Locomatix to Twitter

Way back five years ago, just as the iPhone came out, Karthik Ramasamy (@karthikz) and I (@pateljm) started talking about building a combined real-time streaming and real-time analytical platform to power enterprise and consumer mobile services. We started a company called Locomatix to build such a platform. Sanjeev Kulkarni (@sanjeevrk), a founding member of the Google AdSense team, joined Locomatix soon after it was formed.  Next came Chris Kellogg (@cckellogg), who joined in his last semester at the University of Michigan.

Today, I’m very happy to announce that the Locomatix team is joining Twitter!

Special thanks to our families and friends for supporting us throughout this incredible journey. We simply wouldn't have made it without you!

Sanjeev, Karthik, and I all got our graduate degrees from the University of Wisconsin, so a big part of Locomatix was powered by UW grads. Go Badgers! We are working on converting Chris into a Badger fan -- it's still a work in (very slow) progress.