Hacker News | willbmoss's comments

Turns out escaping JSON sucks, so I decided to lean on Python. v2: https://gist.github.com/wmoss/4774406


At high concurrency, I'd argue you will probably end up being more CPU efficient as well. The cost of context switching (with effectively larger frames) and getting into and out of privileged mode can get expensive.


I don't know what exactly you are referring to with high concurrency, but if you mean "high number of tasks to work on in parallel" then a ThreadPool will eliminate the problems you are describing.
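For illustration, here's a minimal sketch (names invented) of farming many tasks out to a fixed-size pool with Python's concurrent.futures — the kind of setup that sidesteps per-task thread creation cost:

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # Placeholder for an IO-bound task (e.g. a network call)
    return n * n

# A fixed pool reuses the same few threads for all tasks, so the
# per-task thread creation/teardown cost disappears.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```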


It probably depends, but the cost of creating a heap-allocated closure and then running it and deallocating it can be quite high compared to the cost of two context switches.


We had two replica sets of three nodes each. After the first migration, we took it down to one replica set.


Did the standard replica set election process not work for you? It is very rare for us to see a failed failover.


I'll agree they are not operational nightmares, but now that we're set up with Riak we can do things that I'm pretty sure your Postgres/MySQL/etc. setup cannot.

1. Add individual nodes to the cluster and have it automatically rebalance over the new nodes.

2. Have it function without intervention in the face of node failure and network partitions.

3. Query any node in the cluster for any data point (even if it doesn't have that data locally).

I'm sure there are other things I'm missing, but the point made by timdoug is the key one. We're at a scale now where it's worth trading up-front application code for reliability and stability down the line.


Overall I enjoyed reading the article and the specific tradeoffs you had to consider when comparing Riak to Mongo for your specific use cases. I'm curious if you had any problems w/r/t the 3 items you highlighted above when using Mongo, as none of those were mentioned in the linked article.

(these points are for MongoDB 2.0+)

1. Adding a new shard in Mongo will cause the data to be automatically rebalanced in the background. No application-level intervention required.

2. Node failure (a primary in a replica set) is handled without intervention by the rest of the nodes in the cluster. Network partitions can be handled depending on what value is set for 'w' (in practice it's not an isolated problem with a specific solution).

3. Using mongos abstracts the need to know what node any data is on as each mongos keeps an updated hash of the shard keys. Queries will be sent directly to the node with the requested data.
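To make the 'w' point concrete, here's a toy model (not any driver's API, just the acknowledgment rule) of how a write concern trades durability against availability during a partition:

```python
# Toy model of MongoDB's 'w' write concern: a write is reported
# successful only once at least 'w' replicas acknowledge it.
def acknowledged(replica_acks, w):
    """replica_acks: number of replicas that confirmed the write."""
    return replica_acks >= w

# Suppose a partition isolates the primary from its two secondaries
# in a 3-node replica set, so only 1 acknowledgment is possible:
print(acknowledged(1, 1))  # True  - w=1 keeps accepting writes
print(acknowledged(1, 2))  # False - w=2 (majority) makes writes fail
```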


I agree with you in theory; in practice, my experience has been a bit different. Specifically:

1. Since Mongo has a database-level (or maybe now collection-level) lock, rebalancing under a heavy write load is impossible.

3. Mongos creates one thread per connection. This means you've got to be very careful about the number of clients you start up at any given time (or in total).


1. Why would you be rebalancing under heavy write load? Wouldn't it be scheduled for quieter periods?

2. I was under the impression that almost all Mongo drivers had connection pools.
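For what it's worth, the pooling idea can be sketched in a few lines of Python (a toy pool, not any particular driver's implementation) — the server only ever sees a bounded number of connections, no matter how many logical clients exist:

```python
import queue

class ConnectionPool:
    """Toy client-side pool: at most max_size 'connections' are ever
    opened, so a thread-per-connection server sees a bounded load."""
    def __init__(self, max_size, connect):
        self._pool = queue.Queue(max_size)
        for _ in range(max_size):
            self._pool.put(connect())

    def acquire(self):
        return self._pool.get()   # blocks when the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)

# 100 logical clients could share these 5 "sockets"
pool = ConnectionPool(5, connect=lambda: object())
conn = pool.acquire()
pool.release(conn)
```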


> Why would you be rebalancing under heavy write load? Wouldn't it be scheduled for quieter periods?

I think the example case is when a node fails during a load period - unplanned vs planned.

Some of the public MongoDB stories come from shops that were forced to rebalance during peak loads because they failed to plan to do so in a maintenance window.

Of course waiting to scale out until _after_ it becomes a need will result in pain (of various degrees) no matter what the data storage platform.


Yes, Riak gives you certain flexibilities that make certain things easier. If a node dies, you generally don't have to worry about anything. Stuff Just Works. (Well, in the Riak case, this is true until you realize that you have to do some dancing around the issue that it doesn't automatically re-replicate your data to other nodes and relies instead on read repair. This puts a certain pressure on node loss situations that I find is very similar to traditional RDBMS.)

But of your list, I have done all of these things in a MySQL system and for a comparable 1000 lines of code.

1. We implemented a system that tracks which servers are part of a "role" and the weights assigned to that relationship. When we put in a new server, it would start at a small weight until it warmed up and could fully join the pool. Misbehaving machines were set to weight=0 and received no traffic.

2. Node failure is easy given the above: set weight=0. This assumes a master/slave setup with many slaves. If you lose a master, it's slightly more complicated but you can do slave promotion easily enough: it's well enough understood. (And if you use a Doozer/Zookeeper central config/locking system, all of your users get notified of the change in milliseconds. It's very reliable.)

Network partitions are harder for most applications to deal with than for most databases. It is worth noting that in Riak, if you partition off nodes, you might not have all data available. Sure, eventual consistency means that you can still write to the nodes and be assured that the data will eventually get through, but this is a very explicit tradeoff you make: "My data may be entirely unavailable for reading, but I can still write something to somewhere." IMO it's a rare application that can continue to run without being able to read real data from the database.

3. In a master/slave MySQL environment you would be reading from the slaves anyway unless your tolerance for data freshness is such that you cannot allow yourself to read slightly stale data. I.e., payment systems for banks or things that would fit better in a master/master environment. Since the slaves are all in sync, broadly speaking, you can read from any of them. (But you should use the weighted random as mentioned in the first point.)
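The weighted-random selection mentioned in points 1 and 3 can be sketched like this (hostnames and weights invented):

```python
import random

# Toy version of the weight-based role system described above:
# weight=0 machines receive no traffic; a new machine warms up
# at a small weight before fully joining the pool.
def pick_slave(weights):
    """weights: dict of host -> weight (0 means drained)."""
    live = {h: w for h, w in weights.items() if w > 0}
    hosts, ws = zip(*live.items())
    return random.choices(hosts, weights=ws, k=1)[0]

slaves = {"db1": 10, "db2": 10, "db3": 0}  # db3 misbehaving, drained
print(pick_slave(slaves))  # "db1" or "db2", never "db3"
```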

...

Please also note that I am not trying to knock Riak. It's neat, it's cool, it does a great job. It's just a different system with a different set of priorities and tradeoffs that may or may not work in your particular application. :)

But to say that it can do things the others can't is incorrect. Riak requires you to have lots of organizational knowledge about siblings and conflict resolution in addition to the operational knowledge you need. A similar MySQL system requires you to have a different set of knowledge -- how to design schemas, how to handle replication, load balancing, etc.

Is one inherently better than the other? I don't think so. :)


I added an implementation [1] in diesel [2][3], which uses select.epoll (or libev, on non-Linux systems), and got around a 150x speedup [4]. I only repeated the tests a few times (but they were all close) and didn't install the Go compiler, so I couldn't test against Go (I'd be interested to see how this stacks up on your machine). Like you say in your post, it's nice to have something wrap up the bother of epoll for you.

[1] https://github.com/wmoss/Key-Value-Polyglot

[2] diesel.io

[3] https://github.com/jamwt/diesel

[4] The first run is against the diesel one

  wmoss@wmoss-mba:~/etc/Key-Value-Polyglot$ time python test.py
  real  0m0.134s
  user  0m0.040s
  sys   0m0.020s

  wmoss@wmoss-mba:~/etc/Key-Value-Polyglot$ time python test.py
  real  0m20.164s
  user  0m0.096s
  sys   0m0.072s


I lack knowledge on networking/event-based systems on a fundamental level.

Here's what I don't understand:

* test.py is sequential: It first does 500 sets then 500 gets, all in one thread, using a single connection to the server.

* The socket handling function (memg.py:handle_con/memg-diesel.py:handle_con) is called once. There is no parallel execution going on.

* So why is the memg-diesel.py code so much faster? What makes the code for sending and receiving data to/from the socket so much faster?

Could someone please explain to me why an epoll-based solution is so much faster?


What is the difference between diesel and gevent ? Note that gevent 1.0 uses libev.


In a nutshell, gevent monkey patches the socket library, whereas diesel doesn't. This means that you can use any (previously) blocking libraries with gevent, whereas in diesel you have to write them again. The upside of the rewriting is that it creates a more coherent (and opinionated) ecosystem.
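A toy illustration of what "monkey patching the socket library" means — this is not gevent's actual code, just the mechanism in miniature:

```python
import socket

class FakeGreenSocket:
    """Stand-in for a cooperative, non-blocking socket class."""

original = socket.socket
# Swap the implementation out from under existing callers: any library
# that calls socket.socket() after this point gets the green version.
# (Code that did `from socket import socket` earlier keeps the old one,
# which is why gevent patches have to run before other imports.)
socket.socket = FakeGreenSocket
patched = socket.socket is FakeGreenSocket
socket.socket = original  # restore for this demo; gevent leaves it patched
print(patched)  # True
```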



also, the diesel one is half the LOC

/shameless plug :-)

(for the record, on my machine the go comparison was 97ms vs. 173ms, so pure python + diesel was 1.78x slower)


At Bump we tried Campfire, but ended up using IRC (and use an open source IRC bot, https://bitbucket.org/yougov/pmxbot/src). I'm curious why you decided to go with Campfire instead of IRC.


Hubot supports IRC too (though probably not as well, we only hop in when Campfire goes down).

We depend on having a nice, customizable native Campfire client. We keep a custom js script in DropBox that Propane (http://propaneapp.com/) loads on startup. We've been able to modify the UI to show avatars, highlight successful/failed builds in Git push notifications, etc.

CF also gives us a few other nice features, like offline transcripts with search, and STARS.

Surely, you can do all this with IRC... and there are other chat apps that people like too. We're hoping that people add support for them to Hubot.


There's nothing in the docs to explain how to configure IRC server address/password etc?


Seems like it's done by env variables, the same way it's set up for Campfire:

https://github.com/github/hubot/blob/master/src/hubot/irc.co...


Ah great thanks. Shame there's no IRC server password support just yet it seems...


It's on its way; I'll probably finish it up this week.


(note: I'm not a githubber)

It has some web-oriented features like auto-linkification, condensing long responses, embedding of images, tweets, videos...

It's better to think of Campfire as a client than a protocol; there's nothing it doesn't do that a sufficiently advanced IRC client + logging bot combo can't replicate. I prefer the latter, myself, but I can see the value it adds for many.


Agreed, I should have clarified. We don't just use IRC, but IRC along with other tools to handle pasting text and file uploads. I think what has always bothered me about Campfire (which I'm sure can be solved) is that it's not easy to do things from the command line. I want to be able to pipe the output of git diff to my pastebin, or upload a log file from a remote machine.


We (Bump) have found two bugs in Redis and both times we've worked closely with Salvatore and had them fixed in less than 48 hours. He's a great programmer who really cares about the product.


One thing you don't note in your post (but made a big difference for us) is that 2.4 uses jemalloc. This reduced memory fragmentation for us by around 25%.
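For reference, Redis reports fragmentation in INFO as used_memory_rss divided by used_memory; the numbers below are invented, just to show how a ~25% improvement in the ratio would be computed:

```python
# mem_fragmentation_ratio in Redis INFO is RSS / used_memory.
def frag_ratio(used_memory_rss, used_memory):
    return used_memory_rss / used_memory

# Hypothetical numbers for a 1 GB dataset:
before = frag_ratio(1_500_000_000, 1_000_000_000)  # libc malloc: 1.5
after = frag_ratio(1_125_000_000, 1_000_000_000)   # jemalloc:    1.125

print(round((before - after) / before * 100))  # 25 (percent improvement)
```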


Just as a word of warning, Redis 2.2 cannot read rdb or aof files created by 2.4 (meaning I'm pretty sure 2.2 can't be a slave of 2.4). So, if you have a failover scenario, you might be forced to upgrade your master to 2.4.


Good point. Hopefully karma will shine upon us :)


Bitcask can guarantee one disk seek, whereas LevelDB will do one disk seek per level, so at least from that perspective, it can't be better.

LevelDB also has to look down the entire tree if a key is missing. This means inserts end up being more expensive than reads or updates (which are all just a hash lookup in Bitcask).
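A minimal sketch of the Bitcask design being described — an append-only data file plus an in-memory hash ("keydir"), so any read is one lookup and at most one seek (greatly simplified: a single file, no CRCs or timestamps):

```python
import os
import tempfile

class TinyBitcask:
    """Toy Bitcask: values are only ever appended; the keydir maps
    each key to the offset of its latest value, so a read is one
    dict lookup plus at most one disk seek."""

    def __init__(self, path):
        self.f = open(path, "ab+")
        self.keydir = {}                        # key -> (offset, size)

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(value)                     # appended, never rewritten
        self.keydir[key] = (offset, len(value)) # O(1) hash update

    def get(self, key):
        offset, size = self.keydir[key]         # O(1) in memory
        self.f.seek(offset)                     # the single disk seek
        return self.f.read(size)

fd, path = tempfile.mkstemp()
os.close(fd)
db = TinyBitcask(path)
db.put(b"k", b"hello")
db.put(b"k", b"world")  # update: append new value, repoint the keydir
print(db.get(b"k"))     # b'world'
```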


"Bitcask can guarantee one disk seek, whereas LevelDB will do one disk seek per level, so at least from that perspective, it can't be better."

Yep, this is a standard tradeoff. When you want your data to be iterable, you have to take the hit. In practice (I oversee a large cassandra cluster), this hit happens about ~1% of the time, which is either a lot, or a little, depending on your constraints.

"Level also has to look down the entire tree if a key is missing."

This is why Cassandra has a bloom filter on top of a very similar data store.
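A toy Bloom filter along those lines — a definite "no" means the store never has to touch disk for a missing key (sizes and hash scheme invented, not Cassandra's actual parameters):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: may report false positives, never false
    negatives, so a negative answer safely skips the disk lookup."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        # Derive 'hashes' bit positions from independent salted hashes
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
bf.add("present-key")
print(bf.might_contain("present-key"))  # True (a missing key is almost certainly False)
```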


LevelDB is there as the replacement for those who are currently using Innostore as their backend and not for those who have a dataset that fits bitcask.

