
Dev Blog 2/5/2020 - Resolving Rollbacks


jcsnider
Message added by jcsnider

This is mostly a technical breakdown on the rollback issue some of you all experienced. If you'd like you can skip this read and just go update to 0.6.0.180+.


Resolving Rollbacks

2/5/2020

 

 

Intro

Some of you have experienced data loss and rollbacks using Intersect since the new database system was introduced in Beta 5.  These rollbacks were extremely rare, required very specific conditions to be met in order to occur, and only a handful of users ever reported the issue.  I'm happy to announce that as of Intersect version 0.6.0.180 rollbacks have been resolved, and we've added extra logging and safeguards so they will never happen again!

 

That's pretty much all you need to know, but I want to get a little bit technical and share exactly what happened, how we've fixed the problem, and how we're making sure it won't happen again going forward. Let's dive in!

 

 

Overview

Before diving into the cause of rollbacks there are a few things you need to know. First, saving changes to a database takes time. Depending on your processor speed, disk speed, and other factors, it can take as little as a few milliseconds or as much as a minute. It is not acceptable to halt the server while the database saves, so we run the saving functions on separate threads so that game logic, NPCs, events, etc. are not impacted.

 

Secondly, when building a database you can define various rules. For example, if I designed a database for a blog it would consist of blog posts, users, and comments. One of the rules I would create is that comments belong to blog posts. That belonging relationship means that if a blog post gets deleted, all of its comments automatically get wiped as well.
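To make the blog example concrete, here is a minimal sketch of that kind of rule using Python's built-in sqlite3 module. It is only an illustration, not Intersect's actual code, and the `posts` and `comments` table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign key rules when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
# The rule: every comment belongs to a post, and deleting the post
# deletes its comments too (ON DELETE CASCADE).
conn.execute(
    "CREATE TABLE comments ("
    " id INTEGER PRIMARY KEY,"
    " post_id INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE,"
    " body TEXT)"
)

conn.execute("INSERT INTO posts (id, title) VALUES (1, 'Hello')")
conn.executemany(
    "INSERT INTO comments (post_id, body) VALUES (?, ?)",
    [(1, "First!"), (1, "Nice post")],
)

# Deleting the post wipes its comments automatically.
conn.execute("DELETE FROM posts WHERE id = 1")
remaining_comments = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
print(remaining_comments)
```

After the delete, no comments are left in the table, even though the code never deleted them directly.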

 

Intersect's database has similar sets of rules, and rollbacks to the player database were caused by one of those rules not being met after deleting a player. When the rule wasn't met, database saves would fail until the next server restart.

 

 

What caused the rollbacks?

For rollbacks to occur, the following actions must have taken place in this order. First, multiple players had to have been on your server. Second, players must have become friends at some point in time. Third, the server must have been rebooted. Finally, one of the players must have tried to delete their character.

 

Our database has a rule on the friends table that makes sure a 'friendship' has 2 existing characters. When the player tried to delete their character, that rule would have been broken, and all database saves from that point on would fail.
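As a rough illustration of a rule like this being broken, here is a small Python and SQLite sketch. It is not Intersect's schema: the `players` and `friends` tables and their columns are hypothetical, and it only shows the rule rejecting a delete, not the full chain of events (such as the server reboot) described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the rules below

conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT)")
# The rule: both sides of a friendship must be existing players.
# There is no ON DELETE CASCADE, so nothing cleans up friendships
# when a player is deleted.
conn.execute(
    "CREATE TABLE friends ("
    " owner_id INTEGER NOT NULL REFERENCES players(id),"
    " target_id INTEGER NOT NULL REFERENCES players(id))"
)
conn.executemany(
    "INSERT INTO players (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
conn.execute("INSERT INTO friends (owner_id, target_id) VALUES (1, 2)")
conn.commit()

# Deleting Bob would leave a friendship pointing at a missing player,
# so the database refuses and the save fails.
try:
    conn.execute("DELETE FROM players WHERE id = 2")
    conn.commit()
    delete_error = None
except sqlite3.IntegrityError as exc:
    conn.rollback()
    delete_error = exc

players_left = conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]
print(delete_error, players_left)
```

Declaring the friendship columns with `ON DELETE CASCADE`, or deleting the friendship rows before deleting the player, are two common ways to keep a rule like this from blocking the delete. Whether either matches the actual fix in Intersect is not something this sketch claims.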

 

 

What else went wrong?

We had several issues which made solving these rollbacks difficult. Very few users were reporting the bug because that chain of events wouldn't happen in games unless they were open to the public. And because we run the database saves in a separate thread, database errors were being lost instead of being logged by Intersect's bug handler.

 

 

What's changed now in 0.6.0.180?

The bad 'rule' that caused saves to fail after characters were deleted has been fixed. We now have better error handling code outside of our primary logic threads. If new bugs are introduced that would result in a rollback, Intersect will detect the issue, log the error for us to fix, and promptly shut down, so data loss is a few minutes at most instead of days or potentially weeks.
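The general idea of detecting a failed save, logging it, and shutting down can be sketched in a few lines of Python. This is not how Intersect implements it; `save_worker`, `failing_save`, and `shutdown_requested` are hypothetical names, and the failing save is a stand-in for a real database error.

```python
import logging
import threading

logger = logging.getLogger("save_worker")
shutdown_requested = threading.Event()


def failing_save():
    # Stand-in for a database save that breaks a rule.
    raise RuntimeError("FOREIGN KEY constraint failed")


def save_worker(save, interval_seconds=0.01):
    """Call save() repeatedly until shutdown is requested or a save fails."""
    while not shutdown_requested.is_set():
        try:
            save()
        except Exception:
            # Log the error with its traceback instead of letting it
            # vanish inside the background thread, then ask the server to stop.
            logger.exception("Database save failed; requesting shutdown")
            shutdown_requested.set()
            return
        # Sleep until the next save, waking early if shutdown is requested.
        shutdown_requested.wait(interval_seconds)


worker = threading.Thread(target=save_worker, args=(failing_save,))
worker.start()
worker.join(timeout=5)
print(shutdown_requested.is_set())
```

The main loop of a server built this way would check `shutdown_requested` and exit cleanly, so at most the changes since the last successful save are lost.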

 

 

Going Forward!

Update to 0.6.0.180 ASAP! And after you've done that it's time to get back to work!

 

I want to say thanks to @Dashplant, @Aesthetic, and @Mapyo for helping provide logs & databases while we figured this out.

 

@panda put a ton of time into debugging and improving our database logging capabilities and deserves credit as well.

 

No more rollbacks! :D

 

 


Interesting piece of information, I'm curious what the thought process behind having a constraint on something as dynamic as a friendlist was though.

Makes you wonder if there are similar constraints that may cause similar issues in other areas of the database.

 

Definitely seems like a hard one to crack though, if there's no live games there's not many people using your friendlist.

 

(Also, before anyone misunderstands. I'm not trying to be negative here, this sort of stuff just tickles my fancy okay. lol)


We use Entity Framework as an ORM, which allows us to utilize a code-first DB design. When setting up the DB structure we created a HasMany relationship for players having friends. EF (without our intention) created the constraints, which isn't necessarily the worst thing in the world unless it's set up wrong. That was the case here. We're definitely still learning as we go. Luckily, the friends table is actually really unique, and the likelihood of this ever happening again is very, very low.
