Wednesday 26 September 2007

Punting 3NT

Phil asked:
We all play club pairs type games and we always see daft 3NT contracts on the traveller when we play in 4♥/♠ (just because we have an 8 or 9 card fit). Maybe some simulations on the matchpoint benefits/lack thereof of random 3NT punts would be good. Or just some definitive answer as to how much the computer says they should be losing by...

It's a bit hard to define what a "random 3NT punt" is, so I'm going to reword the question like this:

Partner opens 1NT and we have any balanced hand worth 3NT, regardless of major length. Of the hands where we have a major fit of eight cards or more, how much better is it to play there than to play in 3NT?

In fact, I'm just going to restrict it to hands where we have a heart fit, as it makes no difference which major we pick. The criteria are as follows:
  • Partner is 12-14 balanced
  • RHO has 0-14 points and 7 losers or worse
  • We have 12-18 balanced
  • We have 8+ hearts between us
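For anyone who wants to try this at home, here is a minimal Python sketch of the point-count, shape and fit checks above. It is not the Deal script used for these results. The hand format (a dict of rank strings per suit, with T for the ten), the function names and the choice of 4333/4432/5332 as "balanced" are all assumptions made for illustration, and RHO's restriction is left out of this sketch.

```python
# Assumed hand format: {"S": "K43", "H": "AJ52", "D": "Q76", "C": "K82"}, T = ten.
HCP_VALUES = {"A": 4, "K": 3, "Q": 2, "J": 1}

# Assumption: "balanced" means 4333, 4432 or 5332 (any suit order).
BALANCED_SHAPES = {(4, 3, 3, 3), (4, 4, 3, 2), (5, 3, 3, 2)}


def hcp(hand):
    """High-card points: A=4, K=3, Q=2, J=1."""
    return sum(HCP_VALUES.get(card, 0) for suit in hand.values() for card in suit)


def is_balanced(hand):
    """True if the suit lengths, longest first, form one of the balanced shapes."""
    shape = tuple(sorted((len(suit) for suit in hand.values()), reverse=True))
    return shape in BALANCED_SHAPES


def fits_punt_criteria(partner, us):
    """Partner 12-14 balanced, us 12-18 balanced, 8+ hearts between the two hands."""
    return (
        12 <= hcp(partner) <= 14
        and is_balanced(partner)
        and 12 <= hcp(us) <= 18
        and is_balanced(us)
        and len(partner["H"]) + len(us["H"]) >= 8
    )


# Made-up example hands, not taken from the simulation.
partner = {"S": "K43", "H": "AJ52", "D": "Q76", "C": "K82"}
us_with_fit = {"S": "A72", "H": "KQ84", "D": "K5", "C": "QJ63"}
us_without_fit = {"S": "A972", "H": "KQ8", "D": "K5", "C": "QJ63"}
```

A real run would apply a check like this to every random deal and keep only the ones that pass.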

Over 25,000 tests, we see the following:

Tricks    3NT     4♥

 0          0      0
 1          0      0
 2          1      0
 3          4      0
 4         12      0
 5         94      1
 6        490     20
 7       1893    160
 8       4785   1129
 9       6660   4898
10       5555   9249
11       3780   7075
12       1528   2195
13        198    273

NT is best: 10280
Hearts is best: 12619
Both the same: 2101

So, on that evidence, it seems that just punting 3NT has a lot going for it at pairs. It's a loser, but not by very much, so if you fancy a swing then it might be worth a shot. Incidentally, at IMPs scoring you lose 1.02 IMPs/board by bidding 3NT, so there it's really not a good idea.
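To show how tallies like these can be produced from pairs of double-dummy trick counts, here is a rough Python sketch. It assumes undoubled contracts, non-vulnerable by default (the post doesn't say which vulnerability was used), and the standard IMP scale. The function names and the input format, a list of (tricks in 3NT, tricks in hearts) pairs, are made up for illustration.

```python
from bisect import bisect_left

# Upper bound of each band on the standard IMP scale: 0-10 -> 0, 20-40 -> 1, ...
IMP_BANDS = [10, 40, 80, 120, 160, 210, 260, 310, 360, 420, 490, 590,
             740, 890, 1090, 1290, 1490, 1740, 1990, 2240, 2490, 2990, 3490, 3990]


def game_score(tricks, needed, first_trick, vulnerable=False):
    """Undoubled score for 3NT (needed=9, first_trick=40) or 4H (needed=10, first_trick=30)."""
    if tricks < needed:
        return -(100 if vulnerable else 50) * (needed - tricks)
    contract_tricks = needed - 6
    trick_score = first_trick + 30 * (contract_tricks - 1)
    overtricks = 30 * (tricks - needed)
    return trick_score + overtricks + (500 if vulnerable else 300)


def imps(diff):
    """Convert a score difference to IMPs, keeping the sign."""
    magnitude = bisect_left(IMP_BANDS, abs(diff))
    return magnitude if diff >= 0 else -magnitude


def compare(results, vulnerable=False):
    """results: list of (tricks_in_3nt, tricks_in_hearts) pairs, one per deal."""
    nt_best = hearts_best = same = total_imps = 0
    for nt_tricks, heart_tricks in results:
        nt = game_score(nt_tricks, 9, 40, vulnerable)
        hearts = game_score(heart_tricks, 10, 30, vulnerable)
        if nt > hearts:
            nt_best += 1
        elif hearts > nt:
            hearts_best += 1
        else:
            same += 1
        total_imps += imps(nt - hearts)
    n = len(results)
    return {
        "nt_best": nt_best,
        "hearts_best": hearts_best,
        "same": same,
        "mean_imps": total_imps / n,              # from 3NT's point of view
        "matchpoints": (nt_best + same / 2) / n,  # 3NT head-to-head against 4H
    }


# Three made-up deals: hearts wins by 20, 3NT wins by 10, both go one off.
summary = compare([(9, 10), (10, 10), (8, 9)])
```

Plugging the 25,000-deal tallies into the same head-to-head formula gives (10280 + 2101/2) / 25000, or roughly 45% of the matchpoints for the 3NT bidders.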

Some experienced players will be shifting around in their seats now, crying that this is rubbish analysis. Yep, it is. The criteria are too general. We know from playing lots of bridge that if you have thin game values, you'd rather be in the major, but if you have plenty of extras then the need to ruff losers isn't so great and 3NT will often make the same number of tricks. Also, we know that with a 4333 hand we will usually just bid 3NT over 1NT regardless of whether we have a major or not, as it's likely to play the same. So let's break it down.

Using test runs of 1000:

Your point count

                  12-13   14-15   16-18
NT is best:         345     454     623
Hearts is best:     522     500     371
Both the same:      133      46       6

NT IMPs:          -1.23   -0.96   -0.66


Combined Hearts

                      8       9      10
NT is best:         421     354     236
Hearts is best:     483     591     689
Both the same:       96      55      75

NT IMPs:          -0.81*  -2.16   -2.76


Your hand shape

                   3433    2533    4432
NT is best:         467     458     324
Hearts is best:     393     473     561
Both the same:      140      69     115

NT IMPs:          +0.59   -1.07   -1.64


Type of fit

                    4-4     5-3
NT is best:         339     516
Hearts is best:     549     401
Both the same:      112      83

NT IMPs:          -0.75*  -0.23*

Now, I should probably break it down even further and find out what point ranges are best for 3NT in a 5-3 fit etc. etc. but that's going a bit over the top, I think. Generally speaking: 4333 shapes indicate 3NT; stronger point counts indicate 3NT; 5-3 fits indicate 3NT. Combining these factors will just do what you expect it to do.
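A word of caution on runs of 1000: the counts carry some sampling noise. As a rough guide, and assuming each deal is an independent trial, the usual binomial standard error can be sketched like this (the function name is just for illustration):

```python
from math import sqrt


def standard_error(successes, n):
    """Binomial standard error of the observed proportion successes / n."""
    p = successes / n
    return sqrt(p * (1 - p) / n)


# "NT is best" counts from the type-of-fit table, out of 1000 deals each.
for label, nt_best in [("4-4 fit", 339), ("5-3 fit", 516)]:
    se = standard_error(nt_best, 1000)
    print(f"{label}: NT best {nt_best / 1000:.1%}, give or take about {1.96 * se:.1%}")
```

For proportions in this range the standard error on 1000 deals is about one and a half percentage points, so small gaps between columns shouldn't be over-read.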

Note that punting 3NT is never a really stupid thing to do at pairs. Even if you have a 5-5 heart fit that you're missing, you'll still get a good result about a quarter of the time! Note also that playing in 3NT is almost always a substantial loser at IMPs, the only exception being when you're 4333. So punting 3NT and eschewing your major is purely a pairs manoeuvre.

This is just a brief skimming of this area as the subject of 3NT simulations seems to have been done to death on rec.games.bridge. Try this thread for starters.

So to answer Phil's question, those bastards who punt 3NT and get a good result against you are indeed being lucky, but not as lucky as you might have first thought.

* Edit: see comments for slightly more accurate results.

Tuesday 25 September 2007

Kicking Off For Real

And I think with that last post, we'll call our Introduction to Bridge Simulations finished. I hope it made some sense. The example hand I plucked out of thin air on day one didn't turn out to be quite as interesting as I'd hoped but that wasn't the point of it all. The idea was to show the processes involved in doing simulations like this, what sort of things we can find out, what sort of questions we need to ask ourselves, and how we can evaluate the results we get.

I started this blog because there were basically no resources on the subject that I could find. Therefore, thought I, I must be a universal expert perfectly placed to impart my great wisdom. Bollocks! I don't know anything really and I expect you all to point out any flaws or inaccuracies in any of the below.

What next? Posting will still be intermittent (so you'd be a fool to check back daily) but I hope to just get down and run a bunch of simulations — anything that interests me or that crops up when I'm playing. I might even take requests, but don't be offended if I ignore you. Just post a comment.

Do It Yourself

It just occurred to me that I haven't made any mention of just how I run these simulations. I use Deal and GIB, the links to which I just put up on the right.

Deal is a very flexible deal generator which uses a programming language called Tcl to produce deals with just about any criteria you like. It's hands down better than anything else I've tried, but is not remotely user friendly unless you have programming experience! I may some day write a tutorial if there's any interest, but it's all there on the Deal website.

GIB is a bridge playing program written by Matt Ginsberg. It's a single-dummy program (you can play against it on BBO) but it uses a double-dummy engine at its heart and you can interface this double-dummy engine with Deal. You used to be able to download it from their website, but it doesn't look like you can any more. So you might have to just buy the GIB software, unless anyone can point me to a free (and legal!) version?

Double Dummy Usefulness

I was going to write another long post analysing in detail how useful double-dummy analysis is, but I don't feel it's necessary. For one, it's mostly just about using common sense and experience — you get a feel for what kind of results are useful and what aren't — but also pretty much every simulation we do will involve some degree of assessment so the same topics will come up in future blog posts.

Also, I'm getting bored of this introduction to simulations and want to get on to more interesting things!

I will, though, give you some random bits of crap to use as manure for your thoughts to grow.
  • A DD engine declares perfectly, but also defends perfectly. In a lot of hands the two cancel each other out.
  • Higher level contracts are usually more accurate than low level contracts. There are fewer points of decision for either side in a typical 7NT contract than in 1NT, and the fewer decisions that need to be made by a stupid human, the closer to 'perfection' they will get!
  • Balanced distributions tend to be more accurate than wild ones. This is because with a wild distribution a lot of tricks will often hinge on one decision which, for a human, is just a guess. In a balanced no-trump contract, the cost of not making a double-dummy decision isn't so severe and you can often get it back. The perfect example is leading against a Gambling 3NT opening. Which Kxxx major suit do you lead from when one will lead to six off and the other will lead to plus three?
  • Hands with holdings like AQ10 or KJx will be over-valued as the double dummy engine will always get any guess right. Therefore, you should be mindful when fixing hands with tenaces like this.
  • In most cases, when you're comparing two strategies, the strategy that comes out best with double-dummy analysis will also come out best if you use single-dummy. Thus you can decide whether you prefer to play in 3NT or 4 on our example deal based purely on which one comes out top.
  • If you're scoring with IMPs or aggregate points, you don't really care if the contract makes one overtrick or two, so any error between DD and SD will be marginalised.
  • You're sitting in the pub after a session of bridge and looking at the Deep Finesse analysis. How often do you think to yourself that the number of tricks is completely unrealistic? Occasionally you'll notice it dropping singleton Kings or leading unsupported Aces to give partner a ruff, but for the most part you say: "Hmm, DF says it can make 9 tricks. Yep, 3NT should have made — sorry partner!"
  • What is single dummy anyway? The difference between club level play and Bermuda Bowl level play is quite a few tricks! Just treat Deep Finesse as the epitome of expert play.

That is all.

Monday 17 September 2007

Being Sloppy

Let's say you had a convention where a 1 opening showed either 10 points or 30. You want to run a simulation based upon this opening and so you need to deal out a bunch of hands which fit. So you tell the computer to give you some hands with 10 points and some hands with 30 points and go on your merry way.

Ah, but the fact is that 30 point hands are much rarer than 10 point hands. Much rarer. So if you want to do a proper investigation, you need to deal out the hands with the same proportion as they would occur in real life, otherwise you'll be skewing the data massively. Anyone care to work out how many 10 point hands we should deal for every 30 point hand? No, I can't be bothered either.
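For anyone who can be bothered, the exact figure needs no simulation. The sketch below counts 13-card hands by how many aces, kings, queens and jacks they hold, using the standard 4-3-2-1 point count. The function name is made up for illustration.

```python
from itertools import product
from math import comb


def hcp_hand_counts():
    """Number of 13-card hands for each high-card point total (A=4, K=3, Q=2, J=1)."""
    counts = {}
    for aces, kings, queens, jacks in product(range(5), repeat=4):
        honours = aces + kings + queens + jacks
        if honours > 13:
            continue
        # Choose the honours, then fill the rest of the hand from the 36 spot cards.
        ways = (comb(4, aces) * comb(4, kings) * comb(4, queens) * comb(4, jacks)
                * comb(36, 13 - honours))
        points = 4 * aces + 3 * kings + 2 * queens + jacks
        counts[points] = counts.get(points, 0) + ways
    return counts


counts = hcp_hand_counts()
total = sum(counts.values())
print(f"P(10 HCP) = {counts[10] / total:.4%}")
print(f"P(30 HCP) = {counts[30] / total:.6%}")
print(f"10-point hands per 30-point hand: {counts[10] / counts[30]:.0f}")
```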

The simple way to do this is to just deal out completely random hands with no restrictions (random deals actually, but we'll let the distinction slide) and then throw out all the ones which don't fit. We'll count the points on every one of possibly millions of hands that come our way and only include it in the investigation if it has 10 or 30. This will ensure that the proportion of 10pt hands to 30pt hands in our sample set is more or less as it would be in real life. Computers are super-fast at dealing random bridge hands so the fact that we're dealing a heck of a lot more than we should isn't an issue.

It took my computer about 30 seconds to deal out 10 million completely random deals. Of these, South held 10 points 940,000 times and 30 points a mere 22 times. We've wasted over 9 million deals! We can now perform our analysis on the ones left, knowing that it will broadly mirror reality.
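The deal-everything-and-throw-most-of-it-away approach looks something like the Python sketch below. It is not the Deal script behind the figures above: it only deals South's thirteen cards rather than a full deal, and the function names, card encoding and seed are illustrative assumptions.

```python
import random

HCP_VALUES = {"A": 4, "K": 3, "Q": 2, "J": 1}
# Cards are rank + suit, e.g. "AS" for the ace of spades, "TC" for the ten of clubs.
DECK = [rank + suit for suit in "SHDC" for rank in "AKQJT98765432"]


def hcp(hand):
    """High-card points of a list of cards."""
    return sum(HCP_VALUES.get(card[0], 0) for card in hand)


def sample_by_rejection(n_deals, wanted=(10, 30), seed=0):
    """Deal n_deals random South hands and count only those with a wanted point total."""
    rng = random.Random(seed)
    kept = {points: 0 for points in wanted}
    for _ in range(n_deals):
        points = hcp(rng.sample(DECK, 13))
        if points in kept:
            kept[points] += 1
        # Every other hand is simply thrown away.
    return kept


kept = sample_by_rejection(20_000)
print(kept)
```

Because nothing is forced, the hands that survive turn up in their natural proportions, which is the whole point. Don't expect to see a 30-count in a run this small.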

This is the same thing that we did with our example hand. We dealt out millions and millions of random deals and only kept the ones which fit the knowledge we had been given. If we did it right then the deals which we kept will be consistent with the action so far (LHO passed, partner opened 1NT, RHO passed). The full set of rules that we used were as follows:

  • North has 12-14 points
  • North is balanced
  • West has 0-11 points
  • West has an 8 loser hand or worse
  • East has 0-14 points
  • East has a 7 loser hand or worse
  • South (you) has the problem hand, ♠ 4 ♥ A Q 2 ♦ A J 7 6 2 ♣ J 9 4 2

"But", you say, "I would compete over 1NT with lots of 7-loser hands! Your analysis is flawed and I'm never going to look at your stupid blog again!". And that's a fair point. We originally stipulated that oppo aren't hyper-aggressive but they surely would act with, say:

♠ Q J 10 2 ♥ A K 8 7 2 ♦ 4 ♣ J 6 5

That's seven losers so our analysis will include it. But so is this:

♠ 4 2 ♥ K Q 4 3 ♦ K Q 6 2 ♣ J 4 2

I'm not completely sure what you're supposed to call with that if not Pass!
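Both hands really do count as seven losers under the basic losing trick count, which can be sketched as below. This is the plain textbook version (only the ace, king and queen matter, and only up to the length of the suit). The half-trick refinement used for the tallies in this post isn't specified, so it is not reproduced here. The hand format and function names are assumptions, with T standing for the ten.

```python
def suit_losers(holding):
    """Basic LTC for one suit: count the missing A, K, Q among the top min(length, 3) cards."""
    relevant_honours = "AKQ"[:min(len(holding), 3)]
    return sum(1 for honour in relevant_honours if honour not in holding)


def losers(hand):
    """Basic losing trick count for a hand given as {suit: rank string}."""
    return sum(suit_losers(holding) for holding in hand.values())


# The two seven-loser hands discussed above.
attractive = {"S": "QJT2", "H": "AK872", "D": "4", "C": "J65"}
unattractive = {"S": "42", "H": "KQ43", "D": "KQ62", "C": "J42"}

print(losers(attractive), losers(unattractive))
```

The count treats the two hands as equals, which is exactly the problem.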

The conclusion is, unsurprisingly, that the losing trick count isn't a great way to assess hands for overcalls. Perhaps we should do something more sophisticated like use the Rule of 20/21/22 or count shortage points or implement Binky points or Zar points or a number of other hand evaluation methods. Perhaps we should, and our analysis would undoubtedly be stronger if we did. The question is, though, how much? We don't only want to model the kinds of hands which East can hold, we want to model them with the correct frequency, so how much does it really matter if the odd attractive 7-loser 11-count slips in?

Over 100,000 deals I counted the HCP and the loser count of all the East hands. They were distributed as follows:

HCP
 3 :   991
 4 :  3123
 5 :  7750
 6 : 11703
 7 : 14802
 8 : 15785
 9 : 15105
10 : 12677
11 :  8917
12 :  5325
13 :  2827
14 :   995

Losing Trick Count (in half-tricks)
14 : 22143
15 :  4988
16 : 29206
17 :  7738
18 : 21251
19 :  5338
20 :  7104
21 :  1433
22 :   755
23 :    44

You will see that a healthy majority of hands are clear passes. In terms of HCP, you might consider overcalling on some 11-14pt hands but these only account for 18.1% of the sample. If we're looking at losers, we might act on some 7-loser hands (like the example above) and also some 7.5-loser hands, which together make up 27.1% of the sample.

What does it look like if we use the Rule of X (HCP + length of longest suit + length of second longest suit) for overcalling? Here are the stats:

Rule of X
10 :   121
11 :   800
12 :  2591
13 :  5822
14 :  9629
15 : 12991
16 : 15580
17 : 15605
18 : 14158
19 : 10900
20 :  6956
21 :  3536
22 :  1172
23 :   136
24 :     3

We can probably say that we'd be acting on a Rule of 24 hand (say, 14 HCP with a 5-5 shape) but note that, because of the losing trick count restriction, it won't be anything like as good as:

♠ K Q J 10 2 ♥ A K 8 7 2 ♦ 4 ♣ J 5

More likely it'll be:

♠ J 6 5 4 2 ♥ A J 8 7 2 ♦ K ♣ K Q

Ok, we'd be bidding but it's not exactly beautiful. We might also get involved with the Rule of 23 hands and some of the Rule of 22 hands. But not all of them. A balanced 14-count fulfills the Rule of 22 and our oppo won't be bidding with that (I just asked them). And even if we add in the Rule of 21 hands (some of which we'd act on; most of which we wouldn't), it only amounts to 4.8% of the sample.
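The Rule of X sums are easy to check. Here is a small Python sketch that computes the value for a hand and works out what share of the sample sits at or above a given threshold, using the counts from the table above. The hand format (rank strings per suit, T for the ten) and the function names are assumptions for illustration.

```python
HCP_VALUES = {"A": 4, "K": 3, "Q": 2, "J": 1}


def rule_of_x(hand):
    """HCP + length of longest suit + length of second longest suit."""
    points = sum(HCP_VALUES.get(card, 0) for suit in hand.values() for card in suit)
    lengths = sorted((len(suit) for suit in hand.values()), reverse=True)
    return points + lengths[0] + lengths[1]


# Counts copied from the Rule of X table (100,000 East hands).
RULE_OF_X_COUNTS = {
    10: 121, 11: 800, 12: 2591, 13: 5822, 14: 9629, 15: 12991, 16: 15580,
    17: 15605, 18: 14158, 19: 10900, 20: 6956, 21: 3536, 22: 1172, 23: 136, 24: 3,
}


def share_at_least(counts, threshold):
    """Fraction of the sample with a Rule of X value of threshold or more."""
    return sum(n for value, n in counts.items() if value >= threshold) / sum(counts.values())


good_hand = {"S": "KQJT2", "H": "AK872", "D": "4", "C": "J5"}
ugly_hand = {"S": "J6542", "H": "AJ872", "D": "K", "C": "KQ"}

print(rule_of_x(good_hand), rule_of_x(ugly_hand))
print(f"{share_at_least(RULE_OF_X_COUNTS, 21):.1%}")
```

Both example hands come out as Rule of 24, and the share at Rule of 21 or more is the 4.8% quoted above.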

We're not getting anywhere very fast like this, and we won't either unless we sit here and painstakingly map out a detailed hand evaluation metric for overcalling a weak NT when vulnerable. But that's hard to do and I'm too lazy anyway. And I don't believe that we have to. If more than 5% of our sample is flawed because we're including hands which would have overcalled and therefore do not accurately mirror the problem scenario then I'd be very surprised (at a guess, I'd put it at more like 1-2%).

And what does it hurt us if we include these hands, anyway? Instead, for example, partner will open 1NT, RHO will overcall 2 naturally and we'll still be in a similar position, wondering whether to bid 3NT or try for 4 or perhaps look for a minor game. The decision doesn't change — we're just handicapping ourselves by pretending we didn't see the overcall. The same could be said if RHO overcalls something else. It becomes harder the higher he overcalls, but even if he bids 2 we can still wonder to ourselves whether to bid 3NT (if partner has a stop) or look for a heart game (perhaps by cue-bidding 3 in a way that denies a stop).

What I'm trying to say, in a very long-winded way, is that it would be great if we could model the problem scenario in perfect detail such that every hand is consistent with the knowledge we would have at the table, but that's hard. Much easier is to be sloppy and use crude evaluation systems such as high-card points or the losing trick count or both. If we do this, we'll find that the majority of hands work out just fine and, of the hands which we analyse when we shouldn't, most of them won't make any difference anyway. We don't have to be perfect — getting it right 95-99% of the time is more than enough.

The moral: try not to be sloppy but if you are then don't worry too much!