Testing Maia's Puzzle Performance

One idea that I think would be interesting to test is running the Maia networks with more than one node of search. It seems plausible that the 1900 network would find a hard tactic in fewer nodes than the 1100. At the very least, it should be relatively easy to try if you already have everything set up for testing Maia on tactics.

jk_182

#12

@FireWorks said in #10:
> Engines generally aren't that good at solving tactics/puzzles; even SF struggles with some puzzles that a human would solve. Some puzzles seem impossible to solve because they're simply too many moves deep and will not be solved by SF even when run on a supercomputer for several hours, but might be solved by a decent human player who recognizes a repeating pattern or theme/motif.

Are you sure about that? When I read about Google's search-less engine (arxiv.org/pdf/2402.04494), I saw that they tested Stockfish 16 with a thinking time of 0.05 seconds on Lichess puzzles rated up to 2800, and SF got 100%. I've included their plot in my post about the paper: www.chess-journal.com/googleSearchless.html

FireWorks

#13

@jk_182 said in #12:
Check out some of the puzzles I've collected in the studies on my profile. There's even one where SF fails to see a mate in 7, and even a mate in 5 two moves later, after e4 is played.

1RR3Q1/8/5p2/4p3/P3B3/5ppp/K3prqk/4Nbrn w - - 0 1
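Before feeding a position like this to an engine, it can help to check that the FEN survived copy and paste. The following is a minimal sketch using only the Python standard library; the function name `expand_fen_board` is made up for illustration, and it only checks the piece-placement field, not whether the position is legal.

```python
def expand_fen_board(fen):
    """Return the board as 8 strings of 8 characters, rank 8 first ('.' = empty)."""
    placement = fen.split()[0]  # first FEN field: piece placement
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            # A digit stands for that many consecutive empty squares.
            row += "." * int(ch) if ch.isdigit() else ch
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        rows.append(row)
    if len(rows) != 8:
        raise ValueError("FEN must describe exactly 8 ranks")
    return rows


FEN = "1RR3Q1/8/5p2/4p3/P3B3/5ppp/K3prqk/4Nbrn w - - 0 1"
board = expand_fen_board(FEN)
for row in board:
    print(row)  # uppercase = White pieces, lowercase = Black pieces
```

If a rank were truncated or a slash lost in copying, the function would raise a `ValueError` instead of printing a board.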

FireWorks

#14

BTW, which software runs SF the most efficiently? I'm not sure running SF through Lichess is that great, considering JavaScript can be slow.

LauOnChess

#15

@FireWorks said in #10:
> Engines generally aren't that good at solving tactics/puzzles

That doesn't make sense. Of course they're good at puzzles: the whole Lichess puzzle database was generated with the help of an engine (github.com/ornicar/lichess-puzzler).

There are indeed some peculiar, specially crafted edge cases where a puzzle is obvious to a human and the engine chokes for some reason, but that is far from the general rule.

jk_182

#16

@FireWorks said in #14:
> BTW, which software runs SF the most efficiently? I'm not sure running SF through Lichess is that great, considering JavaScript can be slow.

The fastest way is probably to run it directly from the command line, but I would guess that most GUIs with UCI support will run a local copy of SF just as fast.
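For anyone who wants to try the command-line route, here is a rough sketch of driving a UCI engine from Python with only the standard library. The commands `uci`, `isready`, `position fen`, `go nodes`, `go movetime` and `quit` are standard UCI; the helper names (`uci_script`, `parse_bestmove`, `run_engine`) and the engine path are assumptions for illustration, not anything from the article.

```python
import subprocess


def uci_script(fen, nodes=None, movetime_ms=None):
    """Build the UCI commands for a single search of one position."""
    if (nodes is None) == (movetime_ms is None):
        raise ValueError("give exactly one of nodes or movetime_ms")
    limit = f"nodes {nodes}" if nodes is not None else f"movetime {movetime_ms}"
    return ["uci", "isready", f"position fen {fen}", f"go {limit}"]


def parse_bestmove(output):
    """Return the move from the engine's 'bestmove' line, or None if absent."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "bestmove":
            return parts[1]
    return None


def run_engine(engine_path, fen, **limits):
    """Start the engine at engine_path (assumed to speak UCI) and return its move."""
    proc = subprocess.Popen(
        [engine_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    move = None
    try:
        for cmd in uci_script(fen, **limits):
            proc.stdin.write(cmd + "\n")
        proc.stdin.flush()
        # Read engine output line by line until the search reports its move.
        for line in proc.stdout:
            move = parse_bestmove(line)
            if move is not None:
                break
        proc.stdin.write("quit\n")
        proc.stdin.flush()
    except BrokenPipeError:
        pass  # the engine exited early; return whatever was parsed so far
    finally:
        proc.wait()
    return move
```

With a local binary on the path, a call such as `run_engine("stockfish", fen, movetime_ms=50)` would correspond to the 0.05-second limit mentioned in #12, and `nodes=...` is where a fixed node budget could be set instead.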

FireWorks

#17

@LauOnChess said in #15:
> That doesn't make sense. Of course they're good at puzzles: the whole Lichess puzzle database was generated with the help of an engine (github.com/ornicar/lichess-puzzler).
>
> There are indeed some peculiar, specially crafted edge cases where a puzzle is obvious to a human and the engine chokes for some reason, but that is far from the general rule.

Engines are better than most humans at chess and have been for probably the last 15-20 years, but they still struggle with some puzzles that humans can solve. One example is the infamous Plaskett's Puzzle, which Mikhail Tal supposedly solved at a tournament. I'm sure SF will improve in puzzle solving eventually and be able to solve most puzzles, but it's an area where it still shows some weakness.

goodspellr

#18

I'm surprised that the Maia accounts do not have the bots play tactics puzzles. It would be interesting to see what puzzle rating each of them achieves under the Lichess puzzle rating system. From your analysis, it seems like they'd all converge to a similar puzzle rating in the mid-1300s.

ScratchChess

#19

Interesting, but anyone rated 1100 can turn out to be strong, and someone rated 1500 can turn out to be weak.

Graque

#20

@FireWorks said in #17:
> I'm sure SF will improve in puzzle solving eventually and be able to solve most puzzles, but it's an area where it still shows some weakness.

Perhaps you've found some anti-computer puzzles that's true of, but it's totally wrong as far as the Lichess puzzle database is concerned. SF does them much better than any human and can solve basically all of them instantly.

As for the main article, it seems the problem is basically with Maia. If Maia 1100 is supposed to play like an 1100-rated human, and Maia 1900 is supposed to play like a 1900-rated human, yet they play at similar levels, then something has gone wrong.