A thread for unboxing AI

5mo ago

I should add that Jimmy cannot scupper us by purposely picking a number with 4 distinct digits (our worst case) because he does not know our encoding scheme in advance. For example, if we choose to allocate 1 to small and 2 to large the resultant number may end up with 3 distinct digits, but if we do it vice versa it might end up with 2 distinct digits. In repeated trials we would randomise our encoding scheme each time so that Jimmy can't infer it from our prior guesses. No shenanigans for Jimmy.

5mo ago

There is an additional complexity that I'm pretty sure you are not accounting for. Jimmy raises his hand regardless of whether a selection hits on one, two, or three of his preferred attributes. (If you hit all four, he of course just says that you win.)

5mo ago

by Rococo k

Not sure why you think I'm not accounting for it?

by d2_e4 k

Jimmy is going to raise his hand when the number we have picked has at least one digit correct, per the conditions of the original problem.

5mo ago

by d2_e4 k

Not sure why you think I'm not accounting for it, the whole solution is predicated on this being the case!

Never mind. I misread what you wrote. My gut tells me that Case D isn't optimized.

For example, if Jimmy doesn't raise his hand on your first random selection, doesn't that give you additional information that you can use to exclude some of the remaining 23 combinations from which you are choosing at random?

5mo ago

by Rococo k

Never mind. I misread what you wrote. My gut tells me that Case D isn't optimized.

Optimising case D doesn't add a huge amount to our overall win % as every % we gain here is a % of 24/256. If we could optimise case C down to a 1 in 2 pick from a 1 in 3 pick that would be massive, as we would go from 1/3 of 144/256 to 1/2 of 144/256, an improvement of 24/256 (roughly 9.4 percentage points) overall. So if I were looking to optimise, I'd definitely take a look at case C a lot more closely.

There is probably some algorithm we can come up with to optimise the worst case for case D. I think finding it is going to involve a fair amount of trial and error.

5mo ago

by d2_e4 k

I'm not sure that you can guarantee there will be at least one hat for which he doesn't raise his hand in the first 5 picks. When he raises his hand, we gain no additional information. So if that's the case, we can't optimise the worst case scenario, which is what I've calculated. We might be able to optimise the best case & average case, but those require additional calculations anyway.

Say he is thinking 0123

We pick 0213 he raises his hand
We pick 1203 he raises his hand
we pick 2103 he raises hi

If the solution is 0123, and your first guess is 3210, he won't raise his hand, at which point you can eliminate all remaining combinations that have 3 as the first digit, 2 as the second digit, 1 as the third digit, or 0 as the fourth digit. Right?

5mo ago

by Rococo k

Yeah I ****ed that up. I edited my post, was hoping you hadn't started replying to it, too late!

But also keep in mind that all our calculations here are worst case. So we have to assume that we go down the unhappy path; assuming that we hit the jackpot and he doesn't raise his hand with the first number we pick is invalid for this calculation.

We do gain additional information when he raises his hand though, which is where I ****ed up. If we pick 0213 and he raises his hand, we know that the first digit is a 0 and/or the second digit is a 2 etc. which is more than we knew beforehand. Designing an algorithm around this is possible I'm sure, but would take a fair amount of tinkering.

5mo ago

by d2_e4 k

(1)4/256 + (1)84/256 + (1/3)(144/256) + (1/4)(24/256) = 55.5%. We (as the group of 10) can take an even money bet on this game and win.

Yeah, after doing some scratch math, I am convinced that the bolded coefficient for Case D is very wrong, and almost certainly much closer to 1 than 1/4. You alluded to the reasons.

Once you start using your remaining six guesses, you are able to rapidly eliminate combinations, regardless of whether Jimmy raises his hand. If, on your first guess, Jimmy doesn't raise his hand, then you can eliminate 14 of the remaining 23 combinations (i.e., all remaining combinations that have one digit in the same place as your guess). If Jimmy does raise his hand, then you can eliminate 9 of the remaining 23 combinations (i.e., all combinations that do not have any digits in the same place as your guess). And this ability to eliminate combinations continues with each successive guess, regardless of whether Jimmy raises his hand. You will never eliminate as many combinations on successive guesses as you do on the first guess, but as you eliminate combinations over six iterations, you dramatically increase your chances of binking the correct answer.
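The 14 and 9 counts above can be checked by brute force. This is a minimal sketch, assuming Case D means the 24 codes that use each of 0, 1, 2, 3 exactly once, and that Jimmy raises his hand whenever at least one digit matches in position; the name `hand_raised` is made up for illustration.

```python
from itertools import permutations


def hand_raised(secret: str, guess: str) -> bool:
    """Jimmy raises his hand if any digit matches in the same position."""
    return any(s == g for s, g in zip(secret, guess))


# Case D candidates: every code that uses 0, 1, 2 and 3 exactly once.
CASE_D = ["".join(p) for p in permutations("0123")]

guess = "3210"  # first guess from the example earlier in the thread
others = [c for c in CASE_D if c != guess]  # the remaining 23 combinations

# Hand down: only codes sharing no position with the guess survive.
still_possible_if_down = [c for c in others if not hand_raised(c, guess)]
# Hand up (but not a win): only codes sharing a position survive.
still_possible_if_up = [c for c in others if hand_raised(c, guess)]

print(len(others), len(still_possible_if_down), len(still_possible_if_up))
# prints: 23 9 14
```

So a hand down leaves 9 candidates (14 eliminated) and a hand up leaves 14 (9 eliminated), whichever permutation is guessed first.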

I haven't thought about Case C yet.

5mo ago

Fair enough. I approached it as I would a complex problem at work, essentially: design an overarching framework that allows us to split the problem into sub-tasks, implement an initial solution for each task that's "good enough", then optimise each sub-task individually as necessary. After all, the initial question wasn't "what's the answer?", it was "how would you calculate the answer?" I would consider the base 4 encoding + 4 initial guesses the "framework", which then allows us to optimise each case individually.

Note that D in total contributes less than 10 points to our overall win %, so even if we manage to find a perfect algorithm for D we won't be increasing our overall "performance" by more than 7.5%. If I were solving this problem for practical reasons I'd definitely be focusing on optimisations for case C as that's where the big wins are, but I understand that from a theoretical standpoint finding the algorithm for case D may be more interesting.
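For anyone who wants to check the case weights, here is a rough sketch that enumerates all 256 codes and groups them by number of distinct digits. The coefficients 1, 1, 1/3 and 1/4 are the ones from the earlier estimate; the variable names are just for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 256 four-digit base-4 codes, e.g. "0123".
ALL_CODES = ["".join(d) for d in product("0123", repeat=4)]

# How many codes have 1, 2, 3 or 4 distinct digits (cases A, B, C, D).
case_sizes = Counter(len(set(code)) for code in ALL_CODES)

# Per-case success rates from the earlier estimate in this thread.
coefficients = {
    1: Fraction(1),
    2: Fraction(1),
    3: Fraction(1, 3),
    4: Fraction(1, 4),
}

lower_bound = sum(
    coefficients[k] * Fraction(case_sizes[k], 256) for k in case_sizes
)

# The most a perfect case D algorithm could add on top of the 1/4 pick.
max_gain_d = (1 - coefficients[4]) * Fraction(case_sizes[4], 256)

print(dict(sorted(case_sizes.items())))  # {1: 4, 2: 84, 3: 144, 4: 24}
print(f"{float(lower_bound):.4f} {float(max_gain_d):.4f}")  # 0.5547 0.0703
```

Case D is 24/256, about 9.4% of all codes, so even a perfect algorithm there adds at most about 7 points.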

5mo ago

by d2_e4 k

I understand that from a theoretical standpoint finding the algorithm for case D may be more interesting.

Not more interesting. Just easier for me to get my head around.

5mo ago

For Case C, the coefficient is also far too low because, as you mention, you are taking the worst case scenario when using the masking method to uncover each of the first three digits. I also want to give some more thought to whether the masking method is the most efficient strategy, even as you move to the second and third digit. It may not be, in part because of our ability to eliminate combinations even without using a masking method (as in Scenario D), and in part because each time you use the masking method, you deprive yourself of an opportunity to simply bink the answer by guessing a combination that conceivably could be correct.

I am relatively certain that, over the entire problem, optimal strategy will yield a better than 75% chance of picking Jimmy's hat.

5mo ago

by Rococo k

Agreed with the above. More generally, I don't know how to a) find a "good" algorithm or b) prove that a given algorithm is or isn't the optimal algorithm. The algorithms I provided were my best guesses. It's likely that even case B is not optimised, it just gets us to 100% within the required number of guesses, so it doesn't need to be. Once the framework is in place, it essentially becomes a pure algorithmic optimisation problem, like writing an efficient sort for example.

5mo ago

by d2_e4 k

More generally, I don't know how to a) find a "good" algorithm or b) prove that a given algorithm is or isn't the optimal algorithm.

I have the same issue.

It's likely that even case B is not optimised, it just gets us to 100% within the required number of guesses, so it doesn't need to be.

I didn't even consider optimization for Case B because it didn't matter for the purposes of my question.

5mo ago

If you manage to get e d'a to have a look at this, he might well be able to come up with something better than I did. He seems to know a lot about comp sci and algorithmic complexity. I haven't studied anything like that in depth, so there are probably both theoretical and practical approaches to optimisation problems like this that I just have never learnt about.

5mo ago

by Rococo k

For Case C, the coefficient is also far too low because, as you mention, you are taking the worst case scenario when using the masking method to uncover each of the first three digits.

Just on this point - when evaluating algorithmic efficiency, you usually have a best case, worst case and average case statistic for a given algorithm. Our best case is trivially 100%, that seems easy enough - we can get lucky with the first pick, or we can get lucky and hit Case A, or a bunch of other things can happen. The worst case is what I've been trying to calculate, and it gives us a lower bound for our expected success rate on repeated trials. The average case seems like it would be more difficult to calculate, as you need to take the worst case paths and all the other paths and somehow average them out. That seems very daunting. But I believe that would give us our actual expected success rate, not just the lower bound for it.

Usually with puzzles like this though, like "what's the minimum number of weighings needed to find the fake coin" etc, you are looking for the worst case scenario.

And finally - we don't really even know that the strategy of guessing the four repdigits first is optimal. There could well be other strategies for some number of first guesses that let us categorise the subsequent cases by something other than which distinct digits they contain. Maybe there is a method of initially guessing such that the subsequent cases are symmetrical, e.g. case 1 is 0-63, case 2 is 64-127 etc., and then we can have one algorithm that works for all the cases. Guessing repdigits to start was quite honestly just the first thing that came to mind.

5mo ago

Actually, in case C by guess 9 we have 2 guesses left for 5 numbers, so rather than ****ing about isolating digits we can take the 2 in 5 shot by picking 9 and 10 at random which is already better than 1/3. This brings our lower bound up to 59.22%.
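Spelling out the arithmetic behind the 59.22% figure, assuming cases A, B and D keep the coefficients from the earlier estimate and only the case C coefficient changes from 1/3 to 2/5:

```latex
\[
\frac{4}{256} + \frac{84}{256}
  + \frac{2}{5}\cdot\frac{144}{256}
  + \frac{1}{4}\cdot\frac{24}{256}
  = \frac{4 + 84 + 57.6 + 6}{256}
  = \frac{151.6}{256}
  \approx 59.22\%
\]
```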

5mo ago

I have an improvement on the algorithm.

Use the first 2 guesses to guess 0000, 1111. Proceed case by case. (U = Hand up, D = Hand down)

Case A: D,D (16 combos). This case is trivial, there are 16 combos which contain only 2,3. This can be solved easily with 8 guesses using the masking method or probably a bunch of other methods.

Case B: U,D or D,U (130 combos). I have an algorithm using a variation of the masking method which solves these 100% of the time (I think!). I'll post it if this line proves fruitful, it's a little finicky.

Case C: U,U (110 combos). Haven't thought about this yet.
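The three case sizes can be confirmed by brute force. A quick sketch, assuming a raised hand means at least one digit matches in position (a correct guess of 0000 or 1111 is simply counted as U here); `hand_raised` and `response_pattern` are names invented for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def hand_raised(secret: str, guess: str) -> bool:
    """Jimmy raises his hand if any digit matches in the same position."""
    return any(s == g for s, g in zip(secret, guess))


ALL_CODES = ["".join(d) for d in product("0123", repeat=4)]
FIRST_GUESSES = ("0000", "1111")


def response_pattern(secret: str) -> str:
    """Responses to 0000 then 1111, e.g. 'UD' = hand up, then hand down."""
    return "".join(
        "U" if hand_raised(secret, g) else "D" for g in FIRST_GUESSES
    )


pattern_counts = Counter(response_pattern(code) for code in ALL_CODES)

case_a = pattern_counts["DD"]                         # only 2s and 3s
case_b = pattern_counts["UD"] + pattern_counts["DU"]  # 0 or 1, not both
case_c = pattern_counts["UU"]                         # both 0 and 1

# Share of codes covered if cases A and B are solved every time.
solved_share = Fraction(case_a + case_b, 256)

print(case_a, case_b, case_c)        # 16 130 110
print(f"{float(solved_share):.4f}")  # 0.5703
```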

This gives us (1)16/256 + (1)130/256 + (?)(110/256) = 57% + X. X has to be at least (1/8)(110/256), so this already gets us to 62.4% before we start optimising case C.

Following this line, it becomes a question of solving case C (numbers containing both 0 and 1, no other information available) in 8 guesses.
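Going back to Case A for a moment, here is one way the masking idea could look in code. This is my reading of "masking", not necessarily the exact method meant above: probe one position at a time with a 2, and fill the other positions with 0, which the D,D responses have already ruled out. The function names are invented for the sketch.

```python
from itertools import product


def hand_raised(secret: str, guess: str) -> bool:
    """Jimmy raises his hand if any digit matches in the same position."""
    return any(s == g for s, g in zip(secret, guess))


def solve_case_a(ask):
    """Find a code made only of 2s and 3s.

    `ask(guess)` returns True if Jimmy raises his hand.
    Returns (code, number_of_guesses_used).
    """
    digits = []
    guesses_used = 0
    for pos in range(4):
        # 0 is known to be absent, so only position `pos` can match.
        probe = "0" * pos + "2" + "0" * (3 - pos)
        guesses_used += 1
        digits.append("2" if ask(probe) else "3")
    code = "".join(digits)
    guesses_used += 1  # the final guess that names the code itself
    return code, guesses_used


# Check every Case A code against a simulated Jimmy.
results = {}
for secret in ("".join(d) for d in product("23", repeat=4)):
    results[secret] = solve_case_a(lambda g, s=secret: hand_raised(s, g))

print(all(code == secret for secret, (code, _) in results.items()))
print(max(used for _, used in results.values()))
```

Under these assumptions every Case A code is found in 5 guesses, which fits inside the 8 guesses left after 0000 and 1111.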

4mo ago

The Nobel Prize in Physics has been awarded to two scientists, Geoffrey Hinton and John Hopfield, for their work on machine learning.

British-Canadian Professor Hinton is sometimes referred to as the "Godfather of AI" and said he was flabbergasted.

He resigned from Google in 2023, and has warned about the dangers of machines that could outsmart humans.

The announcement was made by the Royal Swedish Academy of Sciences at a press conference in Stockholm, Sweden.

American Professor John Hopfield, 91, is a professor at Princeton University in the US, and Prof Hinton, 76, is a professor at University of Toronto in Canada.

https://www.bbc.co.uk/news/articles/c62r...
Not sure it's really physics but ...

checkraisdraw

4mo ago

by Dunyain k

Bump:

Can anyone who understands programming/AI better than me walk through exactly how one "teaches" AI to be an ideological bad faith actor.

And question for everyone, does anyone see how this can be problematic? Elon Musk is very adamant that teaching/prompting AI to lie for any reason is a very bad idea; but maybe he is just crazy and over-reacting and it is no biggie. I dunno.

My theory is that the AI gets a lot of “Trump” “Assassination” queries and doesn’t want to be the accessory to a real assassination m

smartDFS

4mo ago

by checkraisdraw k

My theory is that the AI gets a lot of “Trump” “Assassination” queries and doesn’t want to be the accessory to a real assassination m

they've been trained to steer clear of anything that remotely approaches controversial

Gregory Illinivich

4mo ago

Speaking of AI, this Destiny stream is both entertaining and unsettling. It's funny because he didn't realize how widespread the bot problem was on social media and it takes him forever to realize that these videos are AI generated, but it is eerie. If anyone actually decides to watch this, 26:45 is where it gets interesting. Strange times.

https://www.bbc.co.uk/news/articles/czrm...

4mo ago

On top of the Nobel Prize for Physics going to AI, we have:

British computer scientist Professor Demis Hassabis has won a share of the Nobel Prize for Chemistry for "revolutionary" work on proteins, the building blocks of life.

Prof Hassabis, 48, co-founded the artificial intelligence (AI) company that became Google DeepMind.

Large Language Model Influence on Diagnostic Reasoning

3mo ago

This is a fun one. A study to see if AI helped doctors improve diagnosis.

It didn't, but the AI on its own outperformed them.

Importance Large language models (LLMs) have shown promise in their performance on both multiple-choice and open-ended medical reasoning examinations, but it remains unknown whether the use of such tools improves physician diagnostic reasoning.

Objective To assess the effect of an LLM on physicians’ diagnostic reasoning compared with conventional resources.

Design, Setting, and Participants A single-blind randomized clinical trial was conducted from November 29 to December 29, 2023. Using remote video conferencing and in-person participation across multiple academic medical institutions, physicians with training in family medicine, internal medicine, or emergency medicine were recruited.

Intervention Participants were randomized to either access the LLM in addition to conventional diagnostic resources or conventional resources only, stratified by career stage. Participants were allocated 60 minutes to review up to 6 clinical vignettes.

Main Outcomes and Measures The primary outcome was performance on a standardized rubric of diagnostic performance based on differential diagnosis accuracy, appropriateness of supporting and opposing factors, and next diagnostic evaluation steps, validated and graded via blinded expert consensus. Secondary outcomes included time spent per case (in seconds) and final diagnosis accuracy. All analyses followed the intention-to-treat principle. A secondary exploratory analysis evaluated the standalone performance of the LLM by comparing the primary outcomes between the LLM alone group and the conventional resource group.

Results Fifty physicians (26 attendings, 24 residents; median years in practice, 3 [IQR, 2-8]) participated virtually as well as at 1 in-person site. The median diagnostic reasoning score per case was 76% (IQR, 66%-87%) for the LLM group and 74% (IQR, 63%-84%) for the conventional resources-only group, with an adjusted difference of 2 percentage points (95% CI, −4 to 8 percentage points; P = .60). The median time spent per case for the LLM group was 519 (IQR, 371-668) seconds, compared with 565 (IQR, 456-788) seconds for the conventional resources group, with a time difference of −82 (95% CI, −195 to 31; P = .20) seconds. The LLM alone scored 16 percentage points (95% CI, 2-30 percentage points; P = .03) higher than the conventional resources group.

Conclusions and Relevance In this trial, the availability of an LLM to physicians as a diagnostic aid did not significantly improve clinical reasoning compared with conventional resources. The LLM alone demonstrated higher performance than both physician groups, indicating the need for technology and workforce development to realize the potential of physician-artificial intelligence collaboration in clinical practice.

https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395


Google unveils 'mind-boggling' quantum computing chip

2mo ago

Google has unveiled a new chip which it claims takes five minutes to solve a problem that would currently take the world's fastest super computers ten septillion – or 10,000,000,000,000,000,000,000,000 years – to complete.

The chip is the latest development in a field known as quantum computing - which is attempting to use the principles of particle physics to create a new type of mind-bogglingly powerful computer.

Google says its new quantum chip, dubbed "Willow", incorporates key "breakthroughs" and "paves the way to a useful, large-scale quantum computer."

However experts say Willow is, for now, a largely experimental device, meaning a quantum computer powerful enough to solve a wide range of real-world problems is still years - and billions of dollars - away.

https://www.bbc.co.uk/news/articles/c791ng0zvl3o


tick tock