dhruvdh 2 days ago

I wish more people would just try to do things just like this and blog about their failures.

> The published version of a proof is always condensed. And even if you take all the math that has been published in the history of mankind, it’s still small compared to what these models are trained on.

> And people only publish the success stories. The data that are really precious are from when someone tries something, and it doesn’t quite work, but they know how to fix it. But they only publish the successful thing, not the process.

- Terence Tao (https://www.scientificamerican.com/article/ai-will-become-ma...)

Personally, I think failures on their own are valuable. Others can come in and branch off from a decision you made that instead leads to success. Maybe the idea can be applied to a different domain. Maybe your failure clarified something for someone.

  • openquery 2 days ago

    Thank you for saying this. I agree which is why I wrote this up.

  • mindcrime a day ago

    > I wish more people would just try to do things just like this and blog about their failures.

    Came here to say the same thing. Actually, I guess I did say the same thing, just in a much more long-winded form. Needless to say, I concur with you 100%.

markisus 2 days ago

> The only hope I have is to try something completely novel

I don’t think this is true. Neural networks were not completely novel when they started to work. Someone just used a novel piece — the gpu. Whatever the next thing is, it will probably be a remix of preexisting components.

  • openquery 2 days ago

    Right. Ironically I chose a model that was around in the 1970s without knowing it.

    My point was more a game-theoretic one. There is just no chance I would beat the frontier labs if I tried the same things with less compute and fewer people. (Of course there is almost 0 chance I would beat them at all.)

mindcrime a day ago

I haven't read the comments here yet, but I'm predicting there will be at least a few of the form "why would you bother doing this, you aren't an expert in AI, this is stupid, leave AGI to the experts, why would you think this could possibly work" etc. I hope not, but this being HN, history suggests those people will be out in force.

I hope not. I think this is GREAT work even if the result was ultimately less than what was desired. And I want to encourage the author, and other people who might make similar attempts. I think we need more people "taking a stab" and trying different ideas. You might or might not succeed, but in almost every case the absolute worst scenario is that you learn something that might be useful later. If taking on something like this motivates someone to spend time studying differential equations, then I say "great!" Or if it motivates someone to study neuroscience, or electronics (maybe somebody decides to try realizing a neural network in purpose built hardware, for example) then also "Great!" Do it.

About the only serious negative (aside from allusions to opportunity cost) that I can see for making an effort like this, would be if somebody gets really deep in it and winds up blowing a shit-ton of money on the project, whether that be for cloud compute cycles, custom hardware, or whatever. I wouldn't necessarily recommend maxing out your credit cards and draining your retirement account unless you have VERY solid evidence that you're on the right path!

> You are a worse driver in a worse car. If you follow the same strategy as the cars in front of you, pit at the same time and choose the same tires, you will certainly lose. The only chance you have is to pick a different strategy.

Yes, exactly. I adhere to a similar mindset. I do AI research in my spare time, and I cannot possibly afford to spend the kind of money on training ginormous ANNs that OpenAI, Microsoft, Google, Twitter, Meta, IBM, etc. can spend. To even try would be completely ludicrous. There is simply no path where an independent solo researcher can beat those guys playing that game. So the only recourse is to change the rules and play a different game. That's no guarantee of success, of course, but I'll take a tiny, even minuscule, chance of achieving something over ramming my head into the wall over and over in some Sisyphean attempt to compete head to head in a game I know a priori I simply cannot win.

Anyway.. to the OP: great work, and thanks for sharing. And I hope you decide to make other attempts in the future, and share those results as well. Likewise to anybody else who has considered trying something like this.

  • openquery 7 hours ago

    Thank you. I hope you write up findings from your own research.

    To your point, I wasn't pretending that this work is novel or something that the AI community should take seriously. If anything, my point was that you can just do things.

    I also feel like in the SWE community folks are generally concerned that LLMs are getting considerably better at doing our jobs. This was a poetic attempt at trying to regain some agency and not just let life happen _to_ you.

whatever1 a day ago

Try fewer neurons and solve it to global optimality with Gurobi. This way you will know if the optimization step was your bottleneck.

skeledrew a day ago

Interesting. I started a somewhat conceptually similar project several months ago. For me though, the main motivation is that I think there's something fundamentally wrong with the current method of using matrix math for weight calculation and representation. I'm taking the approach that the very core of how neurons work is inherently binary, and should remain that way. My basic thesis is that it should reduce computational requirements and lead to something more generic. So I set out to build something that takes an array of booleans (the upstream neurons either fired or didn't fire at a particular time step), and gives a single boolean calculated with a customizable activator function.
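
As a rough illustration of that idea (a minimal sketch, not my actual code; the majority activator is just an example), a single binary neuron could look like this:

  from typing import Callable, List

  # An activator maps the firing pattern of the upstream neurons to fire / don't fire.
  Activator = Callable[[List[bool]], bool]

  def majority_activator(upstream_fired: List[bool]) -> bool:
      # Fire if more than half of the upstream neurons fired at this time step.
      return sum(upstream_fired) > len(upstream_fired) / 2

  class BinaryNeuron:
      def __init__(self, activator: Activator):
          self.activator = activator

      def step(self, upstream_fired: List[bool]) -> bool:
          return self.activator(upstream_fired)

  print(BinaryNeuron(majority_activator).step([True, True, False]))  # True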

The project is currently on ice: after I created something that builds a network of layers, I ran into a wall figuring out how to have that network wire itself over time and become representative of whatever it's learned. I'll take some time to go through this, see what it may spark, and try to start working on mine again.

  • openquery a day ago

    Nice. Interested to see where this leads.

    The network in the article doesn't have explicit layers. It's a graph which is initialised with a completely random connectivity matrix. The inputs and outputs are also wired randomly in the beginning (an input could be connected to a neuron which is also connected to an output for example, or the input could be connected to a neuron which has no post-synaptic neurons).

    It was the job of the optimisation algorithm to figure out the graph topology over training.
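
    For anyone trying to picture it, initialising that kind of unstructured graph could look roughly like this (a sketch under my own assumptions, not the article's code; the sizes and connection probability are made up):

      import numpy as np

      rng = np.random.default_rng(0)
      n_neurons, n_inputs, n_outputs, p_connect = 64, 8, 4, 0.1

      # Random connectivity matrix: conn[i, j] != 0 means neuron i synapses onto neuron j.
      mask = rng.random((n_neurons, n_neurons)) < p_connect
      conn = mask * rng.normal(size=(n_neurons, n_neurons))

      # Inputs and outputs land on arbitrary neurons: an input can hit a neuron that
      # also feeds an output, or a neuron with no post-synaptic targets at all.
      input_targets = rng.integers(0, n_neurons, size=n_inputs)
      output_sources = rng.integers(0, n_neurons, size=n_outputs)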

    • RaftPeople 14 hours ago

      I did a similar project previously and had what I considered "good" results (creatures that effectively controlled their bodies to get food) but not the kind of advanced brains I had naively hoped for.

      The networks were really configurable (number of layers, number of "sections" within a layer (section=semi-independent chunk), number of neurons, synapses, types of neurons, types of synapses, amount of recurrence, etc.), but I tended to steer the GA stuff in directions that I saw tended to work. These were some of my findings:

      1-Feed forward tended to work better than heavily recurrent. Many times I would see a little recurrence in the best brains, but that might have been because, given the mutation percentages, it was difficult to get a brain that didn't have any of it.

      2-The best brains tended to have between 6 and 10 layers, and the middle layers tended to be small like information was being consolidated before fanning out to the motor control neurons.

      3-Activation functions: I let it randomly choose per neuron, per section of a layer, per layer, per brain, etc. I was surprised that binary step frequently won out compared to things like sigmoid or others.

      • openquery 10 hours ago

        Were the brains event-driven? How did you implement the GA? What did individual genes encode?

        • RaftPeople 7 hours ago

          This was the setup:

          1-Creature shape: hexagon with a mouth/proboscis perpendicular to one side of hexagon

          2-Senses:

          2.1-Mouth could detect if it was touching plant food or creature (which is also food), and would transfer energy from the food source if touching it

          2.2-Sight:

          Two eyes each sent out 16 lidar-like rays (spread out a bit)

          Eye neurons triggered based on distance of object and type of object (wall, plant, hazard, creature)

          2.3-Touch:

          Each body segment had 32 positions for touch

          Each position had neurons for detecting different types of thing: wall, plant, hazard, creature, sound

          2.4-Sound:

          Each creature emanated sound waves (slower than light rays but faster than creature movement)

          Sound detected by touch senses

          3-Motor: multiple neurons controlled forward, backward and rotation left/right.

          4-Brain:

          Layer 1=all sense neurons plus creature state (e.g. energy level)

          Layers 2 - N=Randomly created and connected and evolved

          Final Layer=motor

          5-World:

          A simple 2D space with randomly placed items

          Walls+obstructions blocked movement

          Hazards sucked energy from creature

          Plants added food to creature if mouth touched plant

          Between 20 and 50 Creatures

          > Were the brains event-driven?

          It was a sequential flow as follows:

          1-Move creatures based on motor neurons and perform collision detection

          2-Set sensory neurons in layer 1 based on current state (e.g. is mouth touching plant, eye ray detection, creatures bodies touching, etc.)

          3-Calculate next state of brain in a feed-forward fashion through the layers. For recurrence this means that, for example, a layer 2 neuron receiving input from a layer 6 neuron uses the value calculated in the previous cycle.

          goto step 1
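
          In rough Python-ish pseudocode (illustrative only; move, resolve_collisions, sense and step_brain are just stand-ins for the real routines):

            while True:
                # 1. Move creatures based on motor neurons, then collision detection.
                for c in creatures:
                    move(c, c.motor_neurons)
                resolve_collisions(creatures, world)

                # 2. Set layer-1 sensory neurons from the current world state
                #    (mouth contact, eye rays, touch, sound, energy level).
                for c in creatures:
                    c.layer1 = sense(c, world)

                # 3. One feed-forward pass through the layers; recurrent synapses
                #    read the value their source neuron computed last cycle.
                for c in creatures:
                    c.step_brain()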

          > How did you implement the GA? What did individual genes encode?

          I did not have a DNA or genes that drove the structure of the NN. I played with many ideas for a long time, but nothing seemed to be able to encode a higher-level capability without dependence on a very specific NN circuit structure. I looked at various ideas from other people, like NEAT from U Texas, but I never found anything that I felt worked at the abstraction level I was hoping for when I started. It's a really fun, interesting and challenging problem. I really wonder how nature does it.

          I ended up creating an "evolution control" object that had many parameters at many different levels (entire network, specific layers, specific section, etc.) that would guide (somewhat control, but mixed with randomness) the initial structure of the brains (layers, sections per layer, connectivity, etc.) and also the extent it could change each generation.

          Example of config parameters:

          "Chance of Changing A Neurons Activation Function=3%"

          "Types of Activation Functions Available for Layer 2, Section 3=..."

          After each generation, the creatures were ranked, and depending on how well or badly they did they would be assigned a level of change for the next generation.

          The level of change drove how much of the NN was eligible to possibly change (e.g. 10% of layers, sections, neurons, synapses, etc.)

          The evolution control object drove how likely different types of changes were (e.g. 3% chance to switch activation function) and the magnitude of changes (e.g. up to 20% change in synapse weight)
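
          Roughly, the control object plus mutation step looked like this in spirit (a sketch with made-up parameter names, not my exact code):

            import random

            evolution_control = {
                "p_change_activation_fn": 0.03,                        # 3% chance per eligible neuron
                "allowed_activation_fns": ["binary_step", "sigmoid"],  # e.g. for layer 2, section 3
                "max_weight_change": 0.20,                             # up to 20% change in synapse weight
            }

            def mutate(brain, rank, ctl=evolution_control):
                # Worse-ranked creatures get a bigger slice of the net opened up to change.
                eligible_fraction = 0.10 + 0.40 * rank   # rank in [0, 1], 0 = best creature
                for neuron in brain.sample_neurons(eligible_fraction):   # hypothetical helper
                    if random.random() < ctl["p_change_activation_fn"]:
                        neuron.activation_fn = random.choice(ctl["allowed_activation_fns"])
                    for syn in neuron.synapses:
                        syn.weight *= 1 + random.uniform(-ctl["max_weight_change"], ctl["max_weight_change"])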

          I'm curious how you handled ga/dna/gene stuff?

upghost 20 hours ago

This. I love everything about this. Everything. Thank you. This is the kind of sh!t that made me get into programming in the first place.

Regarding specific reading, three books I think you would love are [1] The Self-Assembling Brain, [2] The Archaeology of Mind, and [3] Evolutionary Optimization Algorithms.

People can talk whatever sh!t they want, but this pushed us closer to actual AGI than anything the (useful but) dead-end LLM craze is pushing us towards, and you thoughtfully made the effort.

The most basic function of learning and intelligence is habituation to stimuli, which even an amoeba can handle but not a single LLM does.

Thanks again for this.

[1]: https://a.co/d/4TG1ZvP

[2]: https://a.co/d/aYReWjs

[3]: https://a.co/d/1cod8Bq

  • openquery 10 hours ago

    Thanks for the kind words and the recommendations. Ordered!

    • upghost 9 hours ago

      The Archaeology of Mind is an enjoyable marathon, but a marathon nonetheless. The audiobook is therefore highly recommended as well. If you get past the "seeking" system section it will probably reframe your entire view of consciousness, intelligence, and what it means to be human. You will cringe significantly harder at the concept of LLMs becoming AGI once you get past the part on "primary affective processes". The Self-Assembling Brain will help reconcile those conversations. And then evolutionary algos will help you build that brain out! Keep posting!!

fitzn 8 hours ago

Thank you very much for writing this up. Good, thought-provoking ideas here.

cglan 2 days ago

I’ve thought of something like this for a while, I’m very interested in where this goes.

A highly async actor model is something I've wanted to explore; combined with a highly multi-core architecture clocked very, very low, it seems like it could be power-efficient too.

I was considering using Go + channels for this.
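
Just to sketch the shape of it (in Python with asyncio here purely for brevity; the same structure maps directly onto goroutines + channels, and everything below is illustrative):

  import asyncio

  # One task per neuron: read spikes from an inbox, integrate, fan out downstream.
  async def neuron(inbox: asyncio.Queue, outboxes: list, threshold: int = 2):
      potential = 0
      while True:
          potential += await inbox.get()        # block until an upstream spike arrives
          if potential >= threshold:
              potential = 0
              for out in outboxes:
                  await out.put(1)              # fire

  async def main():
      a, b = asyncio.Queue(), asyncio.Queue()
      task = asyncio.create_task(neuron(a, [b], threshold=2))
      for _ in range(4):
          await a.put(1)                        # drive the input
      await asyncio.sleep(0.1)
      print("spikes forwarded:", b.qsize())     # expect 2
      task.cancel()

  asyncio.run(main())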

  • jerf a day ago

    The idea has kicked around in hardware for a number of years, such as: https://www.greenarraychips.com/home/about/index.php

    I think the problem isn't that it's a "bad idea" in some intrinsic sense, but that you really have to have a problem that it fits like a glove. By the nature of the math, if you can only use 4 of your 128 cores 50% of the time, your performance just tanks no matter how fast you're going the other 50% of the time.

    Contra the occasional "Everyone Else Is Stupid And We Just Need To Get Off Of von Neumann Architectures To Reach Nirvana" post, CPUs are shaped the way they are for a reason; being able to bring very highly concentrated power to bear on a specific problem is very flexible, especially when you can move the focus around very quickly as a CPU can. (Not instantaneously, but quickly, and this switching penalty is something that can be engineered around.) A lot of the rest of the problem space has been eaten by GPUs. This sort of "lots of low powered computers networked together" still fits in between them somewhat, but there's not a lot of space left anymore. They can communicate better in some ways than GPU cores can communicate with each other, but that is also a problem that can be engineered around.

    If you squint really hard, it's possible that computers are sort of wandering in this direction, though. Being low power means it's also low-heat. Putting "efficiency cores" on to CPU dies is sort of, kind of starting down a road that could end up at the greenarray idea. Still, it's hard to imagine what even all of the Windows OS would do with 128 efficiency cores. Maybe if someone comes up with a brilliant innovation on current AI architectures that requires some sort of additional cross-talk between the neural layers that simply requires this sort of architecture to work you could see this pop up... which I suppose brings us back around to the original idea. But it's hard to imagine what that architecture could be, where the communication is vital on a nanosecond-by-nanosecond level and can't just be a separate phase of processing a neural net.

    • openquery a day ago

      > By the nature of the math, if you can only use 4 of your 128 cores 50% of the time, your performance just tanks no matter how fast you're going the other 50% of the time.

      I'm not sure I understand this point. If you're using a work-stealing threadpool servicing tasks in your actor model there's no reason you shouldn't get ~100% CPU utilisation provided you are driving the input hard enough (i.e. sampling often from your inputs).

      • jerf 17 hours ago

        To work steal, you must have work to steal. If you always have work to steal, you have a CPU problem, not a CPU fabric problem. CPU fabrics are good for when you have some sort of task that is sort of parallel, but also somehow requires a lot of cross-talk between the tasks, preferably of a very regular and predictable nature, e.g., not randomly blasting messages of very irregular sizes like one might see in a web-based system, but a very regular "I'm going to need exactly 16KB per frame from each of my surrounding 4 CPUs every 25ms". You would think of using a GPU on a modern computer because you can use all the little CPUs in a GPU, but the GPU won't do well because those GPU CPUs can't communicate like that. GPUs obtain their power by forbidding communication within cells except through very stereotyped patterns.

        If you have all that, and you have it all the time, you can win on these fabrics.

        The problem is, this doesn't describe very many problems. There's a lot of problems that may sort of look like this, but have steps where the problem has to be unpacked and dispatched, or the information has to be rejoined, or just in general there's other parts of the process that are limited to a single CPU somehow, and then Amdahl's Law murders your performance advantage over conventional CPUs. If you can't keep these things firing on all cylinders basically all the time, you very quickly end up back in a regime where conventional CPUs are more appropriate. It's really hard to feed a hundred threads of anything in a rigidly consistent way, whereas "tasks more or less randomly pile up and we dispatch our CPUs to those tasks with a scheduler" is fairly easy, and very useful.
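
        (Back-of-envelope version of that: Amdahl's law gives speedup = 1 / ((1 - p) + p/N). Even if p = 90% of the work parallelises perfectly across N = 128 little cores, that's 1 / (0.1 + 0.9/128) ≈ 9.3x, with a hard ceiling of 10x no matter how many cores you add.)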

  • openquery 2 days ago

    Give it a shot. It isn't much code.

    If you want to look at more serious work, the Spiking Neural Net community has made models which actually work and are power efficient.
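
    For a flavour of what those models boil down to, here's a toy leaky integrate-and-fire neuron (the constants are illustrative, not taken from any particular paper):

      # Toy leaky integrate-and-fire (LIF) neuron: the membrane potential leaks toward 0,
      # integrates input current, and emits a spike + resets when it crosses threshold.
      def lif(currents, tau=20.0, v_thresh=1.0, dt=1.0):
          v, spikes = 0.0, []
          for i in currents:
              v += dt * (-v / tau + i)
              if v >= v_thresh:
                  spikes.append(1)
                  v = 0.0
              else:
                  spikes.append(0)
          return spikes

      print(lif([0.3] * 10))  # spikes every few steps once the potential builds up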

  • grupthink a day ago

    I implemented something similar to this using Go channels during the Covid lockdowns. You don't get an emergence of intelligence from throwing random spaghetti at a wall.

    Ask me how I know.

andsoitis 2 days ago

If you’re looking for a neuroscience approach, check out Numenta https://www.numenta.com/

  • RaftPeople 14 hours ago

    Has HTM had any good results?

    I followed them for a long time, but I really never heard of anything where they were beating other approaches.

Onavo a day ago

> Ok how the hell do we train this thing? Stochastic gradient descent with back-propagation won't work here (or if it does I have no idea how to implement it).

What's wrong with gradient descent?

https://snntorch.readthedocs.io/en/latest/

  • openquery a day ago

    Thanks for sharing. I thought the discontinuous nature of the SNN made it non-differentiable and therefore unsuitable for SGD and backprop.

    • Onavo a day ago

      Lol in differentiable programming they usually hard code an identity for the problematic parts (e.g. if statements)
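
      For instance, in PyTorch you can keep the hard spike in the forward pass and just declare its gradient to be the identity on the way back (a bare-bones straight-through estimator; snnTorch's surrogate gradients are smarter versions of the same trick):

        import torch

        class SpikeSTE(torch.autograd.Function):
            @staticmethod
            def forward(ctx, v):
                return (v > 0).float()      # hard, non-differentiable spike

            @staticmethod
            def backward(ctx, grad_output):
                return grad_output          # the "hard coded identity" gradient

        v = torch.randn(5, requires_grad=True)
        SpikeSTE.apply(v).sum().backward()
        print(v.grad)                       # all ones: the gradient flowed straight through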

  • thrance a day ago

    Gradient descent needs a differentiable system, and the author's clearly isn't.

namero999 2 days ago

Isn't this self-refuting? From the article:

> Assume you are racing a Formula 1 car. You are in last place. You are a worse driver in a worse car. If you follow the same strategy as the cars in front of you, pit at the same time and choose the same tires, you will certainly lose. The only chance you have is to pick a different strategy.

So why model brains and neurons at all? You are outgunned by at least 300,000 years of evolution and 117 billion training sessions.

  • andrewflnr a day ago

    Because bio brains aren't even in the same race.

henning a day ago

The author could first reproduce models and results from papers before trying to extend that work. Starting with something working helps.

  • skeledrew 12 hours ago

    I've found that, many times, new approaches are discovered by not having the bias created by knowledge of existing research. Knowing what others have done can sometimes lead to abandoning what could actually be a novel path, or unwittingly guide one too far down already-discovered paths.

    Of course, the knowledge may also sometimes be helpful. So ideally it's good that different people tackle these problems with knowledge of different amounts and parts of existing work.

robblbobbl 2 days ago

Finally singularity confirmed, thanks.