Anyone hosting OpenCTI

JoshCodes@programming.dev · 1 day ago

Dammit, so my comment to the other person was a mix of a reply to this one and the last one… not having a good day for language processing, ironically.

Specifically on the dragonfly thing, I don’t think I’m being naive in writing that post or this one. Dragonflies aren’t very complex and only really have a few behaviours and inputs. We can accurately predict how they will fly. I brought up the dragonfly to mention the limitations of the current tech and concepts. Given the world’s computing power and research investment, the best we can do is a dragonfly for intelligence.

To be fair, scientists don’t entirely understand neurons, and ML-designed neuron data structures behave similarly to very early ideas of what brains do, but it’s all based on concepts from the 1950s. There are different segments of the brain which process different things and we sort of think we know what they all do, but most of the studies AI is based on are honestly outdated neuroscience. OpenAI seem to think that if they stuff enough data into this language processor it will become sentient, and they want an exemption from copyright law so they can be profitable rather than actually improving the tech concepts and designs.

Newer neuroscience research suggests neurons perform differently based on the brain chemicals present, they don’t all fire at every (or even most) input, and they usually present a train of thought, i.e. thoughts literally move around the brain’s areas. This is all very different to current ML implementations and is frankly a good enough reason to suggest the tech has a lot of room to develop. I like the field of research and it’s interesting to watch it develop, but they can honestly fuck off telling people they need free access to the world’s content.

TL;DR dragonflies aren’t that complex and the tech has way more room to grow. However, they have to generate revenue to keep going, so they’re selling a large inference machine that relies on all of humanity’s content to generate the wrong answer to 2+2.

JoshCodes@programming.dev · edit-2 1 day ago

I think you’re anthropomorphising the tech tbh. It’s not a person or an animal, it’s a machine, and cramming doesn’t work in the idea of neural networks. They’re a mathematical calculation over a vast multidimensional matrix, effectively solving a polynomial of an unimaginable order. So “cramming” as you put it doesn’t work, because by definition an LLM cannot forget information: once it’s applied the calculations, it is in there forever. That information is supposed to be blended together. Overfitting is the closest thing to what you’re describing, which would be inputting similar information (training data) and performing similar calculations throughout the network, so that it exhibits poor performance should it be asked to do anything different to the training.
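To make the overfitting point concrete, here’s a toy sketch. It is not a neural network, just two curve fits in plain Python, and the data points and function names are made up for illustration. One model is a straight line, the other is a polynomial forced through every training point. The second one scores perfectly on the training data and falls apart as soon as it’s asked about a point it hasn’t seen.

```python
# Toy overfitting demo (assumption: a degree-5 polynomial stands in for an
# over-parameterised model; real networks are far more complicated).

def true_value(x):
    """The underlying rule the noisy training data was drawn from."""
    return 2 * x

# Training data: y = 2x plus a small alternating "noise" term.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [true_value(x) + (0.5 if x % 2 == 0 else -0.5) for x in train_x]


def fit_line(xs, ys):
    """Ordinary least-squares straight line (the simple model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point (the overfit model)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


line = fit_line(train_x, train_y)
memoriser = fit_interpolant(train_x, train_y)

unseen_x = 6  # just outside the training range
print("truth:    ", true_value(unseen_x))
print("line:     ", line(unseen_x))
print("memoriser:", memoriser(unseen_x))
```

The memoriser reproduces every training value exactly, noise included, which is exactly why it generalises so badly.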

What I’m arguing over here is language rather than a system, so let’s do that and note the flaws. If we’re being intellectually honest we can agree that a flaw like reproducing large portions of a work doesn’t represent true learning and shows a reliance on the training data, i.e. it can’t learn unless it has seen similar data before, and certain inputs provide a chance it just parrots back the training data.

In the example (repeat book over and over), it has statistically inferred that those are all the correct words to repeat in that order based on the prompt. This isn’t akin to anything human, people can’t repeat pages of text verbatim like this and no toddler can be tricked into repeating a random page from a random book as you say. The data is there, it’s encoded and referenced when the probability is high enough. As another commenter said, language itself is a powerful tool of rules and stipulations that provide guidelines for the machine, but it isn’t crafting its own sentences, it’s using everyone else’s.
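A crude way to see how “pick the most probable next word” can turn into verbatim reproduction is a bigram table. This is only a toy under heavy assumptions: real LLMs are not lookup tables, and the training sentence and function names here are invented for the example. The mechanism it shows is the narrow one I mean, though: when the statistics for a context point to one continuation, the output is the training text.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which word follows which in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, prompt, max_words=20):
    """Greedily append the most frequent next word until none is known."""
    output = prompt.split()
    while len(output) < max_words and output[-1] in counts:
        next_word = counts[output[-1]].most_common(1)[0][0]
        output.append(next_word)
    return " ".join(output)


TRAINING_TEXT = "the quick brown fox jumps over a lazy dog"
model = train_bigram(TRAINING_TEXT)

# Prompting with the first word walks straight back through the training data.
completion = generate(model, "the")
print(completion)
```

Nothing in there “wrote” a sentence. It followed the highest-probability path, and that path was somebody else’s text.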

Also, calling it “tricking the AI” isn’t really intellectually honest either, as in “it was tricked into exposing it still has the data encoded”. We can state it isn’t preferred or intended behaviour (an exploit of the system) but the system, under certain conditions, exhibits reuse of the training data and the ability to replicate it almost exactly (plagiarism). Therefore it is factually wrong to state that it doesn’t keep the training data in a usable format, which was my original point. This isn’t “cramming”, this is encoding and reusing data that was not created by the machine or the programmer; this is other people’s work that it is reproducing as its own. It does this constantly, from reusing StackOverflow code and comments to copying tutorials on how to do things. I was showing a case where it won’t even modify the wording, but it reproduces articles and programs in their structure and their format. This isn’t originality, creativity or anything that it is marketed as. It is storing, encoding and copying information to reproduce in a slightly different format.

EDITS: Sorry for all the edits. I mildly changed what I said and added some extra points so it was a little more intelligible and didn’t make the reader go “WTF is this guy on about”. Not doing well in the written department today so this was largely gobbledegook before but hopefully it is a little clearer what I am saying.

JoshCodes@programming.dev · 2 days ago

Studied AI at uni. I’m also a cyber security professional. AI can be hacked or tricked into exposing training data. Therefore your claim about it disposing of the training material is totally wrong.

Ask your search engine of choice what happened when Gippity was asked to print the word “book” indefinitely. Answer: it printed training material after printing the word book a couple hundred times.

Also my main tutor in uni was a neuroscientist. Dude straight up told us that the current AI was only capable of accurately modelling something as complex as a dragonfly. For larger organisms it is nowhere near an accurate recreation of a brain. There are complexities in our brain chemistry that simply aren’t accounted for in a statistical inference model and definitely not in the current GPT models.

JoshCodes@programming.dev · 1 month ago

I’m thinking data entry for threat hunters, and integrations with our other platforms’ APIs, but I couldn’t say anything specific. SSDs are a good shout, I might have tried setting it up with HDDs if you hadn’t said.

Did you find it easier to add connectors in separate Docker containers or within the main OpenCTI container?

It feels like there’s a pretty high ceiling for this platform and the data you can generate. Do you find it easy to create good data? Do you have any habits?

I’m pretty keen to learn so feel free to answer what you can.

JoshCodes@programming.dev · 2 months ago

So save files exist. Also custom user content. So the hash will change accordingly. Plus some cheats don’t require a modification of game files anyway, they use memory analysis to get, say, the location of other player objects, then they manipulate local information to give the player an advantage. This is how aim hacks and wall hacks work.
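Quick sketch of the hash point, in case it helps. This is an assumption about how a naive “hash the install” check might look, not how any real anticheat does it, and the file names and contents are made up. A completely legitimate save file changes the digest, while a cheat that only reads memory wouldn’t touch these files at all.

```python
import hashlib


def hash_files(files):
    """SHA-256 over every file name and its bytes, in a stable order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(files[name])
    return digest.hexdigest()


# Hypothetical install: file name -> file contents.
clean_install = {
    "game.exe": b"\x4d\x5a fake binary contents",
    "assets/level1.pak": b"level data",
}

# The same install after the player simply saves their game.
with_save = dict(clean_install)
with_save["saves/slot1.sav"] = b"player progress"

print(hash_files(clean_install))
print(hash_files(with_save))  # different digest, no cheating involved
```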

Cheats are hard to prevent for the sole reason that you don’t own the computer they could be running on. You can’t trust the user or the machine, and have to design accordingly. This leads many to the “solution” that is kernel-level anticheat, which gives total access to the system.

JoshCodes@programming.dev · 2 months ago

Not who you asked, but did you ever hear of Valorant and its kernel-level anticheat?

This is not a 1:1 comparison but anticheat software running in the kernel has the ability to monitor all other processes due to its permission levels. It can monitor all scheduled tasks and infer from that information.

Drivers need similar access but for different reasons: they need access to OS functionality a user would absolutely never be granted. This is because they interface directly with hardware, and it means that when drivers crash, they generally don’t do it gracefully. Hence the BSOD loop and the need for booting Windows without drivers (i.e. safe mode) and deleting the misconfigured file.

JoshCodes@programming.dev · 2 months ago

Really don’t care much about my CV. This program is a great way to learn about the STIX standard, so no idea what you mean about “no actionable skills”. STIX is an interesting information sharing method, the program is well designed to educate the user on it, and seeing the format it imports and exports data in will teach me a buttload.
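For anyone who hasn’t looked at STIX, here’s a rough sketch of what a single STIX 2.1 indicator looks like as JSON. It’s hand-built with the standard library rather than the stix2 package or OpenCTI itself, and the name, IP address and timestamps are invented for illustration.

```python
import json
import uuid

# Assumption: a minimal STIX 2.1 Indicator with only the core properties.
# All values below are illustrative, not real threat data.
indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": f"indicator--{uuid.uuid4()}",
    "created": "2024-01-01T00:00:00.000Z",
    "modified": "2024-01-01T00:00:00.000Z",
    "name": "Example suspicious IP",
    "pattern": "[ipv4-addr:value = '198.51.100.7']",
    "pattern_type": "stix",
    "valid_from": "2024-01-01T00:00:00.000Z",
}

# Serialise to JSON and read it back, the way it would be exchanged.
serialised = json.dumps(indicator, indent=2)
parsed = json.loads(serialised)
print(serialised)
```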

More to the point, could you maybe be less cynical and share some advice? I’m not going to flex my qualifications cos they’re mediocre, but I’ve got smart people around me who just don’t know this particular program and I’m interested to hear from those who do.

Do you run this program at work or at home? Have you learned anything interesting from using it? Did you make any mistakes hosting it that I could avoid repeating? Answers to those questions would be very useful.

JoshCodes@programming.dev · 2 months ago

I don’t see myself doing too much configuration with connectors to begin with, which brings some of the difficulty down. I was asking to see if others run anything similar in their home configuration. I’ve met people who run MISP from home before so it sounded feasible to me.

I was also looking for the community aspect of this, I already knew they had a docker-compose config. I wanted to know who had attempted this before and what they’d learned, that sort of thing.

JoshCodes@programming.dev · 2 months ago

Anyone hosting OpenCTI

JoshCodes@programming.dev · 2 months ago

Only man I’ve ever seen pick shit from between his toes and eat it while having a philosophical discussion about FOSS.

10/10 agree with the ideology and think he’s an amazing programmer, 0/10 agree with his culinary recommendations.

https://piped.video/watch?v=Rhj8sh1uiDY&t=11

JoshCodes@programming.dev · 2 months ago

Eyyyy, I’m on Mint!

JoshCodes@programming.dev · 2 months ago

My bad, what Linux distro are you running?

JoshCodes@programming.dev · 2 months ago

Nice try Microsoft, I still don’t like your monthly “small” UI changes that hide the features I use and add extra “get Copilot now” buttons.

JoshCodes@programming.dev · 4 months ago

Pretty sure it is, might just be their grammar.

I read it as “Godot, or DirectX (which my AI hallucinated is a game engine)”

JoshCodes@programming.dev · 4 months ago

git commit -m "if this doesn't fix it I'm looking up availabilities at my nearest maccas"

JoshCodes@programming.dev · 5 months ago

Cyber security guy here: we care about 22 for SSH, 443 and 80 for web traffic, 3389 for RDP and 21 for FTP. Everything else we google, and we all have to google 21 and 3389 too because we forget them half the time anyway.
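If you want the cheat sheet in code form, something like this covers it. The dictionary and function names are just made up for the example, and it only includes the ports mentioned above.

```python
# The handful of ports worth memorising, per the comment above.
WELL_KNOWN_PORTS = {
    21: "FTP",
    22: "SSH",
    80: "HTTP",
    443: "HTTPS",
    3389: "RDP",
}


def service_for(port):
    """Return the service name for a port, or admit defeat."""
    return WELL_KNOWN_PORTS.get(port, "unknown, time to google it")


for port in (22, 3389, 8080):
    print(port, "->", service_for(port))
```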

JoshCodes@programming.dev · 5 months ago

This is a great explanation, pretty much what I would have said

JoshCodes@programming.dev · 5 months ago

Relevant xkcd

JoshCodes@programming.dev · 10 months ago

Not the shark fucker but could you send me the guide on how to do this? I would love to set this up. Also does it work for multiple accounts?

JoshCodes@programming.dev · 10 months ago

The Windows network troubleshooter is black magic from the depths of hell itself and is very opinionated and selective in choosing which issues to fix and whether you’ll need to bargain your soul to receive said fix. I have red hair and find it doesn’t bother bartering with me, but your mileage may vary.

JoshCodes@programming.dev · 1 year ago

I suppose I was lucky in some ways. I stopped using Reddit a few months ago, after 5 years of addiction, but I was on the way out anyway. I had some bad experiences asking for help, never really posted otherwise, and the community generally made me feel like being inexperienced with anything was the same as being an asshole. I moved to Lemmy and instantly started posting more and answering questions, and I basically just enjoy talking to people on here. I haven’t been back and deleted my account months ago.
