by Patrix | Apr 30, 2026
Something about Qwen3.6 stuck with me when I read through the release notes: it has a switch. Not a metaphorical one. A literal toggle between thinking mode and non-thinking mode. You decide, per conversation, whether you want the model to reason through a problem or just respond.
That sounds like a minor feature. It’s not.
Most local AI models are a single gear. You give them a prompt, they generate tokens, done. Qwen3.6 introduces a hybrid reasoning mode that works through complex problems step by step, then carries that reasoning trace into the next turn if you want it to. Or you can turn it off entirely for faster, more conversational responses.
The mechanics are cleaner than they sound. Thinking mode is for hard problems: code, math, reasoning chains where you want the model to show its work before committing to an answer. Non-thinking mode is for everything else, where speed matters more than depth. And there’s a third option called “Preserve Thinking” that keeps the reasoning trace alive across the entire conversation, so the model builds on what it already reasoned through rather than starting cold each turn. In practical terms, that means fewer tokens spent re-deriving context — reportedly around a 40% reduction in token usage on complex agentic workflows, with no measurable accuracy loss.
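If you're calling the model from Python, the toggle lives at the chat-template level. Qwen3 exposed it as an `enable_thinking` flag, and assuming Qwen3.6 keeps that interface (I haven't verified the 3.6 template), a minimal sketch looks like this. The Hugging Face repo name is a guess on my part:

```python
# Minimal sketch, assuming Qwen3.6 keeps Qwen3's enable_thinking flag
# in its chat template; the repo name below is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.6-27B"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Why can quicksort degrade to O(n^2)?"}]

# Thinking mode on: the template emits a reasoning block before the answer.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# For non-thinking mode, pass enable_thinking=False instead.

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Preserve Thinking would presumably mean keeping the reasoning blocks in `messages` across turns instead of stripping them before the next request; I haven't confirmed how the 3.6 template handles that.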
The architecture behind this is worth a brief look. Qwen3.6 combines linear attention with sparse mixture-of-experts routing. What that means in practice: it retains context more efficiently than standard attention models, which is why you get 256K context without the usual degradation at longer sequences. The thinking toggle sits on top of this architecture rather than requiring a separate model endpoint. One model, one download, two modes. That keeps the implementation clean in a way that matters when you’re running this locally.
I’ve been setting up local models on the Mac Mini M4 Pro, and the 27B variant is the one worth paying attention to here. At 4-bit quantization it weighs 18GB, which fits comfortably in 24GB unified memory. On the M4 Max, the closest available benchmark to the M4 Pro, Q4_K_M quantization runs at around 16 tokens per second. Fast enough for real work. One practical note if you’re pulling GGUFs manually: use Q4_K_M, not IQ4_XS. There’s a known llama.cpp/Metal regression that drops IQ4_XS performance to around 5 tokens per second on Apple Silicon. Q4_K_M sidesteps it entirely.
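For a quick smoke test once the GGUF is downloaded, llama-cpp-python wraps llama.cpp with Metal support built in on Apple Silicon. A minimal sketch, with a hypothetical filename:

```python
# Smoke test with llama-cpp-python (pip install llama-cpp-python);
# the GGUF filename is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.6-27b-Q4_K_M.gguf",  # Q4_K_M, not IQ4_XS, per the note above
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=32768,      # start modest; push toward 256K only if memory allows
)
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain tail-call optimization."}]
)
print(result["choices"][0]["message"]["content"])
```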
The benchmark numbers are also unusual for a 27B model. On SWE-bench Verified, a coding test that involves actually solving real GitHub issues, Qwen3.6-27B scores 77.2%. That matches or beats Qwen’s own 397B parameter model on major agentic coding benchmarks, despite having 14 times fewer total parameters. The architecture is doing real work there, not just compressing the same capability into a smaller shell.
One honest caveat: Qwen3.6 doesn’t run in Ollama yet. The multimodal components use separate projection files that Ollama’s current architecture doesn’t handle. You’ll need llama.cpp or Unsloth Studio. Unsloth Studio installs with a single curl command and auto-configures inference parameters when you select the model. MLX quants are also available for Apple Silicon if you want a more native Mac experience, though llama.cpp with Metal support is fast enough that the practical difference is small.
The question that interests me more than the setup: when do you actually want an AI to think? Not as a philosophical exercise. As a practical decision you make before sending each message. Thinking mode takes longer and burns more tokens. Non-thinking mode is faster but shallower. Most of the time the right answer is obvious in retrospect, but Qwen3.6 forces you to have an opinion about it upfront.
Most AI tools don’t ask you to think about how they think. This one does. You’re not just prompting — you’re choosing whether to engage the model’s reasoning machinery or route around it. That’s an unfamiliar position if you’re used to treating local models as fast autocomplete. For structured tasks with clear requirements, thinking mode earns its slower response time by reasoning through edge cases the model would otherwise skip. For quick Q&A, non-thinking mode is the right call. The sweet spot I’m most curious about is agentic workflows, where toggling thinking mode per subtask could cut token costs significantly without sacrificing quality on the steps that actually need careful reasoning.
Worth installing. Worth testing. Particularly if you have a 24GB Mac and you’ve been looking for a local model that doesn’t feel like a compromise.
by Patrix | Mar 11, 2026
You gave it a shot. You typed something into Claude or ChatGPT, got back something that was technically correct but felt flat. Generic. The kind of output that could have come from anyone. You wondered what everyone was so excited about.
Here’s what I’ve figured out: that experience isn’t evidence that AI tools are overhyped. It’s diagnostic. It tells you exactly where you are in a progression, and that progression has six levels.
Level 1 Is Where Almost Everyone Starts (And Where Too Many People Stop)
At Level 1, you’re using the tool like a search engine with extra steps. You type a command. It responds. You type another command. It responds. It’s a one-way relationship. You’re directing, it’s executing. And here’s what happens when you run any AI system that way: it defaults to the mean. Average outputs, average aesthetics, average thinking. That’s why so many AI-generated websites have the same purple gradient and the same generic fonts. That’s where “AI slop” comes from. Not from the model being bad — from the interaction being shallow.
The good news: you don’t have to stay there.
The Six Levels of Claude Code Fluency
A practitioner named Chase put together the clearest framework I’ve seen for thinking about this progression. It’s framed around Claude Code specifically, but the underlying arc applies to any AI tool you’re learning seriously.
- Level 1, Prompt Engineer: Commands only. One-way. Generic outputs.
- Level 2, Planner: You start asking instead of just telling. Collaborative questions, back-and-forth, letting the AI push back on your ideas.
- Level 3, Context Engineer: You learn that what you feed the AI shapes what you get. Context management, examples, constraints. Less is often more.
- Level 4, Tool User: You extend the AI with external tools like web scraping, browser automation, and deployment pipelines. You also start understanding the building blocks of what you’re creating, not just the output.
- Level 5, Skill Author: You turn your best workflows into reusable, personalized skills. The tool starts working the way you work.
- Level 6, Orchestrator: Multiple AI instances working in parallel, handling different parts of a problem simultaneously. You’re the manager now.
Most people are at Level 1. A lot of people make it to Level 2 or 3 before plateauing. Levels 4 through 6 are where the real compounding happens.
This Is Exactly How Learning Any Craft Works
When you pick up a guitar for the first time, it sounds terrible. That’s not evidence the guitar is a bad instrument. It’s evidence that you’re at the beginning of a skill curve that takes time to climb. The gap between “making sound” and “playing music” is enormous. Crossing it requires understanding that there are levels, that they’re learnable, and that the early frustration is part of the process.
AI fluency is the same. The first stage, “I can make it say things,” is the equivalent of plucking a string. It works. It produces output. But it’s nowhere near what the instrument can do.
The frustrating thing about AI tools right now is that nobody hands you a roadmap. You’re expected to figure out the levels on your own, in a space that’s changing fast enough to make even experienced practitioners feel like they’re behind.
The Thing to Try Next
You don’t need to get to Level 6 to feel the difference. The jump from Level 1 to Level 2 is the biggest one, and it comes down to a single habit shift: stop commanding and start asking.
Instead of “build me a website,” ask: “What questions do you have before we start?” Instead of “write me a post about X,” ask: “What’s missing from my brief?” Let the AI push back. Ask it what you haven’t thought of.
That shift — from director to collaborator — is where the tool stops feeling like a fancy autocomplete and starts feeling like something genuinely useful.
The craft is worth learning. The map is there. You’re probably closer to the interesting part than you think.
by Patrix | Mar 10, 2026
Most people’s note-taking system has a dirty secret: it’s a graveyard.
You capture something interesting (an idea, an article, a half-formed thought) and it disappears into a folder you’ll never open again. Weeks later you vaguely remember reading something relevant to exactly what you’re working on, but good luck finding it. Your notes app knows where everything is. You don’t.
The concept of a “second brain” has been kicking around productivity circles for a while now, popularized by writer and teacher Tiago Forte. The idea is simple: your biological brain is for having ideas, not storing them. Offload the storage to a trusted system, and your real brain is free to do what it’s actually good at: connecting things, making decisions, creating.
The problem is that most second brain systems are still passive. You put things in. You search for things. The system itself doesn’t do anything. It’s a very organized graveyard.
That’s starting to change.
The Right Foundation: Obsidian
If you haven’t heard of Obsidian, it’s a note-taking app built on a simple but powerful idea: your notes are just plain text files on your own computer, linked together like a personal Wikipedia. No subscription, no cloud lock-in, no company that can change the rules on you. Just a folder of Markdown files you own forever.
What makes Obsidian special isn’t any single feature — it’s the philosophy. Notes link to other notes. You can see a visual map of how ideas connect. And because everything is just files on your disk, it plays nicely with other tools in ways that cloud-based apps never will.
That last part matters more than it sounds.
Structure: Where the Magic Actually Lives
A second brain without structure is just a pile of notes with extra steps. The structure is what turns a collection into a system.
One approach that works well is called IPARAG, a folder system that maps to how life actually works, not just how information works:
- Inbox: the landing zone. Everything goes here first, raw and unprocessed.
- Projects: active work with a finish line. A trip you’re planning, a post you’re writing, a decision you’re working through.
- Areas: ongoing responsibilities that don’t end. Your health, your finances, your garden, your hobbies. Things you maintain, not complete.
- Resources: reference material, templates, SOPs, things you want to be able to find when you need them.
- Archives: completed projects, old captures, anything that’s done its job but might be worth keeping.
- Galaxy: durable knowledge. The ideas worth synthesizing, the conclusions worth preserving, the thinking that compounds over time.
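If you want to scaffold those six folders without clicking around, a few lines of Python do it. The vault path is just an example:

```python
# One-time scaffold for the IPARAG layout; the vault location is an example.
from pathlib import Path

vault = Path.home() / "Notes"
for folder in ("Inbox", "Projects", "Areas", "Resources", "Archives", "Galaxy"):
    (vault / folder).mkdir(parents=True, exist_ok=True)
```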
What’s interesting about this structure is that it doesn’t just organize information. It organizes life. Projects have momentum and finish lines. Areas have ongoing priorities and open questions. Galaxy is where scattered notes eventually become real understanding. It’s a system that reflects the way things actually work, not an idealized filing cabinet.
Where Claude Code Changes Everything
Here’s where things get genuinely interesting.
Claude Code is an AI coding assistant, but calling it that undersells what it actually does. It’s an AI agent that can read files, write files, search through directories, and reason about what it finds. And because Obsidian vaults are just folders of text files, Claude Code can work directly inside one.
That’s a different proposition than asking an AI a question. This is an AI that lives in your system.
What does that make possible? Quite a bit:
Inbox processing. Drop raw captures into your Inbox folder. Ask Claude to process them: normalize the formatting, add frontmatter, suggest tags, route them to the right folder, flag anything that deserves a deeper note. A chore that used to take 20 minutes becomes a two-minute conversation. (A rough sketch of the frontmatter step follows this list.)
On-demand briefings. Ask “What do I know about fermentation?” or “Catch me up on my investing notes from the last six months” and instead of searching, you get a synthesized answer drawn from your own vault. Your notes, organized and summarized, on request.
Synthesis. You’ve been capturing ideas on a topic for months. Ask Claude to read the relevant notes, find the patterns, surface the tensions, and draft a synthesis note that pulls it all together. This is the step most people never get to. It’s where captured ideas actually become thinking.
Project support. Ask for a summary of where a project stands, what the blockers are, what the next logical action is. Claude has read everything you’ve written about it.
The common thread: the AI isn’t replacing your thinking. It’s doing the overhead that keeps most people from thinking at all.
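To make that inbox step concrete, here's roughly the transformation you'd be asking Claude for, written as plain Python. The folder path and frontmatter fields are examples, not a standard:

```python
# Rough sketch of the frontmatter step; folder path and fields are examples.
from datetime import date
from pathlib import Path

inbox = Path.home() / "Notes" / "Inbox"

for note in inbox.glob("*.md"):
    text = note.read_text(encoding="utf-8")
    if text.startswith("---"):
        continue  # note already has frontmatter
    frontmatter = (
        "---\n"
        f"created: {date.today().isoformat()}\n"
        "status: unprocessed\n"
        "tags: []\n"
        "---\n\n"
    )
    note.write_text(frontmatter + text, encoding="utf-8")
```

In practice you'd let Claude do this conversationally, since it can also judge which folder a note belongs in; the script just shows how mechanical the baseline chore is.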
What This Makes Possible (If You Zoom Out)
Stack all of this together and something interesting emerges: a system that gets smarter as you use it.
Every note you add is context Claude can draw on. Every connection you make teaches the system something about how you think. Weekly reviews start writing themselves. Research for a new project surfaces what you already know. Decisions get easier because your past reasoning is actually accessible.
This isn’t science fiction. The tools exist today. They’re not seamless yet. There’s real setup involved, and the experience still has rough edges. But the core of it works, and the trajectory is obvious.
The Honest Caveat
None of this is magic out of the box. Setting up a vault with real structure takes time. Learning to capture consistently takes habit. Getting Claude Code configured and pointed at your files takes some technical comfort.
But here’s the thing about systems: the setup cost is one-time. The payoff compounds.
If you’re the kind of person who finds this genuinely interesting, who gets curious about what AI can actually do and not just what it promises, this rabbit hole is worth going down.
The Bigger Picture
We’re at an early moment with this stuff. The idea of an AI agent that understands your personal knowledge system, helps you process it, and helps you use it? That’s new. It’s not fully baked. It’s going to keep evolving.
But the ingredients are here. A local, structured, plain-text vault. An AI agent that can read and write files. A folder system that maps to real life. Put them together and you have something that’s genuinely different from anything that existed a few years ago.
Your notes don’t have to be a graveyard. They can be a thinking partner. That’s worth paying attention to.
by Patrix | Nov 21, 2025
If you’ve been refreshing your news feeds like I have for the past 48 hours, you know the wait is finally over. Google officially dropped Gemini 3 on Tuesday, and to say it’s a “step up” would be the understatement of the year. It feels less like a software update and more like we just unlocked a new tier of the simulation.
We’ve seen AI that can paint, and we’ve seen AI that can code. But Gemini 3 is the first model that genuinely feels like it understands the soul of both disciplines. Whether you’re a digital painter trying to render perfect typography or a developer building the next big agentic app on the newly released Antigravity platform, everything just shifted.
Let’s start with the visuals, because this is where the leap is most visceral. For the longest time, AI image generation was a game of “prompt roulette”: spinning the wheel and hoping for six fingers instead of seven.
Enter: Nano Banana Pro
I know, the name sounds like a smoothie ingredient, but Nano Banana Pro is the official name of the new image generation engine built on the Gemini 3 foundation, and it is an absolute beast.
- Text That Actually Reads: We can finally say goodbye to the days of gibberish alien languages on AI-generated signs. Gemini 3 renders text within images with near-perfect accuracy. If you need a cyberpunk street scene with a neon sign that says “Artsy Geeky 2025,” it just does it. No Photoshop patch-up required.
- 4K Native Resolution: We are talking about crisp, 4K output straight out of the gate. The details in lighting, texture, and depth of field are startlingly photorealistic.
- Fine-Tune Controls: This is the “Pro” part. You aren’t just prompting; you’re directing. You can now adjust specific parameters like camera angle, f-stop (depth of field), and lighting temperature using natural language.
Multimodal “Vibe” Checks
The “multimodal” buzzword gets thrown around a lot, but Gemini 3 lives it. You can now upload a video clip—say, a scene from a movie you love—and ask Gemini to “capture this mood for a short story.” It analyzes the lighting, the pacing, the audio cues, and the emotional subtext to generate writing that feels like that video looks. It’s synesthesia as a service.
PhD-Level Reasoning
Okay, devs and data nerds, huddle up. The pretty pictures are nice, but what’s under the hood is where the real revolution is happening.
The “Deep Think” Protocol
Google has introduced a new mode called Deep Think, and it’s terrifyingly smart. In benchmark tests (specifically the GPQA Diamond), Gemini 3 is hitting PhD-level reasoning scores that leave previous models in the dust.
This isn’t just about answering questions faster; it’s about thinking longer. When you hit “Deep Think,” the model allocates more compute time to structure its chain of thought before outputting a single character.
- Complex Logic Chains: It can dismantle multi-layered logic puzzles that would trip up Gemini 2.5.
- Code Architecture: Instead of just spitting out a Python script, it plans the entire directory structure, dependencies, and edge-case handling before writing a line of code.
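If you want to poke at this from code, the google-genai Python SDK already exposes a thinking budget on Gemini 2.5 models. Whether Deep Think maps onto the same `ThinkingConfig` knob, and the Gemini 3 model identifier below, are my assumptions:

```python
# Sketch with the google-genai SDK. The thinking budget is real for
# Gemini 2.5; the Deep Think mapping and model name are assumptions.
from google import genai
from google.genai import types

client = genai.Client()  # reads your API key from the environment

response = client.models.generate_content(
    model="gemini-3-pro",  # placeholder identifier
    contents="Plan the directory structure and dependencies for a Flask + React app.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=8192),
    ),
)
print(response.text)
```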
The Antigravity Platform
This is the big one for the builders. Alongside Gemini 3, Google launched Antigravity, a dedicated platform for building “Agentic” apps.
We aren’t just building chatbots anymore; we are building agents that do things.
- Autonomous Workflows: You can task a Gemini 3 agent to “monitor this GitHub repo, and if a PR matches these criteria, run this specific test suite and Slack me the results.”
- 1 Million Token Context (Stable): The 1M context window isn’t experimental anymore; it’s the standard. You can dump an entire legacy codebase into the context and ask Gemini to refactor it for modern standards, and it won’t “forget” the beginning of the file halfway through.
Vibe Coding
“Vibe Coding” is the term getting tossed around the developer Discords right now. It refers to using Gemini 3’s natural language capabilities to build apps based on a “vibe” rather than a spec sheet.
Because Gemini 3 understands visual and tonal nuance so well, you can describe an app: “I want a to-do list app that feels like a calm, rainy Sunday morning in Tokyo.”
Gemini 3 won’t just build a to-do list; it will:
- Select a muted, cool-toned color palette.
- Suggest a minimalist UI with soft rounded corners.
- Write the CSS and React components to match that specific aesthetic.
Gemini 2.5 vs. Gemini 3: The Cheat Sheet
For those scanning for the upgrade incentives, here is the raw data:
| Feature | Gemini 2.5 Pro | Gemini 3 |
|---|---|---|
| Reasoning | Strong | PhD-Level (Deep Think) |
| Context Window | 1M (Experimental) | 1M (Stable/Native) |
| Image Gen | Standard (Imagen 3) | Nano Banana Pro (Text + 4K) |
| Dev Platform | Vertex AI Standard | Antigravity (Agent First) |
| Video Understanding | ~83% MMMU Score | 87.6% MMMU Score |
The Elephant in the Room
We can’t talk about this without addressing the creative anxiety. I’ve seen the threads. Artists are worried. Writers are worried. And honestly? That fear is valid.
When a machine can replicate a “mood” or render perfect typography, the barrier to entry for creating “good enough” art drops to zero. But here is my take after 48 hours with Gemini 3: It raises the ceiling more than it lowers the floor.
The “Deep Think” mode is brilliant, but it still needs a Thinker. The Nano Banana engine renders beautiful pixels, but it needs a Visionary to direct the camera. Gemini 3 is the most powerful co-pilot we have ever seen, but it is still sitting in the passenger seat. The destination? That’s still up to us.
Gemini 3 isn’t just an upgrade; it’s a challenge. It challenges us to dream bigger, code smarter, and create with more audacity than ever before. The tools are no longer the bottleneck. The only limit now is your own imagination.
by Patrix | Nov 17, 2025
There is a quiet shift happening in small business offices, garages, studios, and spare bedrooms everywhere. Owners are discovering that generic AI use is helpful, but strategic AI use is transformative. The key is pairing the right model with the right task instead of treating every problem as something a single chatbot should solve. That approach wastes time, produces mediocre results, and hides the true power of these tools.
Choosing the right AI model for each job is similar to building a reliable toolbox. A socket wrench, a Phillips screwdriver, and a hammer all sit under the same lid, but nobody expects them to do the same thing. Models differ the same way. Some are built for writing, some for vision, some for coding, some for speech, some for data analysis, and some for workflow automation. Organizing them intentionally can simplify daily operations for any small business owner.
What follows is a practical and creative look at how specific AI models fit into specific functions of small business life. None of this replaces real judgment or real craftsmanship. It simply makes room for more of it.
Content Creation with Language Models
The most obvious use of AI in small business is writing. Marketing copy, newsletters, proposals, product descriptions, and internal documentation all eat time. General chat models can handle these tasks, but targeted language models make them smoother and more accurate.
Modern text generation models excel when you give them clear roles. Instead of asking a generic model to write everything, choose specialized versions or specialized prompt structures tuned for tone, length, and consistency. Use them to generate drafts, refine messages, or rewrite material into a house voice that feels natural to customers. Language models are also ideal for repurposing content across platforms so one idea can serve Instagram, a blog post, an email, and a short video script.
Small businesses benefit most when they treat writing models as partners, not printers. They help clarify ideas, break creative blocks, document processes, and keep communications steady even when schedules get chaotic.
Vision Models for Product, Branding, and Operations
Image generation and vision analysis models open a second arena of opportunity. They are useful far beyond creating pretty pictures. Vision models help develop product prototypes, test packaging ideas, explore branding directions, and even analyze photos from real environments.
Small retailers use vision tools to stage products in hypothetical rooms without paying for studio time. Local restaurants use them to explore menu display ideas or experiment with digital signage looks. Artists or makers use them to visualize variations of a piece before committing materials. Service businesses use them for brand moodboards or social media assets that match a unified style.
Vision models also help with practical tasks. They can interpret images from a job site, identify materials, compare before and after results, and speed up quality control. They do not replace the human eye, but they save time and reduce uncertainty.
Speech Models for Calls, Voice Notes, and Transcription
Many small business owners run companies through conversations. Calls with clients, voice memos after appointments, quick walkthroughs of ideas, and fast notes between meetings all contain valuable information. The trouble is getting that information into a usable form.
Speech models solve this. They transcribe, summarize, and extract action items from phone calls, meetings, field recordings, and brainstorming sessions. They turn days of scattered notes into structured plans. They can even translate or clean up audio for clear communication with clients who prefer verbal updates.
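You don't need a platform subscription to try this. The open-source openai-whisper package handles local transcription in a few lines; the filename and model size here are just examples:

```python
# Local transcription with openai-whisper (pip install openai-whisper).
import whisper

model = whisper.load_model("base")            # small enough for a laptop
result = model.transcribe("client-call.m4a")  # returns text plus segment timestamps
print(result["text"])
```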
When used consistently, speech models create a living record of daily operations. That record supports continuity, training, onboarding, and future planning.
Data Models for Analysis and Forecasting
Small businesses generate data without realizing it. Sales, appointments, website traffic, customer feedback, inventory cycles, and marketing performance all point to patterns worth understanding. Data analysis models take these raw numbers and reveal practical insights.
These tools help answer real operational questions: Which items sell together? Which days are likely to be busy? Which marketing channels actually convert? How long do new customers tend to stay engaged? Where does waste happen in production? Which tasks slow down growth?
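Many of these questions fall out of a spreadsheet export and a few lines of pandas. A sketch, assuming a CSV with `date` and `total` columns:

```python
# Busiest weekdays from a point-of-sale export; column names are assumptions.
import pandas as pd

sales = pd.read_csv("sales.csv", parse_dates=["date"])
by_day = sales.groupby(sales["date"].dt.day_name())["total"].sum()
print(by_day.sort_values(ascending=False))
```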
Data models are not there to replace accountants or financial professionals. They provide a clear picture so owners can walk into those meetings prepared. They give clarity without requiring a degree in statistics.
Automation Models for Workflow and Integration
The true efficiency of AI shows up when models are connected. Workflow engines and automation models coordinate multiple steps so tasks run in the background instead of eating up the business owner’s time.
Imagine this chain happening automatically (a bare-bones sketch follows the list):
- A customer fills out a form.
- A structured summary is created by a language model.
- A vision model processes any images attached.
- A data tool updates the CRM.
- A writing model drafts a follow-up email.
- A workflow runner sends the email.
- A speech model generates a voicemail script.
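Here is that chain as plain Python. Every function is a placeholder for whichever model or service you actually wire in; nothing below calls a real API:

```python
# Bare-bones sketch of the chain above; every function is a placeholder.
def summarize(form: dict) -> str:              # language model
    return f"New lead: {form['name']} asked about {form['topic']}."

def describe_images(paths: list[str]) -> str:  # vision model
    return f"{len(paths)} photo(s) attached and reviewed."

def update_crm(summary: str) -> None:          # data tool / CRM
    print("CRM updated:", summary)

def draft_email(summary: str) -> str:          # writing model
    return f"Hi! Thanks for reaching out. {summary} We'll follow up shortly."

def send_email(body: str) -> None:             # workflow runner
    print("Email sent:\n" + body)

form = {"name": "Dana", "topic": "a kitchen remodel", "photos": ["counter.jpg"]}
summary = summarize(form) + " " + describe_images(form["photos"])
update_crm(summary)
send_email(draft_email(summary))
```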
This is normal now. Small businesses can run sophisticated systems without hiring teams. When each model does what it does best, workflows become smooth instead of fragile.
Choosing the Right Model for Each Job
There is no universal chart that works for everyone. Each business has its own rhythm, its own pressure points, and its own creative style. The most effective approach is to start by identifying where time disappears.
Look at weekly patterns. Identify repetitive tasks. Examine where bottlenecks happen. Notice what work gets dropped when the schedule fills up. Then assign the right model to take pressure off that region. Use writing models for content, vision models for branding and review, speech models for knowledge capture, data models for clarity, and workflow tools to tie everything together.
The value is cumulative. Each improvement frees the owner to think, create, and lead rather than chase small tasks.
The Creative Advantage
Every small business is ultimately a creative act. AI models, when used with intention, protect that creative energy. They allow owners to shift from constant reaction to thoughtful direction. They help transform scattered effort into focused momentum.
The point is not automation. The point is space. Space for ideas. Space for listening. Space for customers. Space for building something that reflects its founder.
Small businesses that choose models intentionally do not work more. They work better.