Surveying the Agent Ecosystem: MCP, LLMs, and Cloud Files

After defining the core concepts in the introduction, I wanted to look at the current landscape more closely: how modern AI agents are built, which tools already exist, and where the gaps still are.

This part mattered because the project was not only about building something with AI. I also needed to understand the architecture around agentic systems well enough to design a practical solution for automated file management across local and cloud environments.

That review gave me a clearer picture of what already existed and where the missing pieces were. It also corrected one of my early assumptions: there was a lot of surface-level tooling around agents, but much less depth once real file workflows entered the picture.

At the center of that ecosystem are agents that interact with large language models. They need to understand user prompts, create an execution plan, and follow it until the goal is completed. Alongside them, MCP servers expose the predefined tools the agents rely on. In my case, those tools were focused on file operations on both a local file system and a remote server.

How AI agents evolved

AI agents can be described as autonomous software entities that understand natural language and can use tools to act on behalf of the user.

A familiar example is the chatbot. Before the rise of modern LLMs, chatbots were often frustrating to use because they relied on predefined question-and-answer pairs. If a website had ten hardcoded FAQ responses and the user asked something slightly different, the bot usually failed. Even minor grammar mistakes or small changes in phrasing could prevent the system from matching the question to the right answer.

That was one of the reasons older chatbot experiences often felt brittle and unhelpful.

Large language models changed that. Because they are trained on vastly larger datasets and contain far richer internal representations than rule-based systems, they can interpret language more flexibly and generate more relevant responses. If a user misspells part of a prompt or phrases it in an unexpected way, an LLM-based system can still often infer the intended meaning with reasonable accuracy. That sounds obvious now, but it is the shift that made the whole agent idea practical.

That shift made chat interfaces much more useful. Modern assistants can summarize text, generate content, provide factual responses, and ask clarifying questions when the context is incomplete.

The more interesting transition came when these systems moved beyond conversation and into action.

Researchers began training and structuring LLM-based systems not only to respond, but also to plan and execute. That is where chatbots started turning into agents. Instead of just returning a text answer, an agent can break a user request into steps, identify the starting point and the desired outcome, and choose the right actions to move from one to the other.

Those actions may include:

asking the user for missing information
calling external tools
returning a direct text response
combining multiple steps in sequence

The agent coordinates this execution loop. Based on the type of step, it either generates output directly or invokes a tool exposed by an MCP server. It is also responsible for maintaining context, which in practice means tracking the message history and state of the current interaction.

The final architectural piece is the host application. The host launches the agent and acts as the bridge between the user interface and the agent itself.

Existing agent tools I reviewed

As part of the research, I focused mainly on tools that could be tested for free. Paid tools were noted, but I did not include them in hands-on evaluation unless a free version was available.

Claude

Claude, developed by Anthropic, is a conversational assistant designed to be safe, accurate, and useful for tasks such as information processing, ideation, writing, and coding.

It is available both through the web and through local applications. At the time of writing this research, Claude was available in two relevant forms:

Claude Desktop, a desktop application with a graphical interface
Claude Code, a command-line-oriented product

Both products supported relatively straightforward MCP server integration, which made them especially relevant to my research. However, Claude Code was not available as a free-tier tool.

One detail I found notable during testing was how Claude handled tool-based file operations. When a prompt would result in a file system change, the system requested explicit user permission before applying it. The user could then choose to:

reject the action
allow only the current action
allow all further actions within the current prompt flow

That pattern matters because it highlights an important design principle in agent systems: balancing autonomy with user control.

Claude Code also introduced the idea of subagents. In this model, separate subagents can handle different subtasks, such as:

user interface work
database changes
API design

I think this is a strong architectural pattern because each agent works in a smaller, more focused context. Instead of forcing one agent to reason over everything at once, the work becomes more modular. That can improve both execution quality and context efficiency. The more I looked at agent systems, the less I believed in one giant general-purpose agent handling everything well.

Cursor IDE

Cursor is a modern code editor built specifically around AI-assisted programming.

What makes it different from a traditional IDE is that the agentic layer is not just an add-on around the editor. It is embedded into the core workflow. The system is designed to help with writing, understanding, and improving code as part of the editing experience itself.

Cursor supports a wide range of languages and tools, which makes it flexible across different project types. From a product perspective, it is a useful example of how AI can be integrated into the development environment in a way that feels native rather than bolted on.

That is especially relevant to my broader interest in agentic UI and developer experience.

TypeMind

TypeMind is an advanced interface for working with intelligent conversational agents and multiple AI models through a clean and organized user experience.

It supports access to multiple model providers, such as GPT, Claude, and Gemini, and includes features like:

conversation history
search
customizable workspaces
team support
bring-your-own-API-key usage
cross-device synchronization
external tool integration

From a workflow perspective, TypeMind shows how much value can come from the orchestration layer around models, not just from the models themselves. That connects closely to how I think about AI products in general: the model matters, but the surrounding interface and system design often decide whether the experience is actually useful.

Existing MCP servers

The second major part of the chapter focused on MCP servers.

MCP servers are distributed across many repositories and websites, and in many cases they are duplicated or slightly modified versions of similar ideas. For the survey, I focused on two well-known sources for MCP server discovery:

Awesome MCP Servers
MCP Bench

What I found was pretty simple: major CDN providers such as Cloudflare, Google, Fastly, Akamai, Bunny.net, and Amazon did not offer official MCP servers for file editing workflows. That absence was more important than any individual server I found.

Open-source developers had created some unofficial servers, but mostly for well-known cloud providers. My assumption was that official support has been limited for two reasons:

MCP is still relatively new and not yet widely adopted
file-editing workflows are not the core product focus of most CDN platforms, which are primarily designed for content delivery and caching

That gap became one of the main motivations for my implementation work. Since Bunny.net did not have an MCP server for this use case, I used existing servers as reference points and then designed my own MCP server with a smaller but practical feature set.

MCP server patterns that influenced my implementation

Dropbox MCP server

The Dropbox MCP server acts as a bridge between an AI agent and the Dropbox API for cloud file storage.

It uses OAuth 2.0 authentication with PKCE, which allows secure authorization without exposing sensitive client-side credentials. Before use, the user must define environment variables such as:

DROPBOX_APP_KEY
DROPBOX_APP_SECRET

The server supports core file operations, including CRUD behavior, directory listing, folder creation, and file moving.

It also allows access restrictions through:

DROPBOX_ALLOWED_PATHS

That restriction is an important safety feature because it limits which parts of the Dropbox account the server can access.

One design issue I noticed was naming. The tool names were fairly generic and described the action, but not always the provider clearly enough. In a multi-provider environment, this can quickly lead to collisions or ambiguity.

Some example tools included:

list_files
upload_file
download_file
move_item

From an implementation perspective, the server was useful as a baseline for file capability design, but it also highlighted the importance of better naming conventions.

Google Drive MCP server

The Google Drive MCP server follows a similar idea, but integrates with Google Drive and related Google services.

Through the Google Drive API, it enables file search, browsing, and content access. That makes it useful for bringing notes, documentation, and other cloud-hosted resources into the working context of a host application.

Compared with the Dropbox version, its tool naming felt clearer and more scalable because the provider was reflected directly in the tool names.

Examples included:

gdrive_search
gdrive_read_file
gsheets_read
gsheets_update_cell

This naming approach is simple, but effective. It reduces ambiguity and makes it easier to understand which backend a tool belongs to, especially when multiple providers are connected to the same host application.

What this research changed for my implementation

This review shaped several practical decisions in the research project.

First, it clarified that an agent is only one part of the system. To make agentic workflows useful, you also need:

a host application
a reliable context model
a tool layer
clear permission boundaries
predictable naming and integration patterns

Second, it highlighted that the current ecosystem is still fragmented. There are strong examples of agent products and useful MCP server implementations, but the coverage is uneven, especially when it comes to cloud file workflows. In other words, the agent story is moving faster than the tool ecosystem underneath it.

Third, it reinforced something I keep seeing across both frontend engineering and AI systems: architecture matters as much as capability. A model can be powerful, but if orchestration is unclear, interfaces are inconsistent, or permissions are unsafe, the overall experience breaks down quickly.

Final note

For me, this part was less about listing tools and more about identifying patterns:

what makes an agent useful
how tool access should be structured
where current solutions are strong
where there is still room to build better developer workflows

From there, I could define the architecture of my own system much more deliberately.

That next step became designing and implementing my own agent-driven CLI and MCP-based file management system for local and cloud environments.