Data Scraping Has Gotten A Lot Simpler

Last Week's Feature Makes A Comeback!

For those who are new around here, this is a weekly newsletter where I highlight new and innovative AI products that are worth exploring.

Hey hey!

Happy Friday! We’re back for another 2024 issue.

In this week’s issue:

  • Product of the Week

  • Help Make The AI Product Report Magical!

  • Other AI Things Happened (including the latest from last week’s feature, Wondercraft.ai!)

  • What I’m Reading

PRODUCT OF THE WEEK

After testing dozens of new AI products this week, here’s my top pick.

This week’s product pick is Browse AI, a clever platform that makes scraping (read: collecting data from various online sources, lists, search results, etc.) that much smoother. One simple question kept coming up while I reviewed this product:

Is it a better wrench? Or a whole new tool? Truth is, it’s both.

With its AI-powered backend, Browse AI lets you effortlessly create custom automations to extract THEN monitor data from any website, all without a single line of code. Whether you're a market researcher, a startup enthusiast, or a part of a large enterprise, this tool is your gateway to unlocking the web's vast potential. From prebuilt robots that cater to common use cases to the flexibility of crafting your own solution using a browser extension, Browse AI makes it simple, fast, and incredibly effective.

I'm just amazed at how data scraping has evolved into a breeze compared to its complex past. Gone are the days of cumbersome processes and endless coding headaches. Now, with just a few clicks and some savvy tools, I can get just as far with a fraction of the sweat.

To get us started, I’ve added the 1-minute pitch video here, so check it out:

Browse AI offers two main flavors of codeless scraping technology. Firstly, users can collect present-day data via a point-in-time execution over their marked data points (technical term: attributes). Secondly, users can track the evolution of their desired data source over time with its monitoring features.

In my field tests, it really did perform as simply as the video shows, and it held up well against most of the trials I put it through.
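For the curious, here is a minimal sketch of what the second flavor (monitoring) boils down to conceptually: comparing two point-in-time extractions of the same page. This is not Browse AI’s actual implementation or API. The `diff_snapshots` function, the row keys, and the attribute names are all hypothetical, made up purely for illustration.

```python
from typing import Dict, List

# A snapshot maps a row key (e.g. a product slug) to its extracted attributes.
# This shape is an assumption for illustration, not Browse AI's export format.
Snapshot = Dict[str, Dict[str, str]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List]:
    """Compare two point-in-time extractions of the same page."""
    added = sorted(key for key in new if key not in old)
    removed = sorted(key for key in old if key not in new)
    changed = []
    # For rows present in both snapshots, report every attribute that differs.
    for key in sorted(old.keys() & new.keys()):
        for attr in sorted(old[key].keys() | new[key].keys()):
            before, after = old[key].get(attr), new[key].get(attr)
            if before != after:
                changed.append((key, attr, before, after))
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical data: the same listing page extracted on two different days.
monday = {
    "product-a": {"title": "Product A", "price": "19.99"},
    "product-b": {"title": "Product B", "price": "5.00"},
}
friday = {
    "product-a": {"title": "Product A", "price": "17.49"},
    "product-c": {"title": "Product C", "price": "42.00"},
}

report = diff_snapshots(monday, friday)
print(report)
```

The first flavor is a single snapshot like `monday`; the second is the act of running the same extraction again later and surfacing what was added, removed, or changed.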

Getting started with this tool was as easy as 1-2-3:

  1. You install a browser extension that lets the Browse AI recorder do its thing

  2. Just like in the video, you highlight the data points you want to collect on a website (One limitation to the robustness here: text that is visible on screen but covered by an invisible copy protection can’t be captured, not even via a combination of HTML URL-to-OCR caption matching)

  3. You confirm your pick and wait for the exports

Starting with a Lifehacker tutorial, I pulled all the useful links to the resources mentioned on the page, and went from install to results in about 5 minutes.

The user experience (UX) design overall was easy to navigate, forgiving of user mistakes, and clearly built to guide users intuitively through the process. This intent comes through in clear instructions, feedback on actions, and the minimization of user error through thoughtful interface design. It's like having a step-by-step guide that steers you back on track if you stray or make an error, and ensures that you always know what to do next. This type of UX design made me feel supported and confident while interacting with the product.

A Cornerstone Of The Browse.ai Conversation - Pre-builts For Days

The team at Browse.ai went heavy with the pre-builts! I was happily surprised at the sheer volume of well-known sites that were fully ready to go without lifting a finger. The selection was also quite broad, with “templates” covering:

  • Hiring tools (Glassdoor, Indeed, and others)

  • Informative sites

  • Discovery Sites (Redfin results for real estate research)

  • eCommerce (Amazon for product discovery)

  • Press release websites

  • And trust me, there were a whole lot more. See my pass through for yourselves, the volume is staggering:

Scrolling to the end of the list had me editing the clip and running it at 2x speed to make it a GIF.

In the interest of making my own life easier when producing quality content, I thought to myself: “Wait a minute, I can use this to make a raw-but-detailed list of AI-powered products, and I can add this list to my others for later review & exploration!” … so I did just that with the Product Hunt pre-built.

As you can see below, the templated task results provided a number of useful attributes ready for consumption, whether by humans or by robots, because, yes, this SaaS connects to the near and dear Zapier platform. With default settings everywhere, the run yielded 9 fields, including default rankings, product titles, descriptions, review scores, and product detail URLs.

Trailing thoughts on the long-term

The future of this SaaS product is quite bright, but recent events present a second side of its value proposition: a note of risk, due to its role in enabling the wholesale collection of internet data by AI model creators who source their training data from private online sources. I’m increasingly curious about how this product will evolve, and how its management team will navigate the growing global pressure around the use (or in this case, the enablement of programmatic use) of proprietary content. While such content is intended for public consultation, it often ends up being harnessed for training AI models.

This situation underscores the ongoing challenge regulators & industry standard setters face in establishing a generalizable, ethical framework for internet data collection that is at least somewhat deterministic, rather than so high-level that it borders on useless. As we can see in the news from the last week of January 2024 (read the recap below!), the debates surrounding the ethical use of internet data for AI development are far from reaching a resolution.

YOU STILL HAVE A CHANCE TO STEER THE SHIP!

“With a 4-digit platform comes 4-digit responsibilities” - Still relevant

Last week I launched the year’s first community poll! The idea with this was to ask many of you a few questions (7 questions at most) to let me know what sparks your interest, and how I can improve the newsletter.

The Community poll is going to be open until the 8th of February, 2024, so please get in there and submit your opinions!

Thank you for being the best part of this journey. 💌

OTHER AI THINGS HAPPENED

Some other notable news and product launches from this week

  • Watch out retail shelves, AI is coming for your planograms - This new product launched by Hivery looks to enhance retail planners’ ability to display the right products in the right place, with the likely end goal of optimizing margins.

  • SaaS providers who make no promises for the long-term safety of our data have met their (up and coming) match - Fantastic opportunities are opening up on the backup & data conservation side, thanks to R-Scout by HYCU. This middleware drafting tech can help build API integrations with a growing number of SaaS solutions in order to pull data from client instances and provide backup services on the HYCU platform.

    Fun side note about SaaS backups: even Microsoft 365 blatantly tells you that you’re better off with a backup solution than without one if you’ve got critical business information stored with them, thanks to the iconic Section 6b of their Service Agreement:

     “We strive to keep the Services up and running; however, all online services suffer occasional disruptions and outages, and Microsoft is not liable for any disruption or loss you may suffer as a result. In the event of an outage, you may not be able to retrieve Your Content or Data that you’ve stored. We recommend that you regularly backup Your Content and Data that you store on the Services or store using Third-Party Apps and Services.”

  • ChatGPT is formally under fire from the Italian data protection authority. As of January 29th, OpenAI has 30 days to respond to the Italian body, although their opening statement seems quite fair: “We want our AI to learn about the world, not about private individuals. We actively work to reduce personal data in training our systems like ChatGPT, which also rejects requests for private or sensitive information about people”. More details here.

  • Oh, and yep, ServiceNow and IBM are publicly becoming AI-friends. Having two giants of the service infrastructure sector working together just makes a lot of sense.

    Raises & Mergers Recap:

  • We were talking about them just last week! Wondercraft has raised a Seed round of 3 million to establish an even bigger foothold in the media landscape for independent creators.

  • Kore.ai has secured a massive $150 million in funding to enhance its conversational AI technology for enterprises, showcasing significant growth in the AI industry.

  • Rob Bearden's New AI Startup: Sema4.ai, led by former Cloudera CEO Rob Bearden, has raised $30.5 million, aiming to revolutionize enterprise work with open-source AI and the acquisition of automation company Robocorp. https://www.bloomberg.com/news/articles/2024-01-29/ex-cloudera-ceo-rob-bearden-raises-30-5-million-for-new-ai-startup

  • Berlin’s AI procurement solution procured funding! Akirolabs secured $4.6 million in funding, marking a significant step in its growth journey. https://siliconcanals.com/crowdfunding/akirolabs-secures-4-6m/

     

WHAT I'M (still) READING

If you’ve made it this far, have a bonus: Here’s a free handful of tips for creating an ethical use policy surrounding AI for your team, or business. This article is quite succinct, but carries a solid base for establishing private policy on the matter.

"Curiosity is, in great and generous minds, the first passion and the last." - Samuel Johnson

Shameless confession, team: I'm still reading the same book from last week's feature because of just how information-rich it really is. Wrapping my head around its entirety is taking me a bit longer than planned 😅. Solving Product has been a fantastic journey back through the fundamentals, and it has provoked quite a few thoughts on how I can add the most value wherever I’m employed. I do want to impress on you: this book is more than just a one-time read; it's a comprehensive resource that demands a loop of “Reading -> Reflecting -> Mental notes” to truly benefit from its wealth of knowledge.

Stay well, and until next week.

-✌🏽 Sam

P.S. Have tips or suggestions for a future issue? Get involved with the Community Poll

P.P.S. Interested in having me give you private feedback about a product that you are building? Send me an email: [email protected]
