https://github.com/Skyvern-AI/Skyvern
Hey HN, we’re Suchintan and Shu from Skyvern (<a href="https://www.skyvern.com">https://www.skyvern.com</a>). We’re building an open source tool that helps companies automate browser-based workflows using LLMs.<p>Our open source repo is at <a href="https://github.com/Skyvern-AI/Skyvern">https://github.com/Skyvern-AI/Skyvern</a>, and we’re excited to share our cloud version with you (<a href="https://app.skyvern.com">https://app.skyvern.com</a>) :)<p>Skyvern lets you define a single goal-based prompt (or a series of them) that instructs an agent to complete complex tasks on websites. Here’s a quick demo of Skyvern: <a href="https://www.loom.com/share/76b231309df74a528061fcf102e1967f" rel="nofollow">https://www.loom.com/share/76b231309df74a528061fcf102e1967f</a><p>We built this to solve a specific problem: building browser automations often requires companies either to hire people and scale out operations teams to do tedious manual work, or to hire developers to build automations with products like UiPath or Selenium.<p>Code-based solutions always run into the same two problems: they’re brittle (wow, this website added a new pop-up dialog and my script broke), and they can’t achieve the same objective across multiple websites (how can I fill out a contact-us form on hundreds of different websites?).<p>We did a Show HN a few months ago (<a href="https://news.ycombinator.com/item?id=39706004">https://news.ycombinator.com/item?id=39706004</a>), and
since then, we’ve onboarded customers for a wide variety of use cases: generating insurance quotes on websites like Geico.com; applying to jobs on websites like lever.co; filing permits on local government portals; registering new corporations for employer identification numbers (EINs); fetching invoices from hundreds of different portals such as hydroone.com; automating purchases on a handful of e-commerce websites like zooplus.com; and filling out contact-us forms on a bunch of small-business (SMB) websites, such as HVAC companies.<p>To support all of these, we’ve built and open-sourced quite a few interesting features:<p>(1) a full-featured React application that shows every action Skyvern takes in real time;<p>(2) livestreamed browser instances, so users can see what Skyvern is doing even when it runs inside a Docker container;<p>(3) authenticated sessions, with a Bitwarden integration and support for email-, phone-, and QR-code-based 2FA;<p>(4) “workflows” that chain multiple goal-based prompts together to handle tasks like downloading invoices or automating purchasing pipelines;<p>(5) processing HTML elements (e.g., identifying and summarizing SVGs) and performing website interactions (e.g., iterating over dynamic autocompletes to fill in address information correctly);<p>(6) “cached workflows,” which let Skyvern memorize previous interactions (i.e., text inputs) and reuse them in future runs.<p>We’ve also been blessed with a few model advancements that address some of the cost concerns the community brought up.
Skyvern’s token costs dropped by more than 80%, from $15 / 1M tokens (GPT-4V) to $2.50 / 1M tokens (GPT-4o).<p>Even with lower model costs, Skyvern is still quite expensive to run, so we give every new user $5 of credits to try it out and see whether it’s useful for you.<p>We would be honored if you could give it a try at <a href="https://app.skyvern.com">https://app.skyvern.com</a> and share some feedback with us. We look forward to any and all of your comments!
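<p>To make items (4) and (6) above a bit more concrete, here is a minimal, hypothetical Python sketch of how chained goal-based prompts and cached inputs could fit together. The names Step, Workflow, fill, run, and fake_agent are invented for illustration and are not Skyvern’s API. The fake agent stands in for the LLM-driven browser agent, and keying the cache by (URL, field label) is an assumption about how remembered inputs might be looked up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; these are NOT Skyvern's actual API.


@dataclass
class Step:
    """One goal-based prompt: where to start and what to accomplish."""
    url: str
    goal: str


# Stand-in for the agent: takes a step plus the outputs of earlier steps and
# returns new outputs. In a real system an LLM driving a browser fills this role.
Agent = Callable[[Step, Dict[str, str]], Dict[str, str]]


@dataclass
class Workflow:
    steps: List[Step]
    # Hypothetical cache of values entered on earlier runs, keyed by (url, field label).
    cache: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def fill(self, url: str, label: str, decide: Callable[[], str]) -> str:
        """Reuse a remembered value for this field, or decide it once and remember it."""
        key = (url, label)
        if key not in self.cache:
            self.cache[key] = decide()  # e.g. an expensive LLM call
        return self.cache[key]

    def run(self, agent: Agent) -> Dict[str, str]:
        """Run steps in order; each step sees everything produced before it."""
        context: Dict[str, str] = {}
        for step in self.steps:
            context.update(agent(step, context))
        return context


# Example usage with a fake agent (no browser or LLM involved).
def fake_agent(step: Step, context: Dict[str, str]) -> Dict[str, str]:
    # Records which page it "visited" and how many earlier outputs it could see.
    return {step.goal: f"{step.url} (prior outputs: {len(context)})"}


workflow = Workflow([
    Step("https://example.com/login", "log in"),
    Step("https://example.com/invoices", "download the latest invoice"),
])
outputs = workflow.run(fake_agent)

llm_calls: List[str] = []


def pick_zip() -> str:
    # Pretend this is a costly model call; we count how often it runs.
    llm_calls.append("zip")
    return "94103"


first = workflow.fill("https://example.com/checkout", "ZIP code", pick_zip)
second = workflow.fill("https://example.com/checkout", "ZIP code", pick_zip)
```

Passing the accumulated context forward is one simple way to let a later prompt use results from an earlier one, and the second fill call skips the "model call" entirely because the value was memorized on the first. A real browser agent would also need to carry session state such as cookies between steps.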