Categories
Current Trends How It Works Technology

LangChain Tutorial: A Step-by-Step Python Crash Course

LangChain is a framework for building applications powered by language models. In this LangChain crash course, you will learn how to create an application powered by Large Language Models (LLMs), and we will cover all the essential features of the framework.

 

Overview:

  • Installation
  • LLMs
  • Prompt Templates
  • Chains
  • Agents and Tools
  • Memory
  • Document Loaders
  • Indexes

Installation

`pip install langchain`

LLMs

LLMs are a kind of natural language processing (NLP) technology that uses deep learning to generate human-like language. If you are not familiar with LLMs, you have probably heard of a popular example: ChatGPT.

ChatGPT is a language model developed by OpenAI. It was trained on a large amount of text data, which allows it to learn the patterns of language and generate answers to questions.

LangChain is a Python framework that provides a standard interface to many different language models, including LLMs from providers such as OpenAI and Hugging Face. These LLMs take unstructured text as input and return text, which makes them well suited to answering user queries.

See all LLM providers.

```python
# Requires: pip install openai huggingface_hub
import os

from langchain.llms import HuggingFaceHub, OpenAI

# OpenAI
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_TOKEN"
llm = OpenAI(temperature=0.9)  # optionally pass model_name="text-davinci-003"
text = "Give me 5 Python project ideas."
print(llm(text))

# Hugging Face Hub
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "YOUR_HF_TOKEN"
# https://huggingface.co/google/flan-t5-xl
llm = HuggingFaceHub(repo_id="google/flan-t5-xl", model_kwargs={"temperature": 0, "max_length": 64})
print(llm("Who won the FIFA World Cup in the year 1994?"))
```

Prompt Templates

What exactly are prompt templates for?

I use prompt templates to structure the input I give to the AI model. The goal is to guide the model toward a specific kind of output, so the responses are more consistent and closer to what I want.

Why would I use this instead of just chatting with the bot directly, as a user would in the ChatGPT UI?

The ChatGPT UI is built for general conversation and is excellent for that purpose. When you need more control, consistency, efficiency, or complexity, for example when your application sends the same kind of question to the model many times, that is where prompt templates come in handy.

This feature lets developers use PromptTemplates to construct prompts for user queries, which can then be sent to LLMs for processing.

First, here is a question passed straight to the model:

```python
llm("Can Barack Obama have a conversation with George Washington?")
```

Output:

No, it is impossible for Barack Obama to have a conversation with George Washington as George Washington passed away in 1799.

Most of the time, though, we don't want to paste a bare question directly into the model like this.

How to write a better prompt

A better approach is to give the prompt some structure, for example by asking the model to reason step by step:

```python
prompt = """Question: Can Barack Obama have a conversation with George Washington?
Let's think step by step.
Answer: """
llm(prompt)
```

Output:

No, Barack Obama and George Washington cannot have a conversation because George Washington is no longer alive.

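A `PromptTemplate` lets you reuse this structure for any question:

```python
from langchain import PromptTemplate

template = """Question: {question}
Let's think step by step.
Answer: """
prompt = PromptTemplate(template=template, input_variables=["question"])

# format() fills in {question} and returns a plain string the LLM can accept
formatted_prompt = prompt.format(question="Can Barack Obama have a conversation with George Washington?")
print(llm(formatted_prompt))
```

Note that `prompt` itself is a `PromptTemplate` object, not a string, so calling `llm(prompt)` directly raises an error; the LLM needs the formatted string.

Rather than formatting every prompt by hand, we can use a chain to connect the template to the LLM.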

Chains

Chains offer a way to combine multiple components into a single application. For example, a chain can take user input, format it with a PromptTemplate, and then pass the resulting prompt to a Large Language Model (LLM). More complex chains can be built by linking several chains together or combining them with other components.

```python
from langchain import LLMChain

# The chain formats the question with `prompt`, then sends the result to `llm`
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What are the steps to start a successful online business?"
print(llm_chain.run(question))
```

Agents and Tools

Agents determine which actions to take and in what order. Agents can be incredibly powerful when used correctly. To successfully utilize agents, you should understand the following concepts:

Tool: A function that executes a specific task. This can be things like performing a Google Search, using another chain, or another task. See available Tools.

LLM: The language model powering the agent.

Agent: The agent to use. See also Agent Types.

```python
# Requires: pip install wikipedia
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
# Tool names are lowercase: "wikipedia" searches Wikipedia, "llm-math" does arithmetic via the LLM
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("Can you explain the concept of blockchain technology?")
```

Memory

Memory refers to the concept of persisting state between calls of a chain or agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains and agents that use memory.

Why Is Memory Important?

Memory allows the model to keep track of the context of a conversation. Without memory, each user prompt would be processed in isolation, with no knowledge of earlier turns.

```python
from langchain import ConversationChain, OpenAI

llm = OpenAI(temperature=0)
# ConversationChain keeps previous turns in memory and includes them in each new prompt
conversation = ConversationChain(llm=llm, verbose=True)

conversation.predict(input="Hi there!")
conversation.predict(input="Can we talk about blockchain?")
conversation.predict(input="I'm interested in Solana.")
```
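`ConversationChain` uses a buffer memory by default, which replays the earlier turns into each new prompt. To make that idea concrete without calling a model, here is a plain-Python sketch of what such a buffer does; `BufferMemory` and `build_prompt` are hypothetical names for illustration, not LangChain's API.

```python
from dataclasses import dataclass, field


@dataclass
class BufferMemory:
    """Hypothetical stand-in for a conversation buffer: stores every turn verbatim."""
    turns: list = field(default_factory=list)

    def save(self, user_input: str, ai_output: str) -> None:
        self.turns.append(("Human", user_input))
        self.turns.append(("AI", ai_output))

    def history(self) -> str:
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


def build_prompt(memory: BufferMemory, user_input: str) -> str:
    """Prepend the stored history so the model can see earlier turns."""
    history = memory.history()
    prefix = f"{history}\n" if history else ""
    return f"{prefix}Human: {user_input}\nAI:"


memory = BufferMemory()
memory.save("Hi there!", "Hello! How can I help?")
memory.save("Can we talk about blockchain?", "Sure, what would you like to know?")
print(build_prompt(memory, "I'm interested in Solana."))
```

Without the buffer, the third message would reach the model with no hint that blockchain was already the topic. The trade-off is that the prompt grows with every turn, which is why other memory types summarize or truncate the history.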

Document Loaders

Combining language models with your own text data is a powerful way to differentiate them. The first step in doing this is to load the data into “Documents” – a fancy way of saying some pieces of text. The document loader is aimed at making this easy.

See all available Document Loaders.

```python
from langchain.document_loaders import NotionDirectoryLoader

# "Notion_DB" is a folder containing a Notion workspace export (Markdown files)
loader = NotionDirectoryLoader("Notion_DB")
docs = loader.load()
```

Indexes

Indexes refer to ways to structure documents so that LLMs can best interact with them. This module contains utility functions for working with documents, different types of indexes, and then examples for using those indexes in chains.

  • Embeddings: Embeddings are a measure of the relatedness of text strings, and are represented with a vector (list) of floating point numbers.
  • Text Splitters: When you want to deal with long pieces of text, it is necessary to split up that text into chunks.
  • Vector stores: Vector databases store the embedding vectors of your documents and let you search them by similarity, so the chunks most relevant to a query can be retrieved quickly. See available vectorstores.

```python
# Requires: pip install requests sentence_transformers faiss-cpu
import requests
from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

# Download the sample document
url = "https://raw.githubusercontent.com/hwchase17/langchain/master/docs/modules/state_of_the_union.txt"
res = requests.get(url)
res.raise_for_status()
with open("state_of_the_union.txt", "w", encoding="utf-8") as f:
    f.write(res.text)

# Document Loader
loader = TextLoader("./state_of_the_union.txt")
documents = loader.load()

# Text Splitter: chunks of up to ~1000 characters, no overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Embeddings (a sentence-transformers model by default)
embeddings = HuggingFaceEmbeddings()
# query_result = embeddings.embed_query("This is a test document.")
# doc_result = embeddings.embed_documents(["This is a test document."])

# Vector store: index the chunks with FAISS and run a similarity search
db = FAISS.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
results = db.similarity_search(query)
print(results[0].page_content)

# Save the index to disk and load it back
db.save_local("faiss_index")
new_db = FAISS.load_local("faiss_index", embeddings)
results = new_db.similarity_search(query)
print(results[0].page_content)
```
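To see what the splitter and vector store do conceptually, here is a dependency-free sketch. It splits text into fixed-size character windows with optional overlap (a simplification: `CharacterTextSplitter` first splits on a separator and then merges pieces), turns each chunk into a toy bag-of-words vector standing in for real embeddings such as `HuggingFaceEmbeddings`, and ranks chunks by cosine similarity. Word counts capture meaning far worse than a trained model; only the mechanics carry over.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks, start = [], 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query's vector."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Toy chunks, invented for illustration
chunks = [
    "The president nominated Judge Ketanji Brown Jackson to the Supreme Court.",
    "Gas prices and inflation were discussed at length.",
    "The speech also covered support for Ukraine.",
]
print(similarity_search("What did the president say about Ketanji Brown Jackson", chunks))
print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Overlap repeats a little text at chunk boundaries so that a sentence cut in half still appears whole in at least one chunk, at the cost of storing some text twice.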

Conclusion

LangChain offers a comprehensive approach to building applications powered by generative models and LLMs. By combining components such as prompt templates, chains, agents, memory, and indexes, developers can build applications that go well beyond a single call to a model.

As the technology advances, more complex elements, including chat interfaces, are being incorporated into agents, supporting an ever wider range of use cases.

Whether you're developing chatbots, sentiment analysis tools, or any other NLP application, LangChain can help you unlock the full potential of your data. As Natural Language Processing (NLP) technology continues to advance, frameworks such as LangChain will only become more valuable.

All the code is available on GitHub.

Reference:

https://python.langchain.com/en/latest/index.html

https://github.com/hwchase17/chat-langchain


Where do I find my OpenAI API Key?

Many AI applications and tools need their users to obtain their own OpenAI API key. This key enables programmatic access to the OpenAI backend on behalf of the user, essentially “charging” an AI-powered tool.

 

Please note: at the time of writing, new users are given $5 USD in free credit for 3 months. Afterwards, you will need to add a credit card to continue using any API keys. This is NOT the same as a ChatGPT Plus subscription.

Getting Started

To get started, go to the OpenAI website. If you haven’t already created an account to use the ChatGPT UI, you can easily create an OpenAI account by navigating to Developers -> API Reference (don’t worry about the code, we won’t be dealing with that today!).

If you’ve already set up an account and are signed in, you can skip this part; you should see your profile icon and name in the top-right corner of the page in place of `Login` and `Sign up`.

Finding Your API Key

To get your API key, click on your name in the top right corner, which will display the drop-down menu. From the menu, select the “View API keys” option.

At this stage, you will see the `Create a new secret key` option in the centre of the page. Any API keys you created previously are listed here, but a key's full value is only shown once, at creation, so be sure to copy it somewhere secure. If you don’t have an API key yet, click the button to create one.

Using Your API Key

Now that you have gotten your API key, you can give your applications and tools OpenAI power! Please be aware that some applications will consume more tokens than others. You can read more about how pricing is calculated on the OpenAI pricing page.
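Whatever tool you plug the key into, avoid hard-coding it in source files. A common pattern is to export it as the `OPENAI_API_KEY` environment variable (the name the official OpenAI Python library reads by default) and load it at startup. A minimal sketch; `get_openai_key` is a hypothetical helper name:

```python
import os


def get_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running this tool.")
    return key
```

Set the variable once in your shell or your platform's secret store, and the key stays out of your code and out of version control.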


How to Scam People with AI

Just kidding


I thought I would take some time to go over some of the scams (either loosely or tightly) related to AI that have begun to surface. In the dawn of this new and exciting age, scams will likely continue to be on the rise in new forms.

I’ll go over some of the different scams I’ve spotted across the web. If you’ve got any more to share, drop a comment below!

👆Upwork Scamming

🤖AI Proposals

In some cases, the lack of authenticity in the proposal may be especially [random adjective] when they use the AI equivalent of lorem ipsum in the text:

Bonus points if you also used an open-source image generation tool that contains all of the image generator prompts in the file name to make your attachments seem legit.

In another case, I was helping a client hire some UI help on Upwork to assist us with some design work. I usually have a pretty good knack for spotting good proposals, but this week really threw me for a loop.

The first sign to look for is how they respond to the screening questions you include in your job post. If they don’t answer them well, or use ChatGPT to generate their responses, that’s a red flag:

🔪Account Hijacking?

If the account is having issues, you won’t be able to see the profile of the contractor.

Another sign that an account may have been flagged is the contractor suddenly withdrawing their proposal.

Lastly, when they come to the meeting, if their appearance doesn’t match the proposal, you can be fairly sure you are dealing with either a fake account or a compromised one, and that they are fishing to pawn off a cheap project for a large fee:

Our conversation made it clear that what they were offering was NOT in spec with the proposal at all (I needed a simple automation setup using make.com or another glue tool, and they had obviously not read my proposal and were trying to sell a full-blown application stack 😅).

📺YouTube Scamming

While I was scrolling through my home feed, I noticed a live stream with Elon Musk from OpenAI airing! Curious to see what it was, I opened the video to find a QR code in the corner that led to a link that would allegedly change my life within minutes, according to a “screenshot” from Elon Musk overlaid on the video stream:

Let’s dig in, shall we?

  • Compromised or fake YouTube account? ✅
  • A single live stream with a relatively large amount of viewers? ✅
  • Does QR Code lead to “Tesla bonus dot live” (really)? ✅

I’ve reported 3 of these to YouTube, and they’ve been taken down within roughly 45 minutes. Stay sharp y’all!

🔍tl;dr

  • When hiring help on platforms like Upwork, it’s crucial to keep an eye out for red flags.
  • Examine the responses to your proposal questions; if they seem off or AI-generated, be cautious.
  • Watch for account hijacking signs, such as inaccessible profiles or withdrawn proposals.
  • During meetings, ensure the contractor’s appearance matches their proposal.
  • Watch out for fake live streams on YouTube. Check the authenticity of the channel for signs of account hijacking/bot views.
Courtesy of https://i.redd.it/dcz26dc7jlia1.png

By staying vigilant, you can avoid falling prey to scammers seeking to profit from your hard-earned cash. 😎💼💡

Follow us for more AI shenanigans. If you need help wrangling this new world, call us for help automating your workflows (IF and ELSE statements included).

Categories
Business Current Trends Technology

Demystifying AI for Business Leaders: Current Trends and Challenges Part 1

Everyone and their mother has something to say about AI these days. Whether you’re trying to garner media attention by signing a moratorium or making AI music videos, the entire world is grappling with this horrifically powerful and sometimes downright silly and fun technology. Individuals and organizations alike are looking at how best to integrate AI into an existing project or start a totally new venture in this brave new world.

Here are some insights that can help you navigate the AI landscape with confidence.

Machine Dreams

Without providing an exhaustive list of examples (which can generally be found with a quick Google search), hallucination refers to AI responses that sound plausible but are fabricated or unsupported by the model's input or training data. Unfortunately (or fortunately?), at the time of writing, the best practice for detecting hallucinations is still to have an actual human fact-check the AI's output.

Prepare for takeoff

Let me share a personal experience I had with the gpt-4 model while using it for data analysis. I provided it with a CSV file to run calculations, and as I checked the intermediary steps, I discovered that it had generated a completely fake dataset to “answer” my question. At first, it seemed to be on the right track, stating: I need to compare the two datasets to find the rows in the temp_sheet.tsv "Location" column that has corresponding cities in the demo.csv "City" column. However, things quickly went downhill from there.

```python
import pandas as pd

data1 = {'Name': ['John', 'Paul', 'Ringo', 'George'],
         'Age': [20, 21, 22, 23]}

data2 = {'Name': ['John', 'Paul', 'George', 'Ringo'],
         'Height': [180, 170, 175, 165]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

pd.merge(df1, df2, on='Name').head(3)
```
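For contrast, the comparison the prompt actually asked for is a short filter against the real files. A minimal sketch, assuming both files load cleanly and their column names match the quoted prompt; the inline sample data is hypothetical and stands in for temp_sheet.tsv and demo.csv. Checking intermediate DataFrames against the source files like this is the kind of human fact-check that would have caught the fabricated dataset.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real files; in practice use
# pd.read_csv("temp_sheet.tsv", sep="\t") and pd.read_csv("demo.csv").
temp_sheet = pd.read_csv(io.StringIO("Location\tValue\nDenver\t1\nAustin\t2\nBoise\t3\n"), sep="\t")
demo = pd.read_csv(io.StringIO("City,Population\nAustin,1000\nDenver,2000\n"))

# Keep only rows whose Location also appears in demo's City column.
matches = temp_sheet[temp_sheet["Location"].isin(demo["City"])]

# Sanity check: every matched location really exists in the source data.
assert set(matches["Location"]) <= set(demo["City"])
print(matches)
```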

The Rise of Agent Behavior

With the launch of ChatGPT Plugins (most notably the ability to book flights) came the concept of “agent” behavior. An Intelligent Agent is essentially an AI that can receive environmental data and perform actions based on contextual information. In theory, it’s quite simple to “string” the inputs and outputs of different AI models to each other in a way that allows a main AI to interact with different assets in an agentic manner. LangChain is a popular open-source library for integrating this kind of behavior in your Python or JS project. There are many pre-built tools, clear examples in the documentation, and a highly active developer community to help you integrate advanced AI usage into your project.
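To make the “stringing together” idea concrete, here is a toy, dependency-free agent loop: a router picks a tool, the tool's output is fed back as an observation, and the loop ends when the router returns an answer. The `route` function is a hypothetical rule-based stand-in for the LLM that would make this decision in a real agent, and both tools are made up for illustration.

```python
from typing import Callable

# Hypothetical tools: each takes a string input and returns a string observation.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "echo": lambda text: text,
}


def route(question: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: returns (tool_name, tool_input) or ("finish", answer)."""
    if observations:
        return "finish", observations[-1]
    if "+" in question:
        return "calculator", question.replace(" ", "")
    return "echo", question


def run_agent(question: str, max_steps: int = 5) -> str:
    """Alternate between choosing an action and observing its result."""
    observations: list[str] = []
    for _ in range(max_steps):
        action, arg = route(question, observations)
        if action == "finish":
            return arg
        observations.append(TOOLS[action](arg))
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("2 + 3 + 4"))
```

The `max_steps` cap matters in practice: an LLM-driven router can keep choosing tools indefinitely, so real agent frameworks also bound the number of iterations.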

Bond, James Bond

Interested in implementing this AI solution for your business? Contact us for a consultation!

On a side note, OpenAI, if you happen to come across this article, I’m eagerly awaiting access to the plugin SDK! 😉

Data Leakage

Due to how Large Language Models (LLMs) are powered, and with the advent of Reinforcement Learning from Human Feedback (RLHF), proprietary information has already been “accidentally” leaked into OpenAI's verification and training data. This trend is likely to get worse before it gets better, as researchers and developers try to balance powering these feedback-driven, resource-intensive engines against securing user-submitted data. Unfortunately, the line between AI “power” and privacy is very thin due to the nature of how ML models work.

Data Leakage vs. Data Breaches

One way that organizations (such as Samsung) are combatting this is by developing in-house LLMs that, given enough data, let them keep user prompts and interactions on internal servers. The downside is that this relies on:

  • A large enough supply of data to fine-tune an open-source model
  • Enough users and verifiers to implement RLHF within organizational processes

Next Steps

Armed with these insights, I hope you can embark on your AI journey with greater confidence, even in this rapidly changing landscape. Remember the timeless machine learning mantra: “Garbage In = Garbage Out.” Always double-check your work and keep humans in the loop to minimize any potential negative side effects of utilizing AI in your project.

📻 Stay tuned for Part 2 of this series, where we’ll dive into privacy and security issues related to AI. In the meantime, follow Automation Architech for more great content!

đŸ§™â€â™‚ïž We are AI application experts! If you want to collaborate on a project, drop an inquiry here, stop by our website, or shoot us a direct email.



Bidirectional (2 Way) Sync Using Pipedream

What is Bidirectional Sync?

Bidirectional sync, also known as 2-way sync, is a type of data synchronization process that involves data being synced in both directions, meaning information can be transferred from one system to another and vice versa. It ensures that any changes made in either system are reflected in both systems. This type of synchronization is commonly used in cloud-based applications and services, where data needs to be shared across multiple devices and users.
While most automation workflows will generally flow one way (such as in a DAG), there are occasions where you may want to keep data synchronized in a non-hierarchical manner between two different applications. In this article, we will use Trello and Google Sheets as our use case.
In classical software engineering cases, this is accomplished using a server-client model where the server acts as the “single source of truth” from which the client updates. However, if we are referring to essentially two applications that have their own “servers”, what is the best way to keep information organized and synchronized between these two resources?

Challenges in Bidirectional Sync

When referring to bidirectional synchronization, there are a few key challenges that need to be weighed and balanced when designing a bidirectional sync system:

Maintaining Data Consistency

Bidirectional sync between two web applications can be challenging when it comes to maintaining data consistency. For example, if one application is updated, the changes must be reflected in the other application to keep the data consistent across both applications. This problem is trickier than most people realize due to issues like how fields are mapped, how individual “rows” or entries are keyed to each other, and how frequently a service can send/receive updates.

Conflict Resolution

When synchronizing data between two web applications, conflicts can arise in how the data is represented or stored. This can lead to discrepancies between the two applications and must be resolved in order to keep the data consistent. Depending on how robust the data pipeline is, it would be possible to maintain a clean connection if there is a transparent keying system as mentioned above.
Depending on the scale and how critical the data is, it may make sense either to have a database external to both services that stores transactions, or to have one service act as the “single source of truth” for both services. That way, in a conflict, the system falls back to a single service.

Security

Bidirectional sync between two web applications can pose a security risk if not implemented properly. It’s important to ensure that both applications are secure and that any data transmitted between them is encrypted. Many low-code/serverless workflow automation pipelines exist that can accomplish this in a secure manner, including Pipedream, Make.com, and Zapier, to name a few. 

However, at scale, these options can become expensive to maintain. In many cases, it’s best to prototype with these serverless workflow automation tools, then invest in engineering a custom cloud solution based on the design of your low-code solution. We can help with the transition from low-code to full-scale cloud builds. Contact us today for details!

Performance

Bidirectional sync can be resource intensive, as data must be constantly transmitted between devices. This can lead to slow performance and a degraded user experience. This is often dependent on the scale of the data transferred and stored and HOW it’s transmitted. If each operation requires a complete synchronization of both data sources, then this will be much more computationally costly than single incremental updates as they occur. 

Strategies for Establishing Bidirectional Sync

The main thing to consider when setting up a bidirectional sync workflow is preventing the “infinite loop” problem in a non-hierarchical system. The main strategies we can implement to prevent this are:

Last Modified

When performing a sync, we would want to include data about when a “row” in our data was last updated/created. If it’s relatively recent (within the past X seconds or so) we would want to ignore an update and stop the workflow from continuing to prevent wasted resources. As an extension of this, we could also check if the values are different from each other and if there is no difference between the existing row and the incoming update, break the workflow cycle. 
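A minimal sketch of this guard, assuming each synced row carries a `last_modified` timestamp and that the workflow can see both the incoming and the stored values. The field names and the 10-second window are hypothetical placeholders for the “X seconds” above.

```python
from datetime import datetime, timedelta, timezone

ECHO_WINDOW = timedelta(seconds=10)  # hypothetical value for "X seconds"
SYNCED_FIELDS = ("title", "description", "status")  # hypothetical field names


def should_apply(incoming: dict, stored: dict, now: datetime) -> bool:
    """Decide whether an incoming change should be written to the other app."""
    # No real difference between the stored row and the update: break the cycle.
    if all(incoming.get(f) == stored.get(f) for f in SYNCED_FIELDS):
        return False
    # Stored row was modified within the window, most likely by the sync itself.
    if now - stored["last_modified"] < ECHO_WINDOW:
        return False
    return True


now = datetime.now(timezone.utc)
stored = {"title": "Write post", "description": "", "status": "Doing",
          "last_modified": now - timedelta(minutes=5)}
incoming = {"title": "Write post", "description": "", "status": "Done"}
print(should_apply(incoming, stored, now))  # True: a real change to a row that is not fresh
```

The window is a trade-off: a genuine human edit that lands within X seconds of the previous sync will also be ignored, so keep it as short as your tools' latency allows.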

Is User

Some servers and automation tools may allow us to check if the incoming change is coming from either a human user or a program/API and react accordingly. If the change is coming from a non-human actor, we could easily close the loop this way and prevent wasted computation resources. 
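Where the platform does report who made a change, the check can be a simple filter. This sketch assumes a hypothetical event payload with an `actor` dict and a hypothetical `SYNC_ACCOUNT_ID` for the account the integration writes as; real field names depend on the API in question.

```python
SYNC_ACCOUNT_ID = "sync-bot"  # hypothetical ID of the account the integration writes as


def is_human_change(event: dict) -> bool:
    """True if the change came from a person rather than from our own sync account or an API."""
    actor = event.get("actor", {})
    return actor.get("type") == "user" and actor.get("id") != SYNC_ACCOUNT_ID


events = [
    {"actor": {"type": "user", "id": "alice"}, "row": 2},
    {"actor": {"type": "user", "id": SYNC_ACCOUNT_ID}, "row": 2},
    {"actor": {"type": "api", "id": "integration"}, "row": 3},
]
to_process = [e for e in events if is_human_change(e)]
print(to_process)
```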

Toggle

In some situations, a toggle could be added in a server-stored value that would prevent a workflow or function from initiating if something like isUpdating is true. After the update is complete, this could be flipped to false to re-allow flow between the two resources.
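A sketch of the toggle, with an in-memory dict standing in for the server-stored value (in practice the flag must live somewhere both workflows can read). `state`, `write_to_other_app`, and `on_change` are hypothetical names, and the “other app” simply echoes each write straight back to simulate the loop.

```python
from contextlib import contextmanager

state = {"is_updating": False}  # hypothetical shared flag; keep it server-side in practice
writes: list = []  # records what was sent to the other app


@contextmanager
def updating_flag():
    """Set is_updating for the duration of a write, and always clear it afterwards."""
    state["is_updating"] = True
    try:
        yield
    finally:
        state["is_updating"] = False


def write_to_other_app(change: dict) -> None:
    """Stand-in for an API call; simulates the other app's trigger echoing the change back."""
    writes.append(change)
    on_change(change)  # without the flag, this echo would loop forever


def on_change(change: dict) -> None:
    """Trigger handler shared by both directions of the sync."""
    if state["is_updating"]:
        return  # change caused by our own write: stop here
    with updating_flag():
        write_to_other_app(change)


on_change({"row": 2, "status": "Done"})
print(writes)
```

The flag only helps if it is still set when the echo arrives. With asynchronous webhooks the echo may land after the flag has been cleared, which is one reason to combine it with the last-modified check.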

Bidirectional Sync Use Case: Trello and Google Sheets

Google Sheets is a fantastic spreadsheet application that is popular for its ease of sharing, free access for anyone, plethora of plugins, and the ability to write scripts in Google Apps Script (a JavaScript-based language) to drastically increase its functionality.

Trello, one of the top project management tools, is able to organize tasks and just about anything related to a project’s needs. 

We were approached to test a bidirectional sync system between the two tools since they both have API access, and we decided to give it a go! The architecture we came up with involved using Pipedream as our server to communicate and relay changes between the two applications. For this project, we simply mapped the Title, Description, and Status fields between a Google Sheet and the Trello Cards on a given board, like so:
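A sketch of that mapping, assuming each sheet row is read as a dict with Title, Description, and Status columns and that each status corresponds to one Trello list. Trello card objects expose `name`, `desc`, and `idList`; the list IDs below are hypothetical placeholders.

```python
# Hypothetical mapping from the sheet's Status values to Trello list IDs.
STATUS_TO_LIST = {"To Do": "list-todo-id", "Doing": "list-doing-id", "Done": "list-done-id"}
LIST_TO_STATUS = {list_id: status for status, list_id in STATUS_TO_LIST.items()}


def row_to_card_fields(row: dict) -> dict:
    """Sheet row -> fields for a Trello card create/update request."""
    return {
        "name": row["Title"],
        "desc": row["Description"],
        "idList": STATUS_TO_LIST[row["Status"]],
    }


def card_to_row(card: dict) -> dict:
    """Trello card -> sheet row values."""
    return {
        "Title": card["name"],
        "Description": card["desc"],
        "Status": LIST_TO_STATUS[card["idList"]],
    }


row = {"Title": "Write blog post", "Description": "Draft the sync article", "Status": "Doing"}
card = row_to_card_fields(row)
print(card)
```

Because the two functions are exact inverses, a change can travel Sheets to Trello and back without drifting. A real setup would also store each card's Trello `id` in a sheet column, so every row stays keyed to its card.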

To do this, we created two workflows in Pipedream: one responsible for processing updates FROM Trello TO Google Sheets, and another FROM Google Sheets TO Trello. Initially, we also wanted to account for the “infinite loop” problem (where one update would create a continuous loop of updates), but it turned out this was not necessary for this task, likely because the onEdit() trigger on the Google Sheets side only responds to changes made by humans (and not the API).

Because each Google Sheets cell holds a single flat value, there were limits on how many fields from a Trello Card could be included in Sheets due to parsing and data structure complexity. For full access to all of those data structures, you could use Airtable or Notion as an alternative database instead, since they have more complex data structures built in, such as lists and the ability to link records to other tables.

Challenges

In addition to the data structure issues above, while implementing this workflow in Pipedream, the batch updates from Google Sheets also proved to be problematic as Pipedream does not handle iteration by default in a single “run”. We had to essentially split the workflow into “receiving” and “processing” to be able to handle these edge cases. You can read more about the status of this on their GitHub.
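The receive/process split can be sketched generically: the receiving step only unpacks a batch edit into individual row changes and queues them, and the processing step handles exactly one change per run. The payload shape and the in-memory queue are hypothetical; in Pipedream the hand-off would go through whatever trigger or storage connects the two workflows.

```python
from collections import deque
from typing import Callable

queue: deque = deque()  # hypothetical stand-in for the hand-off between the two workflows


def receive(batch_event: dict) -> int:
    """'Receiving' step: split one batch edit into per-row jobs and enqueue them."""
    jobs = [{"row": c["row"], "values": c["values"]} for c in batch_event["changes"]]
    queue.extend(jobs)
    return len(jobs)


def process_one(apply: Callable[[dict], None]) -> bool:
    """'Processing' step: handle exactly one queued change per run."""
    if not queue:
        return False
    apply(queue.popleft())
    return True


applied: list = []
receive({"changes": [
    {"row": 2, "values": ["Task A", "First task", "Done"]},
    {"row": 3, "values": ["Task B", "Second task", "To Do"]},
]})
while process_one(applied.append):  # each iteration stands in for a separate run
    pass
print([job["row"] for job in applied])
```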