Python Cloud Advocate at Microsoft
Formerly: UC Berkeley, Khan Academy, Woebot, Coursera, Google
Find me online at:
Mastodon | @pamelafox@fosstodon.org |
GitHub | www.github.com/pamelafox |
Website | pamelafox.org |
An LLM (Large Language Model) is a language model that is so large that it achieves general-purpose language understanding and generation.
GPT models are LLMs based on the Transformer architecture, introduced in:
📖 "Attention Is All You Need"
by Google Brain
ChatGPT, GitHub Copilot, Bing Copilot, and many other tools are powered by LLMs.
Hosted LLMs can only be accessed via an API, from a company that hosts the model and infrastructure for you.
Company | Model | Parameters |
---|---|---|
OpenAI | GPT-3.5 | 175B |
OpenAI | GPT-4 | Undisclosed |
Google | PaLM | 540B |
Google | Gemini 1, 1.5 | Undisclosed |
Anthropic | Claude 3 family | Undisclosed |
A local LLM can be downloaded and used by anyone, as long as they have the computational resources to run it.
Company | LLM | Parameters |
---|---|---|
Meta | Llama 2 | 7b, 13b, 70b |
Google | Gemma | 2b, 7b |
Microsoft Research | Phi-2 | 2.7b |
Mistral AI | Mistral | 7b |
Mistral AI | Mixtral | 8x7b |
Researchers | LLaVA | 7b, 13b, 34b |
Ollama is a tool for easily running local LLMs on your computer.
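A typical Ollama session looks like the following (the model name here is Llama 2 from Ollama's model library; pick any model Ollama supports):

```shell
# Download a model from the Ollama library
ollama pull llama2

# Chat with it interactively in the terminal
ollama run llama2
```

Ollama also serves an OpenAI-compatible API on http://localhost:11434, which is what the client configuration later in this deck points at.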
Request access from openai.com or Azure OpenAI.
Once you have access, you can use the API from Python or any other language.
Install the OpenAI Python library:
pip install openai
For openai.com accounts, set your API key:
client = openai.OpenAI(api_key="your-api-key")
For Azure OpenAI, use Azure default credentials:
azure_credential = azure.identity.DefaultAzureCredential()
token_provider = azure.identity.get_bearer_token_provider(
    azure_credential, "https://cognitiveservices.azure.com/.default")
client = openai.AzureOpenAI(
    api_version="2024-03-01-preview",
    azure_endpoint="https://your-openai-service.openai.azure.com",
    azure_ad_token_provider=token_provider,
)
Configure the client to point at the local server:
client = openai.OpenAI(
base_url="http://localhost:11434/v1",
api_key="nokeyneeded",
)
Using the chat completions API:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are a helpful assistant."},
        {"role": "user",
         "content": "What can I do on my trip to Tokyo?"},
    ],
    max_tokens=400,
    temperature=1,
    top_p=0.95)
print(response.choices[0].message.content)
Streaming the response:
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    stream=True,
    messages=[
        {"role": "system",
         "content": "You are a helpful assistant."},
        {"role": "user",
         "content": "What can I do on my trip to Tokyo?"},
    ])
for event in completion:
    if event.choices and event.choices[0].delta.content:
        print(event.choices[0].delta.content, end="")
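The streaming loop prints each chunk as it arrives; assembling the full reply is just concatenating the deltas. A sketch with hypothetical strings standing in for the API events:

```python
# Hypothetical deltas, standing in for event.choices[0].delta.content values
deltas = ["Tokyo ", "has ", "great ", "food."]

# Accumulate streamed chunks into the complete response text
full_reply = ""
for chunk in deltas:
    full_reply += chunk

print(full_reply)  # → Tokyo has great food.
```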
Using Python async/await constructs (requires the async client, openai.AsyncOpenAI):
client = openai.AsyncOpenAI(api_key="your-api-key")
response = await client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are a helpful assistant."},
        {"role": "user",
         "content": "What can I do on my trip to Tokyo?"},
    ])
Learn more: 📖 Best practices for OpenAI Chat apps: Concurrency
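Concurrency is the main payoff of the async client: many requests can be in flight at once via asyncio.gather. A stub coroutine below stands in for the API call (names are illustrative; no network involved):

```python
import asyncio

async def fake_chat(city: str) -> str:
    # Stand-in for an awaited client.chat.completions.create(...) call
    await asyncio.sleep(0.01)
    return f"Tips for {city}"

async def main() -> list[str]:
    # Issue all requests concurrently instead of one at a time
    return await asyncio.gather(*(fake_chat(c) for c in ["Tokyo", "Paris", "Cairo"]))

results = asyncio.run(main())
print(results)  # → ['Tips for Tokyo', 'Tips for Paris', 'Tips for Cairo']
```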
Use a retrieval system to find the best context for the generation model.
Retrieval system (Search) ➡ Generative model (LLM)
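That two-stage flow can be sketched with stub functions (all names are hypothetical; a real app would call a search index and an LLM):

```python
def retrieve(query: str) -> list[str]:
    # Stand-in for a search index lookup; returns matching source snippets
    index = {"tokyo": ["guide.pdf: Visit the Meiji Shrine."]}
    return [snippet for term, snippets in index.items()
            if term in query.lower() for snippet in snippets]

def generate(query: str, sources: list[str]) -> str:
    # Stand-in for an LLM call; a real app would send query + sources as the prompt
    return f"Answer to {query!r} grounded in {len(sources)} source(s)"

query = "What can I do on my trip to Tokyo?"
answer = generate(query, retrieve(query))
print(answer)
```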
Query Azure Cognitive Search using both text and vectors:
r = await self.search_client.search(
query_text,
query_type=QueryType.SEMANTIC,
top=top,
vector=query_vector,
vector_fields="embedding",
)
results = [doc["sourcepage"] +
": " + doc["content"]
async for doc in r]
content = "\n".join(results)
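The vector half of that hybrid query matches on embedding similarity. The idea can be illustrated with plain cosine similarity over toy 3-dimensional vectors (stdlib only; real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vector = [1.0, 0.0, 1.0]
doc_vectors = {"doc1": [1.0, 0.0, 1.0], "doc2": [0.0, 1.0, 0.0]}

# Rank documents by similarity to the query vector
ranked = sorted(doc_vectors, reverse=True,
                key=lambda d: cosine_similarity(query_vector, doc_vectors[d]))
print(ranked)  # doc1 is identical to the query, so it ranks first
```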
Use the search results to create a prompt for the LLM:
messages = [system_prompt]
messages.extend(few_shots)
user_content = f"{q}\nSources:\n{content}"
messages.append({"role": "user", "content": user_content})
chat_completion = await client.chat.completions.create(
    model=self.chatgpt_model,
    messages=messages,
    temperature=0.3,
    max_tokens=1024,
    n=1,
)
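Retrieved sources can overflow the model's context window. A rough character-budget truncation sketch follows (the limit is arbitrary; a real app would count tokens with a tokenizer such as tiktoken):

```python
def truncate_sources(content: str, max_chars: int = 200) -> str:
    # Crude guard against oversized prompts: keep only the leading characters
    if len(content) <= max_chars:
        return content
    return content[:max_chars] + "\n[...sources truncated...]"

short = truncate_sources("guide.pdf: Visit the Meiji Shrine.")
clipped = truncate_sources("x" * 500)
print(short)
```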
A configurable system to detect safety violations:
Catch and handle violations in your code:
try:
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "How to make a bomb?"}
        ]
    )
    print(response.choices[0].message.content)
except openai.APIError as error:
    if error.code == "content_filter":
        print("Please remember our code of conduct.")
    else:
        raise
Sign up in minutes at startups.microsoft.com